[SRILM User List] ngram: sentence boundary markers in text file used with -ppl?

Andreas Stolcke stolcke at icsi.berkeley.edu
Sat May 25 15:11:40 PDT 2013


On 5/25/2013 1:37 PM, Sander Maijers wrote:
> Hi,
>
> Should the sentences in the text file passed to ngram's '-ppl' option
> be surrounded with <s> (start-of-sentence) and </s> (end-of-sentence)
> tokens? Both tokens are in the LM.
>
> I have tested it just now, and it seems that the sentence boundary 
> markers are inferred by ngram when left out, and adopted when put in. 
> Where is this documented?

In the man page 
<http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html>.  The 
relevant options are

        -no-sos
               Disable the automatic insertion of start-of-sentence tokens
               for sentence probability computation.  The probability of
               the initial word is thus computed with an empty context.

        -no-eos
               Disable the automatic insertion of end-of-sentence tokens
               for sentence probability computation.  End-of-sentence is
               thus excluded from the total probability.
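
To make the effect of these two options concrete, here is a minimal
sketch in Python. It is not SRILM code: the bigram and unigram tables,
the probabilities in them, and the function names are all invented for
illustration, and backoff weights are ignored. It assumes, in line with
the behavior described in the question, that markers already present in
the input line are used as given rather than inserted a second time.

```python
import math

# Toy log10 probabilities, invented purely for illustration.
BIGRAM = {
    ("<s>", "hello"): math.log10(0.5),
    ("hello", "world"): math.log10(0.4),
    ("world", "</s>"): math.log10(0.8),
}
UNIGRAM = {
    "hello": math.log10(0.1),
    "world": math.log10(0.05),
    "</s>": math.log10(0.2),
}


def word_logprob(word, context):
    """Use the bigram if one is listed, else the unigram (no backoff weight)."""
    if context is not None and (context, word) in BIGRAM:
        return BIGRAM[(context, word)]
    return UNIGRAM[word]


def sentence_logprob(text, no_sos=False, no_eos=False):
    """Score one line of text, mimicking automatic <s> / </s> insertion."""
    words = text.split()

    # Markers written in the text are adopted as-is (assumption).
    explicit_sos = bool(words) and words[0] == "<s>"
    if explicit_sos:
        words = words[1:]
    explicit_eos = bool(words) and words[-1] == "</s>"
    if explicit_eos:
        words = words[:-1]

    # Without <s>, the first word is scored with an empty context.
    context = "<s>" if (explicit_sos or not no_sos) else None
    # Without </s>, end-of-sentence is left out of the total.
    if explicit_eos or not no_eos:
        words = words + ["</s>"]

    total = 0.0
    for word in words:
        total += word_logprob(word, context)
        context = word
    return total


if __name__ == "__main__":
    print(sentence_logprob("hello world"))
    print(sentence_logprob("<s> hello world </s>"))
    print(sentence_logprob("hello world", no_sos=True))
    print(sentence_logprob("hello world", no_eos=True))
```

In this toy model the first two calls give the same score, because the
markers are either inferred or adopted. With no_sos=True the first word
falls back to its unigram probability, and with no_eos=True the final
</s> term is dropped from the sum.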


Andreas

>
> Best,
> Sander
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user
