[SRILM User List] Question about SRILM and sentence boundary detection

L. Amber Wilcox-O'Hearn amber.wilcox.ohearn at gmail.com
Tue Feb 14 11:20:13 PST 2012


On Tue, Feb 14, 2012 at 9:41 AM, Andreas Stolcke
<stolcke at icsi.berkeley.edu> wrote:
> On 2/14/2012 4:54 AM, L. Amber Wilcox-O'Hearn wrote:
>>
>> I see. I misunderstood the difference between -ppl and -counts.
>>
>> I did try this and the summary statistics at the end gave the correct
>> sum, but there weren't any statistics output before the escaped lines:
>>>
>>> cat testcounts | ngram -lm LM -escape "===" -counts - -unk
>>
>> ===
>> ===
>> ===
>> file -: 0 sentences, 4 words, 0 OOVs
>> 0 zeroprobs, logprob= -9.87606 ppl= 294.452 ppl1= 294.452
>>
>> Did I miss something?
>
> This is poorly documented. The escape lines trigger output of "sentence
> level" statistics. At the end, you get the "file level" statistics.
> However, to be compatible with -ppl, sentence level stats are only output
> with -debug 1 or higher.  So your example will work as long as you also add
> -debug 1.

Ah, perfect.  Thank you very much!
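
For anyone finding this thread in the archive: going by the explanation
above, the adjusted command should simply be the original one with
-debug 1 appended. The following is a minimal sketch, not verified output.
LM and testcounts are the file names used earlier in this thread, and the
guard around the call is only an assumption added so the snippet does not
fail on a machine without SRILM or those files.

```shell
# Same command as in the quoted message, with -debug 1 added so that each
# "===" escape line triggers sentence-level statistics before the
# file-level summary. LM and testcounts are the names used in this thread.
if command -v ngram >/dev/null 2>&1 && [ -r LM ] && [ -r testcounts ]; then
    cat testcounts | ngram -lm LM -escape "===" -counts - -unk -debug 1
    status=$?
else
    # Assumption for illustration: skip quietly when SRILM or the inputs are missing.
    echo "skipped: ngram, LM, or testcounts not available" >&2
    status=0
fi
```

Any debug level of 1 or higher should behave the same way with respect to
the per-escape statistics, according to the reply quoted above.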

-Amber


