[SRILM User List] Question about SRILM and sentence boundary detection

Andreas Stolcke stolcke at icsi.berkeley.edu
Tue Feb 14 08:41:01 PST 2012

On 2/14/2012 4:54 AM, L. Amber Wilcox-O'Hearn wrote:
> I see.   I misunderstood the difference between -ppl and -counts.
> I did try this and the summary statistics at the end gave the correct
> sum, but there weren't any statistics output before the escaped lines:
>> cat testcounts | ngram -lm LM -escape "===" -counts - -unk
> ===
> ===
> ===
> file -: 0 sentences, 4 words, 0 OOVs
> 0 zeroprobs, logprob= -9.87606 ppl= 294.452 ppl1= 294.452
> Did I miss something?
This is poorly documented. The escape lines trigger output of
"sentence level" statistics. At the end, you get the "file level"
statistics. However, to be compatible with -ppl, sentence level stats
are only output with -debug 1 or higher. So your example will work as
long as you also add -debug 1.
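
For reference, a sketch of the adjusted invocation. It assumes the same
"testcounts" and "LM" files as in the quoted command; the NGRAM_ARGS
variable and the availability check are only there for illustration and
are not part of SRILM. The pipeline is otherwise unchanged, with
"ngram ... < testcounts" standing in for "cat testcounts | ngram ...".

```shell
# Options from the quoted command, with -debug 1 appended so that
# per-escape ("sentence level") statistics are printed.
# NGRAM_ARGS is a hypothetical helper variable, not an SRILM name.
NGRAM_ARGS='-lm LM -escape === -counts - -unk -debug 1'

if command -v ngram >/dev/null 2>&1 && [ -r testcounts ] && [ -r LM ]; then
    # Word splitting of $NGRAM_ARGS is intentional here.
    # "-counts -" reads the count file from standard input.
    ngram $NGRAM_ARGS < testcounts
else
    echo "skipped: needs SRILM's ngram plus the testcounts and LM files"
fi
```

Without -debug 1 the same command prints only the escaped lines and the
final "file level" summary, as in the quoted output.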


