[SRILM User List] Question about SRILM and sentence boundary detection
stolcke at icsi.berkeley.edu
Tue Feb 14 08:41:01 PST 2012
On 2/14/2012 4:54 AM, L. Amber Wilcox-O'Hearn wrote:
> I see. I misunderstood the difference between -ppl and -counts.
> I did try this and the summary statistics at the end gave the correct
> sum, but there weren't any statistics output before the escaped lines:
>> cat testcounts | ngram -lm LM -escape "===" -counts - -unk
> file -: 0 sentences, 4 words, 0 OOVs
> 0 zeroprobs, logprob= -9.87606 ppl= 294.452 ppl1= 294.452
> Did I miss something?
This is poorly documented. The escape lines trigger output of
"sentence level" statistics. At the end, you get the "file level"
statistics. However, to be compatible with -ppl, sentence level stats are
only output with -debug 1 or higher. So your example will work as long as
you also add -debug 1:

    cat testcounts | ngram -lm LM -escape "===" -counts - -unk -debug 1
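The file-level line in the quoted output can be checked by hand. Here is a minimal sketch, assuming SRILM's usual definitions: logprob is base 10, ppl counts one end-of-sentence event per sentence, and ppl1 excludes those events. The function name `srilm_ppl` is hypothetical and is not part of SRILM.

```python
def srilm_ppl(logprob, sentences, words, oovs, zeroprobs):
    """Return (ppl, ppl1) from SRILM-style summary counts (hypothetical helper).

    Assumes base-10 logprob. ppl treats each sentence end as an extra
    predicted event; ppl1 leaves sentence ends out of the denominator.
    """
    events = words - oovs - zeroprobs          # scored word tokens
    ppl = 10 ** (-logprob / (events + sentences))
    ppl1 = 10 ** (-logprob / events)
    return ppl, ppl1


# Figures taken from the quoted summary line: 0 sentences, 4 words, 0 OOVs, 0 zeroprobs.
ppl, ppl1 = srilm_ppl(-9.87606, sentences=0, words=4, oovs=0, zeroprobs=0)
print(f"ppl= {ppl:.3f} ppl1= {ppl1:.3f}")
```

With 0 sentences the two denominators coincide, which is why ppl and ppl1 are identical in the quoted output.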