[christophe.hauser at irisa.fr: Jelinek Mercer Smoothing]
Christophe Hauser
christophe.hauser at irisa.fr
Wed Apr 8 09:51:32 PDT 2009
On Tue, Mar 31, 2009 at 10:26:32AM -0700, Andreas Stolcke wrote:
> An example of the count-lm training procedure is given by
> $SRILM/test/tests/ngram-count-lm/run-test .
>
> Andreas
Hello,
I am trying to reproduce some experiments using SRILM. I would like to
apply Jelinek-Mercer smoothing, but the perplexity results I get are very
strange: much higher than with no smoothing at all.
Here is what I did:
ngram-count -text training -lm lm -order $order -write-vocab vocab \
    -write cfile
ngram -ppl test -lm lm -order $order -vocab vocab -unk
file test: 1 sentences, 964 words, 41 OOVs
0 zeroprobs, logprob= -1445.86 ppl= 36.7102 ppl1= 36.8538
Then, if I use Jelinek-Mercer smoothing:
cat >countlm <<EOF
countmodulus 1
mixweights 0
.5 .5 .5
counts cfile
EOF
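My understanding (please correct me if I am misreading the count-lm file
format) is that this describes a Jelinek-Mercer / deleted-interpolation
model of the form

    P(w | h) = lambda * P_ML(w | h) + (1 - lambda) * P(w | h')

where P_ML is the relative-frequency estimate taken from cfile, h' is the
history shortened by one word, and the ".5 .5 .5" line gives a single
fixed weight of 0.5 for each of the three n-gram orders (countmodulus 1
and mixweights 0 putting all history counts into one weight bin).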
ngram -count-lm -lm countlm -order $order -ppl test -unk -vocab vocab
file test: 1 sentences, 964 words, 41 OOVs
0 zeroprobs, logprob= -2948.76 ppl= 1553.45 ppl1= 1565.86
The smoothed model's perplexity over the test set is very high. Is there
something I did wrong?
I expected to get something around 5 bits/symbol on this test.
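(That is, roughly a perplexity of 2^5 = 32; the unsmoothed model above is
already at log2(36.71) ~= 5.2 bits/word, while the count-lm result
corresponds to log2(1553.45) ~= 10.6 bits/word.)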
Also, I am not sure how to interpret perplexity results with SRILM: since
OOVs and zeroprobs are discarded, an unsmoothed model gets a finite
perplexity where it is actually infinite. This confuses me, especially
when comparing smoothing techniques: how can I accurately measure the
benefit of smoothing?
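For what it's worth, here is how I read the ppl output above (again,
please correct me if this is wrong): ppl seems to be computed as

    ppl = 10^( -logprob / (words - OOVs - zeroprobs + sentences) )

and ppl1 the same without the sentence-end tokens. For the unsmoothed run
that gives 10^(1445.86 / (964 - 41 - 0 + 1)) ~= 36.71 and
10^(1445.86 / 923) ~= 36.85, which matches the output, so the 41 OOV
tokens simply contribute nothing to either the numerator or the
denominator.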
PS: does -gtmin 0 -gtmax 0 completely disable discounting?
Many thanks.
Kind regards,
--
Christophe