[christophe.hauser at irisa.fr: Jelinek Mercer Smoothing]
Christophe Hauser
christophe.hauser at irisa.fr
Wed Apr 8 09:51:32 PDT 2009
On Tue, Mar 31, 2009 at 10:26:32AM -0700, Andreas Stolcke wrote:
> An example of the count-lm training procedure is given by
> $SRILM/test/tests/ngram-count-lm/run-test .
>
> Andreas
Hello,
I am trying to reproduce some experiments using SRILM. I would like to
apply Jelinek-Mercer smoothing, but the perplexity results I get are very
strange: much higher than with no smoothing at all.
Here is what I did:
ngram-count -text training -lm lm -order $order -write-vocab vocab \
    -write cfile
ngram -ppl test -lm lm -order $order -vocab vocab -unk
file test: 1 sentences, 964 words, 41 OOVs
0 zeroprobs, logprob= -1445.86 ppl= 36.7102 ppl1= 36.8538
Then, if I use Jelinek-Mercer smoothing:
cat >countlm <<EOF
countmodulus 1
mixweights 0
.5 .5 .5
counts cfile
EOF
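My understanding (please correct me if I am misreading the count-lm file
format) is that this describes a Jelinek-Mercer / deleted-interpolation
model of the form

    P(w | h) = lambda * P_ML(w | h) + (1 - lambda) * P(w | h')

where P_ML is the relative-frequency estimate taken from cfile, h' is the
history shortened by one word, and the ".5 .5 .5" line gives a single
fixed weight of 0.5 for each of the three n-gram orders (countmodulus 1
and mixweights 0 putting all history counts into one weight bin).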
ngram -count-lm -lm countlm -order $order -ppl test -unk -vocab vocab
file test: 1 sentences, 964 words, 41 OOVs
0 zeroprobs, logprob= -2948.76 ppl= 1553.45 ppl1= 1565.86
The smoothed model's perplexity over the test set is very high. Is there
something I did wrong?
I expected to get something around 5 bits/symbol on this test.
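(That is, roughly a perplexity of 2^5 = 32; the unsmoothed model above is
already at log2(36.71) ~= 5.2 bits/word, while the count-lm result
corresponds to log2(1553.45) ~= 10.6 bits/word.)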
Also, I am not sure how to interpret perplexity results with SRILM: since
OOVs and zeroprobs are discarded, an unsmoothed model gets a finite
perplexity where it is actually infinite. This confuses me, especially
when comparing smoothing techniques: how can I accurately measure the
benefit of smoothing?
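For what it's worth, here is how I read the ppl output above (again,
please correct me if this is wrong): ppl seems to be computed as

    ppl = 10^( -logprob / (words - OOVs - zeroprobs + sentences) )

and ppl1 the same without the sentence-end tokens. For the unsmoothed run
that gives 10^(1445.86 / (964 - 41 - 0 + 1)) ~= 36.71 and
10^(1445.86 / 923) ~= 36.85, which matches the output, so the 41 OOV
tokens simply contribute nothing to either the numerator or the
denominator.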
PS: does -gtmin 0 -gtmax 0 completely disable discounting?
Many thanks.
Kind regards,
--
Christophe