[SRILM User List] make-big-lm produces different LM than ngram-count
Christian A. Mandery
mail at chrismandery.de
Tue Sep 7 08:59:57 PDT 2010
Hello,
I am trying to use the make-big-lm script as a way of
building modified Kneser-Ney LMs that scales better with larger
corpora.
However, make-big-lm produces different LMs for me than ngram-count,
although I am using the same parameters.
Not only do the probabilities and back-off weights differ; the LM built
with ngram-count also contains more {2,3,4}-grams than the LM built with
make-big-lm.
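One way to make the difference in n-gram counts concrete is to compare
the \data\ headers of the two LM files. The following is only a minimal
sketch, assuming both LMs are uncompressed ARPA-format text files; the
helper name arpa_counts is hypothetical and not part of SRILM.

```shell
# arpa_counts FILE
# Hypothetical helper (not an SRILM tool): print the "ngram N=COUNT" lines
# from the \data\ header of an uncompressed ARPA-format LM file.
arpa_counts() {
    awk '
        /^\\data\\/                     { in_header = 1; next }  # header begins after \data\
        in_header && /^ngram [0-9]+=/   { print; next }          # one line per n-gram order
        in_header && /^\\[0-9]+-grams:/ { exit }                 # first n-gram section ends the header
    ' "$1"
}
```

Running arpa_counts on ngram-count.lm and on make-big-lm.lm and comparing
the two outputs should show which orders differ and by how much.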
I invoke ngram-count with these parameters:

    ngram-count -order 4 -debug 4 -unk -map-unk "<UNK>" -vocab vocab-lm \
        -gt1min 1 -gt2min 2 -gt3min 2 -gt4min 2 \
        -kndiscount1 -kndiscount2 -kndiscount3 -kndiscount4 \
        -text corpus.gz -lm ngram-count.lm

And make-big-lm:

    make-big-lm -read counts -name zzz-make-big-lm -order 4 -debug 4 -unk \
        -map-unk "<UNK>" -vocab vocab-lm \
        -gt1min 1 -gt2min 2 -gt3min 2 -gt4min 2 \
        -kndiscount1 -kndiscount2 -kndiscount3 -kndiscount4 \
        -lm make-big-lm.lm

Why do these two calls produce different LMs?
Best regards
Christian Mandery
PS: counts-new.gz is built using "ngram-count -text corpus.gz -write
counts -order 4 -sort", so nothing should go wrong there.