[SRILM User List] Usage of make-big-lm and -interpolate option

Stefan Fischer sfischer at ymail.com
Fri Jun 13 12:16:05 PDT 2014


I read that using make-big-lm is preferable to using ngram-count directly.
Even though my corpus is not very big, how do I switch from
ngram-count to make-big-lm?

This is what I'm using so far:
  ngram-count -order 3 -kndiscount -interpolate -unk -text training.txt \
      -vocab at_least_twice.txt -lm lm.arpa

Is this the right way to use make-big-lm?
Do I have to pass more options to ngram-count if I am only interested in counts?
  ngram-count -write counts.gz -text training.txt
  make-big-lm -read counts.gz -order 3 -kndiscount -interpolate -unk \
      -text training.txt -vocab at_least_twice.txt -lm lm.arpa

My second question is about the -interpolate option.
I get the following warning several times:
  warning: 2.01524e-06 backoff probability mass left for ". dunno" -- disabling interpolation
Is this just for my information, or is it a sign that I am using bad parameters?

