[SRILM User List] Usage of make-big-lm and -interpolate option

Sun Jun 15 19:07:24 PDT 2014

On 06/13/2014 12:16 PM, Stefan Fischer wrote:
> Hello,
>
> I read that using make-big-lm is preferable to using ngram-count directly.
> Even though my corpus is not very big, how do I switch from
> ngram-count to make-big-lm?
>
> This is what I'm using so far:
>    ngram-count -order 3 -kndiscount -interpolate -unk \
>        -text training.txt -vocab at_least_twice.txt -lm lm.arpa
>
> Is this the right way to use make-big-lm?
> Do I have to pass more options to ngram-count if I am only interested in counts?
>    ngram-count -write counts.gz -text training.txt
>    make-big-lm -read counts.gz -order 3 -kndiscount -interpolate -unk \
>        -text training.txt -vocab at_least_twice.txt -lm lm.arpa
You did it right.

>
> My second question concerns the -interpolate option.
> I get the following warning several times:
>    warning: 2.01524e-06 backoff probability mass left for ". dunno" --
> disabling interpolation
> Is this just for my information or is it a sign of using bad parameters?
It's just for information.  Sometimes there is (almost) no backoff
probability mass left for the lower-order ngram estimates in a given
context, and it doesn't make sense to apply interpolation in that case,
so the code falls back on standard KN smoothing (without interpolation)
for that context.

Andreas
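
To make the fallback described above concrete, here is a minimal Python
sketch. It is not SRILM's code: it uses a single absolute discount rather
than the modified KN discounts, and the function names, the epsilon
threshold, and the toy counts are all assumptions chosen for illustration.
It computes the probability mass left over for backoff in one context and
only interpolates with the lower-order distribution when that mass is not
negligible.

```python
def discounted_probs(counts, discount):
    """Higher-order estimates: subtract a fixed discount from each count."""
    total = sum(counts.values())
    return {w: max(c - discount, 0.0) / total for w, c in counts.items()}


def backoff_mass(counts, discount):
    """Probability mass in this context not used by the discounted estimates."""
    return 1.0 - sum(discounted_probs(counts, discount).values())


def estimate(counts, discount, lower_order, epsilon=1e-5):
    """Return (probs, interpolated) for the words seen in this context.

    epsilon is a hypothetical threshold, not a value taken from SRILM.
    """
    high = discounted_probs(counts, discount)
    left = backoff_mass(counts, discount)
    if left < epsilon:
        # Nothing meaningful to distribute: keep the plain discounted
        # estimates and skip interpolation for this context.
        return high, False
    # Interpolate: add the leftover mass, weighted by the lower-order model.
    return {w: p + left * lower_order[w] for w, p in high.items()}, True


if __name__ == "__main__":
    counts = {"a": 3, "b": 1}            # toy counts for one context
    lower = {"a": 0.5, "b": 0.25}        # toy lower-order probabilities
    print(estimate(counts, 0.75, lower))  # mass is left, so interpolation applies
    print(estimate(counts, 0.0, lower))   # no mass left, interpolation is skipped
```

With a discount of zero in the second call, the discounted estimates
already sum to one, so there is nothing for the lower-order model to
contribute; that is the situation the warning reports for a single context.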