[SRILM User List] Usage of make-big-lm and -interpolate option

Andreas Stolcke stolcke at icsi.berkeley.edu
Sun Jun 15 19:07:24 PDT 2014


On 06/13/2014 12:16 PM, Stefan Fischer wrote:
> Hello,
>
> I read that using make-big-lm is preferable to using ngram-count directly.
> Even though my corpus is not very big, how do I switch from
> ngram-count to make-big-lm?
>
> This is what I'm using so far:
>    ngram-count -order 3 -kndiscount -interpolate -unk \
>        -text training.txt -vocab at_least_twice.txt -lm lm.arpa
>
> Is this the right way to use make-big-lm?
> Do I have to pass more options to ngram-count if I am only interested in counts?
>    ngram-count -write counts.gz -text training.txt
>    make-big-lm -read counts.gz -order 3 -kndiscount -interpolate -unk \
>        -text training.txt -vocab at_least_twice.txt -lm lm.arpa
You did it right.

>
> My second question is about the -interpolate option.
> I get the following warning several times:
>    warning: 2.01524e-06 backoff probability mass left for ". dunno" -- disabling interpolation
> Is this just for my information or is it a sign of using bad parameters?
It's just for information.  Sometimes there is no backoff probability 
mass left for lower-order ngram estimates, and it doesn't make sense to 
apply interpolation in that case, so the code falls back on standard KN 
smoothing (without interpolation).
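
As a rough illustration of the idea (a sketch only, not SRILM's actual
code): with absolute discounting, the mass left over for the lower-order
estimate in a given context is 1 minus the sum of the discounted
probabilities of the words seen after that context. The function names,
the single fixed discount, and the 3e-06 threshold below are all
assumptions made for the example.

```python
def backoff_mass(follower_counts, discount):
    """Mass left for lower-order estimates after absolute discounting.

    follower_counts maps each word seen after the context to its count.
    """
    total = sum(follower_counts.values())
    discounted = sum(max(c - discount, 0.0) for c in follower_counts.values())
    return 1.0 - discounted / total


def should_interpolate(follower_counts, discount, epsilon=3e-06):
    """Interpolate only if a non-negligible amount of mass is left.

    epsilon is an illustrative threshold, not necessarily the one SRILM uses.
    """
    return backoff_mass(follower_counts, discount) >= epsilon


# A context with a few followers leaves plenty of mass for interpolation.
print(should_interpolate({"a": 3, "b": 1}, discount=0.5))

# A context followed by a single, very frequent word leaves almost none.
print(should_interpolate({"a": 1000000}, discount=0.5))
```

In the second case the leftover mass is so small that mixing in the
lower-order distribution would change essentially nothing, which is the
situation the warning reports.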

Andreas


