command line for make-big-lm

Andreas Stolcke stolcke at speech.sri.com
Sun Jun 1 19:01:45 PDT 2008


In message <E1A7F2EA-516F-4B03-AF0E-36B4EAC28BAF at gmail.com> you wrote:
> I'm studying training-scripts to estimate a big LM for modified
> Kneser-Ney.  Will this do the job:
> 
>   make-big-lm -name my-kn-model -read my.counts.gz -max-per-file 10000000 -kndiscount 5

> -- is -kndiscount all that's needed to trigger KN estimation?  And the
> number is the maximum order N, i.e. we don't need to repeat it from 1
> up to N, like -kndiscount 1, -kndiscount 2, ...?

Not quite.  -kndiscount is a switch that takes no number; give the
maximum order separately with -order.  Use

	-kndiscount -order 5
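
Put together with the options from the question, the full invocation would look roughly like the sketch below. The model name, counts file and -max-per-file value are simply copied from the question, and no output file option is shown because the question's command did not include one. The snippet only assembles and prints the command so it can be checked before running it.

```shell
# Sketch only: build the corrected command line and print it for inspection.
# All option values are taken from the question; adjust them to your setup.
order=5                      # maximum N-gram order, passed via -order
max_per_file=10000000        # value used in the question

cmd="make-big-lm -name my-kn-model -read my.counts.gz -max-per-file $max_per_file -kndiscount -order $order"

# Print instead of executing, so nothing runs until you have reviewed it.
echo "$cmd"
```

Once the printed line looks right, it can be pasted into the shell and run as is.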

> -- also, how do I estimate -max-per-file for 16 GB RAM and 5-grams?

It really depends on your data, so it's hard to predict.
10000000 is actually the default, and 16 GB is quite a bit of memory,
so you should have no problem.

Andreas 



