[SRILM User List] How to train LM fast with large corpus

Andreas Stolcke stolcke at icsi.berkeley.edu
Wed Jul 18 05:13:23 PDT 2012


On 7/18/2012 3:40 AM, Meng Chen wrote:
> Hi, I want to ask how to train an N-gram language model with SRILM if the
> corpus is very large (100GB). Should I still use *ngram-count*, or use
> *make-big-lm* instead? I also want to know whether SRILM imposes any
> limit on the vocabulary or size of the training corpus.
> Thanks!
Definitely make-big-lm. Read the FAQ on handling large data. You are
limited by computer memory, but it is not possible to give a hard limit;
it depends on the properties of your data.
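Since the ceiling depends on the data rather than on a fixed limit in SRILM, it can help to measure the data first. One property that plausibly drives memory use is the number of distinct N-grams of each order, because those are what a counting tool has to store. The Python sketch below is not part of SRILM, and the function name `distinct_ngrams` is hypothetical. It counts distinct N-grams on a sample of the corpus. It assumes whitespace tokenization and adds `<s>`/`</s>` sentence boundaries, which only approximates SRILM's own counting conventions.

```python
from typing import Dict, Iterable


def distinct_ngrams(lines: Iterable[str], order: int) -> Dict[int, int]:
    """Return {n: number of distinct n-grams} for n = 1..order.

    Assumption: tokens are whitespace-separated and each line is one
    sentence wrapped in <s> ... </s>; this is a rough approximation of
    how SRILM's ngram-count treats sentence boundaries, not a copy of it.
    """
    seen = {n: set() for n in range(1, order + 1)}
    for line in lines:
        words = ["<s>"] + line.split() + ["</s>"]
        for n in range(1, order + 1):
            # Slide a window of length n over the padded sentence.
            for i in range(len(words) - n + 1):
                seen[n].add(tuple(words[i:i + n]))
    return {n: len(s) for n, s in seen.items()}


sample = ["the cat sat", "the cat ran"]
print(distinct_ngrams(sample, 3))
```

For the two-sentence sample this prints `{1: 6, 2: 6, 3: 5}`. On real data, running it on progressively larger slices of the corpus (for example the first one and two million lines) shows how fast the set of distinct N-grams grows. That growth is a rough indicator of whether a full count will fit in memory. The sketch itself keeps every distinct N-gram in Python sets, so it is only suitable for samples, not for the full 100GB corpus.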

Andreas
