[SRILM User List] How to train LM fast with large corpus
Andreas Stolcke
stolcke at icsi.berkeley.edu
Wed Jul 18 05:13:23 PDT 2012
On 7/18/2012 3:40 AM, Meng Chen wrote:
> Hi, I want to ask how to train an N-gram language model with SRILM when
> the corpus is very large (100GB). Should I still use the *ngram-count*
> command, or use *make-big-lm* instead? I would also like to know whether
> SRILM has any limits on training-corpus size or vocabulary size.
> Thanks!
Definitely make-big-lm. Read the FAQ on handling large data. You are
limited by machine memory, but it is not possible to give a hard limit;
it depends on the properties of your data.
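A typical workflow for large corpora (a sketch following the SRILM FAQ; the file names, the order, and the chunking of the corpus are illustrative assumptions, and the exact smoothing options depend on your data):

```shell
# Count N-grams per chunk so no single ngram-count run has to hold
# the whole 100GB corpus in memory (chunk names are hypothetical).
ngram-count -order 4 -text corpus.part1.txt -write counts.part1.gz
ngram-count -order 4 -text corpus.part2.txt -write counts.part2.gz

# make-big-lm merges the count files and estimates the model in
# batches, keeping memory use bounded. -name sets a prefix for its
# intermediate files.
make-big-lm -read counts.part1.gz -read counts.part2.gz \
    -name biglm -order 4 -kndiscount -interpolate \
    -lm big.lm.gz
```

Note that modified Kneser-Ney (`-kndiscount`) needs count-of-count statistics, which is one of the things make-big-lm handles for you when the counts come from multiple files.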
Andreas