[SRILM] FLM model training on large data

ilya oparin ioparin at yahoo.co.uk
Sun Oct 22 10:50:56 PDT 2006


Hi, everybody!

Does anyone have experience building a Factored Language Model on large data? There is no problem processing a file in FLM format containing, say, 5 million entries, but as soon as I try to feed in a 50-million-entry FLM corpus, it requires an infeasible amount of memory (since it loads everything into memory).

Does anyone know of any tricks for training an FLM in this situation? Something like building partial LMs and then merging them, as one can do with standard ngram-count... What would you suggest as a solution?
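For reference, here is the partial-counts workflow I mean for ordinary (non-factored) n-gram models: count each chunk separately with ngram-count -write, merge the sorted count files with ngram-merge, and estimate the final model from the merged counts with make-big-lm. I don't know whether fngram-count supports an equivalent, so this sketch only illustrates the word-level case; the file names and the 4-way split are invented for the example.

    # Hypothetical file names; assumes the corpus was split into chunks
    # small enough to count in memory (e.g. with split(1)).
    for chunk in corpus.part1.txt corpus.part2.txt corpus.part3.txt corpus.part4.txt; do
        # Count 3-grams for one chunk and write sorted counts to disk.
        ngram-count -order 3 -text "$chunk" -sort -write "$chunk.counts"
    done

    # Merge the sorted per-chunk count files into a single count file.
    ngram-merge -write corpus.counts \
        corpus.part1.txt.counts corpus.part2.txt.counts \
        corpus.part3.txt.counts corpus.part4.txt.counts

    # Estimate the final LM from the merged counts without re-reading
    # the text; make-big-lm is designed to keep memory use down.
    make-big-lm -read corpus.counts -order 3 -kndiscount -interpolate -lm corpus.lm

The question is whether anything comparable exists on the factored side.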


best regards,
Ilya
 		

