[SRILM]: FLM

ilya oparin ioparin at yahoo.co.uk
Sun May 21 13:34:22 PDT 2006


Hello,

I've recently been experimenting with factored language models (FLMs) for Czech. The FLM module works perfectly on small subcorpora. However, when I try to train a model even on my held-out data (60 million tokens), training takes a huge amount of time (it has now been running for two days), and memory problems can be expected as well. So there is little point in attempting to train an LM on my full training data (550 million tokens).
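
For context, the kind of training run I mean is roughly the following fngram-count invocation (a sketch: the file names are placeholders, and the exact count-writing flags should be checked against the fngram-count man page, since the count and LM file names come from the spec file rather than the command line):

    # A minimal sketch of the FLM training run described above.
    # czech.flm     - FLM specification file (factors, backoff graph,
    #                 and the count/LM output file names)
    # train.factored - corpus of factored word bundles, e.g. one
    #                 "W-slovo:S-slov:M-noun" style tuple per token
    fngram-count -factor-file czech.flm \
                 -text train.factored \
                 -write-counts -lm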
My question is: does anybody have experience training FLMs on huge corpora, e.g. by parallelizing the task? There is no direct way to do this as with conventional models (the ngram-merge and make-big-lm tools), but are there indirect ones?
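
For comparison, the "direct way" I mean for conventional n-gram models is roughly the pipeline below (a sketch with placeholder file names and a placeholder order; the discounting options depend on the setup): counts are computed on corpus chunks in parallel, merged with ngram-merge, and the final model is estimated with make-big-lm to keep memory bounded.

    # 1. Split the corpus into manageable chunks.
    split -l 1000000 corpus.txt chunk.

    # 2. Count n-grams per chunk; the loop can be spread across machines.
    #    -sort is needed so that ngram-merge can merge the files.
    for f in chunk.*; do
        ngram-count -order 3 -text "$f" -sort -write "$f.counts" &
    done
    wait

    # 3. Merge the sorted per-chunk count files into one.
    ngram-merge -write merged.counts chunk.*.counts

    # 4. Estimate the LM from the merged counts with reduced memory use.
    make-big-lm -read merged.counts -name biglm -order 3 \
                -kndiscount -interpolate -lm big.lm

As far as I can tell, no fngram-merge equivalent ships with SRILM, which is exactly the gap I am asking about.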

thanks in advance,
ilya


