[SRILM User List] FLM Training takes too long!

Melvin Jose jmelvinjose73 at yahoo.com
Sun Oct 28 17:24:08 PDT 2012




Hey,

    I am currently working with Tamil, a morphologically rich language. I am trying to build an FLM from approximately 3 million entries, but training has now been running for more than a day and a half. The FLM specification is

W : W(-1) W(-2) B(-1) S(-1), using generalized backoff, where B is the word base and S is the suffix.
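For concreteness, my factor file looks roughly like this (the file names, gtmin values, and discounting options below are illustrative placeholders, not my exact settings; the layout follows the spec-file format in the FLM tutorial):

1

W : 4 W(-1) W(-2) B(-1) S(-1) tamil.count.gz tamil.lm.gz 6
W1,W2,B1,S1  W2     kndiscount gtmin 1
W1,B1,S1     W1     kndiscount gtmin 1
B1,S1        B1,S1  kndiscount gtmin 1 combine mean
S1           S1     kndiscount gtmin 1
B1           B1     kndiscount gtmin 1
0            0      kndiscount gtmin 1

The B1,S1 node line is where the generalized backoff happens: both remaining parents are dropped in parallel and the two child estimates are combined, which should correspond to node 0xC with children 0x8 and 0x4 in the -debug output below.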


Below is the output with -debug 2:


warning: distributing 0.0989813 left-over probability mass over all 577519 words
discarded 1 0x4-gram probs predicting pseudo-events
discarded 1587186 0x4-gram probs discounted to zero
discarded 1 0x8-gram probs predicting pseudo-events
discarded 1 0xc-gram probs predicting pseudo-events
discarded 4721615 0xc-gram probs discounted to zero
Starting estimation of general graph-backoff node: LM 0 Node 0xC, children: 0x8 0x4
Finished estimation of multi-child graph-backoff node: LM 0 Node 0xC

This was the last message I received, a day and a half ago. Is it normal for training to take this long? I read that Katrin had no problem training on 5 million entries; did it take this long for her as well? I am running the computation on a cluster in my lab, so memory and computational power should not be the bottleneck.

Is there any way to tell fngram-count to use as much memory as it needs, or to parallelize the computation?
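For reference, the invocation is roughly the following (the factor-file and text paths are placeholders for my actual files):

fngram-count -factor-file tamil.flm -text train.txt -debug 2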


Thanks,
Melvin

