running time estimate of -lm

Andreas Stolcke stolcke at speech.sri.com
Tue Nov 6 00:54:50 PST 2007


Also, it isn't clear from the original message if counts were produced 
beforehand, or if ngram-count is in fact invoked directly on the 
billion-word corpus.  In that case it's no wonder it takes forever,
since it is probably paging itself to death.

Use make-batch-counts/merge-batch-counts, and make-big-lm as explained 
in the training-scripts(1) man page.
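For readers following along, the batch-counting workflow might look roughly like the sketch below. This is only an illustration of the recipe from the training-scripts(1) man page; the file list name, batch size, count directory, and merged-count filename pattern are assumptions, and the exact names produced will differ on a real run.

```
# corpus-files.txt lists one text file of the corpus per line (assumed name).
# Count each batch of files separately so no single job exhausts RAM:
make-batch-counts corpus-files.txt 10 /dev/null counts -order 2

# Merge the per-batch count files in the "counts" directory into one
# sorted, merged count file:
merge-batch-counts counts

# Estimate the LM from the merged counts; make-big-lm avoids holding
# all n-grams in memory at once (merged filename is illustrative):
make-big-lm -read counts/merged.ngrams.gz -order 2 -lm big.lm
```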

--Andreas

In message <102682.35796.qm at web25401.mail.ukl.yahoo.com> you wrote:
> Hi,
> 
> It's really worth using make-big-lm script (documented
> in training-scripts section of the manual) for
> training such huge models.
> 
> Ilya
> 
> --- Alexy Khrabrov <deliverable at gmail.com> wrote:
> 
> > I've launched ngram-count -order 2 -lm with a 1 billion word corpus a
> > few days ago, and it's still going, after 4,600 minutes of CPU time
> > (2.66 GHz Xeon 64-bit).  Originally it took about 8 GB of RAM, then
> > decreased by about 25%, now is climbing back.  What is the overall
> > running time estimate of -lm without any other options?  Simple runs
> > for about 15 million words finished in about 15 minutes.
> > 
> > Cheers,
> > Alexy
> > 
> 
> 
> best regards,
> Ilya
> 
> 




More information about the SRILM-User mailing list