running time estimate of -lm
Andreas Stolcke
stolcke at speech.sri.com
Tue Nov 6 00:54:50 PST 2007
Also, it isn't clear from the original message if counts were produced
beforehand, or if ngram-count is in fact invoked directly on the
billion-word corpus. In that case it's no wonder it takes forever,
since it is probably paging itself to death.
Use make-batch-counts/merge-batch-counts, and make-big-lm as explained
in the training-scripts(1) man page.
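For reference, the batch-counting workflow Andreas describes might look roughly like this. The file names, batch size, and filter below are placeholders, not part of the original message; the exact arguments are documented in the training-scripts(1) man page and may differ across SRILM versions:

```
# List the corpus files one path per line, then count them in
# batches (here: 10 files per batch, "cat" as a pass-through filter),
# writing per-batch count files into counts-dir/.
make-batch-counts corpus-file-list 10 /usr/bin/cat counts-dir -order 2

# Merge the per-batch count files into a single sorted count file
# (written into counts-dir/ by merge-batch-counts).
merge-batch-counts counts-dir

# Estimate the LM from the merged counts; make-big-lm avoids holding
# all n-grams in memory at once.  "merged-counts.gz" stands in for
# the merged file produced above.
make-big-lm -name biglm -read merged-counts.gz -order 2 -lm big-2gram.lm
```

This keeps the memory footprint bounded by the batch size rather than the full corpus, which is the point of Andreas's suggestion for a billion-word training set.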
--Andreas
In message <102682.35796.qm at web25401.mail.ukl.yahoo.com> you wrote:
> Hi,
>
> It's really worth using the make-big-lm script
> (documented in the training-scripts section of the
> manual) for training such huge models.
>
> Ilya
>
> --- Alexy Khrabrov <deliverable at gmail.com> wrote:
>
> > I've launched ngram-count -order 2 -lm with a 1 billion word
> > corpus a few days ago, and it's still going, after 4,600 minutes
> > of CPU time (2.66 GHz Xeon 64-bit). Originally it took about 8 GB
> > of RAM, then decreased by about 25%, and now it is climbing back.
> > What is the overall running time estimate of -lm without any other
> > options? Simple runs of about 15 million words finished in about
> > 15 minutes.
> >
> > Cheers,
> > Alexy
> >
>
>
> best regards,
> Ilya
More information about the SRILM-User mailing list