running time estimate of -lm
Alexy Khrabrov
deliverable at gmail.com
Tue Nov 6 04:34:31 PST 2007
Indeed, the counts were not precomputed. However, there's enough
memory, and ngram-count has never used even half of the RAM so far
with the bigrams of a billion-word corpus. No paging at all... Is
there any hope it'll end after a few days, or will I have to redo it
following training-scripts(1)?
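
(If I do end up redoing it, my reading of training-scripts(1) suggests a
batched recipe roughly like the following. The file and directory names
here are just placeholders, and the merged-count file passed to
make-big-lm would be whatever merge-batch-counts actually produces.)

  # split the corpus into pieces and list them (placeholder names)
  split -l 1000000 corpus.txt piece.
  ls piece.* > file-list

  # count bigrams per batch of 10 files, writing count files into counts/
  make-batch-counts file-list 10 /bin/cat counts -order 2

  # merge the per-batch count files inside counts/ into one file
  merge-batch-counts counts

  # estimate the LM from the merged counts without holding everything in RAM
  # (MERGED.ngrams.gz stands in for the file merge-batch-counts leaves behind)
  make-big-lm -read counts/MERGED.ngrams.gz -order 2 -lm big.2bo.lm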
Cheers,
Alexy
On Nov 6, 2007, at 11:54 AM, Andreas Stolcke wrote:
>
> Also, it isn't clear from the original message if counts were produced
> beforehand, or if ngram-count is in fact invoked directly on the
> billion-word corpus. In that case it's no wonder it takes forever,
> since it is probably paging itself to death.
>
> Use make-batch-counts/merge-batch-counts, and make-big-lm as explained
> in the training-scripts(1) man page.
>
> --Andreas
>
> In message <102682.35796.qm at web25401.mail.ukl.yahoo.com> you wrote:
>> Hi,
>>
>> It's really worth using the make-big-lm script (documented in the
>> training-scripts section of the manual) for training such huge models.
>>
>> Ilya
>>
>> --- Alexy Khrabrov <deliverable at gmail.com> wrote:
>>
>>> I launched ngram-count -order 2 -lm on a 1-billion-word corpus a
>>> few days ago, and it's still going after 4,600 minutes of CPU time
>>> (2.66 GHz Xeon, 64-bit). Originally it took about 8 GB of RAM, then
>>> decreased by about 25%, and is now climbing back up. What is the
>>> overall running time estimate of -lm without any other options?
>>> Simple runs on about 15 million words finished in about 15 minutes.
>>>
>>> Cheers,
>>> Alexy
>>>
>>
>>
>> best regards,
>> Ilya
>>
>>
>
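
(For reference, the direct run described in the original message above was
essentially of this form, with a placeholder corpus file name; this is the
single-process invocation that has to keep all bigram counts in memory at
once:)

  # read the whole corpus and build the bigram LM in one process
  ngram-count -order 2 -text billion-word-corpus.txt -lm corpus.2bo.lm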