running time estimate of -lm
Alexy Khrabrov
deliverable at gmail.com
Tue Nov 6 04:34:31 PST 2007
Indeed, the counts were not precomputed. However, there's enough
memory, and ngram-count has never used even half of the RAM so far
with the bigrams of a billion-word corpus. No paging at all... Is
there any hope it'll end after a few days, or will I have to redo it
following training-scripts(1)?
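
(If I do end up redoing it, my reading of training-scripts(1) suggests a
batched recipe roughly like the following. The file and directory names
here are just placeholders, and the merged-count file passed to
make-big-lm would be whatever merge-batch-counts actually produces.)

  # split the corpus into pieces and list them (placeholder names)
  split -l 1000000 corpus.txt piece.
  ls piece.* > file-list

  # count bigrams per batch of 10 files, writing count files into counts/
  make-batch-counts file-list 10 /bin/cat counts -order 2

  # merge the per-batch count files inside counts/ into one file
  merge-batch-counts counts

  # estimate the LM from the merged counts without holding everything in RAM
  # (MERGED.ngrams.gz stands in for the file merge-batch-counts leaves behind)
  make-big-lm -read counts/MERGED.ngrams.gz -order 2 -lm big.2bo.lm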
Cheers,
Alexy
On Nov 6, 2007, at 11:54 AM, Andreas Stolcke wrote:
>
> Also, it isn't clear from the original message if counts were produced
> beforehand, or if ngram-count is in fact invoked directly on the
> billion-word corpus. In that case it's no wonder it takes forever,
> since it is probably paging itself to death.
>
> Use make-batch-counts/merge-batch-counts, and make-big-lm as explained
> in the training-scripts(1) man page.
>
> --Andreas
>
> In message <102682.35796.qm at web25401.mail.ukl.yahoo.com> you wrote:
>> Hi,
>>
>> It's really worth using the make-big-lm script (documented in the
>> training-scripts section of the manual) for training such huge models.
>>
>> Ilya
>>
>> --- Alexy Khrabrov <deliverable at gmail.com> wrote:
>>
>>> I launched ngram-count -order 2 -lm on a 1-billion-word corpus a
>>> few days ago, and it's still going after 4,600 minutes of CPU time
>>> (2.66 GHz Xeon, 64-bit). Originally it took about 8 GB of RAM, then
>>> decreased by about 25%, and is now climbing back up. What is the
>>> overall running time estimate of -lm without any other options?
>>> Simple runs on about 15 million words finished in about 15 minutes.
>>>
>>> Cheers,
>>> Alexy
>>>
>>
>>
>> best regards,
>> Ilya
>>
>>
>
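
(For reference, the direct run described in the original message above was
essentially of this form, with a placeholder corpus file name; this is the
single-process invocation that has to keep all bigram counts in memory at
once:)

  # read the whole corpus and build the bigram LM in one process
  ngram-count -order 2 -text billion-word-corpus.txt -lm corpus.2bo.lm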