billion word -lm finished

Alexy Khrabrov deliverable at gmail.com
Tue Nov 6 09:58:20 PST 2007


CPU-optimized (right after make World) -- these are 1.5.4b binaries
from one day before the 1.5.4 release.  Compiled with -march=nocona
-mtune=nocona for the Xeons.  I did run it under time:

% time cat list | xargs cat | ngram-count -text - -order 2 -lm model-1
warning: discount coeff 1 is out of range: 0
cat list  0,00s user 0,00s system 0% cpu 6:41,85 total
xargs cat  0,66s user 15,31s system 2% cpu 11:23,54 total
ngram-count -text - -order 2 -lm model-1  350025,89s user 91,27s system 100% cpu 96:52:30,83 total
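
As a rough cross-check of the "about 100 CPU hours" figure quoted below,
the user time can be converted from seconds to hours.  A minimal sketch,
assuming a POSIX shell with awk and tr available; cpu_hours is a
hypothetical helper name, not part of SRILM:

```shell
# Convert a user-time figure in seconds (as printed by time) to CPU hours.
# cpu_hours is a hypothetical helper, not an SRILM tool.
cpu_hours() {
    # The output above uses a decimal comma; normalize it to a dot first.
    secs=$(printf '%s' "$1" | tr ',' '.')
    # Force the C locale so awk parses and prints with a decimal point.
    LC_ALL=C awk -v s="$secs" 'BEGIN { printf "%.1f\n", s / 3600 }'
}

cpu_hours 350025,89
```

This prints 97.2, i.e. roughly 97 CPU hours of user time for ngram-count.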

BTW, is the warning expected?  I always get it when building a simple
-lm from scratch.
Cheers,
Alexy

On Nov 6, 2007, at 8:28 PM, Andreas Stolcke wrote:

>
> What version of the binaries did you use ?
> Cpu or space-optimized (_c) ?
>
> It would have been good to run this with the unix "time" command
> to get real and cpu time statistics.
>
> --Andreas
>
> In message <8EFD533C-A2E3-43A2-BAB1-6B3BC5804E0E at gmail.com> you wrote:
>> I'm glad to report that the full -lm model of -order 2 over a billion
>> words builds from scratch in about 100 CPU hours!
>>
>> Cheers,
>> Alexy
>



