billion word -lm finished
deliverable at gmail.com
Tue Nov 6 09:58:20 PST 2007
CPU-optimized (built right after make World). These are 1.5.4b binaries
from one day before the 1.5.4 release, compiled with -march=nocona
-mtune=nocona for the Xeons. Here is the run under time:
% time cat list | xargs cat | ngram-count -text - -order 2 -lm model-1
warning: discount coeff 1 is out of range: 0
cat list 0,00s user 0,00s system 0% cpu 6:41,85 total
xargs cat 0,66s user 15,31s system 2% cpu 11:23,54 total
ngram-count -text - -order 2 -lm model-1 350025,89s user 91,27s system 100% cpu 96:52:30,83 total
BTW, is the warning expected? I always get it with a simple -lm.
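For anyone who wants the figures above in hours, here is a minimal shell sketch. The helper names secs_to_hours and hms_to_hours are made up for illustration, and the comma-to-dot step assumes the decimal-comma formatting seen in the time output.

```shell
# Convert timings as printed by the shell's time builtin into hours.
# Helper names are hypothetical; they are not part of SRILM or the shell.

secs_to_hours() {
    # $1: seconds, possibly written with a decimal comma (e.g. 350025,89)
    printf '%s\n' "$1" | tr ',' '.' |
        LC_ALL=C awk '{ printf "%.2f\n", $1 / 3600 }'
}

hms_to_hours() {
    # $1: elapsed time written as H:MM:SS,ss (e.g. 96:52:30,83)
    printf '%s\n' "$1" | tr ',' '.' |
        LC_ALL=C awk -F: '{ printf "%.2f\n", $1 + $2 / 60 + $3 / 3600 }'
}

secs_to_hours 350025,89     # user CPU time of ngram-count
hms_to_hours 96:52:30,83    # wall-clock time of ngram-count
```

Both values come out a little under 100 hours, which matches the "about 100 CPU hours" in the original report.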
On Nov 6, 2007, at 8:28 PM, Andreas Stolcke wrote:
> What version of the binaries did you use ?
> Cpu or space-optimized (_c) ?
> It would have been good to run this with the unix "time" command
> to get real and cpu time statistics.
> In message <8EFD533C-A2E3-43A2-BAB1-6B3BC5804E0E at gmail.com> you wrote:
>> I'm glad to report that the full -lm model of -order 2 over a billion
>> words builds from scratch in about 100 CPU hours!