stolcke at speech.sri.com
Fri Sep 5 07:48:34 PDT 2008
Mirjam Sepesy Maučec wrote:
> I have my counts in Google directory structure (by make-google-ngrams).
> I would like to use make-big-lm (because ngram-count runs out of memory),
> but the script expects the switch -read (not -read-google)?
I believe this mailing list is meant for users of the CMU-Cambridge SLM
toolkit, but your question is obviously about SRILM.
Please join the srilm-user mailing list and ask your SRILM questions
there (see http://www.speech.sri.com/projects/srilm/#srilm-user).
Regarding your question: make-big-lm does not support the -read-google
option because its approach is incompatible with the google directory
structure.
However, you could enumerate all the count files under the google
directory, prepend "-read" to each, and give that long string of
arguments to make-big-lm.
make-big-lm `find /path/to/google-ngrams/data -name \*.gz \! -name \*_cs.gz | xargs -n 1 echo "-read"` other-options ....
assuming your OS allows command lines this long.
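The enumeration step can also be wrapped in a small shell function, which makes it easy to inspect the generated arguments before handing them to make-big-lm. This is only a sketch: the function name google_read_args is made up for illustration, the path is the same placeholder as above, and it assumes that no file name contains whitespace (the unquoted command substitution would split such names).

```shell
#!/bin/sh
# Hypothetical helper (not part of SRILM): print one "-read <file>" pair
# per line for every .gz count file under the directory given as $1,
# skipping the *_cs.gz files exactly as the find command above does.
google_read_args() {
    find "$1" -name '*.gz' ! -name '*_cs.gz' | sort |
    while IFS= read -r f; do
        printf '%s %s\n' -read "$f"
    done
}

# Intended use (left as a comment so the sketch runs without SRILM installed):
#   make-big-lm $(google_read_args /path/to/google-ngrams/data) other-options ....
```

Running google_read_args on its own first shows how many arguments will be generated. On POSIX systems, `getconf ARG_MAX` reports the limit on the combined size of arguments and environment, which gives a rough idea of whether the resulting command line will fit.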