-read-google

Andreas Stolcke stolcke at speech.sri.com
Fri Sep 5 07:48:34 PDT 2008


Mirjam Sepesy Maučec wrote:
> Hi,
>  
> I have my counts in Google directory structure (by make-google-ngrams).
> I would like to use make-big-lm (because ngram-count runs out of memory),
> but the script expects the switch -read (not -read-google)?
Mirjam,

I believe this mailing list is meant for users of the CMU-Cambridge SLM 
toolkit, but your question is obviously about SRILM.
Please join the srilm-user mailing list and ask your SRILM questions 
there (see http://www.speech.sri.com/projects/srilm/#srilm-user for 
instructions).

Regarding your question: make-big-lm does not support the -read-google 
option because its approach is incompatible with the Google directory 
structure.
However, you could enumerate all the count files under the Google 
directory, prepend "-read" to each, and pass that long string of 
arguments to make-big-lm:

    make-big-lm `find /path/to/google-ngrams/data -name \*.gz \! -name \*_cs.gz | xargs -n 1 echo "-read"` other-options ....

assuming your OS allows command lines this long.
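
If you want to check that assumption before launching a long run, the same 
enumeration can be wrapped in two small shell functions. This is only a 
sketch: build_read_args and args_fit are hypothetical helper names (not part 
of SRILM), the size check is approximate because the environment also counts 
toward the limit, and, like the one-liner above, it assumes the file names 
contain no whitespace.

```shell
#!/bin/sh
# Sketch only: build_read_args and args_fit are hypothetical helpers,
# not part of SRILM.

# Print "-read" and a file name on alternating lines for every count file
# under the directory $1, skipping the *_cs.gz files as in the find command.
build_read_args() {
    find "$1" -name '*.gz' ! -name '*_cs.gz' | sort | while IFS= read -r f; do
        printf '%s\n%s\n' "-read" "$f"
    done
}

# Succeed if the generated arguments are smaller than the system limit
# reported by getconf ARG_MAX. The check is rough: the environment and the
# remaining options also count toward that limit.
args_fit() {
    need=$(( $(build_read_args "$1" | wc -c) ))
    limit=$(getconf ARG_MAX)
    [ "$need" -lt "$limit" ]
}

# Usage (path and other options are placeholders):
#   if args_fit /path/to/google-ngrams/data; then
#       make-big-lm $(build_read_args /path/to/google-ngrams/data) other-options
#   fi
```

If args_fit fails, the argument list is too long for a single command line 
on your system and this approach will not work as is.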

Andreas

>  
> Thanks,
> Mirjam




