[SRILM User List] question about using the Google Web N-gram corpus to build an LM

HU Rile londis at 163.com
Thu Aug 15 00:15:31 PDT 2013


Hi,

I would like to build an LM using the Google Web 1T corpus, and I followed the steps at http://www-speech.sri.com/projects/srilm/manpages/srilm-faq.7.html. However, when I ran ngram-count to estimate the mixture weights, the program failed with the following error:
google.countlm.0: line 22: reached EOF before \end\
format error in init-lm file
I tried adding \end\ to the end of google.countlm.0, but that did not help.
Here are the contents of my google.countlm.0:
order 3
vocabsize 13588391
totalcount 1024908267229
countmodulus 40
mixweights 15
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
 0.5 0.5 0.5
google-counts /home/hurile/googleweb1T/googleLM/
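
To rule out a simple layout mistake in the file above, here is a minimal Python sketch that checks its shape. It assumes, based only on the listing itself, that "mixweights N" is followed by N+1 rows of weights and that each row has one weight per n-gram order. The function name check_countlm is hypothetical, and this is not an SRILM tool or a statement of the official format.

```python
def check_countlm(text):
    """Return a list of layout problems found in a count-LM init file.

    Assumption (inferred from the listing, not from SRILM itself):
    'mixweights N' is followed by N+1 rows, each with 'order' weights.
    """
    fields = {}
    weight_rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0].replace(".", "", 1).isdigit():
            weight_rows.append(tokens)      # a row of mixture weights
        else:
            fields[tokens[0]] = tokens[1:]  # a "keyword value" line

    problems = []
    for key in ("order", "mixweights"):
        if not fields.get(key):
            problems.append(f"missing field: {key}")
    if problems:
        return problems

    order = int(fields["order"][0])
    expected_rows = int(fields["mixweights"][0]) + 1
    if len(weight_rows) != expected_rows:
        problems.append(
            f"expected {expected_rows} weight rows, found {len(weight_rows)}"
        )
    for number, row in enumerate(weight_rows, start=1):
        if len(row) != order:
            problems.append(
                f"weight row {number} has {len(row)} values, expected {order}"
            )
    return problems


# The file from this message, rebuilt as a string for the check.
sample = "\n".join(
    ["order 3", "vocabsize 13588391", "totalcount 1024908267229",
     "countmodulus 40", "mixweights 15"]
    + [" 0.5 0.5 0.5"] * 16
    + ["google-counts /home/hurile/googleweb1T/googleLM/"]
)
print(check_countlm(sample))
```

If this check reports no problems, the error is less likely to come from the row and column counts and more likely from how the file is being read.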


Could someone please tell me how I can solve this problem? Thanks a lot!


Rile Hu



