[SRILM User List] question about using the Google Web N-gram corpus to build an LM
HU Rile
londis at 163.com
Thu Aug 15 00:15:31 PDT 2013
Hi,
I would like to build an LM from the Google Web 1T corpus, and I followed the steps in the SRILM FAQ at http://www-speech.sri.com/projects/srilm/manpages/srilm-faq.7.html. However, when I run ngram-count to estimate the mixture weights, it fails with this error:
google.countlm.0: line 22: reached EOF before \end\
format error in init-lm file
I tried appending \end\ to the end of google.countlm.0, but that did not help.
Here is the content of my google.countlm.0:
order 3
vocabsize 13588391
totalcount 1024908267229
countmodulus 40
mixweights 15
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
0.5 0.5 0.5
google-counts /home/hurile/googleweb1T/googleLM/
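For anyone comparing their own file against the one above, here is a minimal Python sketch of a layout check. It assumes (this is a reading of the count-LM format, not something confirmed in this thread) that `mixweights M` is followed by M+1 rows, each with one weight per n-gram order. The function names are hypothetical and not part of SRILM.

```python
def is_numeric_row(line):
    """True if every whitespace-separated token parses as a float."""
    tokens = line.split()
    if not tokens:
        return False
    try:
        for tok in tokens:
            float(tok)
    except ValueError:
        return False
    return True


def check_countlm(text):
    """Return a list of layout problems in a count-LM init file (empty if none)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    fields, rows, declared = {}, [], None
    i = 0
    while i < len(lines):
        parts = lines[i].split()
        i += 1
        if parts[0] == "mixweights" and len(parts) > 1:
            declared = int(parts[1])
            # Collect the purely numeric rows that follow the keyword.
            while i < len(lines) and is_numeric_row(lines[i]):
                rows.append(lines[i].split())
                i += 1
        else:
            fields[parts[0]] = parts[1:]

    problems = []
    if not fields.get("order"):
        return ["missing 'order' line"]
    order = int(fields["order"][0])
    if declared is None:
        problems.append("missing 'mixweights' line")
    elif len(rows) != declared + 1:
        # ASSUMPTION: weight rows are indexed 0..M, hence M+1 rows.
        problems.append(
            f"mixweights {declared} expects {declared + 1} rows, found {len(rows)}"
        )
    for n, row in enumerate(rows):
        if len(row) != order:
            problems.append(f"weight row {n} has {len(row)} values, expected {order}")
    return problems
```

Under that assumption, the file above passes this check (16 rows of 3 weights for `mixweights 15` and `order 3`), so the row and column counts alone do not seem to explain the error.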
Could someone please tell me how I can solve this problem? Thanks a lot!
Rile Hu