[SRILM User List] question about using the Google Web N-gram corpus to build an LM
Andreas Stolcke
stolcke at icsi.berkeley.edu
Thu Aug 15 14:15:14 PDT 2013
On 8/15/2013 12:15 AM, HU Rile wrote:
> Hi,
> I would like to build an LM using the Google Web 1T corpus. And I
> followed the steps on
> http://www-speech.sri.com/projects/srilm/manpages/srilm-faq.7.html.
> But when I used ngram-count to estimate the mixture weights, the
> program failed to run and reported the error "google.countlm.0: line 22:
> reached EOF before \end\
> format error in init-lm file".
> I tried adding \end\ to the end of google.countlm.0, but it did not
> work.
> Here is the content of my google.countlm.0:
> order 3
> vocabsize 13588391
> totalcount 1024908267229
> countmodulus 40
> mixweights 15
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> 0.5 0.5 0.5
> google-counts /home/hurile/googleweb1T/googleLM/
>
> Could someone please tell me how I can solve the problem? Thanks a lot!
>
> Rile Hu
>
You probably forgot the -count-lm option. Without it, ngram-count will
try to interpret the -lm file as a standard ngram LM (where the \end\
line is expected).
Andreas
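
For illustration, an invocation along these lines should avoid the error. This is a sketch, not a verified command line: tune.text and google.vocab are assumed file names for the held-out tuning text and the vocabulary, and google.countlm is an assumed output name; only google.countlm.0 comes from the original post.

```shell
# Assumed file names; adjust to your setup.
TUNE_TEXT=tune.text        # held-out text used to estimate the mixture weights (assumption)
VOCAB=google.vocab         # vocabulary file (assumption)
INIT_LM=google.countlm.0   # initial count-LM file from the original post
OUT_LM=google.countlm      # output file for the estimated count-LM (assumption)

# -count-lm tells ngram-count to treat the LM files as count-LMs rather than
# standard backoff ngram LMs, which are the ones that must end in \end\.
CMD="ngram-count -debug 1 -order 3 -count-lm -text $TUNE_TEXT -vocab $VOCAB -init-lm $INIT_LM -lm $OUT_LM"

if command -v ngram-count >/dev/null 2>&1; then
    # Unquoted on purpose so the shell splits the string into arguments.
    $CMD
else
    echo "ngram-count not found on PATH; would run: $CMD"
fi
```

With -count-lm in place, no \end\ line needs to be added to google.countlm.0.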