Google language model

Andreas Stolcke stolcke at
Tue Feb 6 17:43:50 PST 2007

In message <200702062003.l16K33Jk028807 at> you wrote:
> Hi Andreas,
> I have been using SRILM for some time now and am interested in using it
> in conjunction with the Google language model.
> From looking at the documentation and code, I can see that it reads the
> format, but do not see strategies to keep portions of the model in
> memory and others on disk, for example.  Obviously one would need to do
> something like this to hold the entire model.  However, I've also used
> and tweaked enough of the code to know you're a serious hacker, and that
> I might have missed something.
> One thought I had was to point ngram-count to the Google LM, then use a
> word list to filter only the n-grams that I need SRILM to estimate
> probabilities for.  Beyond that, I'm stumped.
> So, can you offer any feedback?  What are some strategies you recommend
> for using the Google LM?  

The Google LM (with nontrivial data size) is really meant to be used 
in conjunction with the -limit-vocab option, which restricts loading 
of parameters to a subset of the vocabulary (i.e., the subset used in your
test or tuning data).
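For example, you can collect the vocabulary of your test set with ngram-count and then pass it to ngram via -vocab together with -limit-vocab (filenames here are hypothetical):

```shell
# Extract the vocabulary actually occurring in the test data
# (test.txt and test.vocab are placeholder filenames).
ngram-count -text test.txt -write-vocab test.vocab

# Load only the LM parameters needed for that vocabulary,
# then evaluate perplexity on the test set.
ngram -lm big.lm -limit-vocab -vocab test.vocab -ppl test.txt
```

With -limit-vocab, n-grams involving words outside test.vocab are skipped at load time, which is what makes very large models tractable in memory.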

An example of this appears in

BTW, there is no "Google LM" per se in SRILM.  You use the "CountLM" class,
and designate the counts to be read in Google format.
See the -count-lm option as described in the ngram(1) man page.
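Putting the two together, a CountLM over Google-format counts might be invoked like this (a sketch only; google.countlm is a hypothetical CountLM parameter file, whose exact directives, including how it points at the Google counts directory, are documented in the ngram(1) man page):

```shell
# Evaluate a count-based LM (read in Google n-gram format) on test data,
# loading only the counts needed for the test vocabulary.
ngram -count-lm -lm google.countlm \
      -limit-vocab -vocab test.vocab \
      -ppl test.txt
```

Without -limit-vocab, ngram would attempt to load counts for the full Google vocabulary, which is impractical for nontrivial data sizes.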

Hope this clarifies things.


More information about the SRILM-User mailing list