[SRILM User List] How to interpolate big LMs?

Meng Chen chenmengdx at gmail.com
Thu Aug 16 04:07:13 PDT 2012


Hi, suppose I have trained three big LMs: LM1, LM2, and LM3, each of which
contains billions of n-grams. I would like to know how to interpolate such
big LMs together. I found that the ngram command in SRILM loads all of the
LMs into memory first, so it runs into the server's memory limit. In this
situation, how can I interpolate these big LMs?

Another question is about training an LM on a large corpus. There are two methods:
1) Pool all the data and train one big LM0.
2) Split the data into several parts, train a small LM on each part (e.g. LM1
and LM2), and then interpolate them with equal weights (e.g. 0.5 X LM1 + 0.5 X
LM2) to get the final LM3.
The cutoffs and smoothing algorithm are the same for both methods. Is LM3 then
the same as LM0? (A toy sketch of the comparison follows below.)
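
(To make the comparison concrete, here is a toy Python sketch using plain
relative frequencies only, with made-up sentences and no smoothing or cutoffs.
Even in this unsmoothed case the two quantities differ when the parts
contribute different amounts of data for a history, which is part of why I am
asking whether the same holds once cutoffs and smoothing are applied.)

    from collections import Counter

    part1 = ["the cat", "the cat", "the dog"]   # made-up data
    part2 = ["the dog"]

    def bigram_prob(sentences, word, history):
        # Plain relative frequency P(word | history) from bigram counts.
        bigrams, histories = Counter(), Counter()
        for s in sentences:
            toks = s.split()
            for h, w in zip(toks, toks[1:]):
                bigrams[(h, w)] += 1
                histories[h] += 1
        return bigrams[(history, word)] / histories[history]

    p_pooled = bigram_prob(part1 + part2, "cat", "the")      # 2/4 = 0.5
    p_mixed = 0.5 * bigram_prob(part1, "cat", "the") \
            + 0.5 * bigram_prob(part2, "cat", "the")          # 0.5*(2/3) + 0.5*0 = 0.333...
    print(p_pooled, p_mixed)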

Thanks!

Meng CHEN

