[SRILM User List] ARPA format for Ngram LMs with Jelinek-Mercer smoothing

Ariya Rastrow ariya at jhu.edu
Fri May 27 11:25:13 PDT 2011

On Mon, May 23, 2011 at 1:45 AM, Andreas Stolcke
<stolcke at icsi.berkeley.edu> wrote:

> Ariya Rastrow wrote:
>> Hi,
>> I have a question about building N-gram LMs with Jelinek-Mercer
>> smoothing. I have optimized the weights with my own scripts on some
>> held-out data, and now I am trying to write the LM out in ARPA backoff
>> format. I have the N-gram probabilities and the corresponding weights for
>> 1-grams, 2-grams, and 3-grams. Can I use the SRILM toolkit to get the
>> ARPA representation of my LM? I tried ngram with the -count-lm option
>> together with -write, but it only writes out the LM as the header file
>> described under the -count-lm option. I know this is an easy task and
>> one can use the weights as the backoff weights to get the ARPA format.
>> Any help would be appreciated.
>
> If you know how to create the count-LM then you're halfway there.
> To get a backoff LM, first train a backoff LM using one of the standard
> LM smoothing methods (say GT, the default), then use the previously
> created count-LM to "rescore" the probabilities in the backoff LM (ngram
> -rescore-ngram option). Be aware, however, that this only approximates
> the interpolated LM, although the approximation is exact for all N-grams
> contained in the training data.
>
> Andreas

The reason I wanted the ARPA format for the Jelinek-Mercer smoothed LM was
to be able to load it in C++ code. I understand that the ARPA format would
be an approximation, as you mentioned. Can you please let me know what the
best way would be to load the N-grams and their probabilities, along with
the interpolation weights, in C++ code and perhaps do the interpolation on
the fly? Basically, my question is how to use a Jelinek-Mercer LM in C++
code, given that I already have the weights and N-gram probabilities (I can
make the header file as in -count-lm).
