[SRILM User List] ARPA format for Ngram LMs with Jelinek-Mercer smoothing

Andreas Stolcke stolcke at icsi.berkeley.edu
Sun May 22 22:45:21 PDT 2011


Ariya Rastrow wrote:
> Hi,
>   I have a question regarding building N-gram LMs with Jelinek-Mercer 
> smoothing. I have optimized the weights using my own scripts on some 
> held-out data and now I am trying to write out the ARPA backoff format 
> of the LM. I have the N-gram probabilities and the corresponding 
> weights for 1-grams, 2-grams and 3-grams. I was wondering if I could use
> the SRILM toolkit to get the ARPA representation of my LM. I have tried
> the ngram program with the -count-lm option along with -write, but then
> it only writes out the LM as the header file that is described
> under the -count-lm option. I know this is an easy task and one can use
> the weights as the backoff weights to get the ARPA format. Any help 
> would be appreciated.
If you know how to create the count-LM, then you're halfway there.

To get a backoff LM, first train a backoff LM using one of the
standard LM smoothing methods (say GT, the default), then use the
previously created count-LM to "rescore" the probabilities in the
backoff LM (ngram -rescore-ngram option). However, be aware that this
only approximates the interpolated LM; the approximation is exact for
all N-grams contained in the training data.
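Why the conversion works, and where the approximation comes from, can be seen in a toy setting. The following is a minimal Python sketch, not SRILM's implementation. It assumes a trigram model with one constant Jelinek-Mercer weight per order; the LAMBDAS values and all function names (count_ngrams, JMModel, to_backoff, backoff_prob, write_arpa) are hypothetical. It stores the interpolated probability for every N-gram seen in training, then recomputes each context's backoff weight by renormalizing, which is roughly what a rescore-and-renormalize step has to do.

```python
import math
from collections import defaultdict

# Hypothetical JM weights, one constant lambda per order (an assumption;
# a real count-LM may tie its weights to context-count bins instead).
LAMBDAS = {1: 0.9, 2: 0.7, 3: 0.6}
ORDER = 3


def count_ngrams(sentences, order=ORDER):
    """Count n-grams up to `order`; <s> is a context only, never predicted."""
    counts = defaultdict(int)
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(toks) - n + 1):
                gram = tuple(toks[i:i + n])
                if gram[-1] != "<s>":
                    counts[gram] += 1
    return counts


class JMModel:
    def __init__(self, counts):
        self.counts = counts
        self.vocab = sorted({g[0] for g in counts if len(g) == 1})
        # ctx_total[h] = sum over w of c(h, w)
        self.ctx_total = defaultdict(int)
        for gram, c in counts.items():
            self.ctx_total[gram[:-1]] += c

    def interp(self, w, h):
        """lambda * ML estimate + (1 - lambda) * next-lower-order estimate."""
        if len(h) == 0:
            lower = 1.0 / len(self.vocab)  # order-0 uniform distribution
        else:
            lower = self.interp(w, h[1:])
        total = self.ctx_total.get(h, 0)
        if total == 0:
            return lower  # unseen context: use the lower order entirely
        lam = LAMBDAS[len(h) + 1]
        return lam * self.counts.get(h + (w,), 0) / total + (1 - lam) * lower


def backoff_prob(probs, bows, w, h):
    """ARPA-style lookup: explicit prob, else bow(h) * p(w | shorter h)."""
    if h + (w,) in probs:
        return probs[h + (w,)]
    if len(h) == 0:
        return 0.0  # out-of-vocabulary word
    return bows.get(h, 1.0) * backoff_prob(probs, bows, w, h[1:])


def to_backoff(model):
    """Keep interpolated probs for seen n-grams; renormalize to get bows."""
    probs = {g: model.interp(g[-1], g[:-1]) for g in model.counts}
    bows = {}
    # Shorter contexts first, so lower-order bows exist when needed.
    for h in sorted(model.ctx_total, key=len):
        if len(h) == 0:
            continue
        seen = [g[-1] for g in model.counts if g[:-1] == h]
        num = 1.0 - sum(probs[h + (w,)] for w in seen)
        den = 1.0 - sum(backoff_prob(probs, bows, w, h[1:]) for w in seen)
        bows[h] = num / den if den > 1e-12 else 1.0
    return probs, bows


def write_arpa(probs, bows):
    """Render the backoff model as ARPA text (log10 probs and bows)."""
    by_order = defaultdict(list)
    for g in probs:
        by_order[len(g)].append(g)
    by_order[1].append(("<s>",))  # <s> is listed but never predicted
    lines = ["\\data\\"]
    for n in sorted(by_order):
        lines.append(f"ngram {n}={len(by_order[n])}")
    for n in sorted(by_order):
        lines += ["", f"\\{n}-grams:"]
        for g in sorted(by_order[n]):
            lp = -99.0 if g == ("<s>",) else math.log10(probs[g])
            row = f"{lp:.6f}\t{' '.join(g)}"
            if g in bows:
                row += f"\t{math.log10(bows[g]):.6f}"
            lines.append(row)
    lines += ["", "\\end\\"]
    return "\n".join(lines)
```

For example, `m = JMModel(count_ngrams(["a b c", "a c b", "b a c a"]))` followed by `print(write_arpa(*to_backoff(m)))` prints a small ARPA file. With constant weights, each renormalized backoff weight works out to 1 - lambda of the higher order, which matches the intuition in the question. When the stored N-gram set is exactly the set of N-grams with nonzero counts, as here, the backoff lookup reproduces the interpolated model everywhere. If the backoff LM was trained separately and lacks some of those N-grams (for example because of count cutoffs), only the stored N-grams are guaranteed to match.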

Andreas

>
> Thanks,
> Ariya
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user
