[SRILM User List] ARPA format for Ngram LMs with Jelinek-Mercer smoothing
Andreas Stolcke
stolcke at icsi.berkeley.edu
Sun May 22 22:45:21 PDT 2011
Ariya Rastrow wrote:
> Hi,
> I have a question regarding building N-gram LMs with Jelinek-Mercer
> smoothing. I have optimized the weights on some held-out data using my
> own scripts, and now I am trying to write out the LM in ARPA backoff
> format. I have the N-gram probabilities and the corresponding
> weights for 1-grams, 2-grams and 3-grams. I was wondering if I could use
> the SRILM toolkit to get the ARPA representation of my LM. I have tried
> the ngram program with the -count-lm option along with -write, but it
> only writes out the LM as the header file described under the -count-lm
> option. I know this is an easy task, and one can use the weights as
> the backoff weights to get the ARPA format. Any help would be
> appreciated.
If you know how to create the count-LM, then you're halfway there.
To get a backoff LM, first train a backoff LM using one of the
standard smoothing methods (say Good-Turing, the default), then use the
previously created count-LM to "rescore" the probabilities in that
backoff LM (the ngram -rescore-ngram option). Be aware, however, that
this only approximates the interpolated LM; the approximation is exact
for all N-grams contained in the training data.
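To make the backoff layout concrete, here is a toy Python sketch. It is not SRILM code and does not reproduce what ngram -rescore-ngram does internally. It assumes a bigram model with a single hypothetical interpolation weight `lam` and a complete unigram distribution, and it omits <s>/</s> handling. The function names `jm_bigram_to_backoff`, `backoff_prob` and `write_arpa` are made up for illustration. Each bigram seen in training gets its interpolated probability stored explicitly. Every other bigram is computed as bow(h) * P(w), where bow(h) is chosen so that each conditional distribution still sums to one.

```python
"""Toy sketch: lay out a Jelinek-Mercer bigram model in ARPA backoff form.

Assumptions (hypothetical, for illustration only): one global weight `lam`,
a complete unigram distribution, and no <s>/</s> handling.
"""
from collections import defaultdict
import math


def jm_bigram_to_backoff(bigram_counts, unigram_probs, lam):
    """Return (explicit, bows).

    explicit[(h, w)] = lam * P_ML(w|h) + (1 - lam) * P(w) for seen bigrams;
    bows[h] renormalizes the backed-off unigram mass for context h.
    """
    totals = defaultdict(int)       # c(h): total count of bigrams starting with h
    followers = defaultdict(list)   # words observed after h
    for (h, w), c in bigram_counts.items():
        totals[h] += c
        followers[h].append(w)

    explicit = {}
    for (h, w), c in bigram_counts.items():
        p_ml = c / totals[h]
        explicit[(h, w)] = lam * p_ml + (1.0 - lam) * unigram_probs[w]

    bows = {}
    for h, seen in followers.items():
        left_over = 1.0 - sum(explicit[(h, w)] for w in seen)
        lower_left_over = 1.0 - sum(unigram_probs[w] for w in seen)
        # If every vocabulary word was seen after h, there is no mass to back off to.
        bows[h] = left_over / lower_left_over if lower_left_over > 1e-12 else 1.0
    return explicit, bows


def backoff_prob(w, h, explicit, bows, unigram_probs):
    """ARPA-style lookup: the stored probability if present, else bow(h) * P(w)."""
    if (h, w) in explicit:
        return explicit[(h, w)]
    return bows.get(h, 1.0) * unigram_probs[w]


def write_arpa(unigram_probs, explicit, bows):
    """Render a minimal two-level ARPA file with log10 probabilities and bows."""
    lines = ["\\data\\",
             f"ngram 1={len(unigram_probs)}",
             f"ngram 2={len(explicit)}",
             "",
             "\\1-grams:"]
    for w, p in sorted(unigram_probs.items()):
        entry = f"{math.log10(p):.6f}\t{w}"
        if w in bows:  # only contexts with stored bigrams carry a backoff weight
            entry += f"\t{math.log10(bows[w]):.6f}"
        lines.append(entry)
    lines += ["", "\\2-grams:"]
    for (h, w), p in sorted(explicit.items()):
        lines.append(f"{math.log10(p):.6f}\t{h} {w}")
    lines += ["", "\\end\\"]
    return "\n".join(lines) + "\n"
```

For example, with unigrams {"a": 0.5, "b": 0.3, "c": 0.2}, bigram counts {("a", "b"): 2, ("a", "c"): 1, ("b", "a"): 3} and lam=0.6, the stored P(b|a) is 0.6 * 2/3 + 0.4 * 0.3 = 0.52. The unseen P(a|a) comes out as bow(a) * 0.5 = 0.2. In this particular toy, bow(h) works out to 1 - lam for every seen context. That need not carry over to the rescoring route described above, where the set of N-grams comes from the separately trained backoff LM.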
Andreas
>
> Thanks,
> Ariya