[SRILM]: linear interpolation of LMs
Andreas Stolcke
stolcke at speech.sri.com
Fri May 30 09:05:50 PDT 2008
Bert Reveil wrote:
> Dear Dr. Stolcke,
>
> I have recently been trying to evaluate linear combinations of LMs
> using your SRILM-toolkit. Therefore I used the following command form
>
> "ngram -debug 0 -lm LM1.arpa -lambda 0.6/0.7/... -mix-lm LM2.arpa
> -ppl some_text.txt"
>
> Although every run of this command returns plausible output, it also
> produces the following warning/error-line
>
> BOW numerator for context "" is -0.1 < 0
>
> At first I thought it might have been because I had some double spaces
> in my texts, but after correcting that the warning still
> remained...I've been looking this problem up on the mailing list, but
> I have found no priors, so I'm directing this question to you...have
> you got any idea what this warning means and how I can make it
> disappear? Maybe I'm not using the 'ngram'-program correctly?
The way you invoked ngram, it merges the two LMs into a single new
backoff ngram model and then uses that merged LM (this is also called
"static" interpolation).
In the merging step the backoff weights are recomputed to normalize the
merged probabilities. The message you are seeing indicates that the
unigram probabilities add up to something > 1. This could be a problem
with your original LMs. Were those created by SRILM as well? If so, we
need to investigate.
If you computed LM1 and LM2 by some other means you can use SRILM to
renormalize them individually before doing the interpolation:
ngram -lm LM1 -renorm -write-lm LM1norm
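
To check whether a model shows the condition described above (unigram
probabilities adding up to more than 1), something like the following
Python sketch could be run on LM1 and LM2, and again on the renormalized
files. It assumes the usual ARPA text layout, with a "\1-grams:" section
whose lines start with a log10 probability. The function name
unigram_mass and the sample data are hypothetical and not part of SRILM.

```python
def unigram_mass(arpa_lines):
    """Sum the unigram probabilities found in the \\1-grams: section.

    arpa_lines is any iterable of text lines, e.g. an open ARPA file.
    Assumes each unigram line starts with a log10 probability.
    """
    total = 0.0
    in_unigrams = False
    for raw in arpa_lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if not in_unigrams or not line:
            continue
        if line.startswith("\\"):
            # Reached the next section (\2-grams: or \end\).
            break
        logprob = float(line.split()[0])
        total += 10.0 ** logprob
    return total


# Hypothetical toy model whose unigrams sum to roughly 1.1.
SAMPLE = [
    "\\data\\",
    "ngram 1=3",
    "",
    "\\1-grams:",
    "-0.30103\ta",
    "-0.30103\tb",
    "-1.0\tc",
    "",
    "\\end\\",
]

if __name__ == "__main__":
    mass = unigram_mass(SAMPLE)
    # A negative value here corresponds to the "BOW numerator ... < 0" case.
    print(f"unigram mass = {mass:.3f}, 1 - mass = {1.0 - mass:.3f}")
```

For a real model, pass an open file instead of SAMPLE, for example
unigram_mass(open("LM1.arpa")). A total noticeably above 1 suggests
the model needs renormalizing before interpolation.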
Separate from all this, you can do "dynamic" interpolation, where the
mixed probabilities are computed on the fly. This is faster. Add the
option "-bayes 0" to your ngram options in the command you used.
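
As a rough illustration of what computing the mixed probabilities on the
fly means, the Python sketch below interpolates per-word probabilities
from two models with a fixed weight and computes a perplexity from the
result. It assumes that the -lambda value is the weight of the model
given with -lm and that the second model gets 1 - lambda. The function
names and the probability values are made up for the example, and this
is not SRILM's actual implementation.

```python
import math


def interpolate(p1, p2, lam):
    """Linear interpolation: lam * p1 + (1 - lam) * p2."""
    return lam * p1 + (1.0 - lam) * p2


def perplexity(word_probs):
    """Perplexity of a sequence, given the probability of each word."""
    log_sum = sum(math.log10(p) for p in word_probs)
    return 10.0 ** (-log_sum / len(word_probs))


# Hypothetical probabilities that LM1 and LM2 assign to the same three
# words in context (invented values, not taken from any real model).
P_LM1 = [0.2, 0.05, 0.1]
P_LM2 = [0.1, 0.15, 0.1]

# Mix word by word with lambda = 0.6; no merged model is ever built.
MIXED = [interpolate(a, b, 0.6) for a, b in zip(P_LM1, P_LM2)]

if __name__ == "__main__":
    print("mixed probabilities:", MIXED)
    print("perplexity:", perplexity(MIXED))
```

Since each mixed value is a weighted average of two probabilities, it
stays between them, so the unigram sum of a merged model is not involved
in this computation.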
Andreas
>
> With kind regards,
>
> Bert