[SRILM]: linear interpolation of LMs

Andreas Stolcke stolcke at speech.sri.com
Fri May 30 09:05:50 PDT 2008


Bert Reveil wrote:
> Dear Dr. Stolcke,
>
> I have recently been trying to evaluate linear combinations of LMs 
> using your SRILM-toolkit. Therefore I used the following command form
>
> "ngram  -debug 0  -lm LM1.arpa  -lambda 0.6/0.7/...  -mix-lm LM2.arpa  
> -ppl some_text.txt"
>
> Although every run of this command returns plausible output, it also 
> produces the following warning/error-line
>
> BOW numerator for context "" is -0.1 < 0
>
> At first I thought it might have been because I had some double spaces 
> in my texts, but after correcting that the warning still 
> remained...I've been looking this problem up on the mailing list, but 
> I have found no priors, so I'm directing this question to you...have 
> you got any idea what this warning means and how I can make it 
> disappear? Maybe I'm not using the 'ngram'-program correctly?
The way you invoked ngram, it merges the two LMs into a single new 
backoff ngram model and then uses that merged LM (this is also called 
"static" interpolation).
In the merging step the backoff weights are recomputed to normalize the 
merged probabilities.  The message you are seeing indicates that the 
unigram probabilities add up to something > 1.  This could be a problem 
with your original LMs.  Were those created by SRILM as well?  If so, we 
need to investigate.  If you computed LM1 and LM2 by some other means, 
you can use SRILM to renormalize them individually before doing the 
interpolation:

       ngram -lm LM1 -renorm -write-lm LM1norm
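
To see which of the input models is off before renormalizing, you can 
add up the unigram probabilities yourself.  The following is only a 
rough sketch: it assumes an uncompressed, text-format ARPA file whose 
first column holds log10 probabilities, and the function name 
unigram_mass is made up for this example (it is not part of SRILM).  
Assuming the BOW numerator for the empty context is 1 minus the unigram 
sum, a reported numerator of -0.1 would correspond to a sum of about 1.1.

```shell
# Hypothetical helper (not an SRILM tool): sum the unigram probabilities
# of a plain-text ARPA model given as $1.
unigram_mass() {
    awk '
        /^\\1-grams:/ { in_uni = 1; next }   # start of the unigram section
        /^\\/         { in_uni = 0 }         # any other \section ends it
        in_uni && NF >= 2 { sum += exp($1 * log(10)) }  # 10^logprob
        END { printf "%.4f\n", sum }
    ' "$1"
}

# Example use (file names as in the original command):
# unigram_mass LM1.arpa
# unigram_mass LM2.arpa
```

A result noticeably above 1 for one of the models would suggest that 
this is the model to renormalize.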

Separate from all this, you can do "dynamic" interpolation, where the 
mixed probabilities are computed on the fly.  This is faster.  Add the 
option "-bayes 0" to the ngram command you used.
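
Spelled out with the file names from the original command, this could 
look like the sketch below.  The wrapper function mix_ppl and the NGRAM 
override variable are assumptions added for illustration only; the 
ngram options themselves are the ones already used above plus 
"-bayes 0".

```shell
# Hypothetical wrapper: perplexity of the dynamically interpolated model
# for one weight ($1) of the main model LM1.arpa.
# NGRAM may be set to another command (e.g. echo) for a dry run.
mix_ppl() {
    "${NGRAM:-ngram}" -debug 0 -lm LM1.arpa -lambda "$1" \
        -mix-lm LM2.arpa -bayes 0 -ppl some_text.txt
}

# Example sweep over the interpolation weight:
# for lambda in 0.6 0.7; do mix_ppl "$lambda"; done
```

Since no merged backoff model is built in this mode, no backoff weights 
are recomputed.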

Andreas

>
> With kind regards,
>
> Bert




