[SRILM User List] Adding n-grams to an existing LM

Joris Pelemans Joris.Pelemans at esat.kuleuven.be
Fri Nov 1 17:00:26 PDT 2013


Hello,

I have an existing 5-gram LM with KN discounting and I would like to add 
new words to it. To estimate reasonable n-gram probabilities for a new 
word, I am currently using (a fraction of) the probabilities of a synonym 
of that word. For every n-gram that contains the synonym, I replace the 
synonym with the new word, copy the logprob (or slightly alter it when 
using a fraction) and the backoff weight (alpha), and add the resulting 
line to the LM. Obviously the resulting LM is no longer normalized. I 
thought I would be able to fix this relatively easily with:

ngram -lm src.arpa -order 5 -renorm -write-lm dest.arpa

but I get a lot of errors of the type "BOW numerator for context is ... 
< 0" and "BOW denominator for context is ... <= 0".
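
For illustration, here is a minimal Python sketch of the copying step 
described above, applied to a toy bigram section. The function names 
(add_word_from_synonym, explicit_mass), the toy n-grams and the choice to 
shift every copied logprob by log10(fraction) are assumptions made for 
this example; they are not part of SRILM. The sketch only shows that 
copying entries adds probability mass to contexts that were already 
normalized.

```python
import math


def add_word_from_synonym(ngram_lines, synonym, new_word, fraction=1.0):
    """Return ngram_lines plus copies where `synonym` becomes `new_word`.

    Each line has the ARPA layout "logprob<TAB>w1 ... wn[<TAB>backoff]".
    Assumption: the logprob of every copied line is shifted by
    log10(fraction); the backoff weight, if present, is copied unchanged.
    """
    shift = math.log10(fraction)
    added = []
    for line in ngram_lines:
        fields = line.split("\t")
        words = fields[1].split()
        if synonym not in words:
            continue
        new_words = [new_word if w == synonym else w for w in words]
        new_logprob = float(fields[0]) + shift
        new_fields = [f"{new_logprob:.6f}", " ".join(new_words)] + fields[2:]
        added.append("\t".join(new_fields))
    return ngram_lines + added


def explicit_mass(ngram_lines):
    """Sum the explicit probabilities p(w | context) for each context."""
    mass = {}
    for line in ngram_lines:
        fields = line.split("\t")
        words = fields[1].split()
        context = " ".join(words[:-1])
        mass[context] = mass.get(context, 0.0) + 10.0 ** float(fields[0])
    return mass


# Toy bigram section (hypothetical values): p(car|the)=0.5, p(house|the)=0.4
lm = ["-0.30103\tthe car\t-0.2", "-0.39794\tthe house"]
extended = add_word_from_synonym(lm, "car", "automobile")
mass = explicit_mass(extended)
print(mass["the"])  # above 1 once "the automobile" has been added
```

In this toy case the explicit probabilities in the context "the" sum to 
more than 1 after the copy, so no non-negative backoff mass is left for 
that context.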

What do these errors mean? Can I ignore them, or is there a better way to 
renormalize my new LMs?
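
For reference, the quantities named in these messages can be written out 
under the standard backoff formulation, which I assume is what -renorm 
recomputes. Here h is a context, h' is h with its earliest word dropped, 
and both sums run over the words w for which the n-gram "h w" is listed 
explicitly in the LM:

```latex
\mathrm{bow}(h) =
  \frac{1 - \sum_{w \,:\, hw \in \mathrm{LM}} p(w \mid h)}
       {1 - \sum_{w \,:\, hw \in \mathrm{LM}} p(w \mid h')}
```

Read this way, a numerator below 0 would mean that the explicit 
probabilities in context h already sum to more than 1, and a denominator 
at or below 0 would mean that the lower-order probabilities of those same 
words sum to 1 or more.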

Thanks in advance,

Joris

