[SRILM User List] Adding n-grams to an existing LM
Joris Pelemans
Joris.Pelemans at esat.kuleuven.be
Fri Nov 1 17:00:26 PDT 2013
Hello,
I have an existing 5-gram LM with KN discounting and I would like to add
new words to it. To estimate reasonable n-gram probabilities for a new
word, I am currently using (a fraction of) the probabilities of a synonym
of that word. For every n-gram that contains the synonym, I replace the
synonym with the new word, copy the logprob (adjusting it slightly when
using a fraction) and the alpha, and add the resulting line to the LM.
Obviously the resulting LM is no longer normalized. I thought I would be
able to fix this relatively easily with:
ngram -lm src.arpa -order 5 -renorm -write-lm dest.arpa
but I get a lot of errors of the type "BOW numerator for context is ...
< 0" and "BOW denominator for context is ... <= 0".
What do these errors mean? Can I ignore them, or is there a better way to
renormalize my new LMs?
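
As a rough illustration of how the copying step described above can drive
a backoff-weight numerator below zero, here is a minimal Python sketch for
a single context. It assumes the usual backoff formulation, in which the
numerator is 1 minus the sum of the explicit probabilities in a context and
the denominator is 1 minus the sum of the lower-order probabilities of the
same words. The function names and the toy probabilities are hypothetical
and are not taken from SRILM or from any real LM.

```python
def add_by_synonym(probs, synonym, new_word, fraction=1.0):
    """Return a copy of probs with new_word added as a (scaled) copy of synonym."""
    updated = dict(probs)
    if synonym in updated:
        updated[new_word] = fraction * updated[synonym]
    return updated


def bow_terms(explicit_probs, lower_order_probs):
    """Return (numerator, denominator) of the backoff weight for one context.

    explicit_probs:    word -> p(word | context) for n-grams listed in the LM
    lower_order_probs: word -> p(word | shortened context) for at least those words
    """
    numerator = 1.0 - sum(explicit_probs.values())
    denominator = 1.0 - sum(lower_order_probs[w] for w in explicit_probs)
    return numerator, denominator


# Toy numbers for one context (assumed values, for illustration only).
explicit = {"car": 0.45, "house": 0.40}
lower = {"car": 0.10, "house": 0.08}

num_before, den_before = bow_terms(explicit, lower)

# Copy the synonym's probabilities to the new word at both orders.
explicit_new = add_by_synonym(explicit, "car", "automobile")
lower_new = add_by_synonym(lower, "car", "automobile")

num_after, den_after = bow_terms(explicit_new, lower_new)

print("before:", num_before, den_before)
print("after: ", num_after, den_after)
```

In this toy case the explicit probabilities in the context sum to more
than 1 after the copy, so the numerator becomes negative. Passing a
fraction below 1.0 adds less mass, but it still adds mass unless the
probabilities of the existing words are reduced to compensate.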
Thanks in advance,
Joris