SRILM BOW denominator warning
Andreas Stolcke
stolcke at
Thu Jan 17 18:25:11 PST 2008
David Gelbart wrote:
> Hello,
> I am trying to build a trigram LM for the OGI Numbers corpus, in which
> utterances are spoken strings of numbers such as 'eighty nine eighty
> eight'. Since there are no singletons, I am using Witten-Bell
> discounting instead of Good-Turing. ngram-count displays "BOW
> denominator for context... is zero" warnings. Does this mean the LM
> is broken? If I try adding "-gt3min 1 -gt2min 1" to the ngram-count
> options, I still see these warnings. Here is the ngram-count output:
> $ ngram-count -wbdiscount -text /u/gelbart/tmp/train.trans -order 3 \
> -lm /u/gelbart/tmp/numbers-wb.lm
> BOW denominator for context "seven" is zero; scaling probabilities to sum to 1
> BOW denominator for context "six" is zero; scaling probabilities to sum to 1
> BOW denominator for context "four" is zero; scaling probabilities to sum to 1
> BOW denominator for context "two" is zero; scaling probabilities to sum to 1
> In the generated language model, the log BOWs are zero for those four
> words:
> -1.156247 four 0
> -1.09725 seven 0
> -1.203041 six 0
> -1.029482 two 0
This happens when you have a small vocabulary and every word in it has
been observed after a given context, so there is no backoff mass left to
distribute over unseen words. As the warning says, the probabilities for
that context are then scaled to sum to 1.
There is no need to do anything; the LM will work just fine.
This should probably be included in the FAQ under smoothing issues.
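To make the arithmetic concrete, here is a minimal Python sketch of a
backoff-weight computation. It is an illustration only, not SRILM's
actual code: the function name backoff_weight, the three-word
vocabulary, and all probabilities are made up, and it assumes the usual
backoff formula in which the weight is the probability mass left over
in the context divided by the lower-order mass of the unseen words.

```python
import math

EPS = 1e-9  # tolerance for treating a floating-point value as zero


def backoff_weight(higher, lower):
    """Return (probabilities, log10 backoff weight) for one context.

    higher: word -> discounted p(word | context), for words seen after the context
    lower:  word -> lower-order p(word), over the whole vocabulary
    """
    numerator = 1.0 - sum(higher.values())            # mass left after discounting
    denominator = 1.0 - sum(lower[w] for w in higher)  # lower-order mass of unseen words
    if abs(denominator) < EPS:
        # Every vocabulary word was seen in this context: nothing to back off to.
        # Rescale the seen probabilities to sum to 1 and use a log weight of 0.
        total = sum(higher.values())
        return {w: p / total for w, p in higher.items()}, 0.0
    if numerator < EPS:
        return dict(higher), float("-inf")  # no mass left to hand to unseen words
    return dict(higher), math.log10(numerator / denominator)


# Hypothetical unigram distribution over a three-word vocabulary.
lower = {"one": 0.5, "two": 0.3, "three": 0.2}

# All three words were seen after the context; discounting left 0.1 unassigned.
scaled, log_bow = backoff_weight({"one": 0.45, "two": 0.27, "three": 0.18}, lower)
print(scaled, log_bow)

# Only "one" was seen after the context, so the denominator is non-zero.
_, partial_log_bow = backoff_weight({"one": 0.7}, lower)
print(partial_log_bow)
```

In the first call the denominator is zero, which corresponds to the
situation in the warning; in the second call the weight is an ordinary
ratio because two words remain unseen in that context.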
> Thanks,
> David