SRILM BOW denominator warning

Andreas Stolcke stolcke at speech.sri.com
Thu Jan 17 18:25:11 PST 2008


David Gelbart wrote:
> Hello,
>
> I am trying to build a trigram LM for the OGI Numbers corpus, in which 
> utterances are spoken strings of numbers such as 'eighty nine eighty 
> eight'.  Since there are no singletons, I am using Witten-Bell 
> discounting instead of Good-Turing.  ngram-count displays "BOW 
> denominator for context... is zero" warnings.  Does this mean the LM 
> is broken?  If I try adding "-gt3min 1 -gt2min 1" to the ngram-count 
> options, I still see these warnings.  Here is the ngram-count output:
>
> $ ngram-count -wbdiscount -text /u/gelbart/tmp/train.trans -order 3 \
>   -lm /u/gelbart/tmp/numbers-wb.lm
> BOW denominator for context "seven" is zero; scaling probabilities to 
> sum to 1
> BOW denominator for context "six" is zero; scaling probabilities to 
> sum to 1
> BOW denominator for context "four" is zero; scaling probabilities to 
> sum to 1
> BOW denominator for context "two" is zero; scaling probabilities to 
> sum to 1
>
> In the generated language model, the log BOWs are zero for those four 
> words:
>
> -1.156247       four    0
> -1.09725        seven   0
> -1.203041       six     0
> -1.029482       two     0
>
This happens when you have a small vocabulary and every word has been 
observed in a given context, so there are no unseen words left to 
receive the backoff mass.
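
To make the arithmetic concrete, here is a minimal Python sketch of the usual back-off weight computation for a single context. It is not SRILM's code: the function name `backoff_weight`, the toy vocabulary and all probabilities are made up for illustration. The zero-denominator branch (rescale the explicit probabilities, report a BOW of 1, i.e. log 0) is an assumption chosen to match the warning text and the zero log BOWs in the listing quoted above.

```python
import math

def backoff_weight(higher, lower):
    """Back-off weight for one context (illustrative, not SRILM's code).

    higher: word -> p(word | context), only for words seen after the context
    lower:  word -> lower-order p(word), over the whole vocabulary
    Returns (bow, probabilities to store for this context).
    """
    # Mass left over after discounting the seen words.
    numerator = 1.0 - sum(higher.values())
    # Lower-order mass belonging to words NOT seen after the context.
    denominator = 1.0 - sum(lower[w] for w in higher)

    if math.isclose(denominator, 0.0, abs_tol=1e-9):
        # Every vocabulary word was seen, so nothing can be backed off to.
        # Assumed handling: scale the explicit probabilities to sum to 1.
        total = sum(higher.values())
        scaled = {w: p / total for w, p in higher.items()}
        return 1.0, scaled

    return numerator / denominator, dict(higher)

# Hypothetical three-word vocabulary with made-up unigram probabilities.
unigrams = {"one": 0.5, "two": 0.3, "three": 0.2}

# All three words were seen after the context; discounting left 0.1 unused.
seen_all = {"one": 0.45, "two": 0.27, "three": 0.18}

bow, probs = backoff_weight(seen_all, unigrams)
print("log10 BOW:", math.log10(bow))
print("sum of scaled probabilities:", sum(probs.values()))
```

If one word were missing from `seen_all`, the denominator would be that word's unigram probability and the function would return an ordinary non-zero ratio instead.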

There is no need to do anything; the LM will work just fine.

This should probably be included in the FAQ under smoothing issues.

Andreas

> Thanks,
> David




