[SRILM User List] Positive BOW vs. wordProbBO

Dávid Nemeskey nemeskeyd at gmail.com
Wed Sep 28 05:10:44 PDT 2016


Hi guys (esp. Andreas),

I have run into a strange problem. We are debugging our model by
looking at those words that have a higher probability in the model than
the word in the text. Most of the time the probability I compute is
correct, but there is an exception: when the BOW is positive (which
surprised me at first, but then I found
http://www.speech.sri.com/pipermail/srilm-user/2004q2/000192.html), I
return a different set of words than the built-in method (used in
Ngram::updateRanks).

The reason, I believe, is that wordProbBO resets the BOW to log_10(1) = 0
whenever it finds a matching n-gram. That obviously makes sense, but
if the BOW of the (n-1)-gram context is positive, it might happen that
the backed-off estimate exceeds the explicit one: P_{n-1}(w) > P_{n}(w).
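
To make the effect concrete, here is a minimal Python sketch of a generic
backoff lookup. It is not SRILM's code; the tables, the numbers and the
function names (backoff_logprob, backed_off_path) are made up for
illustration, and I am only assuming that wordProbBO behaves like the
first function.

```python
# Toy ARPA-style tables. All values are invented log10 numbers.
LOGPROB = {
    ("b", "c"): -2.0,   # explicit bigram, log10 P(c | b)
    ("c",): -1.0,       # unigram, log10 P(c)
    ("d",): -1.5,       # unigram, log10 P(d)
}
LOGBOW = {
    ("b",): 0.3,        # positive log10 backoff weight for context "b"
}


def backoff_logprob(word, context):
    """Generic backoff: an explicit n-gram wins, otherwise add the
    context's BOW and retry with a shortened context."""
    ngram = context + (word,)
    if ngram in LOGPROB:
        return LOGPROB[ngram]          # BOW is not applied here
    if not context:
        return float("-inf")           # unknown word in this toy model
    return LOGBOW.get(context, 0.0) + backoff_logprob(word, context[1:])


def backed_off_path(word, context):
    """Score the word as if the explicit n-gram did not exist."""
    return LOGBOW.get(context, 0.0) + backoff_logprob(word, context[1:])


if __name__ == "__main__":
    print("explicit P(c|b):   ", backoff_logprob("c", ("b",)))
    print("backed-off P(c|b): ", backed_off_path("c", ("b",)))
    print("backed-off P(d|b): ", backoff_logprob("d", ("b",)))
```

With these invented numbers the explicit bigram "b c" scores lower than
both its own backed-off path and the unseen bigram "b d", which is the
kind of ranking difference described above.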

I have three questions about such situations:
1. Which probability should be used in this case? Is it really that of
the more specific n-gram, and why?
2. What does it mean to have a positive BOW?
3. I tried to artificially create such a situation where the more
specific n-gram has a lower P than the n-1-gram, but failed; the model
somehow corrected the scores. I tried it like this:
A B C 1 (count)
A B D 1
B C 1000
B D 1
So how do positive BOWs come about?
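
For reference, this is the textbook backoff weight for a context h with
shortened context h'. That SRILM computes exactly this is my assumption;
the notation (c for counts, p for the discounted estimates) is mine.

```latex
\mathrm{bow}(h) =
  \frac{1 - \sum_{w:\, c(hw) > 0} p(w \mid h)}
       {1 - \sum_{w:\, c(hw) > 0} p(w \mid h')}
\qquad
\log_{10} \mathrm{bow}(h) > 0
\iff
\sum_{w:\, c(hw) > 0} p(w \mid h) < \sum_{w:\, c(hw) > 0} p(w \mid h')
```

Under this definition the BOW would be positive exactly when the words
seen after h get less total mass from the higher-order estimates than
from the lower-order ones.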

Thanks,
Dávid


