[SRILM User List] Positive BOW vs. wordProbBO

Fri Oct 21 09:24:44 PDT 2016

Does anyone have any idea about this? Thanks!

On Wed, Sep 28, 2016 at 2:10 PM, Dávid Nemeskey <nemeskeyd at gmail.com> wrote:
> Hi guys (esp. Andreas),
>
> I have run into a strange problem. We are debugging our model by
> looking at those words that have higher probability in the model than
> the word in the text. Most of the time the probability I compute is
> correct, but there is an exception: when the BOW is positive (which
> surprised me at first, but then I found
> http://www.speech.sri.com/pipermail/srilm-user/2004q2/000192.html), I
> return a different set of words than the built-in method (used in
> Ngram::updateRanks).
>
> The reason, I believe, is that wordProbBO resets the BOW to
> log_10(1) = 0 whenever it finds a matching n-gram. That obviously makes
> sense, but if the BOW of the previous (n-1)-gram is positive, it might
> happen that P_{n-1}(w) > P_{n}(w).
>
> I have three questions about such situations:
> 1. Which probability should be used in this case? Is it really that of
> the more specific n-gram, and why?
> 2. What does it mean to have a positive BOW?
> 3. I tried to artificially create such a situation, where the more
> specific n-gram has a lower P than the (n-1)-gram, but failed; the model
> somehow corrected the scores. I tried it with these n-gram counts:
>   A B C   1
>   A B D   1
>   B C     1000
>   B D     1
> So how do positive BOWs come about?
>
> Thanks,
> Dávid
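
To make the situation described above concrete, here is a minimal toy sketch of a two-level backoff lookup. It is not SRILM code: the function names (backoff_weight, word_prob_bo) and all probabilities are made up for illustration, and it assumes the usual backoff normalization, in which the BOW of a context is the probability mass left over by the explicit n-grams divided by the lower-order mass of the words not covered by them. Under that assumption the BOW exceeds 1 (positive in log10) whenever the explicit n-grams of a context sum to less than the lower-order estimates of the same words.

```python
import math

def backoff_weight(p_high, p_low):
    """log10 BOW of one context (toy version, assumed normalization).

    p_high: word -> P(w | full context), only for words with an explicit n-gram
    p_low:  word -> P(w | shortened context), for the whole vocabulary
    """
    left_high = 1.0 - sum(p_high.values())          # mass left for backed-off words
    left_low = 1.0 - sum(p_low[w] for w in p_high)  # lower-order mass of those words
    return math.log10(left_high / left_low)

def word_prob_bo(word, p_high, p_low, log_bow):
    """log10 P(word | context): explicit n-gram if present, else back off."""
    if word in p_high:
        # Matching n-gram found: the BOW is not applied (log10(1) = 0).
        return math.log10(p_high[word])
    return log_bow + math.log10(p_low[word])

# Hypothetical numbers: context "A B" has one explicit trigram, "A B C".
p_high = {"C": 0.3}                      # P(C | A B)
p_low = {"C": 0.6, "D": 0.3, "E": 0.1}   # P(w | B)

log_bow = backoff_weight(p_high, p_low)  # log10(0.7 / 0.4), which is > 0
probs = {w: 10 ** word_prob_bo(w, p_high, p_low, log_bow) for w in p_low}

print("log10 BOW:", log_bow)
print("P(w | A B):", probs)
print("sum:", sum(probs.values()))
```

In this toy model P(C | A B) = 0.3 is lower than P(C | B) = 0.6, and the distribution over the three words still sums to 1, because the positive BOW scales up the backed-off words D and E. Applying the BOW to C as well would give 1.75 * 0.6 > 1, which is one way to see why the explicit n-gram probability is used as is.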