[SRILM User List] Positive BOW vs. wordProbBO
Andreas Stolcke
stolcke at icsi.berkeley.edu
Wed Oct 26 15:15:28 PDT 2016
See answers to David's original questions below.
On 10/22/2016 2:19 PM, Nickolay V. Shmyrev wrote:
> The important part is the other probabilities; it is not that easy to get a positive backoff weight from just a few counts. Try this file:
>
> ~~~~~
> a
> b
> b c
> b d
> e f
> e f
> ~~~~~
>
> ngram-count -text test.txt -lm test.lm -order 2
>
> gives
>
> \1-grams:
> ....
> -0.90309 f 0.146128
> ...
>
>
> 21.10.2016, 19:42, "Dávid Nemeskey" <nemeskeyd at gmail.com>:
>> Does anyone have any idea about this? Thanks!
>>
>> On Wed, Sep 28, 2016 at 2:10 PM, Dávid Nemeskey <nemeskeyd at gmail.com> wrote:
>>> Hi guys (esp. Andreas),
>>>
>>> I have run into a strange problem. We are debugging our model by
>>> looking at words that have a higher probability in the model than
>>> the word in the text. Most of the time the probability I compute is
>>> correct, but there is an exception: when the BOW is positive (which
>>> surprised me at first, but then I found
>>> http://www.speech.sri.com/pipermail/srilm-user/2004q2/000192.html), I
>>> get a different set of words than the built-in method (used in
>>> Ngram::updateRanks).
>>>
>>> The reason, I believe, is that wordProbBO sets the BOW to log_10(1)
>>> whenever it finds a matching n-gram, which obviously makes sense; but
>>> if the BOW of the previous (n-1)-gram is positive, it can happen that
>>> P_{n-1}(w) > P_{n}(w).
>>>
>>> I have three questions about such situations:
>>> 1. Which probability should be used in this case? Is it really that of
>>> the more specific n-gram, and why?
Yes, the most specific ngram should be used in all cases (that's how
back-off models are defined).
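For question 1, here is a minimal sketch of the standard back-off lookup that wordProbBO performs. This is not SRILM's actual code, and the table names are made up; the point is that once an explicit ngram entry is found, its probability is returned as-is and no BOW is applied, even if a positive BOW would have made the backed-off estimate larger.

~~~~~
# Hypothetical sketch of Katz-style back-off lookup in the log10 domain.
# logprobs: dict mapping ngram tuples (context..., word) -> log10 probability
# bows:     dict mapping context tuples -> log10 back-off weight

def word_logprob_bo(word, context, logprobs, bows):
    ngram = tuple(context) + (word,)
    if ngram in logprobs:
        # Most specific ngram found: use its probability directly.
        # No BOW enters here, even if a lower-order estimate is larger.
        return logprobs[ngram]
    if not context:
        # No unigram entry either: the word is outside the vocabulary.
        return float('-inf')
    # Back off: add the context's BOW (0.0 if none is stored) and
    # retry with the context shortened by its oldest word.
    bow = bows.get(tuple(context), 0.0)
    return bow + word_logprob_bo(word, context[1:], logprobs, bows)
~~~~~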
>>> 2. What does it mean to have a positive BOW?
The explanation is in the old post you already found: it means the more
specific ngrams listed in the model assign a lower overall probability
mass than the corresponding backed-off probabilities. That's not common,
but it can happen. See the example that Nickolay constructed.
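To make question 2 concrete, here is the usual back-off weight normalization and a tiny sketch of when its log becomes positive. The probabilities below are made-up illustrations, not values from any real model.

~~~~~
import math

def log10_bow(explicit_probs, backoff_probs):
    """Back-off weight for one context h (standard normalization).

    explicit_probs: discounted p(w|h) for every w explicitly listed after h
    backoff_probs:  the lower-order p(w) for those same words w
    """
    leftover_high = 1.0 - sum(explicit_probs)  # high-order mass left for unseen words
    leftover_low = 1.0 - sum(backoff_probs)    # lower-order mass of those unseen words
    return math.log10(leftover_high / leftover_low)

# If the explicit ngrams carry less total mass than their backed-off
# estimates, the ratio exceeds 1 and the log BOW is positive:
print(log10_bow([0.2, 0.2], [0.3, 0.4]))  # ~ 0.301 > 0
print(log10_bow([0.3, 0.4], [0.2, 0.2]))  # ~ -0.301 < 0
~~~~~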
>>> 3. I tried to artificially create such a situation where the more
>>> specific n-gram has a lower P than the n-1-gram, but failed; the model
>>> somehow corrected the scores. I tried it like this:
>>> A B C 1 (count)
>>> A B D 1
>>> B C 1000
>>> B D 1
>>> So how do positive BOWs come about?
Your example doesn't work because it is not a single ngram that triggers
the positive BOW, but the sum of all ngram probabilities for a given
context. Here
p(C|B) + p(D|B) > p(C) + p(D),
so the log BOW for context B comes out negative.
BTW, you cannot use the default smoothing (GT) here, because the contrived
counts don't provide suitable count-of-count statistics (you will see a
bunch of warnings about that). Try it with -wbdiscount.
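For example, assuming the counts from your example are saved in SRILM's count format as counts.txt (a hypothetical file name), something like:

ngram-count -read counts.txt -lm test.lm -order 3 -wbdiscount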
Andreas