[SRILM User List] Positive BOW vs. wordProbBO
Andreas Stolcke
stolcke at icsi.berkeley.edu
Wed Oct 26 15:15:28 PDT 2016
See answers to David's original questions below.
On 10/22/2016 2:19 PM, Nickolay V. Shmyrev wrote:
> The important part is the other probabilities; it is not that easy to get a positive backoff weight from just a few counts. Try this file:
>
> ~~~~~
> a
> b
> b c
> b d
> e f
> e f
> ~~~~~
>
> ngram-count -text test.txt -lm test.lm -order 2
>
> gives
>
> \1-grams:
> ....
> -0.90309 f 0.146128
> ...
>
>
> 21.10.2016, 19:42, "Dávid Nemeskey" <nemeskeyd at gmail.com>:
>> Does anyone have any idea about this? Thanks!
>>
>> On Wed, Sep 28, 2016 at 2:10 PM, Dávid Nemeskey <nemeskeyd at gmail.com> wrote:
>>> Hi guys (esp. Andreas),
>>>
>>> I have run into a strange problem. We are debugging our model by
>>> looking at words that have a higher probability in the model than
>>> the word in the text. Most of the time the probability I compute is
>>> correct, but there is an exception: when the BOW is positive (which
>>> surprised me at first, but then I found
>>> http://www.speech.sri.com/pipermail/srilm-user/2004q2/000192.html), I
>>> get a different set of words than the built-in method (used in
>>> Ngram::updateRanks).
>>>
>>> The reason, I believe, is that wordProbBO sets the BOW to log_10(1)
>>> whenever it finds a matching n-gram, which obviously makes sense; but
>>> if the BOW of the previous (n-1)-gram is positive, it can happen that
>>> P_{n-1}(w) > P_{n}(w).
>>>
>>> I have three questions about such situations:
>>> 1. Which probability should be used in this case? Is it really that of
>>> the more specific n-gram, and why?
Yes, the most specific ngram should be used in all cases (that's how
back-off models are defined).
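For question 1, here is a minimal sketch of the standard back-off lookup that wordProbBO performs. This is not SRILM's actual code, and the table names are made up; the point is that once an explicit ngram entry is found, its probability is returned as-is and no BOW is applied, even if a positive BOW would have made the backed-off estimate larger.

~~~~~
# Hypothetical sketch of Katz-style back-off lookup in the log10 domain.
# logprobs: dict mapping ngram tuples (context..., word) -> log10 probability
# bows:     dict mapping context tuples -> log10 back-off weight

def word_logprob_bo(word, context, logprobs, bows):
    ngram = tuple(context) + (word,)
    if ngram in logprobs:
        # Most specific ngram found: use its probability directly.
        # No BOW enters here, even if a lower-order estimate is larger.
        return logprobs[ngram]
    if not context:
        # No unigram entry either: the word is outside the vocabulary.
        return float('-inf')
    # Back off: add the context's BOW (0.0 if none is stored) and
    # retry with the context shortened by its oldest word.
    bow = bows.get(tuple(context), 0.0)
    return bow + word_logprob_bo(word, context[1:], logprobs, bows)
~~~~~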
>>> 2. What does it mean to have a positive BOW?
The explanation is in the old post you already found: it means the more
specific ngrams listed in the model assign a lower overall probability
mass than the corresponding backed-off probabilities. That's not common,
but it can happen. See the example that Nickolay constructed.
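To make question 2 concrete, here is the usual back-off weight normalization and a tiny sketch of when its log becomes positive. The probabilities below are made-up illustrations, not values from any real model.

~~~~~
import math

def log10_bow(explicit_probs, backoff_probs):
    """Back-off weight for one context h (standard normalization).

    explicit_probs: discounted p(w|h) for every w explicitly listed after h
    backoff_probs:  the lower-order p(w) for those same words w
    """
    leftover_high = 1.0 - sum(explicit_probs)  # high-order mass left for unseen words
    leftover_low = 1.0 - sum(backoff_probs)    # lower-order mass of those unseen words
    return math.log10(leftover_high / leftover_low)

# If the explicit ngrams carry less total mass than their backed-off
# estimates, the ratio exceeds 1 and the log BOW is positive:
print(log10_bow([0.2, 0.2], [0.3, 0.4]))  # ~ 0.301 > 0
print(log10_bow([0.3, 0.4], [0.2, 0.2]))  # ~ -0.301 < 0
~~~~~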
>>> 3. I tried to artificially create such a situation where the more
>>> specific n-gram has a lower P than the n-1-gram, but failed; the model
>>> somehow corrected the scores. I tried it like this:
>>> A B C 1 (count)
>>> A B D 1
>>> B C 1000
>>> B D 1
>>> So how do positive BOWs come about?
Your example doesn't work because it is not a single ngram that triggers
the positive BOW, but the sum of all ngram probabilities for a given
context. Here
p(C|B) + p(D|B) > p(C) + p(D),
so the log BOW for context B comes out negative.
BTW, you cannot use the default smoothing (GT) here, because the contrived
counts don't provide suitable count-of-count statistics (you will see a
bunch of warnings about that). Try it with -wbdiscount.
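For example, assuming the counts from your example are saved in SRILM's count format as counts.txt (a hypothetical file name), something like:

ngram-count -read counts.txt -lm test.lm -order 3 -wbdiscount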
Andreas