[SRILM User List] Positive BOW vs. wordProbBO
Nickolay V. Shmyrev
nshmyrev at yandex.ru
Sat Oct 22 14:19:14 PDT 2016
What matters is the other probabilities in the model; it is not easy to get a positive backoff weight from just a few counts. Try this file:
~~~~~
a
b
b c
b d
e f
e f
~~~~~
ngram-count -text test.txt -lm test.lm -order 2
gives
\1-grams:
....
-0.90309 f 0.146128
...
21.10.2016, 19:42, "Dávid Nemeskey" <nemeskeyd at gmail.com>:
> Anyone has any idea about this? Thanks!
>
> On Wed, Sep 28, 2016 at 2:10 PM, Dávid Nemeskey <nemeskeyd at gmail.com> wrote:
>> Hi guys (esp. Andreas),
>>
>> I have run into a strange problem. We are debugging our model by
>> looking at those words that have higher probability in the model than
>> the word in the text. Most of the time the probability I compute is
>> correct, but there is an exception: when the BOW is positive (which
>> surprised me at first, but then I found
>> http://www.speech.sri.com/pipermail/srilm-user/2004q2/000192.html), I
>> return a different set of words than the built-in method (used in
>> Ngram::updateRanks).
>>
>> The reason, I believe, is that wordProbBO sets the BOW to log_10(1)
>> whenever it finds a matching n-gram. Which obviously makes sense, but
>> if the BOW of the previous n-1-gram is positive, it might happen that
>> P_{n-1}(w) > P_{n}(w).
>>
>> I have three questions about such situations:
>> 1. Which probability should be used in this case? Is it really that of
>> the more specific n-gram, and why?
>> 2. What does it mean to have a positive BOW?
>> 3. I tried to artificially create such a situation where the more
>> specific n-gram has a lower P than the n-1-gram, but failed; the model
>> somehow corrected the scores. I tried it like this:
>> A B C 1 (count)
>> A B D 1
>> B C 1000
>> B D 1
>> So how do positive BOWs come about?
>>
>> Thanks,
>> Dávid