[SRILM User List] Positive BOW vs. wordProbBO

Nickolay V. Shmyrev nshmyrev at yandex.ru
Sat Oct 22 14:19:14 PDT 2016


What matters is the other probabilities in the model; it is not easy to get a positive backoff weight from just a few counts. Try this file:

~~~~~
a
b
b c
b d
e f
e f
~~~~~

ngram-count -text test.txt -lm test.lm -order 2

gives

\1-grams:
....
-0.90309	f	0.146128
...
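
For reference, here is a minimal Python sketch of the usual Katz-style backoff weight formula: the mass left over after the discounted higher-order estimates, divided by the lower-order mass of the words not seen after the context. The function name and the input numbers are made up for illustration; they are not read from test.lm, and this is not SRILM's own code.

```python
import math

def backoff_weight(seen_higher, seen_lower):
    """Katz-style backoff weight for one context h.

    seen_higher: discounted P(w | h) for each word w observed after h
    seen_lower:  P(w | h') for the same words under the shorter context h'
    """
    left_over = 1.0 - sum(seen_higher)  # mass reserved for unseen words
    reachable = 1.0 - sum(seen_lower)   # lower-order mass of those unseen words
    return left_over / reachable

# Hypothetical numbers: a single observed successor whose discounted
# bigram probability is lower than its unigram probability.
bow = backoff_weight([0.125], [0.375])
log_bow = math.log10(bow)
print(round(bow, 6), round(log_bow, 6))
```

With these inputs the weight is 0.875 / 0.625 = 1.4, and log10(1.4) is about 0.146128. The log weight is positive whenever the seen words get less mass from the higher-order model than they have in the lower-order one.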

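On the lookup itself (see the quoted message below): a sketch of the standard backoff recursion, under the assumption that an explicit n-gram entry is used as soon as one is found and that backoff weights are only added for contexts that had to be skipped. The dict layout and the toy model are hypothetical, and this is not a copy of Ngram::wordProbBO.

```python
import math

def log_prob_bo(word, context, probs, bows):
    """Backoff lookup over log10 tables (hypothetical dict layout).

    probs maps n-gram tuples to log10 probabilities,
    bows maps context tuples to log10 backoff weights.
    """
    penalty = 0.0
    while True:
        ngram = context + (word,)
        if ngram in probs:
            # An explicit entry wins; no BOW is added at this level.
            return penalty + probs[ngram]
        if not context:
            return float("-inf")  # word unknown to the model
        penalty += bows.get(context, 0.0)  # a missing BOW counts as log10(1) = 0
        context = context[1:]              # drop the oldest context word

# Made-up toy model in which the context ("x",) has a positive BOW.
probs = {("y",): math.log10(0.4), ("z",): math.log10(0.6),
         ("x", "y"): math.log10(0.2)}
bows = {("x",): math.log10(0.8 / 0.6)}

explicit = log_prob_bo("y", ("x",), probs, bows)    # uses the bigram entry
backed_off = log_prob_bo("z", ("x",), probs, bows)  # BOW + unigram
```

Here P(y | x) = 0.2 is lower than the unigram P(y) = 0.4, yet the distribution for context x still sums to one (0.2 + 0.8), which is why the more specific entry is the one to use.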

21.10.2016, 19:42, "Dávid Nemeskey" <nemeskeyd at gmail.com>:
> Does anyone have any idea about this? Thanks!
>
> On Wed, Sep 28, 2016 at 2:10 PM, Dávid Nemeskey <nemeskeyd at gmail.com> wrote:
>>  Hi guys (esp. Andreas),
>>
>>  I have run into a strange problem. We are debugging our model by
>>  looking at those words that have higher probability in the model than
>>  the word in the text. Most of the time the probability I compute is
>>  correct, but there is an exception: when the BOW is positive (which
>>  surprised me at first, but then I found
>>  http://www.speech.sri.com/pipermail/srilm-user/2004q2/000192.html), I
>>  return a different set of words than the built-in method (used in
>>  Ngram::updateRanks).
>>
>>  The reason, I believe, is that wordProbBO sets the BOW to log_10(1)
>>  whenever it finds a matching n-gram. Which obviously makes sense, but
>>  if the BOW of the previous n-1-gram is positive, it might happen that
>>  P_{n-1}(w) > P_{n}(w).
>>
>>  I have three questions about such situations:
>>  1. Which probability should be used in this case? Is it really that of
>>  the more specific n-gram, and why?
>>  2. What does it mean to have a positive BOW?
>>  3. I tried to artificially create such a situation where the more
>>  specific n-gram has a lower P than the n-1-gram, but failed; the model
>>  somehow corrected the scores. I tried it like this:
>>  A B C 1 (count)
>>  A B D 1
>>  B C 1000
>>  B D 1
>>  So how do positive BOWs come about?
>>
>>  Thanks,
>>  Dávid
