KN discounting and zeroton words
tanel.alumae at aqris.com
Mon Jun 13 06:50:00 PDT 2005
> The unigram probabilities for zeroton words are obtained by distributing
> the backoff mass left by the non-zeroton words evenly over all the zerotons
> (this corresponds to backing off to a uniform distribution).
> Now, if the number of zerotons is small they might actually get more
> probability than the low-count observed unigrams that way.
> The -interpolate1 option should prevent this since it distributes the
> backoff mass over ALL unigrams (adding to the probability of those words
> that were observed).
> Please check if this is the case, and if not, send me a test case so
> I can look into why it doesn't work as intended.
Yes, the -interpolate1 option prevents this from happening.
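For the archive, here is a toy sketch of the arithmetic behind the effect described above. It is not SRILM's code: it assumes plain absolute discounting with a single discount of 0.75 on unigram counts, and the function name, counts, and vocabulary are all made up for illustration.

```python
def unigram_probs(counts, vocab, discount=0.75, interpolate=False):
    """Toy absolute-discounting unigram estimate (illustrative only).

    counts      -- dict mapping word -> observed count
    vocab       -- list of all words, including zerotons (count 0)
    discount    -- assumed constant discount, must be < 1
    interpolate -- False: leftover mass goes to zerotons only (backoff)
                   True:  leftover mass is spread over ALL vocab words
    """
    total = sum(counts.get(w, 0) for w in vocab)
    observed = [w for w in vocab if counts.get(w, 0) > 0]
    zerotons = [w for w in vocab if counts.get(w, 0) == 0]

    # Mass removed from the observed words by discounting.
    leftover = discount * len(observed) / total

    # Discounted relative frequencies for the observed words.
    probs = {w: (counts[w] - discount) / total for w in observed}

    if interpolate:
        # Uniform share added to every word, observed or not.
        share = leftover / len(vocab)
        return {w: probs.get(w, 0.0) + share for w in vocab}

    # Backoff: only the zerotons split the leftover mass.
    # (With no zerotons this toy version simply drops the mass.)
    for w in zerotons:
        probs[w] = leftover / len(zerotons)
    return probs


counts = {"the": 8, "cat": 1, "sat": 1}
vocab = ["the", "cat", "sat", "dog"]  # "dog" is the only zeroton

backoff = unigram_probs(counts, vocab, interpolate=False)
interp = unigram_probs(counts, vocab, interpolate=True)

print("backoff:     ", backoff)
print("interpolated:", interp)
```

With these toy numbers the single zeroton "dog" gets about 0.225 under backoff, more than the 0.025 left to the singleton "cat". With the mass spread over all four words, "dog" gets about 0.056 and "cat" about 0.081, so the observed word stays ahead.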
Thanks for the help.