KN discounting and zeroton words

Tanel Alumäe tanel.alumae at aqris.com
Mon Jun 13 06:50:00 PDT 2005


> The unigram probabilities for zeroton words are obtained by distributing 
> the backoff mass left by the non-zeroton words evenly over all the zerotons
> (this corresponds to backing off to a uniform distribution).
> Now, if the number of zerotons is small they might actually get more 
> probability than the low-count observed unigrams that way.
> 
> The -interpolate1 option should prevent this since it distributes the 
> backoff mass over ALL unigrams (adding to the probability of those words
> that were observed).
> Please check if this is the case, and if not, send me a test case so
> I can look into why it doesn't work as intended.


Yes, the -interpolate1 option prevents this from happening.
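
For reference, here is a minimal sketch of the effect on a toy example. It is not SRILM's code: it assumes plain absolute discounting on raw unigram counts with an arbitrary discount of 3/4, and the function name, vocabulary, and counts are all made up for illustration.

```python
from fractions import Fraction

def unigram_probs(counts, vocab, discount, interpolate):
    """Toy unigram estimate with absolute discounting (illustrative only)."""
    total = sum(counts.values())
    observed = {w for w in vocab if counts.get(w, 0) > 0}
    zerotons = [w for w in vocab if w not in observed]
    # Probability mass freed by discounting every observed word once.
    left_over = discount * len(observed) / total
    probs = {}
    for w in vocab:
        discounted = (counts[w] - discount) / total if w in observed else Fraction(0)
        if interpolate:
            # Interpolated: spread the freed mass over ALL words.
            probs[w] = discounted + left_over / len(vocab)
        elif w in observed:
            # Backoff: observed words keep only their discounted estimate.
            probs[w] = discounted
        else:
            # Backoff: the freed mass is split evenly among the zerotons.
            probs[w] = left_over / len(zerotons)
    return probs

# Hypothetical data: four observed words and a single zeroton "z".
counts = {"a": 5, "b": 3, "c": 1, "d": 1}
vocab = ["a", "b", "c", "d", "z"]
D = Fraction(3, 4)  # assumed discount, chosen arbitrarily

backoff = unigram_probs(counts, vocab, D, interpolate=False)
interp = unigram_probs(counts, vocab, D, interpolate=True)

print("backoff:      P(z) =", backoff["z"], " P(c) =", backoff["c"])
print("interpolated: P(z) =", interp["z"], " P(c) =", interp["c"])
```

With a single zeroton, the backoff variant gives it the whole freed mass (3/10), far more than the singleton "c" (1/40). With interpolation, "c" gets 17/200 and the zeroton only 3/50, so the ordering is restored.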

Thanks for the help.

Tanel




