KN discounting and zeroton words

Tanel Alumäe tanel.alumae at aqris.com
Mon Jun 6 09:03:31 PDT 2005


Hello,

I've noticed that with -kndiscount, zeroton words (words that are in the
vocabulary but do not occur in the training corpus) get a higher unigram
LM probability than words that do occur, but only rarely, in the
training corpus. Shouldn't zeroton words instead get the same unigram
probability as words that are discounted to 0 by the -gt1min option?

With GT, WB and natural discounting, everything works as expected:
zeroton words get the same unigram probability as the words discounted
to 0.
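
One way to check this on a given model is to compare the unigram entries
of the ARPA-format LM file against word counts from the training text.
The following is only a minimal sketch in Python, not part of SRILM. The
function names are hypothetical, and it assumes whitespace-tokenized
training text and the usual ARPA layout (a "\1-grams:" section with
lines of the form "log10prob word [backoff]"). The min_count parameter
is also an assumption: set it to the -gt1min value so that words cut by
the minimum count are not treated as normally observed words.

```python
from collections import Counter


def read_unigram_logprobs(arpa_lines):
    """Return {word: log10 probability} from the \\1-grams: section."""
    logprobs = {}
    in_unigrams = False
    for line in arpa_lines:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if not in_unigrams or not line:
            continue
        # The next section marker (e.g. \2-grams: or \end\) ends the unigrams.
        if line.startswith("\\"):
            break
        fields = line.split()
        # ARPA unigram line: log10prob, word, optional backoff weight.
        logprobs[fields[1]] = float(fields[0])
    return logprobs


def count_words(corpus_lines):
    """Count whitespace-separated tokens in the training text."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(line.split())
    return counts


def zerotons_above_rarest(logprobs, counts, min_count=1,
                          skip=("<s>", "</s>", "<unk>")):
    """Return zeroton words whose unigram log probability is higher than
    that of the least probable word seen at least min_count times."""
    observed = [lp for w, lp in logprobs.items()
                if w not in skip and counts[w] >= min_count]
    if not observed:
        return []
    floor = min(observed)
    return sorted(w for w, lp in logprobs.items()
                  if w not in skip and counts[w] == 0 and lp > floor)
```

To use it, pass the lines of the LM file and of the training text (for
example from open(...) on each file). An empty result means that no
zeroton word is more probable than the rarest observed word; a
non-empty result lists the zerotons that show the behaviour described
above.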

Regards,
Tanel A.





More information about the SRILM-User mailing list