KN discounting and zeroton words
Tanel Alumäe
tanel.alumae at aqris.com
Mon Jun 6 09:03:31 PDT 2005
Hello,
I've noticed that when using -kndiscount, the zeroton words (words that
are in the vocabulary but not in the training corpus) get a higher
unigram LM probability than words that actually occur (rarely) in the
training corpus. Shouldn't the zeroton words get the same unigram
probability as the words that are discounted to 0 using the -gt1min
option?
With Good-Turing (GT), Witten-Bell (WB) and natural discounting, everything
works as expected: zeroton words get the same unigram probability as the
words discounted to 0.
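
One way to check this on a given model is to compare the unigram entries
directly. The following is a minimal Python sketch, assuming the model was
written in the standard ARPA text format. The function names and the sample
values are made up for illustration; they are not actual SRILM output.

```python
def read_unigrams(arpa_lines):
    """Return {word: log10 probability} from the 1-grams section of an ARPA LM."""
    unigrams = {}
    in_unigrams = False
    for line in arpa_lines:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            if line.startswith("\\"):  # next section header or the end marker
                break
            if not line:
                continue
            fields = line.split()      # logprob, word, optional backoff weight
            unigrams[fields[1]] = float(fields[0])
    return unigrams


def higher_than(unigrams, zerotons, seen_word):
    """Zeroton words whose unigram log-probability exceeds that of seen_word."""
    return [w for w in zerotons if unigrams[w] > unigrams[seen_word]]


# Hypothetical ARPA fragment: "unseen" stands for a zeroton word,
# "rare" for a word that occurs only a few times in the training corpus.
SAMPLE_ARPA = [
    "\\data\\",
    "ngram 1=3",
    "",
    "\\1-grams:",
    "-0.5\tfrequent\t-0.3",
    "-2.1\trare\t-0.1",
    "-1.7\tunseen",
    "",
    "\\end\\",
]

unigrams = read_unigrams(SAMPLE_ARPA)
print(higher_than(unigrams, ["unseen"], "rare"))
```

For a real model, the lines of the file written by the -lm option of
ngram-count can be passed in place of SAMPLE_ARPA, together with the list of
vocabulary words that do not occur in the training text.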
Regards,
Tanel A.