Ney's absolute discounting and zeroton words

Mon Jun 13 08:20:09 PDT 2005

Hello,

I am continuing my quest with zeroton words. I want to control the amount
of probability that is distributed among words that are in the vocabulary
but do not occur in the training corpus. Ney's absolute discounting seems
well suited to that.

So I started experimenting with the constant for Ney's discounting.
Here are the unigram log10 probabilities of an unseen word for different
discounting constants:
constant  log10 prob
0.1       -1.410174
0.01      -2.410174
0.001     -3.410148
0.0001    -4.410249
0.00001   -5.409665
0.000001  -1.278751
0.0000001 -1.278753

As you can see, there is an abrupt increase in probability when the
constant gets to 0.000001, which is unexpected. Is this the intended
behaviour, or is it caused by a numerical problem? I'm using SRILM on a
32-bit x86 processor.
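
To make clear what I expected, here is a minimal sketch of textbook
absolute discounting for unigrams. It is only my assumption of how the
mass should behave, not SRILM's actual code: a constant is subtracted
from the count of every seen word, and the freed mass is split evenly
among the zeroton words. The function name and the toy data are made up
for illustration. Under this assumption, each tenfold decrease of the
constant should lower the log10 probability of an unseen word by exactly
1, all the way down.

```python
import math
from collections import Counter


def unseen_unigram_log10_prob(counts, vocab, discount):
    """log10 probability of one zeroton word under textbook absolute
    discounting (an assumption, not SRILM's implementation)."""
    total = sum(counts.values())                        # training tokens
    seen_types = sum(1 for c in counts.values() if c > 0)
    zerotons = [w for w in vocab if counts.get(w, 0) == 0]
    if not zerotons:
        raise ValueError("no zeroton words in the vocabulary")
    # Each seen word type gives up `discount` counts.
    freed_mass = discount * seen_types / total
    # The freed mass is shared uniformly by the zeroton words.
    return math.log10(freed_mass / len(zerotons))


# Toy corpus and vocabulary, hypothetical and for illustration only.
counts = Counter("a a a b b c".split())
vocab = ["a", "b", "c", "d", "e"]

for d in (0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001, 0.0000001):
    print(f"{d:<10g} {unseen_unigram_log10_prob(counts, vocab, d):.6f}")
```

The first five rows of my SRILM output follow this pattern (a drop of
about 1 per step); the last two do not.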

The numbers here are given for a small test set but I've seen similar
behaviour for large sets.

Regards,

Tanel