error in discount estimator for order 3

Andreas Stolcke stolcke at speech.sri.com
Thu Aug 3 23:37:19 PDT 2006


Rebecca Madsen wrote:
> Is there a reason why duplicating my data would give me the following 
> error:
>
> using ModKneserNey for 3-grams
> Kneser-Ney smoothing 3-grams
> n1 = 0
> n2 = 94762
> n3 = 0
> n4 = 37773
> one of required modified KneserNey count-of-counts is zero
> error in discount estimator for order 3
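If you look at the formulae for KN discounting, you will see that they lead to
undefined values when n1 = 0. The same is true of GT discounting.
These discounting methods assume that the ngram count distribution is "natural",
not manipulated as in your case. Duplicating the data doubles every ngram
count, so no ngram occurs an odd number of times, which is why both n1 and n3
come out as zero.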
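
As a rough illustration of why the estimator gives up, the sketch below
computes the modified Kneser-Ney discounts from the count-of-counts, using the
formulas from Chen and Goodman. It is not SRILM's code: the function names
`count_of_counts` and `modified_kn_discounts` and the toy trigram counts are
made up for this example.

```python
from collections import Counter


def count_of_counts(ngram_counts):
    """Map each count value r to n_r, the number of distinct ngrams seen r times."""
    return Counter(ngram_counts.values())


def modified_kn_discounts(n1, n2, n3, n4):
    """Return (D1, D2, D3+) per the Chen & Goodman modified KN formulas."""
    # n1, n2 and n3 each appear as a denominator below, so none may be zero.
    if n1 == 0 or n2 == 0 or n3 == 0:
        raise ValueError("a required count-of-counts is zero")
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3_plus = 3 - 4 * y * n4 / n3
    return d1, d2, d3_plus


# Toy trigram counts (made up), then the same data concatenated with itself.
original = {"a b c": 1, "b c d": 1, "c d e": 2, "d e f": 3, "e f g": 4}
doubled = {ngram: 2 * c for ngram, c in original.items()}

print(count_of_counts(original)[1])  # number of singletons in the original data
print(count_of_counts(doubled)[1])   # number of singletons after duplication
```

Calling `modified_kn_discounts(0, 94762, 0, 37773)` with the values from the
log above raises the `ValueError`, since D1 would divide by n1 and D3+ by n3.
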
>
> I can build a language model using the following command line with the
> normal data, but concatenating two copies of the data together gives
> me the discount estimator error.
That's completely expected (see above).  What are you trying to
accomplish by duplicating your data?  Obviously you are not adding
any information by doing so.

--Andreas




