error in discount estimator for order 3

Andreas Stolcke stolcke at
Thu Aug 3 23:37:19 PDT 2006

Rebecca Madsen wrote:
> Is there a reason why duplicating my data would give me the following 
> error:
> using ModKneserNey for 3-grams
> Kneser-Ney smoothing 3-grams
> n1 = 0
> n2 = 94762
> n3 = 0
> n4 = 37773
> one of required modified KneserNey count-of-counts is zero
> error in discount estimator for order 3
If you look at the formulae for KN discounting, you will see that they lead to
undefined values (a division by zero) when n1 = 0. The same is true of GT discounting.
These discounting methods assume that the ngram distribution is "natural",
not manipulated as in your case.
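The failure can be made concrete with a minimal sketch of the discount formulas. It assumes the Chen and Goodman formulation of modified Kneser-Ney and Katz's form of Good-Turing discounting; SRILM's own code may differ in details such as cutoffs and fallbacks. The function names are hypothetical, and `n` maps r to n_r, the number of distinct n-grams seen exactly r times.

```python
def modified_kn_discounts(n):
    """Modified Kneser-Ney discounts (D1, D2, D3+), Chen & Goodman form:

    Y = n1 / (n1 + 2 n2)
    D1 = 1 - 2 Y n2/n1,  D2 = 2 - 3 Y n3/n2,  D3+ = 3 - 4 Y n4/n3
    """
    n1, n2, n3, n4 = (n.get(r, 0) for r in (1, 2, 3, 4))
    if 0 in (n1, n2, n3):
        # n1, n2 and n3 each appear in a denominator, so the discounts are undefined.
        raise ZeroDivisionError(f"undefined KN discounts: n1={n1}, n2={n2}, n3={n3}")
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3plus = 3 - 4 * y * n4 / n3
    return d1, d2, d3plus


def katz_gt_discounts(n, k):
    """Katz-style Good-Turing discount ratios d_r for 1 <= r <= k:

    r* = (r+1) n_{r+1} / n_r,  A = (k+1) n_{k+1} / n1
    d_r = (r*/r - A) / (1 - A)
    """
    n1 = n.get(1, 0)
    if n1 == 0:
        # n1 is in the denominator of A.
        raise ZeroDivisionError("undefined GT discounts: n1 = 0")
    a = (k + 1) * n.get(k + 1, 0) / n1
    if a == 1:
        raise ZeroDivisionError("undefined GT discounts: 1 - A = 0")
    discounts = {}
    for r in range(1, k + 1):
        nr = n.get(r, 0)
        if nr == 0:
            raise ZeroDivisionError(f"undefined GT discount: n{r} = 0")
        r_star = (r + 1) * n.get(r + 1, 0) / nr
        discounts[r] = (r_star / r - a) / (1 - a)
    return discounts
```

With the count-of-counts from the log, `modified_kn_discounts({1: 0, 2: 94762, 3: 0, 4: 37773})` raises `ZeroDivisionError`, because both n1 and n3 sit in denominators; `katz_gt_discounts` fails the same way whenever n1 = 0.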
> I can build a language model using the following command line with the
> normal data, but concatenating two copies of the data together gives
> me the discount estimator error.
That's completely expected (see above).  What are you trying to
accomplish by duplicating your data?  Obviously you are not adding
any information by doing so.
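The effect of duplication shows up on a toy example. This sketch assumes counts are collected per sentence, so no n-gram spans the join between the two copies; under that assumption every count exactly doubles. All odd count-of-counts (n1, n3, ...) therefore become zero, while the maximum-likelihood relative frequencies stay the same. The helper names and toy counts are hypothetical, not part of SRILM.

```python
from collections import Counter
from fractions import Fraction


def count_of_counts(ngram_counts, max_r=4):
    """Return {r: n_r} for r = 1..max_r, where n_r = number of n-grams seen r times."""
    hist = Counter(ngram_counts.values())
    return {r: hist.get(r, 0) for r in range(1, max_r + 1)}


def duplicate(ngram_counts):
    """Counts after concatenating the corpus with itself.

    Assumes n-grams never cross sentence boundaries, so every count doubles.
    """
    return {g: 2 * c for g, c in ngram_counts.items()}


def ml_prob(ngram_counts, ngram):
    """Maximum-likelihood estimate c(h w) / sum over w' of c(h w'), h = all but last word."""
    history = ngram[:-1]
    total = sum(c for g, c in ngram_counts.items() if g[:-1] == history)
    return Fraction(ngram_counts.get(ngram, 0), total)


counts = {
    ("a", "b", "c"): 1,
    ("a", "b", "d"): 3,
    ("b", "c", "d"): 1,
    ("b", "d", "e"): 2,
}
doubled = duplicate(counts)

print(count_of_counts(counts))   # {1: 2, 2: 1, 3: 1, 4: 0}
print(count_of_counts(doubled))  # {1: 0, 2: 2, 3: 0, 4: 1}
print(ml_prob(counts, ("a", "b", "c")), ml_prob(doubled, ("a", "b", "c")))  # 1/4 1/4
```

This is consistent with the log in the question: n1 and n3 are zero, and the nonzero n2 and n4 there presumably equal the original corpus's n1 and n2.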


More information about the SRILM-User mailing list