# [SRILM User List] NgramCountLM Bug?

Andreas Stolcke stolcke at icsi.berkeley.edu
Fri Feb 24 19:42:40 PST 2012

On 2/24/2012 2:35 PM, Ariya Rastrow wrote:
>
> Hi,
> method). It seems to me there is a bug with the way the \lambda
> parameters are being estimated in the code. The problem is that the
> expectations for \lambda's (using EM) are being collected by iterating
> through N-grams of the held-out text. However, the count of the N-gram
> is not being taken into account for each N-gram (even though for
> calculating the log-probability of the held-out the wordProb is being
> multiplied by the count of the N-gram) during the call
> to LM::countsProb(...) by NgramCountLM::estimate(). In other words,
> the statistics for \lambda's are being collected as if each event is a
> singleton in the held-out data. The fix to this would be to pass
> *count from LM::countsProb(...) to NgramCountLM::wordProbTrain(...)
> such that the posteriors of \lambda get multiplied by that count.
>
Good catch! That is indeed a bug. Attached is a patch that should do
the right thing.

Andreas
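
For readers following the thread, here is a rough, self-contained sketch of
the issue described above. It is not SRILM code and not the attached patch:
the function name `em_step`, the `use_counts` flag, and the toy numbers are
all hypothetical, and the model is reduced to a plain two-component mixture.
It only illustrates why the posteriors for the \lambda's need to be scaled by
the held-out N-gram count.

```python
def em_step(lambdas, events, use_counts=True):
    """One EM update of mixture weights (toy illustration, not SRILM's API).

    events: list of (component_probs, count) pairs, one entry per distinct
    held-out N-gram, where component_probs[k] is the probability that
    component k assigns to the predicted word.
    """
    expected = [0.0] * len(lambdas)
    for probs, count in events:
        # E-step: posterior of each component for this event.
        joint = [lam * p for lam, p in zip(lambdas, probs)]
        total = sum(joint)
        # Using 1 here treats every N-gram as a singleton, which is the
        # behaviour reported in this thread.
        weight = count if use_counts else 1
        for k, j in enumerate(joint):
            expected[k] += weight * j / total
    # M-step: renormalise the expected counts into new weights.
    norm = sum(expected)
    return [e / norm for e in expected]


# Hypothetical held-out statistics: two distinct N-grams with counts 9 and 1.
events = [
    ([0.9, 0.1], 9),  # frequent N-gram, better explained by component 0
    ([0.1, 0.9], 1),  # rare N-gram, better explained by component 1
]
start = [0.5, 0.5]
weighted = em_step(start, events, use_counts=True)
unweighted = em_step(start, events, use_counts=False)
print("with counts:   ", weighted)
print("without counts:", unweighted)
```

In this toy case the count-weighted update moves the weights toward the
component that explains the frequent N-gram, while the unweighted update
leaves them at their starting values, because the two events cancel out when
each is counted once.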

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ngramcountlm.patch
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20120224/37f59863/attachment.ksh>