[SRILM User List] Why is modified Kneser-Ney much slower than Good-Turing with make-big-lm?
Andreas Stolcke
stolcke at icsi.berkeley.edu
Thu Aug 2 09:40:30 PDT 2012
On 8/2/2012 2:30 AM, Meng Chen wrote:
> Hi, I am training an LM using *make-batch-counts*, *merge-batch-counts*,
> and *make-big-lm*. I compared the modified Kneser-Ney and Good-Turing
> smoothing algorithms in *make-big-lm* and found that training is much
> slower with modified Kneser-Ney. I checked the debug output and found
> that it runs *make-kn-counts* and *merge-batch-counts*, and these two
> steps consume most of the time. I wonder whether they could be run as
> part of *make-batch-counts*, which would save a lot of time.
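For reference, the pipeline being discussed looks roughly like this (a
minimal sketch following the training-scripts man page; the file list,
batch size of 10, "cat" filter, counts directory, and order are
placeholder values, and the merged-counts filename pattern depends on
what merge-batch-counts leaves behind):

    # count ngrams in batches of 10 input files, writing to counts/
    make-batch-counts files.txt 10 cat counts -order 3
    # merge the per-batch count files into one sorted counts file
    merge-batch-counts counts
    # estimate the LM from the merged counts with modified Kneser-Ney
    make-big-lm -name biglm -order 3 -kndiscount -interpolate \
        -read counts/*.ngrams.gz -lm big.lm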
KN is slower because it first has to compute the regular ngram counts
and then, in a second pass, run make-kn-counts, which takes the merged
ngram counts as input. Because the KN discounts are estimated from
counts of ngram types (how many distinct words precede each ngram),
not from token frequencies, the counts have to be merged before that
second pass, so the two steps cannot be folded into make-batch-counts.
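To see why merging must come first, consider how the lower-order KN
type counts are formed. Here is a minimal sketch (not the actual
make-kn-counts script) that derives KN bigram counts from merged
trigram counts, assuming the usual one-ngram-per-line "w1 w2 w3 count"
format:

    gzip -dc counts/*.ngrams.gz |
    awk 'NF == 4 { ctx[$2 " " $3]++ }   # one line per distinct trigram type,
                                        # so "w2 w3" gains 1 per distinct w1
         END { for (ng in ctx) print ng, ctx[ng] }'

If the same trigram were still split across unmerged batch files, it
would contribute more than once and inflate the type count, which is
why make-kn-counts can only run after merge-batch-counts.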
Andreas