[SRILM User List] About wbdiscount and meta-tag options
Andreas Stolcke
stolcke at icsi.berkeley.edu
Thu Jun 13 10:43:20 PDT 2013
On 6/13/2013 8:23 AM, Meng CHEN wrote:
> Hi, the make-big-lm command specifies -read-with-mincounts and
> -meta-tag by default. The help page says "if -meta-tag is
> defined, these low-count N-grams will be converted to count-of-count
> N-grams, so that smoothing methods that need this information still
> work correctly". However, for wbdiscount, we don't need the
> count-of-count information to compute the discounting parameters. So
> why does make-big-lm specify the -meta-tag option for wbdiscount by
> default? Is that necessary? Can I remove it? (I tried that and found
> that the N-grams in the model are the same, but the probabilities differ.)
> Thanks!
WB discounting requires the number of distinct word types seen in each
context. That information can also be obtained from the meta-counts,
which is why you get different results without -meta-tag.
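
To make the dependence concrete, here is a minimal sketch of the standard Witten-Bell estimate, p(w|h) = c(h,w) / (c(h) + t(h)), where t(h) is the number of distinct word types seen after context h. This is not SRILM's code: the function wb_prob, the meta dictionary (a stand-in for count-of-count information, mapping a count to the number of word types that had that count), and the toy counts are all hypothetical and chosen only for illustration.

```python
def wb_prob(following, word, meta=None):
    """Witten-Bell estimate c(h,w) / (c(h) + t(h)) for one context h.

    following: dict word -> count for the n-grams kept for this context.
    meta: optional dict count -> number of dropped word types with that
          count (a hypothetical stand-in for count-of-count information).
    """
    meta = meta or {}
    # Tokens and types of the n-grams that are still listed explicitly.
    total = sum(following.values())
    types = len(following)
    # Add back what the dropped low-count n-grams contributed.
    total += sum(count * n_types for count, n_types in meta.items())
    types += sum(meta.values())
    return following.get(word, 0) / (total + types)


# Toy counts for a single context (assumed data, not from any corpus).
full = {"cat": 3, "dog": 1, "fish": 1}

# After applying a minimum count of 2, the two singletons are gone.
pruned = {"cat": 3}

# Count-of-count summary of what was dropped: two types with count 1.
meta = {1: 2}

print(wb_prob(full, "cat"))          # 0.375
print(wb_prob(pruned, "cat"))        # 0.75
print(wb_prob(pruned, "cat", meta))  # 0.375
```

In this toy case the pruned counts alone give a different probability for the same n-gram, while adding the count-of-count summary restores the value computed from the full counts. That matches the observation that the model lists the same N-grams but with different probabilities.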
BTW, I should update the man page to say that WB discounting is also
supported in make-big-lm.
Andreas
>
>
> Meng CHEN