[SRILM User List] About wbdiscount and meta-tag options

Andreas Stolcke stolcke at icsi.berkeley.edu
Thu Jun 13 10:43:20 PDT 2013


On 6/13/2013 8:23 AM, Meng CHEN wrote:
> Hi, the make-big-lm command specifies -read-with-mincounts and 
> -meta-tag by default. The help page says "if -meta-tag is 
> defined, these low-count N-grams will be converted to count-of-count 
> N-grams, so that smoothing methods that need this information still 
> work correctly". However, for wbdiscount, we don't need the 
> count-of-count information to compute the discounting parameters. So 
> why does make-big-lm specify the -meta-tag option for wbdiscount by 
> default? Is that necessary? Can I remove it? (I tried that, and found 
> that the N-grams in the model are the same, but the probabilities differ.)
> Thanks!

WB discounting requires the count of the distinct word types for each 
context.  That information can also be obtained from the meta-counts, 
which is why you're getting different results without -meta-tag.
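
To make the dependence on distinct word types concrete, here is a minimal 
sketch of the textbook Witten-Bell estimate for a single context.  It is a 
toy illustration, not SRILM code: the function name witten_bell, the 
example counts, and the min-count threshold of 2 are all assumptions, and 
the pruning step only mimics dropping low-count N-grams without keeping 
any meta-count information.

```python
from fractions import Fraction


def witten_bell(counts):
    """Textbook Witten-Bell estimate for one context (hypothetical helper).

    counts maps each word seen after the context to its count.
    Returns (discounted probabilities, probability mass left for backoff).
    """
    n = sum(counts.values())  # total tokens observed after the context
    t = len(counts)           # distinct word types observed after the context
    probs = {w: Fraction(c, n + t) for w, c in counts.items()}
    backoff_mass = Fraction(t, n + t)
    return probs, backoff_mass


# Made-up counts for one context.
full = {"a": 3, "b": 1, "c": 1}

# Assumption: low-count entries are simply dropped (threshold 2),
# with no meta-counts retained to recover the lost type count.
pruned = {w: c for w, c in full.items() if c >= 2}

full_probs, full_backoff = witten_bell(full)
pruned_probs, pruned_backoff = witten_bell(pruned)

print("full:  ", full_probs["a"], full_backoff)
print("pruned:", pruned_probs["a"], pruned_backoff)
```

The surviving word "a" gets a different probability in the two runs because 
both the token total and the number of distinct types change once the 
low-count entries disappear.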

BTW, I should update the man page to say that WB discounting is also 
supported in make-big-lm.

Andreas

> Meng CHEN
