[SRILM User List] make-big-lm with kn-/wb-discount

Andreas Stolcke stolcke at icsi.berkeley.edu
Thu Oct 31 16:02:03 PDT 2013


On 10/31/2013 6:21 PM, Sergey Zablotskiy wrote:
> Hi Everybody,
>
> is there any workaround to combine modified Kneser-Ney smoothing for 
> lower-order n-grams along with Witten-Bell smoothing for higher-order 
> n-grams using the MAKE-BIG-LM training script?
>
> I am getting the following error/message:
> make-big-lm: must use one of GT, KN, or WB discounting for all orders
>
> while executing:
> >> make-big-lm -read ${count_file} -vocab ${vocab} -unk -order 4 \
>         -kndiscount1 -kndiscount2 -kndiscount3 -wbdiscount4 \
>         -interpolate -lm name.lm
>
> I cannot use the kndiscount for 4-grams because some counts-of-counts 
> are zero in my case.
>

1) It does not make sense to combine KN discounting for lower-order 
ngrams with some other method for the highest order. In KN smoothing, 
the lower-order estimates are designed precisely to complement the 
discounting applied to the highest-order ngrams.
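
To illustrate point 1, here is a minimal Python sketch (not SRILM code; the 
function name and the toy bigram counts are made up for illustration). In KN 
smoothing the lower-order distribution is typically built from continuation 
counts, i.e. the number of distinct contexts a word follows, rather than from 
raw frequencies. Those counts are only meaningful as a backoff for a 
KN-discounted higher order:

```python
from collections import Counter

def unigram_continuation_counts(bigram_counts):
    """For each word w, count the distinct left contexts v with c(v, w) > 0."""
    cont = Counter()
    for (v, w), c in bigram_counts.items():
        if c > 0:
            cont[w] += 1
    return cont

# Hypothetical toy counts: "francisco" is frequent but follows only "san".
bigrams = {
    ("san", "francisco"): 50,
    ("the", "city"): 3,
    ("a", "city"): 2,
    ("this", "city"): 1,
}

cont = unigram_continuation_counts(bigrams)
raw = Counter()
for (_, w), c in bigrams.items():
    raw[w] += c

# Raw frequency favors "francisco"; continuation counts favor "city".
print(raw["francisco"], raw["city"])
print(cont["francisco"], cont["city"])
```

A lower-order model built this way deliberately underweights words like 
"francisco", on the assumption that the higher-order KN estimate already 
covers them in their usual context. Pairing it with a differently discounted 
highest order breaks that assumption.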

2) make-big-lm invokes a helper script called make-kn-discounts to 
compute the discounting factors from the counts-of-counts.  It tries 
to fill in missing (zero) counts-of-counts based on an empirical 
regularity in the counts-of-counts (the details are in Section 4 of this 
paper: <http://www.speech.sri.com/cgi-bin/run-distill?papers/asru2007-mt-lm.ps.gz>).
If that mechanism doesn't work for some reason, we should try to fix it.
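
For reference, a rough Python sketch of why zero counts-of-counts are a 
problem. It uses the standard modified Kneser-Ney discount formulas (Chen and 
Goodman), where n1..n4 are the numbers of ngrams of a given order occurring 
exactly 1..4 times. It is not the make-kn-discounts script, it does not 
reproduce the fill-in mechanism described in the paper, and the function name 
and the choice to raise an error are assumptions of the sketch:

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Return (D1, D2, D3+) computed from the counts-of-counts n1..n4."""
    # Sketch's own choice: treat any zero count-of-count as unusable, since
    # n1, n2 and n3 appear as denominators and n4 = 0 gives a degenerate D3+.
    if min(n1, n2, n3, n4) <= 0:
        raise ValueError("zero counts-of-counts: discounts cannot be estimated")
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3plus = 3 - 4 * y * n4 / n3
    return d1, d2, d3plus

# Hypothetical counts-of-counts for one ngram order.
d1, d2, d3plus = modified_kn_discounts(100, 50, 30, 20)
print(d1, d2, d3plus)
```

With sparse higher-order counts, for example a 4-gram order where n3 is zero, 
these formulas cannot be evaluated directly, which is the situation the 
fill-in step is meant to handle.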

Andreas
