[SRILM User List] make-big-lm with kn-/wb-discount
Andreas Stolcke
stolcke at icsi.berkeley.edu
Thu Oct 31 16:02:03 PDT 2013
On 10/31/2013 6:21 PM, Sergey Zablotskiy wrote:
> Hi Everybody,
>
> is there any workaround to combine modified Kneser-Ney smoothing for
> lower-order n-grams along with Witten-Bell smoothing for higher-order
> n-grams using the MAKE-BIG-LM training script?
>
> I am getting the following error/message:
> make-big-lm: must use one of GT, KN, or WB discounting for all orders
>
> while executing:
> >> make-big-lm -read ${count_file} -vocab ${vocab} -unk -order 4 \
> -kndiscount1 -kndiscount2 -kndiscount3 -wbdiscount4 \
> -interpolate -lm name.lm
>
> I cannot use kndiscount for the 4-grams because some counts-of-counts
> are zero in my case.
>
1) It does not make sense to combine KN discounting for the lower-order
ngrams with some other method for the highest order, since the KN method
of discounting the lower-order ngrams is designed precisely to complement
the discounting for the highest-order ngrams.
2) make-big-lm invokes a helper script called make-kn-discounts to
compute the discounting factors based on the counts-of-counts. It tries
to fill in for missing (zero) counts-of-counts based on an empirical
regularity in the counts-of-counts (the details are in Section 4 of this
paper
<http://www.speech.sri.com/cgi-bin/run-distill?papers/asru2007-mt-lm.ps.gz>).
If that mechanism doesn't work for some reason, we should try to fix it.
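To illustrate point 1, here is a minimal Python sketch (not SRILM code) of the "continuation counts" used in the standard Kneser-Ney construction: a lower-order ngram is counted by the number of distinct words that precede it in the next-higher order, not by its raw frequency. Because these counts are derived from the higher-order ngrams, the lower-order distribution only makes sense as the backoff for KN-discounted higher orders. The function name and toy counts are hypothetical.

```python
from collections import defaultdict

def continuation_counts(ngram_counts):
    """Map each (n-1)-gram suffix to N1+(. suffix): the number of distinct
    words seen immediately before it among n-grams with nonzero count."""
    left_contexts = defaultdict(set)
    for ngram, count in ngram_counts.items():
        if count > 0:
            left_contexts[ngram[1:]].add(ngram[0])
    return {suffix: len(words) for suffix, words in left_contexts.items()}

# Hypothetical bigram counts: "francisco" is frequent but follows only one word.
bigrams = {
    ("san", "francisco"): 50,
    ("new", "york"): 30,
    ("the", "york"): 1,
    ("old", "york"): 2,
}
print(continuation_counts(bigrams))  # {('francisco',): 1, ('york',): 3}
```

Raw unigram counts would rank "francisco" (50) above "york" (33), while the continuation counts reverse that, which is what a backoff distribution for unseen bigrams should reflect.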
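For point 2, the sketch below shows why zero counts-of-counts are a problem, using the standard modified Kneser-Ney discount estimates of Chen and Goodman, where n_k is the number of distinct ngrams of one order seen exactly k times. It is an illustration under that assumption, not the code of make-kn-discounts, and it does not implement the fill-in regularity described in Section 4 of the paper above; the function name is hypothetical.

```python
def modified_kn_discounts(n):
    """n maps k -> n_k (number of distinct ngrams seen exactly k times), k = 1..4.
    Returns (D1, D2, D3+) using the Chen & Goodman estimates."""
    # Each D_k uses the ratio n_{k+1} / n_k, so n_1..n_3 must be nonzero to
    # avoid division by zero; n_4 = 0 would force D3+ = 3 (a degenerate value).
    for k in (1, 2, 3, 4):
        if n.get(k, 0) == 0:
            raise ValueError(f"count-of-counts n{k} is zero; discounts undefined")
    n1, n2, n3, n4 = n[1], n[2], n[3], n[4]
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3_plus = 3 - 4 * y * n4 / n3
    return d1, d2, d3_plus

# Hypothetical counts-of-counts for one ngram order.
discounts = modified_kn_discounts({1: 1000, 2: 400, 3: 200, 4: 120})
print(tuple(round(d, 3) for d in discounts))  # (0.556, 1.167, 1.667)
```

With a missing count-of-count such as n_2 = 0, the estimate is undefined, which is the situation the fill-in mechanism in make-kn-discounts is meant to handle.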
Andreas