[SRILM User List] [External Sender] A modified count = 0
Andreas Stolcke
stolcke at icsi.berkeley.edu
Fri Jan 11 13:16:26 PST 2019
Anna,
This is a common problem. You need to compute the statistics used by
Kneser-Ney (KN) discounting before cutting off the vocabulary.
The make-big-lm
<http://www.speech.sri.com/projects/srilm/manpages/training-scripts.1.html>
wrapper script does this for you.
For an explanation and more information, see the SRILM FAQ
<http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html>,
item C3, part d.
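
To illustrate why the order of operations matters, here is a toy Python sketch. It is not SRILM code and makes no claim about how SRILM computes its counts internally. The trigram data, the vocabulary, and the helper name continuation_counts are hypothetical, and it assumes that n-grams containing a removed word are simply dropped rather than mapped to an unknown-word token. It computes the modified (continuation) count of a bigram as the number of distinct words seen immediately to its left, once from the full trigram list and once after the vocabulary cutoff.

```python
from collections import defaultdict


def continuation_counts(trigrams):
    """Map each bigram (w1, w2) to the number of distinct words w0
    observed in a trigram (w0, w1, w2)."""
    left_contexts = defaultdict(set)
    for w0, w1, w2 in trigrams:
        left_contexts[(w1, w2)].add(w0)
    return {bigram: len(words) for bigram, words in left_contexts.items()}


# Hypothetical toy trigram counts; "rare1" and "rare2" fall below the cutoff.
trigram_counts = {
    ("rare1", "w1", "w2"): 1,
    ("rare2", "w1", "w2"): 1,
    ("the", "w3", "w4"): 5,
}
# Hypothetical vocabulary after removing low-count words.
vocab = {"the", "w1", "w2", "w3", "w4"}

# Statistics taken BEFORE the cutoff: "w1 w2" has two distinct left contexts.
full = continuation_counts(trigram_counts)

# Statistics taken AFTER the cutoff: every "* w1 w2" trigram has been dropped.
pruned = continuation_counts(
    t for t in trigram_counts if all(w in vocab for w in t)
)

print(full.get(("w1", "w2"), 0))    # 2
print(pruned.get(("w1", "w2"), 0))  # 0
```

In this toy setting, the count derived after the cutoff is zero even though "w1 w2" itself survives, which is the situation described in the question below; taking the statistics from the full data first avoids it.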
Andreas
On 1/11/2019 7:23 AM, Anna Bulusheva wrote:
> Hello,
>
> I am trying to estimate an LM with modified Kneser-Ney discounting
> (without "-interpolate"), and I cut my vocabulary by removing words
> with count < 3. My list of n-grams contains the n-gram "w1 w2", but no
> n-grams of the form "* w1 w2". This means that the modified count of
> "w1 w2" is 0, so I don't understand how I should calculate
> prob("w1 w2"). Could you help me, please?
>
> P.S. The order of my LM is 3, and if I use SRILM to estimate this LM,
> the n-gram "w1 w2" is present with some probability.
>
> Thank you,
>
> Anna Bulusheva