[SRILM User List] A modified count = 0

Andreas Stolcke stolcke at icsi.berkeley.edu
Fri Jan 11 13:16:26 PST 2019


This is a common problem.  You need to compute the statistics used by KN 
(the modified counts of the lower-order n-grams) before cutting off the 
vocabulary.
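
For reference, in the notation of Chen and Goodman (an assumption here, not SRILM's own notation), the modified count that Kneser-Ney uses for a bigram inside a trigram model is the number of distinct words seen immediately to its left:

```latex
N_{1+}(\bullet\, w_1 w_2) = \left|\{\, v : c(v\, w_1 w_2) > 0 \,\}\right|
```

If every trigram "v w1 w2" contains a word v that the vocabulary cutoff has already removed, the set is empty and the modified count is 0, even though c(w1 w2) itself is positive. Computing the count on the uncut data keeps those left contexts.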

The make-big-lm wrapper script does this for you.

For an explanation and more info, see the SRILM FAQ under item C3 - d.
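
A minimal Python sketch of why the order matters, under simplifying assumptions: a toy corpus, no sentence-boundary tokens, and n-grams with out-of-vocabulary words dropped outright, as described in the question. The helper names are hypothetical; this is not how make-big-lm or ngram-count is implemented.

```python
from collections import Counter

def ngram_counts(sentences, n):
    """Count all n-grams of length n (no <s>/</s> tokens, for brevity)."""
    counts = Counter()
    for sent in sentences:
        for i in range(len(sent) - n + 1):
            counts[tuple(sent[i:i + n])] += 1
    return counts

def continuation_counts(higher):
    """KN modified count of each (n-1)-gram: number of distinct words seen to its left."""
    cont = Counter()
    for ngram in higher:
        cont[ngram[1:]] += 1
    return cont

def restrict_to_vocab(counts, vocab):
    """Drop every n-gram that contains an out-of-vocabulary word."""
    return Counter({g: c for g, c in counts.items() if all(w in vocab for w in g)})

# Toy corpus: "w1 w2" is only ever preceded by rare words.
sentences = [["a", "w1", "w2"], ["b", "w1", "w2"], ["c", "w1", "w2"]]
word_freq = Counter(w for s in sentences for w in s)
vocab = {w for w, c in word_freq.items() if c >= 3}  # remove words with count < 3

trigrams = ngram_counts(sentences, 3)

# Wrong order: cut the vocabulary first, then compute the KN statistics.
cont_after_cut = continuation_counts(restrict_to_vocab(trigrams, vocab))
# Right order: compute the KN statistics on the full data, then cut.
cont_before_cut = restrict_to_vocab(continuation_counts(trigrams), vocab)

print(cont_after_cut[("w1", "w2")])   # 0
print(cont_before_cut[("w1", "w2")])  # 3
```

Only the order of the two steps differs: the left contexts "a", "b" and "c" are counted before they are discarded, so "w1 w2" keeps a nonzero modified count.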


On 1/11/2019 7:23 AM, Anna Bulusheva wrote:
> Hello,
> I am trying to estimate an LM with modified Kneser-Ney discounting 
> (without "-interpolate"), and I cut my vocabulary by removing words with 
> count < 3. In my list of n-grams I have an n-gram "w1 w2", but I don't 
> have any n-grams "* w1 w2". This means that the modified count of 
> "w1 w2" is 0, so I don't understand how to calculate prob("w1 w2"). 
> Could you help me, please?
> P.S. The order of my LM is 3, and when I use SRILM to estimate this LM, 
> the n-gram "w1 w2" does get some probability.
> Thank you,
> Anna Bulusheva
