From bulusheva at speechpro.com  Fri Jan 11 07:23:52 2019
From: bulusheva at speechpro.com (Anna Bulusheva)
Date: Fri, 11 Jan 2019 18:23:52 +0300
Subject: [SRILM User List] [External Sender] A modified count = 0
Message-ID: <0077e49e-fbcf-63e7-2406-66c6491ab845@speechpro.com>

Hello,

I am trying to estimate an LM with modified Kneser-Ney discounting
(without "-interpolate"), and I cut my vocabulary by removing words with
count < 3. In my list of n-grams I have an n-gram "w1 w2", but I don't
have any n-grams "* w1 w2". This means that the modified count of
"w1 w2" is 0, so I don't understand how I should calculate
prob("w1 w2"). Could you help me, please?

P.S. The order of my LM is 3, and if I use SRILM to estimate this LM,
the n-gram "w1 w2" is present with some probability.

Thank you,

Anna Bulusheva

From stolcke at icsi.berkeley.edu  Fri Jan 11 13:16:26 2019
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Fri, 11 Jan 2019 13:16:26 -0800
Subject: [SRILM User List] [External Sender] A modified count = 0
In-Reply-To: <0077e49e-fbcf-63e7-2406-66c6491ab845@speechpro.com>
References: <0077e49e-fbcf-63e7-2406-66c6491ab845@speechpro.com>
Message-ID: <81551bcd-65ce-b0a6-fd98-d4d90e160bbf@icsi.berkeley.edu>

Anna,

This is a common problem. You need to compute the statistics used by KN
before cutting off the vocabulary. The make-big-lm wrapper script does
this for you.

For an explanation and more info, see the SRILM FAQ, under item C3-d.

Andreas
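
The point in the reply can be illustrated with a small Python sketch. It is not SRILM's implementation; the function names and toy counts are hypothetical, and it assumes that cutting the vocabulary simply drops every n-gram containing a removed word. It shows why the modified (continuation) count N1+(* w1 w2) of an in-vocabulary bigram becomes 0 when it is computed after the cut, but stays positive when computed beforehand.

```python
from collections import Counter, defaultdict


def continuation_counts(trigram_counts):
    """Return N1+(* w1 w2): the number of distinct words seen before each bigram."""
    left_contexts = defaultdict(set)
    for (w0, w1, w2), count in trigram_counts.items():
        if count > 0:
            left_contexts[(w1, w2)].add(w0)
    return {bigram: len(ctx) for bigram, ctx in left_contexts.items()}


def restrict_to_vocab(ngram_counts, vocab):
    """Drop every n-gram containing an out-of-vocabulary word (assumed cutoff behaviour)."""
    return Counter(
        {ng: c for ng, c in ngram_counts.items() if all(w in vocab for w in ng)}
    )


# Toy counts (hypothetical): "w1 w2" is only ever preceded by rare words.
trigrams = Counter({
    ("rare1", "w1", "w2"): 1,
    ("rare2", "w1", "w2"): 1,
    ("a", "b", "c"): 3,
})
bigrams = Counter({
    ("w1", "w2"): 2,
    ("rare1", "w1"): 1,
    ("rare2", "w1"): 1,
    ("a", "b"): 3,
    ("b", "c"): 3,
})

# Vocabulary after removing the rare words.
vocab = {"w1", "w2", "a", "b", "c"}

# Continuation counts computed BEFORE the cut (the order the reply recommends).
kn_before = continuation_counts(trigrams)

# Continuation counts computed AFTER the cut (the situation in the question).
kn_after = continuation_counts(restrict_to_vocab(trigrams, vocab))

# The bigram itself survives the cut, since both of its words are in vocabulary.
bigrams_after = restrict_to_vocab(bigrams, vocab)

print(("w1", "w2") in bigrams_after)      # True
print(kn_before.get(("w1", "w2"), 0))     # 2
print(kn_after.get(("w1", "w2"), 0))      # 0
```

In this toy case the bigram "w1 w2" remains in the n-gram list after the cut, yet every trigram that would contribute to its continuation count has been discarded, which reproduces the zero modified count described in the question.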