[SRILM User List] A modified count = 0

Andreas Stolcke stolcke at icsi.berkeley.edu
Fri Jan 11 13:16:26 PST 2019


Anna,

This is a common problem.  You need to compute the statistics used by
Kneser-Ney (KN) discounting from the full n-gram counts, before cutting
off the vocabulary.

The make-big-lm wrapper script does this for you:
<http://www.speech.sri.com/projects/srilm/manpages/training-scripts.1.html>

For explanation and more info see the SRILM FAQ 
<http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html>, 
under item C3 - d.
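
To make the ordering issue concrete, here is a minimal toy sketch in Python.
It is not SRILM's implementation: the corpus, the helper names (ngrams,
continuation_counts, in_vocab) and the omission of sentence-boundary tags are
all assumptions made for illustration. The "modified count" of a bigram is
taken here to be the number of distinct words seen immediately to its left.

```python
from collections import Counter

MIN_COUNT = 3  # cutoff from the question: drop words seen fewer than 3 times

# Hypothetical toy corpus; sentence-boundary tags are omitted for brevity.
corpus = [
    "alpha new york".split(),
    "beta new york".split(),
    "gamma new york".split(),
]

def ngrams(sentences, n):
    """Return all n-grams of order n as a list of tuples."""
    return [tuple(s[i:i + n]) for s in sentences for i in range(len(s) - n + 1)]

def continuation_counts(trigrams):
    """For each bigram (w1, w2), count the distinct w0 with (w0, w1, w2) seen."""
    left_contexts = {}
    for w0, w1, w2 in trigrams:
        left_contexts.setdefault((w1, w2), set()).add(w0)
    return {bigram: len(words) for bigram, words in left_contexts.items()}

word_counts = Counter(w for s in corpus for w in s)
vocab = {w for w, c in word_counts.items() if c >= MIN_COUNT}

def in_vocab(ngram):
    """True if every word of the n-gram survives the vocabulary cutoff."""
    return all(w in vocab for w in ngram)

bigrams = set(ngrams(corpus, 2))
trigrams = set(ngrams(corpus, 3))

# Problematic order: cut the vocabulary first, then derive the KN statistics.
kept_bigrams = {b for b in bigrams if in_vocab(b)}
kept_trigrams = {t for t in trigrams if in_vocab(t)}
counts_after_cutoff = continuation_counts(kept_trigrams)

# Recommended order: derive the statistics from the full counts, then cut.
counts_before_cutoff = {
    b: c for b, c in continuation_counts(trigrams).items() if in_vocab(b)
}

target = ("new", "york")
print(target in kept_bigrams)                # True: the bigram survives the cutoff
print(counts_after_cutoff.get(target, 0))    # 0: all of its left contexts were removed
print(counts_before_cutoff.get(target, 0))   # 3: computed from the full counts
```

In this toy case the bigram "new york" stays in the n-gram list, but every
trigram ending in it contains a rare word, so its modified count drops to 0
if the cutoff is applied first.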

Andreas

On 1/11/2019 7:23 AM, Anna Bulusheva wrote:
> Hello,
>
> I am trying to estimate an LM with modified Kneser-Ney discounting
> (without "-interpolate"), and I cut my vocabulary by removing words with
> count < 3. In my list of n-grams I have an n-gram "w1 w2", but I don't
> have any n-grams "* w1 w2". This means that the modified count of
> "w1 w2" is 0, so I don't understand how I should calculate
> prob("w1 w2"). Could you help me, please?
>
> P.S. The order of my LM is 3, and if I use SRILM to estimate this LM,
> the resulting model does contain the n-gram "w1 w2" with some probability.
>
> Thank you,
>
> Anna Bulusheva
>
