<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Anna,</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">This is a common problem. You need to
compute the statistics used by Kneser-Ney (KN) discounting on the
full data, before cutting off the vocabulary.</div>
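<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">To illustrate why the order matters,
here is a minimal Python sketch. It is not SRILM code: the toy
corpus, the helper names, and the assumption that n-grams containing
out-of-vocabulary words are simply dropped are all hypothetical. It
computes the KN modified count of a bigram (the number of distinct
words seen to its left) once from the full data and once after the
vocabulary cutoff.</div>
<pre>
```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def continuation_counts(trigrams):
    """Modified count of each bigram: number of distinct left neighbours."""
    left_contexts = {}
    for left, a, b in trigrams:
        left_contexts.setdefault((a, b), set()).add(left)
    return {bigram: len(lefts) for bigram, lefts in left_contexts.items()}


# Hypothetical toy corpus: "w1 w2" is only ever preceded by rare words.
corpus = [s.split() for s in (
    "rare_a w1 w2 w3",
    "rare_b w1 w2 w3",
    "rare_c w1 w2 w3",
)]

word_counts = Counter(w for sent in corpus for w in sent)
# Keep words with count >= 3, i.e. remove words seen fewer than 3 times.
vocab = {w for w, c in word_counts.items() if c >= 3}

all_trigrams = {t for sent in corpus for t in ngrams(sent, 3)}

# Order A: compute the statistics on the full data, cut the vocabulary later.
full_stats = continuation_counts(all_trigrams)

# Order B: cut the vocabulary first, then compute the statistics.
cut_trigrams = {t for t in all_trigrams if all(w in vocab for w in t)}
cut_stats = continuation_counts(cut_trigrams)

print(full_stats.get(("w1", "w2"), 0))
print(cut_stats.get(("w1", "w2"), 0))
```
</pre>
<div class="moz-cite-prefix">With this toy data the first count is 3
and the second is 0, which mirrors the situation described in the
question below.</div>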
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">The <a moz-do-not-send="true"
href="http://www.speech.sri.com/projects/srilm/manpages/training-scripts.1.html">make-big-lm</a>
wrapper script does this for you.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">For an explanation and more information, see the <a
moz-do-not-send="true"
href="http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html">SRILM
FAQ</a>, under item C3 (d).</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Andreas<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 1/11/2019 7:23 AM, Anna Bulusheva
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:0077e49e-fbcf-63e7-2406-66c6491ab845@speechpro.com">Hello,
<br>
<br>
I am trying to estimate an LM with modified Kneser-Ney discounting
(without "-interpolate"), and I cut my vocabulary by removing words
with count &lt; 3. My list of n-grams contains the n-gram "w1 w2",
but no n-grams of the form "* w1 w2". This means that the modified
count of "w1 w2" is 0, so I don't understand how I should calculate
prob("w1 w2"). Could you help me, please?
<br>
<br>
P.S. The order of my LM is 3, and if I use SRILM to estimate this
LM, the n-gram "w1 w2" is present with some probability.
<br>
<br>
Thank you,
<br>
<br>
Anna Bulusheva
<br>
<br>
<br>
_______________________________________________
<br>
SRILM-User site list
<br>
<a class="moz-txt-link-abbreviated" href="mailto:SRILM-User@speech.sri.com">SRILM-User@speech.sri.com</a>
<br>
<a class="moz-txt-link-freetext" href="http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user">http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user</a>
<br>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>