<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 5/24/2017 12:08 AM,
<a class="moz-txt-link-abbreviated" href="mailto:claude.vividsky@gmail.com">claude.vividsky@gmail.com</a> wrote:<br>
</div>
<blockquote type="cite"
cite="mid:002001d2d45c$8ad8cef0$a08a6cd0$@gmail.com">
<pre wrap="">Hi,
which command line parameters must be specified for ngram-count and ngram
when only trigram probabilities should be applied?
At the moment I use:
ngram-count -order 3 -gt3min 1 ...
ngram -order 3 ...
The documentation says on "-order":
Set the *maximal* N-gram order to be used ...
Does this mean that bigrams and unigrams will be used too with "-order 3"?
What means "use" here: Are bigrams and unigrams used only for discounting or
are they used for the calculation of probabilities too?
</pre>
</blockquote>
Claude,<br>
<br>
The standard model type in SRILM is a backoff ngram LM. That means
you always need the lower-order ngrams (unigrams, bigrams) to compute
probabilities in cases where the highest-order ngram (trigram) in the
test data was not seen in the training data and is therefore missing
from the model itself.<br>
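<br>
As a rough illustration of that lookup (a minimal sketch, not SRILM
code: the toy model, its numbers, and the function name
backoff_logprob are all made up for this example), the backoff
recursion can be written as:<br>
<pre wrap="">
```python
import math

# Hypothetical toy model in the spirit of an ARPA backoff file:
# each ngram maps to (log10 probability, log10 backoff weight).
# All numbers are invented for illustration.
TOY_LM = {
    ("the",): (-1.0, -0.5),
    ("cat",): (-2.0, -0.3),
    ("sat",): (-2.0, 0.0),
    ("the", "cat"): (-0.7, -0.2),
    ("cat", "sat"): (-0.9, 0.0),
    ("the", "cat", "sat"): (-0.4, 0.0),
}


def backoff_logprob(word, context, lm):
    """Return log10 p(word | context) using the usual backoff recursion."""
    context = tuple(context)
    ngram = context + (word,)
    if ngram in lm:
        # The full ngram is in the model: use its probability directly.
        return lm[ngram][0]
    if not context:
        # Unseen unigram; this sketch has no unknown-word handling.
        return -math.inf
    # Backoff weight of the context (log10 weight 0.0 if the context is absent),
    # then retry with the context shortened by its oldest word.
    bow = lm.get(context, (0.0, 0.0))[1]
    return bow + backoff_logprob(word, context[1:], lm)
```
</pre>
With these made-up numbers, backoff_logprob("sat", ("the", "cat"),
TOY_LM) reads the trigram entry directly, while an unseen trigram
falls through to the bigram and then the unigram entries, adding the
backoff weights along the way.<br>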
<br>
See <a moz-do-not-send="true"
href="http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html">here</a>
for a description of the file format storing all orders of ngrams,
and <a moz-do-not-send="true"
href="http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html">here</a>
for a detailed description of how the parameters associated with
those ngrams are computed.<br>
<br>
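If you want to disable the backing-off (i.e., smoothing) in a
trigram LM, use -gt3max 0, which turns off Good-Turing discounting
for trigrams so that they receive maximum-likelihood estimates.
However, the file format will still contain all the lower-order
ngrams.<br>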
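<br>
To see why undiscounted estimates leave nothing to back off with,
here is a small arithmetic sketch. It is not SRILM code; the counts
and the helper name ml_estimates are invented, and it assumes all
counts share one context:<br>
<pre wrap="">
```python
from collections import Counter
from fractions import Fraction

# Invented trigram counts, all sharing the context ("the", "cat").
counts = Counter({
    ("the", "cat", "sat"): 3,
    ("the", "cat", "ran"): 1,
})


def ml_estimates(trigram_counts):
    """Maximum-likelihood p(w | context) = count / context total, no discounting."""
    total = sum(trigram_counts.values())
    return {ngram: Fraction(c, total) for ngram, c in trigram_counts.items()}


probs = ml_estimates(counts)
# Probability mass left over for unseen words in this context,
# i.e. the mass a backoff step would have to redistribute.
leftover = 1 - sum(probs.values())
print(leftover)
```
</pre>
The seen trigrams already sum to one, so the leftover mass is zero.
Discounting is what normally frees up some mass for unseen words.<br>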
<br>
Andreas<br>
<br>
</body>
</html>