[SRILM User List] How to use SRILM with trigrams only

Andreas Stolcke stolcke at icsi.berkeley.edu
Wed May 24 08:26:59 PDT 2017


On 5/24/2017 12:08 AM, claude.vividsky at gmail.com wrote:
> Hi,
>
> Which command-line parameters must be specified for ngram-count and ngram
> when only trigram probabilities should be applied?
>
> At the moment I use:
>
>    ngram-count -order 3 -gt3min 1 ...
>    ngram       -order 3 ...
>
>
> The documentation says on "-order":
>
>    Set the *maximal* N-gram order to be used ...
>
> Does this mean that bigrams and unigrams will be used too with "-order 3"?
>
> What does "use" mean here: are bigrams and unigrams used only for
> discounting, or are they also used to calculate probabilities?
>
Claude,

The standard model type in SRILM is a backoff ngram LM.  That means you 
always need the lower-order ngrams (unigrams, bigrams) for cases where 
the highest-order ngram (trigram) in the test data is not found in the 
training data, and is therefore missing from the model itself.
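
To illustrate the backoff lookup, here is a minimal Python sketch.  It is 
not SRILM code: the function name, the table layout, and all log 
probabilities and backoff weights are made up for the example.  It only 
shows the general idea that a missing trigram falls back to the bigram, 
and then to the unigram, with the backoff weight of each skipped context 
added along the way.

```python
# Toy backoff trigram model (illustrative only; all numbers are invented).
# LOGPROBS maps an ngram tuple to log10 P(last word | preceding words).
LOGPROBS = {
    ("the",): -1.0,
    ("cat",): -2.0,
    ("sat",): -2.5,
    ("the", "cat"): -0.5,
    ("the", "cat", "sat"): -0.3,
}

# BOWS maps a context tuple to its log10 backoff weight.
BOWS = {
    ("the",): -0.4,
    ("cat",): -0.2,
    ("the", "cat"): -0.1,
}


def backoff_logprob(word, context):
    """Return log10 P(word | context), backing off to shorter contexts."""
    ngram = context + (word,)
    if ngram in LOGPROBS:
        return LOGPROBS[ngram]
    if not context:
        # Word not in the toy vocabulary at all (no <unk> handling here).
        return float("-inf")
    # Add the backoff weight of the full context (0 if the context is
    # unseen), then retry with the oldest context word dropped.
    return BOWS.get(context, 0.0) + backoff_logprob(word, context[1:])


if __name__ == "__main__":
    # Trigram is in the model: its own probability is used directly.
    print(backoff_logprob("sat", ("the", "cat")))
    # Trigram and bigram are missing: the unigram ends up being used.
    print(backoff_logprob("sat", ("a", "cat")))
```

In the second call neither "a cat sat" nor "cat sat" is in the toy 
tables, so the result is the unigram log probability of "sat" plus the 
backoff weight of the context "cat".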

See the ngram-format(5) man page 
<http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html> 
for a description of the file format storing all orders of ngrams, and 
the ngram-discount(7) man page 
<http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html> 
for a detailed description of how the parameters associated with those 
ngrams are computed.
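
As a rough illustration of that file layout, the Python sketch below 
counts the entries in each "\N-grams:" section of a small ARPA-style 
model.  The model text, its numbers, and the helper name count_by_order 
are invented for this example, and the parser is deliberately 
simplified; it is not how SRILM itself reads the file.

```python
import re

# A tiny hand-written model in ARPA-style layout (numbers are invented).
ARPA_TEXT = r"""
\data\
ngram 1=3
ngram 2=1
ngram 3=1

\1-grams:
-1.0 the -0.4
-2.0 cat -0.2
-2.5 sat

\2-grams:
-0.5 the cat -0.1

\3-grams:
-0.3 the cat sat

\end\
"""


def count_by_order(arpa_text):
    """Count the ngram entries listed under each \\N-grams: section."""
    counts = {}
    order = None  # order of the section currently being read
    for line in arpa_text.splitlines():
        line = line.strip()
        match = re.fullmatch(r"\\(\d+)-grams:", line)
        if match:
            order = int(match.group(1))
            counts[order] = 0
        elif line == "\\end\\":
            break
        elif order is not None and line:
            # Any non-empty line inside a section is one ngram entry.
            counts[order] += 1
    return counts


if __name__ == "__main__":
    print(count_by_order(ARPA_TEXT))
```

Even though this toy model is a trigram model, the unigram and bigram 
sections are present alongside the trigram section.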

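If you want to disable the backing-off (i.e., smoothing) in a trigram 
LM, use -gt3max 0, which turns off Good-Turing discounting for trigrams 
so that they receive maximum-likelihood estimates.  However, the model 
file will still contain all the lower-order ngrams.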

Andreas




