[SRILM User List] How to use SRILM with trigrams only
Andreas Stolcke
stolcke at icsi.berkeley.edu
Wed May 24 08:26:59 PDT 2017
On 5/24/2017 12:08 AM, claude.vividsky at gmail.com wrote:
> Hi,
>
> Which command-line parameters must be specified for ngram-count and ngram
> when only trigram probabilities should be applied?
>
> At the moment I use:
>
> ngram-count -order 3 -gt3min 1 ...
> ngram -order 3 ...
>
>
> The documentation says on "-order":
>
> Set the *maximal* N-gram order to be used ...
>
> Does this mean that bigrams and unigrams will be used too with "-order 3"?
>
> What does "use" mean here: are bigrams and unigrams used only for discounting,
> or are they used for the calculation of probabilities too?
>
Claude,
The standard model type in SRILM is a backoff ngram LM. That means you
always need the lower-order ngrams (unigrams, bigrams) for cases where
the highest-order ngram (trigram) in the test data was not seen in the
training data, and is therefore not in the model itself.
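
To make the backoff idea concrete, here is a minimal Python sketch of how a
backoff LM looks up a trigram probability. It is not SRILM code: TOY_LM and
backoff_logprob are hypothetical names, the numbers are made up, and the
sketch assumes log10 probabilities and backoff weights stored per ngram.

```python
# Toy backoff LM: n-gram tuple -> (log10 probability, log10 backoff weight).
# All values are invented for illustration only.
TOY_LM = {
    ("a",): (-0.5, -0.3),
    ("b",): (-0.7, -0.2),
    ("c",): (-1.0, 0.0),
    ("a", "b"): (-0.4, -0.1),
    ("b", "c"): (-0.6, 0.0),
    ("a", "b", "c"): (-0.2, 0.0),
}


def backoff_logprob(lm, context, word):
    """Return log10 p(word | context), backing off to shorter contexts."""
    ngram = context + (word,)
    if ngram in lm:
        # The full n-gram is in the model: use its probability directly.
        return lm[ngram][0]
    if not context:
        # Not even the unigram is known (this toy model has no <unk>).
        return float("-inf")
    # Otherwise apply the context's backoff weight (log10 of 1 if the
    # context itself is unknown) and retry with the first word dropped.
    bow = lm[context][1] if context in lm else 0.0
    return bow + backoff_logprob(lm, context[1:], word)
```

With this toy model, backoff_logprob(TOY_LM, ("a", "b"), "c") uses the trigram
entry directly, while backoff_logprob(TOY_LM, ("b", "a"), "c") falls all the
way back to the unigram "c" because neither the trigram nor the bigram is
present.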
See ngram-format(5)
<http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html>
for a description of the file format, which stores all orders of ngrams,
and ngram-discount(7)
<http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html>
for a detailed description of how the parameters associated with those
ngrams are computed.
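
As a rough illustration of an LM file that holds all orders, the Python sketch
below reads a tiny, made-up ARPA-style text. TOY_ARPA and read_arpa are
hypothetical names, and the reader is a simplification that assumes
whitespace-separated fields and a missing backoff weight meaning log10 of 1;
the man page above is the authority on the actual format.

```python
# A tiny, invented ARPA-style LM with unigrams, bigrams and one trigram.
TOY_ARPA = r"""
\data\
ngram 1=3
ngram 2=2
ngram 3=1

\1-grams:
-0.5 a -0.3
-0.7 b -0.2
-1.0 c

\2-grams:
-0.4 a b -0.1
-0.6 b c

\3-grams:
-0.2 a b c

\end\
"""


def read_arpa(text):
    """Map each n-gram tuple to (log10 prob, log10 backoff weight)."""
    lm = {}
    order = 0  # order of the section currently being read
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            # Section header such as "\2-grams:" sets the current order.
            order = int(line[1:line.index("-")])
            continue
        if order == 0:
            continue  # ignore anything before the first section
        fields = line.split()
        logprob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The backoff weight is optional; treat a missing one as 0.0.
        bow = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        lm[words] = (logprob, bow)
    return lm
```

Calling read_arpa(TOY_ARPA) yields six entries: three unigrams, two bigrams,
and one trigram, which is why a trigram LM file is never "trigrams only".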
If you want to disable the backing-off (i.e., smoothing) in a trigram
LM, use -gt3max 0, which turns off Good-Turing discounting of the
trigram counts. However, the LM file will still contain all the
lower-order ngrams.
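
The Python sketch below shows only the arithmetic behind that remark, not
SRILM's internals: with undiscounted (maximum-likelihood) trigram estimates,
the seen trigrams of a context already sum to 1, so no probability mass is
left over for backing off. The function names and the toy sentence are
hypothetical.

```python
from collections import Counter


def mle_trigram_probs(tokens):
    """Undiscounted trigram estimates: count(u v w) / count(u v *)."""
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    ctx = Counter()
    for (u, v, _w), c in tri.items():
        ctx[(u, v)] += c
    return {g: c / ctx[g[:2]] for g, c in tri.items()}


def leftover_mass(probs, context):
    """Probability mass not claimed by trigrams seen after `context`."""
    return 1.0 - sum(p for g, p in probs.items() if g[:2] == context)


probs = mle_trigram_probs("a b c a b d a b c".split())
# After the context ("a", "b") the seen continuations are "c" (twice)
# and "d" (once), so they take 2/3 and 1/3 of the mass respectively.
print(probs[("a", "b", "c")], probs[("a", "b", "d")])
print(leftover_mass(probs, ("a", "b")))
```

Discounting is what frees up some of that mass so that unseen trigrams can
receive a nonzero probability via the bigram and unigram estimates.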
Andreas