[SRILM User List] exact command for combining more than two language models in srilm
Andreas Stolcke
stolcke at icsi.berkeley.edu
Sat Aug 19 10:08:08 PDT 2017
On 8/19/2017 1:56 AM, Shreya Singh wrote:
> Hi,
> I would like to know whether there is a command for combining more
> than two language models in SRILM. I know that for only two LMs the
> command is:
> ngram -order N -lm LM1 -mix-lm LM2 -lambda W -write-lm MIXLM
> Where N is the maximum ngram order in the two LMs, LM1, LM2 are the
> input models, W is the weight to give to LM1, and MIXLM is the merged
> model file.
>
> What should I use for more than two LMs?
ngram -order N \
    -lm LM0 -lambda W0 \
    -mix-lm LM1 \
    -mix-lm2 LM2 -mix-lambda2 W2 \
    -mix-lm3 LM3 -mix-lambda3 W3 \
    ...
    -mix-lm9 LM9 -mix-lambda9 W9 \
    -write-lm MIXLM
As you can see, there is no option for the weight of LM1, since it is
implicitly given by 1 minus the sum of the other weights.
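To make the implicit weight concrete, here is a minimal sketch for three
models. The weights (0.5 and 0.2), the order (3), and the file names
a.lm, b.lm, c.lm and mix.lm are placeholders chosen for illustration and
do not come from the original question. The script only prints the
command, so it can be tried without SRILM installed.

```shell
# Hypothetical weights for a three-model mixture (illustrative values only).
W0=0.5   # passed as -lambda: weight of the model given with -lm
W2=0.2   # passed as -mix-lambda2: weight of the model given with -mix-lm2

# The model given with -mix-lm receives whatever weight is left over.
W1=$(awk -v a="$W0" -v b="$W2" 'BEGIN { printf "%.2f", 1 - a - b }')
echo "implicit weight of the -mix-lm model: $W1"

# Print the corresponding command instead of running it.
# a.lm, b.lm, c.lm and mix.lm are placeholder file names.
echo "ngram -order 3 -lm a.lm -lambda $W0 -mix-lm b.lm -mix-lm2 c.lm -mix-lambda2 $W2 -write-lm mix.lm"
```

If the explicit weights add up to more than 1, the leftover weight would
be negative, so it is worth checking the sum before running the merge.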
Because this syntax is a little inconsistent and limited to 10 models,
there is also a more general mechanism, which reads the model
specification from a file. Here is the relevant section from the
ngram(1) man page:
> -read-mix-lms
>     Read a list of linearly interpolated (mixture) LMs and their
>     weights from the file specified with -lm, instead of gathering
>     this information from the command line options above. Each line
>     in file starts with the filename containing the component LM,
>     followed by zero or more component-specific options:
>
>     -weight W    the prior weight given to the component LM
>
>     -order N     the maximal ngram order to use
>
>     -type T      the LM type, one of ARPA (the default), COUNTLM,
>                  MAXENT, LMCLIENT, or MSWEBLM
>
>     -classes C   the word class definitions for the component LM
>                  (which must be of type ARPA)
>
>     -cache-served-ngrams
>                  enables client-side caching for LMs of type
>                  LMCLIENT or MSWEBLM.
>
>     The global options -bayes, -bayes-scale, and -context-priors
>     still apply with -read-mix-lms. When -bayes is NOT used, the
>     interpolation is static by ngram merging, and forces all
>     component LMs to be of type ARPA or MAXENT.
Note that the file from which the model specification is read must be
given with the -lm file option.
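As a rough sketch of how this could look in practice, the script below
writes a specification file in the format described in the man page
excerpt and then passes it to ngram with -read-mix-lms. The file names
(mix.spec, a.lm, b.lm, c.lm, mix.lm), the weights, and the order are
hypothetical. The weights are chosen to sum to 1 as a cautious
assumption; the excerpt does not say how other sums are handled. The
ngram call is skipped when SRILM or the model files are not available.

```shell
# Write a hypothetical specification file: one component LM per line,
# file name first, followed by component-specific options.
cat > mix.spec <<'EOF'
a.lm -weight 0.5
b.lm -weight 0.3
c.lm -weight 0.2
EOF

# Sanity check: add up the values that follow -weight on each line.
WEIGHT_SUM=$(awk '{ for (i = 2; i < NF; i++) if ($i == "-weight") s += $(i + 1) }
                  END { printf "%.2f", s }' mix.spec)
echo "weights sum to $WEIGHT_SUM"

# Run the merge only if ngram is on the PATH and the placeholder models exist.
# Without -bayes the interpolation is static, so the result can be written out.
if command -v ngram >/dev/null 2>&1 && [ -f a.lm ] && [ -f b.lm ] && [ -f c.lm ]; then
    ngram -order 3 -read-mix-lms -lm mix.spec -write-lm mix.lm
fi
```

Keeping the mixture in a file like this also avoids the 10-model limit of
the command-line options and makes the weights easy to review.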
Andreas