[SRILM User List] exact command for combining more than two language models in srilm

Andreas Stolcke stolcke at icsi.berkeley.edu
Sat Aug 19 10:08:08 PDT 2017


On 8/19/2017 1:56 AM, Shreya Singh wrote:
> Hi,
> I would like to know whether there is a command for combining more
> than two language models in SRILM. I know that for only two LMs the
> command is:
> ngram -order N  -lm LM1  -mix-lm LM2 -lambda W -write-lm MIXLM
> Where N is the maximum ngram order in the two LMs, LM1, LM2 are the
> input models, W is the weight to give to LM1, and MIXLM is the merged
> model file.
>
> What should I use for more than two LMs?

ngram -order N \
         -lm LM0  -lambda W0 \
         -mix-lm LM1     \
         -mix-lm2 LM2 -mix-lambda2 W2 \
         -mix-lm3 LM3 -mix-lambda3 W3 \
         ...
         -mix-lm9 LM9 -mix-lambda9 W9 \
         -write-lm MIXLM

As you can see, there is no option for the weight of LM1, since it is
implicitly given by 1 minus the sum of the other weights.
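
To make the implicit weight concrete, here is a minimal sketch for a
three-model mixture. The weights, the order 3, and the file names
(lm0.arpa, lm1.arpa, lm2.arpa, mixed.arpa) are made up for illustration;
only the option names come from the command above. The script prints the
command rather than running it, so it works without the model files.

```shell
# Hypothetical explicit weights for LM0 and LM2.
W0=0.5
W2=0.2

# LM1 has no option of its own; it gets the remaining probability mass.
W1=$(awk -v a="$W0" -v b="$W2" 'BEGIN { printf "%.2f", 1 - (a + b) }')
echo "implicit weight for LM1: $W1"

# Print the corresponding command (placeholder file names, assumed order 3).
echo "ngram -order 3 -lm lm0.arpa -lambda $W0 -mix-lm lm1.arpa -mix-lm2 lm2.arpa -mix-lambda2 $W2 -write-lm mixed.arpa"
```

If the explicit weights sum to more than 1, the leftover weight for LM1
would be negative, so it is worth checking the sum before running ngram.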

Because this syntax is a little inconsistent and limited to 10 models, 
there is also a more general mechanism, which reads the model 
specification from a file.  Here is the relevant section from the 
ngram(1) man page:

>        -read-mix-lms
>               Read a list of linearly interpolated (mixture) LMs and their
>               weights from the file specified with -lm, instead of gathering
>               this information from the command line options above.  Each
>               line in file starts with the filename containing the component
>               LM, followed by zero or more component-specific options:
>
>               -weight W      the prior weight given to the component LM
>
>               -order N       the maximal ngram order to use
>
>               -type T        the LM type, one of ARPA (the default),
>                              COUNTLM, MAXENT, LMCLIENT, or MSWEBLM
>
>               -classes C     the word class definitions for the component
>                              LM (which must be of type ARPA)
>
>               -cache-served-ngrams
>                              enables client-side caching for LMs of type
>                              LMCLIENT or MSWEBLM.
>
>               The global options -bayes, -bayes-scale, and -context-priors
>               still apply with -read-mix-lms.  When -bayes is NOT used, the
>               interpolation is static by ngram merging, and forces all
>               component LMs to be of type ARPA or MAXENT.

Note that the file from which the model specification is read must be
given with the -lm option.
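
As a rough sketch of what such a specification file might look like, the
script below writes one component LM per line with a -weight option, as
described in the man page excerpt. The file names (mix.spec, lm0.arpa
through lm3.arpa, mixed.arpa), the weights, and the order 3 are
hypothetical. I am assuming the weights should sum to 1 and that
-write-lm can be combined with -read-mix-lms to save the statically
merged model; the ngram call is skipped unless ngram and the model files
are actually present.

```shell
# Write a hypothetical mixture specification: filename, then per-LM options.
cat > mix.spec <<'EOF'
lm0.arpa -weight 0.4
lm1.arpa -weight 0.3
lm2.arpa -weight 0.2
lm3.arpa -weight 0.1
EOF

# Run the interpolation only if SRILM and the (placeholder) models exist.
if command -v ngram >/dev/null 2>&1 && [ -f lm0.arpa ]; then
    # The specification file itself is passed with -lm.
    ngram -order 3 -read-mix-lms -lm mix.spec -write-lm mixed.arpa
else
    echo "ngram or model files not found; wrote mix.spec only"
fi
```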

Andreas

