[SRILM User List] Cache model, ngram server

Andreas Stolcke stolcke at icsi.berkeley.edu
Tue Jun 21 10:35:27 PDT 2011

Simon Andersson wrote:
> (I'm posting this message to both the SRILM and Sphinx lists...)
> What I want to do is to construct language models that can change
> according to application context. The context sensitive LM could be built
> by interpolating one or two trigram models (e.g., general background model
> + domain model) and a small unigram model (the 'cache' model).
> Would it not make sense to use the SRILM server feature for this?
There is some experimental code (not documented in the man page yet) to 
perform adaptive weighting of interpolated ngram LMs.

The class is AdaptiveMix (in AdaptiveMix.cc). A comment before the 
read() function documents the file format.

The ngram(1) options enabling its use are

{ OPT_TRUE, "adapt-mix", &adaptMix, "use adaptive mixture of n-grams model" },
{ OPT_FLOAT, "adapt-decay", &adaptDecay, "history likelihood decay factor" },
{ OPT_UINT, "adapt-iters", &adaptIters, "EM iterations for adaptive mix" },

With these options, the mixture weights of the interpolated LMs are
re-estimated from the history.
You could then also use the -cache option to add a unigram cache LM into 
the mix (but with a static mixture weight, given by -cache-lambda).
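
For illustration only, the re-estimation described above can be sketched
roughly as follows. This is not the code in AdaptiveMix.cc: the function
names (reestimateWeights, withCache), the input layout, and the way the
decay factor is applied (older history words count less) are all
assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not SRILM code.
// componentProbs[t][i] is the probability that component LM i assigned to
// the t-th word of the history (oldest word first).
// Returns mixture weights re-estimated by EM over that history.
std::vector<double> reestimateWeights(
    const std::vector<std::vector<double>>& componentProbs,
    std::vector<double> weights,
    double decay,
    unsigned iters)
{
    const std::size_t numLMs = weights.size();
    const std::size_t T = componentProbs.size();

    for (unsigned it = 0; it < iters; ++it) {
        std::vector<double> expected(numLMs, 0.0);
        double total = 0.0;
        double scale = 1.0;  // decay^age; the newest word has age 0

        // Walk the history from the newest word to the oldest.
        for (std::size_t k = 0; k < T; ++k) {
            const std::vector<double>& p = componentProbs[T - 1 - k];
            assert(p.size() == numLMs);

            double mix = 0.0;
            for (std::size_t i = 0; i < numLMs; ++i) {
                mix += weights[i] * p[i];
            }
            if (mix > 0.0) {
                // E-step: posterior of each component, discounted by age.
                for (std::size_t i = 0; i < numLMs; ++i) {
                    expected[i] += scale * weights[i] * p[i] / mix;
                }
                total += scale;
            }
            scale *= decay;
        }

        if (total <= 0.0) {
            break;  // no usable history; keep the current weights
        }
        // M-step: normalized expected counts become the new weights.
        for (std::size_t i = 0; i < numLMs; ++i) {
            weights[i] = expected[i] / total;
        }
    }
    return weights;
}

// Static interpolation with a cache LM, assuming cacheLambda is the weight
// given to the cache model.
double withCache(double mixProb, double cacheProb, double cacheLambda)
{
    assert(cacheLambda >= 0.0 && cacheLambda <= 1.0);
    return (1.0 - cacheLambda) * mixProb + cacheLambda * cacheProb;
}
```

Calling reestimateWeights(history, {0.5, 0.5}, 0.9, 5) on a history where
the second LM consistently assigns higher probabilities shifts the weight
toward that LM, while the cache weight passed to withCache stays fixed.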

The issues about integration with the decoder raised by Nickolay sound 
more serious. I'm sorry I cannot help with those.


> - Simon
>> On Mon, 20/06/2011 at 20:29 +0200, Simon Andersson wrote:
>>> Nickolay Shmyrev reports that he included the feature in Sphinx 4:
>>> http://nsh.nexiwave.com/2009/11/using-srilm-server-in-sphinx4.html
>>> (He also confirmed to me that it is not in PocketSphinx.)
>>> I'll use Nickolay's code as a reference when making a PocketSphinx
>>> version.
>> Hello Simon
>> If your goal is only to implement a cache-based LM, using SRILM as a
>> server doesn't seem like an easy way, and there are several important
>> points you need to take care of:
>> 1. During the initialization stage the decoder requests *all* unigram
>> probabilities to build the lextree. You definitely don't want them to be
>> in a cache, so you need to disable the cache during initialization.
>> 2. During the search the decoder stores unigram probabilities internally
>> in the lextree. Most words are pruned before they reach the leaves, so a
>> cache on the server will not help you, since the probabilities will be
>> the same. You need to adjust the weights inside the lextree.
>> 3. You need to reset the cache somehow.
>> Well, I suggest you discuss this implementation question on the
>> cmusphinx-devel mailing list instead, since this is not really an SRILM
>> issue.
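
As a rough illustration of points 1 and 3 above, a client-side unigram
cache with an on/off switch and an explicit reset might look like the
sketch below. The class UnigramCache and all of its members are
hypothetical; nothing here is taken from SRILM, Sphinx 4, or PocketSphinx,
and it does not address point 2 (updating probabilities already stored in
the lextree).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical sketch: a sliding-window unigram cache over recent words.
class UnigramCache {
public:
    explicit UnigramCache(std::size_t maxLen) : maxLen_(maxLen) {}

    // Disable while the decoder builds its lextree, enable for decoding.
    void setEnabled(bool on) { enabled_ = on; }

    // Record a recognized word; ignored while the cache is disabled.
    void observe(const std::string& word) {
        if (!enabled_ || maxLen_ == 0) return;
        window_.push_back(word);
        ++counts_[word];
        if (window_.size() > maxLen_) {
            // Drop the oldest word so the window stays at maxLen_ words.
            auto it = counts_.find(window_.front());
            if (--it->second == 0) counts_.erase(it);
            window_.pop_front();
        }
    }

    // Relative frequency of the word within the current window.
    double prob(const std::string& word) const {
        if (window_.empty()) return 0.0;
        auto it = counts_.find(word);
        if (it == counts_.end()) return 0.0;
        return static_cast<double>(it->second) /
               static_cast<double>(window_.size());
    }

    // Interpolate with a static weight; falls back to the plain LM
    // probability when the cache is disabled or empty.
    double interpolate(double lmProb, const std::string& word,
                       double lambda) const {
        assert(lambda >= 0.0 && lambda <= 1.0);
        if (!enabled_ || window_.empty()) return lmProb;
        return (1.0 - lambda) * lmProb + lambda * prob(word);
    }

    // Forget everything, e.g. at a change of application context.
    void reset() {
        window_.clear();
        counts_.clear();
    }

private:
    std::size_t maxLen_;
    bool enabled_ = true;
    std::deque<std::string> window_;
    std::unordered_map<std::string, std::size_t> counts_;
};
```

In this sketch the decoder would call setEnabled(false) before requesting
the unigram probabilities for the lextree, setEnabled(true) once decoding
starts, and reset() whenever the application context changes.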
