[SRILM User List] Cache model, ngram server

Tue Jun 21 10:35:27 PDT 2011

Simon Andersson wrote:
> (I'm posting this message to both the SRILM and Sphinx lists...)
>
> What I want to do is to construct language models that can change
> according to application context. The context sensitive LM could be built
> by interpolating one or two trigram models (e.g., general background model
> + domain model) and a small unigram model (the 'cache' model).
>
> Would it not make sense to use the SRILM server feature for this?
>   
There is some experimental code (not documented in the man page yet) to 
perform adaptive weighting of interpolated ngram LMs.

The class is AdaptiveMix (in AdaptiveMix.cc). A comment before the 
read() function documents the file format.

The ngram(1) options enabling its use are

{ OPT_TRUE, "adapt-mix", &adaptMix, "use adaptive mixture of n-grams model" },
{ OPT_FLOAT, "adapt-decay", &adaptDecay, "history likelihood decay factor" },
{ OPT_UINT, "adapt-iters", &adaptIters, "EM iterations for adaptive mix" },

This re-estimates the mixture weights between the LMs based on the
history.
You could then also use the -cache option to add a unigram cache LM to
the mix (but with a static mixture weight, given by -cache-lambda).
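
To make the idea concrete, here is a rough Python sketch of that kind of
scheme. It is not the AdaptiveMix code from SRILM. All names in it
(adapt_weights, UnigramCache, mixed_prob) are made up for illustration, and
treating the decay factor as an exponential discount on older history words
is an assumption on my part. The mixture weights of the n-gram components
are re-estimated by a few EM iterations over the history, and a unigram
cache is then interpolated on top with a fixed weight.

```python
from collections import deque


def adapt_weights(weights, history_probs, decay=1.0, iters=2):
    """Re-estimate mixture weights by EM over the word history.

    history_probs[t][k] is the probability that component LM k assigned
    to the t-th history word (oldest first). Each word's contribution is
    scaled by decay ** age (an assumed form of the decay), so decay < 1
    favors the recent past.
    """
    n = len(history_probs)
    w = list(weights)
    if n == 0:
        return w
    for _ in range(iters):
        counts = [0.0] * len(w)
        for t, probs in enumerate(history_probs):
            age = n - 1 - t
            joint = [wk * pk for wk, pk in zip(w, probs)]
            total = sum(joint)
            if total <= 0.0:
                continue  # no component explains this word; skip it
            for k, j in enumerate(joint):
                # E-step: posterior of component k, discounted by age
                counts[k] += (decay ** age) * j / total
        norm = sum(counts)
        if norm > 0.0:
            # M-step: normalized expected counts become the new weights
            w = [c / norm for c in counts]
    return w


class UnigramCache:
    """Unigram LM estimated from the last `size` words seen."""

    def __init__(self, size):
        self.words = deque(maxlen=size)

    def add(self, word):
        self.words.append(word)

    def prob(self, word):
        if not self.words:
            return 0.0
        return self.words.count(word) / len(self.words)

    def reset(self):
        self.words.clear()


def mixed_prob(word, component_probs, weights, cache, cache_lambda):
    """Adaptive n-gram mixture with a fixed-weight cache on top."""
    ngram_p = sum(w * p for w, p in zip(weights, component_probs))
    if not cache.words:
        return ngram_p  # empty cache: fall back to the n-gram mixture
    return (1.0 - cache_lambda) * ngram_p + cache_lambda * cache.prob(word)
```

For example, if one component consistently assigns ten times more
probability to the history words than the other, two EM iterations starting
from equal weights already move almost all of the weight onto it. The cache
weight stays fixed throughout.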

The issues about integration with the decoder raised by Nickolay sound 
more serious. I'm sorry I cannot help with those.

Andreas

> - Simon
>
>
>   
>> On Mon, 20/06/2011 at 20:29 +0200, Simon Andersson wrote:
>>     
>>> Nickolay Shmyrev reports that he included the feature in Sphinx 4:
>>>
>>> http://nsh.nexiwave.com/2009/11/using-srilm-server-in-sphinx4.html
>>>
>>> (He also confirmed to me that it is not in PocketSphinx.)
>>>
>>> I'll use Nickolay's code as a reference when making a PocketSphinx
>>> version.
>>>       
>> Hello Simon
>>
>> If your goal is only to implement a cache-based LM, using SRILM as a
>> server doesn't seem like an easy way to do it, and there are several
>> important points you need to take care of:
>>
>> 1. During the initialization stage the decoder requests *all* unigram
>> probabilities to build the lextree. You definitely don't want them to
>> come from a cache, so you need to disable the cache during initialization.
>>
>> 2. During the search the decoder stores unigram probabilities internally
>> in the lextree. Most of the words are pruned before they reach the leaves,
>> so a cache on the server will not help you, since the probabilities will
>> be the same. You need to adjust the weights inside the lextree.
>>
>> 3. You need to reset the cache somehow.
>>
>> I suggest you discuss this implementation on the cmusphinx-devel
>> mailing list instead, since this is not really an SRILM issue.
>>
>>
>>     
>
>