[SRILM User List] Cache model, ngram server
Andreas Stolcke
stolcke at icsi.berkeley.edu
Tue Jun 21 10:35:27 PDT 2011
Simon Andersson wrote:
> (I'm posting this message to both the SRILM and Sphinx lists...)
>
> What I want to do is to construct language models that can change
> according to application context. The context sensitive LM could be built
> by interpolating one or two trigram models (e.g., general background model
> + domain model) and a small unigram model (the 'cache' model).
>
> Would it not make sense to use the SRILM server feature for this?
>
There is some experimental code (not documented in the man page yet) to
perform adaptive weighting of interpolated ngram LMs.
The class is AdaptiveMix (in AdaptiveMix.cc). A comment before the
read() function documents the file format.
The ngram(1) options enabling its use are:

    { OPT_TRUE, "adapt-mix", &adaptMix, "use adaptive mixture of n-grams model" },
    { OPT_FLOAT, "adapt-decay", &adaptDecay, "history likelihood decay factor" },
    { OPT_UINT, "adapt-iters", &adaptIters, "EM iterations for adaptive mix" },

This re-estimates the mixture weights of the interpolated LMs based on
the history.
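
For illustration only, here is a rough Python sketch of that kind of
re-estimation. It is not the AdaptiveMix code: the function name
reestimate_weights, the component_probs layout, and the way the decay
factor is applied (older history words are discounted geometrically)
are all assumptions made for the example.

```python
def reestimate_weights(component_probs, weights, decay=1.0, iters=2):
    """EM re-estimation of mixture weights from the word history.

    component_probs: one entry per history word (oldest first); each
        entry lists the probability every component LM gave that word.
    weights: current mixture weights, one per component LM.
    decay: assumed per-word discount for older history (1.0 = none).
    iters: number of EM iterations.
    """
    n = len(component_probs)
    k = len(weights)
    weights = list(weights)
    for _ in range(iters):
        totals = [0.0] * k
        for t, probs in enumerate(component_probs):
            # Assumption: the most recent word has discount 1,
            # each step further back multiplies by `decay`.
            discount = decay ** (n - 1 - t)
            mix = sum(w * p for w, p in zip(weights, probs))
            if mix <= 0.0:
                continue  # no component explains this word; skip it
            for i in range(k):
                # E-step: posterior that component i produced the word.
                totals[i] += discount * weights[i] * probs[i] / mix
        norm = sum(totals)
        if norm > 0.0:
            # M-step: new weights are the normalized posterior mass.
            weights = [x / norm for x in totals]
    return weights


if __name__ == "__main__":
    # Two LMs; the first consistently predicts the history better.
    history = [[0.1, 0.01]] * 5
    print(reestimate_weights(history, [0.5, 0.5], decay=0.9, iters=2))
```

With a history that the first LM predicts better, the first weight
grows at the expense of the second, and the weights still sum to one.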
You could then also use the -cache option to add a unigram cache LM into
the mix (but with a static mixture weight, given by -cache-lambda).
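
As a sketch of what a unigram cache with a static weight amounts to,
the following Python shows the idea. The class UnigramCache, its
methods, and the interpolate helper are hypothetical names for this
example and are not SRILM code; cache_lambda plays the role of the
-cache-lambda weight.

```python
from collections import Counter, deque


class UnigramCache:
    """Unigram model over the most recent max_size words."""

    def __init__(self, max_size=100):
        self.max_size = max_size
        self.window = deque()
        self.counts = Counter()

    def add(self, word):
        self.window.append(word)
        self.counts[word] += 1
        if len(self.window) > self.max_size:
            # Drop the oldest word so the cache tracks recent history.
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def prob(self, word):
        if not self.window:
            return 0.0
        return self.counts[word] / len(self.window)

    def reset(self):
        self.window.clear()
        self.counts.clear()


def interpolate(p_ngram, p_cache, cache_lambda):
    """Static linear interpolation of the n-gram and cache models."""
    return (1.0 - cache_lambda) * p_ngram + cache_lambda * p_cache


if __name__ == "__main__":
    cache = UnigramCache(max_size=3)
    for w in ["a", "b", "a"]:
        cache.add(w)
    print(interpolate(0.1, cache.prob("a"), 0.05))
```

The reset() method is included because a cache of this kind needs to
be cleared at some boundary (for example between documents or
sessions); where that boundary lies depends on the application.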
The issues about integration with the decoder raised by Nickolay sound
more serious. I'm sorry I cannot help with those.
Andreas
> - Simon
>
>
>
>> On Mon, 20/06/2011 at 20:29 +0200, Simon Andersson wrote:
>>
>>> Nickolay Shmyrev reports that he included the feature in Sphinx 4:
>>>
>>> http://nsh.nexiwave.com/2009/11/using-srilm-server-in-sphinx4.html
>>>
>>> (He also confirmed to me that it is not in PocketSphinx.)
>>>
>>> I'll use Nickolay's code as a reference when making a PocketSphinx
>>> version.
>>>
>> Hello Simon
>>
>> If your goal is only to implement a cache-based LM, using SRILM as a
>> server doesn't seem like an easy route, and there are several important
>> points you need to take care of:
>>
>> 1. During the initialization stage, the decoder requests *all* unigram
>> probabilities to build the lextree. You definitely don't want them to be
>> in the cache, so you need to disable the cache during initialization.
>>
>> 2. During the search, the decoder stores unigram probabilities internally
>> in the lextree. Most words are pruned before they reach the leaves, so a
>> cache on the server will not help you, since the probabilities in the
>> lextree stay the same. You need to adjust the weights inside the lextree.
>>
>> 3. You need some way to reset the cache.
>>
>> I suggest you discuss the implementation on the cmusphinx-devel
>> mailing list instead, since this is not really an SRILM issue.
>>
>>
>>
>
>