[SRILM User List] Cache model, ngram server

Mon Jun 20 12:49:12 PDT 2011

(I'm posting this message to both the SRILM and Sphinx lists...)

What I want to do is to construct language models that can change
according to application context. The context-sensitive LM could be built
by interpolating one or two trigram models (e.g., general background model
+ domain model) and a small unigram model (the 'cache' model).
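
To make the idea concrete, here is a rough Python sketch of the kind of
linear interpolation I have in mind. Everything in it is an assumption
for illustration only: the class and function names are made up, the
weights (0.6, 0.3, 0.1) are arbitrary, and the two trigram models are
stand-in callables rather than real SRILM or Sphinx objects.

```python
from collections import Counter


class UnigramCache:
    """Hypothetical unigram 'cache' model built from recently seen words."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, words):
        # Add recently recognized words to the cache.
        for w in words:
            self.counts[w] += 1
            self.total += 1

    def prob(self, word):
        # Relative frequency in the cache; 0.0 if the cache is empty.
        return self.counts[word] / self.total if self.total else 0.0


def interpolated_prob(word, history, background, domain, cache,
                      weights=(0.6, 0.3, 0.1)):
    """Linear interpolation of two trigram models and a unigram cache.

    `background` and `domain` are assumed to be callables that return
    P(word | history). The weights are illustrative, not tuned values.
    """
    wb, wd, wc = weights
    if cache.total == 0:
        # With an empty cache, give its weight back to the static models
        # so that the mixture weights still sum to one.
        scale = wb + wd
        wb, wd, wc = wb / scale, wd / scale, 0.0
    return (wb * background(word, history)
            + wd * domain(word, history)
            + wc * cache.prob(word))
```

A toy call would pass two functions for the trigram models, e.g.
`interpolated_prob("hello", ("say", "oh"), bg, dom, cache)`. In a real
system the weights would presumably be tuned on held-out data.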

Would it not make sense to use the SRILM server feature for this?

- Simon

> On Mon, 20/06/2011 at 20:29 +0200, Simon Andersson wrote:
>> Nickolay Shmyrev reports that he included the feature in Sphinx 4:
>>
>> http://nsh.nexiwave.com/2009/11/using-srilm-server-in-sphinx4.html
>>
>> (He also confirmed to me that it is not in PocketSphinx.)
>>
>> I'll use Nickolay's code as a reference when making a PocketSphinx
>> version.
>
> Hello Simon
>
> If your goal is only to implement a cache-based LM, using SRILM as a
> server doesn't seem like an easy way, and there are several important
> points you need to take care of:
>
> 1. During the initialization stage, the decoder requests *all* unigram
> probabilities to build the lextree. You definitely don't want them to be
> in the cache, so you need to disable the cache during initialization.
>
> 2. During the search, the decoder stores unigram probabilities
> internally in the lextree. Most of the words are pruned before they
> reach the leaves, so a cache on the server will not help you, since the
> probabilities will be the same. You need to adjust the weights inside
> the lextree.
>
> 3. You need to reset the cache somehow.
>
> Well, I suggest you discuss this implementation question on the
> cmusphinx-devel mailing list instead, since this is not really an SRILM
> issue.
>
>
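
Points 1 and 3 in the quoted reply (keeping the cache out of the way
during decoder initialization, and resetting it) could be handled with
something like the Python sketch below. It is only an illustration: the
class name and the `enabled` flag are hypothetical and are not part of
SRILM, Sphinx 4, or PocketSphinx. It does not address point 2, which
requires changing the weights inside the decoder's lextree.

```python
from collections import Counter


class GatedCache:
    """Hypothetical unigram cache that can be disabled and reset."""

    def __init__(self):
        self.counts = Counter()
        # Start disabled so that lextree construction sees no cache effect.
        self.enabled = False

    def observe(self, words):
        # Record recently recognized words.
        self.counts.update(words)

    def prob(self, word):
        # Contribute nothing while disabled or empty.
        total = sum(self.counts.values())
        if not self.enabled or total == 0:
            return 0.0
        return self.counts[word] / total

    def reset(self):
        # Forget everything, e.g. when the application context changes.
        self.counts.clear()
```

The caller would set `enabled = True` once initialization has finished
and call `reset()` whenever the context changes.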