[SRILM User List] linear interpolation of different vocabulary language models

Tue Jan 8 23:34:39 PST 2013

On 1/8/2013 6:07 PM, Marta Ruiz wrote:
> Thanks Andreas, two more questions
>
>
>     1. Create a word-based version of each model.  For example, you
>     can construct a POS-based LM and combine it with a class
>     membership mapping (in classes-format, see man page) to get a
>     word-level POS-based model.   Similar with lemma-based LMs (the
>     lemmas are effectively word classes).
>
>
> What is the command to do this?

1. Create the class-to-word mapping file (in the format described here
<http://www.speech.sri.com/projects/srilm/manpages/classes-format.5.html>)
to reflect either your POS-to-word or lemma-to-word mapping.
2. Process the training data to replace the words with POS tags or lemmas,
as appropriate.
3. Train the N-gram portion of the LM by running ngram-count on the
training data represented as a sequence of POS tags / lemmas (from step 2).
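
Steps 1 and 2 could be sketched roughly as follows. This is only an illustration, not part of SRILM: it assumes the training data is already tagged in a hypothetical word/TAG token format, and the function names are made up for the example. It writes out the tag sequences for ngram-count and one class expansion per line ("class probability word"), with the membership probabilities estimated by relative frequency.

```python
from collections import Counter, defaultdict


def split_tagged(line, sep="/"):
    """Split 'word/TAG' tokens into (word, tag) pairs.

    The 'word/TAG' layout and the separator are assumptions about the
    tagger output; adjust to whatever your tagger or lemmatizer emits.
    """
    pairs = []
    for token in line.split():
        word, _, tag = token.rpartition(sep)
        pairs.append((word, tag))
    return pairs


def build_class_data(tagged_lines):
    """Return (tag_lines, class_lines) for a tagged corpus.

    tag_lines:   training text with each word replaced by its tag (step 2)
    class_lines: 'TAG prob word' expansions, prob = count(word, TAG) / count(TAG)
    """
    counts = defaultdict(Counter)
    tag_lines = []
    for line in tagged_lines:
        pairs = split_tagged(line)
        tag_lines.append(" ".join(tag for _, tag in pairs))
        for word, tag in pairs:
            counts[tag][word] += 1

    class_lines = []
    for tag in sorted(counts):
        total = sum(counts[tag].values())
        for word in sorted(counts[tag]):
            class_lines.append(f"{tag} {counts[tag][word] / total:.6g} {word}")
    return tag_lines, class_lines


if __name__ == "__main__":
    corpus = ["the/DT dog/NN barks/VBZ", "the/DT cat/NN sleeps/VBZ"]
    tags, classes = build_class_data(corpus)
    print("\n".join(tags))      # text to feed to ngram-count
    print("\n".join(classes))   # contents of the class mapping file
```

With the tag sequences saved to a file, step 3 would then be something like "ngram-count -text tags.txt -order 3 -lm pos.lm" (the file names here are placeholders).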

>     2. Then interpolate the models using
>
>         ngram -bayes 0 -lm LM1 -mix-lm LM2 -mix-lm2 LM3 .... -lambda
>     ... -mix-lambda2 ... -classes CLASSES
>
>     where CLASSES is a classes-format(5) file defining the union of
>     all the word classes used in the various component models.
>
>
> To find the lambdas, I can use compute-best-mix, can't I?
Exactly.
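
For intuition, the estimation behind this is EM over the per-word probabilities that each component model assigns to held-out data. The following is a minimal sketch of that idea, not the actual compute-best-mix code; the function name and the input layout (one tuple of probabilities per held-out word, one entry per model) are assumptions made for the example.

```python
def estimate_mixture_weights(probs, iterations=50):
    """Estimate interpolation weights by EM.

    probs: list of tuples; probs[t][i] is the probability model i assigns
           to held-out word t. All probabilities are assumed to be > 0.
    Returns a list of weights (lambdas) that sum to 1.
    """
    k = len(probs[0])
    lambdas = [1.0 / k] * k  # start from uniform weights
    for _ in range(iterations):
        posterior_sums = [0.0] * k
        for word_probs in probs:
            # Probability of this word under the current mixture.
            mix = sum(l * p for l, p in zip(lambdas, word_probs))
            # E-step: posterior share of each model for this word.
            for i in range(k):
                posterior_sums[i] += lambdas[i] * word_probs[i] / mix
        # M-step: new weight is the average posterior share.
        lambdas = [s / len(probs) for s in posterior_sums]
    return lambdas


if __name__ == "__main__":
    heldout = [(0.5, 0.1), (0.4, 0.1), (0.1, 0.3)]
    print(estimate_mixture_weights(heldout))
```

In practice you would not compute this by hand: run each model over the same held-out text with "ngram -debug 2 -ppl" and pass the resulting output files to compute-best-mix, which prints the estimated lambdas.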

Andreas
