[SRILM User List] linear interpolation of different vocabulary language models
Andreas Stolcke
stolcke at icsi.berkeley.edu
Tue Jan 8 23:34:39 PST 2013
On 1/8/2013 6:07 PM, Marta Ruiz wrote:
> Thanks Andreas, two more questions
>
>
> 1. Create a word-based version of each model. For example, you
> can construct a POS-based LM and combine it with a class
> membership mapping (in classes-format, see man page) to get a
> word-level POS-based model. Similar with lemma-based LMs (the
> lemmas are effectively word classes).
>
>
> What is the command to do this?
1. Create the class-to-word mapping file (in the format described at
   <http://www.speech.sri.com/projects/srilm/manpages/classes-format.5.html>)
   to reflect either your POS-to-word or lemma-to-word mapping.
2. Process the training data to replace the words with POS tags or lemmas,
   as appropriate.
3. Train the ngram portion of the LM by running ngram-count on the
   training data represented as a sequence of POS tags / lemmas (from step 2).
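
Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration, not part of SRILM: it assumes the training data is already tagged as (word, tag) pairs, and the function name and toy corpus are made up for the example. It emits classes-format lines of the form "CLASS probability word", with the probability estimated as the relative frequency of the word within its tag, plus the tag-sequence text that ngram-count would be trained on in step 3.

```python
from collections import Counter

def build_class_lm_inputs(tagged_sentences):
    """tagged_sentences: list of sentences, each a list of (word, tag) pairs.

    Returns (classes_lines, tag_lines):
      classes_lines -- one "TAG prob word" line per observed (tag, word) pair
      tag_lines     -- the training text with every word replaced by its tag
    """
    pair_counts = Counter()
    tag_counts = Counter()
    tag_lines = []
    for sentence in tagged_sentences:
        for word, tag in sentence:
            pair_counts[(tag, word)] += 1
            tag_counts[tag] += 1
        tag_lines.append(" ".join(tag for _, tag in sentence))

    classes_lines = []
    for (tag, word), n in sorted(pair_counts.items()):
        # Relative-frequency estimate of P(word | tag).
        prob = n / tag_counts[tag]
        classes_lines.append(f"{tag} {prob:.6g} {word}")
    return classes_lines, tag_lines

# Hypothetical toy corpus, for illustration only.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
classes_lines, tag_lines = build_class_lm_inputs(corpus)
```

Writing classes_lines to one file and tag_lines to another gives the two inputs: the tag file is the training text for ngram-count, and the classes file is what is later passed to ngram with -classes. The same sketch applies to lemmas if the second element of each pair is the lemma instead of the POS tag.
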
> 2. Then interpolate the models using
>
> ngram -bayes 0 -lm LM1 -mix-lm LM2 -mix-lm2 LM3 .... -lambda
> ... -mix-lambda2 ... -classes CLASSES
>
> where CLASSES is a classes-format(5) file defining the union of
> all the word classes used in the various component models.
>
>
> To find the lambdas, I can use compute-best-mix, can't I?
Exactly.
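
For intuition, the weight estimation can be sketched as a small EM loop in Python. This is not SRILM's code, only an illustration of the general technique of fitting linear interpolation weights on held-out data. It assumes you already have, for every held-out token, the probability each component model assigns to it (the kind of per-word output that ngram -debug 2 -ppl produces), and that every token gets a nonzero probability from at least one model. The function name and the numbers are made up for the example.

```python
def estimate_mixture_weights(probs, iterations=50):
    """probs[t][k] is the probability model k assigns to held-out token t.

    Returns one interpolation weight per model, fitted by EM to
    increase the held-out likelihood of the linear mixture.
    """
    n_models = len(probs[0])
    lambdas = [1.0 / n_models] * n_models  # start from uniform weights
    for _ in range(iterations):
        posteriors = [0.0] * n_models
        for token_probs in probs:
            # Mixture probability of this token under the current weights.
            mix = sum(l * p for l, p in zip(lambdas, token_probs))
            # E-step: share of the token's probability owed to each model.
            for k in range(n_models):
                posteriors[k] += lambdas[k] * token_probs[k] / mix
        # M-step: new weight is the average posterior share.
        lambdas = [c / len(probs) for c in posteriors]
    return lambdas

# Hypothetical per-token probabilities from two models on four tokens.
heldout = [(0.20, 0.05), (0.10, 0.02), (0.01, 0.30), (0.15, 0.10)]
weights = estimate_mixture_weights(heldout)
```

The resulting weights sum to one and play the role of the values given to -lambda, -mix-lambda2 and so on in the ngram command above. They should be estimated on held-out data, not on the text the component models were trained on.
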
Andreas