[SRILM User List] linear interpolation of different vocabulary language models
stolcke at icsi.berkeley.edu
Tue Jan 8 23:34:39 PST 2013
On 1/8/2013 6:07 PM, Marta Ruiz wrote:
> Thanks Andreas, two more questions
> 1. Create a word-based version of each model. For example, you
> can construct a POS-based LM and combine it with a class
> membership mapping (in classes-format, see man page) to get a
> word-level POS-based model. Similar with lemma-based LMs (the
> lemmas are effectively word classes).
> which is the instruction to do this?
1. You create the class-to-word mapping file (in the format described
in the classes-format(5) man page) to reflect either your POS-to-word
or lemma-to-word mapping.
2. Process the training data to replace the words with POS or lemmas, as
3. Train the ngram portion of the LM by running ngram-count on the
training data represented as a sequence of POS tags / lemmas (from step 2).
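
Steps 1 and 2 above could be sketched roughly as follows. This is only an
illustration, not part of SRILM: the function name, the input layout (a
list of sentences, each a list of (word, tag) pairs) and the use of
relative frequencies for the word-given-class probabilities are all
assumptions. Each class line is written as "class p word", which is my
reading of the single-word expansion form in classes-format(5).

```python
from collections import Counter, defaultdict


def build_class_lm_inputs(tagged_sentences):
    """Return (class_lines, tag_lines) for a tagged corpus.

    tagged_sentences is a hypothetical input layout: a list of sentences,
    each a list of (word, tag) pairs. Tags may be POS tags or lemmas.
    """
    counts = defaultdict(Counter)  # tag -> word -> count
    tag_lines = []
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[tag][word] += 1
        # Step 2: the sentence rewritten as a sequence of tags.
        tag_lines.append(" ".join(tag for _, tag in sentence))

    # Step 1: one "class p word" line per (tag, word) pair, with p set to
    # the relative frequency of the word within its class (an assumption).
    class_lines = []
    for tag in sorted(counts):
        total = sum(counts[tag].values())
        for word in sorted(counts[tag]):
            p = counts[tag][word] / total
            class_lines.append(f"{tag} {p:.6g} {word}")
    return class_lines, tag_lines


example = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
class_lines, tag_lines = build_class_lm_inputs(example)
print("\n".join(class_lines))
print("\n".join(tag_lines))
```

Writing tag_lines to a file (say tags.txt, a placeholder name) gives the
input for step 3, e.g. "ngram-count -text tags.txt -order 3 -lm pos.lm",
and class_lines written to a file is what -classes would point to. It is
probably safest to choose tag names that cannot collide with actual words.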
> 2. Then interpolate the models using
> ngram -bayes 0 -lm LM1 -mix-lm LM2 -mix-lm2 LM3 .... -lambda
> ... -mix-lambda2 ... -classes CLASSES
> where CLASSES is a classes-format(5) file defining the union of
> all the word classes used in the various component models.
> to find the lambdas can I use the compute-best-mix, can't I?
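
For reference, the lambdas in a linear interpolation are commonly
estimated by EM on held-out data. The sketch below shows that general
technique only; it is not the compute-best-mix script, and the function
name and the input layout (one list of per-token probabilities per model,
all positive) are assumptions made for the illustration.

```python
def estimate_mixture_weights(probs, iterations=50):
    """Estimate linear interpolation weights by EM.

    probs[i][t] is the probability that model i assigns to held-out
    token t (hypothetical layout; every value must be positive).
    """
    n_models = len(probs)
    n_tokens = len(probs[0])
    lambdas = [1.0 / n_models] * n_models  # start from uniform weights
    for _ in range(iterations):
        totals = [0.0] * n_models
        for t in range(n_tokens):
            # Probability of token t under the current mixture.
            mix = sum(lambdas[i] * probs[i][t] for i in range(n_models))
            # E-step: posterior share of each model for this token.
            for i in range(n_models):
                totals[i] += lambdas[i] * probs[i][t] / mix
        # M-step: new weight is the average posterior share.
        lambdas = [total / n_tokens for total in totals]
    return lambdas


# Toy data: the first model is better on every held-out token.
toy_probs = [
    [0.5, 0.4, 0.6],
    [0.1, 0.1, 0.1],
]
weights = estimate_mixture_weights(toy_probs)
print(weights)
```

On the toy data the weight of the first model moves toward 1, since it
assigns the higher probability to every token. With real models the
per-token probabilities would come from evaluating each LM on the same
held-out text.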