[SRILM User List] linear interpolation of different vocabulary language models
stolcke at icsi.berkeley.edu
Tue Jan 8 12:15:57 PST 2013
On 1/8/2013 3:15 AM, Marta Ruiz wrote:
> Dear all,
> How can I interpolate language models built on the same text but with
> different vocabularies? I mean, I have a text with words, lemmas and PoS;
> how can I interpolate the language models?
You cannot interpolate models that use different types of vocabularies.
(You can interpolate models that are all word-based but differ in the
sets of words they contain. Words that do not occur in a given submodel
implicitly have probability zero in that submodel.)
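
To illustrate the parenthetical remark, here is a minimal sketch of linear interpolation over word-based models with differing word sets. The toy unigram tables (lm1, lm2), the weights, and the function name interpolate are all hypothetical; real SRILM models also condition on the n-gram history, which is omitted here for brevity.

```python
def interpolate(word, models, weights):
    """Linearly interpolate word probabilities from several models.

    A word missing from a model contributes probability zero for that model.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(lam * model.get(word, 0.0) for model, lam in zip(models, weights))


# Hypothetical toy unigram models; "cat" is absent from lm2.
lm1 = {"the": 0.5, "dog": 0.3, "cat": 0.2}
lm2 = {"the": 0.6, "dog": 0.4}
weights = [0.7, 0.3]

p_dog = interpolate("dog", [lm1, lm2], weights)  # 0.7 * 0.3 + 0.3 * 0.4
p_cat = interpolate("cat", [lm1, lm2], weights)  # 0.7 * 0.2 + 0.3 * 0.0
```

In this sketch "cat" still gets a nonzero mixture probability, because the first model covers it; only the second term vanishes.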
So what you need to do is:
1. Create a word-based version of each model. For example, you can
construct a POS-based LM and combine it with a class membership mapping
(in classes-format; see the classes-format(5) man page) to get a word-level
POS-based model. The same applies to lemma-based LMs (the lemmas are
effectively word classes).
2. Then interpolate the models using

    ngram -bayes 0 -lm LM1 -mix-lm LM2 -mix-lm2 LM3 .... -lambda ...
        -mix-lambda2 ... -classes CLASSES

where CLASSES is a classes-format(5) file defining the union of all the
word classes used in the various component models.
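
As a rough sketch of how the class membership mappings from step 1 and their union for step 2 might be produced, the following Python builds classes-format lines of the form "CLASS p word", where p is estimated as count(class, word) / count(class). The toy corpus, the function name class_expansions, and the POS_/LEMMA_ prefixes are assumptions made for illustration; the prefixes are only there to keep class names distinct from ordinary words, and the POS and lemma LMs would then have to be trained on the same prefixed tokens.

```python
from collections import Counter


def class_expansions(pairs):
    """Turn (class, word) pairs into classes-format lines 'CLASS p word'."""
    pair_counts = Counter(pairs)  # (class, word) -> count
    class_totals = Counter()      # class -> total count
    for (cls, _word), n in pair_counts.items():
        class_totals[cls] += n
    return [
        f"{cls} {n / class_totals[cls]:.6g} {word}"
        for (cls, word), n in sorted(pair_counts.items())
    ]


# Hypothetical toy corpus of (word, lemma, POS) triples.
corpus = [
    ("the", "the", "DT"), ("dogs", "dog", "NNS"), ("bark", "bark", "VBP"),
    ("the", "the", "DT"), ("dog", "dog", "NN"), ("barks", "bark", "VBZ"),
]

# One class per POS tag and one class per lemma.
pos_lines = class_expansions((f"POS_{pos}", word) for word, _lemma, pos in corpus)
lemma_lines = class_expansions((f"LEMMA_{lemma}", word) for word, lemma, _pos in corpus)

# The union of both mappings is what a single CLASSES file would contain.
union_lines = pos_lines + lemma_lines

if __name__ == "__main__":
    print("\n".join(union_lines))
```

Redirecting the printed output to a file would give a candidate CLASSES file for the -classes option; the exact probabilities and class naming scheme are a matter of how the component models were trained.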