# [Q] on mix-lm?

Anand Venkataraman anand at speech.sri.com
Wed Oct 2 22:49:49 PDT 2002

Dear Woosung,

>I am doing some experiments using interpolated LMs, and
>I've noticed that mixed LMs give slightly different
>PPLs from PPLs that should be. I mean, PPLs calculated
>by getting weighted sums after getting respective
>models' word probs.  Do you have any documentation or
>explanation of how 'mix-lm' works in your toolkit or
>how it differs from the correct way?

There is no one "correct way".  But I presume that by
"correct" you mean the unmixed estimation procedure.

mix-lm simply computes \sum_j \lambda_j P_j(w|h),
where P_j(w|h) is model j's backed-off ngram word-level
probability.  You can in fact calculate this value by
hand quite easily from the individual ngram -ppl
outputs using the above expression.
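As a sketch of that hand calculation (the per-word probabilities and weight below are made up for illustration; in practice you would read them off the per-word output of ngram -debug 2 -ppl):

```python
import math

# Hypothetical per-word probabilities P_j(w|h) for the same test words
# under two component LMs (illustrative numbers, not real output).
p1 = [0.12, 0.05, 0.30, 0.08]   # word probs under LM 1
p2 = [0.20, 0.02, 0.25, 0.10]   # word probs under LM 2
lam = 0.6                        # interpolation weight for LM 1 (assumed)

# Word-level linear interpolation, then perplexity of the mixture.
mix = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
logprob = sum(math.log(p) for p in mix)
ppl = math.exp(-logprob / len(mix))
print(ppl)
```

The point is that the mixing happens per word, inside the log, which is why the mixed PPL is not a simple function of the component PPLs.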

However, there is a slight nuance involved.  One should
generally use lambdas that were estimated to maximize
the likelihood of some held-out data in the domain.
The awk script compute-best-mix will do this for you.
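The weight estimation is a standard EM iteration; here is a minimal two-model sketch of the idea (the held-out word probabilities are illustrative, and this is not the compute-best-mix script itself):

```python
# EM sketch for tuning a two-way interpolation weight on held-out data.
# Probabilities are made-up stand-ins for per-word held-out probs.
p1 = [0.12, 0.05, 0.30, 0.08]   # held-out word probs under LM 1
p2 = [0.20, 0.02, 0.25, 0.10]   # held-out word probs under LM 2
lam = 0.5                        # start from a uniform weight

for _ in range(50):
    # E-step: posterior probability that each word came from LM 1
    post = [lam * a / (lam * a + (1 - lam) * b) for a, b in zip(p1, p2)]
    # M-step: the new weight is the average posterior
    lam = sum(post) / len(post)

print(lam)
```

Each iteration is guaranteed not to decrease the held-out likelihood, which is the property relied on below.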

You can also calculate a sentence-level mixture,
similarly interpolated with tuned weights (see
compute-best-sentence-mix).  This uses sentence-level
probabilities (as obtained, for instance, from ngram
-debug 1 -ppl).
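The difference from the word-level case is just where the interpolation happens; a sketch (the per-sentence log10 probs are illustrative of what ngram -debug 1 -ppl reports, not real output):

```python
# Sentence-level mixture sketch: interpolate whole-sentence
# probabilities rather than per-word ones (illustrative numbers).
logp1 = [-12.3, -8.7]   # per-sentence log10 probs under LM 1
logp2 = [-11.9, -9.4]   # per-sentence log10 probs under LM 2
lam = 0.6               # assumed tuned weight

mix = [lam * 10 ** a + (1 - lam) * 10 ** b
       for a, b in zip(logp1, logp2)]
print(mix)
```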

>experiments, mix-lm gives better results when the
>baseline model (before mixing) is good (PPL less than
>300), but it gives worse results when it is not good
>(PPL above 500).
>

Regardless of the quality of the LMs, the mixture's
likelihood on the held-out set should always be at
least as large as that of the most likely component,
because the EM procedure that computes the best weights
maximizes this quantity.  Of course the test-set
likelihood (and consequently the PPL) is not
necessarily better, but usually is.
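The guarantee on the held-out set holds simply because putting all the weight on one component is itself a valid mixture; a small numeric check of that reasoning (probabilities again illustrative):

```python
import math

# Held-out word probs under two component LMs (illustrative numbers).
p1 = [0.12, 0.05, 0.30, 0.08]
p2 = [0.20, 0.02, 0.25, 0.10]

def loglik(lam):
    """Held-out log-likelihood of the lam-weighted mixture."""
    return sum(math.log(lam * a + (1 - lam) * b) for a, b in zip(p1, p2))

# lam = 1.0 and lam = 0.0 recover the single components,
# so the best mixture can never fall below the best component.
best_component = max(loglik(1.0), loglik(0.0))
best_mix = max(loglik(l / 100) for l in range(101))  # grid over lambda
print(best_mix, best_component)
```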

hope this helps.

&