Andreas Stolcke stolcke at
Wed Jul 24 09:01:30 PDT 2002


your email to the list did not go through because at the time you
sent it you were not subscribed to the list (to prevent spam we 
only allow list members to post).

Regarding your question: indeed the perplexity of the mixed LM should
be much closer to what compute-best-mix outputs.

There are two ways to create an interpolated model:

"on-the-fly" 	this is the traditional approach: you keep the component
		models separate, and compute the interpolated probabilities
		when you evaluate the model

		The command for this is 

			ngram -lm ... -mix-lm ... -lambda L -bayes 0

"merged"	you create a single static model that implements an
		approximation to the on-the-fly method

		The command for this is 

			ngram -lm ... -mix-lm ... -lambda L 

		(no -bayes option).
		The -write-lm option outputs the merged model if desired.

In the "merged" case you only get an approximation because in general
it is not possible to create a single back-off model that exactly
implements the mixed probabilities of the two component models (without
expanding out all possible N-grams and effectively bypassing the 
backoff mechanism).
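To make the distinction concrete, here is a minimal sketch of what the "on-the-fly" method computes. The flat dictionaries stand in for full back-off N-gram models (a toy illustration, not SRILM's internals): the mixture probability is simply the lambda-weighted sum of the component probabilities, evaluated at query time.

```python
import math

# Hypothetical toy component models: p(w | history) for a fixed history.
# Real SRILM models are back-off N-gram models; flat dicts suffice to
# illustrate the linear interpolation that "ngram ... -bayes 0" performs.
lm1 = {"the": 0.5, "cat": 0.3, "sat": 0.2}
lm2 = {"the": 0.4, "cat": 0.1, "sat": 0.5}

lam = 0.85  # mixture weight for lm1 (the -lambda option)

def mix_prob(word):
    """On-the-fly interpolated probability: lam*p1 + (1-lam)*p2."""
    return lam * lm1[word] + (1 - lam) * lm2[word]

def perplexity(words):
    """Perplexity of the mixture over a word sequence."""
    log_sum = sum(math.log(mix_prob(w)) for w in words)
    return math.exp(-log_sum / len(words))

print(perplexity(["the", "cat", "sat"]))
```

The "merged" model instead tries to encode these mixed probabilities into a single back-off structure, which is why it is only an approximation.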

As explained in the ICSLP paper, the "merged" approach is usually slightly
better than the traditional interpolation.  However, it only works if 
you have two models of the same type (both word-based or both class-based).
When you merge a word-based and a class-based model the approximation 
doesn't work anymore.  I suspect that's what you did in your experiment.
Rerun ngram with the -bayes 0 option and see if you get the perplexity 
you expect.


In message <002101c2332c$9d6cb540$7b081b93 at> you wrote:
> Hi Andreas,
> Five days ago I sent an e-mail to srilm-user at 
> and I still haven't received an answer.
> I repeat it here. Please inform me...
> Hi
> I interpolated a 3-gram with a class-based 3-gram.
> The output of compute-best-mix is:
> compute-best-mix debug2-LM1 debug2-LM2
> iteration 19, lambda = (0.849536 0.150464), ppl = 150.787
> The perplexity of the interpolated model on the held-out data I used to tune 
> debug2-LM1 and debug2-LM2 is 169.52.
> Shouldn't this be the same as the output of compute-best-mix, i.e. 150.787?
> Am I doing something wrong?
> Regards,
> Dimitris
