[SRILM User List] Inconsistency between mix-lm and compute-best-mix?

Anoop Deoras adeoras at jhu.edu
Fri Apr 29 12:47:43 PDT 2011


I am trying to interpolate two LMs, and I get inconsistent results
when I use two different methods for the interpolation.

Here is my setup:

I have two LMs, LM1 and LM2, and a text corpus, TEXT.

Step 1: Produce a debug file for each of LM1 and LM2 by running the
ngram tool with the -debug 2 option.
Let's call them DEBUG1 and DEBUG2:

	ngram -lm LM1 -order 4 -unk -vocab VOCAB -ppl TEXT -debug 2 > DEBUG1
	ngram -lm LM2 -order 4 -unk -vocab VOCAB -ppl TEXT -debug 2 > DEBUG2
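Both later steps work from the per-word probabilities stored in DEBUG1 and DEBUG2. As a rough illustration, the hypothetical helper read_word_probs below pulls them out, assuming each scored word appears on a line such as "p( this | <s> ) = [2gram] 0.00345 [ -2.462 ]". That line layout and the regex are assumptions for this sketch, not a documented SRILM format.

```python
import re

# Hypothetical parser. ASSUMPTION: each scored word in `ngram -debug 2`
# output looks like "\tp( word | context ) \t= [2gram] 0.00345 [ -2.462 ]".
WORD_LINE = re.compile(r"^\s*p\(\s*(\S+)\s*\|.*?\)\s*=\s*\[([^\]]+)\]\s*(\S+)")


def read_word_probs(lines):
    """Return (word, probability) pairs in text order, skipping OOV entries."""
    probs = []
    for line in lines:
        m = WORD_LINE.match(line)
        if m is None:
            continue  # sentence text, summary lines, blank lines
        word, tag, prob = m.group(1), m.group(2).strip(), float(m.group(3))
        if tag == "OOV":
            continue  # OOVs do not count toward perplexity
        probs.append((word, prob))
    return probs
```

Reading DEBUG1 and DEBUG2 this way gives two aligned lists, because both runs score the same TEXT with the same VOCAB.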

Step 2: Get the optimal weights using the command:
	compute-best-mix DEBUG1 DEBUG2
	Let the final best perplexity obtained be denoted as PPL_Step2
	Let the weights be LAMBDA and 1-LAMBDA, where LAMBDA is the
	weight of LM1.
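My understanding is that compute-best-mix estimates the weights with EM on the per-word probabilities from the debug files. The following is a minimal two-model sketch of that idea, not the script's actual code. The function name best_mix and its parameters are hypothetical, and the inputs are assumed to be aligned lists of nonzero probabilities.

```python
def best_mix(p1, p2, lam=0.5, tol=1e-6, max_iter=1000):
    """EM estimate of lam, the weight of LM1 in lam*P1 + (1-lam)*P2.

    p1, p2: aligned, nonzero per-word probabilities of the same text
    under LM1 and LM2 (hypothetical inputs taken from DEBUG1 and DEBUG2).
    """
    for _ in range(max_iter):
        # E-step: posterior probability that each word came from LM1.
        post = [lam * a / (lam * a + (1 - lam) * b) for a, b in zip(p1, p2)]
        # M-step: the new weight is the mean posterior.
        new_lam = sum(post) / len(post)
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam
```

Each EM iteration cannot increase the perplexity of the mixture on TEXT, so the returned weight is tuned to exactly the per-word probabilities recorded in Step 1.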

Step 3: Combine LM1 and LM2 linearly with the weights found above and
compute the perplexity:
	ngram -lm LM1 -order 4 -unk -vocab VOCAB -ppl TEXT -mix-lm LM2 -lambda LAMBDA
	Let the resulting perplexity be denoted as PPL_Step3.
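For comparison, the perplexity of a plain word-by-word mixture at a given LAMBDA can be computed directly from DEBUG1 and DEBUG2. The hypothetical helper mixture_ppl below does this, assuming aligned per-word probabilities (including </s>) with OOVs removed. If ngram -mix-lm applied exactly this per-word mixture, PPL_Step3 should match its result, and at the weight from compute-best-mix it should essentially reproduce PPL_Step2. A mismatch would suggest that ngram is computing something other than this simple per-word mixture.

```python
import math


def mixture_ppl(p1, p2, lam):
    """Perplexity of the word-level linear mixture lam*P1 + (1-lam)*P2.

    p1, p2: aligned per-word probabilities (including </s>) of the same
    text under LM1 and LM2, with OOV words already removed (assumption).
    """
    if len(p1) != len(p2) or not p1:
        raise ValueError("need two non-empty, aligned probability lists")
    # Perplexity as 10 ** (-mean log10 probability), matching SRILM's log10 reports.
    log10_sum = sum(math.log10(lam * a + (1 - lam) * b) for a, b in zip(p1, p2))
    return 10 ** (-log10_sum / len(p1))
```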

In my setup, PPL_Step3 turns out to be greater than PPL_Step2, and I
don't understand why.
Am I missing something when combining the models?
Any pointers would be appreciated.

Thanks and Regards

