[SRILM User List] Count-lm reference request

E otheremailid at aol.com
Mon Sep 30 22:46:33 PDT 2013


I'm trying to understand the meaning of the "google.count.lm0" file given in the FAQ section on creating an LM from the Web1T corpus. From what I read in Sec. 11.4.1, "Deleted Interpolation Smoothing", of Spoken Language Processing by Huang et al., the bigram case (equation 11.22) is:

P(w_i | w_{i-1}) = \lambda * P_{MLE}(w_i | w_{i-1}) + (1 - \lambda) * P(w_i)

They call the \lambda's the mixture weights. I wonder whether these are conceptually the same as the weights used in google.countlm. If so, why are they arranged in a 15x5 matrix? Where can I read more about this?
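
To make my reading of equation 11.22 concrete, here is a minimal Python sketch of the bigram case with a single fixed \lambda. The toy sentence, the function name, and the variable names are my own assumptions for illustration. This is not SRILM code, and it says nothing about how the count-LM file arranges its weights.

```python
from collections import Counter

def interpolated_bigram_prob(w_prev, w, unigrams, bigrams, contexts, lam):
    """P(w | w_prev) = lam * P_MLE(w | w_prev) + (1 - lam) * P(w)."""
    total = sum(unigrams.values())
    p_unigram = unigrams[w] / total            # P(w), MLE unigram estimate
    c_hist = contexts[w_prev]                  # how often w_prev occurs as a history
    # P_MLE(w | w_prev); defined as 0 when the history was never seen
    p_mle = bigrams[(w_prev, w)] / c_hist if c_hist > 0 else 0.0
    return lam * p_mle + (1.0 - lam) * p_unigram

# Toy data (made up for this example)
tokens = "a b a b a c".split()
pairs = list(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)                     # a:3, b:2, c:1
bigrams = Counter(pairs)                       # (a,b):2, (b,a):2, (a,c):1
contexts = Counter(prev for prev, _ in pairs)  # a:3, b:2

p_seen = interpolated_bigram_prob("a", "b", unigrams, bigrams, contexts, lam=0.5)
p_unseen = interpolated_bigram_prob("b", "c", unigrams, bigrams, contexts, lam=0.5)
print(p_seen, p_unseen)
```

With \lambda = 1 this reduces to the plain MLE bigram estimate, and with \lambda = 0 to the unigram estimate. An unseen bigram such as "b c" still gets nonzero probability through the unigram term.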
