[SRILM User List] Count-lm reference request
Andreas Stolcke
stolcke at icsi.berkeley.edu
Tue Oct 1 10:24:41 PDT 2013
On 9/30/2013 10:46 PM, E wrote:
> Hello,
>
> I'm trying to understand the meaning of the "google.count.lm0" file
> given in the FAQ section on creating an LM from the Web1T corpus.
> From what I read in Sec. 11.4.1, Deleted Interpolation Smoothing, in
> Spoken Language Processing by Huang et al., the bigram case is
> (equation 11.22):
>
> P(w_i | w_{i-1}) = \lambda * P_{MLE}(w_i | w_{i-1}) + (1 - \lambda) * P(w_i)
>
> They call the \lambda's mixture weights. I wonder if they are
> conceptually the same as the ones used in google.countlm. If so, why
> are they arranged in a 15x5 matrix? Where can I read more about this?
I don't have access to the book chapter you cite, but from the equation
it looks like a single fixed interpolation weight is used.
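(With made-up numbers: a fixed \lambda = 0.7 with
P_{MLE}(w_i | w_{i-1}) = 0.2 and P(w_i) = 0.05 would give
0.7 * 0.2 + 0.3 * 0.05 = 0.155 for every bigram, no matter how often
the context w_{i-1} was observed.)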
In the SRILM count-lm implementation you have separate lambdas assigned
to different groups of context ngrams, as a function of the frequency of
those contexts. This is what is called "Jelinek-Mercer" smoothing in
http://acl.ldc.upenn.edu/P/P96/P96-1041.pdf , where the bucketing of the
contexts is done based on frequency (as suggested in the paper). The
specifics are spelled out in the ngram(1) man page. The relevant bits are:
mixweights M
w01 w02 ... w0N
w11 w12 ... w1N
...
wM1 wM2 ... wMN
countmodulus m
M specifies the number of mixture weight bins (minus 1), and m is
the width of a mixture weight bin. Thus, wij is the mixture weight
used to interpolate a j-th order maximum-likelihood estimate with
lower-order estimates, given that the (j-1)-gram context has been
seen with a frequency between i*m and (i+1)*m-1 times. (For
contexts with frequency greater than M*m, the i=M weights are
used.)
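In case it is useful, here is a small Python sketch of the lookup and
interpolation as I understand it from the man page text above. This is
not SRILM code; M, m, the weight matrix, the probabilities, and the
vocabulary size are all made-up placeholders.

M = 2    # number of mixture weight bins minus 1 ("mixweights M")
m = 40   # width of a mixture weight bin ("countmodulus m")
N = 3    # ngram order of the model

# (M+1) x N matrix: row i holds the weights for contexts seen between
# i*m and (i+1)*m - 1 times; column j-1 is the weight on the j-th
# order maximum-likelihood estimate.
mixweights = [
    [0.1, 0.2, 0.3],   # rare contexts: lean on lower-order estimates
    [0.3, 0.5, 0.6],
    [0.5, 0.7, 0.8],   # frequent contexts: trust the ML estimate more
]

def mixture_weight(context_count, order):
    """Look up wij for a j-th order estimate whose (j-1)-gram context
    was seen context_count times."""
    i = min(context_count // m, M)   # frequency bucket, capped at i=M
    return mixweights[i][order - 1]

def interpolated_prob(p_ml, context_counts, vocab_size):
    """Jelinek-Mercer interpolation of ML estimates of orders 1..N:
    P_j = w * P_ml_j + (1 - w) * P_{j-1}, starting from a uniform
    order-0 distribution 1/vocab_size.  p_ml[j-1] is the j-th order
    ML estimate; context_counts[j-1] is the frequency of its
    (j-1)-gram context."""
    p = 1.0 / vocab_size
    for j in range(1, N + 1):
        w = mixture_weight(context_counts[j - 1], j)
        p = w * p_ml[j - 1] + (1.0 - w) * p
    return p

# E.g. a trigram whose bigram context was seen 95 times and whose
# unigram context was seen 4000 times; the order-1 estimate's context
# is the empty 0-gram, whose "count" is the total number of tokens:
print(interpolated_prob(p_ml=[0.01, 0.05, 0.2],
                        context_counts=[10**6, 4000, 95],
                        vocab_size=10**5))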
Andreas