[SRILM User List] ngram-count's ARPA N-gram LM extensions beyond "\end\" marker
Andreas Stolcke
stolcke at icsi.berkeley.edu
Mon Jun 24 19:44:07 PDT 2013
On 6/24/2013 10:41 AM, Sander Maijers wrote:
>
> Based on the equations you described to me and the code, I do not see
> a fundamental difference between the skip N-gram model and Jelinek-Mercer
> smoothing / deleted interpolation (Chen & Goodman, 1999, eqn. 4, p. 364).
> In the skip LM the skip probabilities take the place of the lambda
> weights in the Jelinek-Mercer equation, and are estimated in the perhaps
> special way you explained. Is there something I am missing?
Jelinek-Mercer is a way to smooth N-gram probabilities by combining
estimates based on different suffixes of the history, e.g.
p(w|w1 w2 w3) = l1 * p'(w|w1 w2 w3) + l2 * p'(w|w1 w2) + l3 * p'(w|w1)
+ l4 * p'(w) + l5 / N (N = size of vocabulary)
where p'(.) is a maximum-likelihood estimate.
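For concreteness, here is a minimal Python sketch of that interpolation (this is not SRILM code; the function names, toy counts, and weights are illustrative assumptions, and histories are shortened by dropping the most distant words first):

```python
# Minimal sketch of Jelinek-Mercer (deleted) interpolation: a weighted sum of
# maximum-likelihood estimates over successively shorter suffixes of the
# history, plus a uniform 1/N floor. Toy counts below are made up.
from collections import Counter

def ml_estimate(counts_by_hist, hist, word):
    """Maximum-likelihood p'(word | hist) from raw counts."""
    c = counts_by_hist.get(hist)
    return c[word] / sum(c.values()) if c else 0.0

def jelinek_mercer(counts_by_hist, hist, word, lambdas, vocab_size):
    """lambdas holds one weight per history length (full down to empty),
    plus a final weight for the uniform distribution 1/vocab_size."""
    p = 0.0
    for i, lam in enumerate(lambdas[:-1]):
        p += lam * ml_estimate(counts_by_hist, hist[i:], word)  # shorter suffix
    return p + lambdas[-1] / vocab_size

# Toy counts for history (w1, w2, w3) and its shorter suffixes:
counts = {
    ("w1", "w2", "w3"): Counter({"w": 1, "x": 1}),  # ML estimate 0.5
    ("w2", "w3"):       Counter({"w": 1, "x": 3}),  # ML estimate 0.25
    ("w3",):            Counter({"w": 1}),          # ML estimate 1.0
    ():                 Counter({"w": 1, "x": 4}),  # ML estimate 0.2
}
p = jelinek_mercer(counts, ("w1", "w2", "w3"), "w",
                   (0.4, 0.3, 0.2, 0.05, 0.05), 10)
# 0.4*0.5 + 0.3*0.25 + 0.2*1.0 + 0.05*0.2 + 0.05/10 = 0.49
```

In a real system the lambdas would be estimated on held-out data (e.g. by EM), not fixed by hand as here.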
In skip-Ngram modeling, by contrast, you combine different histories
that differ by skipping a word, e.g.
p(w | w1 w2 w3 w4) = l1 * p'(w | w1 w2 w3) + l2 * p'(w | w2 w3 w4)
where p'(.) is now a smoothed, rather than maximum-likelihood, estimate.
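A corresponding sketch of the skip interpolation (again illustrative, not SRILM's implementation; smoothed_p is an assumed stand-in for any smoothed N-gram estimator, and here each lambda weighs the history with one position left out):

```python
# Minimal sketch of skip-N-gram interpolation: combine smoothed estimates
# from histories that differ by which word of the context is skipped.
# smoothed_p and the toy values below are illustrative assumptions.

def skip_ngram(smoothed_p, hist, word, lambdas):
    """One lambda per position; each term skips one word of the history."""
    p = 0.0
    for skip, lam in enumerate(lambdas):
        reduced = hist[:skip] + hist[skip + 1:]  # history minus one word
        p += lam * smoothed_p(reduced, word)
    return p

# Toy smoothed estimator: the value depends only on whether w1 survives.
def smoothed_p(hist, word):
    return 0.2 if "w1" in hist else 0.8

p = skip_ngram(smoothed_p, ("w1", "w2"), "w", (0.5, 0.5))
# 0.5 * p'(w | w2) + 0.5 * p'(w | w1) = 0.5*0.8 + 0.5*0.2 = 0.5
```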
The only similarity is that both use linear interpolation of underlying
probability estimators to arrive at a better estimator. That does not say
much: linear interpolation is extremely widely used in all sorts of
probability models.
Andreas