[SRILM User List] Cutoff, probabilities, and backoffs
Andreas Stolcke
stolcke at icsi.berkeley.edu
Mon Dec 17 12:46:15 PST 2012
On 12/17/2012 1:41 AM, Mohammed Mediani wrote:
> Could anybody please tell me how the probabilities and the backoff
> weights are computed when we use -gtmin (with -kndiscount)?
> Following Chen's paper and the ngram-count man page, I was unable to
> reproduce the results of ngram-count.
As I explained in a previous email, the -gtmin parameter doesn't change
the way discounting is computed. It just eliminates ngrams from the
model AFTER their probabilities have been computed. Of course this frees
up probability mass, which is then reallocated through the backoff
mechanism (that is, the backoff weights change as a result). You can
think of the process as three steps, plus a step 0 that is specific to
KN methods:
0. Replace the lower-order counts with counts based on ngram type
frequencies, i.e., the number of distinct words preceding each ngram
(if you use the -write option, you can save these modified counts to a
file to see the effect).
1. Compute discounts for each ngram, and then their probabilities (use
ngram-count -debug 4 to get a detailed record of the quantities involved
in this step).
2. Remove ngrams that fall below the -gtmin cutoff (or that fail the
entropy pruning criterion, if specified).
3. Compute backoff weights so that the model is normalized.
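For illustration, the steps can be sketched on a toy backoff bigram model. This is a simplified sketch under stated assumptions, not SRILM's implementation: it uses a single fixed absolute discount (the hypothetical `discount` parameter; real KN estimates discounts from count-of-counts), leaves the unigram continuation distribution undiscounted, takes the vocabulary to be the words seen after some other word, and uses a hypothetical `min_count` parameter in the role of the -gtmin cutoff.

```python
from collections import Counter


def kn_backoff_bigram(tokens, discount=0.5, min_count=2):
    """Toy backoff bigram model following the steps above (not SRILM's code).

    discount and min_count are hypothetical parameters: a single fixed
    absolute discount, and a count cutoff playing the role of -gtmin.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))

    # Step 0 (KN): replace unigram counts by continuation counts, i.e. the
    # number of distinct word types that precede each word.
    cont = Counter(w for (_, w) in bigrams)
    total_cont = sum(cont.values())
    # Lower-order distribution; left undiscounted here for simplicity.
    p_uni = {w: c / total_cont for w, c in cont.items()}

    # Step 1: discount every observed bigram and compute its probability.
    hist_total = Counter()
    for (v, _), c in bigrams.items():
        hist_total[v] += c
    p_bi = {(v, w): (c - discount) / hist_total[v]
            for (v, w), c in bigrams.items()}

    # Step 2: drop bigrams below the cutoff; the probabilities of the
    # surviving bigrams are NOT recomputed.
    kept = {vw: p for vw, p in p_bi.items() if bigrams[vw] >= min_count}

    # Step 3: backoff weights spread each history's left-over mass over
    # the lower-order probabilities of the words it has no bigram for.
    bow = {}
    for v in hist_total:
        kept_words = [w for (h, w) in kept if h == v]
        left_over = 1.0 - sum(kept[(v, w)] for w in kept_words)
        lower_mass = 1.0 - sum(p_uni[w] for w in kept_words)
        # Guard against an empty backoff distribution.
        bow[v] = left_over / lower_mass if lower_mass > 0 else 0.0
    return kept, p_uni, bow


def prob(w, v, kept, p_uni, bow):
    """p(w | v): use the bigram if it survived, otherwise back off."""
    if (v, w) in kept:
        return kept[(v, w)]
    return bow.get(v, 1.0) * p_uni.get(w, 0.0)


tokens = "a b a b a c b c a b".split()
kept, p_uni, bow = kn_backoff_bigram(tokens, min_count=2)
print(sorted(kept))  # [('a', 'b'), ('b', 'a')]
# Each history's distribution over the vocabulary sums to one.
print(round(sum(prob(w, "a", kept, p_uni, bow) for w in p_uni), 6))  # 1.0
```

Running it with `min_count=1` and `min_count=2` gives identical probabilities for the bigrams that survive both cutoffs; only the left-over mass per history, and therefore the backoff weights, differ.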
Andreas