[SRILM User List] Cutoff, probabilities, and backoffs

Andreas Stolcke stolcke at icsi.berkeley.edu
Mon Dec 17 12:46:15 PST 2012


On 12/17/2012 1:41 AM, Mohammed Mediani wrote:
> Could anybody please tell me how the probabilities and the backoff 
> weights are computed when we use -gtmin (with -kndiscount)? 
> Following Chen's paper and the ngram-count man pages, I was unable to 
> reproduce the same results as ngram-count.

As I explained in a previous email, the -gtmin parameter doesn't change 
the way discounting is computed.  It just eliminates ngrams from the 
model AFTER you compute their probabilities.  Of course this frees up 
probability mass, which is then reallocated using the backoff mechanism 
(that is, the backoff weights change as a result).  You can think of the 
process in three steps, plus the 0th step that is particular to KN methods:

0. Replace the lower-order counts with counts based on the ngram type 
frequencies (if you use the -write option you can save these modified 
counts to a file to see what the effect is).
1. Compute discounts for each ngram, and then their probabilities (use 
ngram-count -debug 4 to get a detailed record of the quantities involved 
in this step).
2. Remove the ngrams whose counts fall below the -gtmin cutoff (or that 
are eliminated by the entropy pruning criterion, if specified).
3. Compute backoff weights (to normalize the model).
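
The steps above can be sketched on a toy bigram model. This is only an 
illustration under simplifying assumptions, not SRILM's implementation: 
a single fixed absolute discount stands in for the KN discounts that 
ngram-count estimates, the unigram distribution is left undiscounted, 
and the names build_toy_bigram_model and cond_prob are hypothetical.

```python
from collections import Counter, defaultdict

def build_toy_bigram_model(tokens, discount=0.75, min_count=1):
    """Toy bigram backoff model (illustrative only, not SRILM's code)."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))

    # Step 0 (KN only): the unigram count of w becomes the number of
    # distinct words seen before w (a type count, not a token count).
    type_counts = Counter(w for (_, w) in bigram_counts)
    total_types = sum(type_counts.values())
    unigram_prob = {w: c / total_types for w, c in type_counts.items()}

    # Step 1: discounted probabilities. The context totals include ALL
    # bigrams, also the ones that the cutoff removes in step 2.
    context_total = Counter()
    for (h, _), c in bigram_counts.items():
        context_total[h] += c
    bigram_prob = {
        (h, w): (c - discount) / context_total[h]
        for (h, w), c in bigram_counts.items()
    }

    # Step 2: apply the count cutoff AFTER the probabilities are fixed.
    kept = {bg: p for bg, p in bigram_prob.items()
            if bigram_counts[bg] >= min_count}

    # Step 3: backoff weights renormalize each context over what is left.
    # (The toy ignores the edge case where the denominator becomes zero.)
    kept_mass = defaultdict(float)
    kept_lower_mass = defaultdict(float)
    for (h, w), p in kept.items():
        kept_mass[h] += p
        kept_lower_mass[h] += unigram_prob[w]
    backoff = {
        h: (1.0 - kept_mass[h]) / (1.0 - kept_lower_mass[h])
        for h in context_total
    }
    return kept, backoff, unigram_prob

def cond_prob(model, h, w):
    """p(w | h): explicit bigram if kept, else backoff weight * unigram."""
    kept, backoff, unigram_prob = model
    if (h, w) in kept:
        return kept[(h, w)]
    return backoff[h] * unigram_prob.get(w, 0.0)

tokens = "a b a b a c b a b c a b".split()
no_cutoff = build_toy_bigram_model(tokens, min_count=1)
with_cutoff = build_toy_bigram_model(tokens, min_count=2)
```

In this toy run the probability of a surviving bigram such as (a, b) is 
the same with and without the cutoff. Only the backoff weight of its 
context changes, because the mass of the removed bigrams is handed to 
the backoff distribution.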

Andreas


