[SRILM User List] Cutoff, probabilities, and backoffs

Mohammed Mediani medmediani at gmail.com
Mon Dec 17 13:52:18 PST 2012


Thank you very much Andreas,
In fact, I have done everything you just suggested:
- Modify the counts
- Compute smoothing parameters (discount constants)
- Compute the probabilities
- Remove the rare ngrams according to gtmin
- Compute the backoffs.

I get exactly the same numbers as ngram-count for both probabilities and
backoffs if no gtmin is specified. But in the presence of cutoffs, I get
slightly different numbers (e.g., if gt3min=2, I get slightly different
backoffs for 2-grams). I thought I did something wrong, since I still can't
get the backoffs right. If no special attention needs to be paid to
particular cases, then I just need to look into it further.
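
A minimal Python sketch of the last step (computing the backoffs) may make
the cutoff case concrete. It assumes the standard backoff normalization,
bow(h) = (1 - sum of p(w|h) over kept ngrams) / (1 - sum of p(w|h') over the
same words), where h' is the shortened context. The function names and the toy
probabilities are made up for illustration; this is not SRILM's code.

```python
def backoff_weight(kept, lower):
    """Backoff weight for one context h.

    kept:  {word: p(word | h)} for the ngrams still listed in the model
    lower: {word: p(word | h')} for the whole vocabulary, where h' is h
           with its first word dropped
    """
    numerator = 1.0 - sum(kept.values())             # mass left for unlisted words
    denominator = 1.0 - sum(lower[w] for w in kept)  # lower-order mass of unlisted words
    return numerator / denominator


def total_mass(kept, lower):
    """Sum of p(word | h) over the vocabulary, using backoff for unlisted words."""
    bow = backoff_weight(kept, lower)
    return sum(kept[w] if w in kept else bow * lower[w] for w in lower)


# Toy lower-order distribution p(. | h') over a three-word vocabulary.
lower = {"a": 0.5, "b": 0.3, "c": 0.2}

# Higher-order ngrams in context h before and after a count cutoff.
# The probability of the surviving ngram is NOT recomputed after the cutoff.
before_cutoff = {"a": 0.6, "b": 0.2}
after_cutoff = {"a": 0.6}

print(backoff_weight(before_cutoff, lower))
print(backoff_weight(after_cutoff, lower))
print(total_mass(after_cutoff, lower))
```

In this toy case, removing the ngram for "b" leaves p(a | h) untouched but
changes the backoff weight of the context h, and the distribution still sums
to one. For gt3min this context is a 2-gram, so it is the 2-gram backoffs that
move.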

Once again, many many thanks for your kind help and great cooperation.
Mohammed

On Mon, Dec 17, 2012 at 9:46 PM, Andreas Stolcke
<stolcke at icsi.berkeley.edu> wrote:

> On 12/17/2012 1:41 AM, Mohammed Mediani wrote:
>
>> Could anybody please tell me how the probabilities and the backoff
>> weights are computed in case we use -gtmin (with -kndiscount). Following
>> Chen's paper and the ngram-count man pages, I was unable to reproduce the
>> same results as ngram-count.
>>
>
> As I explained in a previous email, the -gtmin parameter doesn't change
> the way discounting is computed.  It just eliminates ngrams from the model
> AFTER you compute their probabilities.  Of course this frees up probability
> mass, which is then reallocated using the backoff mechanism (that is, the
> backoff weights change as a result).  You can think of the process in three
> steps, plus the 0th step that is particular to KN methods:
>
> 0. Replace the lower-order counts based on the ngram type frequencies (if
> you use the -write option you can save these modified counts to a file to
> see what the effect is).
> 1. compute discounts for each ngram, and then their probabilities (use
> ngram-count -debug 4 to get a detailed record of the quantities involved in
> this step)
> 2. remove ngrams due to the -gtmin (or entropy pruning criterion, if
> specified)
> 3. compute backoff weights (to normalize the model).
>
> Andreas
>
>