Entropy Pruning

Sun Nov 18 08:29:04 PST 2007

In message <473F18D1.5050201 at cslab.kecl.ntt.co.jp> you wrote:
> Dear Dr. Stolcke,
> 
> Hello, I'm Daichi Mochihashi, a researcher in NTT Communication Science
> Labs, Japan.
> Until recently, I was involved in the language modeling team
> at ATR Spoken Language Communication Research Laboratories, perhaps you may
> know.
> 
> Lately, I developed a novel pruning method for variable-order ngrams
> and want to compare it with your entropy pruning as the Gold standard.
> However, in spite of the description in the SRILM paper, I found that
> the entropy pruning method is not implemented but replaced by
> a heuristic algorithm in the current SRILM distribution.

That is not correct.  The exact algorithm described in the paper is
implemented in Ngram::pruneProbs() in NgramLM.cc.  It is activated by
the -prune option of both ngram-count and ngram.
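
For readers who want to see the criterion itself, here is a toy sketch in
Python. It is not the code in Ngram::pruneProbs(); it is a brute-force
illustration under stated assumptions: a single bigram context h with
history probability p_h, a dict of explicit bigram probabilities, and a
unigram distribution to back off to. All names and numbers are made up for
the example. It removes one N-gram, recomputes the backoff weight, and
measures the relative perplexity change exp(D) - 1, where D is the relative
entropy between the original and the pruned model. The paper derives a
closed form that avoids the loop over the vocabulary; this sketch does not.

```python
import math

def backoff_weight(explicit, lower):
    # alpha(h): probability mass left over by the explicit N-grams in
    # context h, divided by the lower-order mass of the remaining words.
    left_over = 1.0 - sum(explicit.values())
    lower_left_over = 1.0 - sum(lower[w] for w in explicit)
    return left_over / lower_left_over

def cond_prob(w, explicit, lower, alpha):
    # Explicit estimate if present, otherwise back off to the lower order.
    return explicit[w] if w in explicit else alpha * lower[w]

def relative_ppl_change(p_h, explicit, lower, w_pruned):
    # Relative perplexity change exp(D) - 1 caused by removing the
    # N-gram (h, w_pruned); only context h is affected by the removal.
    alpha = backoff_weight(explicit, lower)
    pruned = {w: p for w, p in explicit.items() if w != w_pruned}
    alpha_new = backoff_weight(pruned, lower)
    kl = 0.0
    for w in lower:  # brute force over the whole (toy) vocabulary
        p = cond_prob(w, explicit, lower, alpha)
        q = cond_prob(w, pruned, lower, alpha_new)
        kl += p * math.log(p / q)
    return math.exp(p_h * kl) - 1.0

# Hypothetical toy model: three-word vocabulary, two explicit bigrams.
lower = {"a": 0.5, "b": 0.3, "c": 0.2}   # unigram p(w)
explicit = {"a": 0.6, "b": 0.3}          # explicit p(w | h)
p_h = 0.1                                # probability of context h
threshold = 3e-3                         # plays the role of the -prune value

for w in sorted(explicit):
    change = relative_ppl_change(p_h, explicit, lower, w)
    print(w, change, "prune" if change < threshold else "keep")
```

In this toy setup the bigram whose explicit probability equals its
backed-off unigram estimate is the cheaper one to remove, which is the
behaviour one would expect from the criterion.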

> 
> Is there any previous version of SRILM that supports entropy pruning?
> or could you kindly send me a version of VarNgram.cc or any code
> that you have used in the experiment?

VarNgram.cc was a research effort that performs pruning during the
estimation step to eliminate redundant N-grams from the start, using a
Hoeffding bound criterion, which happens not to work very well.

Note that the standard Ngram class supports "variable" N-gram models
already, since any mix of N-grams of different lengths is allowed.
So stick to the Ngram class, and do not use the ngram-count -varprune
option, which triggers the use of the VarNgram class.
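
To illustrate why an ordinary backoff model is already "variable" order,
here is a minimal Python sketch. It is not SRILM code: the dict-based
storage, the function name and the numbers are assumptions made for the
example. N-grams of any length sit side by side in one table, and lookup
simply backs off to a shorter history whenever the longer N-gram is absent.
A missing backoff weight is treated as 1.

```python
def backoff_prob(word, history, probs, bows):
    # p(word | history) in a backoff model stored as dicts keyed by tuples.
    ngram = history + (word,)
    if ngram in probs:
        return probs[ngram]
    if not history:
        raise KeyError(word)  # word is not in the vocabulary
    # Drop the oldest history word and apply the backoff weight (default 1).
    return bows.get(history, 1.0) * backoff_prob(word, history[1:], probs, bows)

# Hypothetical model: three unigrams, one bigram, one trigram.
probs = {
    ("a",): 0.5, ("b",): 0.3, ("c",): 0.2,
    ("a", "b"): 0.6,
    ("c", "a", "b"): 0.9,
}
bows = {
    ("a",): (1.0 - 0.6) / (1.0 - 0.3),      # context "a" has one bigram
    ("c", "a"): (1.0 - 0.9) / (1.0 - 0.6),  # context "c a" has one trigram
}

print(backoff_prob("b", ("c", "a"), probs, bows))  # uses the trigram
print(backoff_prob("b", ("x", "a"), probs, bows))  # falls back to the bigram
print(backoff_prob("c", ("b",), probs, bows))      # falls back to the unigram
```

The effective context length therefore differs from history to history,
without any special class.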

I'm sorry that the naming of classes must have been confusing.

Andreas