Adding-One smoothing

Andreas Stolcke stolcke at speech.sri.com
Thu Oct 18 12:11:48 PDT 2007



--Andreas

In message <20071018172737.GE1499 at die.upm.es> you wrote:
>  
> > Add-delta smoothing is implemented in the latest version of SRILM.
> > Try downloading the 1.5.4 (beta) version.  The options are 
> > 
> > 	-addsmooth d
> > 	-addsmooth1 d
> > 	-addsmooth2 d
> > 	etc.
> > 
> > where d is the constant to add to each count.
> 
> Thanks Prof. for this new release and your quick answer. I will test it.
> 
> > I'm not sure exactly what method you are asking about, but deleted
> > interpolation is implemented as the smoothing method used by the
> > ngram-count -count-lm option.  ngram -count-lm is used to evaluate such
> > an LM.  
> 
> Currently, the software we have implements something like this:
> 
> P(w|h) = lambda_trig * P_3(w|h)
>        + (1 - lambda_trig) * [ lambda_big * P_2(w|h)
>        + (1 - lambda_big) * [ lambda_unig * P(w)
>        + (1 - lambda_unig) * P(zerogram) ] ]
> 
> In all cases, the component probability is calculated using the add-delta
> smoothing technique.

That is a combination of additive smoothing and deleted interpolation
that is not currently implemented in SRILM.
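
For concreteness, here is a minimal Python sketch of that combination with
one global weight per order.  It is not SRILM code: the helper names
(build_counts, add_delta, interpolated_prob) are made up for illustration.
I am assuming that each history count is the number of times the history is
followed by a word, that the zerogram is uniform (1/V), and that delta > 0
so unseen histories do not divide by zero.

```python
from collections import Counter


def build_counts(tokens):
    """Collect n-gram counts (orders 1..3) and matching history counts."""
    ngrams = Counter()    # n-gram tuple -> count
    contexts = Counter()  # history tuple -> number of n-grams extending it
    for n in (1, 2, 3):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            ngrams[gram] += 1
            contexts[gram[:-1]] += 1  # () collects the total token count
    return ngrams, contexts


def add_delta(count_hw, count_h, delta, vocab_size):
    """Add-delta estimate: (c(h,w) + delta) / (c(h) + delta * V)."""
    return (count_hw + delta) / (count_h + delta * vocab_size)


def interpolated_prob(w, h, ngrams, contexts, lambdas, delta, vocab_size):
    """Trigram probability with global lambdas, as in the quoted equation.

    h is the two-word history (w1, w2); lambdas is
    (lambda_trig, lambda_big, lambda_unig).
    """
    lam3, lam2, lam1 = lambdas
    w1, w2 = h
    p3 = add_delta(ngrams[(w1, w2, w)], contexts[(w1, w2)], delta, vocab_size)
    p2 = add_delta(ngrams[(w2, w)], contexts[(w2,)], delta, vocab_size)
    p1 = add_delta(ngrams[(w,)], contexts[()], delta, vocab_size)
    p0 = 1.0 / vocab_size  # assumed uniform zerogram
    return lam3 * p3 + (1 - lam3) * (
        lam2 * p2 + (1 - lam2) * (lam1 * p1 + (1 - lam1) * p0)
    )
```

With counts built this way every component distribution sums to one over
the vocabulary, so the interpolated model does as well, whatever the lambdas.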

>  
> It is important to mention that in this equation there are global
> lambda_trig, lambda_big and lambda_unig values (i.e. this is like having
> just one bin, not as proposed by Jelinek, where there is a different lambda
> for each bin).
> 
> Previously, I had tried to use -count-lm with the following configuration
> file:
> 
> order 3 
> vocabsize 1002 
> totalcount 74883 
> mixweights 0 
> 0.5 0.5 0.5 
> countmodulus 1 
> counts train.counts
> 
> and after applying the EM algorithm I obtained the following values:
> 
> order 3
> mixweights 0
>  0.932452 0.894774 0.994639
> countmodulus 1
> vocabsize 1002
> totalcount 74883
> counts train.counts
> 
> but my perplexity (PPL) results were not as good as with the software we have.
>  
> Is something wrong with the configuration file, or is the problem related
> to using Good-Turing instead of add-delta?

There is nothing wrong with it.  The difference is that in SRILM the
underlying probability estimates (as in standard deleted interpolation)
are simple maximum likelihood estimates (without Good-Turing smoothing).

It would be very straightforward to add optional add-delta smoothing
to the -count-lm model, since all the quantities needed are readily available.
You just have to add some code to get the delta parameter from the LM
file (similar to what's already there for the other parameters) and modify
line 373 in NgramCountLM.cc to implement the add-delta formula.
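
As a rough illustration of the formula in question (in Python, not the C++
of NgramCountLM.cc), the change amounts to something like the sketch below.
The function name and the handling of a zero history count in the
unsmoothed case are assumptions for illustration, not what SRILM does.

```python
def estimate(count_hw, count_h, vocab_size, delta=0.0):
    """P(w|h) from raw counts; delta = 0 gives the maximum likelihood estimate."""
    denom = count_h + delta * vocab_size
    if denom == 0:
        # Assumption: an unseen history with no smoothing contributes nothing.
        return 0.0
    return (count_hw + delta) / denom


# vocabsize as in the configuration quoted above; the counts are made up.
V = 1002
ml = estimate(0, 10, V)                    # unseen word under ML
smoothed = estimate(0, 10, V, delta=0.01)  # same word with add-delta
```

With delta = 0 an unseen word gets zero from that order; any positive delta
gives it a small nonzero share.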

If you do this please send me your changes!

Andreas 



