[SRILM User List] Why does -addsmooth still have discounting effects?

Mon May 27 10:48:03 PDT 2013

On 5/27/2013 12:19 AM, 贺天行 wrote:
> The manual says:
> -addsmooth D
>     Smooth by adding D to each N-gram count. This is usually a poor
>     smoothing method, included mainly for instructional purposes.
>         p(a_z) = (c(a_z) + D) / (c(a_) + D n(*))
> My command is:
>  ngram-count -write allcnt -order 3 -debug 2 -text test_htx.dat -addsmooth 0 -lm lmtest
> The debug output was:
> test_htx.dat: line 3: 2 sentences, 6 words, 0 OOVs
> 0 zeroprobs, logprob= 0 ppl= 1 ppl1= 1
> using AddSmooth for 1-grams
> using AddSmooth for 2-grams
> using AddSmooth for 3-grams
> discarded 1 2-gram contexts containing pseudo-events
> discarded 2 3-gram contexts containing pseudo-events
> discarded 6 3-gram probs discounted to zero
> writing 6 1-grams
> writing 8 2-grams
> writing 0 3-grams
> So there is still discounting. Why does -addsmooth 0 still discount
> the trigram probabilities?

You also have to change the minimum-count parameter so that all trigrams
are included, even those that occur only once:

ngram-count -write allcnt -order 3 -debug 2 -text test_htx.dat \
    -addsmooth 0 -gt3min 1 -lm lmtest

The default is -gt3min 2, so trigrams that occur only once are left out
of the model unless you lower it.
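
To see why a minimum count can empty out the trigram table even with D = 0, here is a minimal Python sketch of the additive-smoothing formula from the manual combined with a count cutoff. It only illustrates the arithmetic and is not SRILM's actual code; the function name, its arguments, and the toy counts are made up for the example.

```python
from collections import Counter

def add_smooth_probs(trigram_counts, vocab_size, delta=0.0, min_count=2):
    """Additive smoothing with a minimum-count cutoff (illustrative only).

    p(a_z) = (c(a_z) + delta) / (c(a_) + delta * vocab_size)
    Trigrams with c(a_z) < min_count get no explicit probability.
    """
    # c(a_): total count of each two-word context (an assumption of this
    # sketch: summed over all observed trigrams, including rare ones).
    context_counts = Counter()
    for (w1, w2, _w3), c in trigram_counts.items():
        context_counts[(w1, w2)] += c

    probs = {}
    for (w1, w2, w3), c in trigram_counts.items():
        if c < min_count:
            # Below the cutoff: treated as if it had been discounted to zero.
            continue
        denom = context_counts[(w1, w2)] + delta * vocab_size
        probs[(w1, w2, w3)] = (c + delta) / denom
    return probs

# Hypothetical tiny corpus in which every trigram occurs exactly once.
counts = Counter({("a", "b", "c"): 1, ("a", "b", "d"): 1, ("b", "c", "d"): 1})

with_default_cutoff = add_smooth_probs(counts, vocab_size=4, min_count=2)
with_cutoff_one = add_smooth_probs(counts, vocab_size=4, min_count=1)
print(len(with_default_cutoff), len(with_cutoff_one))
```

With a cutoff of 2 nothing survives in this toy data, which mirrors the "writing 0 3-grams" line in the debug output; with a cutoff of 1 every observed trigram keeps its relative-frequency estimate.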

Andreas
