[SRILM User List] Why does -addsmooth still has discounting effects?
Andreas Stolcke
stolcke at icsi.berkeley.edu
Mon May 27 10:48:03 PDT 2013
On 5/27/2013 12:19 AM, 贺天行 wrote:
> The manual says:
> -addsmooth D
> Smooth by adding D to each N-gram count. This is usually a poor
> smoothing method, included mainly for instructional purposes.
> p(a_z) = (c(a_z) + D) / (c(a_) + D n(*))
> My script is:
> ngram-count -write allcnt -order 3 -debug 2 -text test_htx.dat
> -addsmooth 0 -lm lmtest
> The debug output was:
> test_htx.dat: line 3: 2 sentences, 6 words, 0 OOVs
> 0 zeroprobs, logprob= 0 ppl= 1 ppl1= 1
> using AddSmooth for 1-grams
> using AddSmooth for 2-grams
> using AddSmooth for 3-grams
> discarded 1 2-gram contexts containing pseudo-events
> discarded 2 3-gram contexts containing pseudo-events
> discarded 6 3-gram probs discounted to zero
> writing 6 1-grams
> writing 8 2-grams
> writing 0 3-grams
> So there is still discounting. I'm confused about why -addsmooth still
> discounts probabilities to zero.
You also have to change the mincount parameter so that all trigrams are
included, even those that occur only once.
ngram-count -write allcnt -order 3 -debug 2 -text test_htx.dat
-addsmooth 0 -gt3min 1 -lm lmtest
The default is -gt3min 2.
Andreas
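
To make the interaction between the add-D formula and the minimum-count cutoff concrete, here is a minimal Python sketch. It is not SRILM's implementation: the toy corpus, the function names, and the vocabulary size of 8 are assumptions made for illustration, and backoff weights are ignored. It only shows that with D = 0 and a cutoff of 2, every trigram that occurs once is dropped, while a cutoff of 1 keeps them all.

```python
from collections import Counter

def trigram_counts(sentences):
    """Count trigrams, padding each sentence with <s> and </s>."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts

def add_smooth_probs(counts, delta, vocab_size, mincount):
    """Apply p(a_z) = (c(a_z) + D) / (c(a_) + D * n(*)) to every trigram
    whose count is at least mincount; the others are dropped, which is
    what the cutoff does in this simplified model (an assumption)."""
    context_totals = Counter()
    for ngram, c in counts.items():
        context_totals[ngram[:2]] += c
    probs = {}
    for ngram, c in counts.items():
        if c < mincount:
            continue  # below the cutoff: treated as discounted to zero
        probs[ngram] = (c + delta) / (context_totals[ngram[:2]] + delta * vocab_size)
    return probs

# Hypothetical toy corpus: every trigram occurs exactly once.
corpus = ["a b c", "d e f"]
counts = trigram_counts(corpus)

kept_default = add_smooth_probs(counts, delta=0.0, vocab_size=8, mincount=2)
kept_all = add_smooth_probs(counts, delta=0.0, vocab_size=8, mincount=1)

print(len(kept_default), "trigrams kept with mincount 2")
print(len(kept_all), "trigrams kept with mincount 1")
```

In this sketch the smoothing formula itself removes nothing when D = 0; the trigrams disappear only because of the count cutoff.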