[SRILM User List] Please help me understand the debug info of the -interpolate -kndiscount

贺天行 cloudygooseg at gmail.com
Wed May 29 06:07:57 PDT 2013


Hello, I'm trying to understand how SRILM produces the output in the
lm file, but I can't figure out where these numbers come from.

ngram-count -order 2 -gt1min 1 -gt2min 1 -gt3min 1 -text test_htx.dat \
    -write1 cnt1 -write2 cnt2 -write3 cnt3 -kndiscount1 -kndiscount2 \
    -kndiscount3 -debug 5 -lm lmtest2
test_htx.dat: line 22: 22 sentences, 67 words, 0 OOVs
0 zeroprobs, logprob= 0 ppl= 1 ppl1= 1
using ModKneserNey for 1-grams
modifying 1-gram counts for Kneser-Ney smoothing
Kneser-Ney smoothing 1-grams
n1 = 2
n2 = 4
n3 = 4
n4 = 4
D1 = 0.2
D2 = 1.4
D3+ = 2.2
using ModKneserNey for 2-grams
Kneser-Ney smoothing 2-grams
n1 = 34
n2 = 10
n3 = 3
n4 = 3
D1 = 0.62963
D2 = 1.43333
D3+ = 0.481481
CONTEXT  WORD </s> NUMER 9 DENOM 52 DISCOUNT 0.755556 LPROB -0.883494
CONTEXT  WORD Alice NUMER 3 DENOM 52 DISCOUNT 0.266667 LPROB -1.81291
                                                               ........
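For reference, the D1, D2 and D3+ values in the log can be reproduced from the printed count-of-counts n1..n4 with the modified Kneser-Ney discount estimates from Chen and Goodman's paper (Y = n1/(n1 + 2 n2), D1 = 1 - 2Y n2/n1, D2 = 2 - 3Y n3/n2, D3+ = 3 - 4Y n4/n3). This is a sketch assuming those are the estimates ngram-count uses; the numbers agree for both the 1-gram and 2-gram blocks. The function name is hypothetical.

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Return (D1, D2, D3+) from count-of-counts n1..n4 (Chen & Goodman estimates)."""
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3plus = 3 - 4 * y * n4 / n3
    return d1, d2, d3plus


# Count-of-counts copied from the -debug 5 output above.
for order, counts in [("1-grams", (2, 4, 4, 4)), ("2-grams", (34, 10, 3, 3))]:
    d1, d2, d3 = modified_kn_discounts(*counts)
    print(f"{order}: D1 = {d1:.6g}  D2 = {d2:.6g}  D3+ = {d3:.6g}")
```

Printing with six significant digits gives the same values ngram-count logged (0.2, 1.4, 2.2 and 0.62963, 1.43333, 0.481481).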
In the lm file:
-99 <s> 0.1888525
-1.309463 Alice -0.02817659
                                                               .........
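Each unigram line in the lm file follows the ARPA back-off format: the log10 probability, the word, and optionally the log10 back-off weight applied when that word is the context of a longer n-gram. The helper below is a hypothetical illustration that splits such a line and turns the logs back into plain numbers. The -99 for <s> appears to be how SRILM writes log10 of zero, since <s> is never predicted.

```python
def parse_arpa_unigram(line):
    """Split an ARPA unigram line into (log10 prob, word, log10 backoff or None)."""
    fields = line.split()
    logprob, word = float(fields[0]), fields[1]
    backoff = float(fields[2]) if len(fields) > 2 else None
    return logprob, word, backoff


for line in ["-99 <s> 0.1888525", "-1.309463 Alice -0.02817659"]:
    logprob, word, bow = parse_arpa_unigram(line)
    # Both example lines carry a back-off weight, so bow is not None here.
    print(word, "P =", 10 ** logprob, "backoff weight =", 10 ** bow)
```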
I'm trying to understand this line:
CONTEXT  WORD Alice NUMER 3 DENOM 52 DISCOUNT 0.266667 LPROB -1.81291
I know that NUMER 3 means
c(* Alice)=3
but I can't figure out what the other parameters mean or how they are
calculated, nor how the lm file entry
-1.309463 Alice -0.02817659
is calculated.
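Checked against the 1-gram discounts, the CONTEXT fields appear to fit together as follows. This is an interpretation, not taken from SRILM documentation. NUMER is the word's (Kneser-Ney modified) count. DENOM is the total of those counts in the empty unigram context. DISCOUNT is the multiplicative factor (NUMER - D)/NUMER, with D = D3+ = 2.2 for counts of 3 or more. LPROB = log10(DISCOUNT * NUMER / DENOM) is the discounted estimate before any lower-order mass is added. The helper names are hypothetical.

```python
import math

D1, D2, D3PLUS = 0.2, 1.4, 2.2  # 1-gram discounts from the log above


def kn_discount(count):
    """Pick the modified-KN discount for a count of 1, 2, or 3+."""
    if count == 1:
        return D1
    if count == 2:
        return D2
    return D3PLUS


def debug_fields(numer, denom):
    """Return (DISCOUNT, LPROB) under the interpretation described above."""
    discount = (numer - kn_discount(numer)) / numer
    lprob = math.log10(discount * numer / denom)
    return discount, lprob


for word, numer in [("</s>", 9), ("Alice", 3)]:
    discount, lprob = debug_fields(numer, 52)
    print(f"WORD {word} NUMER {numer} DENOM 52 DISCOUNT {discount:.6g} LPROB {lprob:.6g}")
```

For Alice this is (3 - 2.2)/3 = 0.266667 and log10(0.8/52) = -1.81291. For </s> it is (9 - 2.2)/9 = 0.755556 and log10(6.8/52) = -0.883494, matching both CONTEXT lines.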

I have read Chen's paper and the SRILM ngram-discount manual page, but I
still don't understand what's going on.
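The "modifying 1-gram counts for Kneser-Ney smoothing" step in the log corresponds, in Chen and Goodman's description, to replacing each lower-order count with a continuation count: the number of distinct words seen immediately before it. The sketch below computes that from tokenized sentences. The toy sentences are made up for illustration and are not taken from test_htx.dat. Treating <s> as an ordinary left context is also an assumption.

```python
from collections import defaultdict


def continuation_counts(sentences):
    """Map each word to the number of distinct left neighbours, N1+(. w)."""
    left_contexts = defaultdict(set)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            left_contexts[word].add(prev)
    return {word: len(ctx) for word, ctx in left_contexts.items()}


toy = ["Alice loves Bob", "Bob loves Alice", "Alice hates Bob"]  # hypothetical data
print(continuation_counts(toy))
```

In this toy data Alice occurs three times but has only two distinct left neighbours (<s> and loves), so its continuation count is 2. This shows how a modified count can differ from the raw count written by -write1.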

This is my cnt1 file:
<s> 22
</s> 9
Alice 3
loves 4
Bob 2
also 3
Kai 2
KaiKai 3
KK 3
hates 2
YY 5
Miss 4
MM 1
b3 4
a3 4
c3 1
d3 2
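The final -1.309463 for Alice can be reproduced under three assumptions, none of which comes from SRILM documentation. (a) The left-over mass in the unigram context is the total discount removed, D1*n1 + D2*n2 + D3+*(number of words with modified count of 3 or more), divided by DENOM. (b) That mass is spread uniformly over the 16 words in cnt1 other than <s>, which gets -99. (c) All 16 words have a nonzero modified count, so 16 - 2 - 4 = 10 of them fall in the 3+ bucket. Under these assumptions the arithmetic matches the lm file value. The names below are hypothetical.

```python
import math

# 1-gram values copied from the -debug 5 output above.
D1, D2, D3PLUS = 0.2, 1.4, 2.2
N1, N2 = 2, 4              # words whose modified count is 1 or 2
DENOM = 52
VOCAB = 16                 # ASSUMPTION: cnt1 entries excluding <s>
N3PLUS = VOCAB - N1 - N2   # ASSUMPTION: every such word has modified count >= 1

# Probability mass removed by discounting, as a fraction of DENOM.
leftover = (D1 * N1 + D2 * N2 + D3PLUS * N3PLUS) / DENOM


def interpolated_unigram(numer):
    """Discounted estimate plus an equal share of the left-over mass."""
    discounted = (numer - D3PLUS) / DENOM  # ASSUMPTION: numer >= 3, so D = D3+
    return discounted + leftover / VOCAB


p_alice = interpolated_unigram(3)
print(f"leftover = {leftover:.6f}, log10 P(Alice) = {math.log10(p_alice):.6f}")
```

Here the left-over mass is 28/52, and P(Alice) = 0.8/52 + (28/52)/16 = 2.55/52, whose log10 is -1.309463. The back-off weight -0.02817659 depends on the Alice bigrams in cnt2, which are not shown, so it is not reconstructed here.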

Thank you very much.


More information about the SRILM-User mailing list