[SRILM User List] ppl output from ngram interpret

jian zhang zhangj at computing.dcu.ie
Wed Apr 16 03:20:28 PDT 2014


Hi Andreas,

I am confused about the ppl output from ngram.
The following are the outputs for two sentences:

resumption of the session
p( resumption | <s> ) = [1gram] 6.41856e-07 [ -6.19256 ]
p( of | resumption ...) = [2gram] 0.547254 [ -0.261811 ]
*p( the | of ...) = [2gram] 0.0826684 [ -1.08266 ]*
p( session | the ...) = [1gram] 1.21666e-06 [ -5.91483 ]
p( </s> | session ...) = [1gram] 0.00150439 [ -2.82264 ]
1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -16.2745 ppl= 1798.46 ppl1= 11711.9
4 words, rank1= 0.25 rank5= 0.5 rank10= 0.5
5 words+sents, rank1wSent= 0.2 rank5wSent= 0.4 rank10wSent= 0.4 qloss= 0.899274 absloss= 0.873714
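For reference, the summary numbers can be reproduced from the per-word values: each bracketed number is the log10 of the probability before it, logprob is their sum, and the two perplexities follow from logprob. A minimal Python sketch, assuming the definitions given in the SRILM documentation/FAQ (ppl divides by words - OOVs - zeroprobs + sentences, so each </s> counts as a token; ppl1 leaves the sentence ends out). The function name perplexities is only for illustration.

```python
# Per-word log10 probabilities from the first sentence above
# ("resumption of the session"), including the final </s>.
word_logprobs = [-6.19256, -0.261811, -1.08266, -5.91483, -2.82264]


def perplexities(logprob, words, sentences, oovs=0, zeroprobs=0):
    """Return (ppl, ppl1) for a total log10 probability.

    Assumed definitions (SRILM FAQ): ppl counts one </s> per sentence in the
    denominator, ppl1 does not; OOVs and zero-probability words are excluded
    from both.
    """
    scored = words - oovs - zeroprobs
    ppl = 10 ** (-logprob / (scored + sentences))
    ppl1 = 10 ** (-logprob / scored)
    return ppl, ppl1


logprob = sum(word_logprobs)  # roughly -16.2745, the reported logprob
ppl, ppl1 = perplexities(logprob, words=4, sentences=1)
print(f"logprob={logprob:.4f} ppl={ppl:.2f} ppl1={ppl1:.1f}")

# The second sentence, computed from its reported totals only.
print(perplexities(-50.04, words=21, sentences=1))
```

With these definitions the results agree, within rounding, with the reported ppl= 1798.46 / ppl1= 11711.9 for the first sentence and ppl= 188.168 / ppl1= 241.466 for the second.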

you have requested a debate on this subject in the course of the next few days , during this part-session .
p( you | <s> ) = [2gram] 0.000716442 [ -3.14482 ]
p( have | you ...) = [2gram] 0.0179397 [ -1.74618 ]
p( requested | have ...) = [1gram] 6.43992e-06 [ -5.19112 ]
p( a | requested ...) = [1gram] 0.00378035 [ -2.42247 ]
p( debate | a ...) = [2gram] 0.000358849 [ -3.44509 ]
p( on | debate ...) = [2gram] 0.0598839 [ -1.22269 ]
p( this | on ...) = [2gram] 0.00443142 [ -2.35346 ]
p( subject | this ...) = [2gram] 9.54276e-05 [ -4.02033 ]
p( in | subject ...) = [2gram] 0.0436281 [ -1.36023 ]
p( the | in ...) = [2gram] 0.147714 [ -0.830578 ]
p( course | the ...) = [3gram] 0.00139691 [ -2.85483 ]
p( of | course ...) = [3gram] 0.579381 [ -0.237035 ]
*p( the | of ...) = [2gram] 0.0762541 [ -1.11774 ]*
p( next | the ...) = [3gram] 0.00123622 [ -2.9079 ]
p( few | next ...) = [3gram] 0.0245328 [ -1.61025 ]
p( days | few ...) = [2gram] 0.00340647 [ -2.46769 ]
p( , | days ...) = [2gram] 0.15756 [ -0.802555 ]
p( during | , ...) = [2gram] 0.000749831 [ -3.12504 ]
p( this | during ...) = [3gram] 0.0352358 [ -1.45302 ]
p( <unk> | this ...) = [1gram] 9.0905e-07 [ -6.04141 ]
p( . | <unk> ...) = [1gram] 0.0254746 [ -1.59389 ]
p( </s> | . ...) = [2gram] 0.809733 [ -0.091658 ]
1 sentences, 21 words, 0 OOVs
0 zeroprobs, logprob= -50.04 ppl= 188.168 ppl1= 241.466
21 words, rank1= 0.142857 rank5= 0.428571 rank10= 0.47619
22 words+sents, rank1wSent= 0.181818 rank5wSent= 0.454545 rank10wSent= 0.5 qloss= 0.930912 absloss= 0.909386
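Both outputs above contain the line p( the | of ...) = [2gram], once after "resumption of" and once after "course of" (the observed ratio 0.0826684 / 0.0762541 is about 1.084). For reference, a minimal sketch of how a Katz-style backoff model, as stored in ARPA files, can give the same bigram different probabilities: when the trigram is missing, the bigram's log10 probability is added to the log10 backoff weight of the two-word context, and that weight differs from context to context. All model entries below are hypothetical numbers, not taken from the model behind this output, and the sketch assumes the [Ngram] tag reports the order of the n-gram actually found.

```python
# Toy backoff model, ARPA style: log10 probabilities and log10 backoff weights.
# All numbers are hypothetical and only illustrate the mechanics.
LOGPROB = {
    ("of",): -1.5,
    ("the",): -1.2,
    ("of", "the"): -1.0,
    ("resumption", "of"): -0.3,
    ("course", "of"): -0.2,
}
BACKOFF = {
    ("of",): -0.4,
    ("resumption", "of"): 0.0,
    ("course", "of"): -0.1,
}


def log10_prob(word, context):
    """Katz-style backoff lookup.

    Returns (log10 p(word | context), order of the n-gram actually used).
    If (context + word) is not in the model, add the context's backoff
    weight and retry with the oldest context word dropped.
    """
    ngram = tuple(context) + (word,)
    if ngram in LOGPROB:
        return LOGPROB[ngram], len(ngram)
    if not context:
        raise KeyError(f"no unigram entry for {word!r}")
    bow = BACKOFF.get(tuple(context), 0.0)  # missing weight = log10(1)
    lp, order = log10_prob(word, tuple(context)[1:])
    return bow + lp, order


for ctx in [("resumption", "of"), ("course", "of")]:
    lp, order = log10_prob("the", ctx)
    print(f"p( the | {' '.join(ctx)} ) = [{order}gram] {10 ** lp:.6g} [ {lp:.6g} ]")
```

Both lookups fall back to the same bigram entry for "of the" and are tagged [2gram], yet they differ by exactly the backoff weight of their two-word context.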

My two questions:
1. Both sentences contain the 2-gram p( the | of ...). Why does it get a different
probability in each (0.0826684 in the first sentence, 0.0762541 in the second)?
2. Is there an ngram option that prints the actual context tokens instead of
the ellipsis?
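Independent of whether ngram itself has such an option, the full history can be rebuilt from the debug output by post-processing. A minimal sketch, assuming the word-level line format shown above, a trigram model (order=3, matching the highest [3gram] tag above), and that every predicted word, including <unk>, becomes history for the next word; the helper name expand_contexts is hypothetical, and OOV lines are not handled specially.

```python
import re

# One word-level line of `ngram -ppl ... -debug 2` output, e.g.
#   p( the | of ...) = [2gram] 0.0826684 [ -1.08266 ]
# Groups: leading whitespace, predicted word, printed context, rest of line.
WORD_LINE = re.compile(r"^(\s*)p\( (\S+) \| (.*?)\)(\s*=.*)$")


def expand_contexts(lines, order=3):
    """Replace the elided context in each word-level line with the last
    (order - 1) history words, printed oldest first.

    The history is rebuilt from the predicted words themselves: it starts
    at <s>, each predicted word is appended, and </s> resets it. Other
    lines (sentence text, summary statistics) pass through unchanged.
    """
    if order < 2:
        raise ValueError("order must be at least 2 to have any context")
    history = ["<s>"]
    out = []
    for line in lines:
        m = WORD_LINE.match(line)
        if m is None:
            out.append(line)
            continue
        indent, word, _printed_context, rest = m.groups()
        context = " ".join(history[-(order - 1):])
        out.append(f"{indent}p( {word} | {context} ){rest}")
        history = ["<s>"] if word == "</s>" else history + [word]
    return out


debug_lines = [
    "p( resumption | <s> ) = [1gram] 6.41856e-07 [ -6.19256 ]",
    "p( of | resumption ...) = [2gram] 0.547254 [ -0.261811 ]",
    "p( the | of ...) = [2gram] 0.0826684 [ -1.08266 ]",
]
for line in expand_contexts(debug_lines, order=3):
    print(line)
```

The [Ngram] tag is kept unchanged, so one can still see how much of the printed history the model actually used for each word.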

Thanks,

Jian