[SRILM User List] Question about output of "ngram -ppl -debug 2" for class-based LM model

Mon Jan 24 00:10:50 PST 2011

Hello, all.

I built a class-based bigram LM and calculated the perplexity of some test data with this command in a shell script:

"ngram -order 2 -lm ${CLASS_LM_NAME} -ppl ${TEST} -debug 2 -classes ${CLASS_FILE}".

I get the -debug 2 output. Part of it looks like this:

The term is generally applied to behavior within civil governments , but politics has been observed in other group interactions , including corporate , academic , and religious institutions .
        p( The | <s> )  = [OOV][2gram] 0.00520962 [ -2.28319 ]
        p( term | The ...)      = [OOV][1gram][OOV][2gram] 0.000536365 [ -3.27054 ]
        p( is | term ...)       = [OOV][1gram][OOV][2gram] 0.0139987 [ -1.85391 ]
        p( generally | is ...)  = [OOV][1gram][OOV][2gram] 0.000171588 [ -3.76551 ]
        p( applied | generally ...)     = [OOV][1gram][OOV][2gram] 0.000122932 [ -3.91033 ]
        p( to | applied ...)    = [OOV][1gram][OOV][2gram] 0.0811208 [ -1.09087 ]
        p( behavior | to ...)   = [OOV][1gram][OOV][2gram] 6.12967e-05 [ -4.21256 ]
        p( within | behavior ...)       = [OOV][1gram][OOV][2gram] 0.000763519 [ -3.11718 ]
        p( civil | within ...)  = [OOV][1gram][OOV][2gram] 4.96081e-05 [ -4.30445 ]
        p( <unk> | civil ...)   = [1gram][1gram] 0.0156937 [ -1.80427 ]
        p( , | <unk> ...)       = [OOV][1gram] 0.0149661 [ -1.82489 ]
        p( but | , ...)         = [OOV][1gram][OOV][2gram] 0.00500311 [ -2.30076 ]
        p( politics | but ...)  = [OOV][1gram][OOV][2gram] 4.8048e-05 [ -4.31833 ]
        p( has | politics ...)  = [OOV][1gram][OOV][1gram] 0.000661878 [ -3.17922 ]
        p( been | has ...)      = [OOV][1gram][OOV][2gram] 0.00721624 [ -2.14169 ]
        p( observed | been ...)         = [OOV][1gram][OOV][1gram] 1.12884e-05 [ -4.94737 ]
        p( in | observed ...)   = [OOV][1gram][1gram][OOV][2gram][1gram] 0.0144335 [ -1.84063 ]
        p( other | in ...)      = [OOV][1gram][OOV][2gram][OOV][2gram] 0.00162061 [ -2.79032 ]
        p( group | other ...)   = [OOV][1gram][OOV][2gram] 0.000567602 [ -3.24596 ]
        p( <unk> | group ...)   = [1gram][1gram] 0.0150167 [ -1.82343 ]
        p( , | <unk> ...)       = [OOV][1gram] 0.0149661 [ -1.82489 ]
        p( including | , ...)   = [OOV][1gram][OOV][2gram] 0.000755534 [ -3.12175 ]
        p( corporate | including ...)   = [OOV][1gram][OOV][2gram] 5.59105e-05 [ -4.25251 ]
        p( , | corporate ...)   = [OOV][1gram][OOV][1gram] 0.0222226 [ -1.65321 ]
        p( academic | , ...)    = [OOV][1gram][OOV][2gram] 4.36976e-05 [ -4.35954 ]
        p( , | academic ...)    = [OOV][1gram][OOV][1gram] 0.0222226 [ -1.65321 ]
        p( and | , ...)         = [OOV][1gram][OOV][2gram] 0.0787025 [ -1.10401 ]
        p( religious | and ...)         = [OOV][1gram][OOV][2gram] 6.80949e-05 [ -4.16689 ]
        p( institutions | religious ...)        = [OOV][1gram][OOV][2gram] 0.000141801 [ -3.84832 ]
        p( . | institutions ...)        = [OOV][1gram][OOV][2gram] 0.0110882 [ -1.95514 ]
        p( </s> | . ...)        = [1gram][2gram] 0.979002 [ -0.00921631 ]
1 sentences, 30 words, 0 OOVs
0 zeroprobs, logprob= -85.9741 ppl= 593.414 ppl1= 734.18
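
As a sanity check on the summary lines, here is a minimal Python sketch (not part of SRILM; the variable names are my own). It assumes the usual SRILM convention: ppl normalizes the total log10 probability by the number of scored words plus one end-of-sentence token per sentence, while ppl1 leaves the end-of-sentence tokens out of the count. The per-word values are the bracketed log10 probabilities copied from the output above.

```python
# Bracketed log10 probabilities from the -debug 2 output, in order,
# including the final </s> token (31 values for 30 words + 1 </s>).
word_logprobs = [
    -2.28319, -3.27054, -1.85391, -3.76551, -3.91033, -1.09087,
    -4.21256, -3.11718, -4.30445, -1.80427, -1.82489, -2.30076,
    -4.31833, -3.17922, -2.14169, -4.94737, -1.84063, -2.79032,
    -3.24596, -1.82343, -1.82489, -3.12175, -4.25251, -1.65321,
    -4.35954, -1.65321, -1.10401, -4.16689, -3.84832, -1.95514,
    -0.00921631,
]

# Counts copied from the summary lines.
sentences, words, oovs, zeroprobs = 1, 30, 0, 0

# The reported "logprob" should be the sum of the per-token values.
logprob = sum(word_logprobs)

# Assumed normalization: OOVs and zero-probability words are not scored.
tokens_with_eos = words - oovs - zeroprobs + sentences
tokens_without_eos = words - oovs - zeroprobs

ppl = 10 ** (-logprob / tokens_with_eos)
ppl1 = 10 ** (-logprob / tokens_without_eos)

print(f"logprob = {logprob:.4f}")
print(f"ppl     = {ppl:.3f}")
print(f"ppl1    = {ppl1:.2f}")
```

The results should agree with the reported logprob, ppl and ppl1 up to rounding of the printed values, so the summary itself looks consistent; my question is only about the individual line below.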

I understand how the probabilities were calculated for most of the lines, but I can't work out this line:

  p( in | observed ...)   = [OOV][1gram][1gram][OOV][2gram][1gram] 0.0144335 [ -1.84063 ]

Could you tell me what this line means? How was this probability calculated from my class-based LM?

-- 
Yasuo Suzuki
4th year undergrad at Shinoda Laboratory
Department of Computer Science
Tokyo Institute of Technology
suzuki at ks.cs.titech.ac.jp