[SRILM User List] Question about output of "ngram -ppl -debug 2" for class-based LM model
suzuki yasuo
suzuki at ks.cs.titech.ac.jp
Mon Jan 24 00:10:50 PST 2011
Hello, all.
I built a class-based bigram LM and calculated the perplexity of some test data with this command in a shell script:
"ngram -order 2 -lm ${CLASS_LM_NAME} -ppl ${TEST} -debug 2 -classes ${CLASS_FILE}".
This produces the -debug 2 output. Part of it looks like this:
The term is generally applied to behavior within civil governments , but politics has been observed in other group interactions , including corporate , academic , and religious institutions .
p( The | <s> ) = [OOV][2gram] 0.00520962 [ -2.28319 ]
p( term | The ...) = [OOV][1gram][OOV][2gram] 0.000536365 [ -3.27054 ]
p( is | term ...) = [OOV][1gram][OOV][2gram] 0.0139987 [ -1.85391 ]
p( generally | is ...) = [OOV][1gram][OOV][2gram] 0.000171588 [ -3.76551 ]
p( applied | generally ...) = [OOV][1gram][OOV][2gram] 0.000122932 [ -3.91033 ]
p( to | applied ...) = [OOV][1gram][OOV][2gram] 0.0811208 [ -1.09087 ]
p( behavior | to ...) = [OOV][1gram][OOV][2gram] 6.12967e-05 [ -4.21256 ]
p( within | behavior ...) = [OOV][1gram][OOV][2gram] 0.000763519 [ -3.11718 ]
p( civil | within ...) = [OOV][1gram][OOV][2gram] 4.96081e-05 [ -4.30445 ]
p( <unk> | civil ...) = [1gram][1gram] 0.0156937 [ -1.80427 ]
p( , | <unk> ...) = [OOV][1gram] 0.0149661 [ -1.82489 ]
p( but | , ...) = [OOV][1gram][OOV][2gram] 0.00500311 [ -2.30076 ]
p( politics | but ...) = [OOV][1gram][OOV][2gram] 4.8048e-05 [ -4.31833 ]
p( has | politics ...) = [OOV][1gram][OOV][1gram] 0.000661878 [ -3.17922 ]
p( been | has ...) = [OOV][1gram][OOV][2gram] 0.00721624 [ -2.14169 ]
p( observed | been ...) = [OOV][1gram][OOV][1gram] 1.12884e-05 [ -4.94737 ]
p( in | observed ...) = [OOV][1gram][1gram][OOV][2gram][1gram] 0.0144335 [ -1.84063 ]
p( other | in ...) = [OOV][1gram][OOV][2gram][OOV][2gram] 0.00162061 [ -2.79032 ]
p( group | other ...) = [OOV][1gram][OOV][2gram] 0.000567602 [ -3.24596 ]
p( <unk> | group ...) = [1gram][1gram] 0.0150167 [ -1.82343 ]
p( , | <unk> ...) = [OOV][1gram] 0.0149661 [ -1.82489 ]
p( including | , ...) = [OOV][1gram][OOV][2gram] 0.000755534 [ -3.12175 ]
p( corporate | including ...) = [OOV][1gram][OOV][2gram] 5.59105e-05 [ -4.25251 ]
p( , | corporate ...) = [OOV][1gram][OOV][1gram] 0.0222226 [ -1.65321 ]
p( academic | , ...) = [OOV][1gram][OOV][2gram] 4.36976e-05 [ -4.35954 ]
p( , | academic ...) = [OOV][1gram][OOV][1gram] 0.0222226 [ -1.65321 ]
p( and | , ...) = [OOV][1gram][OOV][2gram] 0.0787025 [ -1.10401 ]
p( religious | and ...) = [OOV][1gram][OOV][2gram] 6.80949e-05 [ -4.16689 ]
p( institutions | religious ...) = [OOV][1gram][OOV][2gram] 0.000141801 [ -3.84832 ]
p( . | institutions ...) = [OOV][1gram][OOV][2gram] 0.0110882 [ -1.95514 ]
p( </s> | . ...) = [1gram][2gram] 0.979002 [ -0.00921631 ]
1 sentences, 30 words, 0 OOVs
0 zeroprobs, logprob= -85.9741 ppl= 593.414 ppl1= 734.18
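As a sanity check on the summary lines, here is a small Python sketch. It assumes ppl is 10^(-logprob / (words - OOVs - zeroprobs + sentences)) and ppl1 is the same without the sentence count, which is how I understand the SRILM FAQ. The function names are my own and are not part of SRILM.

```python
import re

# Matches the trailing "[ -2.28319 ]" field of a "-debug 2" line (log10 prob).
LOGPROB_RE = re.compile(r"\[\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)\s*\]\s*$")

def line_logprob(line):
    """Return the log10 probability at the end of one debug line."""
    match = LOGPROB_RE.search(line)
    if match is None:
        raise ValueError("no log10 probability found in: " + line)
    return float(match.group(1))

def perplexities(logprob, words, sentences, oovs=0, zeroprobs=0):
    """Recompute (ppl, ppl1) from the summary counts (assumed formula)."""
    scored = words - oovs - zeroprobs        # tokens that received a probability
    ppl = 10 ** (-logprob / (scored + sentences))  # counts </s> once per sentence
    ppl1 = 10 ** (-logprob / scored)               # ignores </s>
    return ppl, ppl1

# Two lines copied from the output above, then the summary values.
first = line_logprob("p( The | <s> ) = [OOV][2gram] 0.00520962 [ -2.28319 ]")
last = line_logprob("p( </s> | . ...) = [1gram][2gram] 0.979002 [ -0.00921631 ]")
ppl, ppl1 = perplexities(-85.9741, words=30, sentences=1)
print(first, last, ppl, ppl1)
```

With logprob= -85.9741, 30 words and 1 sentence, the two values come out close to the reported ppl= 593.414 and ppl1= 734.18, so the totals themselves look consistent.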
I can understand how these probabilities were calculated for most of the lines, but I can't work out this line:
p( in | observed ...) = [OOV][1gram][1gram][OOV][2gram][1gram] 0.0144335 [ -1.84063 ]
Could you tell me what this line means? How was this probability calculated from my class-based LM?
--
Yasuo Suzuki
4th year undergrad at Shinoda Laboratory
Department of Computer Science
Tokyo Institute of Technology
suzuki at ks.cs.titech.ac.jp