[SRILM User List] Question about output of "ngram -ppl -debug 2" for class-based LM model

Mon Jan 24 14:21:50 PST 2011

suzuki yasuo wrote:
> Hello, all.
>
> I made a class-based bigram LM and calculated the perplexity of some test data with this command in a shell script:
>
> "ngram -order 2 -lm ${CLASS_LM_NAME} -ppl ${TEST} -debug 2 -classes ${CLASS_FILE}".
>
> I get the -debug 2 output. Part of it looks like this:
>  
>
> The term is generally applied to behavior within civil governments , but politics has been observed in other group interactions , including corporate , academic , and religious institutions .
>         p( The | <s> )  = [OOV][2gram] 0.00520962 [ -2.28319 ]
>         p( term | The ...)      = [OOV][1gram][OOV][2gram] 0.000536365 [ -3.27054 ]
>         p( is | term ...)       = [OOV][1gram][OOV][2gram] 0.0139987 [ -1.85391 ]
>         p( generally | is ...)  = [OOV][1gram][OOV][2gram] 0.000171588 [ -3.76551 ]
>         p( applied | generally ...)     = [OOV][1gram][OOV][2gram] 0.000122932 [ -3.91033 ]
>         p( to | applied ...)    = [OOV][1gram][OOV][2gram] 0.0811208 [ -1.09087 ]
>         p( behavior | to ...)   = [OOV][1gram][OOV][2gram] 6.12967e-05 [ -4.21256 ]
>         p( within | behavior ...)       = [OOV][1gram][OOV][2gram] 0.000763519 [ -3.11718 ]
>         p( civil | within ...)  = [OOV][1gram][OOV][2gram] 4.96081e-05 [ -4.30445 ]
>         p( <unk> | civil ...)   = [1gram][1gram] 0.0156937 [ -1.80427 ]
>         p( , | <unk> ...)       = [OOV][1gram] 0.0149661 [ -1.82489 ]
>         p( but | , ...)         = [OOV][1gram][OOV][2gram] 0.00500311 [ -2.30076 ]
>         p( politics | but ...)  = [OOV][1gram][OOV][2gram] 4.8048e-05 [ -4.31833 ]
>         p( has | politics ...)  = [OOV][1gram][OOV][1gram] 0.000661878 [ -3.17922 ]
>         p( been | has ...)      = [OOV][1gram][OOV][2gram] 0.00721624 [ -2.14169 ]
>         p( observed | been ...)         = [OOV][1gram][OOV][1gram] 1.12884e-05 [ -4.94737 ]
>         p( in | observed ...)   = [OOV][1gram][1gram][OOV][2gram][1gram] 0.0144335 [ -1.84063 ]
>         p( other | in ...)      = [OOV][1gram][OOV][2gram][OOV][2gram] 0.00162061 [ -2.79032 ]
>         p( group | other ...)   = [OOV][1gram][OOV][2gram] 0.000567602 [ -3.24596 ]
>         p( <unk> | group ...)   = [1gram][1gram] 0.0150167 [ -1.82343 ]
>         p( , | <unk> ...)       = [OOV][1gram] 0.0149661 [ -1.82489 ]
>         p( including | , ...)   = [OOV][1gram][OOV][2gram] 0.000755534 [ -3.12175 ]
>         p( corporate | including ...)   = [OOV][1gram][OOV][2gram] 5.59105e-05 [ -4.25251 ]
>         p( , | corporate ...)   = [OOV][1gram][OOV][1gram] 0.0222226 [ -1.65321 ]
>         p( academic | , ...)    = [OOV][1gram][OOV][2gram] 4.36976e-05 [ -4.35954 ]
>         p( , | academic ...)    = [OOV][1gram][OOV][1gram] 0.0222226 [ -1.65321 ]
>         p( and | , ...)         = [OOV][1gram][OOV][2gram] 0.0787025 [ -1.10401 ]
>         p( religious | and ...)         = [OOV][1gram][OOV][2gram] 6.80949e-05 [ -4.16689 ]
>         p( institutions | religious ...)        = [OOV][1gram][OOV][2gram] 0.000141801 [ -3.84832 ]
>         p( . | institutions ...)        = [OOV][1gram][OOV][2gram] 0.0110882 [ -1.95514 ]
>         p( </s> | . ...)        = [1gram][2gram] 0.979002 [ -0.00921631 ]
> 1 sentences, 30 words, 0 OOVs
> 0 zeroprobs, logprob= -85.9741 ppl= 593.414 ppl1= 734.18
>
> I can understand how these probabilities were calculated for most of the lines, but I can't work out this line:
>
>   p( in | observed ...)   = [OOV][1gram][1gram][OOV][2gram][1gram] 0.0144335 [ -1.84063 ]
>
> Could you tell me the meaning of this line? How was this probability calculated from my class-based LM?
>   
Each term in brackets ([OOV], [1gram], ...) corresponds to one way to parse 
the word: as part of a class expansion, or as a plain word.
For example, you see

     p( The | <s> )  = [OOV][2gram] 0.00520962 [ -2.28319 ]

because the first word could be generated by the LM as the bigram <s> The, or 
as <s> CLASS with "The" being a member of CLASS.
I suspect your LM doesn't contain "The" as a vocabulary item independent 
of CLASS, hence the first parse yields the [OOV] label.

Once you get to the second word you have more ways to predict the next 
word, because now the history also has multiple parses.

In general, the predicted probabilities for all parses are added up to 
arrive at the total conditional probability.
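
As a rough illustration of that summation (not SRILM's actual code), here 
is a minimal Python sketch for the first word. The two probabilities in the 
class parse are made-up numbers, and the function name is hypothetical; the 
sketch also assumes that a parse labeled [OOV] simply contributes zero.

```python
import math

def total_probability(parses):
    """Sum the contributions of all parses of one word.

    Each parse is a pair (lm_prob, expansion_prob):
      lm_prob        - N-gram probability of the token (word or class)
      expansion_prob - p(word | class), or 1.0 for a plain-word parse
    """
    return sum(lm_prob * expansion_prob for lm_prob, expansion_prob in parses)

# Hypothetical parses for p( The | <s> ); the numbers are illustrative only.
parses_for_the = [
    (0.0, 1.0),   # plain-word parse: "The" is not in the LM vocabulary (assumed to add 0)
    (0.05, 0.1),  # class parse: p(CLASS | <s>) * p(The | CLASS)
]

p = total_probability(parses_for_the)
print(p, math.log10(p))  # probability and its log10, as in the -debug 2 output
```

For later words the list of parses grows, since each parse of the history 
can be combined with each parse of the predicted word.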

To disable this type of processing (multiple parses) you can use the 
-simple-classes option, but that only works if word-class membership is 
unambiguous.
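
One way to check that condition before trying -simple-classes is sketched 
below in Python. It assumes the class definitions have one expansion per 
line in the form "CLASS [prob] word ...", with the probability optional; 
the function name and the sample lines are hypothetical. A field is treated 
as the probability if it parses as a number, so this heuristic would 
misread a numeric word on a line that has no probability.

```python
from collections import defaultdict

def check_simple_classes(class_lines):
    """Report words in more than one class, and classes with multiword expansions."""
    membership = defaultdict(set)   # word -> set of classes it belongs to
    multiword_classes = set()       # classes having an expansion longer than one word
    for line in class_lines:
        fields = line.split()
        if len(fields) < 2:
            continue                # blank or malformed line
        cls, rest = fields[0], fields[1:]
        try:
            float(rest[0])          # heuristic: a leading number is the probability
            rest = rest[1:]
        except ValueError:
            pass
        if not rest:
            continue
        if len(rest) > 1:
            multiword_classes.add(cls)
            continue
        membership[rest[0]].add(cls)
    ambiguous = {w: sorted(c) for w, c in membership.items() if len(c) > 1}
    return ambiguous, multiword_classes

# Hypothetical class definitions, for illustration only.
example = [
    "DET 0.5 The",
    "DET 0.5 the",
    "NOUN 1.0 term",
    "VERB 1.0 term",
    "CITY 1.0 New York",
]
ambiguous, multiword = check_simple_classes(example)
print(ambiguous, multiword)
```

If either result is non-empty, the class file is not "simple" in this 
sense and the multiple-parse processing is still needed.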

Andreas

