[SRILM User List] A problem with expanding class-based LMs
Dmytro Prylipko
dmytro.prylipko at ovgu.de
Fri Dec 16 01:47:57 PST 2011
Hi Andreas,
I have a class-based LM which gives the following perplexity on the
test set:
ngram -ppl test.fold3.txt -lm 2-gram.class.dd150.fold3.lm -classes
class.dd150.fold3.defs -order 2 -vocab ../all.wlist
file test.fold3.txt: 1397 sentences, 37403 words, 0 OOVs
427 zeroprobs, logprob= -72617.1 ppl= 78.0551 ppl1= 92.0235
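If I read these numbers correctly, the zeroprob words are excluded from
the word count when the perplexity is computed. A rough check in Python
(the denominators are my assumption, not taken from SRILM code):

logprob   = -72617.1
sentences = 1397
words     = 37403
oovs      = 0
zeroprobs = 427

# my guess: </s> tokens count for ppl, zeroprobs and OOVs are dropped
denom_ppl  = words + sentences - oovs - zeroprobs
denom_ppl1 = words - oovs - zeroprobs

print(10 ** (-logprob / denom_ppl))   # ~78.05, matches ppl
print(10 ** (-logprob / denom_ppl1))  # ~92.02, matches ppl1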
I expanded it and got a word-level model:
ngram -lm 2-gram.class.dd150.fold3.lm -classes class.dd150.fold3.defs
-order 2 -write-lm 2-gram.class.dd150.expanded_exact.fold3.lm
-expand-classes 2 -expand-exact 2 -vocab ../all.wlist
But the new model gives a different result:
ngram -ppl test.fold3.txt -lm 2-gram.class.dd150.expanded_exact.fold3.lm
-order 2 -vocab ../all.wlist
file test.fold3.txt: 1397 sentences, 37403 words, 0 OOVs
0 zeroprobs, logprob= -78108.4 ppl= 103.063 ppl1= 122.544
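If my assumption about the denominator is right, the whole gap comes
from the 427 formerly-zero words: they now enter both the word count and
the logprob sum, but with very small probabilities. The same check for
the expanded model:

logprob    = -78108.4
denom_ppl  = 37403 + 1397   # nothing is excluded any more
denom_ppl1 = 37403

print(10 ** (-logprob / denom_ppl))   # ~103.06, matches ppl
print(10 ** (-logprob / denom_ppl1))  # ~122.5, matches ppl1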
You can see there are no more zeroprobs with the new model, which
affects the perplexity.
Here is the detailed per-word output from both models for one sentence:
Class-based:
<s> gruess gott frau traub </s>
p( gruess | <s> ) = [OOV][2gram] 0.0167159 [ -1.77687 ]
p( gott | gruess ...) = [OOV][1gram][OOV][2gram] 0.658525 [ -0.181428 ]
p( frau | gott ...) = [OOV][1gram][OOV][2gram] 0.119973 [ -0.920917 ]
p( traub | frau ...) = [OOV][OOV] 0 [ -inf ]
p( </s> | traub ...) = [1gram] 0.0377397 [ -1.4232 ]
1 sentences, 4 words, 0 OOVs
1 zeroprobs, logprob= -4.30242 ppl= 11.9016 ppl1= 27.1731
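The per-sentence numbers behave the same way: the -inf word seems to be
dropped from both the sum and the count. A quick check (again, just my
reading of the output):

logprobs = [-1.77687, -0.181428, -0.920917, -1.4232]  # without the -inf word
total = sum(logprobs)                        # ~ -4.30242
print(10 ** (-total / len(logprobs)))        # ~11.90, matches ppl  (3 words + </s>)
print(10 ** (-total / (len(logprobs) - 1)))  # ~27.17, matches ppl1 (3 words)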
And the same sentence with the expanded LM:
<s> gruess gott frau traub </s>
p( gruess | <s> ) = [2gram] 0.0167159 [ -1.77687 ]
p( gott | gruess ...) = [2gram] 0.658525 [ -0.181428 ]
p( frau | gott ...) = [2gram] 0.119973 [ -0.920917 ]
p( traub | frau ...) = [1gram] 3.84699e-14 [ -13.4149 ]
p( </s> | traub ...) = [1gram] 0.0377397 [ -1.4232 ]
1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -17.7173 ppl= 3495.1 ppl1= 26873.5
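Here all five tokens are counted, and p( traub | frau ) = 3.84699e-14
dominates the sum:

logprobs = [-1.77687, -0.181428, -0.920917, -13.4149, -1.4232]
total = sum(logprobs)      # ~ -17.7173
print(10 ** (-total / 5))  # ~3495, matches ppl  (4 words + </s>)
print(10 ** (-total / 4))  # ~2.7e4, matches ppl1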
From my point of view this looks like a numerical issue: such small
probabilities should be treated as zero.
BTW, how can zero probabilities appear there at all? They should be
smoothed, right?
I divided my corpus into 10 folds and performed these steps on all of
them. For 6 folds everything is fine and the perplexities are almost the
same for both models, but for the other 4 folds I have this problem.
I would greatly appreciate any help.
Sincerely yours,
Dmytro Prylipko.