[SRILM User List] A problem with expanding class-based LMs

Andreas Stolcke stolcke at icsi.berkeley.edu
Wed Dec 21 17:03:51 PST 2011


My guess is that your class definitions contain multiple words per
expansion, such as "GREETING" expanding to "gruess gott". In that
case a bigram expansion of the LM will not have as much predictive power
as the original class bigram LM.
Try using -expand-classes 3 (or even higher).
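
Incidentally, part of the perplexity gap between your two runs is an
accounting effect: ngram -ppl skips zeroprob tokens entirely, excluding
them from both the logprob total and the perplexity denominator
(words - OOVs - zeroprobs + sentences), whereas the expanded LM assigns
every token a tiny but nonzero probability that does get counted. A short
sketch using the numbers from your message (srilm_ppl is just an
illustrative name for the computation):

```python
def srilm_ppl(logprob, sentences, words, oovs=0, zeroprobs=0):
    """Reproduce the perplexity reported by ngram -ppl: zeroprob and OOV
    tokens contribute nothing to logprob and are excluded from the
    denominator; end-of-sentence tokens enter via the sentence count."""
    denom = words - oovs - zeroprobs + sentences
    return 10 ** (-logprob / denom)

# Corpus level: the class-based LM skips its 427 zeroprob tokens ...
ppl_class = srilm_ppl(-72617.1, sentences=1397, words=37403, zeroprobs=427)
# ... while the expanded LM scores every token, however improbable.
ppl_expanded = srilm_ppl(-78108.4, sentences=1397, words=37403)
print(ppl_class, ppl_expanded)  # ~78.055 vs ~103.063, as reported

# Same effect on the single sentence: one zeroprob ("traub") is skipped,
# while a 3.8e-14 probability costs about -13.4 in log space.
print(srilm_ppl(-4.30242, sentences=1, words=4, zeroprobs=1))  # ~11.90
print(srilm_ppl(-17.7173, sentences=1, words=4))               # ~3495
```

So neither number is wrong as such; the two runs normalize over different
token counts, which is why a fair comparison needs the expansion not to
leak probability mass to near-zero events in the first place.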

Andreas

Dmytro Prylipko wrote:
> Hi Andreas,
>
> I have a class-based LM, which gives a particular perplexity value on 
> the test set:
>
> ngram -ppl test.fold3.txt -lm 2-gram.class.dd150.fold3.lm -classes 
> class.dd150.fold3.defs -order 2 -vocab ../all.wlist
>
> file test.fold3.txt: 1397 sentences, 37403 words, 0 OOVs
> 427 zeroprobs, logprob= -72617.1 ppl= 78.0551 ppl1= 92.0235
>
> I expanded it and got a word-level model:
>
> ngram -lm 2-gram.class.dd150.fold3.lm -classes class.dd150.fold3.defs 
> -order 2 -write-lm 2-gram.class.dd150.expanded_exact.fold3.lm 
> -expand-classes 2 -expand-exact 2 -vocab ../all.wlist
>
>
> But the new model yields a different result:
>
> ngram -ppl test.fold3.txt -lm 
> 2-gram.class.dd150.expanded_exact.fold3.lm -order 2 -vocab ../all.wlist
>
> file test.fold3.txt: 1397 sentences, 37403 words, 0 OOVs
> 0 zeroprobs, logprob= -78108.4 ppl= 103.063 ppl1= 122.544
>
> You can see there are no more zeroprobs in the new one, which affects 
> the perplexity.
>
>
> I can show you the detailed output from both models:
>
> Class-based:
>
> <s> gruess gott frau traub </s>
>         p( gruess | <s> )       = [OOV][2gram] 0.0167159 [ -1.77687 ]
>         p( gott | gruess ...)   = [OOV][1gram][OOV][2gram] 0.658525 [ -0.181428 ]
>         p( frau | gott ...)     = [OOV][1gram][OOV][2gram] 0.119973 [ -0.920917 ]
>         p( traub | frau ...)    = [OOV][OOV] 0 [ -inf ]
>         p( </s> | traub ...)    = [1gram] 0.0377397 [ -1.4232 ]
> 1 sentences, 4 words, 0 OOVs
> 1 zeroprobs, logprob= -4.30242 ppl= 11.9016 ppl1= 27.1731
>
>
> And the same sentence with the expanded LM:
>
> <s> gruess gott frau traub </s>
>         p( gruess | <s> )       = [2gram] 0.0167159 [ -1.77687 ]
>         p( gott | gruess ...)   = [2gram] 0.658525 [ -0.181428 ]
>         p( frau | gott ...)     = [2gram] 0.119973 [ -0.920917 ]
>         p( traub | frau ...)    = [1gram] 3.84699e-14 [ -13.4149 ]
>         p( </s> | traub ...)    = [1gram] 0.0377397 [ -1.4232 ]
> 1 sentences, 4 words, 0 OOVs
> 0 zeroprobs, logprob= -17.7173 ppl= 3495.1 ppl1= 26873.5
>
>
> From my point of view it looks like a computational error: such 
> small probabilities should be treated as zero.
> BTW, how can zero probabilities appear there at all? They should be 
> smoothed, right?
>
> I divided my corpus into 10 folds and performed these steps on all of 
> them. With 6 folds everything is fine, the perplexities are almost the 
> same for both models, but with the other 4 parts I have this problem.
>
> I would greatly appreciate any help.
>
> Sincerely yours,
> Dmytro Prylipko.
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user


