[SRILM User List] perplexity results

Tue Jan 24 04:14:56 PST 2017

Hello. I have a question regarding perplexity. I am using SRILM to compute the perplexity of some sentences using an LM trained on a big corpus. Given a sentence and an LM, the perplexity tells how well that sentence fits the language (as far as I understand it), and the lower the perplexity, the better the sentence fits.
$NGRAMCOUNT_FILE -order 5 -interpolate -kndiscount -unk -text Wikipedia.en-es.es -lm lm/lmodel_es.lm
$NGRAM_FILE -order 5 -debug 1 -unk -lm lm/lmodel_es.lm -ppl testlabeled.en-es.es  > perplexity_es_testlabeled.ppl
I did the same on EN and on ES and here are some results I got:
Sixty-six parent coordinators were laid off," the draft complaint says, "and not merely excessed.
1 sentences, 14 words, 0 OOVs
0 zeroprobs, logprob= -62.106 ppl= 13816.6 ppl1= 27298.9

Mexico's Enrique Pena Nieto faces tough start
1 sentences, 7 words, 0 OOVs
0 zeroprobs, logprob= -39.1759 ppl= 78883.7 ppl1= 394964

The NATO mission officially ended Oct. 31.
1 sentences, 7 words, 0 OOVs
0 zeroprobs, logprob= -29.2706 ppl= 4558.57 ppl1= 15188.6

Sesenta y seis padres coordinadores fueron despedidos," el proyecto de denuncia, dice, "y no simplemente excessed.
1 sentences, 16 words, 0 OOVs
0 zeroprobs, logprob= -57.0322 ppl= 2263.79 ppl1= 3668.72

México Enrique Peña Nieto enfrenta duras comienzo
1 sentences, 7 words, 0 OOVs
0 zeroprobs, logprob= -29.5672 ppl= 4964.71 ppl1= 16744.7
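
For reference, the ppl and ppl1 figures above can be reproduced from the reported logprob. The sketch below assumes the relation described in the SRILM FAQ: logprob is base 10, ppl normalizes by the scored words plus one end-of-sentence token per sentence, and ppl1 normalizes by the scored words only. The function name and its parameters are my own, not part of SRILM.

```python
def srilm_perplexities(logprob, words, sentences=1, oovs=0, zeroprobs=0):
    """Recompute (ppl, ppl1) from the counts in an `ngram -ppl` summary.

    Assumes logprob is a base-10 log probability, as SRILM reports it.
    """
    # Tokens that actually contributed to logprob.
    scored = words - oovs - zeroprobs
    # ppl also counts one end-of-sentence token per sentence.
    ppl = 10 ** (-logprob / (scored + sentences))
    # ppl1 leaves the end-of-sentence tokens out of the denominator.
    ppl1 = 10 ** (-logprob / scored)
    return ppl, ppl1


# First EN sentence above: 14 words, logprob= -62.106
ppl, ppl1 = srilm_perplexities(-62.106, 14)
print(f"ppl={ppl:.1f} ppl1={ppl1:.1f}")
```

Seen this way, a ppl of about 13800 corresponds to an average of roughly 4.1 log10 units lost per token, so the figure is driven by the average per-token probability and not by sentence length alone.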

Why are the perplexities for the EN sentences so big? The smallest ppl I get for an EN sentence is about 250. The Spanish sentences have some errors, so those are the ones I was expecting big ppl numbers for. Should I change something in the way I compute the LMs?
Thank you very much!!
