[SRILM User List] perplexity results

Dávid Nemeskey nemeskeyd at gmail.com
Tue Jan 24 04:57:58 PST 2017


Hi,

It is hard to tell without knowing e.g. the training set, but I would
try running ngram with higher values of -debug. I think even -debug 2
tells you the log probability of each individual word; that could be a
start. I have actually added another debug level (100) to my own copy,
where I print the 5 most likely candidates at each position (this
requires a "forward" trie in addition to the default "backwards" one to
run at a usable speed), to get a sense of the proportions and of how the
model and the text differ.
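For example, with the same model and test file as in your commands below
(file names are yours; adjust as needed):

$NGRAM_FILE -order 5 -debug 2 -unk -lm lm/lmodel_es.lm -ppl testlabeled.en-es.es > perplexity_es_debug2.ppl

At -debug 2, ngram prints p(word | context) for every word together with
the n-gram order it actually used, so you can see which words are
responsible for most of the logprob.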

Also, just wondering. Is the training corpus bilingual (en-es)?

Best,
Dávid Nemeskey

On Tue, Jan 24, 2017 at 1:14 PM, Stefy D. <tsuki_stefy at yahoo.com> wrote:
> Hello. I have a question regarding perplexity. I am using SRILM to compute
> the perplexity of some sentences using an LM trained on a big corpus. Given a
> sentence and an LM, the perplexity tells how well that sentence fits the
> language (as far as I understand), and the lower the perplexity, the better
> the sentence fits.
>
> $NGRAMCOUNT_FILE -order 5 -interpolate -kndiscount -unk -text
> Wikipedia.en-es.es -lm lm/lmodel_es.lm
>
> $NGRAM_FILE -order 5 -debug 1 -unk -lm lm/lmodel_es.lm -ppl
> testlabeled.en-es.es  > perplexity_es_testlabeled.ppl
>
> I did the same for EN and for ES, and here are some of the results I got:
>
> Sixty-six parent coordinators were laid off," the draft complaint says, "and
> not merely excessed.
> 1 sentences, 14 words, 0 OOVs
> 0 zeroprobs, logprob= -62.106 ppl= 13816.6 ppl1= 27298.9
>
> Mexico's Enrique Pena Nieto faces tough start
> 1 sentences, 7 words, 0 OOVs
> 0 zeroprobs, logprob= -39.1759 ppl= 78883.7 ppl1= 394964
>
> The NATO mission officially ended Oct. 31.
> 1 sentences, 7 words, 0 OOVs
> 0 zeroprobs, logprob= -29.2706 ppl= 4558.57 ppl1= 15188.6
>
> Sesenta y seis padres coordinadores fueron despedidos," el proyecto de
> denuncia, dice, "y no simplemente excessed.
> 1 sentences, 16 words, 0 OOVs
> 0 zeroprobs, logprob= -57.0322 ppl= 2263.79 ppl1= 3668.72
>
> México Enrique Peña Nieto enfrenta duras comienzo
> 1 sentences, 7 words, 0 OOVs
> 0 zeroprobs, logprob= -29.5672 ppl= 4964.71 ppl1= 16744.7
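
(A note on how these numbers relate, in case it helps: ngram computes

    ppl  = 10^(-logprob / (words - OOVs + sentences))
    ppl1 = 10^(-logprob / (words - OOVs))

where the extra "+ sentences" accounts for the </s> tokens, and OOVs is
zero here. For your first English sentence this gives
10^(62.106 / 15) ≈ 13817 and 10^(62.106 / 14) ≈ 27299, matching the
output above.)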
>
>
> Why are the perplexities for the EN sentences so big? The smallest ppl I get
> for an EN sentence is about 250. The Spanish sentences have some errors, so
> I was expecting big ppl numbers. Should I change something in the way I
> compute the LMs?
>
> Thank you very much!!
>
>
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user


