perplexity evaluation

Andreas Stolcke stolcke at
Tue Dec 3 08:48:13 PST 2002

In message <B0793DB946E52942A49C1E8152A1358C8E3781 at> you wrote:
> Hi all, 
> I'm a new user of the toolkit and I need a little support in order to
> understand how the perplexity is computed and why it is different from the
> expected value.
> For instance, I have the training data in the file train.text, which contains
> only one line:
> <s> a b c </s>
> and the vocabulary (train.vocab) that contains all these words, and I want
> to generate a LM based on unigrams only and to evaluate it on the same
> training data. I don't want any discounting strategy to be applied.
> Here are the commands I used:
> ngram-count -order 1 -vocab train.vocab -text train.text -lm gt1max 0
> ngram -lm -debug 2 -vocab train.vocab -ppl train.text > out.ppl
> So, according to the theory, the expected value for perplexity is PP=3 if
> the context cues are not taken into account. This is also what one gets
> using the CMU toolkit.
> Using this toolkit and the above commands, what I actually got is PP=4.
> Looking inside the created ARPA model, I could see that </s> has the
> same probability as any of the real words (a, b, c).
> Could anybody explain to me why this is so? Did I make a mistake, or is
> there something I am missing?

You didn't make a mistake and this is the right answer as far as I can tell.
</s> needs to get a probability in order to be able to compute 
a probability for the whole "sentence".
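
To make the arithmetic concrete, here is a rough sketch (not the toolkit's
actual code) of a maximum-likelihood unigram perplexity on the one-line
training set. It assumes that <s> is treated as context only and is never
predicted, and that no discounting is applied. The function name
unigram_perplexity and the count_end_token switch are made up for this
illustration.

```python
import math
from collections import Counter

def unigram_perplexity(sentences, count_end_token=True):
    """ML unigram perplexity, evaluated on the same data it was estimated from.

    Hypothetical helper for illustration only. <s> is assumed to be
    context and is never predicted; </s> is predicted only when
    count_end_token is True.
    """
    tokens = []
    for words in sentences:
        tokens.extend(words)
        if count_end_token:
            tokens.append("</s>")  # end-of-sentence is one more predicted event
    counts = Counter(tokens)
    total = len(tokens)
    # Sum of log probabilities under the relative-frequency estimate.
    log_prob = sum(math.log(counts[t] / total) for t in tokens)
    # Perplexity is exp of the negative average log probability per token.
    return math.exp(-log_prob / total)

train = [["a", "b", "c"]]  # the sentence "<s> a b c </s>" without its markers
pp_with = unigram_perplexity(train, count_end_token=True)
pp_without = unigram_perplexity(train, count_end_token=False)
print(f"with </s>: {pp_with:.3f}   without </s>: {pp_without:.3f}")
```

With </s> counted there are four equally likely events (a, b, c, </s>), each
with probability 1/4, so the perplexity is 4. Leaving </s> out gives three
events with probability 1/3 each and a perplexity of 3, which matches the
two numbers reported above.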

Are you saying that the CMU software doesn't give any probability to </s>?
That would be quite odd.

Maybe someone on this list who is more familiar with the CMU toolkit can
contribute an explanation.
