[SRILM User List] Calculate perplexity

Andreas Stolcke stolcke at icsi.berkeley.edu
Thu Mar 27 10:44:54 PDT 2014


On 3/27/2014 7:38 AM, Laatar Rim wrote:
> Dear Andreas,
> To calculate perplexity I do this:
> lenovo@ubuntu:~/Documents/srilm$ ngram -lm class_based_model 
> '/home/lenovo/Documents/srilm/ML_N_Class/IN_SRILM' -ppl 
> '/home/lenovo/Documents/srilm/ML_N_Class/titi.txt'
> titi.txt is my training data.
> 1- Should I also calculate perplexity on my test data?
Yes, in fact, perplexity is usually reported on test data (data not used 
in training the model) since otherwise you get a very biased estimate.

> 2- How can I interpret this result:
> file /home/lenovo/Documents/srilm/ML_N_Class/titi.txt: 18657 
> sentences, 66817 words, 5285 OOVs
> 0 zeroprobs, logprob= -259950 ppl= 1744.69 ppl1= 16773.8
> What is the difference between ppl and ppl1?
OOVs is the count of words that don't occur in the vocabulary 
(technically, words that are mapped to <unk>) and have zero probability.
zeroprobs is the count of any other words that have zero probability.
Both counts are reported because those words are not included in the 
perplexity computation.

ppl is the standard perplexity where end-of-sentence tokens (</s>) are 
counted in the denominator. ppl1 is the same thing but </s> tokens are 
not counted in the denominator.
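
As a rough check, the two figures can be recomputed from the counts in the 
output above. This is a minimal sketch, assuming logprob is a base-10 log 
and that both OOVs and zeroprobs are subtracted from the word count before 
dividing; the function name srilm_perplexities is made up for illustration 
and is not part of SRILM.

```python
def srilm_perplexities(logprob, sentences, words, oovs, zeroprobs):
    """Recompute ppl and ppl1 from the counts printed by `ngram -ppl`.

    Assumes logprob is log base 10 and that OOVs and zeroprobs are
    excluded from the token count (hypothetical helper, not SRILM code).
    """
    # Words that actually contributed to logprob.
    scored_words = words - oovs - zeroprobs
    # ppl: one </s> token per sentence is added to the denominator.
    ppl = 10 ** (-logprob / (scored_words + sentences))
    # ppl1: </s> tokens are left out of the denominator.
    ppl1 = 10 ** (-logprob / scored_words)
    return ppl, ppl1


# Numbers taken from the output quoted above.
ppl, ppl1 = srilm_perplexities(
    logprob=-259950, sentences=18657, words=66817, oovs=5285, zeroprobs=0
)
print(f"ppl={ppl:.2f} ppl1={ppl1:.1f}")
```

The results should land close to the reported ppl and ppl1; small 
differences are expected because the printed logprob is rounded.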

Andreas
