[SRILM User List] Calculate perplexity
Andreas Stolcke
stolcke at icsi.berkeley.edu
Fri Mar 28 10:18:11 PDT 2014
On 3/28/2014 2:05 AM, Laatar Rim wrote:
> thanks
> so my file replace-word-with-class should not contain the words from the
> test data?
Knowing which words belong to which class should be considered part of the
training process, or come from prior knowledge.
If your application gives you the class membership of the words in the
test data then you can add it; otherwise it would be "training on test
data".
Andreas
> ----
> Best regards,
>
> *Rim LAATAR*
> Computer Science Engineer, National School of Engineers of
> Sfax (ENIS <http://www.enis.rnu.tn/>)
> Research master's student, Information Systems & New
> Technologies at FSEGS <http://www.fsegs.rnu.tn/> -- NLP (TALN) track
> Website: Rim LAATAR BEN SAID
> <https://sites.google.com/site/rimlaatarbnsaid/>
> Tel: (+216) 99 64 74 98
> ----
>
>
> On Thursday, March 27, 2014 at 6:44 PM, Andreas Stolcke
> <stolcke at icsi.berkeley.edu> wrote:
> On 3/27/2014 7:38 AM, Laatar Rim wrote:
>> Dear Andreas,
>> to calculate perplexity I do this:
>> lenovo at ubuntu:~/Documents/srilm$ ngram -lm class_based_model
>> '/home/lenovo/Documents/srilm/ML_N_Class/IN_SRILM' -ppl
>> '/home/lenovo/Documents/srilm/ML_N_Class/titi.txt'
>> titi.txt is my training data
>> 1- should I also calculate perplexity on my test data?
> Yes, in fact, perplexity is usually reported on test data (data not
> used in training the model) since otherwise you get a very biased
> estimate.
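>
> For example (the file names here are just placeholders), a class-based
> model would typically be evaluated on held-out data along the lines of
>
>     ngram -lm IN_SRILM -classes your.classes -ppl test.txt
>
> where test.txt contains sentences that were not used for training.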
>
>> 2- how can I interpret this result:
>> file /home/lenovo/Documents/srilm/ML_N_Class/titi.txt: 18657
>> sentences, 66817 words, 5285 OOVs
>> 0 zeroprobs, logprob= -259950 ppl= 1744.69 ppl1= 16773.8
>> what is the difference between ppl and ppl1?
> OOVs is the count of words that don't occur in the vocabulary
> (technically, that are mapped to <unk>) and have zero probability.
> zeroprobs refers to any other words that have zero probability.
> These counts are reported because they are not included in the
> perplexity computation.
>
> ppl is the standard perplexity where end-of-sentence tokens (</s>) are
> counted in the denominator. ppl1 is the same thing but </s> tokens are
> not counted in the denominator.
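>
> As a quick sanity check with the numbers above (SRILM logprobs are base 10,
> and zeroprobs is 0 here):
>
>     ppl  = 10^(259950 / (66817 - 5285 + 18657)) = 10^(259950 / 80189) ~ 1745
>     ppl1 = 10^(259950 / (66817 - 5285))         = 10^(259950 / 61532) ~ 16774
>
> i.e. OOVs (and zeroprobs) are excluded from the word count, and ppl
> additionally counts one </s> per sentence in the denominator.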
>
>
> Andreas
>
>
>