[SRILM User List] Calculate perplexity
Andreas Stolcke
stolcke at icsi.berkeley.edu
Fri Mar 28 10:18:11 PDT 2014
On 3/28/2014 2:05 AM, Laatar Rim wrote:
> thanks
> so my file replace-word-with-class should not contain the words from the
> test data?
Knowing which words belong to which class should be considered part of the
training process, or come from prior knowledge.
If your application gives you the class membership of the words in the
test data then you can add it; otherwise it would be "training on test
data".
Andreas
> ----
> Best regards,
>
> *Rim LAATAR*
> Computer Science Engineer, National School of Engineers of
> Sfax (ENIS <http://www.enis.rnu.tn/>)
> Research master's student, Information Systems & New
> Technologies at FSEGS <http://www.fsegs.rnu.tn/> -- NLP (TALN) track
> Website: Rim LAATAR BEN SAID
> <https://sites.google.com/site/rimlaatarbnsaid/>
> Tel: (+216) 99 64 74 98
> ----
>
>
> On Thursday, March 27, 2014 at 6:44 PM, Andreas Stolcke
> <stolcke at icsi.berkeley.edu> wrote:
> On 3/27/2014 7:38 AM, Laatar Rim wrote:
>> Dear Andreas,
>> to calculate perplexity I do this:
>> lenovo at ubuntu:~/Documents/srilm$ ngram -lm class_based_model
>> '/home/lenovo/Documents/srilm/ML_N_Class/IN_SRILM' -ppl
>> '/home/lenovo/Documents/srilm/ML_N_Class/titi.txt'
>> titi.txt is my training data
>> 1- should I also calculate perplexity on my test data?
> Yes, in fact, perplexity is usually reported on test data (data not
> used in training the model) since otherwise you get a very biased
> estimate.
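>
> For example (the file names here are just placeholders), a class-based
> model would typically be evaluated on held-out data along the lines of
>
>     ngram -lm IN_SRILM -classes your.classes -ppl test.txt
>
> where test.txt contains sentences that were not used for training.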
>
>> 2- how can I interpret this result:
>> file /home/lenovo/Documents/srilm/ML_N_Class/titi.txt: 18657
>> sentences, 66817 words, 5285 OOVs
>> 0 zeroprobs, logprob= -259950 ppl= 1744.69 ppl1= 16773.8
>> what is the difference between ppl and ppl1?
> OOVs is the count of words that don't occur in the vocabulary
> (technically, that are mapped to <unk>) and have zero probability.
> zeroprobs refers to any other words that have zero probability.
> These counts are reported because they are not included in the
> perplexity computation.
>
> ppl is the standard perplexity where end-of-sentence tokens (</s>) are
> counted in the denominator. ppl1 is the same thing but </s> tokens are
> not counted in the denominator.
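>
> As a quick sanity check with the numbers above (SRILM logprobs are base 10,
> and zeroprobs is 0 here):
>
>     ppl  = 10^(259950 / (66817 - 5285 + 18657)) = 10^(259950 / 80189) ~ 1745
>     ppl1 = 10^(259950 / (66817 - 5285))         = 10^(259950 / 61532) ~ 16774
>
> i.e. OOVs (and zeroprobs) are excluded from the word count, and ppl
> additionally counts one </s> per sentence in the denominator.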
>
>
> Andreas
>
>
>