can't get right counts-entropy

SAI TANG HUANG sai_tang_huang at hotmail.com
Mon Jan 21 04:59:49 PST 2008


Hi,

I have created a counts file and a back-off LM file from a text file of sentences using the following command:

sai at uk-notebook:~/Desktop$ ngram-count -text Merged_File.txt -lm lm_file -write count_file 

Then I ran the ngram program with -counts. Here is the output:

sai at uk-notebook:~/Desktop$ ngram -lm lm_file -counts count_file 
file count_file: 23640 sentences, 460074 words, 0 OOVs
7880 zeroprobs, logprob= -1.03103e+06 ppl= 146.821 ppl1= 190.575
sai at uk-notebook:~/Desktop$ 

I don't understand the output. I read that the -counts option does something with a counts file (that would be my count_file), but I don't understand why there are 7880 zeroprobs. When I run ngram with -ppl I get:

sai at uk-notebook:~/Desktop$ ngram -lm lm_file -debug 0 -ppl Merged_File.txt 
file Merged_File.txt: 7880 sentences, 153358 words, 0 OOVs
0 zeroprobs, logprob= -270778 ppl= 47.7932 ppl1= 58.2985
sai at uk-notebook:~/Desktop$ 

Why does -ppl yield 0 zeroprobs while -counts gives me 7880 zeroprobs? Also, why are the ppl and ppl1 values different from those reported by -ppl?
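
For what it's worth, the reported figures can be cross-checked by hand. The sketch below assumes the definitions given in the SRILM FAQ, ppl = 10^(-logprob / (words - OOVs - zeroprobs + sentences)) and ppl1 = 10^(-logprob / (words - OOVs - zeroprobs)). The helper name srilm_ppl is made up for illustration and is not part of SRILM.

```python
def srilm_ppl(logprob, words, sentences, oovs=0, zeroprobs=0):
    """Recompute (ppl, ppl1) from the totals that ngram prints.

    Assumes the SRILM FAQ definitions; logprob is a base-10 log probability.
    """
    scored = words - oovs - zeroprobs            # tokens that received a probability
    ppl = 10 ** (-logprob / (scored + sentences))  # also counts one </s> per sentence
    ppl1 = 10 ** (-logprob / scored)               # excludes end-of-sentence tokens
    return ppl, ppl1


# Totals copied from the "ngram -ppl Merged_File.txt" output
ppl_text = srilm_ppl(logprob=-270778, words=153358, sentences=7880)

# Totals copied from the "ngram -counts count_file" output
ppl_counts = srilm_ppl(logprob=-1.03103e6, words=460074,
                       sentences=23640, zeroprobs=7880)

print("-ppl    : ppl=%.3f ppl1=%.3f" % ppl_text)
print("-counts : ppl=%.3f ppl1=%.3f" % ppl_counts)
```

Both sets of values come out the same as the ones ngram printed, up to rounding of the printed logprob, so each run appears internally consistent and the two differ only in what is being scored. The -counts totals are also exactly three times the -ppl totals (23640 = 3 x 7880, 460074 = 3 x 153358), which may mean that the unigram, bigram and trigram entries of count_file are each being scored. That interpretation is a guess, not something the output states.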

If there is a more detailed manual or document describing these values, I'm willing to read it.

Thanks a lot,

Sai