[SRILM User List] perplexity results

Stef M mstefd22 at gmail.com
Tue Jan 24 05:46:12 PST 2017


Hello David.

Thank you very much for answering. I am not sure whether you received my
reply, since the Yahoo servers are having problems right now, so I switched
to Gmail (sorry if you have already received this email).


I used the Wikipedia parallel corpus en-es for training the two LMs (
http://opus.lingfil.uu.se/Wikipedia.php, 1.8M sentence pairs). I used
-debug 2 as you suggested, and the results are below. Could you please help
me understand why the perplexity numbers are so high for the EN sentences,
given that they are well formed? For testing Spanish I used machine-translated
output, so I was expecting high ppl values there. Thank you!


Sixty-six parent coordinators were laid off," the draft complaint says,
"and not merely excessed.
p( Sixty-six | <s> )  = [1gram] 2.16995e-09 [ -8.66355 ]
p( parent | Sixty-six ...)  = [1gram] 1.0949e-05 [ -4.96063 ]
p( coordinators | parent ...)  = [1gram] 3.37871e-07 [ -6.47125 ]
p( were | coordinators ...)  = [1gram] 0.00120231 [ -2.91998 ]
p( laid | were ...)  = [2gram] 0.000696035 [ -3.15737 ]
p( off," | laid ...)  = [1gram] 2.33407e-08 [ -7.63189 ]
p( the | off," ...)  = [2gram] 0.0469306 [ -1.32854 ]
p( draft | the ...)  = [2gram] 7.67904e-05 [ -4.11469 ]
p( complaint | draft ...)  = [1gram] 8.13141e-07 [ -6.08983 ]
p( says, | complaint ...)  = [1gram] 1.17395e-05 [ -4.93035 ]
p( "and | says, ...)  = [2gram] 0.00147669 [ -2.83071 ]
p( not | "and ...)  = [1gram] 0.000275198 [ -3.56035 ]
p( merely | not ...)  = [2gram] 0.00173666 [ -2.76029 ]
p( <unk> | merely ...)  = [1gram] 0.0796503 [ -1.09881 ]
p( </s> | <unk> ...)  = [1gram] 0.0258359 [ -1.58778 ]
1 sentences, 14 words, 0 OOVs
0 zeroprobs, logprob= -62.106 ppl= 13816.6 ppl1= 27298.9
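
For reference, the ppl and ppl1 figures in the summary line can be reproduced from the reported logprob and counts. The sketch below assumes the usual SRILM convention: log probabilities are base 10, ppl normalizes by words plus end-of-sentence tokens, and ppl1 normalizes by words only, with OOVs and zeroprobs excluded from both. The function name and argument names are hypothetical, chosen only for illustration.

```python
def perplexities(logprob, words, sentences, oovs=0, zeroprobs=0):
    """Recompute ppl and ppl1 from the totals in a summary line.

    Assumes base-10 log probabilities (hypothetical helper, not part of SRILM).
    """
    scored_words = words - oovs - zeroprobs
    # ppl counts one </s> token per sentence in the denominator.
    ppl = 10 ** (-logprob / (scored_words + sentences))
    # ppl1 leaves the </s> tokens out of the denominator.
    ppl1 = 10 ** (-logprob / scored_words)
    return ppl, ppl1


# Totals copied from the summary line of the first sentence above.
ppl, ppl1 = perplexities(logprob=-62.106, words=14, sentences=1)
print(f"ppl={ppl:.1f} ppl1={ppl1:.1f}")
```

With only 14 words in the denominator, a handful of tokens scored around 1e-7 or 1e-9 is enough to push the per-sentence perplexity into the tens of thousands.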


Mexico's Enrique Pena Nieto faces tough start
p( Mexico's | <s> )  = [2gram] 1.31547e-06 [ -5.88092 ]
p( Enrique | Mexico's ...)  = [1gram] 1.34348e-05 [ -4.87177 ]
p( Pena | Enrique ...)  = [1gram] 1.83116e-06 [ -5.73727 ]
p( Nieto | Pena ...)  = [1gram] 1.6622e-06 [ -5.77932 ]
p( faces | Nieto ...)  = [1gram] 1.61354e-05 [ -4.79222 ]
p( tough | faces ...)  = [1gram] 2.80928e-06 [ -5.5514 ]
p( start | tough ...)  = [1gram] 2.90611e-05 [ -4.53669 ]
p( </s> | start ...)  = [1gram] 0.00941231 [ -2.0263 ]
1 sentences, 7 words, 0 OOVs
0 zeroprobs, logprob= -39.1759 ppl= 78883.7 ppl1= 394964



The NATO mission officially ended Oct. 31.
p( The | <s> )  = [2gram] 0.143584 [ -0.842893 ]
p( NATO | The ...)  = [3gram] 5.55208e-06 [ -5.25554 ]
p( mission | NATO ...)  = [1gram] 3.10877e-05 [ -4.50741 ]
p( officially | mission ...)  = [1gram] 2.81221e-05 [ -4.55095 ]
p( ended | officially ...)  = [2gram] 0.00976927 [ -2.01014 ]
p( Oct. | ended ...)  = [1gram] 2.4073e-07 [ -6.61847 ]
p( 31. | Oct. ...)  = [1gram] 3.60453e-06 [ -5.44315 ]
p( </s> | 31. ...)  = [2gram] 0.907671 [ -0.0420717 ]
1 sentences, 7 words, 0 OOVs
0 zeroprobs, logprob= -29.2706 ppl= 4558.57 ppl1= 15188.6
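
To see how often the model falls back to lower-order estimates, the per-word lines can be tallied by the n-gram order shown in brackets. This is a minimal sketch that assumes the line layout shown in the output above; the regular expression and the helper name `tally_orders` are illustrative assumptions, not part of SRILM.

```python
import re
from collections import Counter

# Matches lines such as: p( NATO | The ...)  = [3gram] 5.55208e-06 [ -5.25554 ]
DEBUG_LINE = re.compile(
    r"p\( (\S+) \| .*?\)\s+= \[(\d+)gram\]\s+(\S+)\s+\[\s*(\S+)\s*\]"
)

# Per-word lines copied from the third sentence above.
SAMPLE = """\
p( The | <s> )  = [2gram] 0.143584 [ -0.842893 ]
p( NATO | The ...)  = [3gram] 5.55208e-06 [ -5.25554 ]
p( mission | NATO ...)  = [1gram] 3.10877e-05 [ -4.50741 ]
p( officially | mission ...)  = [1gram] 2.81221e-05 [ -4.55095 ]
p( ended | officially ...)  = [2gram] 0.00976927 [ -2.01014 ]
p( Oct. | ended ...)  = [1gram] 2.4073e-07 [ -6.61847 ]
p( 31. | Oct. ...)  = [1gram] 3.60453e-06 [ -5.44315 ]
p( </s> | 31. ...)  = [2gram] 0.907671 [ -0.0420717 ]
"""


def tally_orders(text):
    """Count matched n-gram orders and sum the log10 probabilities."""
    counts = Counter()
    total_logprob = 0.0
    for line in text.splitlines():
        match = DEBUG_LINE.search(line)
        if match is None:
            continue  # skip sentence text and summary lines
        counts[int(match.group(2))] += 1
        total_logprob += float(match.group(4))
    return counts, total_logprob


counts, total = tally_orders(SAMPLE)
print(dict(counts), round(total, 4))
```

The summed log probabilities should agree with the logprob value in the summary line, which is a quick check that every per-word line was parsed.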