[SRILM User List] reproduce Penn Treebank KN5 results
Joris Pelemans
Joris.Pelemans at esat.kuleuven.be
Wed Jul 9 07:03:27 PDT 2014
Hi all,
I'm trying to reproduce some reported N-gram perplexity results on the
Penn Treebank with SRILM, but somehow my results always differ by a
large margin. Since I will be interpolating with these models and
comparing the interpolated model with others, I would really prefer to
start on the same level :-).
The data set I'm using is the one that comes with Mikolov's RNNLM
toolkit, which applies the same preprocessing as many LM papers,
including "Empirical Evaluation and Combination of Advanced Language
Modeling Techniques". In that paper, Mikolov et al. report a KN5
perplexity of 141.2. It's not entirely clear (1) whether they ignore OOV
words or simply use the <unk> probability, and (2) whether it's a
back-off or interpolated model, but I assume the latter, as it has
repeatedly been reported to work best. They do report using SRILM and no
count cut-offs.
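One thing I'm unsure about: if I read the ngram-count man page correctly,
SRILM applies default count cut-offs at the higher orders (minimum count 2
for trigrams and up), so "no count cut-offs" might actually require setting
those minimums to 1 explicitly, something along the lines of (output file
name arbitrary):

    ngram-count -order 5 -text data/penn/ptb.train.txt -lm models/ptb.train_5-gram_kn.arpa \
        -kndiscount -interpolate -gt3min 1 -gt4min 1 -gt5min 1

I haven't tried that variant yet, so please correct me if I'm misreading
the defaults.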
I have tried building the same model in many ways:
*regular:*
    ngram-count -order 5 -text data/penn/ptb.train.txt -lm models/ptb.train_5-gram_kn.arpa2 -kndiscount -interpolate

*open vocab:*
    ngram-count -order 5 -text data/penn/ptb.train.txt -lm models/ptb.train_5-gram_kn.arpa3 -kndiscount -interpolate -unk

*no sentence markers:*
    ngram-count -order 5 -text data/penn/ptb.train.txt -lm models/ptb.train_5-gram_kn.arpa4 -kndiscount -interpolate -no-sos -no-eos

*open vocab + no sentence markers:*
    ngram-count -order 5 -text data/penn/ptb.train.txt -lm models/ptb.train_5-gram_kn.arpa5 -kndiscount -interpolate -unk -no-sos -no-eos

*back-off (just in case):*
    ngram-count -order 5 -text data/penn/ptb.train.txt -lm models/ptb.train_5-gram_kn.arpa5 -kndiscount -unk
None of them, however, gives me a perplexity of 141.2:
<jpeleman at spchcl23:~/exp/025> ngram -ppl data/penn/ptb.test.txt -lm models/ptb.train_5-gram_kn.arpa2 -order 5
file data/penn/ptb.test.txt: 3761 sentences, 78669 words, 4794 OOVs
0 zeroprobs, logprob= -172723 ppl= 167.794 ppl1= 217.791

<jpeleman at spchcl23:~/exp/025> ngram -ppl data/penn/ptb.test.txt -lm models/ptb.train_5-gram_kn.arpa3 -order 5 -unk
file data/penn/ptb.test.txt: 3761 sentences, 78669 words, 0 OOVs
0 zeroprobs, logprob= -178859 ppl= 147.852 ppl1= 187.743

<jpeleman at spchcl23:~/exp/025> ngram -ppl data/penn/ptb.test.txt -lm models/ptb.train_5-gram_kn.arpa4 -order 5
file data/penn/ptb.test.txt: 3761 sentences, 78669 words, 4794 OOVs
0 zeroprobs, logprob= -179705 ppl= 206.4 ppl1= 270.74

<jpeleman at spchcl23:~/exp/025> ngram -ppl data/penn/ptb.test.txt -lm models/ptb.train_5-gram_kn.arpa5 -order 5 -unk
file data/penn/ptb.test.txt: 3761 sentences, 78669 words, 0 OOVs
0 zeroprobs, logprob= -186444 ppl= 182.746 ppl1= 234.414

<jpeleman at spchcl23:~/exp/025> ngram -ppl data/penn/ptb.test.txt -lm models/ptb.train_5-gram_kn.arpa5 -order 5 -unk
file data/penn/ptb.test.txt: 3761 sentences, 78669 words, 0 OOVs
0 zeroprobs, logprob= -181381 ppl= 158.645 ppl1= 202.127
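(For reference, my understanding of the ngram output is that
ppl = 10^(-logprob / (sentences + words - OOVs - zeroprobs)) and ppl1 is the
same without the sentence count; e.g. for the first run,
10^(172723 / (3761 + 78669 - 4794)) ≈ 167.8 and 10^(172723 / (78669 - 4794)) ≈ 217.8,
which matches. So in the closed-vocabulary runs the 4794 <unk> tokens are
skipped entirely, while in the -unk runs they are scored with the <unk>
probability, which I suppose means those perplexities aren't directly
comparable.)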
So... what am I missing here? 147.852 is close, but still not quite 141.2.
Joris