Perplexity calculation: Strange behavior
Stefan Hahn
hahn at i6.informatik.rwth-aachen.de
Wed Aug 31 11:31:45 PDT 2005
Hi!
While doing some language modeling with the SRI Toolkit (V.1.4.3 and V.1.4.5)
on i686 Intel GNU/Linux, I encountered some strange behavior in the perplexity
calculation:
For any order greater than 3, the perplexity calculated with ngram stays fixed
at a single value and appears to be wrong.
For example, I used Defoe's "Robinson Crusoe" to create modified Kneser-Ney
discounted language models for orders 1 through 6 and calculated the perplexity
of the same text using "ngram" and our own software:
+-------+------------------------+
|       |       perplexity       |
+-------+-------------+----------+
| order | SRI-Toolkit | our Tool |
+-------+-------------+----------+
|   1   | 394.79      | 394.794  |
|   2   | 68.0706     | 68.071   |
|   3   | 54.29       | 54.2903  |
|   4   | 57.1554     | 52.6306  |
|   5   | 57.1554     | 52.6502  |
|   6   | 57.1554     | 52.7033  |
+-------+-------------+----------+
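For reference, perplexity is taken here to be the inverse geometric mean of the
per-token probabilities. The following is a minimal Python sketch of that
definition only, not the code of either tool. It assumes log probabilities are
in base 10 and that the token count covers exactly the tokens that were scored;
how end-of-sentence tokens and out-of-vocabulary words are counted may differ
between tools. The function name and the toy numbers are illustrative.

```python
import math


def perplexity(total_log10_prob, num_scored_tokens):
    """Perplexity from a summed log10 probability and a token count.

    Assumption: num_scored_tokens counts every token that contributed to
    total_log10_prob (and nothing else).
    """
    if num_scored_tokens <= 0:
        raise ValueError("need at least one scored token")
    # Average negative log10 probability per token, mapped back from log space.
    return 10.0 ** (-total_log10_prob / num_scored_tokens)


# Toy check: four tokens, each assigned probability 0.25 by the model.
toy_total = 4 * math.log10(0.25)
toy_ppl = perplexity(toy_total, 4)
print(round(toy_ppl, 6))
```

With a model that assigns every token probability 0.25, the result is a
perplexity of about 4, which matches the intuition of choosing uniformly among
four alternatives.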
The script I used to download "Robinson Crusoe", create the LMs, and produce
the SRI results can be fetched and run as follows:
wget "http://www-i6.informatik.rwth-aachen.de/~gollan/make-lm-01.sh"
chmod a+x make-lm-01.sh
./make-lm-01.sh
Is there any error in my script?
Thanks,
Stefan