Perplexity calculation: Strange behavior
Andreas Stolcke
stolcke at speech.sri.com
Wed Aug 31 13:06:53 PDT 2005
In message <200508312031.45859.hahn at i6.informatik.rwth-aachen.de> you wrote:
> Hi!
>
> During some language modeling using the SRI Toolkit (V.1.4.3 and V.1.4.5) on
> i686 Intel GNU/Linux I encountered some strange behavior concerning
> perplexity calculation:
> For any order greater than 3, the perplexity calculated with ngram seems to
> be fixed and wrong.
> For example, I used Defoe's "Robinson Crusoe" to create modified Kneser-Ney
> discounted Language Models for orders 1 up to 6 and calculated the perplexity
> for the same text using "ngram" and our own software:
>
>         +------------------------+
>         |       perplexity       |
> +-------+-------------+----------+
> | order | SRI-Toolkit | our Tool |
> +-------+-------------+----------+
> |     1 | 394.79      | 394.794  |
> +-------+-------------+----------+
> |     2 | 68.0706     | 68.071   |
> +-------+-------------+----------+
> |     3 | 54.29       | 54.2903  |
> +-------+-------------+----------+
> |     4 | 57.1554     | 52.6306  |
> +-------+-------------+----------+
> |     5 | 57.1554     | 52.6502  |
> +-------+-------------+----------+
> |     6 | 57.1554     | 52.7033  |
> +-------+-------------+----------+
I haven't looked at your script, but my guess is that you didn't specify
the -order option when evaluating the LM. By default, ngram uses at most
trigram probabilities, regardless of what is in the LM file.
(That's for historical reasons.) So of course you get the same result for
any LM order >= 4. Also, because of KN, you are getting a degradation
relative to the trigram, as the lower-order probabilities are optimized
to complement the higher-order estimates rather than to be used on their own.
If this is not the case then we may have a bug, but I can assure you that
we use order >= 4 all the time.
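
As a rough sketch of what that looks like on the command line: the -order,
-lm and -ppl options are ngram's own, but the file names below are
hypothetical placeholders (I am assuming one LM file per order), and the
helper function ppl_cmd is made up for illustration. The loop only prints
the command if ngram or the files are not found.

```shell
#!/bin/sh
# Hypothetical test text; substitute your own file.
TEXT=crusoe.txt

# Build the ngram invocation that evaluates an LM at a given order.
# $1 = n-gram order, $2 = LM file, $3 = text to score
ppl_cmd() {
    printf 'ngram -order %s -lm %s -ppl %s' "$1" "$2" "$3"
}

for order in 4 5 6; do
    # Hypothetical naming scheme: one LM file per order.
    lm="crusoe.${order}gram.lm"
    cmd=$(ppl_cmd "$order" "$lm" "$TEXT")
    if command -v ngram >/dev/null 2>&1 && [ -f "$lm" ] && [ -f "$TEXT" ]; then
        # Pass -order explicitly; without it only trigrams are used.
        $cmd
    else
        echo "would run: $cmd"
    fi
done
```

If the perplexities for orders 4 to 6 still come out identical with -order
given explicitly, that would point to something other than the default.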
--Andreas
>
> The script I used to download "Robinson Crusoe", create the LMs and
> SRI-results:
>
> wget "http://www-i6.informatik.rwth-aachen.de/~gollan/make-lm-01.sh"
> chmod a+x make-lm-01.sh
> ./make-lm-01.sh
>
> Is there any error in my script?
> Thanks,
> Stefan