Perplexity calculation: Strange behavior
Stefan Hahn
hahn at i6.informatik.rwth-aachen.de
Thu Sep 1 02:52:29 PDT 2005
Hi again!
Your guess was perfectly right; I simply forgot to specify the -order
option for the perplexity calculation.
Thanks again,
Stefan
> In message <200508312031.45859.hahn at i6.informatik.rwth-aachen.de> you wrote:
> > Hi!
> >
> > While doing some language modeling with the SRI Toolkit (V.1.4.3 and
> > V.1.4.5) on i686 Intel GNU/Linux, I encountered some strange behavior
> > in the perplexity calculation:
> > For any order greater than 3, the perplexity calculated with ngram seems
> > to be fixed and wrong.
> > For example, I used Defoe's "Robinson Crusoe" to create modified
> > Kneser-Ney discounted language models for orders 1 to 6 and calculated
> > the perplexity of the same text using "ngram" and our own software:
> >
> >         +------------------------+
> >         |       perplexity       |
> > +-------+-------------+----------+
> > | order | SRI-Toolkit | our tool |
> > +-------+-------------+----------+
> > |   1   |   394.79    | 394.794  |
> > +-------+-------------+----------+
> > |   2   |   68.0706   |  68.071  |
> > +-------+-------------+----------+
> > |   3   |    54.29    | 54.2903  |
> > +-------+-------------+----------+
> > |   4   |   57.1554   | 52.6306  |
> > +-------+-------------+----------+
> > |   5   |   57.1554   | 52.6502  |
> > +-------+-------------+----------+
> > |   6   |   57.1554   | 52.7033  |
> > +-------+-------------+----------+
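When comparing ngram against another tool, as in the table above, both tools must normalize the total log probability by the same token count; the exact match for orders 1 to 3 suggests they do. As a hedged sketch, assuming the normalization described in the SRILM documentation for -ppl output (OOVs and zero-probability words excluded, one end-of-sentence token added per sentence), perplexity can be recomputed from the reported totals. The helper name ppl_from_logprob is hypothetical.

```shell
# Hypothetical helper: recompute perplexity from the totals that ngram -ppl
# reports. Arguments: $1 = total log10 probability (negative), $2 = words,
# $3 = OOVs, $4 = sentences, $5 = zero-probability words (default 0).
# Assumed normalization: 10^(-logprob / (words - OOVs - zeroprobs + sentences)),
# where "+ sentences" counts one end-of-sentence token per sentence.
ppl_from_logprob() {
  awk -v lp="$1" -v w="$2" -v o="$3" -v s="$4" -v z="${5:-0}" \
    'BEGIN { printf "%.4f\n", 10 ^ (-lp / (w - o - z + s)) }'
}
```

For example, `ppl_from_logprob -200 90 0 10` normalizes by 100 tokens and prints 100.0000.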
>
> I haven't looked at your script, but my guess is that you didn't specify
> the -order option when evaluating the LM. The default is to only use
> up to trigram probabilities regardless of what is in the LM file.
> (That's for historical reasons.) So of course you get the same result for
> any LM order >= 4. Also, because of KN, you are getting a degradation
> relative to the trigram, as the lower-order probabilities are optimized
> to complement the higher-order estimates rather than to be used alone.
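A minimal sketch of the fix, assuming a POSIX shell with SRILM's ngram-count and ngram on the PATH: the essential point is passing ngram the same -order value that was used to build the model. The file names corpus.txt and lm.N.arpa and the helpers train_and_eval and run are hypothetical, and the exact training options of the original script are not known; -kndiscount selects modified Kneser-Ney discounting in ngram-count.

```shell
# Run a command, or only print it when DRY_RUN=1 (handy without SRILM).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: build and evaluate a modified Kneser-Ney LM of order $1.
# Assumed file names: corpus.txt (text used for training and evaluation)
# and lm.$1.arpa (output model).
train_and_eval() {
  n="$1"
  run ngram-count -order "$n" -kndiscount -text corpus.txt -lm "lm.$n.arpa"
  # ngram evaluates with trigram order unless -order is given, so pass it
  # explicitly; otherwise every model of order >= 4 is truncated to a trigram.
  run ngram -order "$n" -lm "lm.$n.arpa" -ppl corpus.txt
}
```

Usage: `for n in 1 2 3 4 5 6; do train_and_eval "$n"; done`. The DRY_RUN switch only exists so the command lines can be inspected before running them.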
>
> If this is not the case then we may have a bug, but I can assure you that
> we use order >= 4 all the time.
>
> --Andreas
>
> > The script I used to download "Robinson Crusoe", create the LMs and
> > compute the SRI results:
> >
> > wget "http://www-i6.informatik.rwth-aachen.de/~gollan/make-lm-01.sh"
> > chmod a+x make-lm-01.sh
> > ./make-lm-01.sh
> >
> > Is there any error in my script?
> > Thanks,
> > Stefan