[SRILM User List] Perplexity of ngram-class during inducing

Mon Apr 12 14:19:52 PDT 2010

Dear Dr. Andreas

I have a question regarding the perplexity reported by ngram-class.

The command I used was: ngram-class -debug 2 -text TEXT -vocab VOCAB
-numclasses NUM -classes OUTPUT

The output contains both a perplexity and a PPL1 value. What does this
perplexity stand for during class induction? It seems to be calculated
during the class clustering process (merging), but what inputs does it
use (e.g. the equivalents of -text and -lm)?
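For reference, here is a minimal sketch of how the two figures are defined for "ngram -ppl", as far as I understand it: ppl normalizes by the token count including end-of-sentence tokens, and ppl1 excludes them. Whether ngram-class derives its own numbers in exactly this way is an assumption on my part, and the function and parameter names are hypothetical.

```python
def srilm_style_perplexities(logprob10, num_words, num_sentences,
                             num_oovs=0, num_zeroprobs=0):
    """Return (ppl, ppl1) from a base-10 total log probability.

    ppl  counts one end-of-sentence token per sentence in the denominator.
    ppl1 leaves the end-of-sentence tokens out.
    OOVs and zero-probability words are excluded from both denominators.
    """
    scored_words = num_words - num_oovs - num_zeroprobs
    ppl = 10 ** (-logprob10 / (scored_words + num_sentences))
    ppl1 = 10 ** (-logprob10 / scored_words)
    return ppl, ppl1


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real run.
    print(srilm_style_perplexities(-30.0, num_words=8, num_sentences=2))
```

Since ppl1 divides the same log probability by fewer tokens, it is never smaller than ppl.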

The manual says that ngram-class works to "...minimize perplexity of a
class-based N-gram model given the provided word N-gram count". But to
my understanding, several steps are needed to use a class-based N-gram
model:

(a) use ngram-class to induce the classes
(b) use replace-words-with-classes to map the words in both TEXT and
VOCAB to their classes
(c) train a class-based LM with the same procedure used for a word-based
N-gram LM, which gives us P(C_i | C_i-2 C_i-1 ...)
(d) use this LM to calculate the perplexity: ngram -ppl TEST_SET -lm LM
-classes CLASS_DEFINITION, where the class definitions supply
P(w_i | c_i)
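To make the quantity in steps (c) and (d) concrete, here is a minimal sketch of the perplexity of a class bigram model, where P(w_i | w_i-1) = P(c_i | c_i-1) * P(w_i | c_i). It uses unsmoothed maximum-likelihood estimates and scores the same text it was counted on, which is an assumption made for illustration only; it is not meant to reproduce what ngram-class or ngram -ppl actually compute. The function name and the toy word-to-class mapping are hypothetical.

```python
import math
from collections import Counter


def class_bigram_perplexity(sentences, word2class):
    """Training-set perplexity of an unsmoothed class bigram model.

    sentences  : list of token lists
    word2class : dict mapping each word to its class label; words not in
                 the dict (including </s>) are treated as their own class.
    """
    class_bigrams = Counter()   # count(c_prev, c)
    class_history = Counter()   # count(c_prev) as a history
    class_counts = Counter()    # count(c) as a predicted class
    word_counts = Counter()     # count(w)

    def events(sentence):
        """Yield (previous class, word, class) for each predicted token."""
        prev = "<s>"
        for word in sentence + ["</s>"]:
            cls = word2class.get(word, word)
            yield prev, word, cls
            prev = cls

    # First pass: collect the counts.
    for sentence in sentences:
        for prev, word, cls in events(sentence):
            class_bigrams[(prev, cls)] += 1
            class_history[prev] += 1
            class_counts[cls] += 1
            word_counts[word] += 1

    # Second pass: score the same text with the ML estimates.
    log_prob = 0.0
    num_tokens = 0
    for sentence in sentences:
        for prev, word, cls in events(sentence):
            p_class = class_bigrams[(prev, cls)] / class_history[prev]
            p_word = word_counts[word] / class_counts[cls]
            log_prob += math.log(p_class * p_word)
            num_tokens += 1

    return math.exp(-log_prob / num_tokens)


if __name__ == "__main__":
    toy_text = [["a", "b"], ["a", "c"]]
    toy_classes = {"a": "X", "b": "Y", "c": "Y"}
    print(class_bigram_perplexity(toy_text, toy_classes))
```

In this sketch the class transitions and the word-given-class memberships are both estimated from the same text, whereas in steps (a) to (d) the LM and the class definitions come from separate files and the perplexity is measured on a held-out test set.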

Does the perplexity reported by ngram-class correlate with the
perplexity in step (d)? If not, where could I find a more detailed
definition of it?

Thanks for your help in advance.

Best Regards

Tzu-Chiang