[SRILM User List] Class-based LM
Andreas Stolcke
stolcke at icsi.berkeley.edu
Wed Jul 27 14:30:16 PDT 2011
Maral Sh. wrote:
> Dear Andreas,
> I finally trained both my LMs. The funny thing is, with the POS-tagged LM
> the perplexity is about 22, with the word-based LM it is 303, and with
> automatic clustering it goes up to 325. I was wondering whether this is
> normal or whether I have done something wrong in the process of training
> my models. How can I tell what mistake I have made, and where?
>
> Best regards,
> Maral
>
It is possible that you just don't have enough data to learn good word
classes. As a sanity check you could include the test set in your
training set for class induction. You might also get better results if
you exclude the least frequent words (say, all words occurring fewer than
5 times) from the induction, and put them into a separate class (which
you have to define and add to the eventual class definition file by hand).
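As a rough illustration only (not part of SRILM; the file names, the cutoff
of 5, and the class name RARE are all made up for the example), a small
script could split the vocabulary and prepare the extra class definitions,
assuming unigram counts written with something like
"ngram-count -text train.txt -order 1 -write counts.1grams":

#!/usr/bin/env python3
# Sketch: keep frequent words for ngram-class induction, lump rare words
# into one hand-made class. File names, threshold, and class name are
# illustrative assumptions, not SRILM conventions.

MIN_COUNT  = 5                # keep words seen at least this often
COUNTS     = "counts.1grams"  # "word <tab> count" per line (from ngram-count)
FREQ_VOCAB = "freq.vocab"     # vocabulary to pass to ngram-class -vocab
RARE_CLASS = "rare.classes"   # extra class definitions, appended by hand

frequent, rare = [], []
with open(COUNTS) as f:
    for line in f:
        parts = line.split()
        if len(parts) != 2:
            continue          # skip anything that is not a unigram count line
        word, count = parts[0], int(parts[1])
        if word in ("<s>", "</s>", "<unk>"):
            continue          # leave sentence/unknown tags out of the classes
        (frequent if count >= MIN_COUNT else rare).append(word)

with open(FREQ_VOCAB, "w") as f:
    f.write("\n".join(sorted(frequent)) + "\n")

# One expansion line per rare word: "RARE word" (the classes file format
# also allows an optional probability column; see the classes-format man
# page for the exact syntax).
with open(RARE_CLASS, "w") as f:
    for word in sorted(rare):
        f.write("RARE %s\n" % word)

print("frequent: %d, rare: %d" % (len(frequent), len(rare)))

The frequent-word vocabulary would then go to the induction step, roughly
"ngram-class -vocab freq.vocab -text train.txt -numclasses 100
-classes auto.classes" (check the ngram-class man page for the exact
options), after which the rare.classes lines are appended to auto.classes
before using it with ngram -classes.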
Andreas