[SRILM User List] Class-based LM

Andreas Stolcke stolcke at icsi.berkeley.edu
Wed Jul 27 14:30:16 PDT 2011


Maral Sh. wrote:
> Dear Andreas,
> I finally trained both my LMs. The funny thing is, with the POS-tagged LM 
> the perplexity is about 22, with the word-based LM it is 303, and with 
> automatic clustering it goes up to 325. I was wondering whether this is 
> normal or whether I have done something wrong in the process of training 
> my models. How can I tell what mistake I have made, and where?
>
> Best regards,
> Maral
>
It is possible that you just don't have enough data to learn good word 
classes.  As a sanity check, you could include the test set in your 
training set for class induction.  You might also get better results if 
you exclude the least frequent words (say, all words occurring fewer than 
5 times) from the induction and put them into a separate class (which 
you have to define and add to the eventual class definition file by hand).
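[Editor's note: for concreteness, a minimal sketch of that cutoff step in
Python follows.  The file names, the class name RAREWORD, and the count
threshold are placeholders, and the output lines assume the standard SRILM
classes-format layout of "class prob word"; this is not part of SRILM itself.]

# Minimal sketch of the frequency cutoff described above.  Assumptions:
# file names, the class name RAREWORD, and MIN_COUNT are placeholders;
# output follows the SRILM classes-format layout "class prob word".
from collections import Counter

MIN_COUNT = 5
TRAIN_TEXT = "train.txt"          # whitespace-tokenized training text
INDUCTION_VOCAB = "induct.vocab"  # vocabulary handed to class induction
RARE_CLASS_FILE = "rare.classes"  # hand-made class, appended afterwards

# Count word frequencies in the training text.
counts = Counter()
with open(TRAIN_TEXT) as f:
    for line in f:
        counts.update(line.split())

frequent = sorted(w for w, c in counts.items() if c >= MIN_COUNT)
rare = sorted(w for w, c in counts.items() if c < MIN_COUNT)

# Restrict the induction vocabulary to the frequent words.
with open(INDUCTION_VOCAB, "w") as f:
    f.write("\n".join(frequent) + "\n")

# Put all rare words into one extra class with uniform membership probability.
with open(RARE_CLASS_FILE, "w") as f:
    if rare:
        p = 1.0 / len(rare)
        for w in rare:
            f.write("RAREWORD %.6g %s\n" % (p, w))

[The resulting rare.classes file would then be appended by hand to the class
definitions produced by induction; the uniform membership probabilities are
just one reasonable default.]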

Andreas
