[SRILM User List] classes-format question

Fabian - fabian_in_hongkong at hotmail.com
Wed Apr 20 01:08:41 PDT 2011

Hi,

I'm still experimenting with class-based (actually POS-based) LMs. I use my own set of 61 classes (POS tags). I built a class LM, which works fine for decoding, but I also want to compute the perplexity. If I build a mapping file as described in the classes-format manual page, with all probabilities set to 1, I get a ppl of 8. So I computed the probability of mapping class x to word j as follows:
    P(word j | class x) = (# occurrences of word j in class x) / (# occurrences of class x)
Now I get a ppl of ~1300. This seems a bit high!?
I have a total of 20k mappings with a vocabulary of 12k words. The LM is an interpolation of a pure 3-gram class LM and a 3-gram word LM. The word LM alone usually has a ppl of ~500. The ASR error rates of the word-based and the interpolated LMs are similar, though.
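
For reference, the relative-frequency estimate described above can be sketched as follows. This is only a minimal illustration, assuming the tagged training data is available as a list of (word, tag) pairs; the function names and the toy data are hypothetical and not part of SRILM. The output layout `class probability word` follows my reading of the classes-format manual page and should be checked against it.

```python
from collections import Counter

def class_expansion_probs(tagged_tokens):
    """Estimate P(word | class) = count(word, class) / count(class).

    tagged_tokens: list of (word, tag) pairs (assumed input layout).
    """
    pair_counts = Counter(tagged_tokens)
    class_counts = Counter(tag for _, tag in tagged_tokens)
    return {
        (word, tag): n / class_counts[tag]
        for (word, tag), n in pair_counts.items()
    }

def format_class_lines(probs):
    """Render one 'class probability word' line per mapping, sorted by class, then word."""
    ordered = sorted(probs.items(), key=lambda kv: (kv[0][1], kv[0][0]))
    return [f"{tag} {p:.6g} {word}" for (word, tag), p in ordered]

# Toy data, purely illustrative.
tagged = [("the", "DT"), ("dog", "NN"), ("the", "DT"),
          ("cat", "NN"), ("a", "DT"), ("dog", "NN")]

probs = class_expansion_probs(tagged)
lines = format_class_lines(probs)
print("\n".join(lines))
```

With this estimate the probabilities within each class sum to 1, which is not the case when every mapping is given probability 1.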
Can you help me?

Thanks,
Fabian
