[SRILM User List] classes-format question

Andreas Stolcke stolcke at icsi.berkeley.edu
Wed Apr 20 13:59:59 PDT 2011


Fabian - wrote:
> Hi,
> I'm still experimenting with class-based (actually POS) LMs. I use my 
> own 61 classes/PoS. I built a class LM which works fine for decoding. 
> But I also want to compute the perplexity. If I build a mapping file 
> as described in the classes-format manual page (with 
> probabilities=1) I get a ppl of 8.
You mean when you replace all the words with their class labels?
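
The gap between the two perplexities in this thread follows from how a class-based LM scores a word: the class N-gram probability is multiplied by the class-to-word membership probability. The sketch below illustrates this with made-up numbers; `class_probs` and `memberships` are hypothetical values, not figures from the thread.

```python
import math

def perplexity(log_probs):
    """Perplexity from a list of natural-log word probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

def class_lm_logprob(class_prob, membership_prob):
    """log p(word | history) = log p(class | class history) + log p(word | class)."""
    return math.log(class_prob) + math.log(membership_prob)

# Hypothetical p(class | class history) for a three-token test sequence.
class_probs = [0.125, 0.125, 0.125]

# Membership probability 1 everywhere: the score covers class labels only.
ppl_classes = perplexity([class_lm_logprob(p, 1.0) for p in class_probs])

# Hypothetical membership probabilities p(word | class) for the same tokens.
memberships = [0.01, 0.005, 0.02]
ppl_words = perplexity(
    [class_lm_logprob(p, m) for p, m in zip(class_probs, memberships)]
)

print(ppl_classes, ppl_words)
```

With these invented numbers the first value is about 8 and the second about 800, so a large jump once real membership probabilities are used is expected rather than a sign of a bug.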

> So I computed the probabilities for mapping class x to word j as follows:
>
> # word j in class x
> ---------------------------
> # occurrences of class x
>
> Now I get a ppl of ~1300. This seems a bit high!?
It depends.  You might have to smooth these probabilities, just like 
ngram probabilities. 
Try

(# word j in class x) + 1
---------------------------
(# occurrences of class x) + (# classes)
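
A minimal sketch of the add-one estimate above, assuming the counts come from a corpus of (class, word) pairs. The function name, the toy corpus, and the `denom_add` parameter are hypothetical. The denominator term is left as a parameter because the message adds the number of classes, whereas textbook add-one smoothing adds the number of distinct words that can occur in the class so that the estimates sum to one. The output lines assume the `class probability word` layout of the classes-format manual page.

```python
from collections import Counter

def smoothed_membership(pair_counts, class_counts, denom_add):
    """Add-one estimate: (count(word, class) + 1) / (count(class) + denom_add)."""
    return {
        (cls, word): (n + 1) / (class_counts[cls] + denom_add)
        for (cls, word), n in pair_counts.items()
    }

# Hypothetical tagged corpus as (class, word) pairs.
tagged = [("NN", "dog"), ("NN", "cat"), ("NN", "dog"), ("VB", "runs")]
pair_counts = Counter(tagged)
class_counts = Counter(cls for cls, _ in tagged)

# Denominator term as written in the message: the number of classes.
probs = smoothed_membership(pair_counts, class_counts, denom_add=len(class_counts))

# One "class probability word" line per mapping (assumed layout).
lines = [f"{cls} {p:.6f} {word}" for (cls, word), p in sorted(probs.items())]
print("\n".join(lines))
```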


>
> I have a total of 20k mappings with a vocab of 12k! The LM is an 
> interpolation of a pure 3g class LM and a 3g word LM. The word LM 
> usually has a ppl of ~500. The ASR error rates of the word-based and 
> interpolated LMs are similar though.
Make sure you use -bayes 0 when interpolating word and class-based LMs.  
You should not merge LMs of different types statically (without -bayes).
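
A rough sketch of what interpolating at evaluation time computes, under the assumption that `-bayes 0` amounts to a fixed-weight linear mixture of the two models' probabilities for each word, instead of merging their N-gram tables into one model. The probabilities and the weight of 0.5 are invented for illustration; none of the names below are SRILM APIs.

```python
import math

def interpolate(p_word_lm, p_class_lm, lam):
    """Linear mixture of two probabilities for the same word and history."""
    return lam * p_word_lm + (1.0 - lam) * p_class_lm

def perplexity(probs):
    """Perplexity from a list of per-word probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# Hypothetical per-word probabilities from each model on the same test text.
word_lm = [0.002, 0.004, 0.001]
class_lm = [0.001, 0.0005, 0.003]

# Mix word by word with a fixed weight, then score the mixture.
mixed = [interpolate(a, b, 0.5) for a, b in zip(word_lm, class_lm)]

print(perplexity(word_lm), perplexity(class_lm), perplexity(mixed))
```

Because the mixture is formed per word, each model only needs to return a probability for the word in its history, which is why models of different types can be combined this way.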

Andreas

>
> Can you help me?
> Thanks,
> Fabian


