[SRILM User List] classes-format question

Andreas Stolcke stolcke at icsi.berkeley.edu
Wed Apr 20 21:03:55 PDT 2011


Andreas Stolcke wrote:
> Fabian - wrote:
>> Hi,
>> I'm still experimenting with class-based (actually POS) LMs. I use my
>> own 61 classes (PoS tags). I built a class LM which works fine for decoding.
>> But I also want to compute the perplexity. If I build a mapping file
>> as described in the classes-format manual page (with all
>> probabilities = 1), I get a ppl of 8.
> You mean when you replace all the words with their class labels?
>
>> So I computed the probability of mapping class x to word j as
>> follows:
>>
>> # word j in class x
>> ---------------------------
>> #occurrences of class x
>>
>> Now I get a ppl of ~1300. This seems a bit high!?
> It depends.  You might have to smooth these probabilities, just like 
> ngram probabilities. Try
>
> # word j in class x + 1
> ---------------------------
> #occurrences of class x + # classes
Correction: the add-1 smoothing formula for class membership should read:

# word j in class x + 1
---------------------------
#occurrences of class x + # word-types
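To make the two estimates concrete, here is a minimal Python sketch (not part of SRILM) that computes P(word | class) from (word, class) pairs, either by relative frequency or with add-1 smoothing. The function names and the toy data are hypothetical. The denominator must be # word-types because the +1 is added to the count of every word type in the vocabulary: summing the numerators over all V types gives #occurrences of class x + V, so each class's probabilities add up to 1. The output lines assume the `class p word` layout described in the classes-format(5) man page, with one word per expansion.

```python
from collections import Counter


def class_membership_probs(tagged_tokens, vocab, smoothing=1.0):
    """Estimate P(word | class) from a list of (word, class) pairs.

    smoothing=0 gives the relative-frequency estimate
        count(word, class) / count(class)
    smoothing=1 gives add-1 smoothing
        (count(word, class) + 1) / (count(class) + len(vocab))
    Zero-probability entries are omitted from the result.
    """
    pair_counts = Counter(tagged_tokens)
    class_counts = Counter(cls for _, cls in tagged_tokens)
    probs = {}
    for cls, n_cls in class_counts.items():
        # One pseudo-count per word type, so the denominator grows by |V|.
        denom = n_cls + smoothing * len(vocab)
        for word in vocab:
            num = pair_counts[(word, cls)] + smoothing
            if num > 0:
                probs[(cls, word)] = num / denom
    return probs


def to_classes_lines(probs):
    """Render 'class p word' lines (assumed classes-format layout)."""
    return [f"{cls} {p:.6g} {word}" for (cls, word), p in sorted(probs.items())]


if __name__ == "__main__":
    # Hypothetical toy data: (word, class) pairs from a tagged corpus.
    tokens = [("the", "DET"), ("a", "DET"), ("the", "DET"),
              ("dog", "NN"), ("cat", "NN")]
    vocab = sorted({w for w, _ in tokens})
    for line in to_classes_lines(class_membership_probs(tokens, vocab)):
        print(line)
```

With add-1, every word type gets a nonzero probability in every class, so even a closed-class tag such as DET can emit "dog"; a smaller pseudo-count via the `smoothing` parameter (add-k) softens that effect.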


Andreas



