[SRILM User List] classes-format question
Andreas Stolcke
stolcke at icsi.berkeley.edu
Wed Apr 20 21:03:55 PDT 2011
Andreas Stolcke wrote:
> Fabian - wrote:
>> Hi,
>> I'm still experimenting with class-based (actually POS) LMs. I use my
>> own 61 classes/PoS. I built a class LM which works fine for decoding.
>> But I also want to compute the perplexity. If I build a mapping file
>> as described in the classes-format manual page (with all
>> probabilities set to 1), I get a ppl of 8.
> You mean when you replace all the words with their class labels?
>
>> So I computed the probabilities for mapping class x to word j as
>> follows:
>>
>> # word j in class x
>> ---------------------------
>> # occurrences of class x
>>
>> Now I get a ppl of ~1300. This seems a bit high!?
> It depends. You might have to smooth these probabilities, just like
> ngram probabilities. Try
>
> # word j in class x + 1
> ---------------------------
> # occurrences of class x + # classes
Correction: the add-1 smoothing formula for class membership should read:
# word j in class x + 1
---------------------------
# occurrences of class x + # word-types
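
A minimal Python sketch of the corrected add-1 estimate, for illustration only. The function names, the (word, class) input layout, and the toy data are hypothetical and not part of SRILM. The "class probability word" output layout reflects my reading of the classes-format manual page and should be checked against it before use.

```python
from collections import Counter


def smoothed_membership(tagged_tokens, num_word_types=None):
    """Add-1 smoothed P(word | class) from (word, class) pairs.

    P(word j | class x) = (count(j in x) + 1) / (count(x) + V),
    where V is the number of word types (not the number of classes).
    """
    pair_counts = Counter(tagged_tokens)                  # count(word j in class x)
    class_counts = Counter(cls for _, cls in tagged_tokens)  # count(class x)
    if num_word_types is None:
        # Assumption: V is the number of distinct words seen in the data.
        num_word_types = len({word for word, _ in tagged_tokens})

    probs = {}
    for (word, cls), n in pair_counts.items():
        probs.setdefault(cls, {})[word] = (n + 1) / (class_counts[cls] + num_word_types)
    return probs


def to_class_lines(probs):
    """Format as 'class probability word' lines (assumed classes-format layout)."""
    return [
        f"{cls} {p:.6f} {word}"
        for cls in sorted(probs)
        for word, p in sorted(probs[cls].items())
    ]


if __name__ == "__main__":
    # Hypothetical toy corpus of (word, class) pairs.
    toy = [("the", "DT"), ("the", "DT"), ("a", "DT"), ("dog", "NN")]
    for line in to_class_lines(smoothed_membership(toy)):
        print(line)
```

Because the denominator adds one count for every word type, the probabilities of the words actually observed in a class sum to less than 1 unless every word type occurs in that class; the remainder is the mass reserved for words not seen with that class.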
Andreas