Q: probabilities calculation

Bing Jing iris_jing_2000 at yahoo.com
Fri Oct 25 10:47:07 PDT 2002

Hello there,

Does anyone know how the SRI toolkit generates
unigram probabilities for words that do NOT
occur in the training transcript but are covered
by the training dictionary? From reading
NgramLM.cc, I thought all those words were
assigned a probability of LogP_Zero, but this
value seems to vary between different LMs.

I trained LMs on two quite small transcription sets,
trans1 and trans2, using the same 46K-word training
dictionary for both. The number of unique words in
trans1 and trans2 is 620 and 700, respectively. For
words that are covered by the lexicon but not in the
training transcripts, the unigram probabilities are
-5.337341 and -5.383736, respectively. I still can't
figure out how these two numbers are computed.
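One way to compare the two numbers: suppose (an assumption, not something confirmed from NgramLM.cc) that the probability mass left over after discounting the seen unigrams is spread evenly over the zero-count dictionary words. Then each per-word probability times the number of unseen words should give that leftover mass. The sketch below also assumes the reported values are log10 probabilities (as in ARPA-format files), that the dictionary has exactly 46,000 entries, and that every training word is in the dictionary. `implied_leftover_mass` is a hypothetical helper name.

```python
# Back out the total probability mass implied for unseen dictionary words,
# ASSUMING that mass is split evenly over every zero-count word.

VOCAB_SIZE = 46_000  # assumption: the post only says "46K"


def implied_leftover_mass(log10_prob: float, vocab_size: int, seen_types: int) -> float:
    """Per-word probability times the number of zero-count dictionary words.

    Assumes every seen training word is also in the dictionary, so the
    number of unseen words is vocab_size - seen_types.
    """
    unseen = vocab_size - seen_types
    return unseen * 10 ** log10_prob  # convert the log10 probability to linear first


# (name, unique training words, reported unigram log10 probability)
CASES = [("trans1", 620, -5.337341), ("trans2", 700, -5.383736)]

results = {}
for name, seen, log10_p in CASES:
    results[name] = implied_leftover_mass(log10_p, VOCAB_SIZE, seen)
    print(f"{name}: p = {10 ** log10_p:.3e}, "
          f"unseen words = {VOCAB_SIZE - seen}, implied mass = {results[name]:.3f}")
```

If that assumption holds, the implied masses come out to roughly 0.21 and 0.19, and the per-word value would differ between LMs simply because each training set leaves a different amount of mass for a different number of unseen words.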

Thanks in advance!



More information about the SRILM-User mailing list