Q: probabilities calculation

Fri Oct 25 10:47:07 PDT 2002

Hello there,

Does anyone know how the SRI tool generates
unigram probabilities for words that do NOT
occur in the training transcript but are covered
by the training dictionary? From reading
NgramLM.cc, I think all those words are
assigned a probability of LogP_Zero, but this
value seems to vary across different LMs.

I used two quite small sets of transcriptions to
train LMs, with the same training dictionary
(46K words). The number of unique words in trans1
and trans2 is 620 and 700, respectively. For the
words that are covered by the lexicon but not in
the training transcripts, the unigram log
probabilities are -5.337341 and -5.383736,
respectively. I still can't figure out how
these two numbers are generated.
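One hypothesis worth checking: if the discounting step leaves some unigram probability mass unassigned and that mass is spread evenly over every vocabulary word with zero training count, then each such word gets (leftover mass) / (number of zero-count words). The sketch below works backwards from the two values above under that assumption. The helper name `implied_leftover_mass` is hypothetical, and the vocabulary size of exactly 46000 is an approximation of "46K" (it ignores any special tokens such as sentence markers). The values are treated as log10 probabilities, as in ARPA-format files.

```python
# Hypothetical sketch: assumes leftover unigram mass is split uniformly
# over all zero-count vocabulary words, and that the vocabulary has
# exactly 46000 entries (an approximation of "46K").

VOCAB_SIZE = 46000  # assumption, not an exact dictionary count


def implied_leftover_mass(log10_prob, vocab_size, seen_words):
    """Return the total mass implied if every unseen word has 10**log10_prob."""
    zero_count_words = vocab_size - seen_words
    return zero_count_words * 10 ** log10_prob


# (unique words in training text, observed log10 prob of an unseen word)
observations = {
    "trans1": (620, -5.337341),
    "trans2": (700, -5.383736),
}

results = {}
for name, (seen, logp) in observations.items():
    mass = implied_leftover_mass(logp, VOCAB_SIZE, seen)
    results[name] = mass
    print(f"{name}: per-word p = {10 ** logp:.3e}, "
          f"implied leftover mass = {mass:.4f}")
```

If the hypothesis holds, the implied leftover mass should be a plausible fraction between 0 and 1, and it would differ between the two LMs because the discounted counts of the seen words differ.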

Thanks in advance!

Bing
