Disambig n-best scores

Andreas Stolcke stolcke at speech.sri.com
Tue Mar 30 15:58:02 PST 2004


In message <009501c4166e$a0b50cd0$34284484 at cs.technion.ac.il> you wrote:
> Hi,
> 
> How is path score in disambig with n-best option calculated?
> 
> For example, suppose that I have the sentence:
> 
> W1 W2 
> Which is tagged with T1 T2
> 
> Then I calculated the path probability as follows:
> 
> Log10 [ P(T1|<s>)*P(T2|T1)*P(</s>|T2)*P(W1|T1)*P(W2|T2) ]
> 
> I got it "almost right". I checked two paths:
> For one I got -20.549 (while disambig returned -120.549)
> For the other I got -20.837 (while disambig returned -120.837)
> 
> What is the reason for this difference? Should I always ignore the "1"
> after the "-"?

The -100 comes from an OOV word.  When the LM returns a probability of 0
AND the word is not in the LM, the word is considered an OOV.  To allow the
probability computation to continue, a large negative, but finite, log
probability of -100 is substituted (cf. the constant LogP_PseudoZero in
disambig.cc).  That is why each disambig score is exactly 100 lower than
the one you computed.

--Andreas 
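
The substitution described above can be illustrated with a small Python
sketch. This is not the disambig code: the function names and the factor
values are hypothetical, the value -100 is simply the one quoted in the
reply, and the sketch substitutes it for any zero-probability factor
rather than checking LM vocabulary membership as disambig is said to do.

```python
import math

# Value quoted in the reply for LogP_PseudoZero (assumed here, not read
# from the SRILM sources).
LOGP_PSEUDO_ZERO = -100.0


def log10_or_pseudo_zero(p):
    """Return log10(p), or the finite pseudo-zero when p is 0.

    Simplification: every zero-probability factor is treated as an OOV.
    """
    return math.log10(p) if p > 0.0 else LOGP_PSEUDO_ZERO


def path_score(factors):
    """Sum the log10 scores of all factors along one tag path."""
    return sum(log10_or_pseudo_zero(p) for p in factors)


# Hypothetical factors for P(T1|<s>), P(T2|T1), P(</s>|T2), P(W1|T1), P(W2|T2).
known_factors = [0.1, 0.05, 0.2, 0.01, 0.02]

# Same path, plus one factor for which the LM returned probability 0.
factors_with_oov = known_factors + [0.0]

score_known = path_score(known_factors)
score_with_oov = path_score(factors_with_oov)

print(f"without OOV factor: {score_known:.3f}")
print(f"with OOV factor:    {score_with_oov:.3f}")
print(f"difference:         {score_with_oov - score_known:.3f}")
```

Under these assumptions the two scores differ by exactly the pseudo-zero
value, which matches the pattern in the question (-20.549 versus -120.549).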



