Disambig n-best scores
Andreas Stolcke
stolcke at speech.sri.com
Tue Mar 30 15:58:02 PST 2004
In message <009501c4166e$a0b50cd0$34284484 at cs.technion.ac.il> you wrote:
> Hi,
>
> How is the path score in disambig calculated with the n-best option?
>
> For example, suppose that I have the sentence:
>
> W1 W2
> Which is tagged with T1 T2
>
> Then I calculated the path probability as follows:
>
> Log10 [ P(T1|<s>)*P(T2|T1)*P(</s>|T2)*P(W1|T1)*P(W2|T2) ]
>
> I got it "almost right". I checked two paths:
> For one I got -20.549 (while disambig returned -120.549)
> For the other I got -20.837 (while disambig returned -120.837)
>
> What is the reason for this difference? Should I always ignore the "1"
> after the "-"?
The -100 comes from an OOV word. When the LM returns a probability of 0
AND the word is not in the LM, it is considered an OOV. To allow the
probability computation to continue, a large negative but finite log
probability of -100 is substituted (cf. the constant LogP_PseudoZero in
disambig.cc).
--Andreas
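
To illustrate the substitution described in the reply, here is a minimal
Python sketch. It is not SRILM code: the function names and the factor
values are hypothetical, and treating every zero probability as pseudo-zero
is a simplifying assumption (per the reply, disambig does this when the word
is also missing from the LM). Only the value -100 and the constant name
LogP_PseudoZero come from the message itself.

```python
import math

# Value quoted in the reply for LogP_PseudoZero in disambig.cc.
LOGP_PSEUDO_ZERO = -100.0


def log10_or_pseudo_zero(prob):
    """Return log10(prob), or the finite pseudo-zero value when prob is 0.

    Assumption: every zero probability is treated as an OOV hit here.
    """
    if prob <= 0.0:
        return LOGP_PSEUDO_ZERO
    return math.log10(prob)


def path_score(factors):
    """Sum the log10 scores of all probability factors along one path."""
    return sum(log10_or_pseudo_zero(p) for p in factors)


# Hypothetical factor values, for illustration only.
in_vocab_factors = [0.1, 0.01, 0.001]
factors_with_oov = in_vocab_factors + [0.0]

score_without_oov = path_score(in_vocab_factors)
score_with_oov = path_score(factors_with_oov)

# The two scores differ by the pseudo-zero penalty (about -100).
print(score_without_oov, score_with_oov, score_with_oov - score_without_oov)
```

The same arithmetic matches the numbers in the question: -20.549 + (-100) =
-120.549 and -20.837 + (-100) = -120.837, which is consistent with a single
pseudo-zero term on each path. The leading "1" is therefore part of the
score, not a digit to ignore.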