[SRILM User List] How do you calculate perplexity given a test sentence?
Andreas Stolcke
stolcke at icsi.berkeley.edu
Fri May 18 13:45:48 PDT 2012
On 5/18/2012 10:39 AM, Burkay Gur wrote:
> This is still not clear to me. When we calculate the perplexity of a
> language model alone, we just take p as the language model itself.
> This tells us how perplexed that language model is.
>
> This is H(p) = - Sum_i(p_i*log(p_i))
>
> Now when we introduce a test sentence, I am not sure what we are
> calculating. In your example you do not mention q in the equation.
>
> H(p,q) = -Sum_i(p_i * log(q_i))
First, exchange p and q, if p is your LM, so you have
H(p,q) = -Sum_i(q_i * log(p_i))
q_i is approximated by the empirical distribution of words in the test
data. So effectively, q_i = number of occurrences of word i / length
of test corpus.
Of course for many (most) words q_i will be zero (they don't occur in
the test data).
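A minimal sketch of that empirical distribution (the test corpus and tokens
here are made up for illustration; they are not from SRILM):

```python
from collections import Counter

# Hypothetical tokenized test corpus.
test_tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Empirical distribution: q_i = occurrences of word i / length of test corpus.
counts = Counter(test_tokens)
n = len(test_tokens)
q = {word: c / n for word, c in counts.items()}

# "the" occurs 2 times out of 6 tokens, so q["the"] = 2/6.
# Any vocabulary word absent from the test data implicitly gets q_i = 0,
# which is why those terms drop out of the sum below.
```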
With this approximation you get
H(p,q) = - (1/N) Sum_j log (p_j)
where j now ranges over the N word occurrences (tokens, not types) in the
test set, and p_j is the probability of the j-th word.
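Putting it together: averaging the negative log probabilities of the test
tokens gives the cross-entropy, and exponentiating gives the perplexity. A
small self-contained sketch (the probabilities are invented for
illustration; a real LM would supply them):

```python
import math

def perplexity(token_probs):
    """Cross-entropy H(p,q) = -(1/N) * sum_j log2(p_j) over the N test
    tokens, then perplexity = 2 ** H (base 2, i.e. entropy in bits)."""
    n = len(token_probs)
    h = -sum(math.log2(p) for p in token_probs) / n
    return 2 ** h

# Hypothetical per-token probabilities assigned by some LM to a 4-token
# test sentence:
probs = [0.25, 0.5, 0.125, 0.25]
# H = (2 + 1 + 3 + 2) / 4 = 2 bits, so perplexity = 2**2 = 4
```

The base of the logarithm is a free choice as long as the same base is used
for exponentiation; base 10 or e gives the same perplexity value.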
Andreas