About two PPLs
Andreas Stolcke
stolcke at speech.sri.com
Fri Feb 28 21:38:08 PST 2003
In message <002101c2dfb4$8801ec90$6314ce80 at speechwork> you wrote:
> Hi,
> I have installed SRILM successfully, thanks a lot! Now I have a
> small question about the PPL output:
> when I run "ngram" to compute the PPL of a test text, two PPLs are
> output: ppl and ppl1. What is the difference between them?
> (I can't find this in the documentation.)
ppl is the perplexity normalized over all input tokens, including the
end-of-sentence tokens.
ppl1 omits the end-of-sentence tokens from the denominator.
ppl1 is more meaningful for comparing texts that differ in their
sentence segmentations.
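
To make the difference concrete, here is a rough sketch of the two
normalizations. It is not SRILM code. It assumes the total log
probability is base 10 and that out-of-vocabulary words are excluded
from both denominators; the function and parameter names are made up
for illustration.

```python
def perplexities(logprob10, num_words, num_sentences, num_oovs=0):
    """Return (ppl, ppl1) for a scored text.

    logprob10     -- total log10 probability of the text (assumed base 10)
    num_words     -- number of word tokens, not counting end-of-sentence
    num_sentences -- number of sentences, i.e. end-of-sentence tokens
    num_oovs      -- words skipped in scoring (assumption: excluded)
    """
    scored_words = num_words - num_oovs
    # ppl: denominator counts words plus one end-of-sentence token
    # per sentence.
    ppl = 10 ** (-logprob10 / (scored_words + num_sentences))
    # ppl1: denominator counts words only.
    ppl1 = 10 ** (-logprob10 / scored_words)
    return ppl, ppl1


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the effect.
    print(perplexities(-30.0, num_words=10, num_sentences=5))
```

With the same total log probability and the same words, resegmenting the
text into more sentences changes the ppl denominator but not the ppl1
denominator.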
BTW, this will be documented in the man page for the next release.
--Andreas