[SRILM User List] effect of ngram -vocab and -limit-vocab on ppl calculations

zeeshan khan zeeshankhans at gmail.com
Mon Feb 7 15:24:39 PST 2011


Hi all,

I would like to share an observation about how the SRILM toolkit computes
perplexity and how the -vocab and -limit-vocab options affect it, and to ask
why this happens.

SRILM's ngram tool reports three different perplexities for the SAME text
when these options are used as follows.

P1: ngram -unk -map-unk '[UNKNOWN]' -order 4 -lm <LM-FILE> -ppl <TEXT-FILE>
    gives the highest perplexity.

P2: ngram -unk -map-unk '[UNKNOWN]' -vocab <VOCAB-FILE> -order 4 -lm <LM-FILE> -ppl <TEXT-FILE>
    gives a perplexity lower than P1 but higher than P3.

P3: ngram -unk -map-unk '[UNKNOWN]' -vocab <VOCAB-FILE> -limit-vocab -order 4 -lm <LM-FILE> -ppl <TEXT-FILE>
    gives the lowest perplexity, below both P1 and P2.
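
To compare the three runs side by side, a small wrapper along these lines
might help. It is only a sketch: the file names model.lm, test.txt and
lattice.vocab and the helper ppl_cmd are hypothetical placeholders, and the
script only prints the three command lines rather than running them.

```shell
#!/bin/sh
# Hypothetical file names, overridable from the environment.
LM=${LM:-model.lm}
TEXT=${TEXT:-test.txt}
VOCAB=${VOCAB:-lattice.vocab}

# Options shared by all three runs, copied from the commands above.
COMMON="-unk -map-unk '[UNKNOWN]' -order 4"

# Print one ngram command line; any arguments are inserted as extra options.
ppl_cmd() {
    echo "ngram $COMMON${1:+ $*} -lm $LM -ppl $TEXT"
}

echo "# P1: no vocabulary options"
ppl_cmd
echo "# P2: -vocab only"
ppl_cmd -vocab "$VOCAB"
echo "# P3: -vocab with -limit-vocab"
ppl_cmd -vocab "$VOCAB" -limit-vocab
```

Piping the output to sh would execute the runs, assuming ngram is on the
PATH. Adding ngram's -debug 2 option prints per-word probabilities, which may
help show where the three runs diverge.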

Can anyone tell me why this happens? I thought the -vocab and -limit-vocab
options only affected memory usage.

For information, the VOCAB files were built from lattice files produced
during a recognition run.

Thanks and Regards,

Zeeshan.