[SRILM User List] effect of ngram -vocab and -limit-vocab on ppl calculations
zeeshan khan
zeeshankhans at gmail.com
Mon Feb 7 15:24:39 PST 2011
Hi all,
I want to share an observation about how the SRILM toolkit calculates
perplexity and how the -vocab and -limit-vocab options affect it, and to
ask why this happens.
SRILM's ngram tool gives three different perplexities for the SAME text,
depending on which of these options are used:

P1: ngram -unk -map-unk '[UNKNOWN]' -order 4 -lm <LM-FILE> -ppl <TEXT-FILE>
    Gives the highest perplexity.

P2: ngram -unk -map-unk '[UNKNOWN]' -vocab <VOCAB-FILE> -order 4 -lm <LM-FILE> -ppl <TEXT-FILE>
    Gives a perplexity lower than P1 but higher than P3.

P3: ngram -unk -map-unk '[UNKNOWN]' -vocab <VOCAB-FILE> -limit-vocab -order 4 -lm <LM-FILE> -ppl <TEXT-FILE>
    Gives the lowest perplexity, below both P1 and P2.
Can anyone tell me why this happens? I thought the -vocab and -limit-vocab
options affected only memory usage.
For reference, the VOCAB files were built from lattice files produced
during a recognition run.
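To compare the three runs on more than the final number, it may help to look at the counts that ngram -ppl prints next to each perplexity. Below is a minimal Python sketch, not part of SRILM. It assumes the usual two-line summary ("N sentences, M words, K OOVs" followed by "Z zeroprobs, logprob= ... ppl= ... ppl1= ...") and the perplexity definition given in the SRILM FAQ, ppl = 10^(-logprob / (words - OOVs - zeroprobs + sentences)). The function names are made up for illustration, and the regular expression may need adjusting if your version prints the summary differently.

```python
import re

# Summary printed by `ngram -ppl`. The layout is an assumption based on
# typical SRILM output; adjust the pattern if your version differs.
_SUMMARY = re.compile(
    r"(?P<sentences>\d+) sentences, (?P<words>\d+) words, (?P<oovs>\d+) OOVs\s+"
    r"(?P<zeroprobs>\d+) zeroprobs, logprob= (?P<logprob>\S+) ppl= (?P<ppl>\S+)"
)


def parse_ppl_summary(text):
    """Extract the counts, log probability and perplexity from ngram output."""
    match = _SUMMARY.search(text)
    if match is None:
        raise ValueError("no ngram -ppl summary found in the given text")
    stats = {
        key: int(match.group(key))
        for key in ("sentences", "words", "oovs", "zeroprobs")
    }
    stats["logprob"] = float(match.group("logprob"))  # base-10 log probability
    stats["ppl"] = float(match.group("ppl"))
    return stats


def scored_tokens(stats):
    """Number of tokens that contribute to the perplexity denominator."""
    return (stats["words"] - stats["oovs"] - stats["zeroprobs"]
            + stats["sentences"])


def recompute_ppl(stats):
    """Recompute perplexity from logprob and the scored-token count."""
    return 10 ** (-stats["logprob"] / scored_tokens(stats))
```

Running parse_ppl_summary on the saved output of P1, P2 and P3 and comparing the OOV and zeroprob counts, the logprob, and scored_tokens side by side should show whether the runs are even normalizing over the same number of tokens.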
Thanks and Regards,
Zeeshan.