[SRILM User List] Interpreting ngram -ppl output in case of backoff
Sander Maijers
S.N.Maijers at student.ru.nl
Thu May 30 08:38:24 PDT 2013
Hi,
I have trained a baseline N-gram LM like so:
vocab %s -unk -map-unk '[unk]' -prune %s -debug 1 -order 3 -text %s -sort -lm %s
Suppose ngram -ppl -debug 3 -map-unk [unk] ... prints the following
line:
( Heijn | Albert ...) = 0.210084 [ -0.677607 ]
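As a quick sanity check, the bracketed number is the base-10 logarithm of the probability printed before it (SRILM, like the ARPA format, uses log10). The values below are copied from the line above:

```python
import math

# Values copied from the ngram -ppl -debug 3 line above.
prob = 0.210084
bracketed_logprob = -0.677607

# ngram reports log probabilities in base 10.
computed = math.log10(prob)
print(f"log10({prob}) = {computed:.6f}")  # agrees with the bracketed value
```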
This bigram is not in my LM. My pronunciation lexicon contains both
words, but only in lower case. I believe that in this case ngram looks
up the bigram "[unk] [unk]" instead:
-0.5549474 [unk] [unk] -0.2222121
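An ARPA entry line holds the log10 probability, the N-gram words, and (optionally) a log10 backoff weight. The minimal parser below is a sketch; the names `ArpaEntry` and `parse_arpa_line` are hypothetical, not part of SRILM:

```python
from typing import NamedTuple, Optional, Tuple


class ArpaEntry(NamedTuple):
    logprob: float            # log10 p(last word | preceding words)
    words: Tuple[str, ...]    # the N-gram itself
    backoff: Optional[float]  # log10 backoff weight, used when this N-gram is a context


def parse_arpa_line(line: str, order: int) -> ArpaEntry:
    """Parse one N-gram entry line from an ARPA file (whitespace-separated)."""
    fields = line.split()
    words = tuple(fields[1:1 + order])
    rest = fields[1 + order:]
    return ArpaEntry(
        logprob=float(fields[0]),
        words=words,
        backoff=float(rest[0]) if rest else None,
    )


entry = parse_arpa_line("-0.5549474 [unk] [unk] -0.2222121", order=2)
print(entry.words, round(10 ** entry.logprob, 3))  # ('[unk]', '[unk]') 0.279
```

Two things follow from this. The trailing -0.2222121 is the backoff weight for the context "[unk] [unk]" (applied when a trigram ending in that context is missing), not part of this bigram's own probability. And the bigram's own probability, about 0.279, differs from the 0.210084 reported by ngram, which suggests that either a backoff weight from the longer context was added on top of it or a different N-gram was used.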
I do not understand precisely how to confirm this from the logprob in
brackets reported by ngram. Even when the applicable N-gram *is* in the
LM, the logprob on the ARPA line does not match the one in the ngram
output, but I assume this is due to the discounting applied by default.
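For reference, the standard ARPA backoff rule for a plain backoff model is: if the N-gram "h w" is listed, log10 p(w | h) is its logprob; otherwise log10 p(w | h) = bow(h) + log10 p(w | h'), where h' drops the oldest word of h and bow(h) is 0 when h itself is not listed. The logprobs stored in the ARPA file are already the discounted estimates, so this rule alone should reproduce the bracketed value for such a model (after OOV words are mapped to the unknown-word token). A minimal sketch, with a hypothetical helper and toy values:

```python
from typing import Dict, Tuple

Ngram = Tuple[str, ...]


def backoff_logprob(
    word: str,
    history: Ngram,
    logprobs: Dict[Ngram, float],
    backoffs: Dict[Ngram, float],
) -> Tuple[float, int]:
    """Return (log10 p(word | history), length of the N-gram actually found).

    `history` is ordered oldest word first, e.g. ("Albert",) for p(Heijn | Albert).
    """
    total_bow = 0.0
    context = history
    while True:
        ngram = context + (word,)
        if ngram in logprobs:
            return total_bow + logprobs[ngram], len(ngram)
        if not context:
            raise KeyError(f"{word!r} has no unigram; map it to the unknown-word token first")
        # Not listed: add the context's backoff weight (0.0 when the context
        # itself is not listed, i.e. a weight of 1), then drop its oldest word.
        total_bow += backoffs.get(context, 0.0)
        context = context[1:]


# Toy, hypothetical model (not normalized); keys are tuples of words.
logprobs = {("a",): -1.0, ("b",): -0.75, ("a", "b"): -0.25}
backoffs = {("a",): -0.25, ("b",): -0.5}

print(backoff_logprob("b", ("a",), logprobs, backoffs))      # (-0.25, 2): bigram found
print(backoff_logprob("a", ("b",), logprobs, backoffs))      # (-1.5, 1): bow(b) + log10 p(a)
print(backoff_logprob("b", ("b", "a"), logprobs, backoffs))  # (-0.25, 2): unlisted context adds 0
```

Returning the length of the N-gram found makes it easy to see how far the lookup backed off.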
The man page for ngram with arguments -debug 2 -ppl says:
"Probabilities for each word, plus LM-dependent details about backoff
used etc., are printed.".
Where in the ngram output should I look for these backoff details, so
that I can assess the role of backoff, including the backoff that
occurs in LMs generated with the -skip option?
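To tally which N-gram order was used for each word, one option is to scan the per-word lines for a tag. The sketch below (hypothetical helpers, not part of SRILM) assumes lines shaped like `p( word | context ) = [tag] prob [ logprob ]`, where a tag such as `[2gram]` names the N-gram length ngram ended up using; because the line quoted above shows no such tag, the parser treats it as optional. How skip-N-gram models report their lookups is not covered here.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class PplWord(NamedTuple):
    word: str
    context: str
    tag: Optional[str]  # e.g. "2gram" when the output includes such a tag
    prob: float
    logprob: float


# Assumed layout: [p]( word | context ) = [optional tag] prob [ logprob ]
PPL_LINE = re.compile(
    r"p?\(\s*(?P<word>\S+)\s*\|\s*(?P<context>.*?)\s*\)\s*=\s*"
    r"(?:\[(?P<tag>[^\]]+)\]\s*)?"
    r"(?P<prob>[-+0-9.eE]+)\s*\[\s*(?P<logprob>-inf|[-+0-9.eE]+)\s*\]"
)


def parse_ppl_line(line: str) -> Optional[PplWord]:
    """Return the parsed per-word line, or None for summary/other lines."""
    m = PPL_LINE.search(line)
    if m is None:
        return None
    return PplWord(m["word"], m["context"], m["tag"],
                   float(m["prob"]), float(m["logprob"]))


def count_tags(lines) -> Counter:
    """Count how often each tag (None if absent) occurs in the output."""
    parsed = (parse_ppl_line(line) for line in lines)
    return Counter(p.tag for p in parsed if p is not None)


quoted = "( Heijn | Albert ...) = 0.210084 [ -0.677607 ]"
hypothetical = "\tp( Heijn | Albert ...) \t= [2gram] 0.210084 [ -0.677607 ]"
print(parse_ppl_line(quoted))
print(count_tags([quoted, hypothetical]))
```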
Best,
Sander