[SRILM User List] compute perplexity
Andreas Stolcke
stolcke at icsi.berkeley.edu
Thu Mar 27 10:25:35 PDT 2014
On 3/19/2014 10:57 AM, Stefy D. wrote:
> Dear Andreas,
> thank you very much for replying.
> I trained both LMs using the "-unk" option like this:
> $NGRAMCOUNT_FILE -order 3 -interpolate -kndiscount -unk -text
> $WORKING_DIR"lm_a/lmodel.lm"
That explains why you are not getting OOVs reported in the ppl output.
Unknown words are mapped to <unk> and thus the LM has a probability for them.
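A minimal sketch of the mapping described above, not SRILM's actual code: with -unk, any token outside the LM vocabulary is replaced by <unk> before lookup, so it receives a probability instead of being counted as an OOV. The function name and the toy vocabulary are made up for illustration.

```python
def map_to_unk(tokens, vocab, unk="<unk>"):
    """Replace every token that is not in vocab by the unknown-word symbol."""
    return [tok if tok in vocab else unk for tok in tokens]


# Toy vocabulary (hypothetical); "dog" is not in it and becomes <unk>.
toy_vocab = {"the", "cat", "sat"}
mapped = map_to_unk("the dog sat".split(), toy_vocab)
print(mapped)
```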
> For the OOV rate I created a vocabulary list for the training data and
> I used the unigram counts of the test set and the compute-oov-rate
> script like this:
> $NGRAMCOUNT_FILE -order 1 -write-vocab "vocabularyTargetUnigram.txt"
> -text $WORKING_DIR$OUT_CORPUS"lowercased."$TARGET -sort
> $NGRAMCOUNT_FILE -order 1 -text $WORKING_DIR"test.lowercased."$TARGET
> -write "unigramCounts_testdata.txt" -sort
> $OOVCOUNT_FILE vocabularyTargetUnigram.txt unigramCounts_testdata.txt
> This is how I got that OOV rate mentioned in the first mail. Could you
> please let me know if I used the right commands to compute that?
You did it right.
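For reference, the computation behind those commands can be sketched roughly as follows. This is an illustration and not the compute-oov-rate script itself. It assumes a vocabulary file with one word per line and a unigram count file with "word count" per line. Skipping the sentence-boundary tags is an assumption of this sketch.

```python
def oov_rate(vocab_lines, count_lines, skip=("<s>", "</s>")):
    """Return (oov_tokens, total_tokens) for unigram counts against a vocabulary."""
    vocab = {line.split()[0] for line in vocab_lines if line.strip()}
    total = oov = 0
    for line in count_lines:
        fields = line.split()
        # Ignore malformed lines and sentence-boundary tags (sketch assumption).
        if len(fields) != 2 or fields[0] in skip:
            continue
        word, count = fields[0], int(fields[1])
        total += count
        if word not in vocab:
            oov += count
    return oov, total


# Toy data (hypothetical): "dog" occurs once and is not in the vocabulary.
oov, total = oov_rate(["the", "cat"], ["the\t3", "dog\t1", "<s>\t2"])
print(f"OOV tokens: {oov} of {total} ({100.0 * oov / total:.2f}%)")
```

With real files, the two arguments can simply be open file objects, for example open("vocabularyTargetUnigram.txt") and open("unigramCounts_testdata.txt").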
> You said I should train LM_A using the vocabulary of corpus A + corpus
> X so that the perplexities can be compared. So I should train LM_A
> using only corpus A but the vocabulary of A + X? I am sorry to be
> confused, but I thought that for estimating the LM the vocabulary
> should be from the same corpus used for estimating. I am using these
> LMs in SMT systems (a baseline and an adapted one). If I influence the
> baseline LM with vocabulary from the adapted data, then the baseline
> is not really a baseline. Please tell me if I am thinking incorrectly.
You are right. What this illustrates is that perplexity alone is not a
sufficient metric for comparing LMs. In your scenario (LM adaptation)
the expansion of the vocabulary is a key component of the adaptation
process, but LMs with different vocabularies are no longer comparable by
ppl. My suggestion to unify the vocabularies was a workaround to allow
you to still use perplexity comparison.
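A small numeric sketch of why perplexities over different vocabularies stop being comparable. It assumes the formula SRILM is documented to report, ppl = 10^(-logprob / (words - OOVs - zeroprobs + sentences)). The function name and all numbers are invented for illustration.

```python
def srilm_ppl(logprob, words, sentences, oovs=0, zeroprobs=0):
    """Perplexity from a total log10 probability; OOV tokens are left out."""
    scored = words - oovs - zeroprobs + sentences
    return 10.0 ** (-logprob / scored)


# Hypothetical closed-vocabulary LM: 100 hard tokens are OOV and never scored.
ppl_a = srilm_ppl(logprob=-2000.0, words=1000, sentences=50, oovs=100)

# Hypothetical expanded-vocabulary LM: it scores all 1000 tokens, including
# the 100 hard ones, so its total log probability is lower.
ppl_b = srilm_ppl(logprob=-2300.0, words=1000, sentences=50, oovs=0)

print(ppl_a, ppl_b)
```

In this made-up case the second LM gets the higher perplexity even though it is the only one that predicts every token. With -unk, a similar effect is likely to show up through the <unk> token rather than through the OOV count.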
> Thank you for introducing me to statistical significance.
> To generate a table of word level probabilities on the same test set
> should I use get-unigram-probs? But where do I specify the test set?
> $UNIGRAMPROBS_FILE linear=1 $WORKING_DIR"lm_a/lmodel.arpa."$TARGET >
> table_A.out
No, you get the word probabilities from the output of ngram -debug 2 -ppl
(you need to write some perl or whatever script to extract the probabilities).
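One possible shape for such an extraction script, sketched in Python instead of perl. The line format it expects is an assumption based on typical ngram -debug 2 output; check it against your own .ppl files before relying on it. The function and pattern names are made up.

```python
import re

# Assumed shape of a per-word line in "ngram -debug 2 -ppl" output, e.g.
#   "\tp( cat | the ...) \t= [2gram] 0.0123 [ -1.91009 ]"
WORD_LINE = re.compile(
    r"p\(\s*(\S+)\s*\|.*?\)\s*=\s*\[(\S+)\]\s*(\S+)\s*\[\s*(\S+)\s*\]"
)


def word_probs(lines):
    """Yield (word, probability, log10 probability) for each per-word line."""
    for line in lines:
        match = WORD_LINE.search(line)
        if match:
            word, _ngram_type, prob, logprob = match.groups()
            yield word, float(prob), float(logprob)


# Hypothetical sample in the assumed format; summary lines are skipped.
sample = [
    "the cat",
    "\tp( the | <s> ) \t= [2gram] 0.05 [ -1.30103 ]",
    "\tp( cat | the ...) \t= [2gram] 0.0123 [ -1.91009 ]",
    "1 sentences, 2 words, 0 OOVs",
]
table = list(word_probs(sample))
print(table)
```

To build a table for a real run, pass an open file instead of the sample list, for example open("ppl_files/ppl_A_detail.ppl").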
> To get how many words had lower/same/greater probability in LM_B is
> using compare-ppls script ok? For example, I get this output when
> applying it to my 2 LMs (ngram -debug 2 on the same test set as in
> previous commands):
> $COMPARE_PPLS $WORKING_DIR"ppl_files/ppl_A_detail.ppl"
> $WORKING_DIR"ppl_files/ppl_B_detail.ppl"
> output: total 22450, equal 0, different 22450, greater 11447
Yes, it seems compare-ppls extracts exactly the statistics I was talking
about. I had forgotten about it ...
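One common way to turn such counts into a significance statement is a sign test. Whether that is exactly the test meant earlier in the thread is an assumption, and so is the reading of "greater" as the number of words to which one LM assigns the higher probability. The sketch below uses the normal approximation to the binomial and treats words as independent, which is only roughly true for running text.

```python
import math


def sign_test_p(greater, different):
    """Two-sided sign test p-value using the normal approximation."""
    mean = different / 2.0
    sd = math.sqrt(different) / 2.0
    # Continuity correction; clamp at zero for counts right at the mean.
    z = max((abs(greater - mean) - 0.5) / sd, 0.0)
    return math.erfc(z / math.sqrt(2.0))


# Counts taken from the compare-ppls output quoted above.
p_value = sign_test_p(greater=11447, different=22450)
print(p_value)
```

With these counts the observed split is roughly three standard deviations away from an even 50/50 split.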