Odd Jelinek-Mercer results
Christophe Hauser
christophe.hauser at irisa.fr
Wed May 20 09:34:10 PDT 2009
Hello,
I get really odd results using Jelinek-Mercer smoothing.
In the following simple example, I get the best result with no smoothing at all (perplexity 1.12).
With smoothing enabled, setting all mixture weights to 1 gives better performance (1.24) than optimizing the weights on the test set (2.4). According to Chen & Goodman, setting all weights to 1 amounts to no smoothing at all.
I get similar results with every other dataset I have tried.
Am I doing something wrong?
training : A B C A B C A B C
test : A B C A B C A B C D
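(For reference, here is a rough sketch, in plain Python rather than anything from the SRILM source, of the interpolation I believe the count-LM computes; the vocab size of 5 is simply the value SRILM reports in lmsmooth further down. With every mixture weight set to 1 the recursion collapses to the plain maximum-likelihood n-gram estimate, which is how I read Chen & Goodman's "no smoothing" remark.)

# jm_sketch.py -- my reading of the count-LM interpolation, not SRILM code
from collections import Counter

train = "A B C A B C A B C".split()
vocab_size = 5                      # value SRILM reports below ("vocabsize 5")

counts = Counter()                  # n-gram counts of the training text, orders 1..3
for n in range(1, 4):
    for i in range(len(train) - n + 1):
        counts[tuple(train[i:i + n])] += 1
total = len(train)                  # 9, matching "totalcount 9"

def p_jm(w, hist, lam):
    # p_k(w|h) = lam[k] * c(h w)/c(h) + (1 - lam[k]) * p_{k-1}(w | shorter h),
    # starting from the uniform distribution 1/vocab_size
    p = 1.0 / vocab_size
    for k in range(len(hist) + 1):
        h = tuple(hist[len(hist) - k:])
        denom = counts[h] if k else total
        ml = counts[h + (w,)] / denom if denom else 0.0
        p = lam[k] * ml + (1.0 - lam[k]) * p
    return p

print(p_jm("C", ["A", "B"], [1, 1, 1]))   # 1.0 -> pure ML trigram
print(p_jm("D", ["B", "C"], [1, 1, 1]))   # 0.0 -> unseen word gets zero probability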
# write vocabulary
cat $test $training > everything
ngram-count -text everything -no-eos -no-sos -write-vocab vocab -order $order
# write count file
ngram-count -debug 2 -text $training -lm lm -order $order -write cfile \
  -vocab vocab -gt1max 0 -gt2max 0 -gt3max 0 -no-eos -no-sos
cat >countlm <<EOF
countmodulus 1
mixweights 1
1 1 1
1 1 1
counts cfile
EOF
# Optimize the smoothing parameters on the test set (output: lmsmooth)
ngram-count -debug 2 -text $test -count-lm -init-lm countlm -lm lmsmooth \
  -order $order -vocab vocab -no-eos -no-sos -gt1max 0 -gt2max 0 -gt3max 0
lmsmooth:
order 3
mixweights 1
1 1 1
0 0 0.674508
countmodulus 1
vocabsize 5
totalcount 9
counts cfile
# Evaluate perplexity using lmsmooth model
ngram -debug 2 -count-lm -lm lmsmooth -order $order -ppl $test \
  -write-lm lm2 -vocab vocab -no-eos -no-sos
A B C A B C A B C D
p( A | ) = [9,0,3] 0.2 [ -0.69897 ]
p( B | A ...) = [9,0,3,0,3] 0.2 [ -0.69897 ]
p( C | B ...) = [9,0,3,0,3,0.674508,3] 0.739606 [ -0.130999 ]
p( A | C ...) = [9,0,3,0,2,0.674508,2] 0.51477 [ -0.288386 ]
p( B | A ...) = [9,0,3,0,3,0.674508,2] 0.739606 [ -0.130999 ]
p( C | B ...) = [9,0,3,0,3,0.674508,3] 0.739606 [ -0.130999 ]
p( A | C ...) = [9,0,3,0,2,0.674508,2] 0.51477 [ -0.288386 ]
p( B | A ...) = [9,0,3,0,3,0.674508,2] 0.739606 [ -0.130999 ]
p( C | B ...) = [9,0,3,0,3,0.674508,3] 0.739606 [ -0.130999 ]
p( D | C ...) = [9,0,0,0,0,0.674508,0] 0.0650984 [ -1.18643 ]
0 sentences, 10 words, 0 OOVs
0 zeroprobs, logprob= -3.81614 ppl= 2.40776 ppl1= 2.40776
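(Quick arithmetic check, reading the bracketed debug values as totalcount followed by weight/count pairs per order: with lambda1 = lambda2 = 0 everything below the trigram backs off to the uniform 1/5, so the single optimized trigram weight reproduces the probabilities above.)

p_uni = 1 / 5                            # uniform fallback, vocabsize 5
lam3  = 0.674508                         # optimized trigram weight from lmsmooth
print(lam3 * 3/3 + (1 - lam3) * p_uni)   # p(C | A B) -> 0.739606
print(lam3 * 2/3 + (1 - lam3) * p_uni)   # p(A | B C) -> 0.514770
print((1 - lam3) * p_uni)                # p(D | B C) -> 0.065098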
# Evaluate perplexity using manual parameters
ngram -debug 2 -count-lm -lm countlm -order $order -ppl $test \
  -write-lm lm2 -vocab vocab -no-eos -no-sos
A B C A B C A B C D
p( A | ) = [9,1,3] 0.333333 [ -0.477121 ]
p( B | A ...) = [9,1,3,1,3] 1 [ 0 ]
p( C | B ...) = [9,1,3,1,3,1,3] 1 [ 0 ]
p( A | C ...) = [9,1,3,1,2,1,2] 0.666667 [ -0.176091 ]
p( B | A ...) = [9,1,3,1,3,1,2] 1 [ 0 ]
p( C | B ...) = [9,1,3,1,3,1,3] 1 [ 0 ]
p( A | C ...) = [9,1,3,1,2,1,2] 0.666667 [ -0.176091 ]
p( B | A ...) = [9,1,3,1,3,1,2] 1 [ 0 ]
p( C | B ...) = [9,1,3,1,3,1,3] 1 [ 0 ]
p( D | C ...) = [9,1,0,1,0,1,0] 0 [ -inf ]
0 sentences, 10 words, 0 OOVs
1 zeroprobs, logprob= -0.829304 ppl= 1.23636 ppl1= 1.23636
# Evaluate the perplexity with no smoothing at all
ngram -debug 2 -ppl $test -lm lm -order $order -vocab vocab -no-eos -no-sos
A B C A B C A B C D
p( A | ) = [1gram] 0.333333 [ -0.477121 ]
p( B | A ...) = [2gram] 1 [ 0 ]
p( C | B ...) = [3gram] 1 [ 0 ]
p( A | C ...) = [3gram] 1 [ 0 ]
p( B | A ...) = [3gram] 1 [ 0 ]
p( C | B ...) = [3gram] 1 [ 0 ]
p( A | C ...) = [3gram] 1 [ 0 ]
p( B | A ...) = [3gram] 1 [ 0 ]
p( C | B ...) = [3gram] 1 [ 0 ]
p( D | C ...) = [1gram] 0 [ -inf ]
0 sentences, 10 words, 0 OOVs
1 zeroprobs, logprob= -0.477121 ppl= 1.12983 ppl1= 1.12983
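(If I read SRILM's accounting right, ppl = 10 ** (-logprob / (words - OOVs - zeroprobs + sentences)), so the zeroprob word D simply drops out of the denominator in the two runs that report "1 zeroprobs"; the reported values then reproduce as follows.)

print(10 ** (3.81614  / (10 - 0 - 0 + 0)))   # optimized count-LM        -> 2.40776
print(10 ** (0.829304 / (10 - 0 - 1 + 0)))   # manual weights (all 1)    -> 1.23636
print(10 ** (0.477121 / (10 - 0 - 1 + 0)))   # unsmoothed backoff model  -> 1.12983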
Kind regards,
--
Christophe