Odd Jelinek-Mercer results
Christophe Hauser
christophe.hauser at irisa.fr
Wed May 20 09:34:10 PDT 2009
Hello,
I get really odd results using Jelinek-Mercer smoothing.
In the following simple example, I get the best result with no smoothing at all (perplexity 1.12).
With smoothing enabled, setting all mixture weights to 1 gives better performance (1.24) than optimizing the weights on the test set (2.4). According to Chen & Goodman, setting all weights to 1 amounts to no smoothing at all.
I get similar results with every other dataset I have tried.
Am I doing something wrong?
training : A B C A B C A B C
test : A B C A B C A B C D
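(For reference, here is a rough sketch, in plain Python rather than anything from the SRILM source, of the interpolation I believe the count-LM computes; the vocab size of 5 is simply the value SRILM reports in lmsmooth further down. With every mixture weight set to 1 the recursion collapses to the plain maximum-likelihood n-gram estimate, which is how I read Chen & Goodman's "no smoothing" remark.)

# jm_sketch.py -- my reading of the count-LM interpolation, not SRILM code
from collections import Counter

train = "A B C A B C A B C".split()
vocab_size = 5                      # value SRILM reports below ("vocabsize 5")

counts = Counter()                  # n-gram counts of the training text, orders 1..3
for n in range(1, 4):
    for i in range(len(train) - n + 1):
        counts[tuple(train[i:i + n])] += 1
total = len(train)                  # 9, matching "totalcount 9"

def p_jm(w, hist, lam):
    # p_k(w|h) = lam[k] * c(h w)/c(h) + (1 - lam[k]) * p_{k-1}(w | shorter h),
    # starting from the uniform distribution 1/vocab_size
    p = 1.0 / vocab_size
    for k in range(len(hist) + 1):
        h = tuple(hist[len(hist) - k:])
        denom = counts[h] if k else total
        ml = counts[h + (w,)] / denom if denom else 0.0
        p = lam[k] * ml + (1.0 - lam[k]) * p
    return p

print(p_jm("C", ["A", "B"], [1, 1, 1]))   # 1.0 -> pure ML trigram
print(p_jm("D", ["B", "C"], [1, 1, 1]))   # 0.0 -> unseen word gets zero probability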
# write vocabulary
cat $test $training > everything
ngram-count -text everything -no-eos -no-sos -write-vocab vocab -order $order
# write count file
ngram-count -debug 2 -text $training -lm lm -order $order -write cfile \
  -vocab vocab -gt1max 0 -gt2max 0 -gt3max 0 -no-eos -no-sos
cat >countlm <<EOF
countmodulus 1
mixweights 1
1 1 1
1 1 1
counts cfile
EOF
# Optimize the smoothing parameters on the test set (output: lmsmooth)
ngram-count -debug 2 -text $test -count-lm -init-lm countlm -lm lmsmooth \
  -order $order -vocab vocab -no-eos -no-sos -gt1max 0 -gt2max 0 -gt3max 0
lmsmooth:
order 3
mixweights 1
1 1 1
0 0 0.674508
countmodulus 1
vocabsize 5
totalcount 9
counts cfile
# Evaluate perplexity using lmsmooth model
ngram -debug 2 -count-lm -lm lmsmooth -order $order -ppl $test \
  -write-lm lm2 -vocab vocab -no-eos -no-sos
A B C A B C A B C D
p( A | ) = [9,0,3] 0.2 [ -0.69897 ]
p( B | A ...) = [9,0,3,0,3] 0.2 [ -0.69897 ]
p( C | B ...) = [9,0,3,0,3,0.674508,3] 0.739606 [ -0.130999 ]
p( A | C ...) = [9,0,3,0,2,0.674508,2] 0.51477 [ -0.288386 ]
p( B | A ...) = [9,0,3,0,3,0.674508,2] 0.739606 [ -0.130999 ]
p( C | B ...) = [9,0,3,0,3,0.674508,3] 0.739606 [ -0.130999 ]
p( A | C ...) = [9,0,3,0,2,0.674508,2] 0.51477 [ -0.288386 ]
p( B | A ...) = [9,0,3,0,3,0.674508,2] 0.739606 [ -0.130999 ]
p( C | B ...) = [9,0,3,0,3,0.674508,3] 0.739606 [ -0.130999 ]
p( D | C ...) = [9,0,0,0,0,0.674508,0] 0.0650984 [ -1.18643 ]
0 sentences, 10 words, 0 OOVs
0 zeroprobs, logprob= -3.81614 ppl= 2.40776 ppl1= 2.40776
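(Quick arithmetic check, reading the bracketed debug values as totalcount followed by weight/count pairs per order: with lambda1 = lambda2 = 0 everything below the trigram backs off to the uniform 1/5, so the single optimized trigram weight reproduces the probabilities above.)

p_uni = 1 / 5                            # uniform fallback, vocabsize 5
lam3  = 0.674508                         # optimized trigram weight from lmsmooth
print(lam3 * 3/3 + (1 - lam3) * p_uni)   # p(C | A B) -> 0.739606
print(lam3 * 2/3 + (1 - lam3) * p_uni)   # p(A | B C) -> 0.514770
print((1 - lam3) * p_uni)                # p(D | B C) -> 0.065098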
# Evaluate perplexity using manual parameters
ngram -debug 2 -count-lm -lm countlm -order $order -ppl $test \
  -write-lm lm2 -vocab vocab -no-eos -no-sos
A B C A B C A B C D
p( A | ) = [9,1,3] 0.333333 [ -0.477121 ]
p( B | A ...) = [9,1,3,1,3] 1 [ 0 ]
p( C | B ...) = [9,1,3,1,3,1,3] 1 [ 0 ]
p( A | C ...) = [9,1,3,1,2,1,2] 0.666667 [ -0.176091 ]
p( B | A ...) = [9,1,3,1,3,1,2] 1 [ 0 ]
p( C | B ...) = [9,1,3,1,3,1,3] 1 [ 0 ]
p( A | C ...) = [9,1,3,1,2,1,2] 0.666667 [ -0.176091 ]
p( B | A ...) = [9,1,3,1,3,1,2] 1 [ 0 ]
p( C | B ...) = [9,1,3,1,3,1,3] 1 [ 0 ]
p( D | C ...) = [9,1,0,1,0,1,0] 0 [ -inf ]
0 sentences, 10 words, 0 OOVs
1 zeroprobs, logprob= -0.829304 ppl= 1.23636 ppl1= 1.23636
# Evaluate the perplexity with no smoothing at all
ngram -debug 2 -ppl $test -lm lm -order $order -vocab vocab -no-eos -no-sos
A B C A B C A B C D
p( A | ) = [1gram] 0.333333 [ -0.477121 ]
p( B | A ...) = [2gram] 1 [ 0 ]
p( C | B ...) = [3gram] 1 [ 0 ]
p( A | C ...) = [3gram] 1 [ 0 ]
p( B | A ...) = [3gram] 1 [ 0 ]
p( C | B ...) = [3gram] 1 [ 0 ]
p( A | C ...) = [3gram] 1 [ 0 ]
p( B | A ...) = [3gram] 1 [ 0 ]
p( C | B ...) = [3gram] 1 [ 0 ]
p( D | C ...) = [1gram] 0 [ -inf ]
0 sentences, 10 words, 0 OOVs
1 zeroprobs, logprob= -0.477121 ppl= 1.12983 ppl1= 1.12983
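(If I read SRILM's accounting right, ppl = 10 ** (-logprob / (words - OOVs - zeroprobs + sentences)), so the zeroprob word D simply drops out of the denominator in the two runs that report "1 zeroprobs"; the reported values then reproduce as follows.)

print(10 ** (3.81614  / (10 - 0 - 0 + 0)))   # optimized count-LM        -> 2.40776
print(10 ** (0.829304 / (10 - 0 - 1 + 0)))   # manual weights (all 1)    -> 1.23636
print(10 ** (0.477121 / (10 - 0 - 1 + 0)))   # unsmoothed backoff model  -> 1.12983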
Kind regards,
--
Christophe