Error in lattice rescoring?
Teemu Hirsimaki
teemu.hirsimaki at hut.fi
Thu Oct 13 02:03:10 PDT 2005
While working on lattices, I noticed that lattice-tool sometimes seems
to give strange backoff probabilities when rescoring lattices. I have
a simple 2-gram model test.arpa:
\data\
ngram 1=4
ngram 2=1
\1-grams:
-99 <s>
-1.00000 </s>
-0.69897 a
-0.15490 b -0.69897
\2-grams:
-0.09691 b </s>
\end\
and a simple HTK lattice file test.htk that has just words "b a </s>":
VERSION=1.1
base=10
dir=f
start=0 end=3
N=4 L=3
I=0
I=1
I=2
I=3
J=0 S=0 E=1 W=b
J=1 S=1 E=2 W=a
J=2 S=2 E=3 W=!NULL
Rescoring gives funny probabilities for "a" and "b":
$ lattice-tool -in-lattice test.htk -read-htk -lm test.arpa \
-out-lattice - -write-htk
...
J=0 S=0 E=2 W=b l=-0.85387 (*)
J=1 S=2 E=3 W=a l=-0.69897
J=2 S=3 E=1 W=!NULL l=-1
The correct probabilities are given by the ngram tool:
$ echo "b a </s>" | ngram -debug 2 -lm test.arpa -ppl -
...
p( b | <s> ) = [1gram] 0.700003 [ -0.1549 ]
p( a | b ...) = [1gram] 0.04 [ -1.39794 ]
p( </s> | a ...)= [1gram] 0.1 [ -1 ]
Did I miss something, or is there a bug in lattice-tool? It looks like
lattice-tool adds the backoff weight BO(b) to the first word (*)
instead of the next one: -0.85387 = -0.15490 + -0.69897, while "a" gets
only its plain unigram score. The bug seems to appear in toolkit
versions 1.4.4 and 1.4.5 (OS is SuSE Linux 9.3 i686).
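
To make the arithmetic easy to check, here is a small Python sketch of the backoff lookup I would expect for this model. It is not SRILM code: the dictionaries and the bigram_score helper are made up for illustration, and it assumes the usual ARPA convention that a word without an explicit backoff weight backs off with weight 0.

```python
# Log10 values copied from test.arpa.
unigram_logprob = {"<s>": -99.0, "</s>": -1.0, "a": -0.69897, "b": -0.15490}
backoff_weight = {"b": -0.69897}  # assumption: a missing weight means 0.0
bigram_logprob = {("b", "</s>"): -0.09691}


def bigram_score(prev, word):
    """Hypothetical helper: log10 p(word | prev) with backoff to the unigram."""
    if (prev, word) in bigram_logprob:
        return bigram_logprob[(prev, word)]
    # Back off: weight of the history word plus the unigram score of the word.
    return backoff_weight.get(prev, 0.0) + unigram_logprob[word]


history = "<s>"
scores = {}
for word in ["b", "a", "</s>"]:
    scores[word] = round(bigram_score(history, word), 5)
    history = word

# Per-word scores under this lookup; they should match the ngram output.
print(scores)

# What lattice-tool reports for "b": the unigram score of "b" plus BO(b).
misplaced_b = round(unigram_logprob["b"] + backoff_weight["b"], 5)
print(misplaced_b)
```

Under these assumptions BO(b) is charged to "a" (the word that follows "b"), which agrees with the ngram output, while the lattice-tool value for "b" is reproduced only by adding BO(b) to "b" itself.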
--
Teemu Hirsimäki