Another lattice rescoring problem
Teemu Hirsimaki
teemu.hirsimaki at hut.fi
Tue Oct 18 05:02:36 PDT 2005
I ran into another problem with the lattice rescoring. I have two
simple HTK lattices (acoustic log-probabilities in parentheses):
test0.htk:
a(-1) --+--> c(-2) ------+--> b(-3)
        |                |
        +--> !NULL(-2) --+
test1.htk:
a(-1) -----> !NULL(-2) -----> b(-3)
If I rescore the above lattices with a simple 2-gram language model
test.arpa (see the end of the mail for the example files), the
language model probability of the path "a b" is computed incorrectly
for the first lattice. In the second case, the probability is
correct:
$ echo "a b" | lattice-tool -in-lattice test0.htk -read-htk \
-lm test.arpa -ppl - -debug 2
...
p( a | <s> ) = [10] 7.43548e-13 [ -12.1287 ]
p( b | a ...) = [16] 8.00959e-07 [ -6.09639 ]
p( </s> | b ...) = [9] 6.88685e-14 [ -13.162 ]
0 zeroprobs, logprob= -31.3871 ppl= 2.8997e+10 ppl1= 4.93776e+15
$ echo "a b" | lattice-tool -in-lattice test1.htk -read-htk \
-lm test.arpa -ppl - -debug 2
...
p( a | <s> ) = [9] 2.57573e-17 [ -16.5891 ]
p( b | a ...) = [13] 8.00959e-07 [ -6.09639 ]
p( </s> | b ...) = [8] 6.88685e-14 [ -13.162 ]
0 zeroprobs, logprob= -35.8475 ppl= 8.89522e+11 ppl1= 8.38947e+17
It seems that the backoff probability BO(a) is missing from the first
case.
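As a quick sanity check on that guess, the gap between the two reported
totals can be compared with the backoff weight of "a" in test.arpa. This is
only a minimal sketch using the numbers printed above. The variable names
are made up for illustration, and nothing here calls SRILM itself:

```python
# Totals reported by lattice-tool -ppl for the sentence "a b" (log10),
# copied from the two runs above.
logprob_test0 = -31.3871
logprob_test1 = -35.8475

# Backoff weight of the unigram "a", copied from test.arpa (log10).
bo_a = -4.46041

# If BO(a) is the only term missing from the first lattice, the gap
# between the two totals should equal it.
gap = logprob_test1 - logprob_test0
print(f"gap = {gap:.4f}, BO(a) = {bo_a:.4f}")

# The printed totals carry only six significant digits, so allow a
# small tolerance in the comparison.
assert abs(gap - bo_a) < 1e-3
```

The gap also appears to sit entirely in the first line of the debug
output (-12.1287 versus -16.5891), since the other two lines are identical
in both runs.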
Next I tried the -no-nulls flag. With it I get correct language
model probabilities for both lattices, but the acoustic probability is
incorrect, because the acoustic probability of the !NULL edge is discarded.
Should the general LM expansion handle !NULL edges correctly?
I also tried changing the !NULL words to a distinct word symbol and
specifying it with the -ignore-vocab flag to lattice-tool (tried
versions 1.4.5 and 1.4.6 beta). Then the acoustic probabilities are
preserved nicely, but again the backoff probability BO(a) is missing
from the first rescored lattice.
Did I miss something again, or is the above expected behaviour?
-Teemu
Here are the example files:
test0.htk:
VERSION=1.1
base=10
dir=f
lmscale=1 wdpenalty=0
start=0 end=3
N=4 L=4
I=0
I=1
I=2
I=3
J=0 S=0 E=1 W=a a=-1
J=1 S=1 E=2 W=!NULL a=-2
J=2 S=1 E=2 W=c a=-2
J=3 S=2 E=3 W=b a=-3
test1.htk:
VERSION=1.1
base=10
dir=f
lmscale=1 wdpenalty=0
start=0 end=3
N=4 L=3
I=0
I=1
I=2
I=3
J=0 S=0 E=1 W=a a=-1
J=1 S=1 E=2 W=!NULL a=-2
J=2 S=2 E=3 W=b a=-3
test.arpa:
\data\
ngram 1=5
ngram 2=5
\1-grams:
-99 <s> -7.34882
-2.10718 c -4.28966
-4.77987 a -4.46041
-5.81316 </s> -7.34882
-4.02326 b -2.07313
\2-grams:
-3.33947 c a
-1.08518 c </s>
-4.58511 c b
-0.000484286 a c
-1.67833 b c
\end\
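For reference, the backed-off bigram score that BO(a) belongs to can be
worked out by hand from the file above. The sketch below assumes the usual
ARPA convention (a bigram line "logprob w1 w2" gives log10 p(w2 | w1), and a
missing bigram backs off to BO(w1) + log10 p(w2)). The helper name
bigram_logprob is hypothetical, and the code makes no claim about how
lattice-tool distributes these terms over lattice edges:

```python
# Values copied from test.arpa (log10): word -> (log-probability, backoff weight).
unigrams = {
    "<s>": (-99.0, -7.34882),
    "c": (-2.10718, -4.28966),
    "a": (-4.77987, -4.46041),
    "</s>": (-5.81316, -7.34882),
    "b": (-4.02326, -2.07313),
}

# Bigram entries keyed by the word pair exactly as listed in the file.
bigrams = {
    ("c", "a"): -3.33947,
    ("c", "</s>"): -1.08518,
    ("c", "b"): -4.58511,
    ("a", "c"): -0.000484286,
    ("b", "c"): -1.67833,
}


def bigram_logprob(history, word):
    """Return log10 p(word | history) with standard bigram backoff (hypothetical helper)."""
    if (history, word) in bigrams:
        return bigrams[(history, word)]
    word_logprob, _ = unigrams[word]
    _, history_backoff = unigrams[history]
    return history_backoff + word_logprob


# "a b" is not listed as a bigram, so the score backs off through BO(a).
print(f"log10 p(b | a) = {bigram_logprob('a', 'b'):.5f}")
```

Under that assumption, log10 p(b | a) is BO(a) + log10 p(b), which is
where the BO(a) term discussed above comes from.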