Another lattice rescoring problem

Andreas Stolcke stolcke at speech.sri.com
Fri Jan 6 14:43:54 PST 2006


Teemu,

thanks for pointing out this bug, and sorry for taking so
long to get back to you.  In the process of fixing
this I actually found a number of issues, one of them
affecting the lattice-tool -ppl function itself.
As you found out, the -no-nulls option masks the
problem to some extent, which is why I hadn't noticed
these problems before.

That should all be fixed now.
You can download the current beta version from the web site.
If no further problems surface in the coming weeks I'll
release this as the next version.

--Andreas 

In message <4354E45C.706 at hut.fi> you wrote:
> I ran into another problem with the lattice rescoring.  I have two
> simple HTK lattices (acoustic log-probabilities in parentheses):
> 
> test0.htk:
> 
>    a(-1) --+--> c(-2) ------+--> b(-3)
>            |                |
>            +--> !NULL(-2) --+
> 
> test1.htk:
> 
>    a(-1) -----> !NULL(-2) -----> b(-3)
> 
> If I rescore the above lattices with a simple 2-gram language model
> test.arpa (see the end of the mail for the example files), the
> language model probability of the path "a b" is computed incorrectly
> for the first lattice.  In the second case, the probability is
> correct:
> 
> $ echo "a b" | lattice-tool -in-lattice test0.htk -read-htk \
>    -lm test.arpa -ppl - -debug 2
> ...
>          p( a | <s> )    = [10] 7.43548e-13 [ -12.1287 ]
>          p( b | a ...)   = [16] 8.00959e-07 [ -6.09639 ]
>          p( </s> | b ...)        = [9] 6.88685e-14 [ -13.162 ]
> 0 zeroprobs, logprob= -31.3871 ppl= 2.8997e+10 ppl1= 4.93776e+15
> 
> $ echo "a b" | lattice-tool -in-lattice test1.htk -read-htk \
>    -lm test.arpa -ppl - -debug 2
> ...
>          p( a | <s> )    = [9] 2.57573e-17 [ -16.5891 ]
>          p( b | a ...)   = [13] 8.00959e-07 [ -6.09639 ]
>          p( </s> | b ...)        = [8] 6.88685e-14 [ -13.162 ]
> 0 zeroprobs, logprob= -35.8475 ppl= 8.89522e+11 ppl1= 8.38947e+17
> 
> It seems that the backoff probability BO(a) is missing from the first
> case.
> 
> Next I tried to use the -no-nulls flag.  Then I get correct language
> model probabilities for both lattices, but the acoustic probability is
> incorrect, as the acoustic probability of the !NULL edge is discarded.
> Should the general LM expansion handle !NULL edges correctly?
> 
> I also tried changing the !NULL words to a distinct word symbol and
> specifying it with the -ignore-vocab flag to lattice-tool (tried
> versions 1.4.5 and 1.4.6 beta).  Then the acoustic probabilities are
> preserved nicely, but again the backoff probability BO(a) is missing
> from the first rescored lattice.
> 
> Did I miss something again, or is the above expected behaviour?
> 
> -Teemu
> 
> 
> Here are the example files:
> 
> test0.htk:
> 
> VERSION=1.1
> base=10
> dir=f
> lmscale=1 wdpenalty=0
> start=0 end=3
> N=4 L=4
> I=0
> I=1
> I=2
> I=3
> J=0 S=0 E=1 W=a   a=-1
> J=1 S=1 E=2 W=!NULL a=-2
> J=2 S=1 E=2 W=c   a=-2
> J=3 S=2 E=3 W=b   a=-3
> 
> 
> test1.htk:
> 
> VERSION=1.1
> base=10
> dir=f
> lmscale=1 wdpenalty=0
> start=0 end=3
> N=4 L=3
> I=0
> I=1
> I=2
> I=3
> J=0 S=0 E=1 W=a   a=-1
> J=1 S=1 E=2 W=!NULL a=-2
> J=2 S=2 E=3 W=b   a=-3
> 
> 
> test.arpa:
> 
> \data\
> ngram 1=5
> ngram 2=5
> 
> \1-grams:
> -99 <s> -7.34882
> -2.10718 c -4.28966
> -4.77987 a -4.46041
> -5.81316 </s> -7.34882
> -4.02326 b -2.07313
> 
> \2-grams:
> -3.33947 c a
> -1.08518 c </s>
> -4.58511 c b
> -0.000484286 a c
> -1.67833 b c
> 
> \end\
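
The observation that BO(a) is missing from the test0.htk result can be checked with plain arithmetic on the numbers quoted above. The following is only a minimal sketch: it assumes the bracketed values in the -debug 2 output are log10 scores, reads the third column of each ARPA unigram line as the backoff weight, and uses helper names (parse_unigrams, total_gap, and so on) that are illustrative and not part of SRILM.

```python
# Unigram section of test.arpa, copied from the message above.
# Columns: log10 probability, word, log10 backoff weight.
ARPA_UNIGRAMS = """\
-99 <s> -7.34882
-2.10718 c -4.28966
-4.77987 a -4.46041
-5.81316 </s> -7.34882
-4.02326 b -2.07313
"""


def parse_unigrams(text):
    """Map each word to (log10 probability, log10 backoff weight)."""
    table = {}
    for line in text.splitlines():
        logprob, word, backoff = line.split()
        table[word] = (float(logprob), float(backoff))
    return table


unigrams = parse_unigrams(ARPA_UNIGRAMS)
bo_a = unigrams["a"][1]  # backoff weight of "a", i.e. BO(a)

# Scores reported by lattice-tool -ppl in the two runs quoted above.
total_test0, total_test1 = -31.3871, -35.8475
first_word_test0, first_word_test1 = -12.1287, -16.5891

# How much lower test1.htk scores than test0.htk.
total_gap = total_test1 - total_test0
first_word_gap = first_word_test1 - first_word_test0

print(f"BO(a)        = {bo_a:.5f}")
print(f"total gap    = {total_gap:.4f}")
print(f"p(a|<s>) gap = {first_word_gap:.4f}")
```

Both gaps agree with BO(a) to within the rounding of the printed scores, which is consistent with the backoff weight of "a" being left out of the test0.htk result. The sketch says nothing about where inside lattice-tool the term is lost.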

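The acoustic side of the report can be illustrated the same way. The sketch below parses the link lines of test1.htk and sums the a= scores along its single path, once with and once without the score on the !NULL link. It does not reproduce how -no-nulls is implemented; the skip_null_scores switch is a hypothetical stand-in for the effect described in the message, and the function names are illustrative only.

```python
# Link lines of test1.htk, copied from the message above.
TEST1_LINKS = """\
J=0 S=0 E=1 W=a   a=-1
J=1 S=1 E=2 W=!NULL a=-2
J=2 S=2 E=3 W=b   a=-3
"""


def parse_links(text):
    """Parse HTK SLF link lines into dicts of field name to string value."""
    links = []
    for line in text.splitlines():
        fields = dict(item.split("=", 1) for item in line.split())
        links.append(fields)
    return links


def acoustic_sum(links, skip_null_scores=False):
    """Sum the acoustic log scores (a=) over all links.

    test1.htk is a single chain, so summing every link equals the path score.
    skip_null_scores mimics the reported effect of dropping !NULL links
    together with their scores (an assumption made for illustration).
    """
    total = 0.0
    for link in links:
        if skip_null_scores and link["W"] == "!NULL":
            continue
        total += float(link["a"])
    return total


links = parse_links(TEST1_LINKS)
print("with !NULL score:   ", acoustic_sum(links))
print("without !NULL score:", acoustic_sum(links, skip_null_scores=True))
```

The two sums differ by exactly the a=-2 score carried by the !NULL link, which is the amount the message reports as lost from the acoustic probability.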


