Error in lattice rescoring?
Andreas Stolcke
stolcke at speech.sri.com
Thu Oct 13 10:51:34 PDT 2005
In message <434E22CE.3090005 at hut.fi> you wrote:
> While working on lattices, I noticed that lattice-tool seems to give
> sometimes strange backoff probabilities when rescoring lattices. I have
> a simple 2-gram model test.arpa:
>
> \data\
> ngram 1=4
> ngram 2=1
>
> \1-grams:
> -99 <s>
> -1.00000 </s>
> -0.69897 a
> -0.15490 b -0.69897
>
> \2-grams:
> -0.09691 b </s>
>
> \end\
>
> and a simple HTK lattice file test.htk that has just words "b a </s>":
>
> VERSION=1.1
> base=10
> dir=f
> start=0 end=3
> N=4 L=3
> I=0
> I=1
> I=2
> I=3
> J=0 S=0 E=1 W=b
> J=1 S=1 E=2 W=a
> J=2 S=2 E=3 W=!NULL
>
> Rescoring gives funny probabilities for "a" and "b":
>
> $ lattice-tool -in-lattice test.htk -read-htk -lm test.arpa \
> -out-lattice - -write-htk
> ...
> J=0 S=0 E=2 W=b l=-0.85387 (*)
> J=1 S=2 E=3 W=a l=-0.69897
> J=2 S=3 E=1 W=!NULL l=-1
>
> The correct probabilities are given by the ngram tool:
>
> $ echo "b a </s>" | ngram -debug 2 -lm test.arpa -ppl -
> ...
> p( b | <s> ) = [1gram] 0.700003 [ -0.1549 ]
> p( a | b ...) = [1gram] 0.04 [ -1.39794 ]
> p( </s> | a ...)= [1gram] 0.1 [ -1 ]
>
> Did I miss something, or is there a bug in lattice-tool? It looks like
> the lattice-tool adds the backoff probability BO(b) for the first word
> (*) instead of the next. The bug seems to appear in toolkit versions
> 1.4.4 and 1.4.5 (OS is SuSE Linux 9.3 i686).
It's not a bug. If you add all the scores along the path for
<s> b a </s> you get -2.55284, which is the right score.
You can verify this with
    echo "<s> b a </s>" | \
    lattice-tool -in-lattice test-rescored.htk -read-htk -ppl - -debug 2
which traces the path through the lattice and reports the aggregate
probabilities along it.
Since the nodes for a and b correspond to backoff contexts, the weights
are assigned as follows:
    transition      weight
    <s> -> b        p(b) + bow(b)
    b -> a          p(a) + bow(a)
    a -> </s>       p(</s>)
It is more compact to assign the backoff weight to the transitions coming
INto the corresponding node, in case that node has multiple successors.
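As a rough check of the arithmetic, the two ways of distributing the backoff weight can be compared in a few lines of Python. This is only a sketch, not SRILM code: the function names are made up for illustration, the log10 values are copied from test.arpa above, and it assumes every word on this path backs off to its unigram (true for "b a </s>", since the only bigram in the model is "b </s>").

```python
# Log10 unigram probabilities and backoff weights copied from test.arpa.
logp = {"</s>": -1.00000, "a": -0.69897, "b": -0.15490}
bow = {"b": -0.69897}  # words without a listed backoff weight get 0


def ngram_style(words):
    """Charge bow(previous word) to the current word, as ngram -ppl reports it."""
    scores, prev = [], "<s>"
    for w in words:
        scores.append(logp[w] + bow.get(prev, 0.0))
        prev = w
    return scores


def lattice_style(words):
    """Charge bow(w) to the transition coming into w, as in the rescored lattice."""
    return [logp[w] + bow.get(w, 0.0) for w in words]


path = ["b", "a", "</s>"]
ngram_scores = ngram_style(path)
lattice_scores = lattice_style(path)
print(ngram_scores, round(sum(ngram_scores), 5))
print(lattice_scores, round(sum(lattice_scores), 5))
```

The per-word scores differ (bow(b) moves from "a" to "b"), but both lists sum to about -2.55284, so the total path score is unchanged.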
If you want to see the weight assignment you expect you can use the
lattice-tool -old-expansion option, but it can only handle up to 3-gram LMs.
The default algorithm is both more general and yields more compact lattices.
--Andreas