lattice-tool to rescore a lattice

Wed Jul 11 08:37:01 PDT 2007

>What you can verify is that the lattice as a whole assigns the correct
>log probability to a complete path through the lattice.
>For this purpose, the lattice-tool -ppl option allows you to treat the
>lattice as a language model, and you can feed it sentences.
>The -debug 2 option displays scores at the word level.

As you suggested, I compared the output of lattice-tool -ppl on a
lattice with that of ngram -ppl. In both cases I obtained the same
logprob for the complete sentence. However, the word-level logprobs
differ, a discrepancy I had already noticed in the language model
scores of the HTK lattices.

Here are the results I obtained:

 > ngram -lm <LM> -order 4 -ppl test.ppl -debug 2
appeler les opérateurs marocains
        p( appeler | <s> )      = [2gram] 6.15184e-06 [ -5.211 ]
        p( les | appeler ...)   = [2gram] 0.0759806 [ -1.1193 ]
        p( opérateurs | les ...)        = [2gram] 0.000738462 [ -3.13167 ]
        p( marocains | opérateurs ...)  = [2gram] 0.00450344 [ -2.34646 ]
        p( </s> | marocains ...)        = [2gram] 0.186189 [ -0.730047 ]
1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -12.5385 ppl= 321.879 ppl1= 1363.38

file test.ppl: 1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -12.5385 ppl= 321.879 ppl1= 1363.38

 > lattice-tool -ppl test.ppl -in-lattice <lattice> -read-htk -debug 2 -order 4
appeler les opérateurs marocains
        p( appeler | <s> )      = [400][405][385][386][390] 5.1187e-06 [ -5.29084 ]
        p( les | appeler ...)   = [512][519][531][532][539] 0.0623089 [ -1.20545 ]
        p( opérateurs | les ...)        = [1120][1121][1122][1067][1068][1069][977][978][979][965][966][967][879][880][881] 0.00107221 [ -2.96972 ]
        p( marocains | opérateurs ...)  = [1123][1124][1125][1126][1127][1128][1129][1130][1131][980][981][982][983][984][985][986][987][988][882][883][884][885][886][887][888][889][890][1070][1071][1072][1073][1074][1075][1076][1077][1078][968][969][970][971][972][973][974][975][976] 0.00454559 [ -2.34241 ]
        p( </s> | marocains ...)        = [1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1] 0.186188 [ -0.730047 ]
Lattice states: 0 386 532 966 973 1
1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -12.5385 ppl= 321.88 ppl1= 1363.38
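To make the comparison concrete: the word-level log10 scores differ by up to about 0.16, but the differences cancel out, which is why both tools report the same sentence logprob. A quick Python check, using the numbers copied from the two outputs above:

```python
# Word-level log10 scores copied from the ngram and lattice-tool outputs above.
words = ["appeler", "les", "opérateurs", "marocains", "</s>"]
ngram_scores = [-5.211, -1.1193, -3.13167, -2.34646, -0.730047]
lattice_scores = [-5.29084, -1.20545, -2.96972, -2.34241, -0.730047]

for word, ng, lat in zip(words, ngram_scores, lattice_scores):
    # A positive delta means the lattice scores this word higher than ngram.
    print(f"{word:12s} ngram={ng:9.5f} lattice={lat:9.5f} delta={lat - ng:+.5f}")

# The per-word deltas (nearly) cancel, so both totals round to -12.5385.
print(f"total ngram={sum(ngram_scores):.4f} lattice={sum(lattice_scores):.4f}")
```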

The differences may be related to how backoff weights are accounted for
in the language model scores stored in the lattice. With the few changes I
had previously made to the source code, the word-level logprobs look more
correct (they match the ngram output):

 > lattice-tool -ppl test.ppl -in-lattice <lattice> -read-htk -debug 2 -order 4
appeler les opérateurs marocains
        p( appeler | <s> )      = [474][479][459][460][464] 6.15177e-06 [ -5.211 ]
        p( les | appeler ...)   = [586][593][605][606][613] 0.0759801 [ -1.1193 ]
        p( opérateurs | les ...)        = [966][967][968][1243][1244][1245][1164][1165][1166][1064][1065][1066][1052][1053][1054] 0.000738466 [ -3.13167 ]
        p( marocains | opérateurs ...)  = [1055][1056][1057][1058][1059][1060][1061][1062][1063][1167][1168][1169][1170][1171][1172][1173][1174][1175][1246][1247][1248][1249][1250][1251][1252][1253][1254][1067][1068][1069][1070][1071][1072][1073][1074][1075][969][970][971][975][976][977][972][973][974] 0.00450339 [ -2.34646 ]
        p( </s> | marocains ...)        = [1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1] 0.186188 [ -0.730047 ]
Lattice states: 0 464 613 967 973 1
1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -12.5385 ppl= 321.881 ppl1= 1363.39
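As a toy illustration of the backoff hypothesis (an assumption, not verified against lattice-tool's source), the Python sketch below scores a sentence with a hypothetical bigram backoff model in two ways. The first charges each backoff weight to the word that backs off, which is how ngram -debug 2 reports it. The second charges the same weight to the preceding word instead. All model values and function names are invented for the example. Per-word scores change, but the sentence total does not:

```python
# Hypothetical toy bigram backoff model; all log10 values are invented.
UNIGRAM = {"a": -1.0, "b": -1.2, "</s>": -1.5}
BACKOFF = {"<s>": -0.3, "a": -0.2, "b": -0.4}
BIGRAM = {("<s>", "a"): -0.5, ("a", "</s>"): -0.7}


def katz_word_scores(words):
    """Charge each backoff weight to the word that backs off."""
    scores, prev = [], "<s>"
    for w in words + ["</s>"]:
        if (prev, w) in BIGRAM:
            scores.append(BIGRAM[(prev, w)])
        else:
            # Back off: bow(prev) + log p(w)
            scores.append(BACKOFF[prev] + UNIGRAM[w])
        prev = w
    return scores


def shifted_word_scores(words):
    """Same terms, but bow(prev) is charged to the previous word instead
    (one hypothetical alternative bookkeeping)."""
    scores, prev = [], "<s>"
    for w in words + ["</s>"]:
        if (prev, w) in BIGRAM:
            scores.append(BIGRAM[(prev, w)])
        else:
            scores.append(UNIGRAM[w])
            # Put bow(prev) on the preceding word; the first word keeps its own.
            scores[-2 if len(scores) > 1 else -1] += BACKOFF[prev]
        prev = w
    return scores


sentence = ["a", "b"]
katz = katz_word_scores(sentence)
shifted = shifted_word_scores(sentence)
print("katz:   ", katz, "total", round(sum(katz), 4))
print("shifted:", shifted, "total", round(sum(shifted), 4))
```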

In any case, what I need are the sentence-level scores provided by
lattice-tool, and those are correct.

Thanks for your answer.

Stéphane