lattice-tool to rescore a lattice
Stephane Huet
shuet at irisa.fr
Wed Jul 11 08:37:01 PDT 2007
>What you can verify is that the lattice as a whole assigns the correct
>log probability to a complete path through the lattice.
>For this purpose, the lattice-tool -ppl option allows you to treat the
>lattice as a language model, and you can feed it sentences.
>The -debug 2 option displays scores at the word level.
>
As you suggested, I compared the output of lattice-tool -ppl on a
lattice with the output of ngram -ppl. Both give the same logprob for
the complete sentence. However, the word-level logprobs differ, a
discrepancy I had already noticed in the linguistic scores of the HTK
lattices.
Here are the results I obtained:
> ngram -lm <LM> -order 4 -ppl test.ppl -debug 2
appeler les opérateurs marocains
p( appeler | <s> ) = [2gram] 6.15184e-06 [ -5.211 ]
p( les | appeler ...) = [2gram] 0.0759806 [ -1.1193 ]
p( opérateurs | les ...) = [2gram] 0.000738462 [ -3.13167 ]
p( marocains | opérateurs ...) = [2gram] 0.00450344 [ -2.34646 ]
p( </s> | marocains ...) = [2gram] 0.186189 [ -0.730047 ]
1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -12.5385 ppl= 321.879 ppl1= 1363.38
file test.ppl: 1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -12.5385 ppl= 321.879 ppl1= 1363.38
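For reference, the ppl and ppl1 figures above can be reproduced from the
reported logprob and counts. The sketch below assumes the usual SRILM
convention: logprobs are base 10, ppl normalizes by the scored words plus
one end-of-sentence token per sentence, and ppl1 normalizes by the scored
words only. The function name is made up for illustration.

```python
def perplexities(logprob, words, sentences, oovs=0, zeroprobs=0):
    """Return (ppl, ppl1) from a total base-10 logprob and the counts.

    Assumed convention: OOVs and zero-probability words are excluded
    from the normalization; ppl also counts one </s> per sentence.
    """
    scored = words - oovs - zeroprobs
    ppl = 10 ** (-logprob / (scored + sentences))
    ppl1 = 10 ** (-logprob / scored)
    return ppl, ppl1


# Counts and logprob taken from the ngram -ppl summary line above.
ppl, ppl1 = perplexities(-12.5385, words=4, sentences=1)
print(f"ppl = {ppl:.2f}  ppl1 = {ppl1:.2f}")
```

The values come out slightly off in the last digits from the reported
ones because the logprob printed in the summary is itself rounded.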
> lattice-tool -ppl test.ppl -in-lattice <lattice> -read-htk -debug 2 -order 4
appeler les opérateurs marocains
p( appeler | <s> ) = [400][405][385][386][390] 5.1187e-06 [ -5.29084 ]
p( les | appeler ...) = [512][519][531][532][539] 0.0623089 [ -1.20545 ]
p( opérateurs | les ...) =
[1120][1121][1122][1067][1068][1069][977][978][979][965][966][967][879][880][881]
0.00107221 [ -2.96972 ]
p( marocains | opérateurs ...) =
[1123][1124][1125][1126][1127][1128][1129][1130][1131][980][981][982][983][984][985][986][987][988][882][883][884][885][886][887][888][889][890][1070][1071][1072][1073][1074][1075][1076][1077][1078][968][969][970][971][972][973][974][975][976]
0.00454559 [ -2.34241 ]
p( </s> | marocains ...) =
[1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1]
0.186188 [ -0.730047 ]
Lattice states: 0 386 532 966 973 1
1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -12.5385 ppl= 321.88 ppl1= 1363.38
The differences might be linked to how backoffs are accounted for in
the lattice's linguistic scores. With the few changes I previously made
to the source code, the word-level logprobs look correct and agree with
the ngram output:
> lattice-tool -ppl test.ppl -in-lattice <lattice> -read-htk -debug 2 -order 4
appeler les opérateurs marocains
p( appeler | <s> ) = [474][479][459][460][464] 6.15177e-06 [ -5.211 ]
p( les | appeler ...) = [586][593][605][606][613] 0.0759801 [ -1.1193 ]
p( opérateurs | les ...) =
[966][967][968][1243][1244][1245][1164][1165][1166][1064][1065][1066][1052][1053][1054]
0.000738466 [ -3.13167 ]
p( marocains | opérateurs ...) =
[1055][1056][1057][1058][1059][1060][1061][1062][1063][1167][1168][1169][1170][1171][1172][1173][1174][1175][1246][1247][1248][1249][1250][1251][1252][1253][1254][1067][1068][1069][1070][1071][1072][1073][1074][1075][969][970][971][975][976][977][972][973][974]
0.00450339 [ -2.34646 ]
p( </s> | marocains ...) =
[1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1]
0.186188 [ -0.730047 ]
Lattice states: 0 464 613 967 973 1
1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -12.5385 ppl= 321.881 ppl1= 1363.39
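To make the comparison concrete, here is a small sketch that sums the
word-level logprobs from the three runs above and shows where the
unmodified lattice scores deviate from ngram. The numbers are copied from
the -debug 2 output; the variable names are only for illustration.

```python
import math

# Word-level log10 probabilities copied from the three outputs above,
# in order: appeler, les, opérateurs, marocains, </s>.
ngram = [-5.211, -1.1193, -3.13167, -2.34646, -0.730047]
lattice = [-5.29084, -1.20545, -2.96972, -2.34241, -0.730047]
patched = [-5.211, -1.1193, -3.13167, -2.34646, -0.730047]

REPORTED_TOTAL = -12.5385  # sentence logprob printed by all three runs

runs = [
    ("ngram", ngram),
    ("lattice-tool", lattice),
    ("lattice-tool (patched)", patched),
]
for name, scores in runs:
    print(f"{name:24s} total = {math.fsum(scores):.4f}")

# Per-word deviation of the unmodified lattice scores from ngram.
diffs = [l - n for l, n in zip(lattice, ngram)]
print("per-word difference:", [round(d, 5) for d in diffs])
print("sum of differences: ", round(math.fsum(diffs), 5))
```

The per-word differences are as large as about 0.16 in absolute value,
but they cancel over the sentence, which is why the totals agree.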
Anyway, what I need are the sentence-level scores from lattice-tool,
and those are correct.
Thanks for your answer.
Stéphane