[SRILM User List] Does keep-unk work with lattice-tool and htk format?
Andreas Stolcke
stolcke at icsi.berkeley.edu
Fri Aug 24 00:07:40 PDT 2012
Congratulations, you found a bug! The patch attached to this message (to
HTKLattice.cc) should fix this problem.
Andreas
On 8/21/2012 2:43 PM, Lluís Formiga i Fanals wrote:
> Hi Andreas,
>
> Sorry to bother you with this old issue.
>
> The two-step lattice-tool process worked perfectly. First the
> rescoring and second the conversion to CN.
>
> But unfortunately, I still see a few <unk>s when rescoring the lattice
> (not as many as when writing the mesh).
>
> The command I use to rescore is:
>
> lattice-tool -lm ../../lm/interpolated-lm.en -in-lattice
> wordlattice0.slf -read-htk -out-lattice out.slf -write-htk -keep-unk
> -print-sent-tags -htk-logbase 2.71828
>
> And I find lines like these (within these lines the <unk> tag should
> be queit):
>
> J=26 S=19 E=24 W=qu a=0 l=-13.8261
> J=27 S=19 E=25 W=que a=0 l=-11.4986
> J=28 S=19 E=26 W=<unk> a=0 l=-2.76367
> J=29 S=19 E=27 W=quest a=0 l=-10.831
> J=30 S=19 E=28 W=quiet a=0 l=-10.57
> J=31 S=19 E=29 W=quit a=0 l=-10.4455
> J=32 S=20 E=21 W=row a=0 l=-10.1076
> J=33 S=21 E=24 W=qu a=0 l=-14.9448
> J=34 S=21 E=25 W=que a=0 l=-12.6173
> J=35 S=21 E=26 W=<unk> a=0 l=-3.88236
> J=36 S=21 E=27 W=quest a=0 l=-11.9497
> J=37 S=21 E=28 W=quiet a=0 l=-11.6887
> J=38 S=21 E=29 W=quit a=0 l=-11.0153
> J=39 S=22 E=19 W=arrow a=0 l=-12.6258
>
> I have to say that I use the rescoring to assign probabilities to the
> arcs that come from misspelling corrections, so I do not have any
> acoustic scores. (I set them all equal.)
>
>
-------------- next part --------------
*** lattice/src/HTKLattice.cc 3 Aug 2012 01:11:34 -0000 1.60
--- lattice/src/HTKLattice.cc 24 Aug 2012 07:02:40 -0000
***************
*** 1769,1776 ****
toNode->word == vocab.seIndex()) ||
toNode->word == Vocab_None) ?
HTK_null_word :
! (node->htkinfo && node->htkinfo->wordLabel ?
! node->htkinfo->wordLabel :
vocab.getWord(toNode->word)),
htkheader.useQuotes);
}
--- 1769,1776 ----
toNode->word == vocab.seIndex()) ||
toNode->word == Vocab_None) ?
HTK_null_word :
! (toNode->htkinfo && toNode->htkinfo->wordLabel ?
! toNode->htkinfo->wordLabel :
vocab.getWord(toNode->word)),
htkheader.useQuotes);
}
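
For readers who do not have the SRILM sources at hand, the effect of the
patch can be sketched in isolation. When writing the W= field of a link,
the code should look for a preserved word label on the node the link
points to (toNode), and only fall back to the vocabulary string (which is
<unk> for out-of-vocabulary words) if no label was kept. The old code
consulted a different node's htkinfo. The types and the function name
below are hypothetical simplifications for illustration, not the actual
SRILM classes, and the assumption that wordLabel holds the original
surface form under -keep-unk is my reading of the patch.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for the SRILM lattice types.
struct HtkInfo {
    const char *wordLabel;   // preserved surface form, or nullptr if none was kept
};

struct Node {
    const char *vocabWord;   // string the vocabulary maps this node to, e.g. "<unk>"
    const HtkInfo *htkinfo;  // per-node HTK information, may be nullptr
};

// Choose the label to write for a link ending in toNode.
// Prefer the preserved label on toNode itself; otherwise use the vocabulary string.
std::string linkLabel(const Node &toNode) {
    if (toNode.htkinfo && toNode.htkinfo->wordLabel) {
        return toNode.htkinfo->wordLabel;
    }
    return toNode.vocabWord;
}
```

With this selection, a node whose vocabulary entry is <unk> but whose
preserved label is "queit" is written as W=queit, while in-vocabulary
nodes are written unchanged.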