[SRILM User List] Does keep-unk work with lattice-tool and htk format?
Lluís Formiga i Fanals
lluis.formiga at upc.edu
Tue Aug 21 14:43:05 PDT 2012
Hi Andreas,
Sorry to bother you with this old issue.
The two-step lattice-tool process worked perfectly: first the rescoring, then the conversion to a CN.
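For reference, the two steps can be sketched as a small shell function. This is only a sketch under assumptions: the function name, its positional arguments, and the LATTICE_TOOL override are illustrative and not part of SRILM, and I am assuming -write-mesh is the right option for writing a single confusion network. Other options (log base, sentence tags) are left out.

```shell
# Two-step sketch: rescore with -keep-unk, then build the confusion network
# in a second run without -keep-unk and without -vocab.
# LATTICE_TOOL is a hypothetical override so the binary can be swapped out.
rescore_then_mesh() {
    lm=$1; in_slf=$2; rescored_slf=$3; mesh=$4
    tool=${LATTICE_TOOL:-lattice-tool}

    # Step 1: LM rescoring, preserving OOV word labels in the lattice output.
    "$tool" -lm "$lm" -in-lattice "$in_slf" -read-htk \
        -out-lattice "$rescored_slf" -write-htk -keep-unk || return 1

    # Step 2: convert the rescored lattice to a confusion network (mesh).
    # No -keep-unk and no -vocab here, so word labels are read as they are.
    "$tool" -in-lattice "$rescored_slf" -read-htk -write-mesh "$mesh"
}
```

It would be called as: rescore_then_mesh ../../lm/interpolated-lm.en wordlattice0.slf out.slf out.mesh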
Unfortunately, I still see a few <unk> labels in the rescored lattice (though not as many as when writing the mesh).
The command I use to rescore is:
lattice-tool -lm ../../lm/interpolated-lm.en -in-lattice wordlattice0.slf -read-htk -out-lattice out.slf -write-htk -keep-unk -print-sent-tags -htk-logbase 2.71828
And I find lines like these (within these lines, the <unk> label should be "queit"):
J=26 S=19 E=24 W=qu a=0 l=-13.8261
J=27 S=19 E=25 W=que a=0 l=-11.4986
J=28 S=19 E=26 W=<unk> a=0 l=-2.76367
J=29 S=19 E=27 W=quest a=0 l=-10.831
J=30 S=19 E=28 W=quiet a=0 l=-10.57
J=31 S=19 E=29 W=quit a=0 l=-10.4455
J=32 S=20 E=21 W=row a=0 l=-10.1076
J=33 S=21 E=24 W=qu a=0 l=-14.9448
J=34 S=21 E=25 W=que a=0 l=-12.6173
J=35 S=21 E=26 W=<unk> a=0 l=-3.88236
J=36 S=21 E=27 W=quest a=0 l=-11.9497
J=37 S=21 E=28 W=quiet a=0 l=-11.6887
J=38 S=21 E=29 W=quit a=0 l=-11.0153
J=39 S=22 E=19 W=arrow a=0 l=-12.6258
I should mention that I use the rescoring to assign probabilities to the arcs produced by misspelling corrections, so I do not have any acoustic scores (I set them all equal).
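To check how many arcs are affected, a filter along these lines can be used. It is a minimal sketch that assumes one arc per line with a W=<label> field, as in the excerpt above; the function names are just illustrative helpers.

```shell
# Print every lattice line whose word field is exactly W=<unk>.
# Assumes whitespace-separated fields with one arc per line.
list_unk_arcs() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "W=<unk>") { print; break }
    }' "$1"
}

# Count those lines (tr strips the padding some wc implementations add).
count_unk_arcs() {
    list_unk_arcs "$1" | wc -l | tr -d ' '
}
```

For example, count_unk_arcs out.slf gives the number of arcs still labelled <unk> after rescoring.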
Regards,
Lluís
On 23/05/2012, at 2:05, Andreas Stolcke wrote:
> On 5/22/2012 10:56 AM, Lluís Formiga i Fanals wrote:
>>
>> Hi,
>>
>> I was trying to execute the following command:
>>
>>
>> lattice-tool -in-lattice-list lattice_lists.txt -read-htk -lm
>> /veu4/usuaris24/lluisf/EMS/misspelling2012/lm/interpolated-lm.en
>> -write-mesh-dir out -keep-unk
>>
>> but I find that unks ("<unk>") are still in the written CN (-write-mesh).
>>
>> Does the -keep-unk option work only for lattice output? Am I doing something wrong?
> No, the code is working as intended.
>
> The option is described as
> -keep-unk
> Treat out-of-vocabulary words as <unk> but preserve their labels in lattice output.
>
> What you are outputting is confusion networks, not lattices. In the CN building process, lattice nodes that are mapped to <unk> are treated as equivalent, and the word information is lost in the process.
>
> I would suggest that you simply do your lattice rescoring with -keep-unk, output the rescored lattices, and then run lattice-tool a second time without -keep-unk and without the -vocab option, so all word labels are preserved (all words are implicitly added to the vocabulary).
>
> Andreas
>
>
>>
>> Thanks,
>>
>> Lluís
>>
>>
>