[SRILM User List] Does keep-unk work with lattice-tool and htk format?
Andreas Stolcke
stolcke at icsi.berkeley.edu
Tue May 22 17:05:33 PDT 2012
On 5/22/2012 10:56 AM, Lluís Formiga i Fanals wrote:
> Hi,
>
> I was trying to execute the following command:
>
> lattice-tool -in-lattice-list lattice_lists.txt -read-htk -lm
> /veu4/usuaris24/lluisf/EMS/misspelling2012/lm/interpolated-lm.en
> -write-mesh-dir out -keep-unk
>
> but I find that the unks ("<unk>") are still in the written CN
> (-write-mesh-dir).
>
> Does the -keep-unk option work only for lattice output? Am I doing
> something wrong?
No, the code is working as intended.
The option is described as

    -keep-unk
        Treat out-of-vocabulary words as <unk> but preserve their
        labels in lattice output.

What you are outputting is confusion networks, not lattices. In the CN
building process, lattice nodes that are mapped to <unk> are treated as
equivalent, and the word information is lost in the process.
I would suggest that you simply do your lattice rescoring with
-keep-unk, output the rescored lattices, and then run lattice-tool a
second time without -keep-unk and without the -vocab option, so all word
labels are preserved (all words are implicitly added to the vocabulary).
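
A rough sketch of that two-pass procedure follows. It is an illustration under assumptions, not a tested recipe: the LM path, the "rescored" directory, and the "rescored_list.txt" file are placeholder names, the second pass is assumed to need no -lm since the lattices were already rescored, and the output file names written by -out-lattice-dir should be checked against your own setup. The script only prints the commands unless RUN is set to an empty value.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set RUN to an empty value in the environment to execute for real.
RUN=${RUN-echo}

LM=interpolated-lm.en             # placeholder LM path
RESCORED_DIR=rescored             # placeholder directory for pass-1 output
RESCORED_LIST=rescored_list.txt   # placeholder list file for pass 2

# Pass 1: rescore with the LM; -keep-unk keeps OOV labels in the lattices.
pass1() {
    $RUN lattice-tool -in-lattice-list lattice_lists.txt -read-htk \
        -lm "$LM" -keep-unk \
        -write-htk -out-lattice-dir "$RESCORED_DIR"
}

# Pass 2: build confusion networks from the rescored lattices,
# with neither -keep-unk nor -vocab, so every word label is kept.
pass2() {
    $RUN lattice-tool -in-lattice-list "$RESCORED_LIST" -read-htk \
        -write-mesh-dir out
}

$RUN mkdir -p "$RESCORED_DIR" out
pass1
# Collect the pass-1 lattices; this runs only if the directory exists,
# so it is normally skipped in a dry run.
if [ -d "$RESCORED_DIR" ]; then
    ls "$RESCORED_DIR"/* > "$RESCORED_LIST"
fi
pass2
```

Keeping the two passes separate means the lattices on disk still carry the original word labels, and only the confusion-network step decides how they are treated.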
Andreas
>
> Thanks,
>
> Lluís