[SRILM User List] lattice-tool related issues
Anoop Deoras
adeoras at jhu.edu
Mon Aug 2 10:33:17 PDT 2010
Hello,
I am trying to rescore HTK lattices using lattice-tool and am
running into the following issues:
1. I pass a 3-gram language model and a vocabulary file to rescore the
lattice (which encodes bigram information) and then write the updated,
expanded lattice back out in HTK format.
However, when I specify the -unk and -keep-unk flags, the OOV words get
mapped to <unk> without preserving the original label. I was under the
impression that -keep-unk would preserve the label of the OOV word,
but it does not do so.
2. Before I rescore the lattice, I want to split some words (multiword
units). The components of a multiword are joined by an underscore
character, so I use the flags -split-multiwords -multi-char _
All goes well as long as I do not use the -unk -keep-unk flags in
conjunction with -split-multiwords. If I use -unk -keep-unk
(for point 1 above) together with -split-multiwords, then the
multiwords are not split and, moreover, the OOV words get mapped
to <unk>.
I should point out that the multiword unit itself is NOT in my
vocabulary, but after the split all of the individual words are found
in the vocabulary. Hence I suspect that the -unk mapping takes place
before the splitting: since no multiword unit is in the vocabulary,
-split-multiwords is left with nothing to split.
I was wondering if there is any way to invoke the -split-multiwords
functionality before the unknown words are mapped?
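
To make the ordering I suspect concrete, here is a toy sketch in Python.
It is not lattice-tool code and says nothing certain about its
internals: it works on a flat word list rather than a lattice, and the
function names, vocabulary, and example words are all made up for
illustration. It only shows why the order of the two steps would matter.

```python
UNK = "<unk>"


def map_oov(tokens, vocab):
    """Replace every token that is not in the vocabulary with <unk>."""
    return [t if t in vocab else UNK for t in tokens]


def split_multiwords(tokens, multi_char="_"):
    """Split each token on the multiword delimiter into its components."""
    out = []
    for t in tokens:
        out.extend(p for p in t.split(multi_char) if p)
    return out


# Hypothetical data: the multiword is OOV, but its parts are in-vocabulary.
vocab = {"new", "york", "is", "big"}
tokens = ["new_york", "is", "big"]

# Suspected current behaviour: OOV mapping first, so nothing is left to split.
unk_then_split = split_multiwords(map_oov(tokens, vocab))

# Desired behaviour: split first, then map whatever is still OOV.
split_then_unk = map_oov(split_multiwords(tokens), vocab)

print(unk_then_split)  # ['<unk>', 'is', 'big']
print(split_then_unk)  # ['new', 'york', 'is', 'big']
```

In the first ordering the multiword has already turned into <unk> by the
time the split runs, which matches what I see in the output lattices.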
I apologize if I am not understanding lattice-tool well enough and
am passing the wrong arguments in the first place.
Thanks and Regards
-Anoop