[SRILM User List] lattice-tool related issues

Anoop Deoras adeoras at jhu.edu
Mon Aug 2 10:33:17 PDT 2010


Hello,

  I am trying to rescore HTK lattices using lattice-tool and am
running into the following issues:

1. I pass a trigram language model and a vocabulary file to rescore
the lattice (which encodes bigram information), and then write the
updated and expanded lattice back in HTK format.

However, when I specify the -unk and -keep-unk flags, the OOV words
get mapped to <unk> without preserving the original label. I was
under the impression that -keep-unk would preserve the label of the
OOV word, but it does not do so.

2. Before I rescore the lattice, I want to split some words (multiword
units). The components of a multiword are joined by an underscore
character, so I use the flags -split-multiwords -multi-char _

All goes well as long as I do not use the -unk -keep-unk flags in
conjunction with -split-multiwords. If I use the -unk -keep-unk flags
(for point 1 above) together with -split-multiwords, then the
multiword splitting does not work and, moreover, the OOV words get
mapped to <unk>.

I should point out that the multiword unit is NOT in my vocabulary,
but after the split, all the individual words are found in the
vocabulary. Hence, I suspect that the -unk mapping takes place before
the splitting: since no multiword unit is in the vocabulary, each one
has already been replaced by <unk>, and -split-multiwords has nothing
left to split.
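
To make the suspected ordering concrete, here is a toy Python sketch.
It is not SRILM code and says nothing about how lattice-tool is
actually implemented; the function names, the word list, and the tiny
vocabulary are all made up for illustration. It only shows why mapping
OOVs before splitting would give the behaviour I see, and why the
reverse order would give what I want.

```python
UNK = "<unk>"

def map_oov(words, vocab):
    """Replace every word that is not in the vocabulary with <unk>."""
    return [w if w in vocab else UNK for w in words]

def split_multiwords(words, multi_char="_"):
    """Split each multiword on multi_char into its component words."""
    out = []
    for w in words:
        out.extend(part for part in w.split(multi_char) if part)
    return out

# Hypothetical data: the multiword itself is OOV, its components are not.
vocab = {"new", "york", "is", "big"}
lattice_words = ["new_york", "is", "big"]

# Suspected current order: OOV mapping first, then splitting.
unk_first = split_multiwords(map_oov(lattice_words, vocab))

# Desired order: splitting first, then OOV mapping.
split_first = map_oov(split_multiwords(lattice_words), vocab)

print(unk_first)    # ['<unk>', 'is', 'big']
print(split_first)  # ['new', 'york', 'is', 'big']
```

In the first ordering the multiword is gone before the splitter ever
sees it; in the second, every component is found in the vocabulary and
nothing is mapped to <unk>.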

Is there any way to invoke the -split-multiwords functionality before
the OOV words are mapped to <unk>?

I apologize if I am not understanding lattice-tool well enough and
am passing the wrong arguments in the first place.

Thanks and Regards
-Anoop

