[SRILM User List] Does keep-unk work with lattice-tool and htk format?

Andreas Stolcke stolcke at icsi.berkeley.edu
Tue May 22 17:05:33 PDT 2012


On 5/22/2012 10:56 AM, Lluís Formiga i Fanals wrote:
> Hi,
>
> I was trying to execute the following command:
>
> lattice-tool -in-lattice-list lattice_lists.txt -read-htk -lm 
> /veu4/usuaris24/lluisf/EMS/misspelling2012/lm/interpolated-lm.en 
> -write-mesh-dir out -keep-unk
>
> but I find that unks ("<unk>") still appear in the written CN (-write-mesh).
>
> Does the -keep-unk option work only for lattice output? Am I doing
> something wrong?
No, the code is working as intended.

The option is described as

        -keep-unk
               Treat out-of-vocabulary words as <unk> but preserve their
               labels in lattice output.

What you are outputting is confusion networks, not lattices. In the CN
building process, lattice nodes that are mapped to <unk> are treated as
equivalent, and the original word labels are lost in the process.

I would suggest that you simply do your lattice rescoring with
-keep-unk, output the rescored lattices, and then run lattice-tool a
second time without -keep-unk and without the -vocab option, so all word
labels are preserved (all words are implicitly added to the vocabulary).
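A minimal sketch of this two-pass workflow as a bash function, assuming SRILM's lattice-tool is on PATH and that the first pass writes one HTK lattice per input into its output directory (hence -read-htk again in the second pass). The function name two_pass_mesh, its argument order, the LATTICE_TOOL override and the intermediate .list file are hypothetical, chosen only for illustration; please check the flag names (-out-lattice-dir, -write-htk) against the lattice-tool man page of your SRILM version.

```shell
#!/usr/bin/env bash
# Two-pass sketch: rescore with -keep-unk, then build CNs without it.
# Assumes bash and SRILM's lattice-tool; two_pass_mesh, its arguments and
# the LATTICE_TOOL override are hypothetical names for illustration.
set -euo pipefail

two_pass_mesh() {
  local in_list=$1    # list of input HTK lattices
  local lm=$2         # LM used for rescoring
  local lat_dir=$3    # pass 1 output: rescored lattices
  local mesh_dir=$4   # pass 2 output: confusion networks
  local tool=${LATTICE_TOOL:-lattice-tool}
  local rescored_list="${lat_dir%/}.list"

  mkdir -p "$lat_dir" "$mesh_dir"

  # Pass 1: OOVs are scored as <unk> by the LM, but -keep-unk keeps
  # their original labels in the lattices written in HTK format.
  "$tool" -in-lattice-list "$in_list" -read-htk \
    -lm "$lm" -keep-unk \
    -out-lattice-dir "$lat_dir" -write-htk

  # List the rescored lattices as input for pass 2.
  find "$lat_dir" -type f | sort > "$rescored_list"

  # Pass 2: no -keep-unk and no -vocab, so every word label is added to
  # the vocabulary implicitly and survives CN building.
  "$tool" -in-lattice-list "$rescored_list" -read-htk \
    -write-mesh-dir "$mesh_dir"
}
```

With the paths from the original command, the call would look like: two_pass_mesh lattice_lists.txt /veu4/usuaris24/lluisf/EMS/misspelling2012/lm/interpolated-lm.en rescored out (where "rescored" is an arbitrary directory name for the intermediate lattices).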

Andreas


>
> Thanks,
>
> Lluís
