[SRILM User List] Problems w/ misspellings, CN and lattice-tool

Lluís Formiga i Fanals lluis.formiga at upc.edu
Tue Jun 21 01:52:15 PDT 2011


Dear all,

     I am trying to implement the CN-based misspelling correction method 
published by Bertoldi et al. 2010 (the full citation is at the end of 
this e-mail). However, I am stuck at step 4, which involves generating 
a word-based CN with the lattice-tool program of the SRILM toolkit.

     Once I have put the unifilar word lattices together in SLF format, 
I call lattice-tool with this command:

     lattice-tool -in-lattice wordlattice.slf -read-htk -lm lm/en.lm -write-mesh wordlattice.cn
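
     To illustrate the kind of input involved (this is not the actual 
data), here is a minimal Python sketch that writes a small CN-shaped 
lattice in HTK SLF. The function name slots_to_slf, the example words 
and the scores are all hypothetical, and putting the per-word score in 
the a= (acoustic) field is an assumption rather than what the method 
prescribes.

```python
def slots_to_slf(slots):
    """Build an HTK SLF lattice string from a list of slots.

    Each slot is a list of (word, log_score) alternatives. Every
    alternative in slot i becomes a link from node i to node i + 1,
    so the lattice has the same shape as a confusion network.
    """
    links = []
    for i, alternatives in enumerate(slots):
        for word, score in alternatives:
            links.append((i, i + 1, word, score))

    num_nodes = len(slots) + 1
    lines = ["VERSION=1.0", f"N={num_nodes} L={len(links)}"]
    # Node definitions: one node per slot boundary.
    lines += [f"I={n}" for n in range(num_nodes)]
    # Link definitions: the score goes into the a= field (an assumption).
    lines += [
        f"J={j} S={start} E={end} W={word} a={score:.2f}"
        for j, (start, end, word, score) in enumerate(links)
    ]
    return "\n".join(lines) + "\n"


# Hypothetical three-slot example with one ambiguous (misspelled) position.
example = [
    [("the", -0.1)],
    [("weather", -0.4), ("whether", -1.1)],
    [("is", -0.1)],
]
slf_text = slots_to_slf(example)
```

     Writing slf_text to a file such as wordlattice.slf gives a lattice 
that can be passed to the command above.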

     However, including the language model can completely destroy the 
original CN structure when the input lattice is fairly long (>15 nodes). 
I have tried to scale down the impact of the language model with the 
-htk-scale and -htk-wdpenalty options, but even with both set to 0 the 
CN still gets destroyed. The only way I can preserve the CN structure is 
to drop the -lm option entirely, but then the BLEU score of the 
translations decreases considerably.

     Could anyone give me some hints on how to track down the problem? 
I can provide a sample SLF lattice along with dot-generated images of 
intact and destroyed CNs.

     Regards,

Lluís Formiga



[Bertoldi et al. 2010] Nicola Bertoldi, Mauro Cettolo, and Marcello 
Federico. 2010. Statistical machine translation of texts with misspelled 
words. In Human Language Technologies: The 2010 Annual Conference of the 
North American Chapter of the Association for Computational 
Linguistics (HLT '10). Association for Computational Linguistics, 
Stroudsburg, PA, USA, 412-419.