[SRILM User List] Problems w/ misspellings, CN and lattice-tool

Andreas Stolcke stolcke at icsi.berkeley.edu
Tue Jun 21 11:11:47 PDT 2011


Lluís Formiga i Fanals wrote:
> Dear all,
>
>     I am trying to implement the CN-based misspelling correction 
> method published by Bertoldi et al. 2010 (full citation is available 
> at the end of this e-mail). However, I am stuck at step 4, which 
> involves generating a word-based CN with the lattice-tool program of 
> the SRILM toolkit.
>
>     Once I have combined the unifilar word lattices in SLF format, I 
> call lattice-tool with this command:
>
>     lattice-tool -in-lattice wordlattice.slf -read-htk -lm lm/en.lm -write-mesh wordlattice.cn
>
>     However, including the language model can completely destroy the 
> original CN structure if the input lattice is fairly long (more than 
> 15 nodes). I have tried to scale down the language model's influence 
> with the -htk-scale and -htk-wdpenalty options, but even with both 
> set to 0 the CN still gets destroyed. The only way I can preserve the 
> CN structure is to omit the -lm option entirely, but then the BLEU 
> score of the translations decreases considerably.
>
>     Could anyone give me some clues for tracking down the problem? I 
> can provide a sample SLF lattice along with dot-generated images of 
> the intact and destroyed CNs.
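The unifilar word lattices described above are essentially confusion networks: a single chain of nodes in which every alternative for position i is a link from node i to node i+1. The following is a minimal Python sketch of writing such a CN as an HTK SLF file, offered only as an illustration. It assumes the CN is given as a list of slots of (word, posterior) pairs, that posteriors are stored as natural-log values in the SLF acoustic-score field a=, and that "!NULL" is used for empty (deletion) arcs. The function name cn_to_slf and the input layout are hypothetical, not taken from Bertoldi et al. or from SRILM; check the lattice-tool documentation for exactly which SLF header and link fields it reads.

```python
import math


def cn_to_slf(slots):
    """Serialize a confusion network as an HTK SLF lattice string.

    slots: list of slots; each slot is a list of (word, posterior) pairs.
    Node i is the boundary before slot i, so K slots give K+1 nodes and
    every alternative in slot i is a link from node i to node i+1.
    """
    num_nodes = len(slots) + 1
    links = []
    for i, slot in enumerate(slots):
        for word, prob in slot:
            # Assumption: posteriors go in the acoustic-score field a=,
            # as natural logs (SLF's default base when base= is absent).
            links.append((i, i + 1, word, math.log(prob)))

    lines = ["VERSION=1.0", f"start=0 end={num_nodes - 1}",
             f"N={num_nodes} L={len(links)}"]
    lines += [f"I={n}" for n in range(num_nodes)]
    for j, (start, end, word, score) in enumerate(links):
        lines.append(f"J={j} S={start} E={end} W={word} a={score:.4f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "!NULL" is HTK's null word, used here for an empty (deletion) arc.
    cn = [[("the", 0.9), ("!NULL", 0.1)], [("cat", 0.6), ("cut", 0.4)]]
    print(cn_to_slf(cn), end="")
```

Reading a file produced this way with lattice-tool -read-htk, before adding -lm, is one way to confirm that the input really is a well-formed chain of slots.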
According to the lattice-tool(1) man page, the processing steps are 
ordered such that the -lm option triggers expansion of the CNs into general 
lattices, so of course whatever special properties your original CNs had 
might be lost.  I haven't read the original paper, so I don't know what 
those properties are.  Could you contact the authors to find out more 
specifically how they used lattice-tool?
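The expansion point can be made concrete. To apply an n-gram LM, each lattice node must carry a unique (n-1)-word history, so a CN node that is reached by several different words has to be split. The Python sketch below (the function name bigram_expansion_size is hypothetical) counts nodes and links before and after a naive bigram expansion of a CN. It illustrates the general principle only and is not lattice-tool's actual expansion algorithm, which also has to handle null links, higher LM orders, and backoff.

```python
def bigram_expansion_size(slots):
    """Return (nodes, links) before and after expanding a CN for a bigram LM.

    slots: list of slots, each a list of words (null arcs ignored for
    simplicity). After expansion each node must have one unique previous
    word, so the boundary after slot i splits into one node per distinct
    word in slot i. The final </s> transition into a single end node is
    not modeled.
    """
    cn_size = (len(slots) + 1, sum(len(set(slot)) for slot in slots))

    histories = [{"<s>"}]  # the start node only has the sentence-start context
    num_links = 0
    for slot in slots:
        words = set(slot)
        # Each history node needs its own copy of every link in the slot.
        num_links += len(histories[-1]) * len(words)
        histories.append(words)
    expanded_size = (sum(len(h) for h in histories), num_links)
    return cn_size, expanded_size


if __name__ == "__main__":
    cn = [["the", "a", "an"], ["cat", "cut", "cap"], ["sat", "sad", "set"]]
    before, after = bigram_expansion_size(cn)
    print(f"CN: {before[0]} nodes, {before[1]} links")
    print(f"bigram-expanded: {after[0]} nodes, {after[1]} links")
```

For three slots of three words each, the chain of 4 nodes and 9 links becomes 10 nodes and 21 links, and the growth is steeper for higher-order LMs. The result is a general lattice rather than a single chain of slots, which is consistent with the original CN layout not being preserved once -lm is used.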

Andreas

>
>     Regards,
>
> Lluís Formiga
>
>
>
> [Bertoldi et al. 2010] Nicola Bertoldi, Mauro Cettolo, and Marcello 
> Federico. 2010. Statistical machine translation of texts with 
> misspelled words. In /Human Language Technologies: The 2010 Annual 
> Conference of the North American Chapter of the Association for 
> Computational Linguistics/ (HLT '10). Association for Computational 
> Linguistics, Stroudsburg, PA, USA, 412-419.
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user


