[SRILM User List] Problems w/ misspellings, CN and lattice-tool
Andreas Stolcke
stolcke at icsi.berkeley.edu
Tue Jun 21 11:11:47 PDT 2011
Lluís Formiga i Fanals wrote:
> Dear all,
>
> I am trying to implement the CN-based misspelling correction
> method published by Bertoldi et al. 2010 (the full citation is
> at the end of this e-mail). However, I am stuck at step 4,
> which involves generating a word-based CN with the
> lattice-tool program of the SRILM toolkit.
>
> Once I have put the unifilar word lattices together in SLF
> format, I call lattice-tool with this command:
>
> lattice-tool -in-lattice wordlattice.slf -read-htk -lm lm/en.lm
> -write-mesh wordlattice.cn
>
> However, including the language model can completely destroy
> the original CN structure when the input lattice is fairly
> long (>15 nodes). I have tried to scale down the language model's
> impact with the -htk-scale and -htk-wdpenalty options, but even when
> I set both to 0 the CN still gets
> destroyed. The only way I can preserve the CN structure is to omit
> the -lm option entirely, but then the BLEU score of the translations
> decreases considerably.
>
> Could anyone give me some pointers for tracking down the
> problem? I can provide a sample SLF lattice along with
> dot-generated images of intact and destroyed CNs.
As per the lattice-tool(1) man page, the sequence of processing steps is
such that the -lm option triggers expansion of the CNs into general
lattices, so of course whatever special properties your original CNs had
might be lost. I haven't read the original paper, so I don't know what
those properties are. Can't you contact the author to find out more
specifically how lattice-tool was used?
Andreas
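
One way to see whether a given lattice still has the CN ("sausage") shape after processing is to test its topology directly. The following is only a minimal sketch under stated assumptions: it does not read SLF files or call SRILM, and the edge-list representation (from_node, to_node, word) and the function name is_confusion_network are hypothetical, chosen for illustration. It treats a lattice as a CN when every node except the final one leads to exactly one next node, so that all paths pass through the same sequence of slots.

```python
from collections import defaultdict


def is_confusion_network(edges, start, end):
    """Return True if the lattice is a linear chain of slots.

    edges: iterable of (from_node, to_node, word) tuples (hypothetical format).
    start, end: identifiers of the initial and final nodes.
    """
    successors = defaultdict(set)
    nodes = {start, end}
    for src, dst, _word in edges:
        successors[src].add(dst)
        nodes.update((src, dst))

    visited = set()
    node = start
    while node != end:
        # A revisited node means a cycle; zero or several distinct
        # successor nodes mean the chain shape is broken.
        if node in visited or len(successors[node]) != 1:
            return False
        visited.add(node)
        (node,) = successors[node]
    visited.add(end)

    # Every node must lie on the chain, and the final node has no exits.
    return visited == nodes and not successors[end]


# Two slots with two alternatives each: a valid CN shape.
cn_edges = [(0, 1, "the"), (0, 1, "teh"), (1, 2, "cat"), (1, 2, "cut")]

# Node 0 now branches to two different nodes: a general lattice.
expanded_edges = [(0, 1, "the"), (0, 2, "teh"), (1, 3, "cat"), (2, 3, "cat")]

cn_ok = is_confusion_network(cn_edges, 0, 2)
expanded_ok = is_confusion_network(expanded_edges, 0, 3)
```

Running such a check on the node and link lists of the input lattice and of the output produced with and without -lm would show at which step the chain structure is lost.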
>
> Regards,
>
> Lluís Formiga
>
>
>
> [Bertoldi et al. 2010] Nicola Bertoldi, Mauro Cettolo, and Marcello
> Federico. 2010. Statistical machine translation of texts with
> misspelled words. In /Human Language Technologies: The 2010 Annual
> Conference of the North American Chapter of the Association for
> Computational Linguistics/ (HLT '10). Association for Computational
> Linguistics, Stroudsburg, PA, USA, 412-419.