query regarding usage of SRILM toolkit

Wed Oct 4 12:41:47 PDT 2006

Lakshmi A wrote:
>
> Greetings!!!
>
> Thanks for the prompt reply. But the ideas you mentioned seem to be 
> for boundary marking when the whole sequence is correct. Our 
> recognition output is only 50% correct. That is, we have a sequence of 
> syllables that is only 50% correct, from which we need to extract the 
> words. The n-best results of the recognizer could be used to improve 
> the performance. We can have a lattice of syllable sequences where each 
> syllable has an n-best list.
> Now, the task is to find the best word sequence from this n-best 
> lattice. Do you have any similar programs? Please do reply.
>
> Thanks in Advance.
> Regards,
> Lakshmi
>
> On Fri, 29 Sep 2006, Andreas Stolcke wrote:
>
If your output is n-best, you can apply the disambig or hidden-ngram 
taggers to each of the hypotheses, and
then extract the 1-best segmentation by some criterion. 
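
As a rough illustration of the n-best case, here is a minimal Python sketch.
It does not call disambig or hidden-ngram: a toy Viterbi segmenter over a
made-up lexicon stands in for the tagger, and the selection criterion
(recognizer log score plus weighted segmentation log score) is only one
possible choice. LEXICON, UNK_LOGPROB, segment, pick_best and lm_weight are
all hypothetical names and values used for illustration.

```python
import math

# Hypothetical toy lexicon: word (tuple of syllables) -> log10 probability.
LEXICON = {
    ("na", "ma"): -1.0,
    ("ste",): -1.5,
    ("na",): -2.0,
    ("ma", "ste"): -2.5,
}
UNK_LOGPROB = -6.0  # assumed penalty for a syllable that matches no word


def segment(syllables):
    """Toy stand-in for the tagger: best segmentation of one hypothesis."""
    n = len(syllables)
    best = [(-math.inf, None)] * (n + 1)  # (score, backpointer) per position
    best[0] = (0.0, None)
    max_len = max(len(word) for word in LEXICON)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = tuple(syllables[start:end])
            if word in LEXICON:
                logp = LEXICON[word]
            elif end - start == 1:
                logp = UNK_LOGPROB  # single unknown syllable becomes a word
            else:
                continue
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the backpointers to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append("".join(syllables[start:end]))
        end = start
    return list(reversed(words)), best[n][0]


def pick_best(nbest, lm_weight=1.0):
    """Segment every hypothesis and keep the one with the best combined score.

    nbest is a list of (syllables, recognizer_log_score) pairs.
    """
    scored = []
    for syllables, rec_score in nbest:
        words, seg_score = segment(syllables)
        scored.append((rec_score + lm_weight * seg_score, words))
    total, words = max(scored, key=lambda item: item[0])
    return words, total


if __name__ == "__main__":
    nbest = [
        (["na", "ma", "ste"], -3.0),
        (["na", "pa", "ste"], -2.5),
    ]
    print(pick_best(nbest))
```

With real data, segment() would be replaced by the output of the tagger run
on each hypothesis, and lm_weight would need tuning on held-out data.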

If your output is in lattice format, things are more involved. You'd 
have to edit the lattices to insert nodes
representing the different tagging choices (e.g., 
boundary/no-boundary). Then rescore the lattice with
the tagging LM to extract the best hypothesis.
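
The lattice-editing step could look roughly like the Python sketch below.
It assumes a generic edge-labeled graph given as (source, target, syllable)
triples, which is not the SRILM lattice file format, and the tag labels
"<b>" and "<nb>" are hypothetical. After every syllable it inserts an
intermediate node with two outgoing tag edges, one per tagging choice.
Rescoring with the tagging LM is not shown; the paths are only enumerated
to make the added choices visible.

```python
BOUNDARY, NO_BOUNDARY = "<b>", "<nb>"  # hypothetical tag labels


def expand_with_boundary_tags(edges):
    """Follow every syllable edge with a boundary/no-boundary choice."""
    expanded = []
    for i, (src, dst, syllable) in enumerate(edges):
        mid = f"{src}>{dst}#{i}"  # fresh intermediate node for this edge
        expanded.append((src, mid, syllable))
        expanded.append((mid, dst, BOUNDARY))
        expanded.append((mid, dst, NO_BOUNDARY))
    return expanded


def all_paths(edges, start, end):
    """Enumerate the label sequences of all paths in an acyclic lattice."""
    outgoing = {}
    for src, dst, label in edges:
        outgoing.setdefault(src, []).append((dst, label))

    def walk(node, labels):
        if node == end:
            yield list(labels)
            return
        for dst, label in outgoing.get(node, []):
            yield from walk(dst, labels + [label])

    return list(walk(start, []))


if __name__ == "__main__":
    # Two competing syllables in the first slot, one in the second.
    lattice = [(0, 1, "na"), (0, 1, "la"), (1, 2, "ma")]
    tagged = expand_with_boundary_tags(lattice)
    for path in all_paths(tagged, 0, 2):
        print(" ".join(path))
```

Enumerating all paths is only practical for tiny lattices; a real system
would score the expanded lattice and search for the best path instead.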

Andreas