[SRILM User List] Computing nbest-error rate from HTK MLF files
Andreas Stolcke
stolcke at icsi.berkeley.edu
Wed Apr 24 15:43:36 PDT 2013
On 4/23/2013 10:22 PM, E wrote:
> Thanks for the response Andreas.
>
> I will share my script once it's ready.
>
> This "oracle" WER seems like a very crude way of computing nbest-error
> to me. Suppose a reference word is located in [0, 1] seconds, one can
> look at all the alternatives in the nbest list (all words that
> significantly overlap with reference word) and choose the word that
> best matches.
>
> So basically one would extract the "most accurate" segments from each
> nbest hypothesis in order to assemble a new "oracle" hypothesis
> (sketched below).
>
> Do you know if people have done that kind of thing while computing
> nbest error?
>
> Thanks,
> Ethan
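For concreteness, a rough sketch of the segment-splicing idea described
above, assuming each word in the reference and in every hypothesis
carries the (token, start, end) times from the HTK MLF; the 50% overlap
threshold and the tie-breaking rule are arbitrary choices, not anything
from SRILM or HTK:

    def overlap(s1, e1, s2, e2):
        """Length of the intersection of two time spans, in seconds."""
        return max(0.0, min(e1, e2) - max(s1, s2))

    def splice_oracle(ref_words, nbest):
        """ref_words and each hypothesis: lists of (token, start, end)."""
        spliced = []
        for tok, s, e in ref_words:
            # candidates: words from any hypothesis covering >= 50% of
            # the reference word's time span
            cands = [w for hyp in nbest for w in hyp
                     if overlap(s, e, w[1], w[2]) >= 0.5 * (e - s)]
            # prefer an exact match; otherwise take the longest overlap
            best = next((w for w in cands if w[0] == tok), None)
            if best is None and cands:
                best = max(cands, key=lambda w: overlap(s, e, w[1], w[2]))
            if best is not None:
                spliced.append(best[0])
        return spliced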
What you suggest is not how nbest WER is commonly defined: the usual
nbest ("oracle") error rate is simply the lowest WER achieved by any
single hypothesis in the list (a minimal sketch below).
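Assuming each hypothesis is just a sequence of tokens (the function
names here are illustrative, not SRILM code), that computation is the
minimum edit distance between the reference and any one hypothesis:

    def edit_distance(ref, hyp):
        """Levenshtein distance between two token sequences."""
        prev = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            cur = [i]
            for j, h in enumerate(hyp, 1):
                cur.append(min(prev[j] + 1,              # ref word deleted
                               cur[j - 1] + 1,           # hyp word inserted
                               prev[j - 1] + (r != h)))  # substitution/match
            prev = cur
        return prev[-1]

    def oracle_wer(ref, nbest):
        """Lowest WER of any single hypothesis in the nbest list."""
        return min(edit_distance(ref, hyp) for hyp in nbest) / max(len(ref), 1)

    ref = "the cat sat".split()
    nbest = ["the cat sad".split(), "a cat sat".split(), "the bat sat on".split()]
    print(oracle_wer(ref, nbest))   # 0.333..., from "a cat sat" (one substitution)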
However, taking different pieces from different hypotheses and gluing
them together for an overall better result is the idea behind "confusion
networks" (aka word sausages, or word meshes in SRILM terminology).
You can read more about confusion networks at
http://arxiv.org/pdf/cs/0010012 .
The nbest-lattice tool in SRILM builds confusion networks from nbest
lists. It can also compute the lowest WER of any path through the
network, as well as the best (consensus) path.
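nbest-lattice automates the alignment and scoring; as a toy illustration
of those two operations (not SRILM's actual code), suppose an
already-built mesh is a list of slots, each mapping a word, or "-" for a
skip, to its posterior:

    EPS = "-"

    def consensus_path(mesh):
        """Best path: take the highest-posterior entry in every slot."""
        picks = [max(slot, key=slot.get) for slot in mesh]
        return [w for w in picks if w != EPS]

    def mesh_oracle_errors(mesh, ref):
        """Lowest word-error count of any path through the mesh vs. ref."""
        dp = list(range(len(ref) + 1))      # no slots: all ref words deleted
        for slot in mesh:
            skip = 0 if EPS in slot else 1  # cost of emitting nothing here
            new = [dp[0] + skip]
            for j in range(1, len(ref) + 1):
                new.append(min(dp[j - 1] + (ref[j - 1] not in slot),  # match/sub
                               dp[j] + skip,                          # insertion
                               new[j - 1] + 1))                       # deletion
            dp = new
        return dp[-1]

    mesh = [{"the": 0.6, "a": 0.4}, {"cat": 0.9, "-": 0.1},
            {"sat": 0.5, "sad": 0.5}]
    print(consensus_path(mesh))                    # ['the', 'cat', 'sat']
    print(mesh_oracle_errors(mesh, ["a", "cat"]))  # 1 (slot 3 cannot be skipped)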
Andreas