[SRILM User List] Computing nbest-error rate from HTK MLF files
Andreas Stolcke
stolcke at icsi.berkeley.edu
Wed Apr 24 15:43:36 PDT 2013
On 4/23/2013 10:22 PM, E wrote:
> Thanks for the response Andreas.
>
> I will share my script once it's ready.
>
> This "oracle" WER seems like a very crude way of computing nbest-error
> to me. Suppose a reference word is located in [0, 1] seconds, one can
> look at all the alternatives in the nbest list (all words that
> significantly overlap with reference word) and choose the word that
> best matches.
>
> So basically one would extract the "most accurate" segments from each
> nbest hypothesis in order to assemble a new "oracle" hypothesis
> (sketched below).
>
> Do you know if people have done that kind of thing while computing
> nbest error?
>
> Thanks,
> Ethan
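For concreteness, a rough sketch of the segment-splicing idea described
above, assuming each word in the reference and in every hypothesis
carries the (token, start, end) times from the HTK MLF; the 50% overlap
threshold and the tie-breaking rule are arbitrary choices, not anything
from SRILM or HTK:

    def overlap(s1, e1, s2, e2):
        """Length of the intersection of two time spans, in seconds."""
        return max(0.0, min(e1, e2) - max(s1, s2))

    def splice_oracle(ref_words, nbest):
        """ref_words and each hypothesis: lists of (token, start, end)."""
        spliced = []
        for tok, s, e in ref_words:
            # candidates: words from any hypothesis covering >= 50% of
            # the reference word's time span
            cands = [w for hyp in nbest for w in hyp
                     if overlap(s, e, w[1], w[2]) >= 0.5 * (e - s)]
            # prefer an exact match; otherwise take the longest overlap
            best = next((w for w in cands if w[0] == tok), None)
            if best is None and cands:
                best = max(cands, key=lambda w: overlap(s, e, w[1], w[2]))
            if best is not None:
                spliced.append(best[0])
        return spliced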
What you suggest is not how nbest WER is commonly defined: the usual
nbest ("oracle") error rate is simply the lowest WER achieved by any
single hypothesis in the list (a minimal sketch below).
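Assuming each hypothesis is just a sequence of tokens (the function
names here are illustrative, not SRILM code), that computation is the
minimum edit distance between the reference and any one hypothesis:

    def edit_distance(ref, hyp):
        """Levenshtein distance between two token sequences."""
        prev = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            cur = [i]
            for j, h in enumerate(hyp, 1):
                cur.append(min(prev[j] + 1,              # ref word deleted
                               cur[j - 1] + 1,           # hyp word inserted
                               prev[j - 1] + (r != h)))  # substitution/match
            prev = cur
        return prev[-1]

    def oracle_wer(ref, nbest):
        """Lowest WER of any single hypothesis in the nbest list."""
        return min(edit_distance(ref, hyp) for hyp in nbest) / max(len(ref), 1)

    ref = "the cat sat".split()
    nbest = ["the cat sad".split(), "a cat sat".split(), "the bat sat on".split()]
    print(oracle_wer(ref, nbest))   # 0.333..., from "a cat sat" (one substitution)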
However, taking different pieces from different hypotheses and gluing
them together for an overall better result is the idea behind "confusion
networks" (aka word sausages, or word meshes in SRILM terminology).
You can read more about confusion networks at
http://arxiv.org/pdf/cs/0010012 .
The nbest-lattice tool in SRILM builds confusion networks from nbest
lists. It can also compute the lowest WER of any path through the
network, as well as the best (consensus) path.
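nbest-lattice automates the alignment and scoring; as a toy illustration
of those two operations (not SRILM's actual code), suppose an
already-built mesh is a list of slots, each mapping a word, or "-" for a
skip, to its posterior:

    EPS = "-"

    def consensus_path(mesh):
        """Best path: take the highest-posterior entry in every slot."""
        picks = [max(slot, key=slot.get) for slot in mesh]
        return [w for w in picks if w != EPS]

    def mesh_oracle_errors(mesh, ref):
        """Lowest word-error count of any path through the mesh vs. ref."""
        dp = list(range(len(ref) + 1))      # no slots: all ref words deleted
        for slot in mesh:
            skip = 0 if EPS in slot else 1  # cost of emitting nothing here
            new = [dp[0] + skip]
            for j in range(1, len(ref) + 1):
                new.append(min(dp[j - 1] + (ref[j - 1] not in slot),  # match/sub
                               dp[j] + skip,                          # insertion
                               new[j - 1] + 1))                       # deletion
            dp = new
        return dp[-1]

    mesh = [{"the": 0.6, "a": 0.4}, {"cat": 0.9, "-": 0.1},
            {"sat": 0.5, "sad": 0.5}]
    print(consensus_path(mesh))                    # ['the', 'cat', 'sat']
    print(mesh_oracle_errors(mesh, ["a", "cat"]))  # 1 (slot 3 cannot be skipped)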
Andreas