[SRILM User List] Computing nbest-error rate from HTK MLF files

Tue Apr 23 22:05:22 PDT 2013

On 4/23/2013 7:20 PM, E wrote:
> Hello all,
>
> I have n-best results from HTK for lots of wav files in a single MLF 
> file. I want to compute nbest-error rate using SRILM toolkit. I would 
> like to know -
>
> 1. Is there any direct way to convert HTK n-best MLF format into SRILM 
> n-best format 
> (http://www.speech.sri.com/projects/srilm/manpages/nbest-format.5.html). 
> Or should I write a script for that?
Sorry, there is no standard conversion tool I am aware of. If you write 
one you should share it with the list.
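
As a possible starting point for such a script, here is a minimal Python
sketch that reads an HTK n-best MLF into per-utterance word lists. It
assumes the usual MLF layout: a "#!MLF!#" header, a quoted label-file
name per utterance, one label per line with optional leading start/end
times and an optional trailing score, "///" between alternative
hypotheses, and "." closing each utterance. The function name
parse_nbest_mlf is hypothetical, and writing the result out in one of
the SRILM n-best formats (including how to map the scores) is left
open, since that depends on how the HTK output was produced.

```python
def parse_nbest_mlf(lines):
    """Return {label_file_name: [hypothesis, ...]}; each hypothesis is a list of words.

    Assumed layout (not verified against any particular HTK output):
    optional integer start/end times, then the word, then an optional score.
    """
    results = {}
    current_name = None
    hyps, words = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"') and current_name is None:
            # Quoted label-file name opens a new utterance.
            current_name = line.strip('"')
            hyps, words = [], []
        elif line == "///":
            # Separator between alternative hypotheses.
            hyps.append(words)
            words = []
        elif line == ".":
            # End of this utterance's n-best list.
            hyps.append(words)
            results[current_name] = hyps
            current_name = None
        else:
            fields = line.split()
            # Skip up to two leading integer time stamps, if present.
            i = 0
            while i < 2 and i < len(fields) - 1 and fields[i].isdigit():
                i += 1
            words.append(fields[i])
    return results
```

Labels that are themselves digit strings would be mistaken for time
stamps by this heuristic, so real data may need stricter handling.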
>
> 2. What does the n-best error actually mean in SRILM? Suppose I have 
> 10 hypotheses for each wav file. I am familiar with usual way of 
> reporting errors using WER.
>
> Does nbest correctness mean the number of words in reference that 
> occur ANYWHERE in the nbest-list? Does nbest deletions mean the number 
> of words in reference that occur NOWHERE in the nbest-list? What does 
> nbest substitutions mean?
The n-best error is the lowest WER achievable by picking the best
hypothesis (the one giving the lowest number of errors) from each n-best
list. That's why it's also called the "oracle" error rate, as if an
oracle magically told you which hypothesis to pick to give the best
result. The numbers of deletions, substitutions, etc. in this context
are the numbers of deleted, substituted, etc. words, relative to the
reference, found in that oracle hypothesis.
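
To make the definition concrete, here is a small Python sketch that
computes an oracle error rate from references and n-best lists. It is
an illustration of the idea only, not SRILM's implementation; the
function names align_counts and oracle_error and the dictionary-based
input layout are assumptions made for this example.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    # prev[j] = (total_errors, subs, dels, ins) for ref[:i] versus hyp[:j]
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            t, s, d, n = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (t, s, d, n)              # match, no cost
            else:
                best = (t + 1, s + 1, d, n)      # substitution
            t, s, d, n = prev[j]
            best = min(best, (t + 1, s, d + 1, n))   # deletion
            t, s, d, n = cur[j - 1]
            best = min(best, (t + 1, s, d, n + 1))   # insertion
            cur.append(best)
        prev = cur
    return prev[-1][1:]


def oracle_error(references, nbest_lists):
    """Pick the lowest-error hypothesis per utterance and pool the counts.

    references:  {utterance_id: [word, ...]}
    nbest_lists: {utterance_id: [[word, ...], ...]}
    Returns (subs, dels, ins, error_rate).
    """
    subs = dels = ins = ref_words = 0
    for utt, ref in references.items():
        s, d, n = min((align_counts(ref, hyp) for hyp in nbest_lists[utt]), key=sum)
        subs, dels, ins = subs + s, dels + d, ins + n
        ref_words += len(ref)
    return subs, dels, ins, (subs + dels + ins) / ref_words


refs = {"utt1": "the cat sat".split(), "utt2": "hello world".split()}
nbest = {
    "utt1": ["a bat sat".split(), "the cat".split()],
    "utt2": ["hello word".split(), "hello world".split()],
}
print(oracle_error(refs, nbest))
```

In this toy example the oracle picks "the cat" for utt1 (one deletion)
and the exact match for utt2, so the counts describe only those chosen
hypotheses, not words found elsewhere in the lists.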

Andreas

