[SRILM User List] Questions on conversion word lattice to mesh
Andreas Stolcke
stolcke at icsi.berkeley.edu
Fri Sep 26 13:30:43 PDT 2014
On 9/25/2014 11:28 PM, Максим Кореневский wrote:
> Hi, all,
>
> I use lattice-tool.exe to convert word lattices (in HTK-like SLF
> format) obtained from a recognition pass into word confusion networks
> (meshes). The SLFs contain both acoustic and language model scores, as
> well as the lm_scale parameter (used by the recognizer) in their
> headers. The word insertion penalty was set to 0.
>
> When I scale both the acoustic and LM scores by a constant factor C, I
> see that the 1-best path through the mesh depends strongly on it. When
> C is large, the mesh 1-best sentence coincides with the word lattice
> 1-best sentence (which is in turn the recognizer's 1-best output), but
> as C goes down to zero, the WER of the mesh 1-best sequence increases
> monotonically.
What you're seeing is expected. In fact, the scaling of scores can be
achieved using the lattice-tool -posterior-scale option; you don't have
to do it yourself by manipulating the scores in the lattices.
   -posterior-scale S
          Scale the transition weights by dividing by S for the purpose
          of posterior probability computation. If the input weights
          represent combined acoustic-language model scores then this
          should be approximately the language model weight of the
          recognizer in order to avoid overly peaked posteriors (the
          default value is 8).
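For example (a sketch; the file names utt.lat and utt.mesh, and the
value 13.0, are placeholders for your own lattice file, output mesh,
and your recognizer's lm_scale), the mesh can be built with the
recommended scaling like this:

   lattice-tool -read-htk -in-lattice utt.lat \
       -posterior-scale 13.0 -write-mesh utt.mesh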
> I believed that the optimal value of this factor should be about
> 1/lm_scale (as proposed in several papers, for example, "Confidence
> Measures for Large Vocabulary Speech Recognition" by F. Wessel et al.,
> 2001), but I observe an average WER increase of about 5% absolute over
> a large number of files for that factor value.
Now, the default posterior scale (see above) is equal to the LM score
weight, just as advocated in the paper you mention. BTW, the rationale
for this choice can be found in our earlier work on expected error
minimization, e.g., in Section 3.6 of this paper
<http://www.speech.sri.com/cgi-bin/run-distill?ftp:papers/eurospeech99-consensus.ps.gz>.
So if you are scaling the scores yourself and also using the default
-posterior-scale, you end up with the wrong overall scaling.
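To spell out the arithmetic (a sketch, based on the posterior
computation as described in the man page entry above): up to
normalization,

   P(path) ~ exp( score(path) / S )

so pre-multiplying the log scores by a factor C while leaving S at its
default of 8 gives an effective posterior scale of 8/C. With C =
1/lm_scale this comes out to 8 * lm_scale instead of the intended
lm_scale, flattening the posteriors far more than the paper recommends;
and as C goes to zero the effective scale blows up, the posteriors
approach uniform, and the mesh 1-best degrades monotonically, exactly
as you observe.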
If you are not seeing a lower WER with the default posterior scaling,
then you probably won't see a gain from confusion networks on your
task. This could be for various reasons, e.g., the lattices are too
thin or the utterances too short.
Andreas