[SRILM User List] Questions on conversion word lattice to mesh
Andreas Stolcke
stolcke at icsi.berkeley.edu
Fri Sep 26 13:30:43 PDT 2014
On 9/25/2014 11:28 PM, Максим Кореневский wrote:
> Hi, all,
>
> I use lattice-tool.exe to convert word lattices (in HTK-like SLF
> format) obtained from a recognition pass into word confusion networks
> (meshes). The SLFs contain both acoustic and language model scores, as
> well as the lm_scale parameter (used by the recognizer) in their
> headers. The word insertion penalty was set to 0.
>
> When I scale both the acoustic and LM scores by a constant factor C, I
> see that the 1-best path through the mesh depends strongly on it. When
> C is large, the mesh 1-best sentence coincides with the word lattice
> 1-best sentence (which is in turn the recognizer's 1-best output), but
> as C goes down to zero, the WER of the mesh 1-best sequence increases
> monotonically.
What you're seeing is expected. In fact, the scaling of scores can be
achieved using the lattice-tool -posterior-scale option; you don't have
to do it yourself by manipulating the scores in the lattices.
   -posterior-scale S
          Scale the transition weights by dividing by S for the purpose
          of posterior probability computation. If the input weights
          represent combined acoustic-language model scores then this
          should be approximately the language model weight of the
          recognizer in order to avoid overly peaked posteriors (the
          default value is 8).
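For example (a sketch; the file names utt.lat and utt.mesh, and the
value 13.0, are placeholders for your own lattice file, output mesh,
and your recognizer's lm_scale), the mesh can be built with the
recommended scaling like this:

   lattice-tool -read-htk -in-lattice utt.lat \
       -posterior-scale 13.0 -write-mesh utt.mesh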
> I believed that the optimal value of this factor should be about
> 1/lm_scale (as proposed in several papers, for example, "Confidence
> Measures for Large Vocabulary Speech Recognition" by F. Wessel et al.,
> 2001), but I observe an average WER increase of about 5% absolute over
> a large number of files for that factor value.
Now, the default posterior scale (see above) is equal to the LM score
weight, just as advocated in the paper you mention. BTW, the rationale
for this choice can be found in our earlier work on expected error
minimization, e.g., in Section 3.6 of this paper
<http://www.speech.sri.com/cgi-bin/run-distill?ftp:papers/eurospeech99-consensus.ps.gz>.
So if you are scaling the scores yourself and also using the default
-posterior-scale, you end up with the wrong overall scaling.
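To spell out the arithmetic (a sketch, based on the posterior
computation as described in the man page entry above): up to
normalization,

   P(path) ~ exp( score(path) / S )

so pre-multiplying the log scores by a factor C while leaving S at its
default of 8 gives an effective posterior scale of 8/C. With C =
1/lm_scale this comes out to 8 * lm_scale instead of the intended
lm_scale, flattening the posteriors far more than the paper recommends;
and as C goes to zero the effective scale blows up, the posteriors
approach uniform, and the mesh 1-best degrades monotonically, exactly
as you observe.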
If you are not seeing a lower WER with the default posterior scaling,
then you probably won't see a gain from confusion networks on your
task. This could be for various reasons, e.g., the lattices are too
thin or the utterances too short.
Andreas