Lidia Mangu's Thesis
stolcke at speech.sri.com
Fri Aug 26 13:28:34 PDT 2005
In message <1124983170.430de18227afc at www.webmail.mcgill.ca>you wrote:
> Is there a paper that describes the algorithm used in SRILM?
No, never got around to it. But, there is a rather detailed comment in
lattice/src/LatticeAlign.cc (around line 3500) that describes the
> Also, the reason I asked about Lidia Mangu's thesis and about the possibility
> a future reimplementation is because the current algorithm does not take into
> account timing information. The research I am conducting is dealing with
> recognition at the phone level. In creating sausages from the resulting
> lattices it is uncertain how much timing information could effect the results
> Perhaps timing information may be more of an issue in phone recognition then
> word recognition.
The goal of my algorithm is precisely not to rely on time information, for
- time information may not be available, thereby limiting the applicability
of the algorithm.
- word error scoring is oblivious to time alignment, therefore you should
not need time information to compute the hypothesis with minimal expected
Of course for certain applications time information might be desirable to use.
> Do you happen to know of any sausage creation software that takes into accoun
> the timing information?
Well, Lidia's algorithm actually does take time into acount when evaluating
the merging of word hypotheses into equivalence classes.
Two recent linear-time algorithms that do use time information are described in
A general algorithm for word graph matrix decomposition
D Hakkani-Tur, G Riccardi - ICASSP 2003
Improved Confusion Network Algorithm and Shortest Path Search from
Jian Xue, Yunxin Zhao, ICASSP 2005
(i don't have a link to the full paper)
As for modifying SRILM to use time information: it should not be too hard
because time information is already optionally maintained in the
hypotheses and confusion networks (enabled by lattice-tool -acoustic-mesh).
Internally it is encoded in the NBestWordInfo structures (NBest.h).
The cost of adding a word to a confusion set is computed by
WordMesh::alignError(). You would have to modify it to take time overlap
into account, and change its interface and calling functions to pass the
time information along with the word identifies.
More information about the SRILM-User