Lidia Mangu's Thesis

Fri Aug 26 13:28:34 PDT 2005

In message <1124983170.430de18227afc at www.webmail.mcgill.ca>you wrote:
> Andreas,
> 
> Is there a paper that describes the algorithm used in SRILM?

No, never got around to it. But, there is a rather detailed comment in
lattice/src/LatticeAlign.cc (around line 3500) that describes the
algorithm.

> Also, the reason I asked about Lidia Mangu's thesis and about the possibility
>  of
> a future reimplementation is because the current algorithm does not take into
> account timing information. The research I am conducting is dealing with
> recognition at the phone level. In creating sausages from the resulting
> lattices it is uncertain how much timing information could effect the results
> .
> Perhaps timing information may be more of an issue in phone recognition then 
> in
> word recognition.

The goal of my algorithm is precisely not to rely on time information, for
two reasons: 

- time information may not be available, thereby limiting the applicability
  of the algorithm.
- word error scoring is oblivious to time alignment, therefore you should 
  not need time information to compute the hypothesis with minimal expected
  word error.

Of course for certain applications time information might be desirable to use.

> Do you happen to know of any sausage creation software that takes into accoun
> t
> the timing information?

Well, Lidia's algorithm actually does take time into acount when evaluating
the merging of word hypotheses into equivalence classes.

Two recent linear-time algorithms that do use time information are described in

	A general algorithm for word graph matrix decomposition
	D Hakkani-Tur, G Riccardi - ICASSP 2003
	http://akpublic.research.att.com/~dsp3/icassp03.pdf

	Improved Confusion Network Algorithm and Shortest Path Search from
	Word Lattice
	Jian Xue, Yunxin Zhao, ICASSP 2005
	http://www.icassp2005.com/Papers/ViewPapers_MS.asp?papernum=3537
	(i don't have a link to the full paper)

As for modifying SRILM to use time information: it should not be too hard
because time information is already optionally maintained in the 
hypotheses and confusion networks (enabled by lattice-tool -acoustic-mesh).
Internally it is encoded in the NBestWordInfo structures (NBest.h).
The cost of adding a word to a confusion set is computed by
WordMesh::alignError().  You would have to modify it to take time overlap
into account, and change its interface and calling functions to pass the
time information along with the word identifies.

--Andreas