[SRILM User List] How to decode with an interpolated class-based LM with lattice-tool
mvp-songyoung
mvp-songyoung at 163.com
Mon May 14 20:23:32 PDT 2012
I have tried the -simple-classes option. It seems that my models do not satisfy its requirements, as I get warnings such as:
...
./LM/xx.class: line 6122: word holidays has multiple class memberships
./LM/xx.class: line 6122: word still has multiple class memberships
./LM/xx.class: line 6122: word five has multiple class memberships
./LM/xx.class: line 6122: word form has multiple class memberships
...
I merged the word classes for LM1, LM2, and LM3 from three different corpora separately, so it is unavoidable that some words appear in the classes of more than one model. I still want to use the interpolated class-based LM in my decoding task. How should I proceed? Thank you
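One way to see exactly which words cause the warning is to scan the merged classes file for words listed under more than one class. The sketch below assumes the classes-format(5) layout of one expansion per line (class name, optional expansion probability, then the expansion words); the helper name and demo class names are mine, purely for illustration, and a word that happens to look like a number in the second field would be misread as a probability.

```python
# Sketch: find words with multiple class memberships in a merged
# SRILM-style classes file. Each input line is assumed to be:
#   CLASS [prob] word1 word2 ...
from collections import defaultdict

def multiple_memberships(lines):
    classes_of = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        cls, rest = fields[0], fields[1:]
        # Skip an optional numeric expansion probability after the class name.
        if rest:
            try:
                float(rest[0])
                rest = rest[1:]
            except ValueError:
                pass
        for word in rest:
            classes_of[word].add(cls)
    # Keep only words that appear under more than one class.
    return {w: sorted(c) for w, c in classes_of.items() if len(c) > 1}

demo = [
    "CLASS_A 0.5 holidays",
    "CLASS_A 0.5 still",
    "CLASS_B 1.0 holidays",   # "holidays" now belongs to two classes
    "CLASS_C five",
]
print(multiple_memberships(demo))  # {'holidays': ['CLASS_A', 'CLASS_B']}
```

Removing or renaming the offending entries (e.g. keeping each word in only one class) is what the -simple-classes check requires.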
At 2012-05-15 01:01:28,"Andreas Stolcke" <stolcke at icsi.berkeley.edu> wrote:
On 5/14/2012 4:41 AM, mvp-songyoung wrote:
Hi, I have a question about lattice rescoring with an interpolated class-based LM using lattice-tool. The class-based LM was trained by interpolating three other class-based LMs: LM1 contains 3500 words merged into 350 classes; LM2 contains 2500 words merged into 250 classes; LM3 contains 110 words merged into 10 classes. I renamed the class definitions of the three class-based LMs before training and interpolating them, and I also merged the class definitions into a single file before decoding. My decoding command is as follows:
lattice-tool -read-htk -viterbi-decode -order 4 -lm class-4gram.lm -classes <class> -in-lattice-list lattice.scp -htk-wdpenalty $PENALTY -htk-lmscale $LMSCALE
However, I found that the decoding process was very slow and memory-consuming. I would like to know why this happens and how to handle it. Have I done any of the steps incorrectly? Please tell me the right steps. Thank you
The -classes option leads to an LM that no longer uses only a finite history to evaluate the probability of the next word. This means that during lattice expansion all histories need to be kept distinct. You should try the -simple-classes option, assuming your models satisfy its requirements:
-classes file
        Interpret the LM as an N-gram over word classes. The expansions of
        the classes are given in file in classes-format(5). Tokens in the LM
        that are not defined as classes in file are assumed to be plain
        words, so that the LM can contain mixed N-grams over both words and
        word classes.

-simple-classes
        Assume a "simple" class model: each word is member of at most one
        word class, and class expansions are exactly one word long.
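For illustration, a classes file that would satisfy -simple-classes might look like the fragment below (class names and probabilities are made up): each line names a class, an optional expansion probability, and exactly one expansion word, and no word appears under more than one class.

```
CLASS_DAY 0.5 monday
CLASS_DAY 0.5 friday
CLASS_NUM 1.0 five
```

A file where "five" also appeared under a second class, or where an expansion was two words long, would fail the -simple-classes assumptions and trigger the "multiple class memberships" warning.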
Hope this helps,
Andreas