[SRILM User List] How to decode with an interpolated class-based LM with lattice-tool

Andreas Stolcke stolcke at icsi.berkeley.edu
Mon May 14 10:01:28 PDT 2012

On 5/14/2012 4:41 AM, mvp-songyoung wrote:
> Hi, I have a problem when rescoring lattices with an interpolated
> class-based LM using lattice-tool. This class-based LM was trained by
> interpolating three different class-based LMs: LM1 contains 3500
> words merged into 350 classes; LM2 contains 2500 words merged
> into 250 classes; LM3 contains 110 words merged into 10 classes.
> I renamed the class definitions of the three class-based LMs before
> training and interpolating them, and I also merged the class
> definitions into a single file before decoding. My decoding command is as
> follows:
> lattice-tool -read-htk -viterbi-decode -order 4 -lm class-4gram.lm 
> -classes <class> -in-lattice-list lattice.scp -htk-wdpenalty $PENALTY 
> -htk-lmscale $LMSCALE
> However, I found that decoding was very slow and consumed a lot of
> memory. Why does this happen, and how should I handle this
> situation? Did I do any of the steps incorrectly? Could you tell me
> the right steps? Thank you.

The -classes option leads to an LM that no longer uses only a finite
history to evaluate the probability of the next word. This means that
during lattice expansion all histories need to be kept distinct. You
should try the -simple-classes option, assuming your models satisfy its
assumptions (from the lattice-tool man page):
>
> -classes file
>     Interpret the LM as an N-gram over word classes. The expansions of
>     the classes are given in file in classes-format(5)
>     <http://www.speech.sri.com/projects/srilm/manpages/classes-format.5.html>.
>     Tokens in the LM that are not defined as classes in file are
>     assumed to be plain words, so that the LM can contain mixed
>     N-grams over both words and word classes.
> -simple-classes
>     Assume a "simple" class model: each word is member of at most one
>     word class, and class expansions are exactly one word long.
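Before switching to -simple-classes, it is worth checking that the merged class file actually meets both conditions. When class definitions from several LMs (here LM1, LM2 and LM3) are concatenated, a word shared between their vocabularies can easily end up in more than one class. The following is a minimal Python sketch, not part of SRILM, assuming each line follows classes-format(5) as "class [p] word1 word2 ...", where the optional p is a numeric probability. The function name check_simple_classes is hypothetical.

```python
import sys
from collections import defaultdict


def check_simple_classes(lines):
    """Return a list of violations of the "simple" class assumption.

    Assumes classes-format(5) lines of the form: class [p] word1 word2 ...
    (hypothetical checker, not an SRILM tool).
    """
    problems = []
    word_to_classes = defaultdict(set)
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        cls, rest = fields[0], fields[1:]
        # Treat the second field as the optional probability only when
        # more fields follow it, so a numeric single-word expansion is kept.
        if len(rest) >= 2:
            try:
                float(rest[0])
                rest = rest[1:]
            except ValueError:
                pass
        # Condition 2: every class expansion is exactly one word long.
        if len(rest) != 1:
            problems.append(f"line {lineno}: expansion of {cls} has {len(rest)} words")
            continue
        word_to_classes[rest[0]].add(cls)
    # Condition 1: each word belongs to at most one class.
    for word, classes in sorted(word_to_classes.items()):
        if len(classes) > 1:
            problems.append(
                f"word {word} is in {len(classes)} classes: {' '.join(sorted(classes))}"
            )
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_classes.py merged.classes
    with open(sys.argv[1]) as f:
        issues = check_simple_classes(f)
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```

An empty result suggests the file is compatible with -simple-classes. Any reported word that appears in several classes would need to be resolved, for example by re-clustering, before the simple-class assumption holds.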

Hope this helps,

