[SRILM User List] How to interpolate two class-based language models

Fri Apr 6 11:09:38 PDT 2012

On 4/6/2012 9:01 AM, Meng Chen wrote:
> Hi, I have a question about interpolating two class-based language 
> models. Suppose I have two class-based language models trained from 
> two different corpora.
> And each class-based lm has its own class definition files. For 
> example, the class definition file for class-lm1 is lm1.classes, and 
> lm2.classes for class-lm2. So my question is, how to interpolate these 
> two different class-based language models? Can you give me the steps, 
> preferably with commands?
>
>   * Do I need to use the -classes option when interpolating them?
>
You need to merge the class definitions for both LMs, making sure that 
there are no name conflicts.  If necessary rename class labels 
CLASS01234 to LM1_CLASS01234 etc., in both the LM and the class 
definition files, then combine the two class definitions into one file, 
then interpolate the models.
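
The renaming and merging step could be scripted roughly as follows. This is only a sketch, not an SRILM tool: it assumes each line of a class definition file starts with the class label (as in the SRILM classes-format layout "CLASS [prob] word ..."), that the LM is a text file in which class labels appear as whitespace-delimited tokens, and that no ordinary word has the same spelling as a class label. The helper names, the LM1_/LM2_ prefixes and the toy data are made up for illustration.

```python
import re


def class_names(class_lines):
    """Collect class labels: the first field of every non-empty line."""
    return {line.split()[0] for line in class_lines if line.strip()}


def add_prefix(lines, names, prefix):
    """Prefix each whitespace-delimited token that is a known class label.

    Whitespace (tabs, newlines) is left untouched so the file layout survives.
    """
    def rename(match):
        token = match.group(0)
        return prefix + token if token in names else token

    return [re.sub(r"\S+", rename, line) for line in lines]


# Toy stand-ins for lm1.classes, lm2.classes and one n-gram line of LM1.
lm1_classes = ["CLASS01 0.6 monday\n", "CLASS01 0.4 tuesday\n"]
lm2_classes = ["CLASS01 1.0 paris\n"]
lm1_ngrams = ["-0.5\ton CLASS01\t-0.3\n"]

names1 = class_names(lm1_classes)
names2 = class_names(lm2_classes)

# Rename the labels in each class file, then concatenate into one definition file.
merged = (add_prefix(lm1_classes, names1, "LM1_")
          + add_prefix(lm2_classes, names2, "LM2_"))

# The same renaming has to be applied to the LM file itself.
renamed_lm1 = add_prefix(lm1_ngrams, names1, "LM1_")
```

In practice the lists would be read from and written back to the real files, and LM2 would be renamed the same way with its own prefix.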
>
>   * Do I need to use the -bayes 0 option to interpolate them dynamically?
>
Yes, you want to use something like

     ngram -lm LM1 -mix-lm LM2 -lambda L -classes MERGED_CLASS_DEFINITIONS -bayes 0
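
For intuition, the fixed-weight interpolation requested with -lambda can be pictured as a per-word linear mixture, with the weight L applied to the main model (-lm) and 1 - L to the second model (-mix-lm). The sketch below only illustrates that arithmetic; the function name and the probabilities are invented, and it does not reproduce how ngram computes the class-based probabilities themselves.

```python
import math


def interpolate(p_main, p_mix, lam):
    """Linear mixture: lam weights the main model, (1 - lam) the second one."""
    return lam * p_main + (1.0 - lam) * p_mix


# Hypothetical per-word probabilities from two models for the same two words.
p_lm1 = [0.20, 0.05]
p_lm2 = [0.10, 0.15]
lam = 0.5

mixed = [interpolate(a, b, lam) for a, b in zip(p_lm1, p_lm2)]

# Total log10 probability of the two-word sequence under the mixture.
log10_prob = sum(math.log10(p) for p in mixed)
```

Because the mixture is taken word by word, a model that is weak on one word can be compensated by the other, which is why the weight is usually tuned on held-out data.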

> I am also confused about the expand class operation. If I expand the 
> class-based language model to word-based language model, does the 
> perplexity change with the same test set ?
ngram -expand-classes is an approximation, so you won't get exactly the 
same ppl, but something close.

Andreas
