[SRILM User List] language models
Andreas Stolcke
stolcke at speech.sri.com
Thu Aug 27 13:38:35 PDT 2009
Md. Akmal Haidar wrote:
>
> Hi,
> Thanks for your reply.
> I need to mix 20 topic models, but ngram takes at most 10 LM files at
> a time. I use the following commands (t: topic, w: topic weight):
> ngram -lm t1.lm w1 -mix-lm t2.lm w2 -mix-lm2 t3.lm w3
> .............-mix-lm9 t10.lm w10 -write-lm t1to10.lm
> ngram -lm t11.lm w11 -mix-lm t12.lm w12 -mix-lm2 t13.lm w13
> .............-mix-lm9 t20.lm w20 -write-lm t11to20.lm
> ngram -lm t1to10.lm .5 -mix-lm t11to20.lm .5 -write-lm t1to20.lm
You can mix the models recursively. To mix three models L1, L2, L3 with
weights w1, w2, w3 (w1 + w2 + w3 = 1), you first build
   L12 = w1/(w1+w2) L1 + w2/(w1+w2) L2
and then
   L = (w1 + w2) L12 + w3 L3.
I'll leave it to you to generalize this to a larger number of models.
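For example, a minimal sketch of the two pairwise steps with ngram's
-lambda option (the weight given to the -lm model; the file names and
weights below are made up for illustration, with w1=0.3, w2=0.2,
w3=0.5):

   # L12 = w1/(w1+w2) L1 + w2/(w1+w2) L2, so lambda = 0.3/0.5 = 0.6
   ngram -lm L1.lm -mix-lm L2.lm -lambda 0.6 -write-lm L12.lm
   # L = (w1+w2) L12 + w3 L3, so lambda = w1+w2 = 0.5
   ngram -lm L12.lm -mix-lm L3.lm -lambda 0.5 -write-lm L.lm

Each intermediate mixture uses weights renormalized to sum to 1 within
its group, and the group then enters the next mix with its total
weight.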
Please direct future questions of this nature to the srilm-user mailing
list.
Andreas
>
> Could you please tell me if this command is correct for mixing LM files?
>
> Thanks
> Akmal
>
> ------------------------------------------------------------------------
> *From:* Andreas Stolcke <stolcke at speech.sri.com>
> *To:* Md. Akmal Haidar <akmalcuet00 at yahoo.com>
> *Cc:* srilm-user <srilm-user at speech.sri.com>
> *Sent:* Wednesday, August 19, 2009 8:05:40 PM
> *Subject:* Re: language models
>
> Md. Akmal Haidar wrote:
> > Hi,
> > I have three LM files.
> > The first one I got by ngram-count.
> > The second one I got by applying some Matlab programming to the first.
> > The third one I got by renormalizing the second one using the
> > ngram -renorm option.
> > In creating the third one, I got a message like: BOW denominator
> > for context "been has" is -0.382151<=0, numerator is 0.846874
> That's expected if you changed the probabilities such that they sum to > 1
> for a given context.
> ngram -renorm cannot deal with this. It simply recomputes the backoff
> weights to normalize the model, but it won't change the existing ngram
> probabilities. Obviously if just the explicit ngram probabilities sum
> to > 1 there is no way to assign backoff weights such that the model
> is normalized, hence the above message.
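> To see why, a sketch using the standard backoff normalization (h is
> the full context, h' is h with its first word dropped, and the sums
> run over the words w that have explicit ngrams in context h):
>
>    bow(h) = (1 - sum_w p(w|h)) / (1 - sum_w p(w|h'))
>
> If either sum exceeds 1, the corresponding term is <= 0 and no valid
> backoff weight exists, which is what the message reports.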
> > The second and third ones give very low perplexities (7.53 & 5.70).
> > The first one gives 73.73.
> That's right, if your probabilities don't sum to 1 (over the entire
> vocabulary, for all contexts) perplexities are meaningless.
>
> You can run ngram -debug 3 -ppl to check that probabilities are
> normalized for all contexts occurring in your test set.
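> For instance (a sketch; the test file name is made up):
>
>    ngram -lm ntrain3.lm -ppl test.txt -debug 3
>
> At -debug 3, the per-context probability sums are printed along with
> the per-word probabilities, so any deviation from 1 is easy to spot.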
>
> I don't have a simple solution for your problem. Since you
> manipulated the probabilities you have to figure out a way to get them
> normalized! I suggest you use the srilm-user mailing list if you
> want to seek further advice on this. But you would first have to
> explain in more detail how you assign your probabilities.
>
> Andreas
>
> > Could you please tell me whats the meaning of these message?
> > Thanks & Regards
> > Haidar
> >
> >
> ------------------------------------------------------------------------
> > *From:* Andreas Stolcke <stolcke at speech.sri.com>
> > *To:* Md. Akmal Haidar <akmalcuet00 at yahoo.com>
> > *Sent:* Thursday, August 13, 2009 1:24:41 PM
> > *Subject:* Re: language models
> >
> >
> > In message <92580.94445.qm at web38002.mail.mud.yahoo.com> you wrote:
> > >
> > > Dear Andreas,
> > > I attached 2 LM files.
> > > Here, train3.lm is the original LM file, which I got by applying
> > > ngram-count.
> >
> > So does that file have probabilities summing to 1?
> > I would think not.
> >
> > > ntrain3.lm is the modified LM which I got by some Matlab
> > > programming. But here the sum of the seen 2-gram probabilities
> > > sharing a common 1-gram is greater than 1.
> >
> > I cannot help you debug your Matlab script if that's what's giving
> > you unnormalized probabilities.
> >
> > >
> > > If I change the 1-gram backoff weights to make the sum of the
> > > 2-gram (seen & unseen) probabilities sharing a common 1-gram equal
> > > to 1, will the method be correct?
> >
> > Yes.
> >
> > ngram -renorm will also do this for you.
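> > For example (a sketch; the output file name is made up):
> >
> >    ngram -lm ntrain3.lm -renorm -write-lm ntrain3-renorm.lm
> >
> > This recomputes the backoff weights and writes out the renormalized
> > model.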
> >
> > Andreas
> >
> >
>
>