language models

Andreas Stolcke stolcke at speech.sri.com
Wed Aug 12 10:47:11 PDT 2009


Md. Akmal Haidar wrote:
> Dear Andreas,
> Thanks for your reply.
> Should the sum of the probabilities of n-grams sharing a common 
> (n-1)-gram be equal to 1?
No, because smoothing takes some probability mass away from the ngrams 
observed in the training data and reserves it for ngrams that were not 
observed (and hence are not explicitly in the LM).  This probability 
mass is then assigned to the unobserved ngrams via the backoff formula.
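
As a sketch, in roughly the notation of the ngram-discount(7) man page, 
a backoff model computes

   p(w | h) = f(w | h)            if the ngram (h, w) was observed
            = bow(h) * p(w | h')  otherwise

where f() is the discounted (smoothed) estimate, h' is the history h 
with its first word dropped, and the backoff weight bow(h) is chosen so 
that the probabilities sum to 1 over all words w.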
> If yes, is there any tool to normalize the language model 
> probabilities such that the sum of n-gram probabilities sharing a 
> common (n-1)-gram is equal to 1?
To make the probabilities of only the observed ngrams add up to 1, you 
need to disable smoothing and also make sure all observed ngrams are 
included in the model.  Try ngram-count with these options:

-gt3min 1 -gt4min 1  (etc.)
-gt1max 0 -gt2max 0 -gt3max 0 -gt4max 0  (etc., up to the ngram order 
you need)
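
For example, to build an unsmoothed trigram model this way (file names 
here are just placeholders):

   ngram-count -order 3 -text train.txt \
       -gt3min 1 \
       -gt1max 0 -gt2max 0 -gt3max 0 \
       -lm unsmoothed.lm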

For more details on smoothing, see
http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html

Andreas





> Thanks
> Best Regards
> Akmal
>
>
> ------------------------------------------------------------------------
> *From:* Andreas Stolcke <stolcke at speech.sri.com>
> *To:* Md. Akmal Haidar <akmalcuet00 at yahoo.com>
> *Sent:* Tuesday, July 21, 2009 7:20:06 PM
> *Subject:* Re: language models
>
> Md. Akmal Haidar wrote:
> > Dear Andreas,
> >  Thanks for your reply.
> > What is the difference between a language model created from a text 
> > file and one created from a count file?
> > If I use -text textfile -lm lmfile versus -read countfile -vocab 
> > vocabfile -lm lmfile, the first one gives smaller perplexity.
> The difference is probably due to use of the -vocab option.  It limits 
> the vocabulary of the LM.
> If you use it in both cases, or not at all, you should get the same 
> results.
>
> Andreas
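
(For illustration, with placeholder file names: generating the counts 
from the same text and using the same -vocab in both runs should give 
matching models:

   ngram-count -order 3 -text train.txt -vocab vocab.txt -lm lm1
   ngram-count -order 3 -text train.txt -write counts.txt
   ngram-count -order 3 -read counts.txt -vocab vocab.txt -lm lm2

Here lm1 and lm2 should be equivalent, since both are estimated from 
the same counts and restricted to the same vocabulary.)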
>
> >  Could you please tell me what's the reason?
> > Thanks & Regards
> > Akmal
> >
> > ------------------------------------------------------------------------
> > *From:* Andreas Stolcke <stolcke at speech.sri.com>
> > *To:* Md. Akmal Haidar <akmalcuet00 at yahoo.com>
> > *Sent:* Tuesday, July 7, 2009 7:31:41 AM
> > *Subject:* Re: Mixing several topic models
> >
> > Md. Akmal Haidar wrote:
> > > Hi,
> > >
> > > I am new to SRILM.
> > >
> > > I am working on language model adaptation using LDA. I need to mix
> > > several topic models through weighting factors.
> > > Is there any way in SRILM to mix several language models?
> > Read the ngram(1) man page, specifically about the options -mix-lm,
> > -mix-lm2, etc.
> >
> > Andreas
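
(As a sketch with placeholder file names and an arbitrary weight: to 
interpolate two topic LMs with weight 0.6 on the first, something like

   ngram -order 3 -lm topic1.lm -mix-lm topic2.lm -lambda 0.6 \
       -write-lm mixed.lm

should work; -lambda is the weight given to the main -lm model, and 
-mix-lm2, -mix-lambda2, etc. handle more than two components.)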
> >
> > >
> > > Thanks
> > >
> > > Kind Regards
> > > Akmal
> > >
> > >
> >
> >
>
>



