Interpolation of word-based and POS-based ngrams

Andreas Stolcke stolcke at speech.sri.com
Tue Jul 20 11:50:22 PDT 2004


In message <200407201701.i6KH1A9R016054 at www6.pobox.sk> you wrote:
> Hi Andreas,
>  my problem is that I use different data for the two models. The
> word-based model uses a text consisting of recognized words; the POS-based
> class model uses a text consisting of the recognized words' POS tags. I
> estimated this model simply by running the ngram-count tool on the
> text in which the words were replaced by their POS tags.
>  POS-based classes are also not typical "simple classes"...

You are right of course.  While hidden-ngram can theoretically 
handle general class ngrams, the implementation is currently 
not able to handle anything but toy examples.  The reason is that 
general class-based models are no longer Markovian: they require the 
complete word history.  This means hidden-ngram has to 
keep complete distinct histories for every hypothesis, which quickly 
becomes infeasible.

With some small changes to the code one could approximate the full 
class N-gram by truncating the ngram context used to a fixed length 
(say 4).  This might not hurt you much in practice, and would enable
use of class-based N-grams in hidden-ngram decoding.
Let me know if you're interested in that.
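
To illustrate why truncating the context helps, here is a minimal Python
sketch. It is not SRILM code: the names state_key and count_states and the
three-tag toy set are hypothetical, chosen only for this example. It counts
how many distinct histories a decoder would have to keep apart when it
remembers the complete history versus only the last 4 tokens.

```python
from itertools import product


def state_key(history, max_context=None):
    """Return the part of a history that a decoder has to remember.

    max_context=None keeps the complete history (the exact case);
    a positive integer keeps only the last max_context tokens
    (the truncation approximation).
    """
    history = tuple(history)
    if max_context is None:
        return history
    if max_context <= 0:
        return ()
    return history[-max_context:]


def count_states(histories, max_context=None):
    """Count the distinct keys, i.e. hypotheses that cannot be merged."""
    return len({state_key(h, max_context) for h in histories})


# Hypothetical toy tag set; every tag sequence of length 8 is one hypothesis.
tags = ["N", "V", "D"]
histories = list(product(tags, repeat=8))

full = count_states(histories)          # complete histories: 3**8
truncated = count_states(histories, 4)  # last 4 tokens only: 3**4
print(full, truncated)  # prints "6561 81"
```

With complete histories the number of distinct states grows exponentially
with the length of the input, whereas with a fixed context length it is
bounded by the number of possible contexts of that length.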

The other possibility (also an approximation) for now is to expand 
the class ngram into a word ngram, but this might also fail due to 
resource limitations, depending on your vocabulary size.
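
As a rough illustration of why the expansion can run into resource limits,
here is a small Python sketch. It assumes the simplest possible setting, which
is not the POS case discussed above: a class bigram in which every word
belongs to exactly one class. The function expand_class_bigram and all
probabilities are made up for the example and are not taken from SRILM.

```python
def expand_class_bigram(class_bigram, membership):
    """Expand a class bigram into an explicit word bigram table.

    class_bigram: {(c1, c2): P(c2 | c1)}
    membership:   {class: {word: P(word | class)}}
    Assumes each word belongs to exactly one class, so that
    P(w2 | w1) = P(c2 | c1) * P(w2 | c2).
    """
    word_bigram = {}
    for (c1, c2), p_class in class_bigram.items():
        for w1 in membership[c1]:
            for w2, p_word in membership[c2].items():
                word_bigram[(w1, w2)] = p_class * p_word
    return word_bigram


# Hypothetical toy model: two classes with two words each.
class_bigram = {
    ("DET", "NOUN"): 0.9,
    ("DET", "DET"): 0.1,
    ("NOUN", "DET"): 0.5,
    ("NOUN", "NOUN"): 0.5,
}
membership = {
    "DET": {"the": 0.75, "a": 0.25},
    "NOUN": {"dog": 0.5, "cat": 0.5},
}

word_bigram = expand_class_bigram(class_bigram, membership)
print(len(class_bigram), "class bigrams ->", len(word_bigram), "word bigrams")
```

Each class bigram turns into one word bigram for every pair of member words,
so the expanded table grows with the product of the class sizes. With a
realistic vocabulary, and with words that can carry more than one tag, the
expanded model becomes much larger still.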

--Andreas

>   
> Robert
> 
> P.S.
>  It would be ideal to obtain the interpolation weights with SRILM as well ;-)
> 
> > 
> > In message <200407201526.i6KFQNr4006030 at www3.pobox.sk> you wrote:
> > > Hello SRILM users!
> > >  Does anybody know if there is an implementation of interpolation
> > > weights in SRILM? I have an ordinary word-based ngram and a
> > > part-of-speech-based ngram and want to interpolate them to create an HMM
> > > model for disfluency detection (using the hidden-ngram tool). Is it
> > > possible to do it directly in SRILM?
> > 
> > By using the options
> > 
> > 	-lm
> > 	-classes
> > 	-simple-classes
> > 	-lambda
> > 	-mix-lm
> > 
> > with hidden-ngram you can tell it to use an interpolated LM where
> > (one or both of) the component models are class-based.
> > 
> > For details see the man page.
> > 
> > --Andreas 
> > 
> 



