Interpolation of word-based and POS-based ngrams
Andreas Stolcke
stolcke at speech.sri.com
Tue Jul 20 11:50:22 PDT 2004
In message <200407201701.i6KH1A9R016054 at www6.pobox.sk> you wrote:
> Hi Andreas,
> my problem is that I use different data for both models. The
> word-based model uses a text consisting of recognized words, POS-based
> class model uses a text consisting of recognized words' POS. I have
> estimated this model simply by using the ngram-count tool from the
> text where words were replaced by their POS tags.
> POS-based classes are also not typical "simple classes"...
You are right of course. While hidden-ngram can theoretically
handle general class ngrams, the implementation is currently
not able to handle anything but toy examples. The reason is that
general class-based models are no longer Markovian: they require the
complete word history. This means hidden-ngram has to
keep complete distinct histories for every hypothesis, which quickly
becomes infeasible.
With some small changes to the code one could approximate the full
class N-gram by truncating the ngram context used to a fixed length
(say 4). This might not hurt you much in practice, and would enable
use of class-based N-grams in hidden-ngram decoding.
Let me know if you're interested in that.
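
To illustrate why truncating the context helps, here is a minimal Python
sketch (not SRILM code). It assumes hypotheses are (history, log score)
pairs and that hypotheses sharing the same truncated history can be
merged, keeping the better score. The names MAX_CONTEXT, truncate and
recombine are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

History = Tuple[str, ...]

# Assumed truncation length ("say 4" in the text above).
MAX_CONTEXT = 4


def truncate(history: History, max_len: int = MAX_CONTEXT) -> History:
    """Keep only the last max_len tokens of a history."""
    return history[-max_len:] if max_len > 0 else ()


def recombine(hyps: List[Tuple[History, float]],
              max_len: int = MAX_CONTEXT) -> Dict[History, float]:
    """Merge hypotheses with the same truncated history, keeping the best log score."""
    best: Dict[History, float] = {}
    for history, logprob in hyps:
        key = truncate(history, max_len)
        if key not in best or logprob > best[key]:
            best[key] = logprob
    return best


# Three hypotheses with full histories of length 5.
hyps = [
    (("a", "b", "c", "d", "e"), -3.0),
    (("x", "b", "c", "d", "e"), -2.5),
    (("a", "b", "c", "d", "f"), -4.0),
]
merged = recombine(hyps)
print(len(merged))
```

With full histories all three hypotheses stay distinct; with the
truncated context the first two collapse into one, which is what keeps
the number of states bounded.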
The other possibility (also an approximation) for now is to expand
the class ngram into a word ngram, but this might also fail due to
resource limitations, depending on your vocabulary size.
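
As a rough illustration of the expansion and of why it can get large,
here is a small Python sketch (again not SRILM code). It assumes a toy
class bigram with hypothetical probabilities, and for brevity it assumes
each word belongs to exactly one class; with ambiguous POS tags one
would additionally have to sum over the possible classes of each word.

```python
from itertools import product

# Hypothetical class bigram probabilities p(c2 | c1).
class_bigram = {("DET", "NOUN"): 0.8, ("DET", "ADJ"): 0.2}

# Hypothetical membership probabilities p(word | class).
members = {
    "DET": {"the": 0.6, "a": 0.4},
    "NOUN": {"dog": 0.5, "cat": 0.5},
    "ADJ": {"big": 1.0},
}


def expand(class_bigram, members):
    """Expand class bigrams into word bigrams: p(w2 | w1) = p(c2 | c1) * p(w2 | c2)."""
    word_bigram = {}
    for (c1, c2), p_class in class_bigram.items():
        # Every word of c1 paired with every word of c2 yields one word bigram.
        for w1, (w2, p_w2) in product(members[c1], members[c2].items()):
            word_bigram[(w1, w2)] = p_class * p_w2
    return word_bigram


word_bigram = expand(class_bigram, members)
print(len(class_bigram), "class bigrams ->", len(word_bigram), "word bigrams")
```

Each class bigram turns into (size of c1) x (size of c2) word bigrams,
so with realistic class sizes and higher orders the expanded model
grows very quickly.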
--Andreas
>
> Robert
>
> P.S.
> It would be ideal to obtain the interpolation weights with SRILM as well ;-)
>
> >
> > In message <200407201526.i6KFQNr4006030 at www3.pobox.sk> you wrote:
> > > Hello SRILM users!
> > > Does anybody know if there is an implementation of interpolation
> > > weights in SRILM? I have an ordinary word-based ngram and
> > > part-of-speech-based ngram and want to interpolate them to create an HMM
> > > model for disfluency detection (using the hidden-ngram tool). Is it
> > > possible to do it directly in SRILM?
> >
> > By using the options
> >
> > -lm
> > -classes
> > -simple-classes
> > -lambda
> > -mix-lm
> >
> > with hidden-ngram you can tell it to use an interpolated LM where
> > (one or both of) the component models are class-based.
> >
> > For details see the man page.
> >
> > --Andreas
> >
>