Class n-grams

Andreas Stolcke stolcke at speech.sri.com
Thu Jul 3 22:17:59 PDT 2008


Basiou Nikoletta wrote:
> Dear Andreas,
>  
> Thanks a lot for your answer. Actually, I want to build the classes 
> from trigram statistics/counts. Is there any provision for such an 
> implementation in the near future, or are there restrictions due to 
> higher memory and processing requirements?
It would take a lot longer to run and is currently not implemented. 
I vaguely recall a paper by Hermann Ney and colleagues many years ago 
showing that inducing classes based on higher-order statistics doesn't 
buy that much (i.e., it is sufficient to learn the classes using bigram 
statistics, and then use them in higher-order class-based models).
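
To illustrate the second half of that workflow, here is a minimal Python
sketch (not SRILM code). It assumes the word classes are already available
as a word-to-class map; the map, the class labels, the toy corpus and the
function names are all made up for illustration. It only replaces words by
their classes and counts class trigrams; it does not induce classes or
estimate a smoothed model.

```python
from collections import Counter

# Hypothetical word-to-class map, standing in for classes that were
# induced from bigram statistics. Words and labels are illustrative only.
WORD_TO_CLASS = {
    "monday": "DAY", "tuesday": "DAY",
    "paris": "CITY", "london": "CITY",
}

def to_class_tokens(sentence, word_to_class):
    """Replace each word by its class label; words without a class stay as-is."""
    return [word_to_class.get(w, w) for w in sentence.split()]

def count_ngrams(sentences, word_to_class, order=3):
    """Count n-grams of the given order over class-mapped sentences."""
    counts = Counter()
    for sentence in sentences:
        # Sentence boundary markers, as commonly used in n-gram counting.
        tokens = ["<s>"] + to_class_tokens(sentence, word_to_class) + ["</s>"]
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts

corpus = ["fly to paris on monday", "fly to london on tuesday"]
trigrams = count_ngrams(corpus, WORD_TO_CLASS, order=3)
for ngram, n in sorted(trigrams.items()):
    print(" ".join(ngram), n)
```

Because both toy sentences map to the same class sequence, they share all
of their trigrams, which is the kind of generalization a class-based model
is meant to provide. The `order` argument is independent of how the classes
were learned, so the same map can feed counts of any order.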

Andreas

>  
> Looking forward to your answer,
> Nikoletta
>
>     ------------------------------------------------------------------------
>     *From:* Andreas Stolcke [mailto:stolcke at speech.sri.com]
>     *To:* Nikoletta Bassiou [mailto:nbassiou at aiia.csd.auth.gr]
>     *Cc:* srilm-user at speech.sri.com
>     *Sent:* Tue, 01 Jul 2008 19:25:22 +0300
>     *Subject:* Re: Class n-grams
>
>     Nikoletta Bassiou wrote:
>     > I would like to build a class trigram using ngram-class, but
>     > according to the documentation only class bigrams are implemented.
>     > If this is true, do you know of any other way I can build a class
>     > trigram? Is there any provision for extending ngram-class to
>     > higher-order n-grams (n>3)?
>     >
>     > Nikoletta
>     The bigram restriction only applies to the statistics used to
>     learn the word classes. Once you have the classes, you can apply
>     them to your text and build an n-gram model of any order.
>
>     Andreas
>
>
>  
>  




