Saving option for ngram-class

Mats Svenson svmats at yahoo.com
Thu Oct 25 04:13:03 PDT 2007


Hi,
 I don't think the -save option, as implemented in ngram-class, is very useful. Typically, I'm not interested in testing the classes that appear at the beginning of the clustering process, but rather the classes induced in the final steps. If the number of clustered words is high, the current option creates an enormous number of useless files.

It would be much more practical if the user could explicitly specify which class granularities should be saved or, alternatively, if there were some -startsave option that would start saving class files only close to the end of the clustering.

Would that be easy to implement?

One more thing: is there an easy way to find out how many classes appear in a particular class file without writing a script? The number of iterations doesn't say that directly, and I'm not sure whether it can be computed as NUMBER_OF_WORDS_IN_THE_VOCAB - NUMBER_OF_ITERATIONS - NUMBER_OF_WORDS_IN_THE_NO_CLASS_VOCAB.
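
For comparison, the script-based fallback would be short. The following is only a minimal sketch, and it assumes that each line of the class file starts with the class label as its first whitespace-separated field (followed by an optional probability and the word), so that the number of classes is the number of distinct first fields. The function name count_classes and the sample lines are made up for illustration.

```python
def count_classes(lines):
    """Count distinct class labels in an iterable of class-file lines.

    Assumption: the class label is the first whitespace-separated
    field on every non-empty line.
    """
    classes = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            classes.add(fields[0])
    return len(classes)


# Made-up sample lines in the assumed "CLASS [prob] word" layout.
sample = [
    "CLASS-00001 0.6 monday",
    "CLASS-00001 0.4 tuesday",
    "CLASS-00002 1 paris",
    "",
]
n_classes = count_classes(sample)
```

For a real class file, the same function can be given an open file object, since iterating over a file yields its lines.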

Best,
 Mats





