Mats Svenson svmats at
Thu Oct 25 04:13:03 PDT 2007

I guess the -save option as implemented in ngram-class is not very useful. Typically, I'm not interested in testing the classes that appear at the beginning of the clustering process, but rather the classes induced in the final steps. If the number of clustered words is high, the current option results in creating an enormous number of useless files.

It'd be much more practical if the user could explicitly specify which class granularities should be saved or, alternatively, if there were some -startsave option that would start saving class files only close to the end of the clustering.

Would that be easy to implement?

One more thing: is there an easy way to find out how many classes a particular class file contains without writing a script? The number of iterations doesn't say that directly, and I'm not sure whether it can be computed as NUMBER_OF_WORDS_IN_THE_VOCAB - NUMBER_OF_ITERATIONS - NUMBER_OF_WORDS_IN_THE_NO_CLASS_VOCAB.
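
For comparison, the kind of script I'd like to avoid would look roughly like the sketch below. It assumes that each non-empty line of the class file starts with the class name as its first whitespace-separated field (my reading of the classes-format layout, not something I have verified for every ngram-class output); count_classes is just an illustrative name.

```python
def count_classes(lines):
    """Count distinct class names, assuming the class name is the
    first whitespace-separated field on every non-empty line."""
    classes = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            classes.add(fields[0])
    return len(classes)
```

It could be called on an open file handle, one saved class file at a time, and the result compared with the subtraction formula above.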


