add new words to current classes

Andreas Stolcke stolcke at speech.sri.com
Tue Jun 12 10:49:30 PDT 2007


Sergey Protasov wrote:
> Dear experts,
>
> I have a small corpus with a vocabulary of 10K words that are split
> into 200 classes.
>
> I also have a large corpus with a vocabulary of 30K words (20K of them
> new).
>
> I want to assign the 20K new words to the 200 existing classes.
>
> How can I do this using SRILM?
>
> I don't want to move any of the original 10K words from one class to
> another.

I agree this would be a useful function to have, but unfortunately it is
not currently implemented. It should be fairly straightforward to do
based on the existing code.

You basically need to load an existing class definition, then create
singleton classes for the new words, and start incremental merging with
the number of classes limited to the original number (200 in your case).
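
To make the steps above concrete, here is a rough toy sketch in Python.
It is not SRILM code and not how ngram-class.cc is structured; the
function names are made up for illustration. It assumes a plain
class-bigram likelihood as the merging criterion, recomputes that
likelihood from scratch for every candidate (far too slow for 20K
words), and only ever merges a new-word singleton into one of the
original classes, most frequent new word first, so that the old words
stay where they are.

```python
import math
from collections import Counter


def class_bigram_loglik(bigrams, word2class):
    """Log-likelihood of bigram counts under a class-bigram model,
    p(w2 | w1) = p(c2 | c1) * p(w2 | c2), with maximum-likelihood estimates."""
    class_pairs = Counter()  # N(c1, c2)
    history = Counter()      # N(c1), class seen as bigram history
    predicted = Counter()    # N(c2), class seen as predicted position
    word_counts = Counter()  # N(w2), word seen as predicted position
    for (w1, w2), n in bigrams.items():
        c1, c2 = word2class[w1], word2class[w2]
        class_pairs[c1, c2] += n
        history[c1] += n
        predicted[c2] += n
        word_counts[w2] += n
    ll = sum(n * math.log(n / history[c1])
             for (c1, _), n in class_pairs.items())
    ll += sum(n * math.log(n / predicted[word2class[w]])
              for w, n in word_counts.items())
    return ll


def assign_new_words(tokens, old_classes):
    """Hypothetical helper: keep old_classes fixed and place every unseen
    word from tokens into one of the existing classes."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    freq = Counter(tokens)

    # Step 1: load the existing class definition (here simply a dict).
    word2class = dict(old_classes)

    # Step 2: give every new word its own singleton class.
    new_words = sorted(set(tokens) - set(old_classes),
                       key=lambda w: (-freq[w], w))
    for w in new_words:
        word2class[w] = ("SINGLETON", w)

    # Step 3: merge each singleton into the original class that yields the
    # highest likelihood. Old classes are never merged with each other, so
    # the final number of classes equals the original number.
    targets = sorted(set(old_classes.values()))
    for w in new_words:
        word2class[w] = max(
            targets,
            key=lambda c: class_bigram_loglik(bigrams, {**word2class, w: c}),
        )
    return word2class
```

Calling assign_new_words(tokens, old_classes) with a token list and a
word-to-class dict returns a new dict covering both old and new words.
A real implementation would need incremental updates of the counts
instead of the brute-force recomputation used here.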

If you care about this problem you should try to modify ngram-class.cc
and share the results with the rest of us! I'd be happy to give some
guidance and review changes if you are willing to do the work.

Andreas

