add new words to current classes
Andreas Stolcke
stolcke at speech.sri.com
Tue Jun 12 10:49:30 PDT 2007
Sergey Protasov wrote:
> Dear experts,
>
> I have a small corpus with a 10K-word vocabulary that is split into 200
> classes.
>
> And I have a big corpus with a 30K-word vocabulary (20K of them new words).
>
> I want to split the 20K new words into the 200 existing classes.
>
> How can I do this with SRILM?
>
> I don't want to move any of the old 10K words from one class to another.
I agree this would be a useful function to have, but unfortunately it is
not currently implemented.
It should be fairly straightforward to do based on the existing code.
You basically need to load an existing class definition, then create
singleton classes for the new words, and start incremental merging with
the number of classes limited to the original set.
If you care about this problem, you should try to modify ngram-class.cc
and share the results with the rest of us! I'd be happy to give some
guidance and review changes if you are willing to do the work.
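The procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not SRILM's ngram-class code. It assumes the objective is the likelihood of a class bigram model, P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i). It also assumes that each line of the class definition file has the form `CLASS [PROB] WORD` with a single-word expansion. Finally, it assumes that "limited to the original set" means each new word's singleton class may only be merged into one of the original classes, so the old words never move. The function names `read_class_definitions`, `class_bigram_loglik` and `assign_new_words` are hypothetical.

```python
import math
from collections import Counter


def read_class_definitions(lines):
    """Hypothetical loader: parse 'CLASS [PROB] WORD' lines into {word: class}.
    Assumes every line expands to a single word (the last field)."""
    word2class = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word2class[parts[-1]] = parts[0]
    return word2class


def class_bigram_loglik(tokens, word2class):
    """Log-likelihood of tokens[1:] under P(c_i | c_{i-1}) * P(w_i | c_i),
    with maximum-likelihood estimates taken from the same tokens."""
    classes = [word2class[w] for w in tokens]
    bigrams = Counter(zip(classes, classes[1:]))
    history = Counter(classes[:-1])        # class counts in the history position
    emitted_words = Counter(tokens[1:])    # word counts in the predicted position
    emitted_classes = Counter(classes[1:])
    ll = 0.0
    for (c1, _c2), n in bigrams.items():
        ll += n * math.log(n / history[c1])
    for w, n in emitted_words.items():
        ll += n * math.log(n / emitted_classes[word2class[w]])
    return ll


def assign_new_words(tokens, old_word2class):
    """Hypothetical helper: give every unclassed word a singleton class, then
    greedily merge singletons into the original classes one at a time, always
    taking the merge that leaves the log-likelihood highest. Old words never move."""
    old_classes = sorted(set(old_word2class.values()))
    word2class = dict(old_word2class)
    pending = sorted({w for w in tokens if w not in old_word2class})
    for w in pending:
        word2class[w] = ("singleton", w)  # temporary one-word class
    while pending:
        best = None
        for w in pending:
            for c in old_classes:
                trial = dict(word2class)
                trial[w] = c
                ll = class_bigram_loglik(tokens, trial)
                if best is None or ll > best[0]:
                    best = (ll, w, c)
        _, w, c = best
        word2class[w] = c
        pending.remove(w)
    return word2class
```

Take the classes A = {the, a}, N = {cat, dog}, V = {runs, sleeps} and the token sequence "the cat runs a dog sleeps the mouse runs a mouse sleeps". With these inputs, the new word "mouse" ends up in class N. The sketch recomputes the objective from scratch for every candidate merge, which is far too slow for 20K words and a large corpus. A real change to ngram-class.cc would presumably update the class counts incrementally instead.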
Andreas
More information about the SRILM-User mailing list