[SRILM User List] Question of replace-words-with-classes

Andreas Stolcke stolcke at icsi.berkeley.edu
Mon Apr 2 15:08:54 PDT 2012


On 3/31/2012 8:00 PM, Meng Chen wrote:
> Hi, I ran into a problem when training a class-based language model 
> with the replace-words-with-classes command. My commands are as follows:
>
>   * ngram-class -vocab wlist -text training_set -numclasses 200
>     -incremental -classes output.classes
>   * replace-words-with-classes classes=output.classes training_set >
>     training_set_classes
>
> After these two steps, I found that training_set_classes contains both 
> words and classes. The remaining words are OOVs with respect to wlist, 
> and I don't need them at all. Shouldn't these words be mapped to <unk> 
> in CLASS-00001? How should I handle this situation? Does SRILM provide 
> a script to map these OOVs to CLASS-00001, or do I need to write one 
> myself?

It must be the case that wlist does not contain all the words in 
training_set, and therefore output.classes does not cover the entire 
vocabulary.
In that case replace-words-with-classes will only operate on words 
contained in the class definitions, and leaves all other words unchanged.

You can easily augment the class definitions to add an extra class that 
catches all your OOV words.  The format should be self-explanatory, or 
check the classes-format(5) man page.
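
One way to build such a catch-all class is sketched below in Python. This is a minimal sketch, not an SRILM tool: the function names and the class name CLASS-OOV are hypothetical, it assumes every line of the class file has the form "CLASS probability word" as written by ngram-class, and it assumes relative frequencies are acceptable as membership probabilities. Check classes-format(5) before relying on it.

```python
from collections import Counter


def covered_words(class_lines):
    """Collect the words that already appear in a class expansion."""
    covered = set()
    for line in class_lines:
        fields = line.split()
        # Assumed layout: fields[0] = class name, fields[1] = probability,
        # fields[2:] = the expansion (a single word for ngram-class output).
        if len(fields) >= 3:
            covered.update(fields[2:])
    return covered


def oov_class_lines(class_lines, text_lines, class_name="CLASS-OOV"):
    """Return class-definition lines that put every uncovered word of the
    training text into one extra class (class_name is a hypothetical label)."""
    covered = covered_words(class_lines)
    counts = Counter()
    for line in text_lines:
        for word in line.split():
            if word not in covered:
                counts[word] += 1
    total = sum(counts.values())
    # Membership probability = relative frequency among the OOV tokens,
    # so the probabilities within the new class sum to 1.
    return [
        f"{class_name} {count / total:.6g} {word}"
        for word, count in sorted(counts.items())
    ]


def write_augmented(classes_path, text_path, out_path):
    """Copy the class file and append the extra OOV class to the copy."""
    with open(classes_path) as f:
        class_lines = f.read().splitlines()
    with open(text_path) as f:
        text_lines = f.read().splitlines()
    extra = oov_class_lines(class_lines, text_lines)
    with open(out_path, "w") as f:
        f.write("\n".join(class_lines + extra) + "\n")
```

Calling write_augmented("output.classes", "training_set", "augmented.classes") would write the extended definitions to augmented.classes (a hypothetical file name), which can then be passed as classes=augmented.classes to replace-words-with-classes.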

Andreas

