[SRILM User List] Class-based LM

Andreas Stolcke stolcke at icsi.berkeley.edu
Thu Jul 21 10:26:37 PDT 2011


You can find some tutorial information on how to use induced-class-based 
LMs at

http://ssli.ee.washington.edu/courses/ee517/srilm.html

The basic mistake in your case is that you are feeding tag.txt instead 
of corpus.txt to ngram-class. ngram-class induces classes from the words 
themselves, so it needs the untagged word text as input.
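
Since the tagged corpus is described as one word per line with a 
tab-separated tag, while SRILM tools read text with one sentence per 
line, a conversion step is probably needed first. The following is a 
minimal sketch of such a step, not part of SRILM. It assumes that a 
blank line marks a sentence boundary in the tagged file, which the 
original message does not state; the function name and the sample data 
are made up for illustration.

```python
def split_tagged_corpus(lines):
    """Turn 'word<TAB>tag' lines into parallel word and tag sentences.

    Assumption (not from the original post): a blank line separates
    sentences in the one-token-per-line input.
    """
    word_sents, tag_sents = [], []
    words, tags = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if words:
                word_sents.append(" ".join(words))
                tag_sents.append(" ".join(tags))
                words, tags = [], []
            continue
        word, tag = line.split("\t", 1)
        words.append(word.strip())
        tags.append(tag.strip())
    # Flush the last sentence when the input does not end with a blank line.
    if words:
        word_sents.append(" ".join(words))
        tag_sents.append(" ".join(tags))
    return word_sents, tag_sents


# Hypothetical sample standing in for the tagged corpus.
sample = ["the\tDT\n", "dog\tNN\n", "barks\tVBZ\n", "\n",
          "it\tPRP\n", "runs\tVBZ\n"]
word_sents, tag_sents = split_tagged_corpus(sample)
print("\n".join(word_sents))  # word text, suitable as corpus.txt
print("\n".join(tag_sents))   # tag text, suitable for a POS-tag LM
```

The word sentences would be written to corpus.txt (the input for 
ngram-class), and the tag sentences to a separate file for training the 
POS-based LM.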

replace-words-with-classes is typically used to prepare training data 
once the class definitions exist (either from ngram-class or from 
hand-crafting).
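
The order of steps described above can be sketched as a small Python 
helper that only assembles the command lines. corpus.txt and output.cls 
come from the original message; the other file names, the number of 
classes (100), and the n-gram order (3) are placeholder assumptions. The 
script name is written as replace-words-with-classes here, so check the 
name in your installation. The final perplexity step with ngram -classes 
is an addition for checking the result and is not part of the advice 
above.

```python
import shlex


def class_lm_steps(corpus, classes="output.cls",
                   class_corpus="corpus.classes.txt", lm="class.lm",
                   test_text="test.txt", num_classes=100, order=3):
    """Return (argv, stdout_path) pairs for an induced-class LM pipeline.

    All defaults except output.cls are hypothetical placeholders.
    stdout_path is None when the tool writes its own output files.
    """
    return [
        # 1. Induce word classes from the untagged word text.
        (["ngram-class", "-text", corpus,
          "-numclasses", str(num_classes), "-classes", classes], None),
        # 2. Replace words with their class labels; the result goes to stdout.
        (["replace-words-with-classes", f"classes={classes}", corpus],
         class_corpus),
        # 3. Train an n-gram LM on the class-mapped text.
        (["ngram-count", "-text", class_corpus,
          "-order", str(order), "-lm", lm], None),
        # 4. Evaluate on word text, expanding classes via the class file.
        (["ngram", "-lm", lm, "-classes", classes, "-ppl", test_text], None),
    ]


steps = class_lm_steps("corpus.txt")
for argv, out_path in steps:
    # Show each step as it would be typed in a shell.
    redirect = f" > {out_path}" if out_path else ""
    print(shlex.join(argv) + redirect)
```

Nothing is executed here; each printed line can be run by hand, or the 
argument lists can be passed to subprocess.run once SRILM is on the PATH.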

Andreas



Maral Sh. wrote:
> Dear SRILM users,
> I am trying to train a class-based LM.  I was hoping there was a 
> step-by-step guide for doing this, but I couldn't find one.
> I have to create two different LMs. My corpus is POS-tagged, and one of 
> the LMs should be based on the POS tags. I should also create an LM based 
> on automatic clustering (I removed the tags and should perform the 
> automatic clustering on the untagged corpus).
> The format of my tagged corpus is one word per line along with its 
> tag, separated by a tab.
> I first extracted the tags into a separate text file and ran the 
> following command on it ->
>
> ./ngram-class -text tag.txt -full -classes output.cls -class-counts output.counts
>
> then I tried
>
> ./replace-word-with-classes classes=output.cls corpus.txt > tag.txt
>
> In the end the tag.txt file looked something like the corpus.txt file 
> (it had a word -space- tag per line format).
>
> The thing is, I don't know what to do next, or whether what I have done 
> so far is correct.
> I would appreciate it if anyone could help me ASAP. I have deadlines on Monday.
>
>
> Maral


