[SRILM User List] Class-based LM
Andreas Stolcke
stolcke at icsi.berkeley.edu
Thu Jul 21 10:26:37 PDT 2011
You can find some tutorial information on how to use induced-class-based
LMs at
http://ssli.ee.washington.edu/courses/ee517/srilm.html
The basic mistake in your case is that you are trying to feed tag.txt
instead of corpus.txt to ngram-class.
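ngram-class and the other SRILM tools read -text input as one sentence per line of whitespace-separated tokens, so a tagged corpus with one word per line has to be reshaped first. Below is a minimal awk sketch. It assumes that blank lines separate sentences, and the helper names, the file names and the POS_ tag prefix are all hypothetical. split_tagged_corpus writes the word-only text to cluster with ngram-class, plus a parallel tag-only text. make_pos_classes writes hand-crafted class definitions in SRILM's classes format (class, probability, word), estimating each probability as the word's relative frequency within its tag.

```shell
# Hypothetical helpers; function and file names are illustrative only.
# Assumed input: one "word<TAB>tag" pair per line, blank line between sentences.
# Tags get a hypothetical "POS_" prefix so class names cannot collide with
# words such as "." that may also be used as tags.

# split_tagged_corpus IN WORDS_OUT TAGS_OUT
# Writes parallel files with one space-separated sentence per line.
split_tagged_corpus() {
  LC_ALL=C awk -F'\t' -v wout="$2" -v tout="$3" '
    function flush() {
      if (ws != "") { print ws > wout; print ts > tout }
      ws = ""; ts = ""
    }
    /^[ \t]*$/ { flush(); next }    # blank line: end of sentence
    {
      ws = (ws == "" ? $1 : (ws " " $1))
      ts = (ts == "" ? ("POS_" $2) : (ts " POS_" $2))
    }
    END { flush() }                 # last sentence may lack a trailing blank line
  ' "$1"
}

# make_pos_classes IN CLASSES_OUT
# Emits "class p word" lines, with p = count(word, tag) / count(tag).
make_pos_classes() {
  LC_ALL=C awk -F'\t' '
    /^[ \t]*$/ { next }
    { pair["POS_" $2 SUBSEP $1]++; tagc["POS_" $2]++ }
    END {
      for (k in pair) {
        split(k, kt, SUBSEP)
        printf "%s %.6g %s\n", kt[1], pair[k] / tagc[kt[1]], kt[2]
      }
    }
  ' "$1" | LC_ALL=C sort > "$2"
}
```

For example, `split_tagged_corpus corpus.txt words.txt tags.txt` followed by `make_pos_classes corpus.txt pos.classes`. Sorting the class file only makes the output deterministic. SRILM does not require it.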
replace-words-with-classes is typically used to prepare training data
once the class definitions exist (either from ngram-class or crafted by
hand).
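The steps for both LMs can be outlined roughly as follows once class definitions exist. This is a sketch under stated assumptions, not a canonical recipe. It assumes the SRILM tools are on the PATH, that words.txt, tags.txt and test.txt hold one sentence per line, and that pos.classes holds word-to-tag class definitions. The file names, -numclasses 100, -order 3 and the Witten-Bell discounting (-wbdiscount) are arbitrary illustrative choices. run, run_to and DRY_RUN are hypothetical helpers that print the commands instead of executing them when DRY_RUN is set.

```shell
# Sketch only: assumes ngram-class, replace-words-with-classes, ngram-count
# and ngram are on PATH. All file names and option values are illustrative.

run() {      # run CMD ARGS...  (prints instead of running if DRY_RUN is set)
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

run_to() {   # run_to OUTFILE CMD ARGS...  (stdout of CMD goes to OUTFILE)
  local out=$1; shift
  if [ -n "${DRY_RUN:-}" ]; then echo "$* > $out"; else "$@" > "$out"; fi
}

# LM over automatically induced classes: cluster the words, not the tags.
induced_class_lm() {
  run ngram-class -text words.txt -numclasses 100 -incremental \
    -classes induced.classes
  # Replace each word in the training text by its class label.
  run_to words.classes.txt \
    replace-words-with-classes classes=induced.classes words.txt
  run ngram-count -text words.classes.txt -order 3 -wbdiscount -lm induced.lm
  # -classes makes ngram interpret the LM as an N-gram over the defined classes.
  run ngram -lm induced.lm -classes induced.classes -order 3 -ppl test.txt
}

# LM over POS tags used as hand-crafted classes.
pos_class_lm() {
  # tags.txt already is the training text with every word replaced by its tag.
  run ngram-count -text tags.txt -order 3 -wbdiscount -lm pos.lm
  run ngram -lm pos.lm -classes pos.classes -order 3 -ppl test.txt
}
```

For example, `DRY_RUN=1 induced_class_lm` lists the four commands without running anything. Without DRY_RUN, the same call executes them in order.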
Andreas
Maral Sh. wrote:
> Dear SRILM users,
> I am trying to train a class-based LM. I was hoping there was a
> step-by-step guide for doing this, but I couldn't find one.
> I have to create two different LMs. My corpus is POS tagged, and one of
> the LMs should be based on POS tags. I should also create an LM based on
> automatic clustering (I removed the tags, and I should perform the
> automatic clustering on this untagged corpus).
> The format of my tagged corpus is one word per line, followed by its
> tag, separated by a tab.
> I first extracted the tags into a separate text file and ran the
> following command on it:
>
> ./ngram-class -text tag.txt -full -classes output.cls -class-counts output.counts
>
> then I tried
>
> ./replace-words-with-classes classes=output.cls corpus.txt > tag.txt
>
> In the end, the tag.txt file looked something like the corpus.txt file
> (one word, a space, and a tag per line).
>
> The thing is, I don't know what to do next, or whether what I have done
> so far is correct.
> I would appreciate it if anyone could help me ASAP. I have deadlines on Monday.
>
>
> Maral
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user