[SRILM User List] Class-based LM

Maral Sh. maralthemoral at gmail.com
Thu Jul 21 03:23:27 PDT 2011


Dear SRILM users,
I am trying to train a class-based LM. I was hoping there would be a
step-by-step guide for doing this, but I couldn't find one.
I have to create two different LMs. My corpus is POS-tagged, and one of the
LMs should be based on the POS tags. I should also create an LM based on
automatic clustering (I removed the tags, and I should perform the automatic
clustering on this untagged corpus).
The format of my tagged corpus is one word per line along with its tag,
separated by a tab.
I first excluded the tags in a separate text file and ran the following
command on it:

./ngram-class -text tag.txt -full -classes output.cls -class-counts output.counts

Then I tried:

./replace-words-with-classes classes=output.cls corpus.txt > tag.txt

In the end, the tag.txt file looked much like the corpus.txt file (one word,
a space, and a tag per line).
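
A minimal sketch of the preprocessing described above, for illustration only.
SRILM tools read text as one sentence per line with whitespace-separated
tokens, so a one-token-per-line corpus usually has to be joined into
sentences first. The file names, the sample data, and the helper function
to_sentences are hypothetical, and the sketch assumes that a blank line marks
a sentence boundary, which the post does not state.

```shell
# Hypothetical sample in the format described: word<TAB>tag, one token per line.
# ASSUMPTION: a blank line marks a sentence boundary (not stated in the post).
printf 'the\tDT\ndog\tNN\nbarks\tVBZ\n\nit\tPRP\nruns\tVBZ\n' > tagged_sample.txt

# Join each sentence onto one line, keeping only the chosen column
# (col=1 for words, col=2 for tags).
to_sentences() {
  awk -F'\t' -v col="$1" '
    NF == 0 { if (line != "") print line; line = ""; next }
    { line = (line == "" ? $col : line " " $col) }
    END { if (line != "") print line }
  ' "$2"
}

to_sentences 1 tagged_sample.txt > words_sample.txt
to_sentences 2 tagged_sample.txt > tags_sample.txt

cat words_sample.txt
cat tags_sample.txt
```

The word file would then serve as the untagged corpus for automatic
clustering, and the tag file as the text for the POS-based model.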

The thing is, I don't know what to do next, or whether what I have done so
far is correct.
I would appreciate it if anyone could help me as soon as possible. I have
deadlines on Monday.


Maral