[SRILM User List] Class-based LM
Maral Sh.
maralthemoral at gmail.com
Thu Jul 21 03:23:27 PDT 2011
Dear SRILM users,
I am trying to train a class-based LM. I was hoping there would be a
step-by-step guide for doing this, but I couldn't find one.
I have to create two different LMs. My corpus is POS-tagged, and one of the
LMs should be based on the POS tags. The other LM should be based on
automatic clustering (I removed the tags, and the automatic clustering should
be performed on this untagged corpus).
My tagged corpus has one word per line, followed by its tag; the two are
separated by a tab.
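
To make the format concrete, here is a minimal Python sketch of how such a
file could be split into a word stream and a tag stream with one sentence per
line, which is the layout the -text option expects as far as I understand.
The sample data is made up, the function names are hypothetical, and the
sketch assumes that blank lines separate sentences, which may not match the
actual corpus.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # list of (word, tag) pairs


def read_tagged(lines: Iterable[str]) -> List[Sentence]:
    """Group 'word<TAB>tag' lines into sentences.

    Assumption: a blank line marks a sentence boundary.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        word, tag = line.split("\t", 1)
        current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences


def to_streams(sentences: List[Sentence]) -> Tuple[List[str], List[str]]:
    """Return parallel lists: one line of words and one line of tags per sentence."""
    words = [" ".join(w for w, _ in s) for s in sentences]
    tags = [" ".join(t for _, t in s) for s in sentences]
    return words, tags


# Made-up sample in the word<TAB>tag format described above.
sample = ["the\tDT\n", "dog\tNN\n", "barks\tVBZ\n", "\n", "it\tPRP\n", "runs\tVBZ\n"]
word_lines, tag_lines = to_streams(read_tagged(sample))
print(word_lines)
print(tag_lines)
```

Writing word_lines and tag_lines to two separate files would give one
untagged text for the automatic clustering and one tag-only text for the
POS-based LM.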
I first extracted the tags into a separate text file and ran the
following command on it:
./ngram-class -text tag.txt -full -classes output.cls -class-counts output.counts
Then I tried
./replace-word-with-classes classes=output.cls corpus.txt > tag.txt
In the end, the tag.txt file looked something like the corpus.txt file (one
word, a space, and a tag per line).
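
For reference, this is a rough Python sketch of what I understand the
replacement step to do in the simplest case. It is not SRILM's actual code.
It assumes the class file has lines of the form "CLASS [probability] word",
handles only single-word class members, and uses made-up class names and
sample data.

```python
from typing import Dict, Iterable


def load_word_to_class(class_lines: Iterable[str]) -> Dict[str, str]:
    """Build a word -> class map from 'CLASS [prob] word' lines.

    Assumption: single-word expansions only; a word that is listed under
    several classes keeps the first class seen.
    """
    mapping: Dict[str, str] = {}
    for line in class_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip empty or malformed lines
        cls, rest = fields[0], fields[1:]
        # If more than one field follows the class name and the first one
        # parses as a number, treat it as the optional probability.
        if len(rest) > 1:
            try:
                float(rest[0])
                rest = rest[1:]
            except ValueError:
                pass
        if len(rest) == 1:
            mapping.setdefault(rest[0], cls)
    return mapping


def replace_words(sentence: str, mapping: Dict[str, str]) -> str:
    """Replace each known word with its class; unknown words are kept."""
    return " ".join(mapping.get(tok, tok) for tok in sentence.split())


# Hypothetical class definitions and input sentence.
classes = ["CLASS-00001 0.6 dog", "CLASS-00001 0.4 cat", "CLASS-00002 1 barks"]
mapping = load_word_to_class(classes)
print(replace_words("the dog barks", mapping))
```

With these made-up classes, "dog" and "barks" are replaced by their class
names, while "the" is left unchanged because it belongs to no class.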
The problem is that I don't know what to do next, or whether what I have
done so far is correct.
I would appreciate it if anyone could help me ASAP. I have a deadline on Monday.
Maral