a naive question need your help
stolcke at speech.sri.com
Thu Aug 14 13:16:18 PDT 2008
jian zhu wrote:
> Hi professor stolcke:
> I am a computer programmer from China. Thanks a lot for your great
> work on language model, and unselfishly sharing the perfect slm
> I have a naive question and need your help.
> I want to use the "disambig" tool for part-of-speech tagging, but I
> am having some trouble with it.
> I use the tool as follows:
> disambig -text file -map wtfile -lm ttfile
> file --- word text
> wtfile --- P(word|tag2) emission file
> ttfile --- P(tag2|tag1) transition file
> ttfile can be trained using the "ngram-count" tool, but I don't know
> how to produce wtfile with SRILM.
> Its format is as follows:
> -map file
> Specifies the file containing the V1-to-V2 mapping information.
> Each line of file contains the mapping for a single word in V1:
> w1 w21 [p21] w22 [p22] ...
> where w1 is a word from V1, which has possible mappings w21, w22,
> ... from V2. Optionally, each of these can be followed by a numeric
> string for the probability p21, which defaults to 1. The number is
> used as the conditional probability P(w1|w21), but the program does
> not depend on these numbers being properly normalized.
> Thank you very much!
> Looking forward to your help.
There is no ready-made tool for estimating and formatting the map
probabilities. It is such a simple format that you should be able to
write a Perl script or similar to estimate these probabilities from
data. Note that for taggers it is usually more convenient to construct
the map file with probabilities p(w21 | w1) and use the -scale option.
To estimate p(POS | word) you can count occurrences in a tagged training
corpus, possibly with some smoothing to allow for unseen combinations
(for example, unseen words and open-class POS tags). In the absence of
training data you can try a uniform POS distribution.
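As a rough illustration of the counting approach described above, here is a
minimal Python sketch (Python rather than Perl is only a choice for this
example). It assumes a tagged corpus with one sentence per line and tokens
written as word/TAG; that input format and all function names are assumptions
made for illustration, not anything SRILM defines. It writes unsmoothed
relative frequencies p(POS | word) in the map-file layout quoted above, so the
result is meant to be used together with -scale.

```python
import sys
from collections import Counter, defaultdict


def parse_tagged_line(line, sep="/"):
    """Split a line of word/TAG tokens into (word, tag) pairs.

    The word/TAG input format is an assumption of this sketch.
    Tokens without a separator are skipped.
    """
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST separator, so "a/b/NN" -> ("a/b", "NN")
        word, _, tag = token.rpartition(sep)
        if word and tag:
            pairs.append((word, tag))
    return pairs


def count_word_tags(tagged_sentences):
    """Count how often each word occurs with each tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return counts


def map_lines(counts):
    """Yield map-file lines 'w1 w21 p21 w22 p22 ...' with p = p(tag | word).

    Probabilities are plain relative frequencies (no smoothing).
    """
    for word in sorted(counts):
        tag_counts = counts[word]
        total = sum(tag_counts.values())
        fields = [word]
        for tag, n in sorted(tag_counts.items()):
            fields.append(tag)
            fields.append(f"{n / total:.6g}")
        yield " ".join(fields)


def write_map(corpus_path, map_path):
    """Read a tagged corpus file and write the corresponding map file."""
    with open(corpus_path, encoding="utf-8") as corpus:
        counts = count_word_tags(parse_tagged_line(line) for line in corpus)
    with open(map_path, "w", encoding="utf-8") as out:
        for line in map_lines(counts):
            out.write(line + "\n")


# Hypothetical command-line use: python make_map.py tagged_corpus wtfile
if __name__ == "__main__" and len(sys.argv) == 3:
    write_map(sys.argv[1], sys.argv[2])
```

With the map written to wtfile, the invocation from the original question
would then become something like "disambig -text file -map wtfile -lm ttfile
-scale". Words that never occur in the training corpus get no map entry here;
handling them (smoothing, or a uniform distribution over open-class tags) is
left out of this sketch.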
I know that people have built POS taggers with SRILM. I suggest that
you direct further questions to the srilm-user mailing list.
> Best Regards