LM for tagged words
stolcke at speech.sri.com
Thu Feb 13 12:44:26 PST 2003
In message <3E4A63E0.3000003 at ira.uka.de> you wrote:
> What does "LM support for tagged words is incomplete" (in the "Bugs"
> section of the help for ngram-count) more precisely mean?
> I wanted to use ngram-count with the -tagged option to build a language
> model over word/tag pairs, and then use this LM with hidden-ngram to
> find hidden tags.
> It does not seem to work (no tag is found) - is it because of the LM?
> How could I build the LM I need?
The ngram-count -tagged option has nothing to do with the "tagging"
done by hidden-ngram. ngram-count -tagged is used to build an LM
that uses word tags (classes) for estimating backoff probabilities.
(This feature is rather experimental, and hasn't been touched in a long time,
hence the warning in the man page.)
For hidden-ngram you build a standard LM with ngram-count, treating the
event tags as regular words. You just prepare a training text file that
contains data like
word tag word tag word word ...
(The way hidden-ngram works, it makes sense to have multiple
words without intervening tags, but not to have multiple tags between
words.)
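As a rough illustration of that training format, here is a minimal Python sketch that interleaves words with event tags. The tag name "<B>" and the helper training_line are made up for this example and are not part of SRILM; any token that never occurs as an ordinary word should work as a tag.

```python
# Hypothetical event tag; not an SRILM built-in.
BOUNDARY = "<B>"

def training_line(words, tag_after):
    """Interleave words with event tags for LM training text.

    words     -- list of word strings
    tag_after -- dict mapping a word's index to the tag that follows it
    Each word is followed by at most one tag, so two tags are never adjacent.
    """
    out = []
    for i, word in enumerate(words):
        out.append(word)
        if i in tag_after:
            out.append(tag_after[i])
    return " ".join(out)

# Tags after the 2nd and 4th words; the words in between carry no tag.
line = training_line(["so", "yes", "i", "agree"], {1: BOUNDARY, 3: BOUNDARY})
print(line)
```

Lines like this, one sentence or segment per line, would make up the text file given to ngram-count.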
You then give this LM to hidden-ngram, together with the list of tags
(-hidden-vocab) and some test data that contains only words but no tags.
It will output the automatically tagged data.
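The whole procedure might be sketched as below, assuming a single hypothetical tag "<B>" and made-up file names (train.txt, tags.vocab, test.txt, tagged.lm). The -hidden-vocab option is the one mentioned above; -order, -text and -lm are the usual ngram-count and hidden-ngram options as I understand the man pages, so please check them against your installed version. The script only writes the input files and prints the commands; it does not run SRILM.

```python
# All tag and file names here are assumptions made for illustration.
TAGS = ["<B>"]

def strip_tags(line, tags):
    """Remove event tags, leaving the words-only text that hidden-ngram tags."""
    return " ".join(tok for tok in line.split() if tok not in tags)

tagged = "so yes <B> i agree <B>"
untagged = strip_tags(tagged, TAGS)

with open("train.txt", "w") as f:      # training text, tags treated as words
    f.write(tagged + "\n")
with open("tags.vocab", "w") as f:     # list of hidden tags, one per line
    f.write("\n".join(TAGS) + "\n")
with open("test.txt", "w") as f:       # test text: words only, no tags
    f.write(untagged + "\n")

# Commands to run by hand once the files exist (assumed option names).
train_cmd = ["ngram-count", "-order", "3", "-text", "train.txt",
             "-lm", "tagged.lm"]
tag_cmd = ["hidden-ngram", "-order", "3", "-lm", "tagged.lm",
           "-hidden-vocab", "tags.vocab", "-text", "test.txt"]
print(" ".join(train_cmd))
print(" ".join(tag_cmd))
```

With real data, the training text would of course be much larger than one line, and the test text would be separate material rather than the training line with its tags removed.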
hidden-ngram is rather heavily used and should be working fine. Let me know
if you have problems.
Hope this helps,