LM for tagged words

Andreas Stolcke stolcke at speech.sri.com
Thu Feb 13 12:44:26 PST 2003


In message <3E4A63E0.3000003 at ira.uka.de> you wrote:
> Hi,
> What does "LM support for tagged words is incomplete" (in the "Bugs" 
> section of the help for ngram-count) more precisely mean?
> I wanted to use ngram-count with the -tagged option to build a language 
> model over word/tag pairs, and then use this LM with hidden-ngram to 
> find hidden tags.
> It does not seem to work (no tag is found) - is it because of the LM? 
> How could I build the LM I need?

Amelie,

The ngram-count -tagged option has nothing to do with the "tagging"
done by hidden-ngram.  ngram-count -tagged is used to build an LM
that uses word tags (classes) for estimating backoff probabilities.
(This feature is rather experimental, and hasn't been touched in a long time,
hence the warning in the man page.)

For hidden-ngram you build a standard LM with ngram-count, treating the
event tags as regular words.  You just prepare a training text file that
contains data like

		word tag word tag word word ...

(Given the way hidden-ngram works, it makes sense to have multiple
words without intervening tags, but not to have multiple tags between
words.)
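
As a rough illustration of that constraint, the following Python sketch
checks that a training line never has two tags in a row, and strips the
tags to produce the words-only form used for test data.  The tag names
(COMMA, PERIOD) and the sample line are made up for the example; they are
not part of SRILM.

```python
# Hypothetical tag inventory; substitute your own event tags.
TAGS = {"COMMA", "PERIOD"}


def has_adjacent_tags(tokens, tags):
    """Return True if two tags appear next to each other in tokens."""
    prev_is_tag = False
    for tok in tokens:
        is_tag = tok in tags
        if is_tag and prev_is_tag:
            return True
        prev_is_tag = is_tag
    return False


def strip_tags(tokens, tags):
    """Return the words only, i.e. the form of the untagged test data."""
    return [tok for tok in tokens if tok not in tags]


# Made-up training line: tags are treated as ordinary tokens.
train_line = "hello COMMA how are you PERIOD fine thanks".split()

ok = not has_adjacent_tags(train_line, TAGS)
words_only = strip_tags(train_line, TAGS)
print(ok, " ".join(words_only))
```

Running a check like this over the whole training file before calling
ngram-count is a cheap way to catch lines with several tags between words.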

You then give this LM to hidden-ngram, together with the list of tags
(-hidden-vocab) and some test data that contains only words but no tags. 
It will output the automatically tagged data.
hidden-ngram is rather heavily used and should be working fine. Let me know
if you have problems.
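
The two steps can be sketched as a pair of command lines, built here in
Python as argument lists.  This is only a sketch under assumptions: the
file names (train.txt, events.lm, tags.txt, test.txt) and the trigram
order are placeholders, and tags.txt is assumed to list one tag per line.
Check the option names against the ngram-count and hidden-ngram man pages
of your installed version.

```python
def build_lm_command(train_text, lm_file, order=3):
    """Arguments for training a standard LM on text with tags as words."""
    return ["ngram-count", "-order", str(order),
            "-text", train_text, "-lm", lm_file]


def build_tagging_command(lm_file, tag_list, test_text, order=3):
    """Arguments for tagging words-only text using the LM and tag list."""
    return ["hidden-ngram", "-order", str(order), "-lm", lm_file,
            "-hidden-vocab", tag_list, "-text", test_text]


# Placeholder file names, chosen for illustration only.
train_cmd = build_lm_command("train.txt", "events.lm")
tag_cmd = build_tagging_command("events.lm", "tags.txt", "test.txt")

# Print the commands rather than running them, so the sketch works
# even where SRILM is not installed.
print(" ".join(train_cmd))
print(" ".join(tag_cmd))
```

With the SRILM binaries on your PATH, each list can be passed to
subprocess.run(cmd, check=True); the tagged text should appear on the
standard output of the second command.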

Hope this helps,

--Andreas



