Can SRILM cope with XML-tagged corpora?
mjglab at googlemail.com
Tue Jan 13 06:43:54 PST 2009
I'd like to use SRILM to generate bigram counts from the British National
Corpus, which I have in XML format. I see that the paper
"SRILM - An Extensible Language Modeling Toolkit", in Proc. Intl. Conf.
Spoken Language Processing, Denver, Colorado, September 2002
mentions that support for SGML-tagged formats is regarded as desirable. Has
this support been implemented in the toolkit yet?
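
For concreteness, a minimal sketch of the obvious fallback (stripping the
markup before counting) might look like the following. It is only an
illustration under stated assumptions: that sentences are wrapped in <s>
elements and tokens in <w> (word) and <c> (punctuation) elements, and that
the file fits in memory. The function names and the sample text are
hypothetical, and the bigram counting is done in plain Python here, not by
SRILM.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def sentences_from_xml(xml_text, sent_tag="s", token_tags=("w", "c")):
    """Yield each sentence as a list of token strings, with markup removed.

    Assumes (not confirmed by SRILM or this thread) that sentences are
    <s> elements and tokens are <w>/<c> elements, possibly nested.
    """
    root = ET.fromstring(xml_text)
    for sent in root.iter(sent_tag):
        tokens = []
        # iter() walks all descendants, so tokens nested inside other
        # elements within the sentence are still picked up.
        for el in sent.iter():
            if el.tag in token_tags and el.text and el.text.strip():
                tokens.append(el.text.strip())
        if tokens:
            yield tokens


def bigram_counts(sentences):
    """Count adjacent token pairs, padding each sentence with <s> and </s>."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


# Hypothetical two-sentence sample in the assumed markup.
SAMPLE = (
    '<doc>'
    '<s n="1"><w c5="AT0">The </w><w c5="NN1">cat </w>'
    '<w c5="VVD">sat</w><c c5="PUN">.</c></s>'
    '<s n="2"><w c5="AT0">The </w><w c5="NN1">cat </w>'
    '<w c5="VVD">ran</w><c c5="PUN">.</c></s>'
    '</doc>'
)

if __name__ == "__main__":
    sents = list(sentences_from_xml(SAMPLE))
    # One sentence per line, tokens separated by spaces.
    for tokens in sents:
        print(" ".join(tokens))
    for (first, second), n in sorted(bigram_counts(sents).items()):
        print(first, second, n)
```

The one-sentence-per-line text printed first is the plain-text form that
could instead be saved to a file (say, a hypothetical corpus.txt) and counted
with something like "ngram-count -text corpus.txt -order 2 -write counts.txt",
if no built-in handling of the tags is available.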