Can SRILM cope with XML-tagged corpora?

Matt Green mjglab at
Tue Jan 13 06:43:54 PST 2009

I'd like to use SRILM to generate bigram counts from the British National
Corpus, which is distributed in XML format. The paper
 "SRILM - An Extensible Language Modeling Toolkit", in Proc. Intl. Conf.
Spoken Language Processing, Denver, Colorado, September 2002
mentions that support for SGML-tagged formats is regarded as desirable. Has
this support been implemented in the toolkit yet?
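
To make the question concrete, here is a minimal sketch of the preprocessing I would otherwise have to do myself: strip the markup and keep one sentence per line, which is plain text a counting tool can read. It assumes sentences are wrapped in <s> elements with tokens in <w> and <c> children and no XML namespace, as in the small sample below; the function names and the sample are illustrative only, not part of SRILM.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tags assumed to hold tokens: <w> for words, <c> for punctuation.
TOKEN_TAGS = ("w", "c")


def sentences_from_xml(xml_text):
    """Yield each <s> element as a list of token strings."""
    root = ET.fromstring(xml_text)
    for s in root.iter("s"):
        tokens = [
            el.text.strip()
            for el in s.iter()
            if el.tag in TOKEN_TAGS and el.text and el.text.strip()
        ]
        if tokens:
            yield tokens


def to_plain_text(sentences):
    """Join sentences into one-sentence-per-line plain text."""
    return "\n".join(" ".join(tokens) for tokens in sentences)


def bigram_counts(sentences):
    """Count adjacent token pairs within each sentence (no boundary markers)."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical sample in the assumed layout, not taken from the corpus.
sample = (
    '<bncDoc><stext><u><s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN1" hw="cat" pos="SUBST">cat </w>'
    '<w c5="VVD" hw="sit" pos="VERB">sat</w>'
    '<c c5="PUN">.</c>'
    "</s></u></stext></bncDoc>"
)

sents = list(sentences_from_xml(sample))
plain = to_plain_text(sents)
counts = bigram_counts(sents)
print(plain)
```

The plain text written this way could then be saved to a file (say bnc.txt, a made-up name) and passed to ngram-count with -order 2 -text bnc.txt -write bnc.counts. The bigram_counts helper is only a sanity check on small samples; unlike SRILM it adds no sentence-boundary tokens.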

More information about the SRILM-User mailing list