Can SRILM cope with XML-tagged corpora?

Matt Green mjglab at googlemail.com
Tue Jan 13 06:43:54 PST 2009


I'd like to use SRILM to generate bigram counts from the XML version of the
British National Corpus. The paper "SRILM - An Extensible Language Modeling
Toolkit" (Proc. Intl. Conf. on Spoken Language Processing, Denver, Colorado,
September 2002) mentions that support for SGML-tagged formats would be
desirable. Has this support been implemented in the toolkit yet?
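SRILM's `ngram-count` reads plain text with one sentence per line and whitespace-separated tokens. A possible workaround, not a built-in SRILM feature, is to strip the markup before counting. The sketch below assumes BNC-style markup in which sentences are `<s>` elements whose tokens are `<w>` (word) and `<c>` (punctuation) elements, possibly nested inside `<mw>` multi-word units. Verify these tag names against your copy of the corpus. The script name `bnc2text.py` is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Iterator, List

# Assumed token tags (BNC-style): <w> for words, <c> for punctuation.
TOKEN_TAGS = ("w", "c")


def local(tag: str) -> str:
    """Drop a '{namespace-uri}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def sentences_from_xml(source) -> Iterator[List[str]]:
    """Yield each <s> element as a list of whitespace-free tokens.

    `source` may be a file path or a binary file object.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) == "s":
            # iter() walks all descendants, so <w> inside <mw> is included.
            tokens = [
                tok.text.strip()
                for tok in elem.iter()
                if local(tok.tag) in TOKEN_TAGS and tok.text and tok.text.strip()
            ]
            if tokens:
                yield tokens
            elem.clear()  # discard the sentence's subtree to limit memory use


def main(paths: List[str]) -> None:
    # Write one sentence per line, tokens separated by single spaces.
    for path in paths:
        for tokens in sentences_from_xml(path):
            sys.stdout.write(" ".join(tokens) + "\n")


if __name__ == "__main__":
    main(sys.argv[1:])
```

The resulting text file could then be passed to the counting tool, e.g. `python bnc2text.py *.xml > bnc.txt` followed by `ngram-count -order 2 -text bnc.txt -write bnc.counts`. Streaming with `iterparse` rather than loading each file whole keeps memory use modest on large corpus files.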

thanks,
--matt


More information about the SRILM-User mailing list