unicode & many files

Alexy Khrabrov deliverable at gmail.com
Wed Sep 12 08:50:50 PDT 2007


How good is the Unicode support -- e.g. for UTF-8?  I've fed it some
UTF-8 Cyrillic text and it did fine.  How does it know whether we're
using multibyte or single-byte characters?

Another question -- how do I feed it many text files from a directory?
Should I pass multiple -text options after cooking the files somehow,
or use -read on an accumulating count file?

Cheers,
Alexy


