ARPA format (sorting)

Paul Melis melis at cs.utwente.nl
Tue Mar 11 14:21:59 PST 2003


Hello Andreas,

Is there any explicit sort order that LMs in ARPA format should follow? Specifically, is there a standard sort order for the words of unigrams, bigrams and trigrams (e.g. <unk> first, then diacritics, then alphabetically, then ...)?
We've had some problems with ARPA files written by SRILM that the CMU toolkit can't handle, and we suspect a problem in the sorting of the n-grams.
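
To make the suspicion testable, here is a minimal sketch of how the ordering inside each n-gram section of an ARPA file could be inspected. The function names are hypothetical, the file is assumed to be plain text with whitespace-separated fields, and "sorted" here means plain code-point order of the word tuples. That ordering is only an assumption for illustration; whether the CMU toolkit expects it is exactly the open question.

```python
import re


def ngrams_by_order(lines):
    """Collect the word tuples of each \\N-grams: section, in file order."""
    sections = {}
    order = None  # None while outside any \N-grams: section
    for raw in lines:
        line = raw.strip()
        header = re.fullmatch(r"\\(\d+)-grams:", line)
        if header:
            order = int(header.group(1))
            sections[order] = []
        elif line == "\\end\\":
            order = None
        elif order is not None and line:
            fields = line.split()
            # fields[0] is the log probability, the next `order` fields are
            # the words, and an optional back-off weight may follow.
            sections[order].append(tuple(fields[1:1 + order]))
    return sections


def unsorted_orders(lines):
    """Return the n-gram orders whose entries are not in code-point order."""
    sections = ngrams_by_order(lines)
    return [n for n, grams in sorted(sections.items()) if grams != sorted(grams)]
```

Calling unsorted_orders on the lines of a file, for example unsorted_orders(open(path, encoding="utf-8")) with path standing for the ARPA file in question, lists the orders (1, 2, 3, ...) whose entries deviate from that assumed ordering. An empty list only means the file matches this particular ordering, not that another toolkit will accept it.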

Regards,
Paul
-- 
melis at cs.utwente.nl


