ARPA format (sorting)
Paul Melis
melis at cs.utwente.nl
Tue Mar 11 14:21:59 PST 2003
Hello Andreas,
Is there a required sort order for LMs in ARPA format? Specifically, is there a standard order for the words of uni-, bi- and trigrams (e.g. <unk> first, then diacritics, then alphabetically, then...)?
We've had some problems with ARPA files written by SRILM that the CMU toolkit can't handle, and we suspect the cause is the order in which the n-grams are written.
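To narrow this down, a small check along the following lines could show where the n-gram order in a file departs from a given sort order. This is only a sketch: the helper names (read_ngram_sections, first_unsorted) and the inline sample are made up for illustration, and plain code-point string comparison is used as a stand-in order, since I don't know which order (if any) either toolkit expects.

```python
# Sketch: report where each n-gram section of an ARPA file stops being sorted.
# Assumption: "sorted" here means Python's code-point tuple comparison, which
# is NOT a documented ARPA requirement, just one candidate order to test.

SAMPLE = r"""
\data\
ngram 1=3
ngram 2=2

\1-grams:
-1.0 <unk>
-0.5 a -0.3
-0.7 b -0.2

\2-grams:
-0.4 b a
-0.6 a b

\end\
"""

def read_ngram_sections(lines):
    """Return {order: [word tuples in file order]} from ARPA-format lines."""
    sections = {}
    order = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # Section headers look like "\2-grams:"; "\data\" and "\end\"
            # just close whatever section was open.
            if line.endswith("-grams:"):
                order = int(line[1:-len("-grams:")])
                sections[order] = []
            else:
                order = None
            continue
        if order is None:
            continue  # skip the "ngram N=count" header lines
        fields = line.split()
        # fields[0] is the log probability; the next `order` fields are the
        # words; an optional trailing field is the backoff weight.
        sections[order].append(tuple(fields[1:1 + order]))
    return sections

def first_unsorted(ngrams):
    """Index of the first n-gram that compares lower than its predecessor."""
    for i in range(1, len(ngrams)):
        if ngrams[i] < ngrams[i - 1]:
            return i
    return None

if __name__ == "__main__":
    for n, grams in sorted(read_ngram_sections(SAMPLE.splitlines()).items()):
        print(n, first_unsorted(grams))
```

For a real file, the lines could be read with open(path, encoding=...) instead of SAMPLE.splitlines(); the encoding matters if the vocabulary contains diacritics.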
Regards,
Paul
--
melis at cs.utwente.nl