sorting of n-grams

Andreas Stolcke stolcke at speech.sri.com
Fri Aug 4 10:13:35 PDT 2006


In message <9749FF5BC5DC1C48A2D7EEB08AB1CF61AEA8AE at ntmail.pc.itc.it> you wrote:
> Hi Andreas,
> 
> I'm just wondering if there is some special reason
> why the n-grams of an LM are not printed according to
> the ordering given by the 1-grams. In particular,
> the order is not respected from the 3-grams onward.
> The first 3-grams that are printed do not begin
> with the top words in the 1-gram list.

The N-grams are output in the order that corresponds to the 
internal data structure.  Of course no particular order is required
for the external representation, but this order also happens to 
be the most efficient (in terms of hardware caching) when the model
is read back in.

If you want to sort the n-grams in an LM file, as some other
software (such as Sphinx) seems to require, use the sort-lm script
(see man lm-scripts).
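
To illustrate what sorting an LM file involves, here is a minimal Python
sketch that reorders the entries of each \N-grams: section of an ARPA-format
file by their words. It is not the sort-lm implementation. The names
ngram_key and sort_arpa_lines are hypothetical, and the sketch assumes the
usual ARPA layout (log10 probability, N words, optional backoff weight) and
plain code-point ordering of the words, which may not be the exact order a
given tool expects.

```python
import re

# Matches ARPA section headers such as "\1-grams:" or "\3-grams:".
SECTION_RE = re.compile(r"^\\(\d+)-grams:\s*$")
END_MARKER = "\\end\\"


def ngram_key(line, order):
    """Return the words of an n-gram line as a tuple.

    Assumed layout: log10 probability, `order` words, optional backoff weight.
    """
    fields = line.split()
    return tuple(fields[1:1 + order])


def sort_arpa_lines(lines):
    """Return the lines of an ARPA LM with each N-gram section sorted by words."""
    out = []
    order = None   # N of the section being read; None outside any section
    pending = []   # n-gram lines collected for the current section

    def flush():
        # Sort the collected entries by their word tuple and emit them.
        pending.sort(key=lambda entry: ngram_key(entry, order))
        out.extend(pending)
        pending.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        stripped = line.strip()
        header = SECTION_RE.match(line)
        in_section = order is not None
        if in_section and stripped and not header and stripped != END_MARKER:
            pending.append(line)   # an n-gram entry: hold it until the section ends
            continue
        flush()                    # a header, blank line, or end marker closes the section body
        if header:
            order = int(header.group(1))
        elif stripped == END_MARKER:
            order = None
        out.append(line)
    flush()                        # in case the file lacks a final end marker
    return out


# Tiny made-up model, used only to exercise the function.
SAMPLE = [
    "\\data\\",
    "ngram 1=3",
    "ngram 2=2",
    "",
    "\\1-grams:",
    "-1.0\tthe\t-0.3",
    "-1.2\ta\t-0.2",
    "-0.5\t</s>",
    "",
    "\\2-grams:",
    "-0.4\tthe cat",
    "-0.6\ta dog",
    "",
    "\\end\\",
]

print("\n".join(sort_arpa_lines(SAMPLE)))
```

To apply it to a real file, pass the open file object (or its list of lines)
to sort_arpa_lines and write the returned lines back out. The header, the
ngram counts, and the section order are left untouched; only the entries
inside each section move.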

--Andreas 



