question about vocabulary

Anand Venkataraman anand at speech.sri.com
Tue May 4 08:53:13 PDT 2004

> I would like to know if it's possible with the SRILM toolkit to generate
> a vocabulary with the 20000 most frequent words of a corpus for example.

You should be able to achieve this by using "ngram-count -order 1 -write -",
doing a reverse numeric sort on field 2 (the counts), and taking the top 20000.
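As a rough sketch of that pipeline, assuming SRILM's ngram-count is on the PATH and that its unigram counts come out as one "word<TAB>count" pair per line, the sort-and-truncate step could be wrapped like this. The function name top_vocab and the file names corpus.txt and vocab.txt are hypothetical, and the -text option (which names the corpus) is added here because the command above leaves the input implicit:

```shell
#!/bin/sh
# Hypothetical helper: build a vocabulary of the N most frequent words.
# Assumes input lines of the form "word<TAB>count", as written by
# "ngram-count -order 1 -write -".

# top_vocab N: read counts on stdin, print the N highest-count words,
# one word per line.
top_vocab() {
    # -t <tab>: fields are tab-separated
    # -k2,2nr:  numeric, reverse sort on field 2 only (so 10 sorts above 9)
    # cut -f1:  keep just the word column (cut splits on tabs by default)
    sort -t "$(printf '\t')" -k2,2nr | head -n "$1" | cut -f1
}

# Example usage (corpus.txt and vocab.txt are placeholder names):
# ngram-count -order 1 -text corpus.txt -write - | top_vocab 20000 > vocab.txt
```

The numeric flag matters: a plain reverse sort compares counts as strings, which would rank a word seen 9 times above one seen 10 times. Note also that the counts may include sentence-boundary tokens such as <s> and </s>, which you may want to filter out before truncating.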


More information about the SRILM-User mailing list