write-vocab

Andreas Stolcke stolcke at speech.sri.com
Tue May 15 09:28:24 PDT 2007


B. Plank wrote:
> Dear SRILM-team,
>
> Is there a parameter to get the n most frequent words out of a LM (i.e.,
> like restricting the -write-vocab output of "ngram -order 1" to just the
> n most frequent words)? I am sure there is, I just don't see it right now.
>
> Thank you for any help,
> Barbara
>
>   
Actually, there is no such tool.  Word frequencies are not generally
available in the LM, only the unigram probabilities.  Since the unigram
probabilities are usually a monotonic function of the unigram
frequencies, you could write a small script that extracts the words from
the unigram section of the LM file and sorts them by their probabilities.
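
A minimal sketch of such a script, assuming the LM is stored in the
standard ARPA text format, where the unigram section starts with a
"\1-grams:" line and each entry reads "log10prob word [backoff]".  The
function name top_unigrams, the exclude parameter, and the file name in
the usage comment are illustrative and not part of SRILM.  Skipping
<s>, </s> and <unk> is an assumption about what is wanted, since these
are not ordinary words.

```python
import heapq


def top_unigrams(arpa_lines, n, exclude=("<s>", "</s>", "<unk>")):
    """Return the n words with the highest unigram log probability.

    arpa_lines: iterable of text lines from an ARPA-format LM file.
    exclude:    tokens to skip (assumed here to be the special tags).
    """
    entries = []
    in_unigrams = False
    for raw in arpa_lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True          # unigram section begins
            continue
        if not in_unigrams or not line:
            continue                    # header lines and blank separators
        if line.startswith("\\"):
            break                       # next section (\2-grams: or \end\)
        fields = line.split()
        if len(fields) < 2:
            continue                    # malformed entry, ignore
        logprob, word = float(fields[0]), fields[1]
        if word not in exclude:
            entries.append((logprob, word))
    # Highest log probability first; ties keep file order.
    best = heapq.nlargest(n, entries, key=lambda e: e[0])
    return [word for _, word in best]


# Usage sketch (hypothetical file name):
#   with open("lm.arpa") as f:
#       print("\n".join(top_unigrams(f, 1000)))
```

The ranking follows the smoothed unigram probabilities, so it matches
the frequency ranking only to the extent that the monotonic relationship
mentioned above holds for the LM in question.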

Andreas




