write-vocab
Andreas Stolcke
stolcke at speech.sri.com
Tue May 15 09:28:24 PDT 2007
B. Plank wrote:
> Dear SRILM-team,
>
> is there a parameter to get the n most frequent words out of a LM? (i.e.
> like restricting the write-vocab of "ngram -order 1" to just output the
> n most frequent words?) I am sure there is, just now I don't see it.
>
> Thank you for any help,
> Barbara
>
>
Actually, there is no such tool. The frequency of words is not generally
available in the LM, only their unigram probabilities. Since the unigram
probabilities are usually a monotonic function of the unigram frequencies,
you could write a small script that extracts the words from the unigram
section of the LM file and sorts them by their probabilities.
Andreas
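
A minimal sketch of the kind of script suggested above, not an SRILM tool.
It assumes the LM is an uncompressed ARPA-format text file, where the
unigram section starts at the line "\1-grams:" and each entry is
"log10prob word [backoff]". The function name top_unigrams and its skip
parameter are made up for illustration; by default it leaves out the
sentence-boundary and unknown-word tags, which is an assumption about what
"most frequent words" should mean here.

```python
import heapq


def top_unigrams(lines, n, skip=("<s>", "</s>", "<unk>")):
    """Return the n highest-probability (word, log10 prob) pairs.

    lines: an iterable of text lines from an ARPA-format LM file.
    skip:  words to leave out (hypothetical default: the usual tags).
    """
    entries = []
    in_unigrams = False
    for line in lines:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True      # start of the unigram section
            continue
        if not in_unigrams or not line:
            continue                # header lines and blank lines
        if line.startswith("\\"):
            break                   # next section, e.g. \2-grams: or \end\
        fields = line.split()
        if len(fields) < 2:
            continue                # malformed entry, ignore it
        logprob, word = float(fields[0]), fields[1]
        if word not in skip:
            entries.append((word, logprob))
    # Sort by log probability, highest first, and keep the first n.
    return heapq.nlargest(n, entries, key=lambda e: e[1])
```

To use it, open the LM file and pass the file object, for example
top_unigrams(open("lm.arpa"), 1000), where lm.arpa is a placeholder name.
A gzipped LM would have to be decompressed first or opened with
gzip.open(..., "rt"). Note that the ranking reflects smoothed unigram
probabilities, so it matches the frequency ranking only to the extent that
the monotonic relationship mentioned above holds.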