[SRILM User List] ngram pruning

Andreas Stolcke stolcke at speech.sri.com
Sun Dec 20 09:42:31 PST 2009


On 12/19/2009 4:19 AM, 王秋锋 wrote:
> Hi all,
> I built the original bigram model from the text with the ngram-count tool,
> like "ngram-count -text corpus -lm Original_BiGram -order 2".
> The resulting Original_BiGram is very large, so I need to prune it, like "ngram -lm
> Original_BiGram -order 2 -prune... "
> But I found that -prune cannot prune the unigrams: the n in
> -minprune n must be at least 2.
> So what can I do to prune the unigrams?
> Because every word in the corpus ends up in the unigram list, it is too
> large, and some of the words are really useless.
Make a list of the words you want to INclude, then use that as the
vocabulary of your LM:

ngram-count -vocab LIST ...
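
One way to produce such a word list is to keep only the words that occur often enough in the training text. The following is a minimal Python sketch of that idea, not part of SRILM: the function names, the file names, and the frequency threshold `min_count` are all hypothetical, and it assumes a whitespace-tokenized corpus and a vocabulary file with one word per line.

```python
from collections import Counter


def build_vocab(lines, min_count=2):
    """Return the sorted words that occur at least min_count times."""
    # Count whitespace-separated tokens over all lines of the corpus.
    counts = Counter(word for line in lines for word in line.split())
    return sorted(w for w, c in counts.items() if c >= min_count)


def write_vocab(corpus_path, vocab_path, min_count=2):
    """Write the filtered word list to vocab_path, one word per line."""
    with open(corpus_path, encoding="utf-8") as corpus:
        vocab = build_vocab(corpus, min_count)
    with open(vocab_path, "w", encoding="utf-8") as out:
        out.write("\n".join(vocab) + "\n")
    return vocab


if __name__ == "__main__":
    # "corpus" and "LIST" are placeholder file names taken from the thread.
    write_vocab("corpus", "LIST", min_count=2)
```

The resulting file can then be passed as LIST to the ngram-count command above. A frequency cutoff is only one possible selection criterion; any other way of choosing the words to include works equally well.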

Andreas
