[SRILM User List] Compacting language models

Sun Feb 13 13:05:51 PST 2011

I am using SRILM to build some reverse language models, and they are quite big.
Stats: training data: 1.1G words, 88M sentences

The vocabulary was limited to 39k words (wordlist.txt) using:
ngram-count -memuse -order 3 -interpolate -kndiscount -unk -vocab ../lang-data/wordlist.txt -limit-vocab -text ../lang-data/${training}-${reverse}.xml -lm ${training}-reverse-lm${trigram}

Are there other options to reduce the LM size, since the trigrams are 1.7G, without losing too much performance?

Thanks,

Luis
