[SRILM User List] Compacting language models

Luis Uebel lfu20 at hotmail.com
Sun Feb 13 13:05:51 PST 2011


I am using SRILM to build some reverse language models, and they are quite big.
Training data: 1.1G words, 88M sentences

but the vocabulary was restricted to 39k words (wordlist.txt) with:
ngram-count -memuse -order 3 -interpolate -kndiscount -unk -vocab ../lang-data/wordlist.txt -limit-vocab -text ../lang-data/${training}-${reverse}.xml -lm ${training}-reverse-lm${trigram}
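For context, a reverse LM is trained on text whose word order has been reversed within each sentence. The post does not show how the ${training}-${reverse}.xml file was produced, so the following is only a minimal sketch of that preprocessing step. It assumes plain text with one whitespace-tokenized sentence per line. The function name and file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reverse the word order of every input line.
# Assumes plain text, one sentence per line, whitespace-separated tokens.
# Empty lines (NF == 0) produce no output.
reverse_words() {
  awk '{
    # Print fields from last to first, single-space separated, newline at end.
    for (i = NF; i > 0; i--) printf "%s%s", $i, (i > 1 ? " " : "\n")
  }'
}

# Usage (hypothetical file names):
#   reverse_words < train.txt > train-reverse.txt
```

ngram-count adds the sentence-boundary tokens itself, so the reversed text only needs its word order flipped.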


Are there other options to reduce the LM size, given that the trigram model is 1.7G, without losing too much performance?

Thanks,


Luis
