[SRILM User List] language model file : use a light-weight file - hash or compact trie data

Andreas Stolcke stolcke at icsi.berkeley.edu
Thu Nov 6 10:30:13 PST 2014


On 11/5/2014 10:13 AM, kamel nebhi wrote:
> Dear all,
>
> I'm actually using the disambig tool and I want to know if there is a 
> way to make the model smaller.
>
> I'll filter the results, but I would also like to know whether a 
> hash or a compact trie data structure could be used to get much 
> higher compression and to speed up lookups in the LM.
>
>
The ngram LM data structure in SRILM already uses a trie with hash 
tables at the nodes.  It is optimized for speed rather than for space.  
There are specialized data structures for ngram tables, but they make 
dynamic manipulation of the ngram set cumbersome, and that kind of 
manipulation is important in many SRILM algorithms.
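
To illustrate the general idea of a trie with a hash table at each 
node, here is a minimal Python sketch.  It is not SRILM's actual code 
or memory layout; the names TrieNode, NgramTrie, insert, lookup and 
remove are made up for this example.  It only shows why such a 
structure makes adding and removing ngrams cheap, at the cost of one 
hash table per node.

```python
class TrieNode:
    """One trie node; its children live in a hash table (a dict)."""
    __slots__ = ("children", "logprob", "backoff")

    def __init__(self):
        self.children = {}    # word -> TrieNode
        self.logprob = None   # log probability, if an ngram ends here
        self.backoff = None   # optional backoff weight


class NgramTrie:
    """Hypothetical ngram table, keyed by tuples of words."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, ngram, logprob, backoff=None):
        # Walk down the trie, creating missing nodes on the way.
        node = self.root
        for word in ngram:
            child = node.children.get(word)
            if child is None:
                child = TrieNode()
                node.children[word] = child
            node = child
        node.logprob = logprob
        node.backoff = backoff

    def lookup(self, ngram):
        # One hash lookup per word; None if the ngram is not stored.
        node = self.root
        for word in ngram:
            node = node.children.get(word)
            if node is None:
                return None
        return node.logprob

    def remove(self, ngram):
        # Record the path so that empty nodes can be pruned afterwards.
        path = [self.root]
        for word in ngram:
            child = path[-1].children.get(word)
            if child is None:
                return False
            path.append(child)
        if path[-1].logprob is None:
            return False
        path[-1].logprob = None
        path[-1].backoff = None
        # Delete nodes that no longer hold a value or any children.
        for depth in range(len(ngram), 0, -1):
            node = path[depth]
            if node.children or node.logprob is not None:
                break
            del path[depth - 1].children[ngram[depth - 1]]
        return True
```

Each node carries its own dict, so insertion and deletion are simple 
local updates, but the per-node overhead is exactly what a compact, 
static representation would avoid.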

For general hints on conserving space, see the FAQ file 
<http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html>.

Andreas
