Memory problem with ngram-count

Andreas Stolcke stolcke at speech.sri.com
Fri Aug 13 20:07:15 PDT 2004


In message <411D3821.8090308 at ee.washington.edu> you wrote:
> Hi,
> 
> I'm running into some memory limitations with ngram-count and am 
> wondering if anyone has any suggestions.
> 
> I have a very large text file (more than 1GB) as input to ngram-count. I 
> divided this text into smaller files and used the 'make-batch-counts' 
> and 'merge-batch-counts' commands to create a large count-file. Then, I 
> tried to use 'ngram-count -read myfile.counts -lm ...' to estimate a 
> language model. I receive the following error:
> 
> ngram-count: /SRILM/include/LHash.cc:127: void LHash<KeyT, 
> DataT>::alloc(unsigned int) [with KeyT = VocabIndex, DataT = 
> Trie<VocabIndex, unsigned int>]: Assertion `body != 0' failed.
> 
> Does anyone have any suggestions for solving this problem?
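
For reference, the batch-counting pipeline you describe would look roughly
like this; the file names, batch size, and -order value are placeholders I
am assuming, not taken from your message:

	# list the split text files, one per line
	ls split-data/part-* > file-list

	# count each batch of 10 files separately; per-batch count files are
	# written to counts-dir/, and trailing options are passed to ngram-count
	make-batch-counts file-list 10 /bin/cat counts-dir -order 3

	# merge the per-batch count files into a single count file
	merge-batch-counts counts-dir

	# estimate the LM from the merged count file (called myfile.counts
	# here, to match your message) -- this is the step that runs out
	# of memory
	ngram-count -order 3 -read myfile.counts -lm my.lm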

1. Use a binary compiled for "compact" memory use.
   If you are lucky (i.e., the person who installed SRILM did a thorough
   job), you should find these installed in the directory below (a usage
   sketch follows the numbered list):

	$SRILM/bin/${MACHINE_TYPE}_c/ ...

2. Use the make-big-lm script.  See the training-scripts(1) man page
   for details.  (A rough invocation combining this with option 1 is
   sketched after this list.)

3. Find a machine with more memory or swap space.

4. Some combination of the above.
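
As a rough sketch of options 1 and 2 combined (the -order and discounting
options are only examples, and the paths assume a standard SRILM install):

	# prefer the compact-memory binaries, if they were built;
	# MACHINE_TYPE is your platform name as used by the SRILM build
	PATH="$SRILM/bin/${MACHINE_TYPE}_c:$PATH"

	# make-big-lm estimates the LM from the merged counts using much
	# less memory than a plain ngram-count run (see training-scripts(1));
	# -name sets the prefix for its auxiliary files
	make-big-lm -name biglm -read myfile.counts \
		-order 3 -kndiscount -interpolate -lm myfile.lm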

--Andreas