Memory problem with ngram-count
Andreas Stolcke
stolcke at speech.sri.com
Fri Aug 13 20:07:15 PDT 2004
In message <411D3821.8090308 at ee.washington.edu> you wrote:
> Hi,
>
> I'm running into some memory limitations with ngram-count and am
> wondering if anyone has any suggestions.
>
> I have a very large text file (more than 1GB) as input to ngram-count. I
> divided this text into smaller files and used the 'make-batch-counts'
> and 'merge-batch-counts' commands to create a large count-file. Then, I
> tried to use 'ngram-count -read myfile.counts -lm ...' to estimate a
> language model. I receive the following error:
>
> ngram-count: /SRILM/include/LHash.cc:127: void LHash<KeyT,
> DataT>::alloc(unsigned int) [with KeyT = VocabIndex, DataT =
> Trie<VocabIndex, unsigned int>]: Assertion `body != 0' failed.
>
> Does anyone have any suggestions for solving this problem?
1. Use a binary compiled for "compact" memory use.
If you are lucky (i.e., the person who installed SRILM did a thorough
job), you should find these installed in
$SRILM/bin/${MACHINE_TYPE}_c/ ...
2. Use the make-big-lm script. See the training-scripts(1) man page
for details; an example invocation is sketched after this list.
3. Find a machine with more memory or swap space.
4. Some combination of the above.
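
For illustration, here is roughly how options 1 and 2 could be combined
with the merged counts file from your message. The machine type, model
order, and discounting options below are only placeholders; adjust them
to your installation and data:

   # put the compact-memory binaries first in the search path
   export PATH=$SRILM/bin/i686_c:$PATH

   # estimate the LM from the merged counts with reduced memory usage;
   # -name gives a prefix for the auxiliary files make-big-lm creates
   make-big-lm -read myfile.counts \
       -name biglm \
       -order 3 \
       -kndiscount -interpolate \
       -lm myfile.lm

The compact binaries trade some speed for a smaller memory footprint,
so they are mainly worth using when the model otherwise does not fit.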
--Andreas