[SRILM User List] ngram-count hangs and other problems

Andreas Stolcke stolcke at icsi.berkeley.edu
Wed Oct 9 09:43:37 PDT 2013


On 10/9/2013 7:31 AM, E wrote:
> Perhaps the ngramCount file I used crosses some limit on count of a 
> particular ngram. Because some very large count words have positive 
> log probability in the "ug.lm" file. BTW I used bin/i686/ngram-count 
> executable.
> I used Web1T to obtain these counts. Is there a workaround, like 
> assigning artificial counts (= upperlimit) to the troublesome ngrams?

My suspicion is that you're exceeding memory limits with this data.  
Possibly you are also exceeding the range of 32-bit integers with some 
large unigram counts.
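For scale: a signed 32-bit counter tops out at 2,147,483,647, and Web1T-size unigram counts can pass that; a count that wraps around becomes negative, which can surface downstream as nonsense such as positive log probabilities. A minimal sketch of the wraparound (the count value here is hypothetical, not taken from Web1T):

```shell
# Simulate storing a large count in a signed 32-bit integer.
# 3,000,000,000 is a hypothetical unigram count above 2^31 - 1.
count=3000000000
wrapped=$(( (count + 2147483648) % 4294967296 - 2147483648 ))
echo "stored as: $wrapped"    # wraps to a negative value
```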

1) Make sure you're building 64-bit executables.  If "file 
bin/i686/ngram-count" says that it's a 32-bit executable, do a "make 
clean" and rebuild with "make MACHINE_TYPE=i686-m64 ...".
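The check in step 1 can be scripted. In this sketch the `file` output is hard-coded to a sample 32-bit description so it is self-contained; in practice you would use `desc=$(file bin/i686/ngram-count)` against your own SRILM tree:

```shell
# Decide whether a rebuild is needed from the output of `file`.
# Sample description of a 32-bit binary (hypothetical path/output):
desc='bin/i686/ngram-count: ELF 32-bit LSB executable, Intel 80386'

case "$desc" in
  *32-bit*) echo 'rebuild: make clean && make MACHINE_TYPE=i686-m64' ;;
  *)        echo 'binary is already 64-bit' ;;
esac
```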

2) To find out what the memory demand of your job is, try scaling back 
the data size (say, take 1/100 or 1/10 of it), and monitor the memory 
usage with "top" or a similar utility.  Then extrapolate (linearly) to 
the full data size.
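The extrapolation in step 2 is simple arithmetic; a sketch, where the sample's resident set size is hard-coded (in practice it would come from "top" or, e.g., `ps -o rss= -p "$NGRAM_PID"` while ngram-count runs on the reduced data):

```shell
# Linear extrapolation of memory use from a 1/100 sample of the data.
rss_kb=350000     # hypothetical: ~350 MB resident on 1% of the data
scale=100         # inverse of the sampling fraction
est_gb=$(( rss_kb * scale / 1024 / 1024 ))
echo "estimated full-data memory: ~${est_gb} GB"
```

If the estimate exceeds your machine's RAM, that points to the memory-saving techniques in step 3.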

3) If you find your computer doesn't have enough memory, try the 
memory-saving techniques discussed at 
http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html under 
"Large data and memory issues".

Good luck!

Andreas

