[SRILM User List] ngram-count hangs and other problems
Andreas Stolcke
stolcke at icsi.berkeley.edu
Wed Oct 9 09:43:37 PDT 2013
On 10/9/2013 7:31 AM, E wrote:
> Perhaps the ngramCount file I used crosses some limit on the count of a
> particular ngram, because some words with very large counts have positive
> log probability in the "ug.lm" file. BTW, I used the bin/i686/ngram-count
> executable.
> I used Web1T to obtain these counts. Is there a workaround, like
> assigning artificial counts (= upperlimit) to the troublesome ngrams?
My suspicion is that you're exceeding memory limits with this data.
Possibly you are also exceeding the range of 32-bit integers with some
large unigram counts.
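To check whether any counts are actually out of 32-bit range, a quick scan of the counts file helps before rebuilding anything. The following is a minimal sketch, not part of SRILM: it assumes one ngram per line with the count as the last whitespace-separated field (as in Web1T and SRILM counts files), and it uses Python's ctypes, which truncates silently, to show what a count would wrap to if stored in a signed or unsigned 32-bit integer. Which integer type a given build actually uses is an assumption here, so both are reported. The helper names and sample lines are hypothetical.

```python
import ctypes

# Limits of 32-bit integer types. Which one applies to a particular
# executable is an assumption, so the default limit is the stricter one.
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1


def parse_count_line(line):
    """Split a counts line into (ngram, count).

    Assumes the count is the last whitespace-separated field,
    e.g. 'w1 w2 ... wN<TAB>count'.
    """
    fields = line.split()
    return " ".join(fields[:-1]), int(fields[-1])


def find_overflowing_counts(lines, limit=INT32_MAX):
    """Return (ngram, count, as_int32, as_uint32) for every count > limit.

    as_int32 / as_uint32 are the values the count would wrap to in a
    signed / unsigned 32-bit integer (ctypes does no overflow checking).
    """
    flagged = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        ngram, count = parse_count_line(line)
        if count > limit:
            flagged.append((ngram, count,
                            ctypes.c_int32(count).value,
                            ctypes.c_uint32(count).value))
    return flagged


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = ["the\t3000000000", "of\t1500000000", "aardvark\t42"]
    for ngram, count, as_int32, as_uint32 in find_overflowing_counts(sample):
        print(f"{ngram}: {count} -> int32 {as_int32}, uint32 {as_uint32}")
```

In the sample, only "the" is flagged: as a signed 32-bit value its count would wrap to a negative number, which is the kind of corruption that could plausibly produce nonsensical probabilities. For a real file, pass an open file object (it iterates over lines) instead of the sample list.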
1) Make sure you're building 64-bit executables. If "file
bin/i686/ngram-count" says that it's a 32-bit executable, do a "make
clean" and rebuild with "make MACHINE_TYPE=i686-m64 ...".
2) To find out the memory demand of your job, try scaling back
the data size (say, take 1/100 or 1/10 of it), and monitor the memory
usage with "top" or a similar utility. Then extrapolate (linearly) to the
full data size.
3) If you find your computer doesn't have enough memory, try the memory-saving
techniques discussed at
http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html under
"Large data and memory issues".
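The linear extrapolation in step 2 amounts to fitting a line through (data fraction, peak memory) measurements and evaluating it at fraction 1.0. A minimal sketch, assuming you have recorded peak memory by hand from "top" for a few subsampled runs; the least-squares fit and the function names are illustrative choices, not an SRILM tool, and with exactly two measurements the fit is simply the line through both points.

```python
def fit_line(points):
    """Least-squares fit of memory = slope * fraction + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two (fraction, memory) measurements")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("measurements must use different data fractions")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_memory(points, target_fraction=1.0):
    """Predict memory use at target_fraction of the data (1.0 = full data)."""
    slope, intercept = fit_line(points)
    return slope * target_fraction + intercept


if __name__ == "__main__":
    # Hypothetical peak memory (MB) observed in "top" for 1% and 10% runs.
    measurements = [(0.01, 150.0), (0.10, 1200.0)]
    print(f"estimated full-data memory: {extrapolate_memory(measurements):.0f} MB")
```

Measuring at two or more well-separated fractions (e.g. 1/100 and 1/10) lets the intercept absorb fixed overhead, so the estimate is not just the small run multiplied up. Treat the result as a rough guide: memory growth need not be exactly linear in data size.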
Good luck!
Andreas