[SRILM User List] What's the limitation to memory in make-batch-counts?

Andreas Stolcke stolcke at icsi.berkeley.edu
Fri Aug 3 16:39:50 PDT 2012


On 8/3/2012 3:18 AM, Meng Chen wrote:
> Hi, in *make-batch-counts* we need to set the batch-size in order to 
> count faster. It says "For maximum performance, batch-size should be 
> as large as possible without triggering paging". However, I sometimes 
> found it would crash if I set it too large (e.g. 500). So I want to ask 
> whether there is any limitation on batch-size. Suppose every text file 
> in the file list is *a* MB and the memory of the server is *b* MB; 
> should the batch-size then be no larger than *b/a*? Or are there other 
> limitations?

make-batch-counts actually processes the batches sequentially, one at a 
time, so you can devote all of a machine's memory to computing counts, 
unless you have other things running.  If you want to parallelize the 
counting, you have to devise your own method for that.
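(For reference, a minimal invocation sketch, assuming the argument order 
documented in training-scripts(1); the file names, batch size, and the 
use of cat as a pass-through filter are made-up placeholders:

    # 100 input files per batch, counts written to the "counts" directory;
    # trailing arguments are passed through to ngram-count
    make-batch-counts file-list 100 cat counts -order 3

Here 100 is the batch-size being discussed, i.e. the number of data 
files read per batch.)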

Of course, in general there are other things running on a machine, and 
some systems start randomly killing processes when you exhaust their 
memory (e.g., the Linux out-of-memory killer).  I suspect that's what is 
happening in your case.  There is no built-in limitation in 
make-batch-counts, other than the limits imposed by the system.  Another 
reason your job might have crashed is that you are using 32-bit binaries 
and were hitting the 2 or 4 GB limit inherent in 32-bit memory addresses.
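As a back-of-the-envelope check in the spirit of the *b/a* rule of thumb 
above (nothing here is built into SRILM; the free(1) column and the 
headroom factor are assumptions):

    # Conservative batch size: available memory / average file size,
    # halved to leave headroom for the count data structures, which
    # can be several times larger than the raw text.
    avail_mb=$(free -m | awk '/^Mem:/ {print $7}')  # "available" column on recent procps
    avg_file_mb=20                                  # hypothetical average input size in MB
    echo $(( avail_mb / (2 * avg_file_mb) ))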

Andreas
