stolcke at speech.sri.com
Wed Nov 27 10:25:49 PST 2002
In message <3DE4DA7D.9662ED5E at limsi.fr> you wrote:
> I'm a French PhD student, using the toolkit to compute ngram and
> class-ngram models on Hub4 and Hub5 data.
> I recently tried to mix several models with ngram -mix-lm, which works
> fine except for big models (learned on Hub4).
> It seems to be a matter of memory. So I used the -memuse option to get an
> idea of the memory load.
> But this option doesn't reflect the actual memory load. It reports
> 900M, while top running on the same machine shows 2.5G in use.
That's because -memuse only calculates the memory used by the final model.
For static interpolation with -mix-lm the program needs to temporarily
allocate both the input models and the resulting mixture model, so 2.5 GB
doesn't sound too outlandish.
(I know one could implement this operation without requiring all models
to be fully in memory, but I preferred to keep the code simple.)
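To make the memory accounting concrete, here is a toy sketch in Python of
static interpolation. It is not SRILM code: the dict-based models, the
function name mix_static, and the treatment of a missing n-gram as
probability 0.0 (instead of a backed-off estimate) are simplifying
assumptions for illustration only. The point is that both inputs and the
mixture exist at the same time, while only the mixture would be counted
as the final model.

```python
# Toy illustration only, not SRILM code. A "model" is a plain dict mapping
# an n-gram (tuple of words) to a probability. Backoff is ignored: an
# n-gram missing from a model is treated as probability 0.0 (assumption).

def mix_static(model_a, model_b, lam):
    """Build a new model holding lam * P_a + (1 - lam) * P_b per n-gram."""
    mixture = {}
    for ngram in set(model_a) | set(model_b):
        p_a = model_a.get(ngram, 0.0)
        p_b = model_b.get(ngram, 0.0)
        mixture[ngram] = lam * p_a + (1.0 - lam) * p_b
    return mixture


model_a = {("the",): 0.6, ("cat",): 0.4}
model_b = {("the",): 0.5, ("dog",): 0.5}
mixture = mix_static(model_a, model_b, 0.5)

# All three dicts are alive here, so peak usage covers the two inputs
# plus the result, while the final model is the mixture alone.
entries_in_memory = len(model_a) + len(model_b) + len(mixture)
entries_in_final = len(mixture)
print(entries_in_memory, entries_in_final)
```

In this toy case seven entries are held at the peak against three in the
final model, which mirrors the gap between what top shows and what
-memuse reports.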
> So my 2 questions are:
> - is it normal that the -memuse option gives a wrong result?
> - is it normal that the toolkit uses so much memory, or have I done
> something wrong in the installation?
The default build optimizes data structures for speed, not space.
That's why you see a significant portion of memory "wasted" (according to
-memuse output). That's the extra space needed to keep hash tables sparse.
As of SRILM version 1.3.2, you can build a separate version of the binaries
optimized for space, and that's usually worth it once you start dealing with
Hub4 ;-) Follow the instructions under item 9 in the INSTALL file.
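As a rough illustration of why a sparse hash table shows up as "wasted"
space, the following Python sketch computes how many slots a table would
allocate for a given number of entries. The power-of-two growth policy,
the load-factor values, and the function names are assumptions made for
this example; they are not taken from the SRILM sources.

```python
# Hypothetical sizing policy, not SRILM's: the table size is the smallest
# power of two that keeps entries / slots at or below max_load.

def slots_allocated(num_entries, max_load):
    """Return the number of slots allocated for num_entries entries."""
    slots = 1
    while num_entries > slots * max_load:
        slots *= 2
    return slots


def wasted_fraction(num_entries, max_load):
    """Return the fraction of allocated slots that stay empty."""
    slots = slots_allocated(num_entries, max_load)
    return (slots - num_entries) / slots


# A sparse table (low load factor) favors fast lookups but leaves more
# slots empty than a dense one holding the same 900 entries.
sparse_slots = slots_allocated(900, 0.5)
dense_slots = slots_allocated(900, 0.9)
print(sparse_slots, wasted_fraction(900, 0.5))
print(dense_slots, wasted_fraction(900, 0.9))
```

Under these assumed settings the sparse table allocates twice as many
slots as the dense one for the same data, which is the kind of overhead
a space-oriented build would aim to reduce.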