[SRILM User List] ngram-count large count file

Andreas Stolcke stolcke at speech.sri.com
Tue Dec 15 09:43:21 PST 2009


王秋锋 wrote:
> Dear SRILM users,
> I want to build a bigram LM from a file of word-pair counts,
> so I ran:
> ngram-count -read CountFile -lm -BiGrams -order 2
> But after several minutes the process was killed.
> I suspect my CountFile is too large (3.5 GB) and my memory is only 2.0 GB,
> so reading the whole CountFile into memory would overflow it.
> So my questions are:
> 1: Does SRILM read the whole CountFile into memory,
> or does it read some lines, train some bigrams, and repeat?
Computing a given LM probability requires more than just the count of
the corresponding ngram, on account of smoothing and normalization.
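
To illustrate the point about normalization, here is a minimal Python
sketch (not SRILM code). It uses absolute discounting with a fixed
discount as a simplified stand-in for the smoothing methods ngram-count
actually implements; the function name bigram_probs and the toy counts
are made up for the example. Each probability depends on the total count
of its history word, so every bigram sharing that history has to be seen
before any single probability can be computed.

```python
from collections import defaultdict


def bigram_probs(counts, discount=0.5):
    """Return discounted P(w2 | w1) for every bigram in `counts`.

    `counts` maps (w1, w2) -> count. The fixed `discount` is an
    assumption for illustration, not SRILM's default smoothing.
    """
    # Normalization: sum the counts of all bigrams sharing history w1.
    history_total = defaultdict(int)
    for (w1, _w2), c in counts.items():
        history_total[w1] += c

    # Smoothing: subtract the discount from each count, then normalize.
    return {
        (w1, w2): max(c - discount, 0.0) / history_total[w1]
        for (w1, w2), c in counts.items()
    }


counts = {("the", "cat"): 3, ("the", "dog"): 1, ("a", "cat"): 2}
probs = bigram_probs(counts)
```

The probability of "the cat" here cannot be computed from its own count
alone: it also needs the count of "the dog", which could be anywhere in
the count file.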
> 2: How can I build the bigram LM from such a large CountFile?
This is a FAQ; please see
http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html

Andreas



