[SRILM User List] Why does 'ngram -factored' need the countfile

Andreas Stolcke stolcke at icsi.berkeley.edu
Thu Oct 11 09:41:48 PDT 2012


On 10/11/2012 7:59 AM, Gregor Donaj wrote:
> Hi,
>
> I'm trying to rescore factored hypotheses with ngram with the 
> -factored option. I realized that the program requires the countfile 
> to be present as specified in the flm definition file and that it also 
> seems to be loaded into memory. Same with using fngram. Why is this so?
>
> Since for calculating probabilities and perplexities I only need the 
> actual language model file and not the counts, this is a bit annoying 
> as my countfiles are sometimes larger than my RAM.
>
> I kind of "solved" the problem by creating an empty countfile. I 
> tested this on a small example and saw that it calculates the rescored 
> probabilities fine. Is there any way to tell ngram not to look for the 
> countfile? I guess that would be a better solution than just giving 
> the program a dummy countfile that doesn't correspond to the language 
> model file.
>
> Thanks
>
>
I would agree with you, but I'm cc-ing Jeff Bilmes, who wrote the 
original code and might know of other reasons for handling the 
countfiles the way it is done now.

If empty countfiles work for you, then a quick workaround is to write a 
few lines of Perl that replace the countfile names with /dev/null (no 
need to create actual empty files) in any given FLM model file.
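
The rewrite described above could be sketched as follows, in Python
rather than Perl. This is only an illustration under assumptions: it
supposes that each model header line in the FLM file has the form
"child : N parent_1 ... parent_N countfile lmfile num_nodes", with
whitespace around the colon, so the countfile name is the token right
after the N parent entries. The function name replace_countfiles is
hypothetical, and whether /dev/null behaves like the empty countfile
that was reported to work has not been verified here.

```python
def replace_countfiles(flm_text, replacement="/dev/null"):
    """Return flm_text with every countfile name swapped for `replacement`.

    Assumed header layout (not checked against the SRILM sources):
        child : N parent_1 ... parent_N countfile lmfile num_nodes
    Comment lines, blank lines, and backoff-node lines pass through unchanged.
    """
    out_lines = []
    for line in flm_text.splitlines():
        tokens = line.split()
        is_header = (
            len(tokens) >= 3
            and not tokens[0].startswith("#")   # skip comment lines
            and tokens[1] == ":"
            and tokens[2].isdigit()             # N = number of parents
        )
        if is_header:
            count_idx = 3 + int(tokens[2])      # first token after the parents
            if count_idx < len(tokens):
                tokens[count_idx] = replacement
                line = " ".join(tokens)         # header whitespace is normalised
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

To use it, read the FLM definition file into a string, pass it to
replace_countfiles, and write the result to a new file that is then
given to ngram -factored; the original definition file stays intact
for training.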

Andreas


