[SRILM User List] Why does 'ngram -factored' need the countfile
Andreas Stolcke
stolcke at icsi.berkeley.edu
Thu Oct 11 09:41:48 PDT 2012
On 10/11/2012 7:59 AM, Gregor Donaj wrote:
> Hi,
>
> I'm trying to rescore factored hypotheses with ngram with the
> -factored option. I realized that the program requires the countfile
> to be present as specified in the flm definition file and that it also
> seems to be loaded into memory. Same with using fngram. Why is this so?
>
> Since for calculating probabilities and perplexities I only need the
> actual language model file and not the counts, this is a bit annoying
> as my countfiles are sometimes larger than my RAM.
>
> I kind of "solved" the problem by creating an empty countfile. I
> tested this on a small example and saw that it calculates the rescored
> probabilities fine. Is there any way to tell ngram not to look for the
> countfile? I guess that would be a better solution than just giving
> the program a dummy countfile that doesn't correspond to the language
> model file.
>
> Thanks
>
>
I would agree with you, but I'm cc-ing Jeff Bilmes, who wrote the
original code and might know of other reasons for handling the
countfiles the way it is done now.
If empty countfiles work for you then a quick workaround is to write a
few lines of perl that replace the count files with /dev/null (no need
to create actual empty files) in any given FLM model file.
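
The rewrite described above could be sketched roughly as follows, in Python rather than perl. This is only an illustration under one assumption about the FLM definition file: that each model specification line has the form `CHILD : N parent_1 ... parent_N countfile lmfile num_backoff_nodes`, with whitespace around the colon. The function name `null_out_countfiles` is hypothetical and not part of SRILM. Whether an empty count file is safe in general has only been checked on the small example mentioned in the quoted message.

```python
import re


def null_out_countfiles(flm_text, replacement="/dev/null"):
    """Return flm_text with the count file of each model spec line replaced.

    ASSUMPTION: a model spec line looks like
        CHILD : N parent_1 ... parent_N countfile lmfile num_backoff_nodes
    so the count file is the token right after the N parent names.
    All other lines (model count, backoff-graph nodes) are left untouched.
    """
    out = []
    for line in flm_text.splitlines(keepends=True):
        # Keep the match objects so only the one token is rewritten and
        # the original spacing of the line is preserved.
        matches = list(re.finditer(r"\S+", line))
        tokens = [m.group() for m in matches]
        if len(tokens) >= 3 and tokens[1] == ":" and tokens[2].isdigit():
            count_idx = 3 + int(tokens[2])  # skip CHILD, ":", N and N parents
            if count_idx < len(tokens):
                m = matches[count_idx]
                line = line[:m.start()] + replacement + line[m.end():]
        out.append(line)
    return "".join(out)
```

One would read the original FLM definition file, pass its contents through this function, write the result to a new file, and point ngram -factored at that copy, leaving the original definition untouched for training.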
Andreas