[SRILM User List] Why does 'ngram -factored' need the countfile
Andreas Stolcke
stolcke at icsi.berkeley.edu
Thu Oct 11 11:15:22 PDT 2012
FYI, here is Jeff's response, which didn't get propagated to the list
since he isn't subscribed:
On 10/11/2012 10:37 AM, Jeff Bilmes wrote:
> For some backoff strategies (which ones can only be determined from
> the options associated with the backoff graph), one does need the
> count file to determine how to do backoff. If I remember correctly,
> the check for the existence of the count file is done at a very
> different stage in the code from the point where it is determined
> whether the file is needed, which might be the reason why it just,
> by default, always asks for one. But if you are certain that the
> backoff options associated with your backoff graph do not require a
> count file, then it should be safe to use the /dev/null solution
> mentioned by Andreas below ...
Andreas
On 10/11/2012 9:41 AM, Andreas Stolcke wrote:
> On 10/11/2012 7:59 AM, Gregor Donaj wrote:
>> Hi,
>>
>> I'm trying to rescore factored hypotheses with ngram with the
>> -factored option. I realized that the program requires the countfile
>> to be present as specified in the flm definition file and that it
>> also seems to be loaded into memory. Same with using fngram. Why is
>> this so?
>>
>> Since for calculating probabilities and perplexities I only need the
>> actual language model file and not the counts, this is a bit annoying
>> as my countfiles are sometimes larger than my RAM.
>>
>> I kind of "solved" the problem by creating an empty countfile. I
>> tested this on a small example and saw that it calculates the
>> rescored probabilities fine. Is there any way to tell ngram not to
>> look for the countfile? I guess that would be a better solution than
>> just giving the program a dummy countfile that doesn't correspond to
>> the language model file.
>>
>> Thanks
>>
>>
> I would agree with you, but I'm cc-ing Jeff Bilmes, who wrote the
> original code and might know of other reasons for handling the
> countfiles the way it is done now.
>
> If empty countfiles work for you then a quick workaround is to write a
> few lines of perl that replace the count files with /dev/null (no need
> to create actual empty files) in any given FLM model file.
>
> Andreas
>
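
A minimal sketch of the workaround described above, written in Python
rather than perl. It assumes each model line in the FLM file has the
layout "CHILD : N parent_1 ... parent_N countfile lmfile K", so that
the count file is the first field after the N parents; check this
against your own FLM files before relying on it. The function name
null_out_countfiles is made up for illustration and is not part of
SRILM.

```python
def null_out_countfiles(flm_text):
    """Return flm_text with each model's count-file field set to /dev/null.

    Assumed model-line layout (not verified against every FLM variant):
        CHILD : N parent_1 ... parent_N countfile lmfile K
    Lines that do not match this layout are passed through unchanged.
    """
    out = []
    for line in flm_text.splitlines():
        tokens = line.split()
        # A model line has ":" as its second token and a parent count as its third.
        if len(tokens) >= 3 and tokens[1] == ":" and tokens[2].isdigit():
            count_idx = 3 + int(tokens[2])  # first field after the parents
            # Only rewrite if both the count-file and LM-file fields are present.
            if count_idx + 1 < len(tokens):
                tokens[count_idx] = "/dev/null"
                line = " ".join(tokens)
        out.append(line)
    return "\n".join(out) + "\n"
```

To use it, read the FLM file into a string, pass it through
null_out_countfiles, and write the result to a new file that is then
given to ngram -factored. As Jeff notes, this is only safe if the
backoff options in the model do not actually need the counts.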