[SRILM User List] Why does 'ngram -factored' need the countfile

Andreas Stolcke stolcke at icsi.berkeley.edu
Thu Oct 11 11:15:22 PDT 2012


FYI, here is Jeff's response, which didn't get propagated to the list 
since he isn't subscribed:

On 10/11/2012 10:37 AM, Jeff Bilmes wrote:
> For some backoff strategies (which can only be determined based on the 
> options associated with the backoff graph), one does need the count 
> file to determine how to do backoff. If I remember correctly, the 
> check for the existence of the count file is done at a stage in the 
> code far removed from where it is determined whether one is needed, 
> which might be why it always asks for one by default. But if you are 
> certain that the backoff options associated with the backoff graph 
> do not require a count file, then 
> it should be safe to use the /dev/null solution mentioned by Andreas 
> below ...

Andreas

On 10/11/2012 9:41 AM, Andreas Stolcke wrote:
> On 10/11/2012 7:59 AM, Gregor Donaj wrote:
>> Hi,
>>
>> I'm trying to rescore factored hypotheses with ngram with the 
>> -factored option. I realized that the program requires the countfile 
>> to be present as specified in the flm definition file and that it 
>> also seems to be loaded into memory. Same with using fngram. Why is 
>> this so?
>>
>> Since for calculating probabilities and perplexities I only need the 
>> actual language model file and not the counts, this is a bit annoying 
>> as my countfiles are sometimes larger than my RAM.
>>
>> I kind of "solved" the problem by creating an empty countfile. I 
>> tested this on a small example and saw that it calculates the 
>> rescored probabilities fine. Is there any way to tell ngram not to 
>> look for the countfile? I guess that would be a better solution than 
>> just giving the program a dummy countfile that doesn't correspond to 
>> the language model file.
>>
>> Thanks
>>
>>
> I would agree with you, but I'm cc-ing Jeff Bilmes, who wrote the 
> original code and might know of other reasons for handling the 
> countfiles the way it is done now.
>
> If empty countfiles work for you then a quick workaround is to write a 
> few lines of perl that replace the count files with /dev/null (no need 
> to create actual empty files) in any given FLM model file.
>
> Andreas
>
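
For anyone who wants to try the /dev/null workaround described above, here is a minimal sketch in Python rather than perl. It assumes that each model header line in the FLM file has the layout "child : N parent_1 ... parent_N countfile lmfile num_nodes" with a space on both sides of the colon; please check that against your own FLM file before relying on it. The function name null_out_count_files is made up for this example and is not part of SRILM.

```python
def null_out_count_files(flm_text, replacement="/dev/null"):
    """Return flm_text with the count-file field of each model header replaced.

    Assumed header layout (verify against your own FLM file):
        child : N parent_1 ... parent_N countfile lmfile num_nodes
    All other lines (model count, backoff-graph nodes, comments) are kept as-is.
    """
    out = []
    for line in flm_text.splitlines():
        tokens = line.split()
        # A header line has ":" as its second token, followed by the parent count.
        if len(tokens) > 3 and tokens[1] == ":" and tokens[2].isdigit():
            count_idx = 3 + int(tokens[2])  # skip "child", ":", N and the N parents
            if count_idx < len(tokens):
                tokens[count_idx] = replacement
                line = " ".join(tokens)
        out.append(line)
    return "\n".join(out) + "\n"
```

To use it, read the FLM file into a string, pass it to the function, and write the result to a new file that you then give to ngram -factored or fngram. As Jeff notes, this is only safe if the backoff options in your backoff graph do not need the counts.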


