[SRILM User List] Why does 'ngram -factored' need the countfile

Fri Oct 12 01:25:38 PDT 2012

Thank you for your answers. I had already suspected it had something to 
do with backoff strategies. I am currently experimenting only on models 
with fixed backoff paths, so I will use /dev/null.

Gregor
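
For anyone who wants to automate the /dev/null workaround discussed in 
the quoted messages below, here is a minimal sketch in Python (Andreas 
suggested perl; the idea is the same). It assumes that each model header 
line in the FLM definition file has the layout 
"<child> : <N> <parent_1> ... <parent_N> <countfile> <lmfile> <num_nodes>", 
with whitespace around the colon. That layout is my reading of the FLM 
file format and not something confirmed in this thread, so check it 
against your own files. The function name is made up for illustration. 
As Jeff notes, this is only safe when the backoff options in the graph 
do not need the counts.

```python
def null_out_countfiles(flm_text: str) -> str:
    """Return FLM spec text with each count file name replaced by /dev/null.

    Assumed (not confirmed in the thread) header layout:
        <child> : <N> <parent_1> ... <parent_N> <countfile> <lmfile> <num_nodes>
    Lines that do not match this layout are passed through unchanged.
    """
    out = []
    for line in flm_text.splitlines():
        tokens = line.split()
        is_comment = line.lstrip().startswith("#")
        if (not is_comment and len(tokens) >= 3
                and tokens[1] == ":" and tokens[2].isdigit()):
            # The count file is the first token after the N parents.
            count_idx = 3 + int(tokens[2])
            # Require that an LM file token follows the count file.
            if count_idx + 1 < len(tokens):
                tokens[count_idx] = "/dev/null"
                line = " ".join(tokens)
        out.append(line)
    return "\n".join(out) + "\n"
```

To use it, read the FLM definition file into a string, pass it through 
the function, and write the result to a new file that you then give to 
ngram -factored or fngram. Keep the original file for training, since 
the counts are still needed there.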

On 10/11/2012 08:15 PM, Andreas Stolcke wrote:
> FYI, here is Jeff's response, which didn't get propagated to the list 
> since he isn't subscribed:
>
> On 10/11/2012 10:37 AM, Jeff Bilmes wrote:
>> For some backoff strategies (which can only be determined from the 
>> options associated with the backoff graph), one does need the count 
>> file to determine how to back off. If I remember correctly, the check 
>> for the existence of the count file is done at a stage in the code 
>> far removed from the point where it is determined whether the file is 
>> needed, which might be why it always asks for one by default. But if 
>> you are certain that the backoff options associated with your backoff 
>> graph do not require a count file, then it should be safe to use the 
>> /dev/null solution mentioned by Andreas below ...
>
> Andreas
>
> On 10/11/2012 9:41 AM, Andreas Stolcke wrote:
>> On 10/11/2012 7:59 AM, Gregor Donaj wrote:
>>> Hi,
>>>
>>> I'm trying to rescore factored hypotheses using ngram with the 
>>> -factored option. I realized that the program requires the countfile 
>>> to be present as specified in the flm definition file and that it 
>>> also seems to be loaded into memory. The same happens with fngram. 
>>> Why is this so?
>>>
>>> Since for calculating probabilities and perplexities I only need the 
>>> actual language model file and not the counts, this is a bit 
>>> annoying as my countfiles are sometimes larger than my RAM.
>>>
>>> I kind of "solved" the problem by creating an empty countfile. I 
>>> tested this on a small example and saw that it calculates the 
>>> rescored probabilities fine. Is there any way to tell ngram not to 
>>> look for the countfile? I guess that would be a better solution than 
>>> just giving the program a dummy countfile that doesn't correspond to 
>>> the language model file.
>>>
>>> Thanks
>>>
>>>
>> I would agree with you, but I'm cc-ing Jeff Bilmes, who wrote the 
>> original code and might know of other reasons for handling the 
>> countfiles the way it is done now.
>>
>> If empty countfiles work for you then a quick workaround is to write 
>> a few lines of perl that replace the count files with /dev/null (no 
>> need to create actual empty files) in any given FLM model file.
>>
>> Andreas
>>
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user
>