[SRILM User List] Why does 'ngram -factored' need the countfile
Gregor Donaj
gregor.donaj at uni-mb.si
Fri Oct 12 01:25:38 PDT 2012
Thank you for your answers. I already suspected it had something to do with
backoff strategies. I am currently experimenting only on models with
fixed backoff paths, so I will use /dev/null.
Gregor
On 10/11/2012 08:15 PM, Andreas Stolcke wrote:
> FYI, here is Jeff's response, which didn't get propagated to the list
> since he isn't subscribed:
>
> On 10/11/2012 10:37 AM, Jeff Bilmes wrote:
>> For some backoff strategies (which can only be determined based on
>> the options associated with the backoff graph), one does need the
>> count file to determine how to do backoff. If I remember correctly,
>> the check for the existence of the count file is done at a stage in
>> the code far removed from where it is determined whether the file is
>> needed, which might be the reason why it just, by default, always
>> asks for one. But if you are certain that in your backoff options
>> associated with the backoff graph it is not necessary to have a count
>> file, then it should be safe to use the /dev/null solution mentioned
>> by Andreas below ...
>
> Andreas
>
> On 10/11/2012 9:41 AM, Andreas Stolcke wrote:
>> On 10/11/2012 7:59 AM, Gregor Donaj wrote:
>>> Hi,
>>>
>>> I'm trying to rescore factored hypotheses with ngram with the
>>> -factored option. I realized that the program requires the countfile
>>> to be present as specified in the flm definition file and that it
>>> also seems to be loaded into memory. Same with using fngram. Why is
>>> this so?
>>>
>>> Since for calculating probabilities and perplexities I only need the
>>> actual language model file and not the counts, this is a bit
>>> annoying as my countfiles are sometimes larger than my RAM.
>>>
>>> I kind of "solved" the problem by creating an empty countfile. I
>>> tested this on a small example and saw that it calculates the
>>> rescored probabilities fine. Is there any way to tell ngram not to
>>> look for the countfile? I guess that would be a better solution than
>>> just giving the program a dummy countfile that doesn't correspond to
>>> the language model file.
>>>
>>> Thanks
>>>
>>>
>> I would agree with you, but I'm cc-ing Jeff Bilmes, who wrote the
>> original code and might know of other reasons for handling the
>> countfiles the way it is done now.
>>
>> If empty countfiles work for you then a quick workaround is to write
>> a few lines of perl that replace the count files with /dev/null (no
>> need to create actual empty files) in any given FLM model file.
>>
>> Andreas
>>
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user
>
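
The /dev/null workaround discussed in this thread could be sketched roughly
as follows, in Python rather than the Perl that Andreas suggests. This is
only an illustration, not code from SRILM. It assumes that each model
specification line in the FLM file is whitespace-separated and laid out as
"CHILD : N parent_1 ... parent_N COUNTFILE LMFILE NUM_NODES"; please check
that against your own FLM files before relying on it. The function name
null_count_files is hypothetical.

```python
def null_count_files(flm_text: str) -> str:
    """Return flm_text with each model line's count-file field set to /dev/null.

    Assumed (not verified against SRILM) model-line layout:
        CHILD : N parent_1 ... parent_N COUNTFILE LMFILE NUM_NODES
    All other lines (comments, backoff-graph nodes) are passed through as-is.
    """
    out = []
    for line in flm_text.splitlines():
        tokens = line.split()
        is_comment = line.lstrip().startswith("#")
        # A model line has ":" as its second token and a parent count as its third.
        if (not is_comment and len(tokens) >= 3
                and tokens[1] == ":" and tokens[2].isdigit()):
            count_idx = 3 + int(tokens[2])  # first field after the N parents
            if count_idx < len(tokens):
                tokens[count_idx] = "/dev/null"
                line = " ".join(tokens)
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if flm_text.endswith("\n") else "")


if __name__ == "__main__":
    import sys
    # Usage: python null_counts.py < model.flm > model.nocounts.flm
    sys.stdout.write(null_count_files(sys.stdin.read()))
```

As Jeff notes above, this should only be safe when the backoff options in
the FLM file do not need the counts, for example with fixed backoff paths.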