Another detail about the ngram-count

Andreas Stolcke stolcke at speech.sri.com
Wed Jan 30 23:11:44 PST 2008


Sai Tang Huang wrote:
> Hi Andreas,
>  
> Another detail probably worth mentioning is that when I run 
> ngram-count to get the counts and create the LM I get a coeff out of 
> range warning:
>  
> warning: discount coeff 1 is out of range: -3.33329e-17
>  
> I read that this was a bug somewhere in the mailing list archive.
It's not a bug (there was a bug related to this message back in 2003, but 
it has long been fixed).
What it means is that your corpus statistics are such that Good-Turing 
discounting is not applicable: specifically, they lead to a discounting 
factor that is effectively 0.
The effect is that discounting is disabled for this order of n-gram.
For the reasons and countermeasures, please check the FAQ man page or web page.
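
To illustrate how a discount factor can come out as effectively 0, here is a
minimal sketch of the textbook Katz formulation of Good-Turing discounting,
computed from count-of-counts. It is not taken from the SRILM sources: the
function name good_turing_discounts, the epsilon threshold, and the exact
range check (epsilon < coefficient <= 1) are assumptions made for this
example. A coefficient that fails the check is replaced by 1.0, which means
no discounting is applied for that count.

```python
import math


def good_turing_discounts(count_of_counts, max_count, epsilon=1e-6):
    """Katz-style Good-Turing discount coefficients for counts 1..max_count.

    count_of_counts[r] is the number of distinct n-grams seen exactly r times.
    Returns (discounts, out_of_range), where out_of_range lists the counts
    whose coefficient was unusable and was therefore reset to 1.0.
    The epsilon threshold and the range check are assumptions of this sketch.
    """
    n1 = count_of_counts.get(1, 0)
    if n1 == 0:
        raise ValueError("no n-grams with count 1; cannot estimate discounts")

    # Term shared by all coefficients: (k + 1) * n_{k+1} / n_1, with k = max_count.
    common = (max_count + 1) * count_of_counts.get(max_count + 1, 0) / n1
    denom = 1.0 - common

    discounts = {}
    out_of_range = []
    for r in range(1, max_count + 1):
        n_r = count_of_counts.get(r, 0)
        n_r_plus_1 = count_of_counts.get(r + 1, 0)
        if n_r == 0 or denom == 0.0:
            coeff = float("nan")
        else:
            # Good-Turing adjusted count divided by the raw count: r* / r.
            ratio = (r + 1) * n_r_plus_1 / (r * n_r)
            coeff = (ratio - common) / denom
        if not math.isfinite(coeff) or coeff <= epsilon or coeff > 1.0:
            out_of_range.append(r)
            coeff = 1.0  # fall back to "no discounting" for this count
        discounts[r] = coeff
    return discounts, out_of_range


if __name__ == "__main__":
    # Made-up count-of-counts in which 2 * n_2 equals (k + 1) * n_{k+1},
    # so the coefficient for count 1 is 0 before the range check.
    counts = {1: 1000, 2: 100, 3: 60, 4: 50}
    discounts, flagged = good_turing_discounts(counts, max_count=3)
    print("discounts:", discounts)
    print("out of range for counts:", flagged)
```

With count-of-counts that fall off smoothly the coefficients land strictly
between 0 and 1; the fallback only triggers when the statistics are too
irregular for the formula to give a usable value.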
>  
> Could this be affecting the ngram -counts?
Only indirectly, in that the LM will be suboptimal.

Andreas




