Another detail about the ngram-count
Andreas Stolcke
stolcke at speech.sri.com
Wed Jan 30 23:11:44 PST 2008
Sai Tang Huang wrote:
> Hi Andreas,
>
> Another detail probably worth mentioning is that when I run
> ngram-count to get the counts and create the LM I get a coeff out of
> range warning:
>
> warning: discount coeff 1 is out of range: -3.33329e-17
>
> I read that this was a bug somewhere in the mailing list archive.
It's not a bug (there was a bug related to this message back in 2003, but
it has long been fixed).
What it means is that your corpus statistics are such that Good-Turing
discounting is not applicable: specifically, they lead to a discounting
factor that is effectively 0.
The effect is that discounting is disabled for this order of n-gram.
For reasons and countermeasures, please check the FAQ man page or web page.
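
To illustrate how a discount factor can come out as effectively 0, here is a
minimal sketch of the textbook Katz-style Good-Turing coefficient computed
from count-of-counts. It is not taken from the SRILM source: the function
name, the epsilon threshold, the fallback to 1.0 (no discounting) and the
counts in the usage note are all assumptions made for illustration.

```python
import math


def good_turing_coeffs(count_of_counts, max_count, epsilon=3e-6):
    """Return {r: discount coefficient} for r = 1..max_count.

    count_of_counts[r] is the number of distinct n-grams seen exactly r times.
    A coefficient that is not finite, is <= epsilon, or whose raw Good-Turing
    ratio exceeds 1 is reported and replaced by 1.0 (no discounting).
    The epsilon value and the fallback behavior are assumptions of this sketch.
    """
    n1 = count_of_counts.get(1, 0)
    if n1 == 0:
        # Without singletons the estimate is undefined; leave counts as they are.
        return {r: 1.0 for r in range(1, max_count + 1)}

    # Term shared by all counts: (K + 1) * n_{K+1} / n_1, with K = max_count.
    common = (max_count + 1) * count_of_counts.get(max_count + 1, 0) / n1

    coeffs = {}
    for r in range(1, max_count + 1):
        n_r = count_of_counts.get(r, 0)
        if n_r == 0 or common == 1.0:
            coeffs[r] = 1.0
            continue
        # Raw Good-Turing ratio: (r + 1) * n_{r+1} / (r * n_r).
        raw = (r + 1) * count_of_counts.get(r + 1, 0) / (r * n_r)
        coeff = (raw - common) / (1.0 - common)
        if not math.isfinite(coeff) or coeff <= epsilon or raw > 1.0:
            print(f"warning: discount coeff {r} is out of range: {coeff:g}")
            coeff = 1.0
        coeffs[r] = coeff
    return coeffs
```

With hypothetical counts such as `{1: 1000, 2: 240, 3: 150, 4: 120}` and
`max_count=3`, the raw ratio for r = 1 equals the common term, so the
coefficient is 0 and the sketch falls back to 1.0 for that count. With
`{1: 1000, 2: 400, 3: 200, 4: 120}` all three coefficients fall strictly
between 0 and 1.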
>
> Could this be affecting the ngram -counts?
Only indirectly, in that the LM will be suboptimal.
Andreas