make-ngram-pfsg: bad results with new gawk version
Andreas Stolcke
stolcke at speech.sri.com
Fri Mar 5 07:52:32 PST 2004
Thanks for tracking this down. I'll add a note somewhere that one should
set LC_NUMERIC=C or LC_ALL=C for gawk scripts to get correct arithmetic.
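
A minimal sketch of what that could look like, assuming a POSIX shell and
any awk on the PATH. scale_probs is a hypothetical helper (not part of
SRILM), the rint() shown is an assumed round-half-away-from-zero stand-in
for the one in make-ngram-pfsg, and logscale=23027 is simply the value
printed in the debug output quoted below.

```shell
#!/bin/sh
# Sketch: run the awk arithmetic under the C locale so that "." is always
# treated as the decimal separator when input fields become numbers.

# scale_probs is a hypothetical helper, not part of SRILM.
# It reads one log probability per line and prints the scaled integer.
scale_probs() {
    LC_ALL=C awk -v logscale=23027 '
        # Assumed stand-in for rint(): round half away from zero.
        function rint(x) { return (x < 0) ? int(x - 0.5) : int(x + 0.5) }
        function scale_log(x) { return rint(x * logscale) }
        { print scale_log($1) }
    '
}

# The two log probabilities from the debug output in this thread.
printf '%s\n' -0.314718 -2.596963 | scale_probs
```

With the C locale forced this prints -7247 and -59800, the values reported
for the old gawk. Setting the variable only on the awk command line keeps
the rest of the shell session in the user's own locale.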
--Andreas
In message <40487FE8.3020708 at ei.tum.de> you wrote:
> Hi Andreas,
>
> Andreas Stolcke wrote:
> > This is quite odd.
>
> I think so, too :)
>
> > make-ngram-pfsg doesn't perform much arithmetic on the log probabilities
> > in the LM. It only scales and rounds them.
> >
> > Can you apply the scale_log() function in make-ngram-pfsg to your LM
> > probabilities and backoff weights, and extract the cases where the output
> > differs?
>
> old awk:
> add_trans BO -> </s> -0.314718
> scale_log(prob) = -7247
> add_trans <s> -> BO -2.596963
> scale_log(prob) = -59800
>
> new awk:
> logscale = 23027
> add_trans BO -> </s> -0.314718
> scale_log(prob) = 0
> add_trans <s> -> BO -2.596963
> scale_log(prob) = -46054
>
> Note that I printed logscale, which seems to be correct.
> ...
> I think I found the problem:
>
> The float log-probs (x) seem to be truncated to integers when they are
> multiplied by logscale:
>
> function scale_log(x) {
> return rint(x * logscale);
> }
>
> This seems to be related to the locale settings:
> http://mail.gnu.org/archive/html/bug-gnu-utils/2002-07/msg00196.html
>
> If I set LC_ALL="C" in my shell, it also works as expected. So the bad
> behaviour seems to occur with gawk 3.1.3 AND LC_ALL=""...
>
>
> Regards.
> Matthias
>
>
> > --Andreas
> >
> > In message <40475599.9070700 at ei.tum.de> you wrote:
> >
> >>Hello again,
> >>
> >>forgot to say that I tested this with srilm 1.3.3 and 1.3.1.
> >>
> >>Matthias
> >>
> >>Matthias Thomae wrote:
> >>
> >>>Hello Andreas,
> >>>
> >>>make-ngram-pfsg gives me different results with different versions of
> >>>gawk. The header and the links are the same, but the weights differ
> >>>substantially.
> >>>
> >>>I see the old behaviour with gawk 3.1.0 (on debian) and 3.1.1 (on suse),
> >>>and the differing one with 3.1.3-1 and 3.1.3-2 (on debian). The newly
> >>>created PFSGs cause some degradation in ASR accuracy...
> >>>
> >>>Any clues?
> >>>
> >>>Regards.
> >>>Matthias
> >>
> >
> >
>