make-ngram-pfsg: bad results with new gawk version
Matthias Thomae
thomae at ei.tum.de
Fri Mar 5 05:26:00 PST 2004
Hi Andreas,
Andreas Stolcke wrote:
> This is quite odd.
I think so, too :)
> make-ngram-pfsg doesn't perform much arithmetic on the log probabilities
> in the LM. It only scales and rounds them.
>
> Can you apply the scale_log() function in make-ngram-pfsg to your LM
> probabilities and backoff weights, and extract the cases where the output
> differs?
old awk:
add_trans BO -> </s> -0.314718
scale_log(prob) = -7247
add_trans <s> -> BO -2.596963
scale_log(prob) = -59800
new awk:
logscale = 23027
add_trans BO -> </s> -0.314718
scale_log(prob) = 0
add_trans <s> -> BO -2.596963
scale_log(prob) = -46054
Note that I also printed the logscale, which seems to be correct.
...
I think I found the problem:
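The float log-probs (x) seem to be truncated to integers before they are
multiplied by the logscale: -0.314718 is treated as 0, and -2.596963 as -2
(-2 * 23027 = -46054, which is exactly the new output):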
function scale_log(x) {
    return rint(x * logscale);
}
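To double-check the arithmetic, here is a small sketch in Python (not awk) that reproduces both sets of numbers from the traces above. The logscale value 23027 is taken from the trace; the rint() shown here rounds half away from zero, which is an assumption about what rint() does in make-ngram-pfsg, and scale_log_truncated() is a hypothetical helper that only imitates the suspected loss of the fractional part.

```python
LOGSCALE = 23027  # value printed in the "new awk" trace


def rint(x):
    # Assumed rounding rule: round half away from zero.
    # int() truncates toward zero, like awk's int().
    if x < 0:
        return int(x - 0.5)
    return int(x + 0.5)


def scale_log(x):
    # Same shape as the awk function: scale, then round.
    return rint(x * LOGSCALE)


def scale_log_truncated(x):
    # Hypothetical: drop the fractional part of x first,
    # imitating what the new gawk appears to do.
    return rint(int(x) * LOGSCALE)


for prob in (-0.314718, -2.596963):
    print(prob, scale_log(prob), scale_log_truncated(prob))
```

The first column of results matches the "old awk" trace and the second matches the "new awk" trace, which supports the truncation hypothesis.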
This seems to be related to the locale settings; see
http://mail.gnu.org/archive/html/bug-gnu-utils/2002-07/msg00196.html
If I set LC_ALL="C" in my shell, the new gawk also works as expected. So the
bad behaviour seems to occur only with the combination of gawk 3.1.3 AND
LC_ALL=""...
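The following Python sketch illustrates why a locale could have this effect. It is only an emulation under the assumption that the number parser honours the locale's decimal separator and stops at the first character it does not recognise; parse_leading_number() is a hypothetical helper, not gawk code.

```python
import re


def parse_leading_number(text, decimal_point="."):
    """Parse the longest numeric prefix of text (hypothetical helper).

    Only decimal_point is accepted as the decimal separator, so with
    a decimal-comma locale the "." in an LM file ends the number.
    """
    pattern = r"[+-]?\d+(?:" + re.escape(decimal_point) + r"\d+)?"
    match = re.match(pattern, text)
    if match is None:
        return 0.0
    return float(match.group(0).replace(decimal_point, "."))


for field in ("-0.314718", "-2.596963"):
    c_locale = parse_leading_number(field, ".")      # like LC_ALL="C"
    comma_locale = parse_leading_number(field, ",")  # decimal-comma locale
    print(field, c_locale, comma_locale)
```

With "." as the separator the full value is read; with "," only the integer part survives, which is consistent with the truncated values seen in the new output.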
Regards.
Matthias
> --Andreas
>
> In message <40475599.9070700 at ei.tum.de> you wrote:
>
>>Hello again,
>>
>>forgot to say that I tested this with srilm 1.3.3 and 1.3.1.
>>
>>Matthias
>>
>>Matthias Thomae wrote:
>>
>>>Hello Andreas,
>>>
>>>make-ngram-pfsg gives me different results with different versions of
>>>gawk. The header and the links are the same, but the weights differ
>>>substantially.
>>>
>>>I see the old behaviour with gawk 3.1.0 (on debian) and 3.1.1 (on suse),
>>>and the differing one with 3.1.3-1 and 3.1.3-2 (on debian). The newly
>>>created PFSGs cause some degradation in ASR accuracy...
>>>
>>>Any clues?
>>>
>>>Regards.
>>>Matthias
>>
>
>