Fw: GT coefficients
Andreas Stolcke
stolcke at speech.sri.com
Thu Sep 11 03:57:43 PDT 2008
Mirjam Sepesy Maucec wrote:
> Hi,
>
> I found an old question with no answer (in the SRI-LM Mailing List
> Archive). I attach it.
> I have the same problem:
> when I convert decimal commas (,) into dots (.) in the discount files,
> the warnings disappear.
> The discount files were produced by the make-big-lm script.
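The workaround described above (rewriting decimal commas as dots) can be sketched in Python. It also suggests one plausible reason for the "discount coefficient = 0.0" warnings quoted further down: assuming the reading side parses numbers the way C's strtod() does in the C locale (an assumption, not confirmed in this thread), a value written as 0,9534 is read as 0 because parsing stops at the comma. The line layout used in the example (a keyword followed by a value, e.g. `discount1 0,9534`) and all function names here are hypothetical, for illustration only.

```python
import re

# A number written with a decimal comma, e.g. "0,9534", standing alone as a
# whitespace-delimited field (so "1,2,3" or "abc0,5" are left untouched).
_COMMA_DECIMAL = re.compile(r"(?<!\S)(-?\d+),(\d+)(?!\S)")

# Simplified C-locale numeric prefix: sign, digits, optional ".digits", exponent.
_C_NUMBER_PREFIX = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def c_locale_prefix_float(token: str) -> float:
    """Roughly mimic strtod() in the C locale: read the longest numeric prefix
    and ignore the rest; return 0.0 if there is no numeric prefix at all."""
    match = _C_NUMBER_PREFIX.match(token)
    return float(match.group(0)) if match else 0.0


def normalize_decimal_commas(text: str) -> str:
    """Rewrite comma decimals as dot decimals, preserving all whitespace."""
    return _COMMA_DECIMAL.sub(r"\1.\2", text)


def normalize_discount_file(path: str) -> None:
    """Fix a discount file in place (hypothetical helper; back the file up first)."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(normalize_decimal_commas(text))
```

For example, `c_locale_prefix_float("0,9534")` returns 0.0, while `c_locale_prefix_float(normalize_decimal_commas("0,9534"))` returns 0.9534. The regex only touches fields that consist entirely of a comma decimal, so keywords such as `discount1` and integer values are left alone.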
If decimal numbers in the discount files appear with commas instead of
decimal points, that's almost certainly a locale setting issue. The
CHANGES file has the following:
* Matthias Thomae <thomae at ei.tum.de> found that make-ngram-pfsg
(and probably other gawk scripts) may not work correctly with recent
versions of gawk unless the environment is set to LC_NUMERIC=C.
Note that the gt files are computed by gawk scripts.
What I can do is set LC_NUMERIC=C in make-big-lm to avoid the problem
in most common cases.
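From the caller's side, the same fix amounts to running the script with LC_NUMERIC=C in its environment; in a shell this is just a matter of prefixing the command with `LC_NUMERIC=C`. A minimal Python sketch of that idea follows; the helper names are hypothetical. Because LC_ALL, when set, takes precedence over every individual LC_* variable, the sketch also removes LC_ALL so that LC_NUMERIC=C actually takes effect.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def c_numeric_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of `base` (default: the current environment) with LC_NUMERIC=C.

    LC_ALL overrides every individual LC_* variable, so it is removed here;
    otherwise LC_NUMERIC=C would be ignored. With LC_ALL gone, the other locale
    categories fall back to their own LC_* variables or to LANG.
    """
    env = dict(os.environ if base is None else base)
    env.pop("LC_ALL", None)
    env["LC_NUMERIC"] = "C"
    return env


def run_with_c_numeric(cmd: Sequence[str]) -> int:
    """Run `cmd` (e.g. a make-big-lm command line) with C numeric formatting
    and return its exit status."""
    return subprocess.run(list(cmd), env=c_numeric_env(), check=False).returncode
```

To use it, pass the same make-big-lm command line you already run, as a list of arguments. The environment is copied rather than modified, so the calling process keeps its own locale settings.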
Andreas
>
> Best,
>
> Mirjam
>
> ----- Original Message -----
> *From:* ilya oparin <mailto:ioparin at yahoo.co.uk>
> *To:* srilm-list <mailto:srilm-user at speech.sri.com>
> *Sent:* Sunday, June 11, 2006 3:05 PM
> *Subject:* GT coefficients
>
> Hello!
>
> If I compute GT coefficients in advance and then feed the GT files
> (generated by make-gt-discounts) to ngram-count or make-big-lm, I get
> warnings of the kind
>
> file.gt1: line 9: warning: discount coefficient 1 = 0.0
> file.gt1: line 9: warning: discount coefficient 2 = 0.0
> ...
>
> and so on for all the GT parameters. The files themselves are fine and
> do not contain any zeroes. The line number in the warning is the last
> line of the gt file.
> The model I get this way differs from the one I get when I just run
> ngram-count with the same gtmin and gtmax values, without loading GT
> coefficients (it is much smaller in bigrams and trigrams).
> Could anybody tell me why this happens?
>
>
> best regards,
> Ilya
>
>