Fw: GT coefficients

Andreas Stolcke stolcke at speech.sri.com
Thu Sep 11 03:57:43 PDT 2008


Mirjam Sepesy Maucec wrote:
> Hi,
>  
> I found an old question with no answer in the SRI-LM Mailing List 
> Archive; I attach it below.
> I am tackling the same problem:
> when I convert the decimal comma (,) into a dot (.) in the discount files, 
> the warnings disappear.
> The discount files were produced by the make-big-lm script.
If decimal numbers in the discount files appear with commas instead of 
decimal points, that's almost certainly a locale setting issue.  The 
CHANGES file has the following:

        * Matthias Thomae <thomae at ei.tum.de> found that make-ngram-pfsg
        (and probably other gawk scripts) may not work correctly with recent
        versions of gawk unless the environment is set to LC_NUMERIC=C.

Note that the gt files are computed by gawk scripts.

What I can do is set  LC_NUMERIC=C in make-big-lm to avoid the problem 
in most common cases.
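
Until then, a minimal workaround sketch is to force the C numeric locale 
around whatever command generates the discount files.  The wrapper name 
with_c_numeric is hypothetical (it is not part of SRILM), and it assumes 
that dropping LC_ALL for the child process is acceptable, since LC_ALL 
overrides LC_NUMERIC whenever it is set.  The awk call is only a check 
that numbers come out with a decimal point.

```shell
# Hypothetical helper (not part of SRILM): run a command with the C
# numeric locale so awk-based scripts write "0.25" rather than "0,25".
with_c_numeric() {
    (
        # LC_ALL, if set, takes precedence over LC_NUMERIC, so clear it
        # inside this subshell only; the calling shell is unaffected.
        unset LC_ALL
        LC_NUMERIC=C
        export LC_NUMERIC
        "$@"
    )
}

# Quick check: the formatted number should contain a decimal point.
with_c_numeric awk 'BEGIN { printf "%.2f\n", 1 / 4 }'
```

The same wrapper can be put in front of the script that produces the 
gt files; discount files written before the locale was fixed would 
need to be regenerated.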

Andreas

 

>  
> Best,
>  
> Mirjam
>  
> ----- Original Message -----
> *From:* ilya oparin <mailto:ioparin at yahoo.co.uk>
> *To:* srilm-list <mailto:srilm-user at speech.sri.com>
> *Sent:* Sunday, June 11, 2006 3:05 PM
> *Subject:* GT coefficients
>
> Hello!
>
> If I count GT coefficients in advance and then feed GT-files 
> (generated by make-gt-discounts) to ngram-count or make-big-lm, I get 
> warnings of the kind
>
> file.gt1: line 9: warning: discount coefficient 1 = 0.0
> file.gt1: line 9: warning: discount coefficient 2 = 0.0
> ...
>
> and so on for all the gt parameters. The files themselves are all right 
> and do not contain any zeroes. The line number in the warning corresponds 
> to the last line of the gt-file.
> The model I get this way differs from the one I get when I just use 
> ngram-count without loading GT coefficients (it is much smaller 
> in bigrams and trigrams), with the same gtmin and gtmax values.
> Could anybody tell me why it happens like this?
>
>
> best regards,
> Ilya
>
>




