[SRILM User List] ARPA header: number of 4-grams too big for int

Anand Venkataraman venkataraman.anand at gmail.com
Thu Sep 19 08:34:16 PDT 2013


Juan,

One of the first things I would check is whether you're including many
more 4-grams than necessary. To reduce noise from one-off occurrences of
higher-order n-grams, you should probably at least use the -gt4min 2
option to ngram-count. In most cases the quality of the resulting LM
improves, even though the number of n-grams actually included decreases.
Did you do this?

&


On Thu, Sep 19, 2013 at 4:13 AM, Juan Pino <jmp84 at cam.ac.uk> wrote:

> Hello,
>
> I am running this command with version 1.7.0 (the purpose is to fix the
> format of my input lm):
>
> srilm1.7.0/bin/i686-m64/ngram -debug 1 -order 4 -lm MY_LM_IN_ARPA_FORMAT -write-lm MY_OUTPUT_LM
>
> I get this error:
>
> line 6: ngram number -1840328771 out of range
>
> This is because I have this header in my input lm:
> ngram 4=2454638525
>
> So the number of 4-grams exceeds the maximum signed 32-bit int (2147483647).
>
> I've fixed it by replacing
> int nNgrams;
> by
> long nNgrams;
> at line 497 in lm/src/NgramLM.cc and by replacing
> } else if (sscanf(line, "ngram %d=%d", &thisOrder, &nNgrams) == 2) {
> by
> } else if (sscanf(line, "ngram %d=%ld", &thisOrder, &nNgrams) == 2) {
> at line 515 in lm/src/NgramLM.cc
>
> Are there other places in the code that I should change? Is there a
> better solution for my problem?
>
> Thanks very much,
>
> Juan
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user
>

