[SRILM User List] ARPA header: number of 4-grams too big for int

Juan Pino jmp84 at cam.ac.uk
Thu Sep 19 08:49:08 PDT 2013


Hi Anand,

Thanks for the tip. The context is:
-- I generate a 4-gram LM with the KenLM toolkit.
-- I don't think the KenLM toolkit has an option equivalent to -gt4min.
-- I can't build a Kneser-Ney LM with SRILM because of memory
constraints. The memory requirements probably go down with -gt4min 2, so I
would have to check this.
I would still like to know how to modify SRILM to handle more n-grams than
the maximum 32-bit int.

Best,

Juan


On Thu, Sep 19, 2013 at 4:34 PM, Anand Venkataraman <
venkataraman.anand at gmail.com> wrote:

> Juan,
>
> One of the first things I would check is whether you're including far
> more 4-grams than necessary. To reduce noise and one-off occurrences of
> higher-order n-grams, you should probably at least use the -gt4min 2
> option. In most cases the quality of the resulting LM improves, even
> though the number of n-grams it includes decreases. Did you do this?
>
> &
>
>
> On Thu, Sep 19, 2013 at 4:13 AM, Juan Pino <jmp84 at cam.ac.uk> wrote:
>
>> Hello,
>>
>> I am running this command with version 1.7.0 (the purpose is to fix the
>> format of my input lm):
>>
>> srilm1.7.0/bin/i686-m64/ngram -debug 1 -order 4 -lm MY_LM_IN_ARPA_FORMAT -write-lm MY_OUTPUT_LM
>>
>> I get this error:
>>
>> line 6: ngram number -1840328771 out of range
>>
>> This is because I have this header in my input lm:
>> ngram 4=2454638525
>>
>> So the number of 4-grams is bigger than the maximum signed 32-bit int
>> (2147483647).
>>
>> I've fixed it by replacing
>> int nNgrams;
>> by
>> long nNgrams;
>> at line 497 in lm/src/NgramLM.cc and by replacing
>> } else if (sscanf(line, "ngram %d=%d", &thisOrder, &nNgrams) == 2) {
>> by
>> } else if (sscanf(line, "ngram %d=%ld", &thisOrder, &nNgrams) == 2) {
>> at line 515 in lm/src/NgramLM.cc
>>
>> Are there other places in the code that I should change? Is there a
>> better solution for my problem?
>>
>> Thanks very much,
>>
>> Juan
>>
>> _______________________________________________
>> SRILM-User site list
>> SRILM-User at speech.sri.com
>> http://www.speech.sri.com/mailman/listinfo/srilm-user
>>
>
>

