format error in kncounts.gz

ilya oparin ioparin at yahoo.co.uk
Thu Jun 5 08:36:42 PDT 2008


You have probably passed the wrong parameters to make-big-lm, or picked up the wrong output file.

make-big-lm -name name -read counts -lm new-model [ -trust-totals ] [-max-per-file M ] [ -ngram-filter filter ] [ ngram-options ... ]

Could it be that you took a counts file (from the manual: "The -name parameter is used to name various auxiliary files. counts contains the raw N-gram counts; it may be (and usually is) a compressed file.") instead of the resulting LM file generated by the script (the one whose name you put after the -lm option)? Basically, a counts file is used to generate LMs, which are subsequently read with "ngram -lm my_LM ...". A counts file is not a language model on its own.
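A quick way to tell the two kinds of file apart without paging through gigabytes: an ARPA-format LM has a \data\ line near the top, followed by "ngram N=..." lines, while a counts file is just N-grams with a count on each line. Below is a minimal Python sketch of that check. The function names and the 100-line scan limit are my own assumptions for illustration, not part of SRILM, and it is only a heuristic.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def looks_like_arpa_lm(path, max_lines=100):
    """Heuristic check: True if a \\data\\ line occurs in the first max_lines lines.

    ARPA-format LMs start with a \\data\\ section header; a counts file
    (N-gram followed by its count on each line) does not contain one.
    """
    with open_maybe_gzip(path) as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            if line.strip() == "\\data\\":
                return True
    return False
```

Calling looks_like_arpa_lm("model.kncounts.gz") should come back False if that file really is a counts file, and True for the file you named after -lm.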


best regards,
Ilya


--- On Thu, 5/6/08, Alexy Khrabrov <deliverable at gmail.com> wrote:

> From: Alexy Khrabrov <deliverable at gmail.com>
> Subject: Re: format error in kncounts.gz
> To: ioparin at yahoo.co.uk
> Cc: "srilm-user" <srilm-user at speech.sri.com>
> Date: Thursday, 5 June, 2008, 6:46 PM
> Hmm -- I've run make-big-lm, and got a few small files, a .kndir, and
> that kncounts.gz -- which looks just like counts and is a few
> gigabytes, so I thought that's my model.  I've posted my command line
> earlier when figuring out exactly the way to get a Kneser-Ney
> model...  The kncounts.gz looks just like a counts file.
> 
> The counts I fed to make-big-lm with -read are the ones I got with
> make/merge-batch-counts -order 5 for 5-grams.  Should I have done
> anything extra before or after?
> 
> Cheers,
> Alexy
> 
> On Jun 5, 2008, at 5:46 AM, ilya oparin wrote:
> 
> > That usually means you're loading something else than a LM in the
> > ARPA format. Have you visually checked your model.kncounts.gz?





