format error in kncounts.gz
ilya oparin
ioparin at yahoo.co.uk
Thu Jun 5 08:36:42 PDT 2008
You have probably passed the wrong parameters to make-big-lm, or picked up the wrong output file.
make-big-lm -name name -read counts -lm new-model [ -trust-totals ] [-max-per-file M ] [ -ngram-filter filter ] [ ngram-options ... ]
Could it be that you took a counts file (from the manual: "The -name parameter is used to name various auxiliary files. counts contains the raw N-gram counts; it may be (and usually is) a compressed file.") instead of the resulting LM file generated by the script (the one whose name you put after the -lm option)? Basically, a counts file is used to generate LMs, which are subsequently read with "ngram -lm my_LM ...". A counts file is not a language model on its own.
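If it helps, here is a rough way to check which kind of file you have without paging through gigabytes by hand. It is only a sketch, under the assumption that an ARPA-format LM has a line consisting of \data\ near the top, while a counts file is just N-grams each followed by a count. The function names and the 50-line limit are made up for illustration; nothing here is part of SRILM.

```python
import gzip


def open_maybe_gzip(path):
    # Peek at the first two bytes: gzip files start with 0x1f 0x8b.
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def looks_like_arpa(path, max_lines=50):
    # Assumption: an ARPA LM has a line reading exactly \data\ near the top.
    # A counts file has no such line, only "w1 w2 ... <tab> count" entries.
    # max_lines is an arbitrary cutoff so huge files are not read in full.
    with open_maybe_gzip(path) as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            if line.strip() == "\\data\\":
                return True
    return False
```

Calling looks_like_arpa() on the file you pass to "ngram -lm" should return True if it really is an ARPA model. A counts file such as the kncounts.gz described in this thread would be expected to return False.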
best regards,
Ilya
--- On Thu, 5/6/08, Alexy Khrabrov <deliverable at gmail.com> wrote:
> From: Alexy Khrabrov <deliverable at gmail.com>
> Subject: Re: format error in kncounts.gz
> To: ioparin at yahoo.co.uk
> Cc: "srilm-user" <srilm-user at speech.sri.com>
> Date: Thursday, 5 June, 2008, 6:46 PM
> Hmm -- I've run make-big-lm, and got a few small files, a .kndir, and
> that kncounts.gz -- which looks just like counts and is a few
> gigabytes, so I thought that's my model. I've posted my command line
> earlier when figuring out exactly the way to get a Kneser-Ney
> model... The kncounts.gz looks just like a counts file.
>
> The counts I fed to make-big-lm with -read are the ones I got with
> make/merge-batch-counts -order 5 for 5-grams. Should I have done
> anything extra before or after?
>
> Cheers,
> Alexy
>
> On Jun 5, 2008, at 5:46 AM, ilya oparin wrote:
>
> > That usually means you're loading something else than a LM in the
> > ARPA format. Have you visually checked your model.kncounts.gz?