missing counts

Mirjam Sepesy Maucec mirjam.sepesy at uni-mb.si
Fri Dec 20 00:12:33 PST 2002


Andreas Stolcke wrote:

> In message <3E007101.7A413D18 at uni-mb.si> you wrote:
> >
> > Hi,
> >
> > I have the following problem.
> >
> > The n-gram counts are computed from a raw text corpus using
> > 'ngram-count' and 'ngram-merge'.
> > I experiment with different vocabularies and with bigram and trigram models.
> > In each experiment I rerun 'ngram-count -vocab -order' and build the
> > language model with 'make-big-lm -trust-totals'.
> > I tested the language models on my test set and noticed some mistakes:
> > some bigrams that are present in the bigram model get lost in the trigram
> > model. When I omit the -trust-totals option, the results are OK.
> > Why should I not trust the totals in my case?  Are the counts of
> > different orders produced by 'ngram-count' and 'ngram-merge' inconsistent?
> >
> > Regards,
> >
> > Mirjam.
>
> This is indeed a little strange. However, the -trust-totals option
> is obsolete, as it does not interact well with some discounting
> methods (e.g., KN).  It was always a hack, and the latest version of
> make-big-lm uses a different strategy for saving memory on ngrams discarded by
> cutoffs (the ngram-count -meta-tag and -read-with-mincounts options,
> see the man page).
>
> Still, if you can reduce your problem to a small test case I could look
> at it to understand exactly what's going on.
>
> --Andreas

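(For context, the counting and model-building steps described in the quoted message look roughly like the sketch below; the corpus, vocabulary, and output file names are placeholders, and count cutoff options are omitted:)

    # count each corpus part separately, restricted to a fixed vocabulary
    ngram-count -order 3 -vocab vocab.txt -text part1.txt -write part1.counts
    ngram-count -order 3 -vocab vocab.txt -text part2.txt -write part2.counts

    # merge the partial count files into a single count file
    ngram-merge -write all.counts part1.counts part2.counts

    # build the trigram model; -trust-totals is the option in question
    make-big-lm -read all.counts -order 3 -name biglm -trust-totals -lm trigram.lm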
Thank you for answering so quickly.
You are right, I used KN discounting. I see it's time to switch from
version 1.3.1 to 1.3.2.
I will report the results.
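A minimal sketch of the rebuild without -trust-totals (assuming KN discounting and the same merged count file as above) could look like this; newer make-big-lm versions reportedly save memory internally via the ngram-count -meta-tag and -read-with-mincounts mechanism mentioned above, so no extra option is shown here (see the man pages for the exact semantics):

    # rebuild the trigram model without -trust-totals
    make-big-lm -read all.counts -order 3 -name biglm -kndiscount -interpolate -lm trigram.lm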

Happy holidays!

Mirjam



