Mirjam Sepesy Maucec
mirjam.sepesy at uni-mb.si
Fri Dec 20 00:12:33 PST 2002
Andreas Stolcke wrote:
> In message <3E007101.7A413D18 at uni-mb.si> you wrote:
> > Hi,
> > I have the following problem.
> > The n-gram counts are computed from a raw text corpus using
> > 'ngram-count' and 'ngram-merge'.
> > I experiment with different vocabularies and bigram and trigram models.
> > In each experiment I rerun 'ngram-count -vocab -order' and build the
> > language model with 'make-big-lm -trust-totals'.
> > I tested the language models on my test set and noticed some mistakes:
> > some bigrams that are present in the bigram model are missing from the
> > trigram model. When I omit the -trust-totals option, the results are OK.
> > Why should I not trust the totals in my case? Are the counts of
> > different orders produced by 'ngram-count' and 'ngram-merge' inconsistent?
> > Regards,
> > Mirjam.
> This is indeed a little strange. However, the -trust-totals option
> is obsolete, as it does not interact well with some discounting
> methods (e.g., KN). It was always a hack, and the latest version of
> make-big-lm uses a different strategy for saving memory on ngrams discarded by
> cutoffs (the ngram-count -meta-tag and -read-with-mincounts options,
> see the man page).
> Still, if you can reduce your problem to a small test case I could look
> at it to understand exactly what's going on.
Thank you for answering so quickly.
You are right: I used KN discounting. I see it's time to switch from
version 1.3.1 to 1.3.2.
I will report the results.
Have a nice holiday!