missing counts

Wed Dec 18 04:58:41 PST 2002

Hi,

I have the following problem.

The n-gram counts are computed from a raw text corpus using
'ngram-count' and 'ngram-merge'.
I experiment with different vocabularies and with bigram and trigram models.
In each experiment I run 'ngram-count -vocab -order' again and build the
language model with 'make-big-lm -trust-totals'.
I tested the language models on my test set and noticed some errors: some
bigrams that are present in the bigram model are missing from the trigram
model. When I omit the -trust-totals option, the results are OK.
Why should I not trust the totals in my case? Are the counts of
different orders produced by 'ngram-count' and 'ngram-merge' not consistent
with each other?
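
One way to look into the last question is to compare the counts of
neighbouring orders directly. The following is only a rough sketch in
Python, not part of SRILM. It assumes the count file has one n-gram per
line, with whitespace-separated words followed by the count as the last
field, and that a lower-order count should be at least the sum of the
counts of its one-word extensions. The function names parse_counts and
inconsistent_prefixes are made up for this illustration.

```python
from collections import defaultdict


def parse_counts(lines):
    """Parse count lines of the assumed form 'w1 w2 ... wN count'."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        *words, count = fields
        key = tuple(words)
        # Merged files may repeat an n-gram, so accumulate the counts.
        counts[key] = counts.get(key, 0) + int(count)
    return counts


def inconsistent_prefixes(counts, order):
    """Find (order-1)-grams whose count is below the sum of their extensions.

    Returns a dict mapping each such prefix to (own_count, sum_of_extensions).
    A prefix missing from the counts is treated as having count 0.
    """
    sums = defaultdict(int)
    for ngram, count in counts.items():
        if len(ngram) == order:
            sums[ngram[:-1]] += count
    problems = {}
    for prefix, total in sums.items():
        own = counts.get(prefix, 0)
        if total > own:
            problems[prefix] = (own, total)
    return problems


# Toy data: the bigram "a b" has count 3, but its trigrams sum to 4.
sample = [
    "a b\t3",
    "a b c\t2",
    "a b d\t2",
    "b c\t2",
    "b c e\t1",
]
sample_counts = parse_counts(sample)
sample_problems = inconsistent_prefixes(sample_counts, 3)
print(sample_problems)
```

If a check of this kind reports bigrams on the real count files, the
lower-order counts are not totals of the trigram counts, which might be
related to what I see with -trust-totals.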

Regards,

Mirjam.