Mirjam Sepesy Maucec
mirjam.sepesy at uni-mb.si
Wed Dec 18 04:58:41 PST 2002
I have the following problem.
I compute the n-gram counts from a raw text corpus using
'ngram-count' and 'ngram-merge'.
I am experimenting with different vocabularies and with bigram and trigram models.
For each experiment I rerun 'ngram-count -vocab -order' and build the
language model with 'make-big-lm -trust-totals'.
When I tested the language models on my test set, I noticed some errors: some
bigrams that are present in the bigram model are missing from the trigram
model. When I omit the -trust-totals option, the results are OK.
Why should I not trust the totals in my case? Are the counts of
different orders produced by 'ngram-count' and 'ngram-merge' not consistent?
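
One way to check whether the counts of different orders are in line is to
compare each lower-order count with the sum of the higher-order counts that
share it as a prefix. The Python sketch below is only an illustration under
stated assumptions: it assumes the count file has one n-gram per line, words
separated by spaces, followed by a tab and the count, and the helper names
parse_counts and prefix_mismatches are hypothetical, not part of SRILM. A
reported mismatch is a hint to look closer, not proof of an error.

```python
from collections import defaultdict


def parse_counts(lines):
    """Parse lines of the assumed form 'w1 w2 ...<TAB>count' into a dict."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        ngram, _, count = line.rpartition("\t")
        if not ngram:
            # Fallback: no tab found, treat the last whitespace field as the count.
            parts = line.split()
            ngram, count = " ".join(parts[:-1]), parts[-1]
        key = tuple(ngram.split())
        # Sum duplicates, which can occur if files were concatenated unmerged.
        counts[key] = counts.get(key, 0) + int(count)
    return counts


def prefix_mismatches(counts, order):
    """Return {prefix: (stored_count, summed_count)} for every (order-1)-gram
    whose stored count differs from the sum over n-grams of length `order`
    that start with it. stored_count is None if the prefix is missing."""
    totals = defaultdict(int)
    for ngram, c in counts.items():
        if len(ngram) == order:
            totals[ngram[:-1]] += c
    mismatches = {}
    for prefix, total in totals.items():
        stored = counts.get(prefix)
        if stored != total:
            mismatches[prefix] = (stored, total)
    return mismatches


# Made-up toy counts, for illustration only.
sample = ["a b\t3", "a b c\t2", "a b d\t1", "b c\t2", "b c e\t1"]
result = prefix_mismatches(parse_counts(sample), 3)
```

In this toy input the bigram "a b" agrees with its trigrams, while "b c" has
a stored count of 2 but trigram counts summing to 1, so it is the only entry
reported.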