once occurring trigram discarded
Andreas Stolcke
stolcke
Mon Jan 31 10:01:16 PST 2005
In message <41FE6F03.5040103 at irisa.fr> you wrote:
> Hi,
> I built a trigram model using modified Kneser-Ney smoothing with
> interpolation, and I don't understand why there are only 5828 trigrams in
> the model whereas there are 102520 trigrams in the corpus. I think that
> the discarded trigrams are the ones that occur just once, because there
> are 96692 trigrams occurring once, which is exactly the difference between
> the trigrams in the corpus and the trigrams in the model. I tried other
> smoothing methods, and even no smoothing, but the trigrams are discarded
> every time.
> I don't understand why, since the bigrams occurring once (there are 58764
> of them) are not discarded in the bigram model I built using
> modified Kneser-Ney smoothing with interpolation.
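The default minimum count for trigrams (and higher orders) is 2, so
trigrams that occur only once are left out of the model.
The default minimum count for unigrams and bigrams is 1, which is why
your once-occurring bigrams were kept.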
Use ngram-count -gt3min 1 to include all trigrams.
ngram-count -help displays the default values for all the options.
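
The effect of these minimum counts can be illustrated with a small Python sketch. This is not SRILM code and it does not reproduce how ngram-count processes counts internally (no sentence-boundary tags, no smoothing). The function names count_ngrams and apply_min_count and the toy token sequence are made up for the illustration. It only shows that a minimum count of 2 drops exactly the N-grams seen once, and checks the arithmetic from the question.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every n-gram of order n in a flat list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def apply_min_count(counts, min_count):
    """Keep only the n-grams whose count is at least min_count.

    Hypothetical helper: mimics the idea of a minimum-count cutoff,
    not the actual SRILM implementation.
    """
    return {ngram: c for ngram, c in counts.items() if c >= min_count}


# Toy data (an assumption for illustration, not from the original corpus).
tokens = "a b c a b c a b d".split()
trigrams = count_ngrams(tokens, 3)

kept_min2 = apply_min_count(trigrams, 2)  # analogous to the trigram default
kept_min1 = apply_min_count(trigrams, 1)  # analogous to -gt3min 1

singletons = [ng for ng, c in trigrams.items() if c == 1]
print("distinct trigrams:", len(trigrams))
print("kept with minimum count 2:", len(kept_min2))
print("kept with minimum count 1:", len(kept_min1))
print("singletons dropped:", len(singletons))

# Figures reported in the question: corpus trigrams minus singleton
# trigrams equals the number of trigrams left in the model.
corpus_trigrams = 102520
singleton_trigrams = 96692
print("expected model trigrams:", corpus_trigrams - singleton_trigrams)
```

With a minimum count of 1 nothing is dropped, which is the situation -gt3min 1 is meant to produce for trigrams.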
--Andreas