Solen Quiniou solen.quiniou at irisa.fr
Mon Jan 31 09:46:43 PST 2005

I built a trigram model using modified Kneser-Ney smoothing with 
interpolation, and I don't understand why the model contains only 5828 
trigrams when there are 102520 trigrams in the corpus. I think the 
discarded trigrams are the ones that occur only once: there are 96692 
such trigrams, which is exactly the difference between the number of 
trigrams in the corpus and the number in the model (102520 - 5828 = 
96692). I tried other smoothing methods, and even no smoothing, but 
these trigrams are discarded every time.
I don't understand why this happens, since the bigrams occurring once 
(there are 58764 of them) are not discarded in the bigram model I built 
using modified Kneser-Ney smoothing and interpolation.
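
To make the counting concrete, here is a minimal Python sketch of the 
kind of check described above. It is only an illustration on a toy 
corpus: the helper names (ngram_counts, split_by_min_count) and the 
min_count parameter are hypothetical, and the sketch does not claim to 
reproduce what the toolkit does internally. It just shows how dropping 
every trigram seen fewer than twice would give the same arithmetic as 
the numbers reported above.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams over tokenized sentences (no boundary padding)."""
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def split_by_min_count(counts, min_count):
    """Hypothetical cutoff: keep n-grams seen at least min_count times."""
    kept = {g: c for g, c in counts.items() if c >= min_count}
    dropped = {g: c for g, c in counts.items() if c < min_count}
    return kept, dropped


# Toy corpus (an assumption for illustration, not the real data).
toy_corpus = [["a", "b", "c", "d"], ["a", "b", "c", "e"]]

trigrams = ngram_counts(toy_corpus, 3)
kept, dropped = split_by_min_count(trigrams, min_count=2)

print("trigrams in corpus:", len(trigrams))
print("trigrams kept:", len(kept))
print("singletons dropped:", len(dropped))

# Same arithmetic with the figures from this message.
corpus_trigrams, model_trigrams = 102520, 5828
print("missing from model:", corpus_trigrams - model_trigrams)
```

In the toy corpus only the trigram "a b c" occurs twice, so it is the 
only one that survives a cutoff of 2; the other two are singletons.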

Thanks a lot for your answer.

