Trigrams occurring once are discarded
solen.quiniou at irisa.fr
Mon Jan 31 09:46:43 PST 2005
I built a trigram model using modified Kneser-Ney smoothing with
interpolation, and I don't understand why the model contains only 5828
trigrams when there are 102520 trigrams in the corpus. I think the
discarded trigrams are the ones that occur just once, because 96692
trigrams occur once, which is exactly the difference between the number
of trigrams in the corpus and the number of trigrams in the model
(102520 - 5828 = 96692). I also tried other smoothing methods, and even
no smoothing, but these trigrams are discarded every time.
This puzzles me because the bigrams occurring once (there are 58764 of
them) are not discarded in the bigram model I built using modified
Kneser-Ney smoothing with interpolation.
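
The count comparison above can be sketched as follows. This is only an
illustrative Python sketch, not SRILM code: the function name
trigram_type_counts and the toy corpus are hypothetical, and it assumes
whitespace-tokenized sentences with no sentence-boundary tags added, so
its totals may differ from those reported by a real toolkit.

```python
from collections import Counter

def trigram_type_counts(sentences):
    """Return (trigram types, types seen once, types seen more than once)."""
    counts = Counter()
    for tokens in sentences:
        # Slide a window of width 3 over each sentence.
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    total = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return total, singletons, total - singletons

# Hypothetical toy corpus: ("a", "b", "c") occurs twice, the rest once.
toy = [["a", "b", "c", "d"], ["a", "b", "c", "e"]]
total, singletons, remaining = trigram_type_counts(toy)
print(total, singletons, remaining)

# Same arithmetic with the figures quoted in this message.
print(102520 - 96692)
```

If the third value matches the number of trigrams kept in the model, the
missing trigrams are the ones that occur once.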
Thanks a lot for your answer.
Solen Quiniou (Solen.Quiniou at irisa.fr)
PhD student, IMADOC team - office C303
IRISA-INRIA, Campus de Beaulieu
35042 Rennes cedex, France
Tel: +33 (0) 2 99 84 22 35
Fax: +33 (0) 2 99 84 71 71