OOV words
Dmitriy Dligach
Dmitriy.Dligach at colorado.edu
Tue Apr 7 08:30:53 PDT 2009
Hello,
First of all I wanted to thank the creators of SRILM -- I find this
tool extremely useful in my research.
Second, I have a question about out-of-vocabulary (OOV) words. I train
a language model on a collection of English newswire text:
ngram-count -text all.txt -lm all.lm -order 5
and then compute probabilities:
ngram -lm all.lm -ppl test.txt -debug 1
There happen to be some sentences in foreign languages in my test.txt
file. I'd expect them to receive very low probabilities because the
model was trained strictly on English text. Instead, they receive
very high probabilities.
Could this have something to do with the way SRILM handles OOV words?
Dima
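
A toy illustration of the effect being asked about, offered as a sketch and not as SRILM's actual code. It assumes that, without the -unk option, OOV words are counted but left out of the sentence's log probability, which is my understanding of the default behaviour. The function name sentence_logprob, the toy unigram table, and the -10.0 penalty are all made up for the example.

```python
def sentence_logprob(words, logprobs, skip_oov=True, oov_logprob=-10.0):
    """Sum log10 word probabilities over a sentence.

    logprobs is a toy unigram table (hypothetical values).
    With skip_oov=True, unknown words are counted but contribute nothing
    to the total, which is the behaviour assumed for the default setup.
    With skip_oov=False, each unknown word is charged oov_logprob.
    """
    total = 0.0
    n_oov = 0
    for w in words:
        if w in logprobs:
            total += logprobs[w]
        else:
            n_oov += 1
            if not skip_oov:
                total += oov_logprob
    return total, n_oov


# Hypothetical log10 probabilities for a tiny English vocabulary.
toy = {"the": -1.0, "market": -2.0, "fell": -2.5, "today": -2.0}

english = "the market fell today".split()
foreign = "der markt fiel heute".split()

print("english, OOVs skipped:  ", sentence_logprob(english, toy))
print("foreign, OOVs skipped:  ", sentence_logprob(foreign, toy))
print("foreign, OOVs penalized:", sentence_logprob(foreign, toy, skip_oov=False))
```

Under that assumption, a sentence made up entirely of unknown words accumulates no penalty at all, so it can look more probable than a well-formed English sentence. Checking the OOV count that ngram -ppl reports for each sentence, or raising the level to -debug 2 to see per-word probabilities, should show whether this is what is happening in test.txt.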