Naive question about unknown words

Andreas Stolcke stolcke at speech.sri.com
Tue Oct 11 09:15:49 PDT 2005


In message <434BD6DF.7040405 at healthonnet.org> you wrote:
> Sorry for this naive question:
> 
> I create my LM with this command:
> ngram-count  -text learningdb.txt -lm GT -unk
> 
> I evaluate a sentence with the following command:
> ngram -lm GT -ppl sentence.txt
> 
> I obtain coherent results but I get also the following warning message:
> "warning: non-zero probability for <unk> in closed-vocabulary LM"
> 
> Can anyone give me some information about this warning and how to avoid it?
> Of course I need to give a weight for the unknown words.

You need to specify -unk on the ngram command line as well, not only
when building the LM with ngram-count.
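
For reference, here is a minimal sketch of the two commands with -unk
on both, using the file names from the question. As far as I understand
it, -unk tells ngram to treat <unk> in the LM as a regular word, which
is what an LM built with ngram-count -unk contains. The TRAIN_CMD and
EVAL_CMD variables and the surrounding check are only illustrative
scaffolding, not part of SRILM.

```shell
#!/bin/sh
# File names are the ones used in the question above.
# TRAIN_CMD / EVAL_CMD are hypothetical helper variables for this sketch.

# Build the LM with <unk> as a regular word (unchanged from the question).
TRAIN_CMD="ngram-count -text learningdb.txt -lm GT -unk"

# Evaluate with -unk too, so ngram reads the LM as open-vocabulary.
EVAL_CMD="ngram -lm GT -unk -ppl sentence.txt"

# Run the commands only if SRILM and the input files are available;
# otherwise just print them.
if command -v ngram-count >/dev/null 2>&1 \
   && [ -f learningdb.txt ] && [ -f sentence.txt ]; then
    $TRAIN_CMD && $EVAL_CMD
else
    echo "$TRAIN_CMD"
    echo "$EVAL_CMD"
fi
```

The only change relative to the commands in the question is the extra
-unk on the ngram line.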

--Andreas 



