[SRILM User List] Probability of Unknown Words - Kneser Ney?

Andreas Stolcke stolcke at icsi.berkeley.edu
Mon May 21 11:35:15 PDT 2012


On 5/20/2012 8:28 PM, Burkay Gur wrote:
> Hi!
>
> I was wondering how we calculate the probability of unk words while 
> using unmodified Kneser Ney. I know that Kneser Ney never assigns zero 
> probs. How is that possible with words that are never seen? Or words 
> that are in the dictionary but not in the training corpus?
There is nothing special that KN smoothing does with unknown words.  
As with all smoothing methods, unknown words are either ignored (assigned 
zero probability) or modeled by a designated <unk> token, depending on how 
your data is prepared and whether you use the ngram-count -unk option.
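The two cases can be illustrated with a minimal Python sketch. This is not SRILM code: the function names map_oov and unigram_prob and the toy vocabulary are hypothetical. The point does not depend on the smoothing method, so the sketch uses plain relative frequencies rather than Kneser-Ney. Words outside the vocabulary are mapped to <unk> before counting. An unseen test word then receives the probability of <unk> in an open-vocabulary model and zero in a closed-vocabulary one.

```python
from collections import Counter

UNK = "<unk>"  # the designated unknown-word token

def map_oov(tokens, vocab):
    """Replace every token outside the vocabulary with <unk>."""
    return [t if t in vocab else UNK for t in tokens]

def unigram_prob(word, counts, open_vocab):
    """Relative-frequency unigram lookup (no smoothing; illustration only).

    Closed vocabulary: an out-of-vocabulary word gets probability 0.
    Open vocabulary: it is scored as if it were the <unk> token."""
    total = sum(counts.values())
    if word in counts:
        return counts[word] / total
    if open_vocab:
        return counts.get(UNK, 0) / total
    return 0.0

# Hypothetical toy data: "on" and "mat" fall outside the vocabulary.
vocab = {"the", "cat", "sat"}
train = "the cat sat on the mat".split()
counts = Counter(map_oov(train, vocab))

print(unigram_prob("dog", counts, open_vocab=True))   # 2/6, the mass of <unk>
print(unigram_prob("dog", counts, open_vocab=False))  # 0.0
```

A real smoothed model such as KN redistributes mass among the words it knows, <unk> included. It still cannot assign probability to a word that is neither in the vocabulary nor mapped to <unk>.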

For more information, see the FAQ page 
http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html and 
look for "unknown".

Andreas


