[SRILM User List] Adding n-grams to an existing LM
    Andreas Stolcke 
    stolcke at icsi.berkeley.edu
       
    Sat Nov  2 18:35:07 PDT 2013
    
    
  
On 11/2/2013 7:46 AM, Joris Pelemans wrote:
> On 11/02/13 02:07, Andreas Stolcke wrote:
>>
>> For example, if you have p(c | a b) = x and d and c are synonyms, you set
>>
>> p(c | a b) = x/2
>> p(d | a b) = x/2
>
> Another question with regard to this problem. Say I don't know a 
> good synonym for d, but I still want to include it by mapping it onto 
> <unk> (what else, right?), obviously by a very small fraction of the 
> <unk> probability, since it's a class. The above technique would lead 
> to gigantic LMs, since <unk> is all over the place. Is there a smart 
> way in the SRILM toolkit that lets you specify that some words should 
> be modeled as <unk>?

I'm not sure I understand what you mean. <unk> is a special word that 
all words not in the vocabulary are mapped to at test time. So the way 
you 'model' a word by <unk> is to not include it in the vocabulary of 
your LM.

Andreas
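
The two ideas in this thread can be illustrated with a small sketch. It is
not SRILM code and does not use any SRILM API. The function names, the toy
probabilities, and the dictionary representation of p(. | a b) are all
assumptions made for illustration. It shows (1) splitting the probability
of c equally with a new synonym d, and (2) mapping every token outside the
vocabulary to <unk> at test time.

```python
UNK = "<unk>"


def split_among_synonyms(cond_probs, word, synonym):
    """Share the probability of `word` equally with a new word `synonym`.

    `cond_probs` is a hypothetical dict holding p(w | a b) for one fixed
    context. Assumes `synonym` has no probability of its own yet.
    """
    if synonym in cond_probs:
        raise ValueError(f"{synonym!r} already has probability mass")
    updated = dict(cond_probs)      # leave the input untouched
    x = updated[word]
    updated[word] = x / 2           # p(c | a b) = x/2
    updated[synonym] = x / 2        # p(d | a b) = x/2
    return updated


def map_oov_to_unk(tokens, vocab):
    """Replace every token that is not in `vocab` with <unk>."""
    return [tok if tok in vocab else UNK for tok in tokens]


# Toy distribution p(. | a b); the numbers are made up.
cond = {"c": 0.4, "e": 0.6}
new_cond = split_among_synonyms(cond, "c", "d")

# "zzz" is not in the vocabulary, so it is treated as <unk>.
mapped = map_oov_to_unk(["a", "b", "d", "zzz"], {"a", "b", "c", "d"})
```

Splitting leaves the total mass of the distribution unchanged, and a word
left out of the vocabulary needs no entries of its own, since it is simply
scored as <unk>.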