[SRILM User List] Adding n-grams to an existing LM
Andreas Stolcke
stolcke at icsi.berkeley.edu
Sat Nov 2 18:35:07 PDT 2013
On 11/2/2013 7:46 AM, Joris Pelemans wrote:
> On 11/02/13 02:07, Andreas Stolcke wrote:
>>
>> For example, if you have p(c | a b) = x and d and c are synonyms, you set
>>
>> p(c | a b) = x/2
>> p(d | a b) = x/2
>
> Another question with regard to this problem. Say, I don't know a
> good synonym for d, but I still want to include it by mapping it onto
> <unk> (what else, right?), obviously by a very small fraction of the
> <unk> probability, since it's a class. The above technique would lead
> to gigantic LMs, since <unk> is all over the place. Is there a smart
> way in the SRILM toolkit that lets you specify that some words should
> be modeled as <unk>?
I'm not sure I understand what you mean. <unk> is a special word that
all words not in the vocabulary are mapped to at test time. So the way
you 'model' a word by <unk> is to not include it in the vocabulary of
your LM.
Andreas
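
The two ideas in this thread can be sketched in a few lines of Python. This is only an illustration and not SRILM code: the names map_to_unk and split_logprob are hypothetical, the toy vocabulary is made up, and the log10 convention is assumed because ARPA-format LM files store log10 probabilities.

```python
import math

UNK = "<unk>"  # the special unknown-word token


def map_to_unk(tokens, vocab):
    """Replace every token that is not in the vocabulary with <unk>.

    Hypothetical helper: mimics what happens to out-of-vocabulary
    words at test time.
    """
    return [tok if tok in vocab else UNK for tok in tokens]


def split_logprob(log10_prob, n_ways=2):
    """Divide a probability evenly among n_ways words, in log10 space.

    Hypothetical helper: p/n corresponds to log10(p) - log10(n).
    """
    return log10_prob - math.log10(n_ways)


# Toy vocabulary (an assumption for this example): "d" is left out.
vocab = {"a", "b", "c", UNK}

# "d" is not in the vocabulary, so it is scored as <unk>.
mapped = map_to_unk(["a", "b", "d"], vocab)

# Splitting p(c | a b) = x between synonyms c and d gives x/2 each.
x = 0.2
half = 10 ** split_logprob(math.log10(x))
```

With this mapping no extra n-grams are needed for "d": any context in which the LM has a probability for <unk> already covers it, which is why leaving the word out of the vocabulary does not enlarge the model.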