[SRILM User List] Adding n-grams to an existing LM
Joris Pelemans
Joris.Pelemans at esat.kuleuven.be
Sat Nov 2 07:46:31 PDT 2013
On 11/02/13 02:07, Andreas Stolcke wrote:
> On 11/2/2013 8:00 AM, Joris Pelemans wrote:
>>
>> What do these errors mean, can I ignore them or is there a better way
>> to renormalize my new LMs?
>
> I think you should split the existing ngram probabilities among all
> the synonyms, when the synonym occurs in the final position of the
> ngram. That would not add anything to the sums of probabilities
> involved in the BOW computation.
>
> For example, if you have p(c | a b) = x and d and c are synonyms, you set
>
> p(c | a b) = x/2
> p(d | a b) = x/2
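
The split described in the quoted text can be sketched in Python roughly as
follows. This is only an illustration, not SRILM code: the function name
split_among_synonyms and the dictionary layout (history tuple and final word
mapped to a linear probability) are assumptions made for the example. ARPA
files store log10 probabilities, so on a real LM the equivalent step would be
subtracting log10(n) for n words sharing the mass.

```python
import math


def split_among_synonyms(ngram_probs, word, synonyms):
    """Share p(word | history) equally between word and its synonyms.

    ngram_probs: dict mapping (history_tuple, final_word) -> linear probability.
    Hypothetical layout for illustration; not an SRILM data structure.
    Only n-grams whose FINAL word is `word` are touched, so the sum of
    probabilities under each history stays the same.
    """
    group = [word] + list(synonyms)
    share = 1.0 / len(group)
    result = {}
    for (history, final), p in ngram_probs.items():
        if final == word:
            for w in group:
                key = (history, w)
                # Accumulate so that an already existing entry is not lost.
                result[key] = result.get(key, 0.0) + p * share
        else:
            key = (history, final)
            result[key] = result.get(key, 0.0) + p
    return result


# Toy example: p(c | a b) = 0.4, with d as a synonym of c.
lm = {(("a", "b"), "c"): 0.4, (("a", "b"), "e"): 0.1}
new_lm = split_among_synonyms(lm, "c", ["d"])
```

In the toy example both p(c | a b) and p(d | a b) end up at half of the
original value, and the total mass under the history "a b" is unchanged, which
is why the backoff weight for that history does not need to be recomputed.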
Another question with regard to this problem. Say I don't know a good
synonym for d, but I still want to include it by mapping it onto <unk>
(what else, right?), obviously giving it only a very small fraction of
the <unk> probability, since <unk> is a class. The above technique would
lead to gigantic LMs, since <unk> is all over the place. Is there a smart
way in the SRILM toolkit to specify that some words should be modeled as
<unk>?
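
To get a feel for the size problem, one could count how many n-grams end in
<unk>, since the splitting technique adds one new entry per such n-gram for
every word mapped onto <unk>. The sketch below is a hypothetical helper, not
part of SRILM; it assumes ARPA-format lines with tab-separated fields (log
probability, words, optional backoff weight).

```python
def count_unk_final(arpa_lines, unk="<unk>"):
    """Count n-gram entries whose final word is the unknown-word token.

    arpa_lines: iterable of lines from an ARPA-format LM, assumed to be
    tab-separated as: logprob <TAB> w1 ... wn [<TAB> backoff weight].
    """
    count = 0
    for line in arpa_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # header, section marker, or blank line
        words = fields[1].split()
        if words and words[-1] == unk:
            count += 1
    return count


# Tiny hand-made example, not taken from a real model.
toy_arpa = [
    "\\data\\",
    "ngram 1=2",
    "ngram 2=2",
    "",
    "\\1-grams:",
    "-1.0\t<unk>",
    "-0.5\ta\t-0.3",
    "",
    "\\2-grams:",
    "-0.7\ta <unk>",
    "-0.9\t<unk> a",
    "",
    "\\end\\",
]
n_unk_final = count_unk_final(toy_arpa)
```

With k words mapped onto <unk> this way, the LM would grow by roughly k times
that count.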
Regards,
Joris