[SRILM User List] Adding n-grams to an existing LM

Andreas Stolcke stolcke at icsi.berkeley.edu
Sat Nov 2 18:32:10 PDT 2013


On 11/2/2013 6:16 AM, Joris Pelemans wrote:
> On 11/02/13 02:07, Andreas Stolcke wrote:
>> On 11/2/2013 8:00 AM, Joris Pelemans wrote:
>>> but I get a lot of errors of the type "BOW numerator for context is 
>>> ... < 0" and "BOW denominator for context is ... <= 0".
>>
>> The BOW for a given context is computed as 1 - (sum of all 
>> higher-order probabilities in that context), divided by 1 - (sum of 
>> all backoff probabilities for those same ngrams).  So, if you're 
>> adding ngrams to a context, those sums can exceed 1, and you end up 
>> with negative numerators and/or denominators.
> I can see how that happens for the numerators, but aren't the backoff 
> weights recomputed, and shouldn't that prevent the denominators from 
> ending up negative? What if I remove all the backoff weights and then 
> renormalize? I'm just asking out of interest; I got rid of all the 
> denominator complaints (see below).

The same reasoning applies to the denominator, since it is obtained by 
summing over the ngrams one order lower.  If you're adding trigrams and 
bigrams, say, then the denominators for the bigram BOWs will be 
affected by the added bigrams.
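
To make that concrete, here is a rough Python sketch of the standard 
ARPA backoff-weight formula (the dictionaries and function name are 
illustrative stand-ins, not SRILM's actual data structures or code):

def compute_bow(ngram_probs, lower_order_probs):
    # ngram_probs: p(w | h) for every ngram with context h that is
    # explicitly listed in the LM, keyed by the predicted word w.
    # lower_order_probs: p(w | h') for those same words w, where h'
    # is h with its oldest word dropped.
    numerator = 1.0 - sum(ngram_probs.values())
    denominator = 1.0 - sum(lower_order_probs[w] for w in ngram_probs)
    # If ngrams were added to context h without rescaling, either sum
    # can exceed 1, which produces exactly the "BOW numerator ... < 0"
    # and "BOW denominator ... <= 0" warnings quoted above.
    return numerator / denominator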

>>> What do these errors mean, can I ignore them or is there a better 
>>> way to renormalize my new LMs?
>>
>> I think you should split the existing ngram probabilities among all 
>> the synonyms, when the synonym occurs in the final position of the 
>> ngram.  That would not add anything to the sums of probabilities 
>> involved in the BOW computation.
> That did take care of most of the errors. Only a handful of numerator 
> complaints are left, but I guess that might be due to bad scripting on 
> my part. I find it strange, though, that the complaints I get concern 
> n-grams that aren't in the LM at all. The following is the first 
> complaint that I get:
>
> BOW numerator for context "negentig Hills" is -0.0120325 < 0
>
> But if I grep the LM (before and after renormalization) for "negentig 
> Hills" it gives me nothing? If there are no 3-grams with this context, 
> how can 1 - (sum of all higher-order probabilities with this context) 
> be negative?

The ngrams in these messages are printed in reverse order, so the 
context in sentence order is actually "Hills negentig", which is what 
you should grep for.  The reversal happens because the contexts are 
stored in a trie that's indexed most-recent-word-first.
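
A toy illustration of that indexing (a simplified sketch of the idea, 
not SRILM's actual trie implementation):

contexts = {}

def store_bow(context_words, bow):
    # Key the table most-recent-word-first; any message that prints
    # the key therefore shows the words in reverse order.
    contexts[tuple(reversed(context_words))] = bow

# Sentence-order context "Hills negentig"; the BOW value here is an
# arbitrary placeholder.
store_bow(("Hills", "negentig"), -0.25)
print(contexts)   # {('negentig', 'Hills'): -0.25}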

Andreas

>
>> For example, if you have p(c | a b) = x, and c and d are synonyms, 
>> you set
>>
>> p(c | a b) = x/2
>> p(d | a b) = x/2
> OK, that makes sense. And just to be complete (in case others might 
> want to know), if I want to map d onto c with a certainty of, say, 
> 0.1, then I just do:
>
> p(c | a b) = 0.9*x
> p(d | a b) = 0.1*x
>
>> If, however, the synonyms occur in the context portion of the ngram, 
>> you can just copy the parameter (as you have been doing).
>>
>> p(e | a c) = p(e | a d)
>
> And this stays the same for the 0.1 example?
>
> Thanks already!
>
> Joris
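
For anyone scripting the splitting scheme discussed above, a minimal 
Python sketch (the table layout and function name are illustrative 
assumptions, not SRILM tools):

def split_final_word(probs, orig, synonym, weight):
    # probs maps (context, word) -> p(word | context).  When the
    # synonym stands in for the *final* word of an ngram, the original
    # probability mass is split, so the sums entering the BOW
    # computation are unchanged.
    out = {}
    for (context, word), p in probs.items():
        if word == orig:
            out[(context, orig)] = (1.0 - weight) * p
            out[(context, synonym)] = weight * p
        else:
            out[(context, word)] = p
    return out

# p(c | a b) = 0.4 with weight 0.1 becomes p(c | a b) = 0.36 and
# p(d | a b) = 0.04.  Ngrams where the synonym sits in the *context*
# are simply copied instead, e.g. p(e | a d) = p(e | a c).
print(split_final_word({(("a", "b"), "c"): 0.4}, "c", "d", 0.1))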


