[SRILM User List] ARPA LM with only higher order grams?

Wed Dec 29 14:29:10 PST 2010

Amr Desoky wrote:
> Hi,
>   Is it possible to have an ARPA LM that stores only 3-gram 
> log probabilities?
>   Assume that my application (in which I will use the LM) only 
> requires the probabilities of these specific 3-grams.
>   Example of the LM:
>
> \data\
> ngram 1=0
> ngram 2=0
> ngram 3=3
>
> \1-grams:
>
> \2-grams:
>
> \3-grams:
> <logprob> <w1 w2 w3>
> <logprob> <w4 w5 w6>
> <logprob> <w7 w8 w9>
>
> \end\
>
>
> In other words: if I have some method to estimate the 
> probabilities of some 3-grams needed for 3-gram lattice rescoring for 
> ASR, is it possible to insert these probabilities into a normal ARPA 
> backoff LM? I did so, but when I tried to normalize the new LM (after 
> adding the new 3-grams), I got the following warnings and the new 
> n-grams were filtered out!
>
> warning: no bow for prefix of ngram "w1 w2 w3"
> .........(lots of the above warning)
This is a sanity check of the backoff format.  For each ngram w1 w2 w3, 
it checks that the history "w1 w2" has a corresponding backoff weight.
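
As a rough illustration of that check (a sketch only, not SRILM's actual 
code; the function name and data layout are hypothetical), each trigram's 
two-word history is looked up among the bigrams that carry a backoff weight:

```python
def histories_without_bow(trigrams, bigram_bows):
    """Return the trigrams whose history "w1 w2" has no backoff weight.

    trigrams:    iterable of (w1, w2, w3) tuples
    bigram_bows: dict mapping (w1, w2) -> backoff weight (log10)
    Both layouts are assumptions made for this sketch.
    """
    missing = []
    for w1, w2, w3 in trigrams:
        # The history of "w1 w2 w3" is the bigram "w1 w2".
        if (w1, w2) not in bigram_bows:
            missing.append((w1, w2, w3))
    return missing


# With no bigrams at all, every trigram fails the check.
print(histories_without_bow([("w1", "w2", "w3")], {}))
```

In a model that lists only trigrams, every entry would be reported this way, 
which matches the flood of "no bow for prefix" warnings.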

> BOW numerator for context "w4 w5" is -0.535204 < 0
> .........(lots of the above warning)
>
> Could you tell me why this is happening? If a 3-gram probability 
> is present, I will not need to back off, and I will not need the 
> lower-order n-grams to get the probability of that specific 
> 3-gram, right?
>
> What if I do not normalize the new LM? Will it be a correct LM, or do 
> you see a problem? Is there some other way to validate the correctness 
> of this LM?
As long as you don't renormalize the LM, AND you only use the trigram 
probabilities, AND you insert dummy unigrams and bigrams (to satisfy the 
above sanity check) with arbitrary log probabilities and backoff weights 
(make them 0), you can use the model in the standard way.
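
One possible way to generate such a file is sketched below.  This is an 
illustration under assumptions, not an SRILM tool: the function name 
build_dummy_arpa and the input layout are hypothetical, it adds a dummy 
unigram for every word seen and a dummy bigram for every trigram history, 
and it does not add sentence-boundary tokens.

```python
def build_dummy_arpa(trigrams):
    """Build ARPA text from a dict mapping (w1, w2, w3) -> log10 probability.

    Dummy unigrams and bigrams get log probability 0 and backoff weight 0,
    so only the trigram probabilities in the result are meaningful.
    """
    # Every word gets a dummy unigram; every trigram history a dummy bigram.
    unigrams = sorted({w for tri in trigrams for w in tri})
    bigrams = sorted({tri[:2] for tri in trigrams})

    lines = ["\\data\\",
             f"ngram 1={len(unigrams)}",
             f"ngram 2={len(bigrams)}",
             f"ngram 3={len(trigrams)}",
             "",
             "\\1-grams:"]
    # Lower-order entries: logprob, n-gram, backoff weight.
    lines += [f"0\t{w}\t0" for w in unigrams]
    lines += ["", "\\2-grams:"]
    lines += [f"0\t{w1} {w2}\t0" for w1, w2 in bigrams]
    lines += ["", "\\3-grams:"]
    # Highest-order entries carry no backoff weight.
    lines += [f"{lp}\t{' '.join(tri)}" for tri, lp in sorted(trigrams.items())]
    lines += ["", "\\end\\"]
    return "\n".join(lines) + "\n"


# Example with made-up probabilities.
print(build_dummy_arpa({("w1", "w2", "w3"): -0.5, ("w4", "w5", "w6"): -1.2}))
```

The resulting model must not be renormalized, and any probability it 
returns for an n-gram other than the listed trigrams is meaningless.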

Andreas