SRILM 1.4

Thu Mar 4 12:53:00 PST 2004

In message <OF2145503D.B96019FD-ON88256E4D.005803E1-88256E4D.005888D3 at mohomine.com> you wrote:
> Ok... that seemed to be fine... they did perform similarly.  I just wanted 
> to make sure everything was ok.
> 
> If I wanted to change the backoff order of the LM... is there an easy way 
> to do this...?  I looked into the NgramLM.cc file... and it seems kind of 
> tricky... because I need to know how the trie is used...
> 
> ... is there some other code that I should be looking in?
> 
> In particular... if the ngram is: p(a|b,c,d) I would prefer the backoff to 
> be:
> p(a|b,c,d) => p(a|b,c)bo(b,c,d)   // This is normal
>            => p(a|c)bo(b,c)       // BO normal, p context is not...
>            => p(a)bo(c)           // This is normal...
> 
> Or, even better would be:
> p(a|b,c,d) => p(a|b,c)bo(b,c,d)                // This is normal
>            => p(a|b,c)bo(b,c) + p(a|c)bo(b,c)  // ... is something like this possible?
>            => p(a)bo(c)                        // This is normal...
> 
> I was also thinking that maybe I could write a script to output a counts 
> file given the text file that would somehow "trick" the LM to generate the 
> backoff order I'm interested in... is that an option?

This would be one solution.  Use ngram-count -read to estimate the LM from
the reordered counts, and then ngram -counts to evaluate it on test counts
reordered the same way.  Just reorder the words in the N-grams to reflect the
backoff order you want.
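
A minimal sketch of the reordering step in Python, assuming a counts file
with one N-gram per line followed by its count (whitespace-separated, count
last).  The names reorder_counts and format_counts and the "order"
permutation are hypothetical, not part of SRILM.  Only N-grams of the full
length are handled; lower-order counts would have to be made consistent
separately.

```python
from collections import Counter


def reorder_counts(lines, order):
    """Permute the words of every full-length N-gram.

    `order` is a permutation of 0..n-1: output word i is input word order[i].
    The rightmost output word is the one being predicted; context words
    further to the left are dropped earlier when backing off.
    Lines whose N-gram length differs from len(order) are skipped.
    """
    n = len(order)
    if sorted(order) != list(range(n)):
        raise ValueError("order must be a permutation of 0..n-1")
    reordered = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != n + 1:
            continue  # blank line or N-gram of another length
        words, count = fields[:-1], int(fields[-1])
        # Identical reordered N-grams are merged by summing their counts.
        reordered[tuple(words[i] for i in order)] += count
    return reordered


def format_counts(counts):
    """Render counts as 'w1 ... wn<TAB>count' lines, sorted by N-gram."""
    return [" ".join(ngram) + "\t" + str(c) for ngram, c in sorted(counts.items())]
```

For example, order (1, 0, 2, 3) swaps the two leftmost context words of each
4-gram, so the word originally in second position is dropped before the word
originally in first position.  The same permutation has to be applied to any
test counts.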

Note that the factored LM stuff in the latest version (courtesy of Jeff Bilmes)
gives you complete flexibility in specifying the backoff order (and many other
things, such as parallel backoff paths and their combination).
Look in $SRILM/flm/doc for details.

--Andreas