deleted estimation using SRILM

Andreas Stolcke stolcke at speech.sri.com
Thu Dec 7 14:08:01 PST 2006



In message <Pine.LNX.4.62L.0612061719580.13078 at athena.lcs.mit.edu> you wrote:
> Quick question.
> Is there a way to get the deleted-interpolation LM in arpa or fstn format?
> 
> thanks,
> -Ghinwa

No, because a deleted-interpolation LM cannot, in general, be represented
exactly as a backoff LM (short of listing all ngrams).
What you can do, however, is define a set of ngrams and then create 
a backoff LM whose probabilities match exactly those of the
deleted-interpolation LM for those ngrams (and use backoff for all others).
This way, most SRILM LM classes can be approximated by backoff LMs.

To do this, use the -rescore-ngram option of ngram (see the ngram man page).

	ngram -rescore-ngram BACKOFF-LM \
		OTHER-LM-OPTIONS \
		-write-lm NEW-BACKOFF-LM

where OTHER-LM-OPTIONS specifies the LM from which the new probabilities 
are taken.  By choosing a larger or smaller set of ngrams in BACKOFF-LM
you control how closely the new backoff LM approximates the original.
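
As a rough sketch of how the pieces might fit together for a
deleted-interpolation model: the script below assumes the model is stored as
a count-LM file that ngram reads with "-count-lm -lm FILE" (check the man
page for your version).  All file names and the rescore_cmd helper are
hypothetical placeholders, not part of SRILM.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own.
COUNT_LM=count-lm.params   # count-LM file holding the deleted-interpolation model
BACKOFF_LM=backoff.arpa    # backoff LM whose ngram set defines what gets rescored
NEW_LM=approx.arpa         # output: backoff approximation of the count-LM
ORDER=3

# Print the ngram command line for the given files (helper name is illustrative).
rescore_cmd() {
    # $1 = count-LM file, $2 = backoff LM, $3 = output LM, $4 = ngram order
    echo "ngram -order $4 -count-lm -lm $1 -rescore-ngram $2 -write-lm $3"
}

cmd=$(rescore_cmd "$COUNT_LM" "$BACKOFF_LM" "$NEW_LM" "$ORDER")
echo "$cmd"

# Run only when SRILM and both input files are actually available.
# (Unquoted $cmd relies on word splitting, so file names must not contain spaces.)
if command -v ngram >/dev/null 2>&1 && [ -r "$COUNT_LM" ] && [ -r "$BACKOFF_LM" ]; then
    $cmd
else
    echo "not running: ngram or input files not found" >&2
fi
```

The script prints the command before running it, so the options can be
checked against the man page first.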

Andreas 

> 
> On Wed, 6 Dec 2006, Andreas Stolcke wrote:
> 
> >
> > In message <Pine.LNX.4.62L.0612061422120.13078 at athena.lcs.mit.edu> you wrote:
> >> Hello Andreas,
> >>
> >> I have the latest SRILM toolkit version and I am trying to implement
> >> deleted interpolation using ngram/ngram-count but I cannot seem to get it
> >> to work. Would it be possible to get a sample of what the command(s)
> >> would look like?
> >
> > The latest version of SRILM implements deleted interpolation as part
> > of the "count-LM" LM class.  Look up the -count-lm option in both the
> > ngram-count and the ngram man pages.
> > Then look at $SRILM/test/tests/ngram-count-lm/run-test for an example
> > of how it all fits together.
> >
> > Deleted interpolation is not typically as good as other schemes such
> > as modified Kneser-Ney smoothing, but has some practical advantages
> > (a memory-efficient implementation) when applied to very large count sets.
> >
> > Andreas
> >
> >



