interpolation and SRISLM

Thu Nov 15 08:36:28 PST 2001

In message <Pine.GSO.4.02.10111151503410.22301-100000 at burma.enst.fr> you wrote:
> Dear Mr Stolcke,
> 
> I'm trying to use SRISLM in order to build ARPA n-gram language models. Up
> to now it seems to work except that when I try to evaluate a model with
> the CMUSLM toolkit, I've got an error message (due to the use of unk
> instead of UNK and also to the fact that there's no backoff weight for the
> </s> symbol). I also wonder whether it is possible to get interpolation
> weights using SRISLM. Is there an option to use in order to get these
> weights (the lambdas)? Thanks in advance.
> 
> Best regards,
> 
>         jeanphi

Dear jean-philippe,

I'm sorry you encountered problems using SRILM models with the CMU toolkit,
but they are easy to fix.  The case of "unk" you can just edit by hand or
with a simple text filter.  The "missing" backoff weights on unigrams
are actually a feature, because backoff weights are only
needed on unigrams that are a prefix of a longer n-gram.  However, because
this is a common problem, there is a script that adds "dummy" backoff weights.
The script should be in $SRILM/bin/$MACHINE_TYPE/add-dummy-bows and is
documented in the "lm-scripts" manual page.
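
To illustrate what such a text filter could look like, here is a minimal
Python sketch.  It is not the add-dummy-bows script itself, only an
illustration of the two fixes on an ARPA-format file.  The function name
patch_arpa is made up for this example, and the token spellings "<unk>" and
"<UNK>" are assumptions about what each toolkit writes and expects, so adjust
them to match your files.  The dummy backoff weight is written as 0 (log10),
i.e. a factor of 1, which leaves the model's probabilities unchanged.

```python
import re

def patch_arpa(lines, old_unk="<unk>", new_unk="<UNK>"):
    """Rename the unknown-word token and add a dummy backoff weight (0)
    to every lower-order n-gram that does not already have one."""
    lines = [line.rstrip("\n") for line in lines]
    # The highest order is read from the "ngram N=count" header lines.
    header = (re.match(r"ngram (\d+)=", line) for line in lines)
    orders = [int(m.group(1)) for m in header if m]
    max_order = max(orders) if orders else 0

    order = 0  # 0 means we are outside any "\N-grams:" section
    out = []
    for text in lines:
        section = re.match(r"\\(\d+)-grams:", text)
        if section:
            order = int(section.group(1))
        elif text.startswith("\\"):  # "\data\" or "\end\"
            order = 0
        elif order and text.strip():
            fields = text.split()
            logprob = fields[0]
            words = [new_unk if w == old_unk else w
                     for w in fields[1:order + 1]]
            rest = fields[order + 1:]  # existing backoff weight, if any
            if order < max_order and not rest:
                rest = ["0"]  # dummy backoff weight (log10 of 1)
            text = "\t".join([logprob, " ".join(words)] + rest)
        out.append(text)
    return out
```

To use it on a file, something like
"\n".join(patch_arpa(open("in.lm"))) would produce the patched text, where
in.lm is a placeholder file name.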

As for the interpolation weights:  SRILM currently only supports interpolation
of LMs at the model level, so there is a fixed lambda for each model
that you are interpolating.  Given a held-out training set, you can
estimate these model lambdas to minimize the perplexity of the data.
This is done by $SRILM/bin/$MACHINE_TYPE/compute-best-mix, which
is described in the "ppl-scripts" manual page.
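
To make the idea of model-level lambdas concrete, here is a minimal Python
sketch of estimating them with expectation-maximization (EM), a standard way
to fit mixture weights.  It is an illustration only and not the
compute-best-mix script.  The function names and the input layout (one tuple
per held-out word, holding the probability each model assigns to that word)
are assumptions made for this example, and every probability is assumed to be
greater than zero.

```python
import math

def estimate_lambdas(word_probs, iterations=50):
    """Estimate one interpolation weight per model by EM.

    word_probs: list of tuples, one tuple per held-out word, with one
    probability per model for that word.
    """
    n_models = len(word_probs[0])
    lambdas = [1.0 / n_models] * n_models  # start from uniform weights
    for _ in range(iterations):
        posteriors = [0.0] * n_models
        for probs in word_probs:
            mix = sum(l * p for l, p in zip(lambdas, probs))
            for i, p in enumerate(probs):
                # Share of this word's probability contributed by model i.
                posteriors[i] += lambdas[i] * p / mix
        lambdas = [c / len(word_probs) for c in posteriors]
    return lambdas

def perplexity(word_probs, lambdas):
    """Perplexity of the held-out words under the interpolated model."""
    log_sum = sum(
        math.log(sum(l * p for l, p in zip(lambdas, probs)))
        for probs in word_probs
    )
    return math.exp(-log_sum / len(word_probs))
```

Each EM iteration cannot increase the held-out perplexity, so comparing
perplexity(word_probs, lambdas) before and after estimation is a quick
sanity check.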

Hope this helps,

--Andreas