interpolation and SRISLM
Andreas Stolcke
stolcke at speech.sri.com
Thu Nov 15 08:36:28 PST 2001
In message <Pine.GSO.4.02.10111151503410.22301-100000 at burma.enst.fr> you wrote:
> Dear Mr Stolcke,
>
> I'm trying to use SRISLM in order to build ARPA n-gram language models. Up
> to now it seems to work except that when I try to evaluate a model with
> the CMUSLM toolkit, I've got an error message (due to the use of unk
> instead of UNK and also to the fact that there's no backoff weight for the
> </s> symbol). I also wonder whether it is possible to get interpolation
> weights using SRISLM. Is there an option to use in order to get these
> weights (the lambdas)? Thanks in advance.
>
> Best regards,
>
> jeanphi
Dear jean-philippe,
I'm sorry you encountered problems using SRILM models with the CMU toolkit,
but they are easy to fix. The case of "unk" you can just edit by hand or
with a simple text filter. The "missing" backoff weights on unigrams
are actually a feature, because backoff weights should only be
needed on unigrams that are a prefix of a longer ngram. However, because
this is a common problem there is a script that adds "dummy" backoff weights.
The script should be in $SRILM/bin/$MACHINE_TYPE/add-dummy-bows and is
documented in the "lm-scripts" manual page.
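A text filter for the "unk" case could look roughly like the sketch below.
It assumes the token is spelled "<unk>" in the SRILM output and that the
CMU toolkit expects "<UNK>"; the function names rename_unk and convert are
made up for this illustration and are not part of either toolkit.

```python
import re

# Assumption: SRILM writes the unknown word as "<unk>" and the CMU toolkit
# wants "<UNK>". Only standalone tokens are touched, so a word that merely
# contains the string "<unk>" is left alone.
_UNK = re.compile(r"(?<!\S)<unk>(?!\S)")


def rename_unk(line, new="<UNK>"):
    """Replace every standalone <unk> token on one line of an ARPA file."""
    return _UNK.sub(new, line)


def convert(src, dst):
    """Copy an ARPA file from src to dst, renaming the unknown-word token."""
    for line in src:
        dst.write(rename_unk(line))
```

To use it as a filter, call convert(sys.stdin, sys.stdout) and redirect the
model file through the script.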
As for the interpolation weights: SRILM currently only supports interpolation
of LMs at the model level, so there is a fixed lambda for each model
that you are interpolating. Given a held-out data set, you can
estimate these model lambdas to minimize the perplexity of the data.
This is done by $SRILM/bin/$MACHINE_TYPE/compute-best-mix.
It is described in the "ppl-scripts" manual page.
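To illustrate what estimating model-level lambdas means: the interpolated
probability of each held-out word is p(w|h) = sum_i lambda_i * p_i(w|h),
and the lambdas can be fitted with a standard EM iteration. The sketch
below is a generic version of that idea, not the code of the script itself.
It assumes you already have, for every held-out word, the (strictly
positive) probability that each model assigns to it; the names best_mix
and perplexity are hypothetical.

```python
import math


def perplexity(probs, lam):
    """Perplexity of the held-out words under the mixture with weights lam."""
    total = sum(math.log(sum(l * q for l, q in zip(lam, p))) for p in probs)
    return math.exp(-total / len(probs))


def best_mix(probs, lam=None, iters=100, tol=1e-6):
    """Estimate mixture weights by EM.

    probs: one tuple per held-out word, holding the probability that each
           model assigns to that word (assumed > 0 for at least one model
           with nonzero weight).
    lam:   optional starting weights; defaults to uniform.
    """
    k = len(probs[0])
    lam = list(lam) if lam else [1.0 / k] * k
    for _ in range(iters):
        totals = [0.0] * k
        for p in probs:
            mix = sum(l * q for l, q in zip(lam, p))
            # E-step: posterior share of each model for this word.
            for i in range(k):
                totals[i] += lam[i] * p[i] / mix
        # M-step: new weight is the average posterior share.
        new = [t / len(probs) for t in totals]
        converged = max(abs(a - b) for a, b in zip(new, lam)) < tol
        lam = new
        if converged:
            break
    return lam
```

Each EM step cannot increase the held-out perplexity, so the returned
weights are at least as good as the starting ones on that data.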
Hope this helps,
--Andreas