Help
Andreas Stolcke
stolcke at speech.sri.com
Wed Jul 24 09:01:30 PDT 2002
Dimitris,
your email to the list did not go through because at the time you
sent it you were not subscribed to the list (to prevent spam we
only allow list members to post).
Regarding your question: indeed the perplexity of the mixed LM should
be much closer to what compute-best-mix outputs.
There are two ways to create an interpolated model:
"on-the-fly": this is the traditional approach: you keep the component
models separate, and compute the interpolated probabilities
when you evaluate the model.
The command for this is
ngram -lm ... -mix-lm ... -lambda L -bayes 0
"merged" you create a single static model that implements an
approximation to the on-the-fly method
The command for this is
ngram -lm ... -mix-lm ... -lambda L
(no -bayes option).
The -write-lm option outputs the merged model if desired.
In the "merged" case you only get an approximation because in general
it is not possible to create a single back-off model that exactly
implements the mixed probabilities of the two component models (without
expanding out all possible N-grams and effectively bypassing the
backoff mechanism).
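To make the distinction concrete, here is a minimal sketch (plain Python, not SRILM code; the per-word probabilities and the weight are made up) of what the on-the-fly method computes exactly: a word-level linear mixture. A single static backoff model can only approximate the resulting perplexity.

```python
# Minimal illustration of on-the-fly linear interpolation (not SRILM code).
# p1 and p2 are hypothetical per-word probabilities assigned by the two
# component models to the same held-out text; lam weights the first model.
import math

def mixed_ppl(p1, p2, lam):
    """Perplexity of the word-level mixture lam*p1 + (1-lam)*p2."""
    logsum = sum(math.log(lam * a + (1 - lam) * b) for a, b in zip(p1, p2))
    return math.exp(-logsum / len(p1))

# Made-up component probabilities for a five-word test set:
p1 = [0.10, 0.02, 0.30, 0.05, 0.08]
p2 = [0.05, 0.08, 0.10, 0.20, 0.04]
print(mixed_ppl(p1, p2, 0.85))
```

The merged model, by contrast, has to encode these mixed probabilities in a single backoff structure, which in general cannot reproduce every mixture value exactly.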
As explained in the ICSLP paper, the "merged" approach is usually slightly
better than the traditional interpolation. However, it only works if
you have two models of the same type (both word-based or both class-based).
When you merge a word-based and a class-based model the approximation
doesn't work anymore. I suspect that's what you did in your experiment.
Rerun ngram with the -bayes 0 option and see if you get the perplexity
you expect.
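For reference, the iterative "lambda" search that compute-best-mix reports corresponds to the standard EM update for mixture weights. Here is a minimal sketch under that assumption (my own illustration, not the actual SRILM script), operating on per-word probabilities such as those found in -debug 2 output:

```python
# Sketch of the standard EM update for a two-model mixture weight
# (an illustration of what compute-best-mix reports, not its source).
def estimate_lambda(p1, p2, lam=0.5, iters=20):
    """p1, p2: per-word probabilities from the two component models on
    held-out text. Returns the estimated weight of the first model."""
    for _ in range(iters):
        # E-step: posterior probability that each word came from model 1.
        post = [lam * a / (lam * a + (1 - lam) * b) for a, b in zip(p1, p2)]
        # M-step: the new weight is the average posterior.
        lam = sum(post) / len(post)
    return lam
```

Each iteration increases the held-out likelihood of the mixture, which is why the reported perplexity is the best achievable by any interpolation weight.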
--Andreas
In message <002101c2332c$9d6cb540$7b081b93 at telecom.tuc.gr> you wrote:
> Hi Andreas,
>
> Five days ago I sent an e-mail to srilm-user at speech.sri.com
> and I still haven't received an answer.
> I repeat it here. Please inform me...
>
> Hi
>
> I am interpolating a 3-gram with a class 3-gram.
>
> The output of compute-best-mix is:
> compute-best-mix debug2-LM1 debug2-LM2
> iteration 19, lambda = (0.849536 0.150464), ppl = 150.787
>
> The perplexity of the interpolated model on the held-out data from which
> I obtained debug2-LM1 and debug2-LM2 is 169.52.
>
> Shouldn't this be the same as the output of compute-best-mix, i.e. 150.787?
> Am I doing something wrong?
>
> Regards,
> Dimitris
>
>