<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix"><br>
This is as expected. You have two estimators (of conditional word
probabilities, i.e., LMs), each with random deviations from the
true probabilities. By averaging their predictions you reduce the
deviation from the truth (assuming the two models' deviations are
at least partly independent, so they tend to cancel).<br>
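<br>
To make the averaging argument concrete, here is a minimal Python
sketch (plain Python, not SRILM; the per-word probability lists and
the 0.5 weight are made-up placeholders) that interpolates two models'
per-word conditional probabilities on a held-out text and compares
perplexities:<br>
<pre>
import math

def perplexity(word_probs):
    """Perplexity from a list of per-word conditional probabilities."""
    avg_logprob = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-avg_logprob)

# P(w_i | history) assigned by each model to the i-th held-out word.
# The two models "stumble" on different words, so their errors partly cancel.
p_model_a = [0.200, 0.010, 0.150, 0.008]
p_model_b = [0.015, 0.180, 0.009, 0.120]

lam = 0.5   # interpolation weight; in practice tuned on held-out data
p_mix = [lam * a + (1 - lam) * b for a, b in zip(p_model_a, p_model_b)]

print("ppl A  :", perplexity(p_model_a))
print("ppl B  :", perplexity(p_model_b))
print("ppl mix:", perplexity(p_mix))  # lower than either with these numbers
</pre>
(SRILM's ngram -mix-lm does this same kind of linear interpolation on
real LMs, with the compute-best-mix script used to find the weight.)<br>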
<br>
For this reason you can almost always get a win out of
interpolating models that are approximately on par in their
individual performance. Other examples are<br>
<br>
- random forest models<br>
- sets of neural LMs initialized with different random weights<br>
- log-linear combination of forward- and backward-running LMs<br>
- sets of LMs trained on random samples from the same training set
(a toy version of this last item is sketched below)<br>
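<br>
A toy illustration of that last item (hypothetical data, a plain
unigram model for brevity, and nothing SRILM-specific):<br>
<pre>
import random
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = sorted(set(corpus))

def unigram_lm(tokens, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over the fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

random.seed(0)
n_models = 10
# Each model is trained on a bootstrap resample of the same corpus.
models = [unigram_lm([random.choice(corpus) for _ in corpus])
          for _ in range(n_models)]

# The ensemble prediction is the plain average of the individual models;
# it varies less from resample to resample than any single model does.
p_avg = {w: sum(m[w] for m in models) / n_models for w in vocab}
print("single model P('the'):", models[0]["the"])
print("averaged     P('the'):", p_avg["the"])
</pre>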
<br>
These techniques all reduce the "variance" part of the <a
href="https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff">modeling
error</a>. Other techniques (like interpolating models trained
on different genres) do a similar thing for the "bias" part of
the error.<br>
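<br>
If you want to see that decomposition numerically, a purely
illustrative Monte-Carlo sketch (made-up numbers, nothing LM-specific)
is below: each "model" estimates a true probability with a shared
systematic offset (the bias) plus its own independent noise (the
variance), and averaging K such estimates shrinks the variance term
roughly by 1/K while leaving the shared bias untouched.<br>
<pre>
import random, statistics

random.seed(0)
p_true, bias, noise_sd, K, trials = 0.20, 0.02, 0.05, 4, 100000

mse_single, mse_avg = [], []
for _ in range(trials):
    estimates = [p_true + bias + random.gauss(0, noise_sd) for _ in range(K)]
    mse_single.append((estimates[0] - p_true) ** 2)
    mse_avg.append((sum(estimates) / K - p_true) ** 2)

print("MSE, one model :", statistics.mean(mse_single))  # about bias**2 + noise_sd**2
print("MSE, K averaged:", statistics.mean(mse_avg))     # about bias**2 + noise_sd**2 / K
</pre>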
<br>
Andreas<br>
<br>
On 7/17/2018 9:22 PM, Fed Ang wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CANXzshR1azT8RCqGZ0L-0Nquxvpr2Lwm87cwT3Pv_xecm79nBg@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<div dir="ltr">
<div>Hi,</div>
<div><br>
</div>
I don't know if it has been asked before, but does it make sense
to interpolate on the basis of smoothing method instead of
domain/genre? What assumptions should one make in considering
this when the resulting perplexity is lower than that of either
model separately?
<div><br>
</div>
<div>Let's say: 5-gram Katz yields 100, and 5-gram Modified KN
yields 90</div>
<div>Then best-mix of the two yields 87</div>
<div><br>
</div>
<div class="cye-lm-tag">On a theoretical perspective, is it
sound to simply trust that the interpolated LM is
better/generalizable to different smoothing combinations?</div>
<div><br>
</div>
<div>-Fred</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
SRILM-User site list
<a class="moz-txt-link-abbreviated" href="mailto:SRILM-User@speech.sri.com">SRILM-User@speech.sri.com</a>
<a class="moz-txt-link-freetext" href="http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user">http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user</a></pre>
</blockquote>
<p><br>
</p>
</body>
</html>