[SRILM User List] Interpolating LMs with different smoothing

Fed Ang ang.feddy at gmail.com
Wed Jul 18 23:17:12 PDT 2018


Thank you for your insights.

-Fred

On Thu, Jul 19, 2018 at 6:29 AM, Anand Venkataraman <
venkataraman.anand at gmail.com> wrote:

> Cool - Central Limit Theorem in action :-)
>
> &
>
> On Wed, Jul 18, 2018 at 11:06 AM, Andreas Stolcke <
> stolcke at icsi.berkeley.edu> wrote:
>
>>
>> This is as expected.  You have two estimators (of conditional word
>> probabilities, i.e., LMs), each with random deviations from the true
>> probabilities.  By averaging their predictions you reduce the deviation
>> from the truth (assuming the deviations are randomly distributed).
>>
>> For this reason you can almost always get a win out of interpolating
>> models that are approximately on par in their individual performance.
>> Other examples are
>>
>> - random forest models
>> - sets of neural LMs initialized with different random weights
>> - log-linear combination of forward and backward running LMs
>> - sets of LMs trained on random samples from the same training set
>>
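>> As a toy illustration of this averaging effect (an illustrative sketch, not
>> SRILM code, with made-up numbers): two unigram estimates trained on
>> independent samples of the same source are each noisy, but their 50/50
>> interpolation is usually closer to the true distribution, and hence usually
>> has lower perplexity than at least one of them.
>>
>> import numpy as np
>>
>> rng = np.random.default_rng(0)
>> true_p = np.array([0.5, 0.3, 0.15, 0.05])      # "true" word distribution
>>
>> def estimate(n_tokens):
>>     # unigram LM from one random sample, with add-0.5 smoothing
>>     counts = rng.multinomial(n_tokens, true_p)
>>     return (counts + 0.5) / (n_tokens + 0.5 * len(true_p))
>>
>> def cross_entropy(p_true, p_model):
>>     return -np.sum(p_true * np.log2(p_model))  # bits per word under p_true
>>
>> lm1, lm2 = estimate(200), estimate(200)
>> mix = 0.5 * lm1 + 0.5 * lm2
>> for name, p in [("LM1", lm1), ("LM2", lm2), ("0.5/0.5 mix", mix)]:
>>     print(name, "perplexity: %.3f" % 2 ** cross_entropy(true_p, p))
>>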
>> These techniques all reduce the "variance" part of the modeling error
>> <https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff>.
>> Other techniques (like interpolating models trained on different genres) do
>> a similar thing for the "bias" part of the error.
>>
>> Andreas
>>
>> On 7/17/2018 9:22 PM, Fed Ang wrote:
>>
>> Hi,
>>
>> I don't know if it has been asked before, but does it make sense to
>> interpolate LMs on the basis of smoothing method instead of domain/genre?
>> What assumptions should we make when the perplexity of the interpolated
>> model is lower than that of either model on its own?
>>
>> Let's say a 5-gram Katz LM yields a perplexity of 100 and a 5-gram modified
>> Kneser-Ney LM yields 90, while the best-mix of the two yields 87.
>>
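>> A hypothetical sketch of what finding that best-mix weight amounts to
>> (SRILM's compute-best-mix script does this via EM over per-word
>> probabilities, e.g. taken from ngram -debug 2 -ppl output; the numbers below
>> are made up and the search is a simple grid instead of EM):
>>
>> import numpy as np
>>
>> # per-word probabilities each model assigns to the same held-out text
>> p_katz = np.array([0.012, 0.20, 0.0031, 0.085, 0.044])
>> p_kn   = np.array([0.019, 0.15, 0.0052, 0.071, 0.060])
>>
>> def perplexity(lam):
>>     mixed = lam * p_katz + (1.0 - lam) * p_kn   # linear interpolation
>>     return np.exp(-np.mean(np.log(mixed)))
>>
>> best = min(np.linspace(0.0, 1.0, 101), key=perplexity)
>> print("best lambda %.2f -> perplexity %.1f" % (best, perplexity(best)))
>>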
>> From a theoretical perspective, is it sound to simply trust that the
>> interpolated LM is better, and that this generalizes to other combinations
>> of smoothing methods?
>>
>> -Fred
>>
>>
>> _______________________________________________
>> SRILM-User site list
>> SRILM-User at speech.sri.com
>> http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user
>>
>
>