<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">On 9/4/2017 4:24 PM, Kalpesh Krishna

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAGo=34Xf6bTQ8Wd3bOXJJViJ+oUNiPA0QgfUiHesnk5nyQ8k-g@mail.gmail.com">

      <div dir="ltr">Hi everyone,

        <div>I'm trying to implement the KN5+cache model mentioned in

          Mikolov's PhD Thesis, <a

            href="http://www.fit.vutbr.cz/%7Eimikolov/rnnlm/thesis.pdf"

            moz-do-not-send="true">http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf</a> in

          Table 4.1. By using the command "./ngram -lm LM -ppl

          ptb.test.txt -unk -order 5 -cache 192 -cache-lambda 0.1" I

          managed to achieve a ppl value of 126.74 (I tuned `cache` and

          `cache-lambda`). What additional steps are needed to exactly

          reproduce the result? (125.7)</div>

        <div>I generated my LM using "./ngram-count -lm LM -unk

          -kndiscount -order 5 -text ptb.train.txt -interpolate -gt3min

          1 -gt4min 1 -gt5min 1".</div>

        <div><br>

        </div>

      </div>

    </blockquote>

    First off, does the ppl obtained with just the KN ngram model match?<br>

    <br>

    About the cache LM, Tomas writes <br>

    <br>

    <blockquote type="cite">We also report the perplexity of the best
      n-gram model (KN5) when using unigram cache model (as implemented
      in the SRILM toolkit). We have used several unigram cache models
      interpolated together, with different lengths of the cache history
      (this works like a crude approximation of cache decay, ie. words
      further in the history have lower weight).</blockquote>

    So he didn't just use a single cache LM as implemented by the ngram
    -cache option. He must have used multiple versions of this model
    (with different parameter values), saved out the word-level
    probabilities, and interpolated them off-line.<br>
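    <br>
    As a minimal sketch of what such off-line interpolation computes
    (this is not SRILM code): assume the per-word probabilities of each
    model have already been extracted into aligned Python lists. The
    function name, the variable names, and the toy numbers below are
    hypothetical, and the sketch ignores details such as OOV handling
    and end-of-sentence tokens.<br>
    <pre>
```python
import math

def interpolated_perplexity(streams, weights):
    """Perplexity of a linear mixture of language models.

    streams: one list of per-word probabilities per model, all aligned
             to the same tokens (hypothetical input, assumed to have
             been extracted from the saved word-level output).
    weights: one mixture weight per model, summing to 1.
    """
    n_words = len(streams[0])
    log_sum = 0.0
    for i in range(n_words):
        # Mix the models' probabilities for token i.
        p = sum(w * s[i] for w, s in zip(weights, streams))
        log_sum += math.log(p)
    # Perplexity is exp of the negative mean log-probability.
    return math.exp(-log_sum / n_words)

# Toy numbers for illustration only, not real model output.
kn5 = [0.10, 0.02, 0.30, 0.05]
cache_short = [0.00, 0.20, 0.01, 0.00]
cache_long = [0.01, 0.10, 0.02, 0.04]
print(interpolated_perplexity([kn5, cache_short, cache_long],
                              [0.8, 0.1, 0.1]))
```
    </pre>
    A cache model alone may assign zero probability to a word, so the
    mixture needs a nonzero weight on the n-gram model to stay
    well-defined.<br>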

    <br>

    You can run an individual cache LM and save out the probabilities

    using <br>

    <br>

                ngram -vocab VOCAB -null -cache 192 -cache-lambda 1 -ppl

    TEST -debug 2 > TEST.ppl<br>

    <br>

    Repeat this several times with different -cache parameters, and also

    for the KN ngram.<br>

    <br>

    Then use compute-best-mix on all the output files to determine the

    best mixture weights (of course you need to do this using a held-out

    set, not the actual test set).<br>
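    <br>
    For intuition, mixture weights of this kind are commonly estimated
    with expectation-maximization on the held-out probabilities. The
    following is a hedged sketch of that general technique, not a copy
    of the compute-best-mix script; the function name, the fixed
    iteration count, and the list-based input are assumptions made for
    illustration.<br>
    <pre>
```python
def estimate_mixture_weights(streams, iterations=50):
    """EM estimate of linear interpolation weights.

    streams: one list of per-word held-out probabilities per model,
             aligned token by token (hypothetical input format).
    Assumes at least one model gives every token nonzero probability.
    """
    n_models = len(streams)
    n_words = len(streams[0])
    # Start from uniform weights.
    weights = [1.0 / n_models] * n_models
    for _ in range(iterations):
        totals = [0.0] * n_models
        for i in range(n_words):
            mixed = sum(w * s[i] for w, s in zip(weights, streams))
            for k in range(n_models):
                # E-step: posterior that model k generated token i.
                totals[k] += weights[k] * streams[k][i] / mixed
        # M-step: new weight is the average posterior over all tokens.
        weights = [t / n_words for t in totals]
    return weights

# Toy held-out probabilities for illustration only.
heldout_kn5 = [0.10, 0.02, 0.30, 0.05]
heldout_cache = [0.01, 0.20, 0.02, 0.04]
print(estimate_mixture_weights([heldout_kn5, heldout_cache]))
```
    </pre>
    The returned weights sum to 1 and can then be held fixed when
    scoring the test set.<br>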

    <br>

    Then you do the same for the test set, but use<br>

          <br>

        compute-best-mix lambda='....'  precision=1000 ppl-file ppl-file

    ...<br>

    <br>

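    where you provide the weights from the held-out set to the lambda=
    parameter. (The large precision value is there so that the script
    won't iterate, i.e., the weights you supply are used as given
    rather than re-estimated on the test data.) This will give you the
    test-set perplexity.<br>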

    <br>

    Of course you still might have trouble getting the exact same

    results since Tomas didn't disclose the exact parameter values he

    used.   But since you're already within 1 perplexity point of his

    results I would question whether this matters.<br>

    <br>

    Andreas<br>


  </body>

</html>