[SRILM User List] Unigram Cache Model

Kalpesh Krishna kalpeshk2011 at gmail.com
Tue Sep 5 11:18:39 PDT 2017


Hi Andreas,

> First off, does the ppl obtained with just the KN ngram model match?
Yes, I could exactly reproduce the 3-gram and 5-gram KN ppl numbers. I had
to use the -interpolate and -gtXmin 1 flags to replicate the results though.

> Of course you still might have trouble getting the exact same results
since Tomas didn't disclose the exact parameter values he used.
Thanks a lot for the method! I suspected something was missing.

Best Regards,
Kalpesh


On Tue, Sep 5, 2017 at 9:42 PM, Andreas Stolcke <stolcke at icsi.berkeley.edu>
wrote:

> On 9/4/2017 4:24 PM, Kalpesh Krishna wrote:
>
> Hi everyone,
> I'm trying to implement the KN5+cache model from Table 4.1 of Mikolov's PhD
> thesis, http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf.
> Using the command "./ngram -lm LM -ppl ptb.test.txt -unk -order 5 -cache
> 192 -cache-lambda 0.1", I achieved a ppl of 126.74 after tuning -cache and
> -cache-lambda. What additional steps are needed to exactly reproduce the
> reported 125.7?
> I generated my LM using "./ngram-count -lm LM -unk -kndiscount -order 5
> -text ptb.train.txt -interpolate -gt3min 1 -gt4min 1 -gt5min 1".
>
> First off, does the ppl obtained with just the KN ngram model match?
>
> About the cache LM, Tomas writes
>
> We also report the perplexity of the best n-gram model (KN5) when
> using unigram cache model (as implemented in the SRILM toolkit). We have
> used several
> unigram cache models interpolated together, with different lengths of the
> cache history
> (this works like a crude approximation of cache decay, ie. words further
> in the history
> have lower weight).
>
> So he didn't just use a single cache LM as implemented by the ngram -cache
> option. He must have used multiple versions of this model (with different
> parameter values), saved out the word-level probabilities, and interpolated
> them off-line.
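>
> (Off-line interpolation here just means forming a weighted sum of the saved
> word-level probabilities, P_mix(w | h) = sum_i lambda_i * P_i(w | h), with
> the lambda_i summing to 1; compute-best-mix, below, estimates those
> weights.)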
>
> You can run an individual cache LM and save out the probabilities using
>
>     ngram -vocab VOCAB -null -cache 192 -cache-lambda 1 -ppl TEST -debug 2 > TEST.ppl
>
> Repeat this several times with different -cache parameters, and also for
> the KN ngram.
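>
> For instance, a minimal sketch (the cache lengths and the HELDOUT /
> heldout.*.ppl file names are illustrative assumptions, not the values Tomas
> used):
>
>     for C in 50 100 200 500 1000; do
>         ngram -vocab VOCAB -null -cache $C -cache-lambda 1 \
>             -ppl HELDOUT -debug 2 > heldout.cache$C.ppl
>     done
>     # and the KN 5-gram itself, on the same held-out data
>     ngram -lm LM -order 5 -unk -ppl HELDOUT -debug 2 > heldout.kn5.ppl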
>
> Then use compute-best-mix on all the output files to determine the best
> mixture weights (of course you need to do this using a held-out set, not
> the actual test set).
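>
> Continuing the hypothetical file names above:
>
>     compute-best-mix heldout.kn5.ppl heldout.cache50.ppl heldout.cache100.ppl \
>         heldout.cache200.ppl heldout.cache500.ppl heldout.cache1000.ppl
>
> This prints the best mixture weights (lambdas), one per input file, in the
> order the files were given.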
>
> Then you do the same for the test set, but use
>
>     compute-best-mix lambda='....'  precision=1000 ppl-file ppl-file ...
>
> where you provide the weights estimated on the held-out set via the lambda=
> parameter. (The large precision value keeps compute-best-mix from
> iterating, so the supplied weights are used as-is.) This will give you the
> test-set perplexity.
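>
> Concretely (the lambda values are placeholders for whatever the held-out
> run printed, and the test.*.ppl files come from rerunning the same ngram
> commands on the test set):
>
>     compute-best-mix lambda='0.7 0.06 0.06 0.06 0.06 0.06' precision=1000 \
>         test.kn5.ppl test.cache50.ppl test.cache100.ppl \
>         test.cache200.ppl test.cache500.ppl test.cache1000.ppl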
>
> Of course you still might have trouble getting the exact same results
> since Tomas didn't disclose the exact parameter values he used.   But since
> you're already within 1 perplexity point of his results I would question
> whether this matters.
>
> Andreas
>


-- 
Kalpesh Krishna,
Junior Undergraduate,
Electrical Engineering,
IIT Bombay

