[SRILM User List] Unigram Cache Model
Andreas Stolcke
stolcke at icsi.berkeley.edu
Tue Sep 5 09:12:38 PDT 2017
On 9/4/2017 4:24 PM, Kalpesh Krishna wrote:
> Hi everyone,
> I'm trying to implement the KN5+cache model from Table 4.1 of Mikolov's
> PhD thesis (http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf).
> By using the command "./ngram -lm LM -ppl ptb.test.txt -unk -order 5
> -cache 192 -cache-lambda 0.1" I managed to achieve a ppl value of
> 126.74 (I tuned `cache` and `cache-lambda`). What additional steps are
> needed to exactly reproduce the reported result (125.7)?
> I generated my LM using "./ngram-count -lm LM -unk -kndiscount -order
> 5 -text ptb.train.txt -interpolate -gt3min 1 -gt4min 1 -gt5min 1".
>
First off, does the ppl obtained with just the KN ngram model match?
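For example, dropping the cache options from your own command scores the
test set with the ngram model alone (same LM file and test set as in your
message):

    ngram -lm LM -ppl ptb.test.txt -unk -order 5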
About the cache LM, Tomas writes:
> We also report the perplexity of the best n-gram model (KN5) when
> using unigram cache model (as implemented in the SRILM toolkit). We
> have used several
> unigram cache models interpolated together, with different lengths of
> the cache history
> (this works like a crude approximation of cache decay, i.e. words
> further in the history have lower weight).
So he didn't just use a single cache LM as implemented by the ngram
-cache option. He must have used multiple versions of this model (with
different parameter values), saved out the word-level probabilities, and
interpolated them off-line.
You can run an individual cache LM and save out the probabilities using

    ngram -vocab VOCAB -null -cache 192 -cache-lambda 1 -ppl TEST -debug 2 > TEST.ppl
Repeat this several times with different -cache parameters, and also for
the KN ngram.
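For example (the cache lengths below are illustrative values, not the ones
Tomas used; VOCAB and HELDOUT stand for your vocabulary and held-out
files):

    # cache-only LMs with different cache history lengths
    for n in 50 200 1000; do
        ngram -vocab VOCAB -null -cache $n -cache-lambda 1 \
            -ppl HELDOUT -debug 2 > HELDOUT.cache$n.ppl
    done

    # the KN 5-gram itself, in the same -debug 2 format
    ngram -lm LM -unk -order 5 -ppl HELDOUT -debug 2 > HELDOUT.kn.ppl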
Then use compute-best-mix on all the output files to determine the best
mixture weights (of course you need to do this using a held-out set, not
the actual test set).
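With the files from the sketch above, that would be

    compute-best-mix HELDOUT.kn.ppl HELDOUT.cache50.ppl \
        HELDOUT.cache200.ppl HELDOUT.cache1000.ppl

which iterates to the optimal mixture and prints one weight per input file.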
Then you do the same for the test set, but use

    compute-best-mix lambda='....' precision=1000 ppl-file ppl-file ...

where you provide the weights from the held-out set to the lambda=
parameter. (A precision value that large keeps it from iterating, so the
supplied weights are used unchanged.) This will give you the test-set
perplexity.
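Concretely (the lambda values here are made-up placeholders; substitute
whatever compute-best-mix reported on the held-out set):

    # regenerate the .ppl files on the test set exactly as above, then:
    compute-best-mix lambda='0.7 0.1 0.1 0.1' precision=1000 \
        TEST.kn.ppl TEST.cache50.ppl TEST.cache200.ppl TEST.cache1000.ppl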
Of course you still might have trouble getting the exact same results
since Tomas didn't disclose the exact parameter values he used. But
since you're already within 1 perplexity point of his results I would
question whether this matters.
Andreas