[SRILM User List] Unigram Cache Model

Andreas Stolcke stolcke at icsi.berkeley.edu
Tue Sep 5 09:12:38 PDT 2017

On 9/4/2017 4:24 PM, Kalpesh Krishna wrote:
> Hi everyone,
> I'm trying to implement the KN5+cache model mentioned in Mikolov's PhD 
> Thesis, http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf, in Table 4.1. 
> By using the command "./ngram -lm LM -ppl ptb.test.txt -unk -order 5 
> -cache 192 -cache-lambda 0.1" I managed to achieve a ppl value of 
> 126.74 (I tuned `cache` and `cache-lambda`). What additional steps are 
> needed to exactly reproduce the result? (125.7)
> I generated my LM using "./ngram-count -lm LM -unk -kndiscount -order 
> 5 -text ptb.train.txt -interpolate -gt3min 1 -gt4min 1 -gt5min 1".
First off, does the ppl obtained with just the KN ngram model match?

About the cache LM, Tomas writes

> We also report the perplexity of the best n-gram model (KN5) when
> using unigram cache model (as implemented in the SRILM toolkit). We
> have used several unigram cache models interpolated together, with
> different lengths of the cache history (this works like a crude
> approximation of cache decay, ie. words further in the history have
> lower weight).
So he didn't just use a single cache LM as implemented by the ngram 
-cache option.  He must have used multiple versions of this model (with 
different parameter values), saved out the word-level probabilities, and 
interpolated them off-line.

You can run an individual cache LM and save out the probabilities using

     ngram -vocab VOCAB -null -cache 192 -cache-lambda 1 -ppl TEST -debug 2 > TEST.ppl

Repeat this several times with different -cache parameters, and also for 
the KN ngram.
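
As a rough sketch of the "repeat with different -cache parameters" step, the following Python helper only builds the command lines; it does not run SRILM. The cache lengths 50 and 500 and the output file naming are placeholders I made up for illustration (only 192 comes from this thread), and VOCAB and TEST stand for your own files as in the command above.

```python
import shlex


def cache_lm_commands(cache_sizes, vocab="VOCAB", test="TEST"):
    """Return (argv, output_file) pairs, one per cache length.

    The flags are the ones from the command above; the output file
    naming scheme is a hypothetical convention.
    """
    jobs = []
    for size in cache_sizes:
        argv = [
            "ngram", "-vocab", vocab, "-null",
            "-cache", str(size), "-cache-lambda", "1",
            "-ppl", test, "-debug", "2",
        ]
        jobs.append((argv, f"{test}.cache{size}.ppl"))
    return jobs


if __name__ == "__main__":
    # Placeholder cache lengths; tune these on held-out data.
    for argv, out in cache_lm_commands([50, 192, 500]):
        print(shlex.join(argv), ">", out)
```

Each printed line can be pasted into a shell, so that every cache length ends up in its own output file for the mixing step.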

Then use compute-best-mix on all the output files to determine the best 
mixture weights (of course you need to do this using a held-out set, not 
the actual test set).
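
To make the weight estimation concrete, here is a minimal Python sketch of an EM-style estimate of mixture weights from per-word probabilities. It illustrates the idea only and is not SRILM's code: it assumes the per-word probabilities have already been extracted from the -debug 2 output into one list per model, aligned word by word, and the function name is hypothetical.

```python
def em_mixture_weights(probs, iterations=100):
    """Estimate linear-interpolation weights on held-out data.

    probs[m][i] is the probability model m assigns to held-out word i.
    Returns one weight per model; the weights sum to 1.
    """
    n_models = len(probs)
    n_words = len(probs[0])
    weights = [1.0 / n_models] * n_models  # start from uniform weights
    for _ in range(iterations):
        counts = [0.0] * n_models
        for i in range(n_words):
            mix = sum(w * p[i] for w, p in zip(weights, probs))
            if mix == 0.0:
                continue  # no model covers this word; skip it
            # E-step: share of this word's probability owed to each model
            for m in range(n_models):
                counts[m] += weights[m] * probs[m][i] / mix
        total = sum(counts)
        if total == 0.0:
            break  # nothing to learn from; keep current weights
        # M-step: renormalise the expected counts into new weights
        weights = [c / total for c in counts]
    return weights
```

With two models where the first always assigns the higher probability, the weight of the first model moves toward 1; with identical models the weights stay uniform.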

Then you do the same for the test set, but use

     compute-best-mix lambda='....'  precision=1000 ppl-file ppl-file ...

where you provide the weights from the held-out set to the lambda= 
parameter.  (The large precision value stops it from iterating, so the 
supplied weights are used as given.)  This will give you the test-set 
perplexity.
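
For reference, the quantity computed with fixed weights can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a replacement for compute-best-mix: it takes per-word probabilities already extracted into one aligned list per model, and it ignores details such as how OOV words and zero-probability words are treated. The function name is hypothetical.

```python
import math


def mixture_perplexity(probs, weights):
    """Perplexity of a linear mixture with fixed weights.

    probs[m][i] is the probability model m assigns to test word i;
    weights[m] is the held-out mixture weight for model m.
    """
    n_words = len(probs[0])
    log_sum = 0.0
    for i in range(n_words):
        # Interpolate the models word by word, then accumulate log10 prob
        mix = sum(w * p[i] for w, p in zip(weights, probs))
        log_sum += math.log10(mix)
    return 10 ** (-log_sum / n_words)
```

For example, if every model assigns 0.1 to every word, the mixture perplexity is 10 regardless of the weights.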

Of course you still might have trouble getting the exact same results 
since Tomas didn't disclose the exact parameter values he used.   But 
since you're already within 1 perplexity point of his results I would 
question whether this matters.

