[SRILM User List] Right way to build LM
ismail.indonesia at gmail.com
Tue Apr 29 23:53:39 PDT 2014
Right, thanks Andreas.
It's getting clearer to me now.
On 04/30/2014 01:39 PM, Andreas Stolcke wrote:
> On 4/28/2014 7:38 PM, Ismail Rusli wrote:
>> Thanks for the answer, Andreas.
>> As I read in the paper by
>> Chen and Goodman (1999), they used held-out data
>> to optimize the parameters of the language model. How do I
>> do this in SRILM? Does SRILM optimize the parameters
>> when I use -kndiscount?
> SRILM just uses the closed-form formulas for estimating the discounts from the
> count-of-counts, i.e., equations (26) in the Chen & Goodman technical
> report.
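For reference, those closed-form discount estimates (Chen & Goodman's equations (26), computed from the count-of-counts n_r, the number of n-grams occurring exactly r times) are:

```latex
% Modified Kneser-Ney discounts estimated from count-of-counts n_r
Y      = \frac{n_1}{n_1 + 2 n_2}
D_1    = 1 - 2Y \, \frac{n_2}{n_1}
D_2    = 2 - 3Y \, \frac{n_3}{n_2}
D_{3+} = 3 - 4Y \, \frac{n_4}{n_3}
```

Each n-gram order gets its own set of D_1, D_2, D_{3+}, which is why ngram-count takes a separate -kn1/-kn2/-kn3 file per order.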
>> I tried -kn to save the
>> parameters in a file and included this file
>> when building the LM, but it turned out
>> my perplexity got bigger.
> You can save the discounting parameters using:
> 1) ngram-count -read COUNTS -kndiscount -kn1 K1 -kn2 K2 -kn3 K3
> (no -lm argument!)
> Then you can read them back in for LM estimation using
> 2) ngram-count -read COUNTS -kndiscount -kn1 K1 -kn2 K2 -kn3 K3 -lm LM
> and the result will be identical to running command 2 without the
> -kn1/2/3 options.
> Now, if you want you can manipulate the discounting parameters before
> invoking command 2.
> For example, you could perform a search over the D1, D2, D3 parameters
> optimizing perplexity on a held-out set, just like C&G did. But you
> have to implement that search yourself by writing some wrapper scripts.
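A minimal sketch of such a wrapper in Python, assuming ngram-count and ngram are on the PATH, COUNTS and heldout.txt exist, and a small illustrative grid of candidate discounts; the discount-file layout is an assumption, so compare it against a file actually written by command 1 on your system:

```python
# Hypothetical wrapper: grid-search the per-order KN discounts to
# minimize held-out perplexity, as suggested in the post above.
import itertools
import re
import subprocess

def parse_ppl(ngram_output: str) -> float:
    """Extract the 'ppl=' value from `ngram -ppl` output, which looks like:
    '0 zeroprobs, logprob= -2500.5 ppl= 123.45 ppl1= 150.2'"""
    m = re.search(r"\bppl=\s*([0-9.]+)", ngram_output)
    if m is None:
        raise ValueError("no ppl= field found in ngram output")
    return float(m.group(1))

def heldout_ppl(d1: float, d2: float, d3: float) -> float:
    """Build an LM with fixed discounts and score the held-out set."""
    for order, d in ((1, d1), (2, d2), (3, d3)):
        with open(f"kn{order}.params", "w") as f:
            # ASSUMED file layout; inspect a file produced by
            # `ngram-count -read COUNTS -kndiscount -kn1 ...` (no -lm)
            # and mirror it exactly.
            f.write(f"mincount 1\ndiscount {d}\n")
    subprocess.run(
        ["ngram-count", "-read", "COUNTS", "-kndiscount",
         "-kn1", "kn1.params", "-kn2", "kn2.params", "-kn3", "kn3.params",
         "-lm", "LM"],
        check=True)
    out = subprocess.run(
        ["ngram", "-lm", "LM", "-ppl", "heldout.txt"],
        check=True, capture_output=True, text=True)
    return parse_ppl(out.stdout)

def grid_search():
    """Return the (d1, d2, d3) triple with the lowest held-out perplexity."""
    grid = [0.3, 0.5, 0.7, 0.9]  # illustrative candidate values
    return min(itertools.product(grid, grid, grid),
               key=lambda ds: heldout_ppl(*ds))
```

A coordinate-wise search (optimizing one order's discount at a time) converges faster than the full product grid if the triple grid is too slow.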
> Also consider the interpolated version of KN smoothing. Just add the
> ngram-count -interpolate option; it usually gives slightly better results.
>> And just one more,
>> do you have a link to good tutorial in using
>> class-based models with SRILM?
> There is a basic tutorial at
> http://ssli.ee.washington.edu/ssli/people/sarahs/srilm.html .
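To complement the tutorial, here is a hedged sketch of the usual class-based workflow with SRILM's ngram-class and the replace-words-with-classes script; the file names and the class count of 100 are illustrative assumptions:

```shell
# 1) induce 100 word classes from the training text (illustrative count)
ngram-class -text train.txt -numclasses 100 -classes classes.txt

# 2) rewrite the training data with class tokens
replace-words-with-classes classes=classes.txt train.txt > train.classes

# 3) train an n-gram LM over the class tokens
ngram-count -text train.classes -kndiscount -interpolate -lm class.lm

# 4) evaluate, letting ngram expand classes back to word probabilities
ngram -lm class.lm -classes classes.txt -ppl heldout.txt
```

Class LMs are usually most useful interpolated with a word-based LM rather than on their own.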