[SRILM User List] Configuration for best language models

Gwénolé Lecorvé gwenole.lecorve at gmail.com
Wed Aug 10 02:23:30 PDT 2011


Luis,

I wouldn't say there is one absolutely good recipe for building a language
model (though there are some good practices).
Regarding smoothing, many papers have studied the different techniques and
highlighted their respective strengths and weaknesses. For instance, it has
recently been shown that Kneser-Ney smoothing does not behave well in
combination with aggressive entropy-based pruning (even though you don't seem
to use pruning).
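For illustration, that interaction can be checked directly: entropy pruning is
available through the -prune option (in ngram-count at training time, or
afterwards with the ngram tool), and the perplexity of the pruned model can be
compared against the unpruned one. The threshold and file names below are just
placeholders:

  ngram -order 3 -lm kn.lm.gz -prune 1e-8 -write-lm kn.pruned.lm.gz
  ngram -order 3 -lm kn.pruned.lm.gz -ppl dev.txt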
As for the other parameters, this may depend on your target task.

Thus, I would just say:
- read papers about smoothing techniques, e.g.:
[1] Chen, S. F. & Goodman, J. An Empirical Study of Smoothing Techniques for
Language Modeling. Technical report, Harvard University, 1998.
[2] Chelba, C.; Brants, T.; Neveitt, W. & Xu, P. Study on Interaction Between
Entropy Pruning and Kneser-Ney Smoothing. Proc. of Interspeech, 2010,
pp. 2422-2425.
- and compare the effects of the different parameters/options (see the manual)
in terms of perplexity, or whatever measure you're ultimately seeking to
minimize. In particular, try toggling -interpolate on and off, varying the
-gtNmin count cutoffs (e.g. -gt3min), and experimenting with the pruning
options; a minimal sketch of such a comparison is given below.
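Such a comparison could look like the following (the file names train.txt,
dev.txt and wordlist are placeholders for your training data, a held-out set
and your vocabulary):

  # interpolated Kneser-Ney vs. Witten-Bell, with a trigram count cutoff
  ngram-count -order 3 -interpolate -kndiscount -gt3min 1 \
      -vocab wordlist -limit-vocab -unk -text train.txt -lm kn.lm.gz
  ngram-count -order 3 -interpolate -wbdiscount -gt3min 1 \
      -vocab wordlist -limit-vocab -unk -text train.txt -lm wb.lm.gz

  # compare perplexities on the held-out set
  ngram -order 3 -unk -lm kn.lm.gz -ppl dev.txt
  ngram -order 3 -unk -lm wb.lm.gz -ppl dev.txt

Whichever variant gives the lowest perplexity on data that matches your target
task is usually the better starting point.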

Best regards,
Gwenole.

2011/8/9 Luis Uebel <lfu20 at hotmail.com>

>  I am producing some language models (3-grams) for HTK.
> What is the best configuration for producing the best language models using
> SRILM?
> My configuration is:
> $SRILM/ngram-count -memuse -order ${trigram} -interpolate -kndiscount -unk
> -vocab $wordlist -limit-vocab -text ${training} -lm ${train}-lm
> ${trigram}
>
>
> The script line is above, and I am using -kndiscount.
> Is there a better type of discounting, or better parameters, for producing
> better language models with SRILM?
>
> Number of words (unique): 38k
> Size: 93Mbytes
> Number of lines: 550656
> Number of words (total): 17166049 (17M)
>
> Thanks.
>
>
> Luis
>
>
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user
>