[SRILM User List] About the -prune option
Andreas Stolcke
stolcke at icsi.berkeley.edu
Mon Oct 29 12:44:11 PDT 2012
On 10/29/2012 3:09 AM, Meng Chen wrote:
> Hi, I need to obtain a small LM for ASR decoding by pruning a large
> LM. The original large LM contains about 1.6 billion n-grams, and the
> small one should contain about 30 million n-grams. The -prune option
> in SRILM can do this. However, I want to ask whether pruning in one
> pass gives the same result as pruning in several passes. For example,
> there are two approaches to this pruning task.
>
> 1) Set a proper value and prune in a single pass to get the target LM:
> ngram -lm LM_Large -prune 1e-9 -order 5 -write-lm LM_Small
>
> 2) Set several values and prune gradually to get the target LM:
> ngram -lm LM_Large -prune 1e-10 -order 5 -write-lm LM_Small1
> ... ...
> ngram -lm LM_Small1 -prune 1e-9 -order 5 -write-lm LM_Small
>
> Are there any differences between the above two approaches? Does the
> pruned LM have a lower perplexity with the second method?
Pruning tries to minimize the cross-entropy between the original and the
pruned model. Therefore, you should expect the best results from pruning
in one step (approach 1), since then you have the original model to
compare against for all pruning decisions (at the n-gram level). I have
not investigated how much worse approach 2 would do, so it might be
just fine in practice.
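To illustrate the criterion being described, here is a minimal sketch of an entropy-based pruning decision for a single history. It is not SRILM's implementation; the distributions and the history probability are hypothetical toy numbers, and the actual SRILM criterion also accounts for renormalized backoff weights. The point is only that each n-gram's removal is scored by the weighted KL divergence between the original and pruned conditional distributions, and kept if that divergence exceeds the -prune threshold.

```python
import math

def pruning_divergence(p_orig, p_pruned, hist_prob):
    """Weighted relative entropy D(p_orig || p_pruned) for one history h.

    p_orig, p_pruned: dicts mapping word -> P(w | h) before/after pruning.
    hist_prob: marginal probability P(h) of the history (weights the KL term).
    """
    d = sum(p * (math.log(p) - math.log(p_pruned[w]))
            for w, p in p_orig.items())
    return hist_prob * d

# Hypothetical distributions for one history h. Removing the explicit
# entry for "c" makes it fall back to a (lower) backoff estimate, and the
# freed mass is redistributed to the remaining words.
p_orig = {"a": 0.50, "b": 0.30, "c": 0.20}
p_pruned = {"a": 0.52, "b": 0.31, "c": 0.17}

delta = pruning_divergence(p_orig, p_pruned, hist_prob=0.01)
threshold = 1e-9  # same role as the value passed to -prune
keep = delta > threshold  # keep the n-gram only if removal costs too much entropy
```

This also shows why one-pass pruning is preferable: in a second pass, `p_orig` would itself already be an approximation, so every divergence is measured against a degraded reference rather than the true original model.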
Andreas