[SRILM User List] Question about select-vocab

Meng Chen chenmengdx at gmail.com
Wed Sep 5 03:06:51 PDT 2012


Hi, I am using the *select-vocab* command to choose a vocabulary from corpora
A and B for a Chinese speech recognition task. The command is as follows:
*select-vocab -heldout dev A B > vocab_with_weight*
Then I saw the following output:
*Iter 0: lambdas = (0.5 0.5)*
*Iter 1: lambdas = (0.443075 0.556925) log P(held-out) = -374805.0047 PPL = 6937.8495*
*Iter 2: lambdas = (0.399799 0.600201) log P(held-out) = -374319.5890 PPL = 6858.8301*
*Iter 3: lambdas = (0.366822 0.633178) log P(held-out) = -374032.9165 PPL = 6812.5869*
*Iter 4: lambdas = (0.341533 0.658467) log P(held-out) = -373860.8231 PPL = 6784.9764*
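The log above can be reproduced in miniature under one reading of what select-vocab does. This reading is an assumption, not something the output confirms. Under it, the tool builds one maximum-likelihood unigram model per corpus, then tunes the interpolation weights (the lambdas) with EM to maximize the held-out log probability. PPL would then be the held-out perplexity of that weighted unigram mixture, not of a full n-gram LM. The helper names `unigram_ml` and `em_mixture_weights` and the toy corpora are hypothetical. Skipping held-out words that no model covers is an assumed OOV policy and is not necessarily what SRILM does.

```python
import math
from collections import Counter


def unigram_ml(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def em_mixture_weights(models, heldout, iterations=10):
    """Tune interpolation weights of unigram models by EM on held-out tokens.

    Returns the final weights and, per iteration, the weights that were
    scored together with log P(held-out) (natural log) and PPL.
    """
    k = len(models)
    lambdas = [1.0 / k] * k  # uniform start, like "Iter 0" in the log
    # Assumption: tokens unseen in every corpus are skipped, like OOVs.
    scored = [w for w in heldout if any(m.get(w, 0.0) > 0.0 for m in models)]
    n = len(scored)
    if n == 0:
        raise ValueError("no held-out token is covered by any model")
    history = []
    for _ in range(iterations):
        resp_sums = [0.0] * k
        log_p = 0.0
        for w in scored:
            parts = [lam * m.get(w, 0.0) for lam, m in zip(lambdas, models)]
            mix = sum(parts)  # mixture probability of w
            log_p += math.log(mix)
            # E-step: responsibility of each component for this token.
            for i in range(k):
                resp_sums[i] += parts[i] / mix
        # Record the weights that produced this log P and PPL.
        history.append((lambdas, log_p, math.exp(-log_p / n)))
        # M-step: new weight = average responsibility over scored tokens.
        lambdas = [s / n for s in resp_sums]
    return lambdas, history


# Hypothetical toy data standing in for corpora A, B and the dev set.
corpus_a = "the cat sat on the mat".split()
corpus_b = "a dog sat on a log the end".split()
heldout = "the dog sat on the mat".split()
models = [unigram_ml(corpus_a), unigram_ml(corpus_b)]
final, history = em_mixture_weights(models, heldout, iterations=5)
for it, (lam, log_p, ppl) in enumerate(history):
    print(f"Iter {it}: lambdas = {tuple(round(x, 6) for x in lam)} "
          f"log P(held-out) = {log_p:.4f} PPL = {ppl:.4f}")
print("final lambdas:", [round(x, 6) for x in final])
```

EM never decreases the held-out log probability for fixed component models, which matches the steadily rising log P and falling PPL in the real output. PPL itself does not depend on the log base, as long as the same base is used for log P and for the exponent.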
What does PPL mean here? Does the command first train an LM on corpora A and
B and then compute the perplexity of the held-out data under that LM?
If corpora A and B are 10 GB each, how much held-out data is needed, at
minimum, to choose a reasonable vocabulary?
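Whatever model sits behind it, a perplexity line like the ones above normally satisfies PPL = b^(-log P / N). Here N is the number of held-out tokens scored and b is the log base. The sketch below inverts that relation for the four logged lines under two candidate bases: 10, which SRILM conventionally uses for log probabilities, and e. Two things remain assumptions to check rather than facts read from the output: which base select-vocab actually uses, and exactly what N counts (for example, whether sentence ends are included). The name `implied_token_count` is hypothetical.

```python
import math

# (log P(held-out), PPL) pairs copied from the select-vocab output above.
LOG_LINES = [
    (-374805.0047, 6937.8495),
    (-374319.5890, 6858.8301),
    (-374032.9165, 6812.5869),
    (-373860.8231, 6784.9764),
]


def implied_token_count(log_p, ppl, base):
    """Invert PPL = base ** (-log_p / N) to recover N, the scored token count."""
    return -log_p / math.log(ppl, base)


for base, name in ((10, "log10"), (math.e, "ln")):
    counts = [implied_token_count(lp, ppl, base) for lp, ppl in LOG_LINES]
    print(name, [round(n, 3) for n in counts])
```

Both bases give the same N on every line, because the held-out set does not change between iterations. Only a base that also yields a whole-number count is a plausible reading of the output.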

Thanks!
Meng CHEN