stolcke at speech.sri.com
Thu Sep 25 12:57:36 PDT 2008
Nisha Yadav wrote:
> I am a new user of srilm toolkit and have been using the same to
> generate some language model. I will be grateful to have your advice
> regarding the following.
> 1) While assigning backoff probabilities <s> is assigned a very small
> probability i.e. 1E-99 but </s> is assigned a non-zero probability
> 0.181. That is to say in the output lm file I can see the following
> entries for <s> and </s>
> -0.7421436 </s>
> -99 <s> -0.3938685
> Can you please explain why is srilm doing this?
that's because an LM never needs to predict the beginning-of-sentence
token, only the end-of-sentence. The -99 is just a dummy entry to
satisfy the LM format.
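To make the two entries concrete: the first column of an ARPA-format LM line is a log10 probability, and an optional trailing number is the log10 backoff weight. A minimal sketch that converts the quoted values back to plain probabilities follows. The helper name parse_arpa_line is made up for illustration and is not part of SRILM.

```python
# Hypothetical helper (not an SRILM function): split one unigram line of an
# ARPA-format LM into word, log10 probability, and optional log10 backoff weight.
def parse_arpa_line(line):
    fields = line.split()
    logprob = float(fields[0])           # first column: log10 probability
    word = fields[1]                     # unigram lines carry a single word
    backoff = float(fields[2]) if len(fields) > 2 else None
    return word, logprob, backoff

# The two lines quoted in the question.
lines = ["-0.7421436 </s>", "-99 <s> -0.3938685"]
entries = [parse_arpa_line(l) for l in lines]

for word, logprob, backoff in entries:
    # 10 ** logprob recovers the probability: about 0.181 for </s>,
    # and 1e-99 (the dummy value) for <s>.
    print(word, 10 ** logprob, backoff)
```

Note that <s> still carries a real backoff weight (-0.3938685): it is never predicted, but it does appear as the context for the first word of a sentence.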
> 2) For perplexity calculation, ppl command outputs 2 values ppl and
> ppl1. Which of these two is to be taken into account to compare
> the model performance generated by 2-order, 3-order...ngrams and so on?
Please use the FAQ first for questions about SRILM. You will find the
When you cannot find the answer, send email to srilm-user at speech.sri.com
(you need to join the mailing list first).
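For reference, a sketch of how the two figures relate, based on my understanding of the SRILM FAQ: both are computed from the same total log10 probability, and they differ only in whether the end-of-sentence tokens are counted in the normalization (ppl counts them, ppl1 does not). The function name and the example counts below are invented for illustration; check the FAQ for the authoritative definition.

```python
# Sketch under the assumption stated above; not taken from the SRILM source.
def perplexities(logprob, words, sentences, oovs=0, zeroprobs=0):
    """Return (ppl, ppl1) from a total log10 probability and token counts."""
    scored = words - oovs - zeroprobs    # words that actually received a probability
    ppl = 10 ** (-logprob / (scored + sentences))   # includes one </s> per sentence
    ppl1 = 10 ** (-logprob / scored)                # excludes </s> tokens
    return ppl, ppl1

# Made-up numbers, only to show the relationship between the two values.
ppl, ppl1 = perplexities(logprob=-3000.0, words=1000, sentences=100)
print(ppl, ppl1)
```

Since both values are monotone in the same log probability, either one ranks models the same way on a fixed test set, provided the same figure is used for every model being compared.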
> 3) How much significance can be attached to these values when the
> difference between them is relatively small or lies in the first digit
> after decimal. That is to say if the perplexity value (ppl) for the
> language models for 1-gram, 2-gram, 3-gram etc. are
> for n = 1, 68.17368,
> for n = 2, 26.52578,
> for n = 3, 26.61326,
> for n = 4, 25.89838,
> for n = 5, 25.89838,
> can we say that the model performance is better with n = 4 in
> comparison to n = 3 and 2 based on these values? Please note that the
> size of our corpus is not very large, approximately 8000 tokens.
> Thanks in advance,
It looks like n=4 is better, but obviously not by much. Whether the
difference matters depends on your application (MT, ASR, etc.).
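To put "not by much" in numbers, here is a small sketch that computes the relative perplexity change from each order to the next, using the values quoted above. The variable and function names are illustrative only.

```python
# Perplexities reported in the question, keyed by n-gram order.
ppl_by_order = {1: 68.17368, 2: 26.52578, 3: 26.61326, 4: 25.89838, 5: 25.89838}

def relative_reduction(old, new):
    """Fractional perplexity reduction going from old to new (negative = worse)."""
    return (old - new) / old

# Compare each order with the one directly below it.
rel = {n: relative_reduction(ppl_by_order[n - 1], ppl_by_order[n])
       for n in range(2, 6)}

for n, r in rel.items():
    print(f"n={n - 1} -> n={n}: {100 * r:+.2f}%")
```

On these figures the step from n=3 to n=4 is a reduction of roughly 2.7%, the step from n=2 to n=3 is slightly negative, and n=5 is identical to n=4. With a corpus of about 8000 tokens, differences this small may not be stable across different test sets.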