Please advise

Andreas Stolcke stolcke at
Thu Sep 25 12:57:36 PDT 2008

Nisha Yadav wrote:
> Hi,
> I am a new user of the SRILM toolkit and have been using it to
> generate some language models. I would be grateful for your advice
> regarding the following.
> 1) In the estimated model, <s> is assigned a very small probability,
> i.e. 1E-99, but </s> is assigned a non-zero probability, 0.181. That
> is, in the output LM file I see the following entries for <s> and
> </s>:
> -0.7421436  </s>
> -99         <s>   -0.3938685
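> Can you please explain why SRILM does this?
That's because an LM never needs to predict the beginning-of-sentence
token, only the end-of-sentence token. The -99 (a log10 value, so
effectively zero probability) is just a dummy entry to satisfy the LM
format; the <s> line still has to be there because it carries the
backoff weight (-0.3938685) used for contexts that start with <s>.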
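The numbers in an ARPA-format LM file are log10 values, so the two unigram entries quoted above can be decoded as below. This is a minimal sketch, assuming the whitespace-separated unigram layout shown in the question (log10 probability, word, optional log10 backoff weight); the names Unigram and parse_unigram_line are hypothetical and not part of SRILM.

```python
from typing import NamedTuple, Optional


class Unigram(NamedTuple):
    word: str
    prob: float                # linear-scale probability
    backoff: Optional[float]   # linear-scale backoff weight, if present


def parse_unigram_line(line: str) -> Unigram:
    """Decode one ARPA unigram line: log10 prob, word, [log10 backoff]."""
    fields = line.split()
    if len(fields) not in (2, 3):
        raise ValueError(f"unexpected unigram line: {line!r}")
    prob = 10.0 ** float(fields[0])
    backoff = 10.0 ** float(fields[2]) if len(fields) == 3 else None
    return Unigram(fields[1], prob, backoff)


# The two entries from the question (hypothetical standalone usage).
end = parse_unigram_line("-0.7421436    </s>")
start = parse_unigram_line("-99                <s>    -0.3938685")
print(f"{end.word}: p = {end.prob:.3f}")
print(f"{start.word}: p = {start.prob:.0e}, backoff weight = {start.backoff:.3f}")
```

The -99 entry decodes to 1e-99, i.e. effectively zero, while -0.7421436 is the 0.181 mentioned in the question.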
> 2) For perplexity calculation, ngram -ppl outputs two values, ppl and
> ppl1. Which of these two should be used to compare the performance of
> models of order 2, 3, and so on?
Please use the FAQ first for questions about SRILM.  You will find the 
answer in .
If you cannot find the answer there, send email to srilm-user at
(you need to join the mailing list first).
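For reference, the SRILM FAQ describes the two figures roughly as follows (worth verifying there and in the ngram man page): ppl divides the negated total log10 probability by the number of scored words plus one end-of-sentence token per sentence, while ppl1 leaves the end-of-sentence tokens out. The sketch below implements that reading; the function name perplexities and the example counts are hypothetical, and real counts would come from the statistics that ngram -ppl reports.

```python
def perplexities(logprob: float, words: int, oovs: int,
                 zeroprobs: int, sentences: int) -> tuple[float, float]:
    """Return (ppl, ppl1) from a total log10 probability and token counts.

    ppl  counts one end-of-sentence token per sentence in the denominator;
    ppl1 leaves those out, so it is a per-word figure only.
    """
    scored_words = words - oovs - zeroprobs
    ppl = 10.0 ** (-logprob / (scored_words + sentences))
    ppl1 = 10.0 ** (-logprob / scored_words)
    return ppl, ppl1


# Hypothetical counts for illustration only, not real SRILM output.
ppl, ppl1 = perplexities(logprob=-100.0, words=40, oovs=0,
                         zeroprobs=0, sentences=10)
print(f"ppl = {ppl:.2f}, ppl1 = {ppl1:.2f}")
```

When two models are scored on the same test set with the same vocabulary (and the same OOV and zeroprob counts), the denominators are identical for both, so ppl and ppl1 are both monotone in logprob and rank the models the same way.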
> 3) How much significance can be attached to these values when the
> difference between them is relatively small, e.g. only in the first
> digit after the decimal point? For example, suppose the perplexity
> values (ppl) for the 1-gram, 2-gram, 3-gram, etc. language models are:
> for n = 1, 68.17368,
> for n = 2, 26.52578,
> for n = 3, 26.61326,
> for n = 4, 25.89838,
> for n = 5, 25.89838,
> Can we say, based on these values, that the model performs better with
> n = 4 than with n = 3 or n = 2? Please note that our corpus is not
> very large, approximately 8000 tokens.
> Thanks in advance,
It looks like n=4 is better, but obviously not by much. Whether the
difference matters depends on your application (e.g. MT, ASR).
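One way to put these numbers in perspective is to look at the relative perplexity reduction between orders, or the equivalent change in cross-entropy (log2 of the perplexity) per token. The sketch below does that for the values in the question; the helper names are hypothetical. Note that the 3-gram is actually slightly worse than the 2-gram, and the 4-gram and 5-gram values are identical.

```python
import math

# Perplexities reported in the question (n-gram order -> ppl).
ppl_by_order = {1: 68.17368, 2: 26.52578, 3: 26.61326, 4: 25.89838, 5: 25.89838}


def relative_reduction(ppl_base: float, ppl_new: float) -> float:
    """Fraction by which ppl_new is lower than ppl_base (negative if higher)."""
    return (ppl_base - ppl_new) / ppl_base


def bits_saved_per_token(ppl_base: float, ppl_new: float) -> float:
    """Drop in cross-entropy, log2(ppl), going from the base to the new model."""
    return math.log2(ppl_base) - math.log2(ppl_new)


for n in (2, 3, 4, 5):
    base, new = ppl_by_order[n - 1], ppl_by_order[n]
    print(f"{n - 1}-gram -> {n}-gram: "
          f"{100 * relative_reduction(base, new):+.2f}% ppl reduction, "
          f"{bits_saved_per_token(base, new):+.4f} bits/token")
```

The 3-gram to 4-gram step comes out at under 3% relative reduction; whether a gain of that size is worth having still depends on the application.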

