[SRILM User List] A confusion of the interpolated language model
Yannick Estève
yannick.esteve at lium.univ-lemans.fr
Thu Aug 27 01:19:44 PDT 2009
Hi,
Back-off weights are not probabilities, so they can be greater than 1.
They are normalization factors that rescale the lower-order
distribution so that the probabilities for each history sum to 1.
Your values are therefore normal. The computation of back-off weights,
in particular under modified Kneser-Ney discounting, is explained here:
http://www.speech.sri.com/projects/srilm/manpages/pdfs/chen-goodman-tr-10-98.pdf
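To make this concrete, here is a minimal Python sketch, not SRILM code. It assumes the usual back-off normalization, bow(h) = (1 - sum of p(w|h)) / (1 - sum of p(w|h')), where both sums run over the words w seen after history h and h' is the shortened history. The function name and the toy probabilities are hypothetical; only the log10 value 0.1270406 comes from the LM line quoted below.

```python
def backoff_weight(p_high_seen, p_low_seen):
    """Back-off weight for one history h (assumed standard normalization).

    p_high_seen: p(w | h)  for every word w seen after h
    p_low_seen:  p(w | h') for the same words, h' = shortened history
    """
    left_over_high = 1.0 - sum(p_high_seen)  # mass left for unseen words
    left_over_low = 1.0 - sum(p_low_seen)    # lower-order mass of unseen words
    return left_over_high / left_over_low

# log10(bow) from the ARPA line in the question, converted to a plain weight.
log10_bow = 0.1270406
bow_from_file = 10 ** log10_bow

# Hypothetical toy numbers: the seen words are likelier under the
# lower-order model (0.60 in total) than under the higher-order one (0.30).
toy_high = [0.20, 0.10]
toy_low = [0.45, 0.15]
toy_bow = backoff_weight(toy_high, toy_low)

print(f"bow from file: {bow_from_file:.4f}")
print(f"toy bow:       {toy_bow:.4f}")
```

The ratio exceeds 1 whenever the seen words carry more probability mass under the lower-order model than under the higher-order one, which is consistent with it happening for only a few histories.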
Regards,
Yannick Estève
LIUM - University of Le Mans
France
On 27 Aug 2009, at 09:21, 海龙 史 wrote:
>
> I am a new student user of SRILM from Asia. I used the command
> below to build an interpolated modified Kneser-Ney language model:
> ~ ngram-count -read merge_counts_1994-2003.gz -gt1min 0 -gt2min 0 -gt3min 2 -kndiscount -interpolate -order 3 -vocab ChWord.lexno -lm 1994-2003_lm_all_pruned.lm
>
> However, in my model the back-off weight (bow) of several N-grams
> appears to be greater than 1. That is, the text LM file contains a line:
> -6.457229 <s> 1635 0.1270406
> (Here an index stands for each Chinese word.)
> in which log10(bow) is greater than 0. We did not expect a normal
> interpolated discounting method to produce an N-gram bow greater than
> 1. Besides, this occurred for only a few (fewer than 5)
> different N-grams. So I am confused and would like to ask whether
> anyone has encountered this or happens to know what is
> wrong.
> Thank you very much!
>
> 史海龙
> Hailoon Shi
> w63,EE Dpt.Thu Univ.PRC