[SRILM User List] WBDiscount backoff weights
shinichiro.hamada
shinichiro.hamada at gmail.com
Tue Aug 7 10:32:43 PDT 2012
Hi.
I ran the small test described below to understand how SRILM computes
WBDiscount backoff weights (bow), and I have a question.
The bows of the 2-grams "<s> context", "context word1", and "context
word2" are set to zero. Why?
They are prefixes of "<s> context word1" (or "<s> context word2"),
"context word1 </s>", and "context word2 </s>" respectively, so I think
they qualify for bow values.
I read the explanation of WBDiscount and "Warning5" in the
ngram-discount manual (*1), but I couldn't find the answer there.
Any advice would help me very much. Thank you.
(*1) ngram-discount manual
http://www-speech.sri.com/projects/srilm/manpages/ngram-discount.7.html
----------------------------------------------------------------------
$ cat > smp.txt << EOF
context word1
context word2
EOF
$ ngram-count -order 3 -wbdiscount -text smp.txt -gtmin 0 -gt1min 0 -gt2min 0 \
    -gt3min 0 -lm lm.arpa
$ cat lm.arpa
\data\
ngram 1=5
ngram 2=5
ngram 3=4
\1-grams:
-0.5228788 </s>
-99 <s> -0.3222193
-0.5228788 context -0.07918124
-0.69897 word1 -0.146128
-0.69897 word2 -0.146128
\2-grams:
-0.1760913 <s> context 0
-0.60206 context word1 0
-0.60206 context word2 0
-0.30103 word1 </s>
-0.30103 word2 </s>
\3-grams:
-0.60206 <s> context word1
-0.60206 <s> context word2
-0.30103 context word1 </s>
-0.30103 context word2 </s>
\end\
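
The bow values in this listing can be cross-checked by hand. Below is a
minimal sketch, assuming the backoff-weight normalization given in the
ngram-discount manual, bow(a_) = (1 - Sum f(a_z)) / (1 - Sum p(_z)), where
both sums run over the words z for which the n-gram "a_z" is listed
explicitly. The names logp and bow_log10 are illustrative only and are not
part of SRILM. The sketch also assumes that every lower-order n-gram it
needs is listed explicitly, which holds for this toy model but not in
general.

```python
import math

# log10 probabilities copied from the lm.arpa listing above
# (the <s> unigram is omitted because no listed n-gram ends in <s>).
logp = {
    ("</s>",): -0.5228788,
    ("context",): -0.5228788,
    ("word1",): -0.69897,
    ("word2",): -0.69897,
    ("<s>", "context"): -0.1760913,
    ("context", "word1"): -0.60206,
    ("context", "word2"): -0.60206,
    ("word1", "</s>"): -0.30103,
    ("word2", "</s>"): -0.30103,
    ("<s>", "context", "word1"): -0.60206,
    ("<s>", "context", "word2"): -0.60206,
    ("context", "word1", "</s>"): -0.30103,
    ("context", "word2", "</s>"): -0.30103,
}


def bow_log10(context):
    """Recompute log10 of the backoff weight for `context` (a tuple of words).

    Assumption: the lower-order n-gram for every explicit extension is
    itself listed in `logp`; a missing one raises KeyError instead of
    being looked up through a further backoff step.
    """
    # Explicit n-grams that extend the context by exactly one word.
    extensions = [ng for ng in logp
                  if len(ng) == len(context) + 1 and ng[:-1] == context]
    # Probability mass left over after the explicit higher-order n-grams.
    numerator = 1.0 - sum(10 ** logp[ng] for ng in extensions)
    # Mass the lower-order model leaves for words NOT seen after the context.
    denominator = 1.0 - sum(10 ** logp[ng[1:]] for ng in extensions)
    return math.log10(numerator / denominator)


if __name__ == "__main__":
    for ctx in [("<s>",), ("context",), ("word1",),
                ("<s>", "context"), ("context", "word1"), ("context", "word2")]:
        print(" ".join(ctx), round(bow_log10(ctx), 6))
```

Under this assumption, a bow of 0 (log10 of 1) means that the explicit
higher-order probabilities and the lower-order probabilities of the same
words sum to the same mass, so numerator and denominator are equal. That is
the case for the three 2-grams in question in this toy corpus.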
--
Shinichiro Hamada