[SRILM User List] WBDiscount backoff weights

shinichiro.hamada shinichiro.hamada at gmail.com
Tue Aug 7 10:32:43 PDT 2012


Hi.

I ran the small test described below to understand how SRILM computes
Witten-Bell (WBDiscount) backoff weights (bows), and I have a question.

The bows of the bigrams "<s> context", "context word1", and "context
word2" are all printed as 0 in the resulting ARPA file. Why?

They are the prefixes of "<s> context word1" (or "<s> context word2"),
"context word1 </s>", and "context word2 </s>", respectively, so I
think they should have bow values of their own.

I read the explanation of WBDiscount and "Warning5" in the
ngram-discount manual page (*1), but I couldn't find the answer there.

Any advice would be much appreciated. Thank you.

(*1) ngram-discount manual
http://www-speech.sri.com/projects/srilm/manpages/ngram-discount.7.html


----------------------------------------------------------------------
$ cat > smp.txt << EOF
context word1
context word2
EOF
$ ngram-count -order 3 -wbdiscount -text smp.txt -gtmin 0 -gt1min 0 -gt2min 0 \
    -gt3min 0 -lm lm.arpa
$ cat lm.arpa

\data\
ngram 1=5
ngram 2=5
ngram 3=4

\1-grams:
-0.5228788	</s>
-99	<s>	-0.3222193
-0.5228788	context -0.07918124
-0.69897	word1	-0.146128
-0.69897	word2	-0.146128

\2-grams:
-0.1760913	<s> context	0
-0.60206	context word1	0
-0.60206	context word2	0
-0.30103	word1 </s>
-0.30103	word2 </s>

\3-grams:
-0.60206	<s> context word1
-0.60206	<s> context word2
-0.30103	context word1 </s>
-0.30103	context word2 </s>

\end\
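A minimal sketch of the arithmetic for this toy corpus, assuming the usual backoff formula bow(h) = (1 - sum of P(w|h) over the words w seen after h) / (1 - sum of P(w|h') over the same words, where h' is h without its first word), and Witten-Bell in its backoff (non-interpolated) form, P(w|h) = c(h,w) / (c(h) + T(h)), with T(h) the number of distinct words seen after h. This is not SRILM's code. The function names are hypothetical, and the sketch only covers the case where every lower-order probability it needs is an explicit bigram, which holds here. It reproduces the 2-gram and 3-gram log probabilities above (e.g. log10(0.25) = -0.60206). It also suggests that the 0 printed for each bigram bow is log10 of a weight of 1, because the numerator and the denominator both come out to 0.5:

```python
import math
from collections import Counter, defaultdict

# Toy corpus from smp.txt; <s> and </s> are added around each sentence.
SENTENCES = [["context", "word1"], ["context", "word2"]]

def ngram_counts(sentences, n):
    """Count all n-grams of order n in the boundary-padded sentences."""
    counts = Counter()
    for s in sentences:
        toks = ["<s>"] + s + ["</s>"]
        for i in range(len(toks) - n + 1):
            counts[tuple(toks[i:i + n])] += 1
    return counts

def wb_probs(counts):
    """Witten-Bell (backoff form): P(w|h) = c(h,w) / (c(h) + T(h))."""
    total = defaultdict(int)  # c(h): total count of n-grams starting with h
    types = defaultdict(int)  # T(h): number of distinct words seen after h
    for gram, c in counts.items():
        total[gram[:-1]] += c
        types[gram[:-1]] += 1
    return {gram: c / (total[gram[:-1]] + types[gram[:-1]])
            for gram, c in counts.items()}

def backoff_weight(h, high, low):
    """(1 - sum of explicit P(w|h)) / (1 - sum of P(w|h[1:]) for the same w).
    Assumes each lower-order P(w|h[1:]) needed is explicit in `low`."""
    seen = [g[-1] for g in high if g[:-1] == h]
    num = 1.0 - sum(high[h + (w,)] for w in seen)
    den = 1.0 - sum(low[h[1:] + (w,)] for w in seen)
    return num / den

bi = wb_probs(ngram_counts(SENTENCES, 2))
tri = wb_probs(ngram_counts(SENTENCES, 3))

for h in [("<s>", "context"), ("context", "word1"), ("context", "word2")]:
    bow = backoff_weight(h, tri, bi)
    print(" ".join(h), f"bow={bow:.4f}", f"log10={math.log10(bow):.4f}")
# prints "bow=1.0000 log10=0.0000" for all three contexts
```

If this reading is right, the weights are computed rather than skipped: in this data the trigram estimates leave exactly the same mass unassigned as the bigram estimates do for the same words, so the ratio is 1.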

--
Shinichiro Hamada


