[SRILM User List] WBDiscount backoff weights

shinichiro.hamada shinichiro.hamada at gmail.com
Tue Aug 7 10:32:43 PDT 2012


I ran the small test below to understand how SRILM computes Witten-Bell
(WBDiscount) backoff weights (bows), and I have a question.

In the resulting model, the bows of the bigrams "<s> context", "context
word1", and "context word2" are all zero. Why?

These bigrams are the prefixes of the trigrams "<s> context word1" (or
"<s> context word2"), "context word1 </s>", and "context word2 </s>",
respectively, so I expected them to have nonzero bow values.

I read the explanation of WBDiscount and "Warning5" in the
ngram-discount manual (*1), but I couldn't find the answer there.

Any advice would help me very much. Thank you.

(*1) ngram-discount manual

$ cat > smp.txt << EOF
context word1
context word2
EOF
$ ngram-count -order 3 -wbdiscount -text smp.txt -gtmin 0 -gt1min 0 -gt2min 0 \
    -gt3min 0 -lm lm.arpa
$ cat lm.arpa

\data\
ngram 1=5
ngram 2=5
ngram 3=4

\1-grams:
-0.5228788	</s>
-99	<s>	-0.3222193
-0.5228788	context	-0.07918124
-0.69897	word1	-0.146128
-0.69897	word2	-0.146128

\2-grams:
-0.1760913	<s> context	0
-0.60206	context word1	0
-0.60206	context word2	0
-0.30103	word1 </s>
-0.30103	word2 </s>

\3-grams:
-0.60206	<s> context word1
-0.60206	<s> context word2
-0.30103	context word1 </s>
-0.30103	context word2 </s>

\end\
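The entries of lm.arpa above can be recomputed by hand from the two training sentences. The following minimal Python sketch does this; it is not SRILM code. It assumes the backoff form of Witten-Bell discounting, p(w|h) = c(h,w) / (c(h) + T(h)), where c(h) is the number of tokens seen after history h and T(h) the number of distinct words seen after h, with the leftover mass T(h) / (c(h) + T(h)) renormalized over the lower-order probabilities of the words not seen after h. It also assumes that <s> is excluded from the unigram estimate and that the unigram leftover is shared equally by the other four words. All function names are hypothetical. Note that ARPA files store log10 values, so a backoff weight of exactly 1.0 is written as 0 in the bow column.

```python
import math
from collections import Counter, defaultdict

# The two lines of smp.txt, with the <s> and </s> tags ngram-count adds.
SENTENCES = [["<s>", "context", "word1", "</s>"],
             ["<s>", "context", "word2", "</s>"]]
ORDER = 3


def count_ngrams(n):
    """Count every n-gram (a tuple of words) in the toy corpus."""
    counts = Counter()
    for sent in SENTENCES:
        for i in range(len(sent) - n + 1):
            counts[tuple(sent[i:i + n])] += 1
    return counts


def witten_bell(counts):
    """Assumed backoff Witten-Bell estimate for a single n-gram order.

    p(w|h) = c(h,w) / (c(h) + T(h)), leaving T(h) / (c(h) + T(h)) for backoff,
    where c(h) counts all tokens seen after h and T(h) the distinct ones.
    """
    followers = defaultdict(dict)
    for ngram, c in counts.items():
        followers[ngram[:-1]][ngram[-1]] = c
    probs, leftover = {}, {}
    for hist, nexts in followers.items():
        total, types = sum(nexts.values()), len(nexts)
        for word, c in nexts.items():
            probs[hist + (word,)] = c / (total + types)
        leftover[hist] = types / (total + types)
    return probs, followers, leftover


def build_lm():
    """Return (probs, bows) as linear (not log) values keyed by n-gram tuples."""
    # Assumption: <s> is never predicted, so it is left out of the unigram
    # estimate, and the unigram leftover is shared equally by the other words.
    uni = {k: v for k, v in count_ngrams(1).items() if k != ("<s>",)}
    p, _, left = witten_bell(uni)
    probs = {w: p[w] + left[()] / len(uni) for w in uni}
    bows = {}

    def backoff_prob(word, hist):
        """p(word | hist): explicit entry if present, else bow(hist) * lower order."""
        if hist + (word,) in probs:
            return probs[hist + (word,)]
        return bows.get(hist, 1.0) * backoff_prob(word, hist[1:])

    for n in range(2, ORDER + 1):
        p, followers, left = witten_bell(count_ngrams(n))
        for hist, nexts in followers.items():
            # Spread the leftover mass over the words NOT seen after hist,
            # in proportion to their lower-order probabilities.
            seen_lower = sum(backoff_prob(w, hist[1:]) for w in nexts)
            bows[hist] = left[hist] / (1.0 - seen_lower)
        probs.update(p)
    return probs, bows


if __name__ == "__main__":
    probs, bows = build_lm()
    for ngram in sorted(probs, key=lambda g: (len(g), g)):
        line = f"{math.log10(probs[ngram]):.7g}\t{' '.join(ngram)}"
        if ngram in bows:
            line += f"\t{math.log10(bows[ngram]):.7g}"
        print(line)
```

Running it prints each n-gram with its log10 probability and, where the n-gram is itself a history for the next order, its log10 backoff weight, in roughly the layout of lm.arpa (the -99 entry for <s> is omitted). Under the stated assumptions the values should agree with the file above to within rounding; for example, the history "<s> context" has leftover 2/4 = 0.5 and its two followers have lower-order probability 0.25 + 0.25 = 0.5, giving a weight of 0.5 / (1 - 0.5) = 1.0.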


Shinichiro Hamada
