[SRILM User List] Regarding backoff using

Ananda K.C. kcananda at gmail.com
Sat Sep 15 08:01:12 PDT 2012


Hi all,

I have attached my test files: the corpus, the vocabulary, and the final
bigram probability output. I have also attached the commands I used, in the
command file.

My main problem arises when I use backoff with Good-Turing discounting. Then
only the following probabilities are computed:

p( He | <s> )     = [2gram] 0.0348584 [ -1.45769 ]
p( I | <s> )      = [2gram] 0.0348584 [ -1.45769 ]
p( this | <s> )   = [2gram] 0.0348584 [ -1.45769 *2 ]

But it should compute a probability for every word in the vocabulary. If the
bigram count is zero, it should back off to the unigram count to assign
some probability to the bigram.

For example:
    p( am | <s> )
    p( going | <s> )
    p( kath | <s> )
and so on for every word in the vocabulary. None of these are calculated.

As we know, when the bigram count is zero, the probability should come from
the unigram count. Maybe I have made a mistake in my commands.

Please help me to solve my problem.


regards,
Ananda
-------------- next part --------------
this is ananda
this is bhawana
I am going to kath	
He is going to kath
-------------- next part --------------
A non-text attachment was scrubbed...
Name: command
Type: application/octet-stream
Size: 384 bytes
Desc: not available
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20120915/8227cfe3/attachment.obj>
-------------- next part --------------
<s>	4
<s> this	2
<s> I	1
<s> He	1
this	2
this is	2
is	3
is ananda	1
is bhawana	1
is going	1
ananda	1
ananda </s>	1
</s>	4
bhawana	1
bhawana </s>	1
I	1
I am	1
am	1
am going	1
going	2
going to	2
to	2
to kath	2
kath	2
kath </s>	2
He	1
He is	1
-------------- next part --------------
	p( </s> |  ) 	= [1gram] 0.2 [ -0.69897 *4 ]
	p( <s> |  ) 	= [1gram] 0 [ -inf *4 ]
	p( He |  ) 	= [1gram] 0.05 [ -1.30103 ]
	p( I |  ) 	= [1gram] 0.05 [ -1.30103 ]
	p( am |  ) 	= [1gram] 0.05 [ -1.30103 ]
	p( ananda |  ) 	= [1gram] 0.05 [ -1.30103 ]
	p( bhawana |  ) 	= [1gram] 0.05 [ -1.30103 ]
	p( going |  ) 	= [1gram] 0.1 [ -1 *2 ]
	p( is |  ) 	= [1gram] 0.15 [ -0.823909 *3 ]
	p( kath |  ) 	= [1gram] 0.1 [ -1 *2 ]
	p( this |  ) 	= [1gram] 0.1 [ -1 *2 ]
	p( to |  ) 	= [1gram] 0.1 [ -1 *2 ]
	p( He | <s> ) 	= [2gram] 0.2 [ -0.69897 ]
	p( I | <s> ) 	= [2gram] 0.2 [ -0.69897 ]
	p( this | <s> ) 	= [2gram] 0.4 [ -0.39794 *2 ]
	p( is | He ) 	= [2gram] 0.5 [ -0.30103 ]
	p( am | I ) 	= [2gram] 0.5 [ -0.30103 ]
	p( going | am ) 	= [2gram] 0.5 [ -0.30103 ]
	p( </s> | ananda ) 	= [2gram] 0.5 [ -0.30103 ]
	p( </s> | bhawana ) 	= [2gram] 0.5 [ -0.30103 ]
	p( to | going ) 	= [2gram] 0.666667 [ -0.176091 *2 ]
	p( ananda | is ) 	= [2gram] 0.25 [ -0.60206 ]
	p( bhawana | is ) 	= [2gram] 0.25 [ -0.60206 ]
	p( going | is ) 	= [2gram] 0.25 [ -0.60206 ]
	p( </s> | kath ) 	= [2gram] 0.666667 [ -0.176091 *2 ]
	p( is | this ) 	= [2gram] 0.666667 [ -0.176091 *2 ]
	p( kath | to ) 	= [2gram] 0.666667 [ -0.176091 *2 ]
8 sentences, 36 words, 0 OOVs
4 zeroprobs, logprob= -26.6866 ppl= 4.64693 ppl1= 6.82272

file /home/ananda/Desktop/countout.txt: 8 sentences, 36 words, 0 OOVs
4 zeroprobs, logprob= -26.6866 ppl= 4.64693 ppl1= 6.82272
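
As a sanity check on the summary line above, the two perplexity figures can be reproduced from the reported logprob. This is a minimal sketch, assuming the usual SRILM convention that OOVs and zeroprobs are excluded from the denominator, and that ppl counts end-of-sentence tokens while ppl1 does not. The function name `perplexities` is made up for illustration.

```python
# Figures copied from the summary line above.
SENTENCES, WORDS, OOVS, ZEROPROBS = 8, 36, 0, 4
LOGPROB = -26.6866  # total log10 probability


def perplexities(logprob, sentences, words, oovs, zeroprobs):
    """Return (ppl, ppl1) under the assumed SRILM convention."""
    # Tokens that actually received a finite probability.
    scored_words = words - oovs - zeroprobs
    # ppl also counts one end-of-sentence token per sentence.
    ppl = 10 ** (-logprob / (scored_words + sentences))
    # ppl1 leaves the end-of-sentence tokens out of the denominator.
    ppl1 = 10 ** (-logprob / scored_words)
    return ppl, ppl1


ppl, ppl1 = perplexities(LOGPROB, SENTENCES, WORDS, OOVS, ZEROPROBS)
print(f"ppl= {ppl:.5f} ppl1= {ppl1:.5f}")
```

With these inputs the result agrees with the reported ppl and ppl1 to about four decimal places. The 4 zeroprobs correspond to the `<s>` unigram, which has count 4 and probability 0.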
-------------- next part --------------

\data\
ngram 1=12
ngram 2=15

\1-grams:
-0.69897	</s>
-99	<s>	-0.60206
-1.30103	He	-0.2304489
-1.30103	I	-0.2787536
-1.30103	am	-0.2552725
-1.30103	ananda	-0.20412
-1.30103	bhawana	-0.20412
-1	going	-0.4313638
-0.8239087	is	-0.5051499
-1	kath	-0.3802113
-1	this	-0.4065402
-1	to	-0.4313638

\2-grams:
-0.69897	<s> He
-0.69897	<s> I
-0.39794	<s> this
-0.30103	He is
-0.30103	I am
-0.30103	am going
-0.30103	ananda </s>
-0.30103	bhawana </s>
-0.1760913	going to
-0.60206	is ananda
-0.60206	is bhawana
-0.60206	is going
-0.1760913	kath </s>
-0.1760913	this is
-0.1760913	to kath

\end\
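
The model above already stores what a backoff lookup needs: a log10 probability for each N-gram and a log10 backoff weight for each history. The following is a minimal sketch of that lookup, assuming the standard ARPA backoff rule (use the bigram entry if it exists, otherwise add the history's backoff weight to the unigram log probability). It is not SRILM's own code, and the dictionary and function names are made up for illustration. Only a subset of the entries is copied in.

```python
# log10 values copied from the ARPA listing above (subset only).
UNIGRAM_LOGPROB = {"am": -1.30103, "going": -1.0, "kath": -1.0, "this": -1.0}
BACKOFF_WEIGHT = {"<s>": -0.60206}
BIGRAM_LOGPROB = {
    ("<s>", "He"): -0.69897,
    ("<s>", "I"): -0.69897,
    ("<s>", "this"): -0.39794,
}


def bigram_log10(history, word):
    """log10 p(word | history), backing off to the unigram if needed."""
    if (history, word) in BIGRAM_LOGPROB:
        return BIGRAM_LOGPROB[(history, word)]
    # Unseen bigram: backoff weight of the history plus unigram log prob.
    # A history with no stored weight is treated as weight 0 (factor 1).
    return BACKOFF_WEIGHT.get(history, 0.0) + UNIGRAM_LOGPROB[word]


for w in ("this", "am", "going", "kath"):
    lp = bigram_log10("<s>", w)
    print(f"p( {w} | <s> ) = {10 ** lp:.6g} [ {lp:.6g} ]")
```

Under this rule, p( am | <s> ) comes out at about 0.0125, which is the backoff weight of `<s>` (0.25) times the unigram probability of `am` (0.05). So the model file itself does assign a probability to the unseen bigrams.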
-------------- next part --------------
this
is
ananda
bhawana
I
am
going
to
kath
He 

