[SRILM User List] Regarding backoff using
Ananda K.C.
kcananda at gmail.com
Sat Sep 15 08:01:12 PDT 2012
Hi all,
I have sent my test files containing the corpus, the vocabulary, and the
final bigram probability output. I have also sent all the commands in the
command file.
My main problem arises when using backoff with Good-Turing discounting.
Only the following probabilities are produced:
p( He | <s> ) = [2gram] 0.0348584 [ -1.45769 ]
p( I | <s> ) = [2gram] 0.0348584 [ -1.45769 ]
p( this | <s> ) = [2gram] 0.0348584 [ -1.45769 *2 ]
But probabilities should be produced for every word in the vocabulary: if
a bigram count is zero, the model should back off to the unigram count to
assign some probability to the bigram, for example
p( am | <s> )
p( going | <s> )
p( kath | <s> )
and so on for every word in the vocabulary. These are not calculated.
As we know, when the bigram count is zero, the probability should come
from the unigram count. Maybe I have made a mistake in the commands.
Please help me solve my problem.
regards,
Ananda
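
For reference, the backoff behaviour described above can be written out
explicitly. This is a sketch that assumes the standard backoff rule for
ARPA-format models and takes its numbers from the attached model file
(backoff weight of <s> = -0.60206, unigram log-probability of "am" =
-1.30103, of "going" and "kath" = -1, all base 10). The symbols p* and bow
are notation chosen here, not names taken from SRILM.

```latex
p(w \mid h) =
\begin{cases}
  p^{*}(w \mid h) & \text{if the bigram } (h, w) \text{ is listed in the model} \\
  \mathrm{bow}(h)\, p(w) & \text{otherwise}
\end{cases}

\log_{10} p(\text{am} \mid \langle s \rangle) = -0.60206 + (-1.30103) = -1.90309
  \;\Rightarrow\; p \approx 0.0125

\log_{10} p(\text{going} \mid \langle s \rangle) = \log_{10} p(\text{kath} \mid \langle s \rangle)
  = -0.60206 + (-1) = -1.60206 \;\Rightarrow\; p \approx 0.025
```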
-------------- next part --------------
this is ananda
this is bhawana
I am going to kath
He is going to kath
-------------- next part --------------
A non-text attachment was scrubbed...
Name: command
Type: application/octet-stream
Size: 384 bytes
Desc: not available
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20120915/8227cfe3/attachment.obj>
-------------- next part --------------
<s> 4
<s> this 2
<s> I 1
<s> He 1
this 2
this is 2
is 3
is ananda 1
is bhawana 1
is going 1
ananda 1
ananda </s> 1
</s> 4
bhawana 1
bhawana </s> 1
I 1
I am 1
am 1
am going 1
going 2
going to 2
to 2
to kath 2
kath 2
kath </s> 2
He 1
He is 1
-------------- next part --------------
p( </s> | ) = [1gram] 0.2 [ -0.69897 *4 ]
p( <s> | ) = [1gram] 0 [ -inf *4 ]
p( He | ) = [1gram] 0.05 [ -1.30103 ]
p( I | ) = [1gram] 0.05 [ -1.30103 ]
p( am | ) = [1gram] 0.05 [ -1.30103 ]
p( ananda | ) = [1gram] 0.05 [ -1.30103 ]
p( bhawana | ) = [1gram] 0.05 [ -1.30103 ]
p( going | ) = [1gram] 0.1 [ -1 *2 ]
p( is | ) = [1gram] 0.15 [ -0.823909 *3 ]
p( kath | ) = [1gram] 0.1 [ -1 *2 ]
p( this | ) = [1gram] 0.1 [ -1 *2 ]
p( to | ) = [1gram] 0.1 [ -1 *2 ]
p( He | <s> ) = [2gram] 0.2 [ -0.69897 ]
p( I | <s> ) = [2gram] 0.2 [ -0.69897 ]
p( this | <s> ) = [2gram] 0.4 [ -0.39794 *2 ]
p( is | He ) = [2gram] 0.5 [ -0.30103 ]
p( am | I ) = [2gram] 0.5 [ -0.30103 ]
p( going | am ) = [2gram] 0.5 [ -0.30103 ]
p( </s> | ananda ) = [2gram] 0.5 [ -0.30103 ]
p( </s> | bhawana ) = [2gram] 0.5 [ -0.30103 ]
p( to | going ) = [2gram] 0.666667 [ -0.176091 *2 ]
p( ananda | is ) = [2gram] 0.25 [ -0.60206 ]
p( bhawana | is ) = [2gram] 0.25 [ -0.60206 ]
p( going | is ) = [2gram] 0.25 [ -0.60206 ]
p( </s> | kath ) = [2gram] 0.666667 [ -0.176091 *2 ]
p( is | this ) = [2gram] 0.666667 [ -0.176091 *2 ]
p( kath | to ) = [2gram] 0.666667 [ -0.176091 *2 ]
8 sentences, 36 words, 0 OOVs
4 zeroprobs, logprob= -26.6866 ppl= 4.64693 ppl1= 6.82272
file /home/ananda/Desktop/countout.txt: 8 sentences, 36 words, 0 OOVs
4 zeroprobs, logprob= -26.6866 ppl= 4.64693 ppl1= 6.82272
-------------- next part --------------
\data\
ngram 1=12
ngram 2=15
\1-grams:
-0.69897 </s>
-99 <s> -0.60206
-1.30103 He -0.2304489
-1.30103 I -0.2787536
-1.30103 am -0.2552725
-1.30103 ananda -0.20412
-1.30103 bhawana -0.20412
-1 going -0.4313638
-0.8239087 is -0.5051499
-1 kath -0.3802113
-1 this -0.4065402
-1 to -0.4313638
\2-grams:
-0.69897 <s> He
-0.69897 <s> I
-0.39794 <s> this
-0.30103 He is
-0.30103 I am
-0.30103 am going
-0.30103 ananda </s>
-0.30103 bhawana </s>
-0.1760913 going to
-0.60206 is ananda
-0.60206 is bhawana
-0.60206 is going
-0.1760913 kath </s>
-0.1760913 this is
-0.1760913 to kath
\end\
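
The model file above lists only the bigrams seen in training, so a
probability for any other word after <s> has to be derived by backing off.
The following is a minimal Python sketch, not part of SRILM, that parses the
model text above and fills in p( w | <s> ) for every word with a unigram
entry. The function names parse_arpa and bigram_logprob are hypothetical,
and the parser assumes a well-formed bigram file like this one.

```python
# ARPA text copied from the model file above (bigram model, log10 values).
ARPA = r"""\data\
ngram 1=12
ngram 2=15
\1-grams:
-0.69897 </s>
-99 <s> -0.60206
-1.30103 He -0.2304489
-1.30103 I -0.2787536
-1.30103 am -0.2552725
-1.30103 ananda -0.20412
-1.30103 bhawana -0.20412
-1 going -0.4313638
-0.8239087 is -0.5051499
-1 kath -0.3802113
-1 this -0.4065402
-1 to -0.4313638
\2-grams:
-0.69897 <s> He
-0.69897 <s> I
-0.39794 <s> this
-0.30103 He is
-0.30103 I am
-0.30103 am going
-0.30103 ananda </s>
-0.30103 bhawana </s>
-0.1760913 going to
-0.60206 is ananda
-0.60206 is bhawana
-0.60206 is going
-0.1760913 kath </s>
-0.1760913 this is
-0.1760913 to kath
\end\
"""


def parse_arpa(text):
    """Return (unigram log10 probs, backoff log10 weights, bigram log10 probs)."""
    uni, bow, bi = {}, {}, {}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("\\"):
            section = line  # section marker such as "\1-grams:" or "\end\"
            continue
        fields = line.split()
        if section == "\\1-grams:":
            uni[fields[1]] = float(fields[0])
            # A missing backoff weight means log10(bow) = 0, i.e. bow = 1.
            bow[fields[1]] = float(fields[2]) if len(fields) > 2 else 0.0
        elif section == "\\2-grams:":
            bi[(fields[1], fields[2])] = float(fields[0])
    return uni, bow, bi


def bigram_logprob(word, context, uni, bow, bi):
    """Use the listed bigram if present, otherwise back off to the unigram."""
    if (context, word) in bi:
        return bi[(context, word)]
    return bow[context] + uni[word]


uni, bow, bi = parse_arpa(ARPA)
# <s> itself is skipped as a predicted word (its unigram entry is -99).
dist = {w: 10 ** bigram_logprob(w, "<s>", uni, bow, bi)
        for w in uni if w != "<s>"}
for w, p in sorted(dist.items()):
    print(f"p( {w} | <s> ) = {p:.6g}")
print("sum =", round(sum(dist.values()), 4))
```

The three listed bigrams keep their stored values, every other word receives
the backoff weight of <s> times its unigram probability, and the resulting
values should sum to approximately 1.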
-------------- next part --------------
this
is
ananda
bhawana
I
am
going
to
kath
He