[SRILM User List] C(<s>) is always zero?

James Kirby j.kirby at ed.ac.uk
Wed Feb 15 04:00:33 PST 2012


Hello,

Is there a reason why the unigram count of the auto-prepended sentence
start tag <s> is always zero? As can be seen from the output below, the log
probabilities are calculated counting the sentence end tags </s> but not
the start tags <s>. Or have I just missed something horribly obvious?

Thanks,
James

----

[jkirby at Markov]$ more sentence.txt
Sentence number 1.
Sentence number 2.
Sentence number 3.

[jkirby at Markov]$ ngram-count -order 1 -text sentence.txt -tolower -lm sentence.lm
warning: count of count 2 is zero -- lowering maxcount
GT discounting disabled

[jkirby at Markov]$ more sentence.lm

\data\
ngram 1=7

\1-grams:
-1.079181       1.
-1.079181       2.
-1.079181       3.
-0.60206        </s>
-99     <s>
-0.60206        number
-0.60206        sentence

\end\
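
To make the arithmetic explicit, here is a minimal Python sketch that reproduces the log10 values above by hand. It assumes plain relative-frequency estimates (the output reports that GT discounting was disabled) and that each sentence contributes one </s> event and no <s> event. That second assumption is chosen only because it matches the figures; it is not taken from the SRILM source. The function name unigram_logprobs is just illustrative.

```python
import math
from collections import Counter

# The three lines of sentence.txt from the session above.
sentences = [
    "Sentence number 1.",
    "Sentence number 2.",
    "Sentence number 3.",
]

def unigram_logprobs(lines):
    """Relative-frequency unigram log10 probabilities.

    Assumption: every sentence adds one </s> event, and <s> is never
    counted as an event. This is picked to match the sentence.lm output,
    not derived from the SRILM implementation.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())  # mimics -tolower plus whitespace tokenization
        counts["</s>"] += 1                  # one end-of-sentence event per line
    total = sum(counts.values())             # 12 events for this file
    return {word: math.log10(c / total) for word, c in counts.items()}

logprobs = unigram_logprobs(sentences)
for word in sorted(logprobs):
    print(f"{logprobs[word]:.6f}\t{word}")
```

With 12 events in total, "sentence", "number" and </s> each come out at log10(3/12), and "1.", "2." and "3." at log10(1/12), which agrees with the values in sentence.lm. If <s> were counted as well, the total would be 15 and none of the figures would match.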