[SRILM User List] C(<s>) is always zero?
James Kirby
j.kirby at ed.ac.uk
Wed Feb 15 04:00:33 PST 2012
Hello,
is there a reason why the unigram count of the auto-prepended sentence
start tag <s> is always zero? As can be seen from the output below, the log
probabilities are calculated counting the sentence end tags </s> but not
the start tags. Or have I just missed something horribly obvious?
Thanks,
James
----
[jkirby at Markov]$ more sentence.txt
Sentence number 1.
Sentence number 2.
Sentence number 3.
[jkirby at Markov]$ ngram-count -order 1 -text sentence.txt -tolower -lm sentence.lm
warning: count of count 2 is zero -- lowering maxcount
GT discounting disabled
[jkirby at Markov]$ more sentence.lm
\data\
ngram 1=7
\1-grams:
-1.079181 1.
-1.079181 2.
-1.079181 3.
-0.60206 </s>
-99 <s>
-0.60206 number
-0.60206 sentence
\end\
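----
As a sanity check on the numbers above, here is a small Python sketch (not part of SRILM) that recomputes the unigram log10 probabilities by hand. It assumes plain whitespace tokenization, lowercasing as with -tolower, no discounting, and that </s> is counted as an event while <s> is not. Under those assumptions the total is 12 tokens, and the values appear to match sentence.lm for every entry except the -99 assigned to <s>.

```python
import math
from collections import Counter

# The three lines of sentence.txt, as shown in the post.
sentences = ["Sentence number 1.", "Sentence number 2.", "Sentence number 3."]

counts = Counter()
for s in sentences:
    # Assumption: whitespace tokenization plus lowercasing (mirrors -tolower).
    # </s> is appended as a countable event; <s> is deliberately left out.
    tokens = s.lower().split() + ["</s>"]
    counts.update(tokens)

total = sum(counts.values())  # 3 + 3 + 1 + 1 + 1 + 3 = 12

# Maximum-likelihood unigram estimates in log10, as in ARPA-format files.
logprobs = {w: math.log10(c / total) for w, c in counts.items()}

for w in sorted(logprobs):
    print(f"{logprobs[w]:.6f}\t{w}")
```

If <s> were included in the total (15 tokens), "sentence" would come out at log10(3/15), roughly -0.699, rather than the -0.60206 in the file, so the output is consistent with <s> being excluded from the unigram estimate.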