[SRILM User List] Question about SRILM and sentence boundary detection

Andreas Stolcke stolcke at icsi.berkeley.edu
Sun Feb 12 17:37:52 PST 2012


From: L. Amber Wilcox-O'Hearn <amber.wilcox.ohearn at gmail.com>
>
> Thank you, Andreas.  I wasn't aware of these capabilities.
>
> The server-port worked exactly as expected.  That is, if I give it w1
> w2 w3, it returns p(w3|w1w2).  Combined with the caching, it looks
> very promising for my applications.
>
> The other solution using -counts (or actually -ppl for my case) also
> worked, but of course if I give it w1 w2 w3, it returns the
> probability of that whole string, i.e.  p(w1) * p(w2|w1) * p(w3|w1w2),
> which would be redundant for my purposes.
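That's not correct. ngram -counts outputs CONDITIONAL N-gram
probabilities: given the N-gram w1 w2 w3, it computes p(w3|w1w2), not
the probability of the whole string. From the ngram man page: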

-counts countsfile
    Perform a computation similar to -ppl, but based only on the
    N-gram counts found in countsfile. Probabilities are computed for
    the last word of each N-gram, using the other words as contexts, and
    scaling by the associated N-gram count. Summary statistics are
    output at the end, as well as before each escaped input line.

So it should do exactly what you need.
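
To make the distinction concrete, here is a minimal Python sketch with a toy model. The log probabilities are invented for illustration and do not come from any real LM, and the helper names (conditional_logprob, string_logprob, counts_line) are hypothetical, not part of SRILM. The sketch also ignores the sentence-boundary tokens that -ppl adds around each sentence. counts_line assumes the usual counts-file layout of the N-gram words, a tab, then the count. With a count of 1, the scaling mentioned in the man page leaves the conditional log probability unchanged.

```python
# Toy conditional log10 probabilities, invented for illustration only.
# A real model would be loaded from an LM file by ngram itself.
TOY_LOGPROBS = {
    ("w1",): -1.0,               # log10 p(w1)
    ("w1", "w2"): -0.5,          # log10 p(w2 | w1)
    ("w1", "w2", "w3"): -0.25,   # log10 p(w3 | w1 w2)
}


def conditional_logprob(ngram):
    """log10 p(last word | preceding words): the per-N-gram quantity."""
    return TOY_LOGPROBS[tuple(ngram)]


def string_logprob(words):
    """Chain-rule log10 probability of the whole string (no <s> or </s>)."""
    return sum(conditional_logprob(words[:i + 1]) for i in range(len(words)))


def counts_line(ngram, count=1):
    """Format one counts-file line (assumed layout: words, tab, count)."""
    return " ".join(ngram) + "\t" + str(count)


if __name__ == "__main__":
    trigram = ["w1", "w2", "w3"]
    print(counts_line(trigram))
    print("conditional:", conditional_logprob(trigram))
    print("whole string:", string_logprob(trigram))
```

A counts file holding only the line produced by counts_line(["w1", "w2", "w3"]) would therefore be scored on p(w3|w1w2) alone, while the chain-rule product over all three words is what string_logprob computes.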

Andreas
