[SRILM User List] Question about SRILM and sentence boundary detection
Andreas Stolcke
stolcke at icsi.berkeley.edu
Sun Feb 12 17:37:52 PST 2012
From: L. Amber Wilcox-O'Hearn <amber.wilcox.ohearn at gmail.com>
>
> Thank you, Andreas. I wasn't aware of these capabilities.
>
> The server-port worked exactly as expected. That is, if I give it w1
> w2 w3, it returns p(w3|w1w2). Combined with the caching, it looks
> very promising for my applications.
>
> The other solution using -counts (or actually -ppl for my case) also
> worked, but of course if I give it w1 w2 w3, it returns the
> probability of that whole string, i.e. p(w1) * p(w2|w1) * p(w3|w1w2),
> which would be redundant for my purposes.
That's not correct: ngram -counts outputs CONDITIONAL N-gram
probabilities, not the probability of the whole string.

From the ngram(1) man page:

    -counts countsfile
        Perform a computation similar to -ppl, but based only on the
        N-gram counts found in countsfile. Probabilities are computed for
        the last word of each N-gram, using the other words as contexts, and
        scaling by the associated N-gram count. Summary statistics are
        output at the end, as well as before each escaped input line.

So it should do exactly what you need.
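
As a rough illustration of the point above, here is a minimal Python sketch that formats a one-line counts file for the trigram "w1 w2 w3" and builds the matching ngram command line. The helper names, the file names model.lm and query.counts, the trigram order, and the choice of -debug 2 (assumed to print per-N-gram detail as it does with -ppl) are all assumptions for the example, not part of the original exchange. The command is only constructed, not run.

```python
def counts_line(ngram_words, count=1):
    """Format one counts-file line: the N-gram words, a tab, then the count.

    Assumes the usual ngram-count output layout (words, tab, integer count).
    """
    return " ".join(ngram_words) + "\t" + str(count)


def ngram_counts_command(lm_path, counts_path, order=3):
    """Build (but do not run) an `ngram -counts` invocation as an argv list.

    lm_path and counts_path are hypothetical file names supplied by the caller.
    """
    return [
        "ngram",
        "-order", str(order),
        "-lm", lm_path,
        "-counts", counts_path,
        "-debug", "2",  # assumption: verbose enough to show each N-gram's probability
    ]


# With count 1, the single line asks for p(w3 | w1 w2) only,
# not p(w1) * p(w2 | w1) * p(w3 | w1 w2).
line = counts_line(["w1", "w2", "w3"])
cmd = ngram_counts_command("model.lm", "query.counts")

print(repr(line))
print(" ".join(cmd))
```

Writing `line` to query.counts and running the printed command against a real model should then report the conditional probability of w3 given w1 w2, scaled by the count of 1.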
Andreas