[SRILM User List] Question about SRILM and sentence boundary detection
L. Amber Wilcox-O'Hearn
amber.wilcox.ohearn at gmail.com
Tue Feb 14 04:54:31 PST 2012
On Sun, Feb 12, 2012 at 6:37 PM, Andreas Stolcke
<stolcke at icsi.berkeley.edu> wrote:
> From: L. Amber Wilcox-O'Hearn <amber.wilcox.ohearn at gmail.com>
> Thank you, Andreas. I wasn't aware of these capabilities.
> The server-port worked exactly as expected. That is, if I give it w1
> w2 w3, it returns p(w3|w1w2). Combined with the caching, it looks
> very promising for my applications.
> The other solution using -counts (or actually -ppl for my case) also
> worked, but of course if I give it w1 w2 w3, it returns the
> probability of that whole string, i.e. p(w1) * p(w2|w1) * p(w3|w1w2),
> which would be redundant for my purposes.
> That's not correct. ngram -counts will output CONDITIONAL ngram
> probabilities:
>
> -counts countsfile
>     Perform a computation similar to -ppl, but based only on the
>     N-gram counts found in countsfile. Probabilities are computed for
>     the last word of each N-gram, using the other words as contexts,
>     and scaling by the associated N-gram count. Summary statistics are
>     output at the end, as well as before each escaped input line.
>
> So it should do exactly what you want.
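A rough Python sketch of the difference between the two options, under stated assumptions. `score_counts` follows the man page description of -counts: each counts line scores only its last word, given the preceding words, and multiplies by the count. `score_sentence` is the contrast: it adds <s> and </s> and sums conditional log probabilities over the whole string, which is roughly what -ppl does with a text line. The toy table `TOY_LM`, all function names, and the choice to report one total per segment between escaped lines are hypothetical illustrations, not SRILM code or its exact output format.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical log10 p(last word | preceding words), keyed by the full N-gram.
# The numbers are invented for illustration; a real run would query the LM.
TOY_LM: Dict[Tuple[str, ...], float] = {
    ("<s>", "w1"): -1.0,
    ("<s>", "w1", "w2"): -0.5,
    ("w1", "w2", "w3"): -0.25,
    ("w2", "w3", "</s>"): -0.75,
}

LogProb = Callable[[Tuple[str, ...]], float]


def toy_logprob(ngram: Tuple[str, ...]) -> float:
    """Conditional log10 probability of ngram[-1] given ngram[:-1]."""
    return TOY_LM[ngram]


def score_counts(lines: Iterable[str], escape: str,
                 logprob: LogProb) -> List[Tuple[float, int]]:
    """Score lines of the form "w1 ... wn count", in the spirit of -counts.

    Only the last word of each N-gram is scored, with the other words as
    context, and its log probability is scaled by the count. A line that
    starts with `escape` closes the current segment; reporting one
    (logprob, words) total per segment is an assumption of this sketch.
    """
    segments: List[Tuple[float, int]] = []
    total, words = 0.0, 0
    for line in lines:
        if line.startswith(escape):
            segments.append((total, words))
            total, words = 0.0, 0
            continue
        fields = line.split()
        if len(fields) < 2:  # skip blank or malformed lines
            continue
        *ngram, count = fields
        total += int(count) * logprob(tuple(ngram))
        words += int(count)
    segments.append((total, words))
    return segments


def score_sentence(sentence: str, order: int, logprob: LogProb) -> float:
    """For contrast: chain-rule score of a whole sentence, roughly like -ppl.

    Adds <s> and </s>; <s> is used only as context and is never scored.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - order + 1):i]
        total += logprob(tuple(context) + (tokens[i],))
    return total
```

With the toy table, the counts line "w1 w2 w3 2" contributes 2 * log10 p(w3 | w1 w2) = -0.5 and nothing else, whereas scoring the text "w1 w2 w3" as a trigram sentence also charges for w1, w2 and </s>, giving -2.5 in total.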
I see. I misunderstood the difference between -ppl and -counts.
I did try this and the summary statistics at the end gave the correct
sum, but there weren't any statistics output before the escaped lines:
$ cat testcounts | ngram -lm LM -escape "===" -counts - -unk
file -: 0 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -9.87606 ppl= 294.452 ppl1= 294.452
Did I miss something?
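The two perplexities in the output above can be checked by hand. Assuming SRILM's usual definitions, ppl raises 10 to the negated log10 probability divided by words - OOVs - zeroprobs + sentences, and ppl1 drops the sentences term from the denominator. With -counts there are 0 sentences, so both denominators are 4 and ppl equals ppl1. The helper name `srilm_perplexities` is hypothetical, and the formulas are stated here as an assumption rather than quoted from SRILM.

```python
def srilm_perplexities(logprob: float, words: int, oovs: int = 0,
                       zeroprobs: int = 0, sentences: int = 0):
    """Return (ppl, ppl1) from ngram's summary counts (assumed definitions).

    ppl counts one end-of-sentence token per sentence; ppl1 does not.
    OOVs and zero-probability words are excluded from both denominators.
    """
    scored = words - oovs - zeroprobs
    ppl = 10 ** (-logprob / (scored + sentences))
    ppl1 = 10 ** (-logprob / scored)
    return ppl, ppl1


# Numbers from the output above: 4 words, 0 sentences, logprob= -9.87606.
ppl, ppl1 = srilm_perplexities(-9.87606, words=4)
print(f"ppl= {ppl:.6g} ppl1= {ppl1:.6g}")
```

Here both values come out as 10^(9.87606/4), about 294.45, which agrees with the reported ppl and ppl1.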