Querying count-based LM for specific n-gram probabilities
Andreas Stolcke
stolcke at speech.sri.com
Sat Apr 5 21:52:22 PDT 2008
Have a look at the ngram -counts option.
--Andreas
In message <55141B22-A42B-482C-A8B5-D9608AC6CE7E at cs.wisc.edu> you wrote:
> Dear list,
>
> I am using the Google 1T ngram corpus, and have successfully built a
> count-based LM as per the instructions on the FAQ. Thanks for those
> tips to get started! I have also been able to compute perplexities
> for test sentences using the -ppl option of the ngram program, and
> got this working with the newer server options, too! Very cool.
>
> However, what I really want to do is to be able to retrieve just the
> probabilities for particular n-grams to use them in another
> application. In other words, given a word and a history (say, words
> h1 h2 h3 h4), I would like to know the LM's probability P( word | h1
> h2 h3 h4 ), after taking into account interpolation, etc. I know one
> hack-ish way to do this would be to put "h1 h2 h3 h4 w" in a test
> file, and then parse the debug output to get the desired probability.
> This would be complicated for higher-order ngrams since the output
> truncates the histories with "..."; plus this idea of parsing the
> output just seems really messy. Since I'm using the Google corpus
> with a count-based model, I don't think it's possible/feasible to
> write the model's probabilities to disk, but maybe there's a way
> around this using -limit-vocab.
>
> So my question is:
> Is there a direct way to query for a specific probability using one
> of the existing programs (i.e., to find P( is | my name), specify
> some options like -word "is" -history "my name")? Or is my only
> option to use the libraries to write my own tool for this purpose? If
> so, can you recommend an existing program that would be a good place
> to start? What would be especially great is if I could request ngram
> probabilities as described here using the LM server options (i.e.,
> start the server and load the counts for some limited vocab, then
> have a client program that can make requests).
>
> Thanks in advance!
>
> - Andrew
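
A rough sketch of how the -counts suggestion might be used for the query in the question, P( is | my name ). It writes the n-grams of interest to a counts file (words, then a tab, then a count) and assembles an ngram command line that reads it. The file names google.countlm and query.counts, the helper names, and the count of 1 are placeholders made up for illustration. Whether -debug 2 prints one probability line per n-gram in this mode is an assumption to check against the ngram man page.

```python
def counts_line(history, word, count=1):
    """Format one n-gram as a counts-file line: words, a tab, then the count."""
    return " ".join(list(history) + [word]) + "\t" + str(count)


def ngram_command(lm_path, counts_path, order=5):
    """Build an ngram invocation that scores the n-grams listed in counts_path.

    -count-lm tells ngram to read the LM as a count-based model, and
    -counts makes it compute the probability of the last word of each
    listed n-gram given the preceding words.
    """
    return [
        "ngram",
        "-order", str(order),
        "-count-lm",
        "-lm", lm_path,
        "-counts", counts_path,
        "-debug", "2",  # assumed to print per-n-gram probabilities
    ]


if __name__ == "__main__":
    # Hypothetical query: P( is | my name )
    queries = [(["my", "name"], "is")]
    with open("query.counts", "w") as f:
        for history, word in queries:
            f.write(counts_line(history, word) + "\n")
    # Print the command rather than running it, since paths are placeholders.
    print(" ".join(ngram_command("google.countlm", "query.counts")))
```

With a count of 1 per line, each reported probability should not be scaled up by repeated occurrences. Any vocabulary options used when building or loading the count-LM (such as -vocab and -limit-vocab) would need to be added to the command.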