[SRILM User List] Generate Probability Distribution
Andreas Stolcke
stolcke at icsi.berkeley.edu
Mon Mar 13 10:11:06 PDT 2017
A brute-force solution (if you don't want to modify any code)
is to generate an N-gram count file of the form
apple banana banana carrot apple 1
apple banana banana carrot banana 1
apple banana banana carrot carrot 1
and pass it to
ngram -lm LM -order 5 -counts COUNTS -debug 2
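Generating that count file by hand gets tedious for a real vocabulary, so you can script it. A minimal sketch (the context, vocabulary, and COUNTS file name are just the ones from this thread; the tab-separated "ngram count" line format is the one ngram-count produces):

```python
# Sketch: write an N-gram count file that pairs a fixed context
# with every word in the vocabulary, one count line per word.
def write_count_file(context, vocab, path="COUNTS"):
    with open(path, "w") as f:
        for word in vocab:
            # count-file format: the N-gram tokens, a tab, then the count
            f.write(" ".join(context + [word]) + "\t1\n")

write_count_file(["apple", "banana", "banana", "carrot"],
                 ["apple", "banana", "carrot"])
```

Passing the resulting file to ngram with -counts and -debug 2 then makes ngram report the conditional log probability of each line's final word given the preceding context.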
If you want to make a minimal code change to enumerate all conditional
probabilities for any context encountered, you could do so in
LM::wordProbSum() and have it dump out the word tokens and their log
probabilities. Then process some text with ngram -debug 3.
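Either way, once you have a log probability for every word in the vocabulary, turning them into a distribution like the one you asked for is just exponentiation plus (if needed) renormalization. A sketch, assuming you have already collected SRILM's base-10 log probabilities into a dict:

```python
# Sketch: convert per-word log10 probabilities (the form SRILM reports)
# into a normalized next-word distribution.
def to_distribution(logprobs):
    # logprobs: dict mapping word -> log10 P(word | context)
    probs = {w: 10.0 ** lp for w, lp in logprobs.items()}
    total = sum(probs.values())  # should be close to 1.0 for a proper LM
    return {w: p / total for w, p in probs.items()}
```

For a correctly normalized model the total should already be close to 1, which is exactly the invariant LM::wordProbSum() is used to check.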
Andreas
On 3/12/2017 12:12 AM, Dávid Nemeskey wrote:
> Hi Kalpesh,
>
> well, there's LM::wordProb(VocabIndex word, const VocabIndex *context)
> in lm/src/LM.cc (and in lm/src/NgramLM.cc, if you are using an ngram
> model). You could simply call it on every word in the vocabulary.
> However, be warned that this will be very slow for any reasonable
> vocabulary size (say 10k and up). This function is also what
> generateWord() calls, which is why the latter is so slow.
>
> If you just wanted the top n most probable words, the situation would
> be a bit different. Then wordProb() wouldn't be the optimal solution,
> because the trie built by ngram is reversed (meaning you have to go
> back from the word to the root, and not the other way around), and you
> would have to query all words to find the most probable one. So when I
> wanted to do this, I built another trie (from the root up to the
> word), which made it much faster, though I am not sure it was 100%
> correct in the face of negative backoff weights. But it wouldn't help
> in your case, I guess.
>
> Best,
> Dávid
>
> On Sat, Mar 11, 2017 at 8:32 PM, Kalpesh Krishna
> <kalpeshk2011 at gmail.com> wrote:
>
> Hello,
> I have a context of words and I've built an N-gram language model
> using ./ngram-count. I wish to generate a probability distribution
> (over the entire vocabulary of words) of the next word. I can't
> seem to find a good way to do this with ./ngram.
> What's the best way to do this?
> For example, if my vocabulary has words "apple, banana, carrot",
> and my context is "apple banana banana carrot", I want a
> distribution like - {"apple": 0.25, "banana": 0.5, "carrot": 0.25}.
>
> Thank you,
> Kalpesh Krishna
> http://martiansideofthemoon.github.io/
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user