[SRILM User List] Generate Probability Distribution

Dávid Nemeskey nemeskeyd at gmail.com
Sun Mar 12 00:12:07 PST 2017


Hi Kalpesh,

Well, there's LM::wordProb(VocabIndex word, const VocabIndex *context) in
lm/src/LM.cc (and in lm/src/NgramLM.cc, if you are using an ngram model).
You could simply call it on every word in the vocabulary. However, be
warned that this will be very slow for any reasonable vocabulary size (say
10k and up). This function is also what generateWord() calls, which is why
the latter is so slow.
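
To make the brute-force loop concrete, here is a minimal sketch. It does
not link against SRILM: ToyLM, makeToyLM() and nextWordDistribution() are
hypothetical names invented for this example, and the toy wordProb() only
mimics the role of LM::wordProb() with a one-word context. As far as I
know, the real function returns a base-10 log probability, which is why
the sketch converts with pow(10, ...).

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy bigram backoff model. Hypothetical stand-in, NOT the SRILM API.
struct ToyLM {
    std::vector<std::string> vocab;
    std::map<std::string, double> unigram;                         // log10 P(w)
    std::map<std::pair<std::string, std::string>, double> bigram;  // log10 P(w | prev)
    std::map<std::string, double> backoff;                         // log10 bow(prev)

    // Plays the role of LM::wordProb(): returns log10 P(word | prev).
    double wordProb(const std::string &word, const std::string &prev) const {
        auto it = bigram.find(std::make_pair(prev, word));
        if (it != bigram.end()) return it->second;        // explicit bigram
        auto b = backoff.find(prev);
        double bow = (b != backoff.end()) ? b->second : 0.0;
        return bow + unigram.at(word);                     // back off to the unigram
    }
};

// One wordProb() call per vocabulary entry, i.e. O(|V|) queries per context.
std::map<std::string, double> nextWordDistribution(const ToyLM &lm,
                                                   const std::string &prev) {
    std::map<std::string, double> dist;
    double total = 0.0;
    for (const std::string &w : lm.vocab) {
        double p = std::pow(10.0, lm.wordProb(w, prev));  // log10 -> probability
        dist[w] = p;
        total += p;
    }
    assert(total > 0.0);
    for (auto &entry : dist) entry.second /= total;       // guard against rounding drift
    return dist;
}

// Example data: unigrams match the apple/banana/carrot example in the question.
ToyLM makeToyLM() {
    ToyLM lm;
    lm.vocab = {"apple", "banana", "carrot"};
    lm.unigram["apple"] = std::log10(0.25);
    lm.unigram["banana"] = std::log10(0.5);
    lm.unigram["carrot"] = std::log10(0.25);
    lm.bigram[std::make_pair(std::string("banana"), std::string("carrot"))] =
        std::log10(0.5);
    lm.backoff["banana"] = std::log10(2.0 / 3.0);
    return lm;
}
```

With this toy data, nextWordDistribution(makeToyLM(), "carrot") falls back
to the unigrams (apple 0.25, banana 0.5, carrot 0.25), while the context
"banana" gives roughly apple 0.167, banana 0.333, carrot 0.5. With SRILM
itself the loop body would be the wordProb() call mentioned above, and the
cost per context stays linear in the vocabulary size.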

If you just wanted the top n most probable words, the situation would be a
bit different. wordProb() would not be the best tool there either: the
trie built by ngram is reversed (you walk from the word back to the root,
not the other way around), so you would still have to query every word to
find the most probable one. When I needed this, I built a second trie
(from the root up to the word), which made it much faster, though I am not
sure it was 100% correct in the face of negative backoff weights. But it
wouldn't help in your case, I guess.
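
For illustration only, here is one way such a forward (context-first)
index could look. This is a sketch under my own assumptions, not the code
described above and not part of SRILM: Node, ForwardIndex, finalize(),
topN() and makeToyIndex() are all hypothetical names. Each context stores
its explicit successors sorted by log10 probability, and topN() takes up
to n candidates per backoff level, skipping words that a longer context
already lists explicitly. It visits every backoff level instead of
stopping early, so it does not rely on the sign of the backoff weights.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Context = std::vector<std::string>;      // oldest word first
using Scored = std::pair<double, std::string>;  // (log10 score, word)

// One context in the hypothetical forward index.
struct Node {
    double backoff = 0.0;                   // log10 backoff weight of this context
    std::map<std::string, double> logprob;  // explicit successors: word -> log10 prob
    std::vector<Scored> sorted;             // same entries, best first
};
using ForwardIndex = std::map<Context, Node>;

static bool better(const Scored &a, const Scored &b) { return a.first > b.first; }

// Build the sorted successor lists once, after the model has been loaded.
void finalize(ForwardIndex &index) {
    for (auto &entry : index) {
        Node &node = entry.second;
        node.sorted.clear();
        for (const auto &wp : node.logprob) node.sorted.emplace_back(wp.second, wp.first);
        std::sort(node.sorted.begin(), node.sorted.end(), better);
    }
}

// Top-n next words for a context, walking from the longest context down to unigrams.
std::vector<std::pair<std::string, double>> topN(const ForwardIndex &index,
                                                 Context context, std::size_t n) {
    std::vector<Scored> candidates;
    std::vector<const Node *> longer;  // nodes of longer contexts already visited
    double bow = 0.0;                  // accumulated log10 backoff weight
    while (true) {
        auto it = index.find(context);
        if (it != index.end()) {
            const Node &node = it->second;
            std::size_t taken = 0;
            for (const Scored &s : node.sorted) {
                if (taken == n) break;
                bool claimed = false;  // listed explicitly by a longer context?
                for (const Node *l : longer)
                    if (l->logprob.count(s.second)) { claimed = true; break; }
                if (claimed) continue;
                candidates.emplace_back(bow + s.first, s.second);
                ++taken;
            }
            bow += node.backoff;
            longer.push_back(&node);
        }
        if (context.empty()) break;
        context.erase(context.begin());  // drop the oldest word: back off one level
    }
    std::sort(candidates.begin(), candidates.end(), better);
    if (candidates.size() > n) candidates.resize(n);
    std::vector<std::pair<std::string, double>> result;
    for (const Scored &c : candidates) result.emplace_back(c.second, c.first);
    return result;
}

// Toy data: unigrams plus one explicit bigram "banana carrot".
ForwardIndex makeToyIndex() {
    ForwardIndex index;
    Node &uni = index[Context()];
    uni.logprob["apple"] = std::log10(0.25);
    uni.logprob["banana"] = std::log10(0.5);
    uni.logprob["carrot"] = std::log10(0.25);
    Node &banana = index[Context{"banana"}];
    banana.backoff = std::log10(2.0 / 3.0);
    banana.logprob["carrot"] = std::log10(0.5);
    finalize(index);
    return index;
}
```

The number of entries examined depends on how many explicit successors the
context and its shorter suffixes have, not on the vocabulary size. That
is where the speed-up over calling wordProb() for every word would come
from. It yields only the n best words, though, not the full distribution.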

Best,
Dávid

On Sat, Mar 11, 2017 at 8:32 PM, Kalpesh Krishna <kalpeshk2011 at gmail.com>
wrote:

> Hello,
> I have a context of words and I've built an N-gram language model using
> ./ngram-count. I wish to generate a probability distribution (over the
> entire vocabulary of words) of the next word. I can't seem to be able to
> find a good way to do this with ./ngram.
> What's the best way to do this?
> For example, if my vocabulary has words "apple, banana, carrot", and my
> context is "apple banana banana carrot", I want a distribution like -
> {"apple": 0.25, "banana": 0.5, "carrot": 0.25}.
>
> Thank you,
> Kalpesh Krishna
> http://martiansideofthemoon.github.io/