[SRILM User List] Generate Probability Distribution

Kalpesh Krishna kalpeshk2011 at gmail.com
Sun Mar 12 04:10:33 PDT 2017


Hi Dávid,
Thank you for your response. Are there any existing binaries which will
help me do this quickly? I don't mind a non-SRILM ARPA file reader either.
Yes, top N words might be good enough in my use case, especially when they
cover more than 99% of the probability mass. I like the idea of building a
trie to do this.
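
As a rough illustration of what a non-SRILM ARPA reader could look like, here is a minimal Python sketch. It assumes a plain-text ARPA back-off file with "\N-grams:" sections and log10 values; the function name parse_arpa and the sample model are hypothetical and not part of any existing tool.

```python
def parse_arpa(lines):
    """Parse ARPA-format lines into two dicts keyed by word tuples.

    Returns (logprobs, backoffs), both holding log10 values.
    Hypothetical helper for illustration; not an SRILM function.
    """
    logprobs, backoffs = {}, {}
    order = 0  # 0 until the first "\N-grams:" section header is seen
    for raw in lines:
        line = raw.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            continue
        if order == 0:
            continue  # ignore anything before the first section
        fields = line.split()
        ngram = tuple(fields[1:1 + order])
        logprobs[ngram] = float(fields[0])
        if len(fields) > order + 1:  # optional trailing back-off weight
            backoffs[ngram] = float(fields[order + 1])
    return logprobs, backoffs


# Made-up toy model, written out line by line.
SAMPLE_ARPA = [
    "\\data\\",
    "ngram 1=3",
    "ngram 2=1",
    "",
    "\\1-grams:",
    "-0.30103\ta\t-0.243038",
    "-0.5228787\tb",
    "-0.69897\tc",
    "",
    "\\2-grams:",
    "-0.2218487\ta b",
    "",
    "\\end\\",
]

arpa_logprobs, arpa_backoffs = parse_arpa(SAMPLE_ARPA)
print(len(arpa_logprobs), "n-grams,", len(arpa_backoffs), "back-off weights")
```

For a real model one would pass an open file object instead of the list, since the function only iterates over lines.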

Thank you,
Kalpesh

On 12 Mar 2017 1:42 p.m., "Dávid Nemeskey" <nemeskeyd at gmail.com> wrote:

Hi Kalpesh,

Well, there's LM::wordProb(VocabIndex word, const VocabIndex *context) in
lm/src/LM.cc (and in lm/src/NgramLM.cc, if you are using an ngram model).
You could simply call it on every word in the vocabulary. However, be
warned that this will be very slow for any reasonable vocabulary size (say
10k and up). This function is also what generateWord() calls, which is why
the latter is so slow.
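
The brute-force approach described above (one probability lookup per vocabulary word) can be sketched in Python as follows. This is a self-contained toy, not SRILM code: the dictionaries LOGPROBS and BACKOFFS, the functions word_logprob and next_word_distribution, and the three-word bigram model are all hypothetical stand-ins, assuming a standard ARPA-style back-off model with log10 values.

```python
import math

# Toy bigram model (made-up numbers). Keys are word tuples, values are log10.
VOCAB = ["a", "b", "c"]
LOGPROBS = {
    ("a",): math.log10(0.5),
    ("b",): math.log10(0.3),
    ("c",): math.log10(0.2),
    ("a", "b"): math.log10(0.6),  # the only explicit bigram
}
# Back-off weight for context "a": the 0.4 of mass left over after "a b",
# spread over the unigram mass of the words not seen after "a" (0.5 + 0.2).
BACKOFFS = {("a",): math.log10(0.4 / 0.7)}


def word_logprob(word, context, logprobs, backoffs):
    """log10 P(word | context) using the usual back-off recursion."""
    context = tuple(context)
    penalty = 0.0
    while True:
        ngram = context + (word,)
        if ngram in logprobs:
            return penalty + logprobs[ngram]
        if not context:
            return float("-inf")  # word is not even a known unigram
        # Pay the back-off weight of this context (log10 of 1 if absent),
        # then drop the oldest context word and try again.
        penalty += backoffs.get(context, 0.0)
        context = context[1:]


def next_word_distribution(context, vocab, logprobs, backoffs):
    """One lookup per vocabulary word: simple, but linear in vocabulary size."""
    return {w: 10 ** word_logprob(w, context, logprobs, backoffs) for w in vocab}


dist = next_word_distribution(["a"], VOCAB, LOGPROBS, BACKOFFS)
print(dist)
```

Because the toy back-off weight was chosen to redistribute exactly the left-over mass, the resulting probabilities sum to 1 up to rounding. A real model would need one such lookup for each of its tens of thousands of words, which is where the cost comes from.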

If you just wanted the top n most probable words, the situation would be a
bit different. Then wordProb() wouldn't be the optimal solution, because the
trie built by ngram is reversed (meaning you have to go back from the word
to the root, and not the other way around), so you would have to query all
words to find the most probable one. So when I wanted to do this, I built
another trie (from the root up to the word), which made it much faster,
though I am not sure it was 100% correct in the face of negative backoff
weights. But it wouldn't help in your case, I guess.
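
One way to picture the context-first lookup is the Python sketch below. It is not the code referred to above and not SRILM's data structure: it uses a flat dictionary keyed by context as a stand-in for a forward trie, and the names build_successor_index and top_n_next_words, along with the toy model, are hypothetical. It walks from the longest context down to the unigrams, so the result stays exact; stopping early at a longer context would be faster but only approximate.

```python
import heapq
import math
from collections import defaultdict

# Toy bigram model (made-up numbers), log10 values keyed by word tuples.
TOY_LOGPROBS = {
    ("a",): math.log10(0.5),
    ("b",): math.log10(0.3),
    ("c",): math.log10(0.2),
    ("a", "b"): math.log10(0.6),
}
TOY_BACKOFFS = {("a",): math.log10(0.4 / 0.7)}


def build_successor_index(logprobs):
    """Group n-grams by context: context tuple -> {next word: log10 prob}."""
    index = defaultdict(dict)
    for ngram, lp in logprobs.items():
        index[ngram[:-1]][ngram[-1]] = lp
    return index


def top_n_next_words(context, n, index, backoffs):
    """Return the n most probable next words as (probability, word) pairs."""
    context = tuple(context)
    best = {}      # word -> log10 probability
    penalty = 0.0  # accumulated back-off weights
    while True:
        for word, lp in index.get(context, {}).items():
            # Longer contexts are visited first, so their entries win.
            best.setdefault(word, penalty + lp)
        if not context:
            break
        penalty += backoffs.get(context, 0.0)
        context = context[1:]
    return heapq.nlargest(n, ((10 ** lp, w) for w, lp in best.items()))


toy_index = build_successor_index(TOY_LOGPROBS)
top2 = top_n_next_words(["a"], 2, toy_index, TOY_BACKOFFS)
print(top2)
```

The lookup only touches words that actually follow some suffix of the context, plus the unigram level, instead of issuing one full query per vocabulary word.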

Best,
Dávid

On Sat, Mar 11, 2017 at 8:32 PM, Kalpesh Krishna <kalpeshk2011 at gmail.com>
wrote:

> Hello,
> I have a context of words and I've built an N-gram language model using
> ./ngram-count. I wish to generate a probability distribution (over the
> entire vocabulary of words) of the next word. I can't seem to be able to
> find a good way to do this with ./ngram.
> What's the best way to do this?
> For example, if my vocabulary has words "apple, banana, carrot", and my
> context is "apple banana banana carrot", I want a distribution like -
> {"apple": 0.25, "banana": 0.5, "carrot": 0.25}.
>
> Thank you,
> Kalpesh Krishna
> http://martiansideofthemoon.github.io/
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user
>

