<div dir="auto"><div>Hi <span style="font-family:sans-serif">Dávid,</span></div><div dir="auto"><font face="sans-serif">Thank you for your response. Are there any existing binaries that would help me do this quickly? I don't mind a non-SRILM ARPA file reader either.</font></div><div dir="auto"><font face="sans-serif">Yes, the top N words might be good enough in my use case, especially when they cover more than 99% of the probability mass. I like the idea of building a trie to do this.</font></div><div dir="auto"><font face="sans-serif"><br></font></div><div dir="auto"><font face="sans-serif">Thank you,</font></div><div dir="auto"><font face="sans-serif">Kalpesh<br></font><div class="gmail_extra" dir="auto"><br><div class="gmail_quote">On 12 Mar 2017 1:42 p.m., "Dávid Nemeskey" <<a href="mailto:nemeskeyd@gmail.com">nemeskeyd@gmail.com</a>> wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Hi Kalpesh,<br><br></div>Well, there's LM::<span class="m_4034246613824824293gmail-pl-en">wordProb</span>(VocabIndex word, <span class="m_4034246613824824293gmail-pl-k">const</span> VocabIndex *context) in lm/src/LM.cc (and in lm/src/NgramLM.cc, if you are using an ngram model). You could simply call it on every word in the vocabulary. However, be warned that this will be very slow for any reasonable vocabulary size (say 10k and up). This function is also what generateWord() calls, which is why the latter is so slow.<br><br>If you just wanted the top n most probable words, the situation would be a bit different. Then wordProb() wouldn't be the optimal solution, because the trie built by ngram is reversed (meaning you have to go back from the word to the root, and not the other way around), and you would have to query all words to get the most probable one. 
So when I wanted to do this, I built another trie (from the root up to the word), which made it much faster, though I am not sure it was 100% correct in the face of negative backoff weights. But it wouldn't help in your case, I guess.<br><br></div><div>Best,<br></div><div>Dávid<br></div></div><div class="gmail_extra"><br><div class="gmail_quote"><div class="elided-text">On Sat, Mar 11, 2017 at 8:32 PM, Kalpesh Krishna <span dir="ltr"><<a href="mailto:kalpeshk2011@gmail.com" target="_blank">kalpeshk2011@gmail.com</a>></span> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="elided-text"><div dir="ltr">Hello,<div>I have a context of words and I've built an N-gram language model using ./ngram-count. I wish to generate a probability distribution of the next word over the entire vocabulary. I can't seem to find a good way to do this with ./ngram.</div><div>What's the best way to do this?</div><div>For example, if my vocabulary has the words "apple, banana, carrot", and my context is "apple banana banana carrot", I want a distribution like {"apple": 0.25, "banana": 0.5, "carrot": 0.25}.</div><div><br></div><div>Thank you,</div><div>Kalpesh Krishna</div><div><a href="http://martiansideofthemoon.github.io/" target="_blank">http://martiansideofthemoon.gi<wbr>thub.io/</a><br></div></div>
<br></div>______________________________<wbr>_________________<br>
SRILM-User site list<br>
<a href="mailto:SRILM-User@speech.sri.com" target="_blank">SRILM-User@speech.sri.com</a><br>
<a href="http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user" rel="noreferrer" target="_blank">http://mailman.speech.sri.com/<wbr>cgi-bin/mailman/listinfo/srilm<wbr>-user</a><br></blockquote></div><br></div>
</blockquote></div><br></div></div></div>
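<div dir="auto">For readers who want the "query every word" approach without SRILM binaries, here is a minimal sketch of a non-SRILM ARPA reader in Python. It assumes an uncompressed, plain-text ARPA file with standard backoff weights, and it evaluates each vocabulary word with the usual backoff recursion. The names read_arpa, log10_prob, next_word_distribution and top_n are hypothetical helpers, not part of SRILM, and this is the exhaustive method rather than the forward trie described in the thread, so it will be slow for large vocabularies.</div>

```python
def read_arpa(lines):
    """Parse ARPA-format text lines into {ngram_tuple: (log10_prob, log10_backoff)}."""
    ngrams = {}
    order = 0  # order of the section currently being read; 0 means header
    for raw in lines:
        line = raw.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            continue
        if order == 0:
            continue
        fields = line.split()
        logp = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The backoff weight is optional; a missing one means log10(1) = 0.
        bow = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        ngrams[words] = (logp, bow)
    return ngrams


def log10_prob(ngrams, word, context):
    """Backoff estimate of log10 P(word | context)."""
    context = tuple(context)
    bow_sum = 0.0
    while True:
        entry = ngrams.get(context + (word,))
        if entry is not None:
            return bow_sum + entry[0]
        if not context:
            return float("-inf")  # word is not in the model at all
        # Add the backoff weight of the current context, then shorten it
        # by dropping its oldest word.
        bow_sum += ngrams.get(context, (0.0, 0.0))[1]
        context = context[1:]


def next_word_distribution(ngrams, context):
    """Return {word: P(word | context)} for every unigram in the model."""
    max_order = max(len(k) for k in ngrams)
    # Only the last (max_order - 1) context words can influence the result.
    context = tuple(context)[-(max_order - 1):] if max_order > 1 else ()
    vocab = [k[0] for k in ngrams if len(k) == 1]
    return {w: 10.0 ** log10_prob(ngrams, w, context) for w in vocab}


def top_n(distribution, n):
    """Return the n most probable (word, probability) pairs, best first."""
    return sorted(distribution.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

<div dir="auto">A typical call would be next_word_distribution(read_arpa(open(path)), ["apple", "banana"]), where path is whatever file ngram-count wrote. The result also contains entries for the sentence-boundary tokens, which you may want to filter out before using or renormalizing the distribution.</div>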