<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix"><br>
A brute-force solution (if you don't want to modify any
code) is to generate an N-gram count file of the form<br>
<br>
apple banana banana carrot apple 1<br>
apple banana banana carrot banana 1<br>
apple banana banana carrot carrot 1<br>
<br>
and pass it to <br>
<br>
ngram -lm LM -order 5 -counts COUNTS -debug 2 <br>
<br>
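A count file like the one above can be generated mechanically. The
following Python sketch writes one line per vocabulary word for a given
context. The helper names count_lines and write_counts are made up for
illustration and are not part of SRILM, and the dummy count of 1 simply
mirrors the example lines. The -order 5 in the command presumably
reflects the four context words plus the predicted word.<br>
<pre wrap="">
```python
from pathlib import Path


def count_lines(context, vocab):
    """Return one count-file line per candidate next word.

    Each line is the context, then a candidate word, then a dummy
    count of 1, in the same layout as the example lines above.
    """
    prefix = " ".join(context)
    return [f"{prefix} {word} 1" for word in vocab]


def write_counts(path, context, vocab):
    """Write the count lines to `path`, one N-gram per line."""
    lines = count_lines(context, vocab)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```
</pre>
For the example in this thread, write_counts("COUNTS", ["apple",
"banana", "banana", "carrot"], ["apple", "banana", "carrot"]) would
produce the three lines shown above.<br>
<br>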
If you want to make a minimal code change to enumerate all
conditional probabilities for any context encountered, you could
do so in LM::wordProbSum() and have it dump out the word tokens
and their log probabilities. Then process some text with ngram
-debug 3.<br>
<br>
Andreas<br>
<br>
<br>
<br>
On 3/12/2017 12:12 AM, Dávid Nemeskey wrote:<br>
</div>
<blockquote
cite="mid:CAHOrvWeeQZvdwYgWrCYz3vow47hUEEmhjLceBySm992_TecUZg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>Hi Kalpesh,<br>
<br>
</div>
well, there's LM::<span class="gmail-pl-en">wordProb</span>(VocabIndex
word, <span class="gmail-pl-k">const</span> VocabIndex
*context) in lm/src/LM.cc (and in
lm/src/NgramLM.cc, if you are using an ngram model). You could
simply call it on every word in the vocabulary. However, be
warned that this will be very slow for any reasonable
vocabulary size (say 10k and up). This function is also what
generateWord() calls, which is why the latter is so slow.<br>
<br>
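To make the "call it on every word" loop concrete, here is a minimal
Python sketch on a toy backoff model. It is not SRILM code: word_prob
only imitates the usual backoff recursion, the names PROBS, BACKOFFS and
next_word_distribution are hypothetical, and all numbers are made up.
It uses plain probabilities rather than log probabilities to keep the
arithmetic easy to follow.<br>
<pre wrap="">
```python
# Toy model: explicit probabilities keyed by n-gram tuple (made-up numbers).
PROBS = {
    ("apple",): 0.25,
    ("banana",): 0.5,
    ("carrot",): 0.25,
    ("carrot", "banana"): 0.6,
}
# Backoff weight per context, chosen so that each distribution sums to 1.
BACKOFFS = {("carrot",): 0.8}
VOCAB = ["apple", "banana", "carrot"]


def word_prob(word, context, probs, backoffs):
    """Backoff estimate of P(word | context); context is oldest word first."""
    ngram = context + (word,)
    if ngram in probs:
        return probs[ngram]
    if not context:
        return 0.0  # word unknown to the toy model
    # Back off: drop the oldest context word and apply the backoff weight
    # (a context without an explicit weight backs off with weight 1).
    weight = backoffs.get(context, 1.0)
    return weight * word_prob(word, context[1:], probs, backoffs)


def next_word_distribution(context, vocab, probs, backoffs):
    """Query every vocabulary word once; cost grows with vocabulary size."""
    return {w: word_prob(w, context, probs, backoffs) for w in vocab}
```
</pre>
One call to next_word_distribution(("carrot",), VOCAB, PROBS, BACKOFFS)
performs one lookup chain per vocabulary word, which is why this gets
slow as the vocabulary grows.<br>
<br>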
If you just wanted the top n most probable words, the
situation would be a bit different. Then wordProb() wouldn't
be the optimal solution because the trie built by ngram is
reversed (meaning you have to go back from the word to the
root, and not the other way around), so you would have to query
all words to get the most probable one. So when I wanted to do
this, I built another trie (from the root up to the word),
which made it much faster, though I am not sure it was 100%
correct in the face of negative backoff weights. But it
wouldn't help in your case, I guess.<br>
<br>
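A rough Python sketch of the forward-lookup idea, on a toy model with
made-up numbers: explicit n-grams are grouped by context and sorted by
probability, so the best successors of a context can be read off
without scanning the vocabulary. This is not the trie described above
and not SRILM code; build_forward_index and top_n are hypothetical
names, plain probabilities stand in for log probabilities, and the
sketch assumes the standard backoff definition in which a word's
probability comes from the longest context where it appears
explicitly.<br>
<pre wrap="">
```python
import heapq
from collections import defaultdict

# Toy model (made-up numbers): n-gram tuple to probability.
PROBS = {
    ("apple",): 0.25,
    ("banana",): 0.5,
    ("carrot",): 0.25,
    ("carrot", "banana"): 0.6,
}
BACKOFFS = {("carrot",): 0.8}  # backoff weight per context


def build_forward_index(probs):
    """Map each context to its explicit successors, best first."""
    index = defaultdict(list)
    for ngram, prob in probs.items():
        index[ngram[:-1]].append((prob, ngram[-1]))
    for successors in index.values():
        successors.sort(reverse=True)
    return index


def top_n(context, n, index, backoffs):
    """Return up to n (probability, word) pairs, most probable first."""
    seen = set()      # words explicit at some longer context
    candidates = []
    scale = 1.0       # product of the backoff weights applied so far
    while True:
        successors = index.get(context, [])
        taken = 0
        for prob, word in successors:
            if taken == n:
                break
            if word not in seen:
                candidates.append((scale * prob, word))
                taken += 1
        # Every explicit successor masks the same word at shorter contexts.
        seen.update(word for _, word in successors)
        if not context:
            break
        scale *= backoffs.get(context, 1.0)
        context = context[1:]
    return heapq.nlargest(n, candidates)
```
</pre>
With index = build_forward_index(PROBS), the call top_n(("carrot",), 2,
index, BACKOFFS) only touches the successors stored for "carrot" and
for the empty context. Taking at most n unmasked words per level is
enough because the scale factor is the same for all words at one
level.<br>
<br>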
</div>
<div>Best,<br>
</div>
<div>Dávid<br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sat, Mar 11, 2017 at 8:32 PM,
Kalpesh Krishna <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:kalpeshk2011@gmail.com" target="_blank">kalpeshk2011@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Hello,
<div>I have a context of words and I've built an N-gram
language model using ./ngram-count. I wish to generate a
probability distribution (over the entire vocabulary of
words) of the next word. I can't seem to be able to find
a good way to do this with ./ngram.</div>
<div>What's the best way to do this?</div>
<div>For example, if my vocabulary has words "apple,
banana, carrot", and my context is "apple banana banana
carrot", I want a distribution like - {"apple": 0.25,
"banana": 0.5, "carrot": 0.25}.</div>
<div><br>
</div>
<div>Thank you,</div>
<div>Kalpesh Krishna</div>
<div><a moz-do-not-send="true"
href="http://martiansideofthemoon.github.io/"
target="_blank">http://martiansideofthemoon.<wbr>github.io/</a><br>
</div>
</div>
<br>
______________________________<wbr>_________________<br>
SRILM-User site list<br>
<a moz-do-not-send="true"
href="mailto:SRILM-User@speech.sri.com">SRILM-User@speech.sri.com</a><br>
<a moz-do-not-send="true"
href="http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user"
rel="noreferrer" target="_blank">http://mailman.speech.sri.com/<wbr>cgi-bin/mailman/listinfo/<wbr>srilm-user</a><br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>