<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix"><br>

      A brute-force solution (if you don't want to modify any
      code) is to generate an N-gram count file of the form<br>

      <br>

      apple banana banana carrot apple        1<br>

      apple banana banana carrot banana        1<br>

      apple banana banana carrot carrot        1<br>

      <br>

      and pass it to <br>

      <br>

          ngram -lm LM    -order 5 -counts COUNTS -debug 2 <br>

      <br>
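      A minimal Python sketch of generating such a count file. It assumes a
      tab between each N-gram and its count and takes the vocabulary as a
      plain list; the function name write_next_word_counts is only
      illustrative and is not part of SRILM.<br>
      <br>

```python
from pathlib import Path


def write_next_word_counts(context, vocabulary, path):
    """Write one line per candidate next word: context + word, a tab, count 1."""
    lines = [" ".join([*context, word]) + "\t1" for word in vocabulary]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


context = "apple banana banana carrot".split()
vocabulary = ["apple", "banana", "carrot"]
lines = write_next_word_counts(context, vocabulary, "COUNTS")

# Each N-gram has len(context) + 1 words; that is the value to pass as -order.
order = len(context) + 1
```

      With a four-word context the N-grams have five words, which matches
      the -order 5 in the command above.<br>
      <br>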

      If you want to make a minimal code change to enumerate all

      conditional probabilities for any context encountered, you could

      do so in LM::wordProbSum() and have it dump out the word tokens

      and their log probabilities.  Then process some text with ngram

      -debug 3.<br>

      <br>

      Andreas<br>

      <br>

      <br>

      <br>

      On 3/12/2017 12:12 AM, Dávid Nemeskey wrote:<br>

    </div>

    <blockquote

cite="mid:CAHOrvWeeQZvdwYgWrCYz3vow47hUEEmhjLceBySm992_TecUZg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>

          <div>Hi Kalpesh,<br>

            <br>

          </div>

          well, there's LM::<span class="gmail-pl-en">wordProb</span>(VocabIndex
          word, <span class="gmail-pl-k">const</span> VocabIndex
          *context) in lm/src/LM.cc (and in
          lm/src/NgramLM.cc, if you are using an ngram model). You could
          simply call it on every word in the vocabulary. However, be
          warned that this will be very slow for any reasonable
          vocabulary size (say 10k and up). This function is also what
          generateWord() calls, which is why the latter is so slow.<br>

          <br>
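          To illustrate what querying every word amounts to, here is a toy
          Python sketch of a bigram backoff model. It is not SRILM code: the
          tables, the probabilities and the function names are all made up
          for illustration. It works with plain probabilities, whereas
          wordProb() returns log probabilities.<br>
          <br>

```python
# Toy bigram backoff model; all numbers are invented for illustration.
UNIGRAMS = {"apple": 0.25, "banana": 0.5, "carrot": 0.25}
BIGRAMS = {("carrot", "banana"): 0.6}  # p(banana | carrot), seen explicitly
# Backoff weight chosen so the distribution after "carrot" sums to one:
# (1 - 0.6) / (1 - 0.5) = 0.8
BACKOFF = {"carrot": 0.8}


def word_prob(word, context):
    """p(word | context): use the bigram if present, else back off to the unigram."""
    prev = context[-1] if context else None
    if (prev, word) in BIGRAMS:
        return BIGRAMS[(prev, word)]
    return BACKOFF.get(prev, 1.0) * UNIGRAMS[word]


def next_word_distribution(context):
    """One word_prob() call per vocabulary word, so cost grows with vocabulary size."""
    return {word: word_prob(word, context) for word in UNIGRAMS}


dist = next_word_distribution("apple banana banana carrot".split())
print(dist)
```

          The loop makes one model query per vocabulary word, which is the
          cost described above.<br>
          <br>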

          If you only wanted the n most probable words, the
          situation would be a bit different. Then wordProb() wouldn't
          be the optimal solution, because the trie built by ngram is
          reversed (meaning you have to go back from the word to the
          root, and not the other way around), so you have to query all
          words to find the most probable one. When I wanted to do
          this, I built another trie (from the root up to the word),
          which made it much faster, though I am not sure it was 100%
          correct in the face of negative backoff weights. But it
          wouldn't help in your case, I guess.<br>

          <br>

        </div>

        <div>Best,<br>

        </div>

        <div>Dávid<br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Sat, Mar 11, 2017 at 8:32 PM,

          Kalpesh Krishna <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:kalpeshk2011@gmail.com" target="_blank">kalpeshk2011@gmail.com</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr">Hello,

              <div>I have a context of words and I've built an N-gram

                language model using ./ngram-count. I wish to generate a

                probability distribution (over the entire vocabulary of

                words) of the next word. I can't seem to be able to find

                a good way to do this with ./ngram.</div>

              <div>What's the best way to do this?</div>

              <div>For example, if my vocabulary has words "apple,

                banana, carrot", and my context is "apple banana banana

                carrot", I want a distribution like - {"apple": 0.25,

                "banana": 0.5, "carrot": 0.25}.</div>

              <div><br>

              </div>

              <div>Thank you,</div>

              <div>Kalpesh Krishna</div>

              <div><a moz-do-not-send="true"

                  href="http://martiansideofthemoon.github.io/"

                  target="_blank">http://martiansideofthemoon.<wbr>github.io/</a><br>

              </div>

            </div>

            <br>

            ______________________________<wbr>_________________<br>

            SRILM-User site list<br>

            <a moz-do-not-send="true"

              href="mailto:SRILM-User@speech.sri.com">SRILM-User@speech.sri.com</a><br>

            <a moz-do-not-send="true"

              href="http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user"

              rel="noreferrer" target="_blank">http://mailman.speech.sri.com/<wbr>cgi-bin/mailman/listinfo/<wbr>srilm-user</a><br>

          </blockquote>

        </div>

        <br>

      </div>

      <br>


    </blockquote>

    <p><br>

    </p>

  </body>

</html>