[SRILM User List] Predicting words

Andreas Stolcke stolcke at icsi.berkeley.edu
Wed Aug 8 22:09:35 PDT 2012


On 7/20/2012 5:04 AM, Nouf Al-Harbi wrote:
> Hello,
>
> I am new to language modeling and was hoping that someone can help me 
> with the following.
>
> I am trying to predict a word given an input sentence. For example, I
> would like to find the word that has the highest probability of
> replacing the ... in sentences such as 'A man is ...' (e.g. sitting).
>
> I tried to use the disambig tool, but I couldn't find any example
> illustrating how to use it, especially how to create the map file and
> what type of file it is (e.g. txt, arpa, ...).

Indeed, you can use disambig, at least in theory, to solve this problem.

1. prepare a map file of the form:

     a       a
     man    man
     ...   [for all words occurring in your data]
     UNKNOWN_WORD  word1 word2  ....  [list all words in the vocabulary here]

2. train an LM of word sequences.

3. prepare disambig input of the form

                 a man is sitting UNKNOWN_WORD

    You can also add known words to the right of UNKNOWN_WORD if you have 
that information (see the note about -fw-only below).

4. run disambig

             disambig -map MAPFILE -lm LMFILE -text INPUTFILE

If you want to use only the left context of UNKNOWN_WORD, use the 
-fw-only option.
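
As a rough illustration of steps 1 and 3, here is a minimal Python sketch that generates the map-file lines and one input line. The function names (build_map_lines, make_input_line), the toy vocabulary, and the use of '...' as the gap marker are assumptions made for the example, not part of SRILM. It only produces text in the layout described above and does not call disambig itself.

```python
from typing import Iterable, List

PLACEHOLDER = "UNKNOWN_WORD"  # same placeholder token as in the steps above


def build_map_lines(vocab: Iterable[str], placeholder: str = PLACEHOLDER) -> List[str]:
    """Return map-file lines: one identity line per word, plus one line
    that maps the placeholder to every word in the vocabulary."""
    words = sorted(set(vocab))
    if placeholder in words:
        raise ValueError(f"placeholder {placeholder!r} collides with a vocabulary word")
    lines = [f"{w} {w}" for w in words]
    lines.append(placeholder + " " + " ".join(words))
    return lines


def make_input_line(sentence: str, gap: str = "...", placeholder: str = PLACEHOLDER) -> str:
    """Replace the gap marker in a whitespace-tokenized sentence with the placeholder."""
    return " ".join(placeholder if tok == gap else tok for tok in sentence.split())


if __name__ == "__main__":
    # Toy vocabulary for illustration only; in practice, use the LM's vocabulary.
    vocab = ["a", "man", "is", "sitting", "standing"]
    print("\n".join(build_map_lines(vocab)))
    print(make_input_line("a man is ..."))
```

Writing the first part of the output to MAPFILE and the last line to INPUTFILE gives the two files used in the disambig command in step 4. The placeholder line grows with the vocabulary, which is where the speed and memory concern comes from.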

This is in theory.  If your vocabulary is large it may be very slow and 
take too much memory.  I haven't tried it, so let me know if it works 
for you.

Andreas
