[SRILM User List] predicting words

Andreas Stolcke stolcke at speech.sri.com
Fri Apr 9 22:39:31 PDT 2010


On 4/6/2010 8:07 AM, simon wrote:
> I am new to language modeling and was hoping that someone can help me 
> with the following.
>
> Is it possible to use SRILM to predict a word given an input sentence?
>
> More concretely, I would like to get the word replacing the ... that has 
> the highest probability in sentences such as
> 'She ate the green .....'
> (e.g. broccoli)
>
> Furthermore, will this give sensible results even when the bigram 
> 'green broccoli' is not observed?
> I suspect that this kind of functionality is easily implemented or 
> readily available.
>
This kind of problem is indeed solvable with SRILM, but there is no 
ready-made program for this specific task.
You have to encode the task as a special case of a more general problem 
for which SRILM has tools.

The most straightforward solution involves the disambig tool (see the man 
page for details).
You would prepare a map file that allows all possible words for the ... 
position in your sentence, and only the given words everywhere else.  
The map file would look like this:

a a
the the
i i
am am
(and so on for the entire vocabulary)
UNKNOWN  a the i am broccoli ... (a list of all words)
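
A map file like this is tedious to write by hand, so here is a minimal 
Python sketch that generates one from a word list.  The function name 
write_map_file and the file name words.map are made up for illustration; 
the vocabulary is assumed to be available as a plain list of words.

```python
def write_map_file(vocab, path, placeholder="UNKNOWN"):
    """Write a map file in the layout shown above.

    Every vocabulary word maps to itself, and the placeholder token
    maps to the full vocabulary.  Returns the number of distinct words.
    """
    words = sorted(set(vocab))
    if placeholder in words:
        # The placeholder must not collide with a real vocabulary word.
        raise ValueError("placeholder %r is already in the vocabulary" % placeholder)
    with open(path, "w", encoding="utf-8") as out:
        for word in words:
            out.write("%s %s\n" % (word, word))
        # One long line listing every word as a candidate for the gap.
        out.write(placeholder + " " + " ".join(words) + "\n")
    return len(words)


if __name__ == "__main__":
    # Toy vocabulary; in practice this would be the LM's full word list.
    write_map_file(["a", "the", "i", "am", "broccoli"], "words.map")
```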

You then give this map file and an LM trained on a large corpus to disambig.
The input would be your sentence(s), with the token "UNKNOWN" marking the 
unknown word positions.
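
As a rough sketch of this step, the Python below rewrites a sentence so 
the gap becomes the UNKNOWN token and assembles a disambig command line.  
The helper names and file names are hypothetical, lowercasing is an 
assumption that only makes sense if the LM vocabulary is lowercase, and 
the options (-text, -map, -lm, -order) should be checked against the 
disambig man page for your version.  The command is only built, not run.

```python
def mark_gap(sentence, gap="...", placeholder="UNKNOWN", lowercase=True):
    """Replace the gap marker with the placeholder token.

    Lowercasing the other tokens is an assumption; switch it off if the
    LM was trained on cased text.
    """
    out = []
    for token in sentence.split():
        if token == gap:
            out.append(placeholder)
        else:
            out.append(token.lower() if lowercase else token)
    return " ".join(out)


def disambig_command(text_path, map_path, lm_path, order=3):
    """Build (but do not run) an argument list for disambig."""
    return ["disambig",
            "-text", text_path,   # sentences containing UNKNOWN
            "-map", map_path,     # map file as described above
            "-lm", lm_path,       # LM trained on a large corpus
            "-order", str(order)]


if __name__ == "__main__":
    print(mark_gap("She ate the green ..."))
    print(" ".join(disambig_command("input.txt", "words.map", "big.lm")))
```

To actually run it, the argument list can be passed to subprocess.run 
once SRILM is installed and disambig is on the PATH.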

There might be an issue with the maximum number of words allowed per line 
in the map file, which might limit the length of the UNKNOWN entry.  But 
you can change the constant maxWordsPerLine in misc/src/File.h as needed.
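
To judge whether the limit matters for a given vocabulary, it may help 
to count the fields on the longest map line and compare that number with 
the value of maxWordsPerLine in your copy of the source.  The small 
Python sketch below does the counting; the function name is made up, and 
whether the limit also counts the leading UNKNOWN token is something I 
am assuming rather than stating, so leave some margin.

```python
def max_fields_per_line(map_path):
    """Return the largest number of whitespace-separated fields on any line."""
    longest = 0
    with open(map_path, encoding="utf-8") as f:
        for line in f:
            longest = max(longest, len(line.split()))
    return longest


if __name__ == "__main__":
    # words.map is a hypothetical file name for the map file.
    print(max_fields_per_line("words.map"))
```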

Andreas




