[SRILM User List] predicting words
Andreas Stolcke
stolcke at speech.sri.com
Fri Apr 9 22:39:31 PDT 2010
On 4/6/2010 8:07 AM, simon wrote:
> I am new to language modeling and was hoping that someone can help me
> with the following.
>
> Is it possible to use SRILM to predict a word given an input sentence?
>
> More concretely, I would like to get a word replacing the ... that has
> the highest probability in sentences such as
> 'She ate the green .....'
> (e.g. broccoli)
>
> Furthermore, will this give sensible results even when the bigram
> 'green broccoli' is not observed?
> I suspect that this kind of functionality is easily implemented or
> readily available.
>
This kind of problem is indeed solvable with SRILM, but there is no
ready-made program for this specific task.
You have to encode the task as a special case of a more general problem
for which SRILM has tools.
The most straightforward solution uses the disambig tool (see its man
page for details).
You would prepare a map file that allows all possible words for the ...
position in your sentence, and only the given word everywhere else.
The map file would look like this:
a a
the the
i i
am am
(and so on for the entire vocabulary)
UNKNOWN a the i am broccoli ... (a list of all words)
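As a rough sketch (this is not part of SRILM), a map file with the layout above could be generated from a vocabulary list with a few lines of Python. The function name build_map_lines, the sample vocabulary, and the output file name are made up for illustration; the vocabulary should be the word list of your LM.

```python
def build_map_lines(vocab, placeholder="UNKNOWN"):
    """Return map-file lines: one identity entry per word, plus one
    placeholder entry that expands to every word in the vocabulary."""
    words = sorted(set(vocab))
    if placeholder in words:
        # The placeholder must not clash with a real vocabulary word.
        raise ValueError("placeholder token is already in the vocabulary")
    lines = [f"{w} {w}" for w in words]            # e.g. "the the"
    lines.append(" ".join([placeholder] + words))  # "UNKNOWN a am ..."
    return lines


if __name__ == "__main__":
    # Hypothetical toy vocabulary; replace with the LM's real word list.
    sample_vocab = ["a", "the", "i", "am", "broccoli"]
    with open("vocab.map", "w", encoding="utf-8") as out:
        out.write("\n".join(build_map_lines(sample_vocab)) + "\n")
```

For a realistic vocabulary the last line becomes very long, since it lists every word once.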
You then give this map file and an LM trained on a large corpus to disambig.
The input would be your sentence(s), with the token "UNKNOWN" marking the
unknown word positions.
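Preparing the input could look roughly like the following sketch, which assumes the gap is written as a run of three or more dots, as in the example sentence. The helper name mark_gap is hypothetical, and the words in the sentence still have to match the spelling and case used in the map file and the LM.

```python
import re


def mark_gap(sentence, placeholder="UNKNOWN"):
    """Replace each run of three or more dots with the placeholder token
    and normalize whitespace so that tokens are separated by single spaces."""
    marked = re.sub(r"\.{3,}", f" {placeholder} ", sentence)
    return " ".join(marked.split())


if __name__ == "__main__":
    print(mark_gap("She ate the green ....."))
```

With the marked sentences in a file, an invocation along the lines of `disambig -text input.txt -map vocab.map -lm corpus.lm -order 3` should then fill in the UNKNOWN positions. The file names and the order are placeholders, so check the disambig man page for the exact options in your SRILM version.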
There might be an issue with the maximum number of words allowed per line in
the map file, which might limit the length of the UNKNOWN entry. But you
can change the constant maxWordsPerLine in misc/src/File.h as needed.
Andreas