[SRILM User List] Detailed description of ngram-count's -skip option
Andreas Stolcke
stolcke at icsi.berkeley.edu
Thu Mar 28 10:03:11 PDT 2013
On 3/27/2013 5:44 AM, Sander Maijers wrote:
> Hello everyone,
>
> Where can I find more detailed information about the word-skipping
> algorithm provided by ngram-count? So far I have found this:
>
> "Skip language models — In this LM, words in the history are
> probabilistically skipped, allowing more distant words to take their
> places. The skipping probabilities associated with each word are
> estimated using expectation maximization." (In SRILM — AN EXTENSIBLE
> LANGUAGE MODELING TOOLKIT, Stolcke, 2002)
>
> As I want to refer to this, I would prefer published scientific work authored
> by Andreas Stolcke.
The last statement makes me uncomfortable. Skip ngrams are a variant of
"distant ngram" models that you can find in the literature prior to me
writing this particular implementation. I am having trouble finding a
good reference prior to 1995, but Roni Rosenfeld's 1994 thesis certainly
had similar ideas, though in the context of maxent modeling.
The SkipNgram model is essentially an interpolation between a straight
ngram P(w_n | w_{n-1} w_{n-2} ...) and another ngram model where the
preceding word is skipped, P(w_n | w_{n-2} w_{n-3} ...), where the
interpolation weight is a function of the skipped word w_{n-1}.
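For illustration, a minimal Python sketch of that interpolation follows (it is not SRILM's C++ code). It assumes both component models are available as functions returning conditional probabilities, with the history listed most-recent-first; the names `skip_ngram_prob`, `straight`, `skipped` and `skip_prob` are hypothetical. It also assumes that a word with no estimated skipping probability is never skipped.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interface: a conditional model P(word | history), with the
# history most-recent-first: history[0] = w_{n-1}, history[1] = w_{n-2}, ...
CondProb = Callable[[str, Sequence[str]], float]


def skip_ngram_prob(
    word: str,
    history: Sequence[str],
    straight: CondProb,
    skipped: CondProb,
    skip_prob: Dict[str, float],
) -> float:
    """Interpolate a straight ngram with an ngram that skips w_{n-1}."""
    if not history:
        # Empty history: there is no preceding word to skip.
        return straight(word, history)
    # Skipping probability of w_{n-1}; assumed 0 (never skip) if unknown.
    s = skip_prob.get(history[0], 0.0)
    p_straight = straight(word, history)    # P(w_n | w_{n-1} w_{n-2} ...)
    p_skipped = skipped(word, history[1:])  # P(w_n | w_{n-2} w_{n-3} ...)
    return (1.0 - s) * p_straight + s * p_skipped


# Toy usage with made-up constant component models.
def toy_straight(word: str, history: Sequence[str]) -> float:
    return 0.2


def toy_skipped(word: str, history: Sequence[str]) -> float:
    return 0.6


# (1 - 0.25) * 0.2 + 0.25 * 0.6, i.e. about 0.3
print(skip_ngram_prob("dog", ["the", "saw"], toy_straight, toy_skipped, {"the": 0.25}))
```

Because the weight depends only on w_{n-1}, a word that is usually uninformative about what follows can earn a high skipping probability, letting the more distant word w_{n-2} take its place in the conditioning context.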
So you have a "skipping probability" associated with each word, and that
is estimated in a straightforward way using EM. You can read the code
for the details; it should be pretty easy to follow.
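Below is a minimal Python sketch of such an EM loop. It assumes that the two component ngram distributions are held fixed and that only the per-word skipping probabilities s(v) are re-estimated, using the standard EM update for mixture weights. It does not attempt to reproduce everything SRILM's SkipNgram code does (for example, whether it also re-estimates the ngram statistics). The names `estimate_skip_probs`, `straight`, `skipped`, `iterations` and `init` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: P(word | history), history most-recent-first.
CondProb = Callable[[str, Sequence[str]], float]


def estimate_skip_probs(
    events: List[Tuple[str, Sequence[str]]],
    straight: CondProb,
    skipped: CondProb,
    iterations: int = 20,
    init: float = 0.5,
) -> Dict[str, float]:
    """EM for per-word skipping probabilities s(v), component models fixed.

    Each event is (w_n, history) with history[0] = w_{n-1}.
    """
    skip_prob: Dict[str, float] = {}
    for _ in range(iterations):
        expected_skips: Dict[str, float] = defaultdict(float)
        occurrences: Dict[str, int] = defaultdict(int)
        for word, history in events:
            if not history:
                continue  # nothing to skip
            v = history[0]
            s = skip_prob.get(v, init)
            a = (1.0 - s) * straight(word, history)   # not skipped
            b = s * skipped(word, history[1:])        # w_{n-1} skipped
            if a + b > 0.0:
                # E-step: posterior probability that w_{n-1} was skipped.
                expected_skips[v] += b / (a + b)
                occurrences[v] += 1
        # M-step: s(v) = expected skips of v / occurrences of v as w_{n-1}.
        skip_prob = {v: expected_skips[v] / occurrences[v] for v in occurrences}
    return skip_prob


def toy_straight(word: str, history: Sequence[str]) -> float:
    # Made-up: the straight model predicts poorly right after "the".
    return 0.1 if history[0] == "the" else 0.4


def toy_skipped(word: str, history: Sequence[str]) -> float:
    return 0.25


events = [("dog", ["the", "saw"]), ("cat", ["a", "saw"])] * 10
# s("the") moves toward 1, s("a") toward 0.
print(estimate_skip_probs(events, toy_straight, toy_skipped))
```

Holding the component models fixed keeps the M-step a simple ratio of expected skips to occurrences, and, as with any EM iteration, each pass cannot decrease the likelihood of the training events.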
Andreas