[SRILM User List] Detailed description of ngram-count's -skip option

Thu Mar 28 10:03:11 PDT 2013

On 3/27/2013 5:44 AM, Sander Maijers wrote:
> Hello everyone,
>
> Where can I find more detailed information about the word skipping 
> algorithm provided by ngram-count? Thus far I found this:
>
> "Skip language models — In this LM, words in the history are 
> probabilistically skipped, allowing more distant words to take their 
> places. The skipping probabilities associated with each word are 
> estimated using expectation maximization." (In SRILM — AN EXTENSIBLE 
> LANGUAGE MODELING TOOLKIT, Stolcke, 2002)
>
> As I want to refer to this I prefer published scientific work authored 
> by Andreas Stolcke.

The last statement makes me uncomfortable.  Skip ngrams are a variant of 
"distant ngram" models that you can find in the literature prior to me 
writing this particular implementation.  I am having trouble finding a 
good reference prior to 1995, but Roni Rosenfeld's 1994 thesis certainly 
had similar ideas, though in the context of maxent modeling.

The SkipNgram model is essentially an interpolation between a straight 
ngram P(w_n | w_{n-1} w_{n-2} ...) and another ngram model where the 
preceding word is skipped, P(w_n | w_{n-2} w_{n-3} ...), where the 
interpolation weight is a function of the skipped word w_{n-1}.
So you have a "skipping probability" associated with each word, and that 
is estimated in a straightforward way using EM.  You can read the code 
for the details; it should be pretty easy to follow.
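
To make the description above concrete, here is a minimal Python sketch of 
the interpolation and of one EM re-estimation step for the skipping 
probabilities.  It is an illustration under stated assumptions, not the 
SRILM code: the names skip_ngram_prob, em_update_skip_weights, ngram_prob 
and skip_weight are hypothetical, the starting value 0.5 for unseen words 
is an arbitrary choice, and the component ngram probabilities are held 
fixed during EM, which may differ from what the actual implementation does.

```python
from collections import defaultdict

# Histories are tuples ordered most-recent-first: (w_{n-1}, w_{n-2}, ...).
# ngram_prob(word, history) is a hypothetical callable returning an ordinary
# ngram probability; it stands in for whatever base model is available.

def skip_ngram_prob(word, history, ngram_prob, skip_weight, init=0.5):
    """Interpolate the straight ngram with the ngram that skips w_{n-1}."""
    s = skip_weight.get(history[0], init)   # skipping probability of w_{n-1}
    p_full = ngram_prob(word, history)      # P(w_n | w_{n-1} w_{n-2} ...)
    p_skip = ngram_prob(word, history[1:])  # P(w_n | w_{n-2} w_{n-3} ...)
    return (1.0 - s) * p_full + s * p_skip

def em_update_skip_weights(events, ngram_prob, skip_weight, init=0.5):
    """One EM iteration over (word, history) events.

    E-step: posterior that w_{n-1} was skipped for each event.
    M-step: new skipping probability = average posterior per skipped word.
    Assumption: the component ngram probabilities are not re-estimated here.
    """
    posterior_sum = defaultdict(float)
    event_count = defaultdict(int)
    for word, history in events:
        prev = history[0]
        s = skip_weight.get(prev, init)
        p_full = ngram_prob(word, history)
        p_skip = ngram_prob(word, history[1:])
        total = (1.0 - s) * p_full + s * p_skip
        if total > 0.0:
            posterior_sum[prev] += s * p_skip / total
            event_count[prev] += 1
    return {w: posterior_sum[w] / event_count[w] for w in event_count}
```

In use, one would call em_update_skip_weights repeatedly on the training 
events, feeding each result back in as skip_weight, until the values stop 
changing appreciably.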

Andreas