[SRILM User List] ngram-count's ARPA N-gram LM extensions beyond "\end\" marker

Tue Jun 18 15:39:15 PDT 2013

On 16-06-13 21:13, Andreas Stolcke wrote:
> On 6/15/2013 12:39 PM, Sander Maijers wrote:
>> In the case of an LM created with '-skip', what is the meaning of the
>> values past "\end\"?
>>
>> They are of the form:
>>
>> a-team 0.5
>> a-teens 0.5
>> a-test 0.5
>
> These are the skip probabilities estimated by the model. 0.5 is the
> default initial value, but after doing the EM estimation each word would
> have its individual probability of being skipped in the computation of
> conditional probabilities. With the above values you would get
>
> P(w | a b "a-team" ) =  0.5 P'(w | a b)  + 0.5 P'(w | a b "a-team" )
>
> and so on for all words.  Here P' is the probability as determined by a
> standard n-gram LM.
> Note:  "a-team" is the word right before the word being predicted (w).
>
>>
>>
>> I do not understand their relation to these 'ngram-count' parameters:
>>
>> -init-lm lmfile
>>     Load an LM to initialize the parameters of the skip-N-gram.
> As it says, you can start the estimation process with a preexisting set
> of parameters, read from a model file "lmfile".
>
>> -skip-init value
>>     The initial skip probability for all words.
> Alternatively, you can initialize all skip probabilities to the same
> fixed value.
>> -em-iters n
>>     The maximum number of EM iterations.
>> -em-delta d
>>     The convergence criterion for EM: if the relative change in log
>> likelihood falls below the given value, iteration stops.
> These are just standard parameters for an EM-type algorithm.
>
> Andreas
>
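
To check my reading of the interpolation formula quoted above, here is a
minimal Python sketch. It assumes the value listed after "\end\" is the
weight on the estimate with the preceding word skipped, and that its
complement weights the estimate with the full history. The function name,
argument names, and the example probabilities are made up for illustration;
nothing here is taken from the SRILM source.

```python
def skip_ngram_prob(p_full, p_skipped, skip_prob):
    """Interpolate two standard n-gram estimates using a per-word skip probability.

    p_full:    P'(w | a b "a-team"), the estimate with the full history
    p_skipped: P'(w | a b), the estimate with the preceding word skipped
    skip_prob: skip probability of the preceding word (assumed to be the
               value stored for that word after the end marker)
    """
    if not 0.0 <= skip_prob <= 1.0:
        raise ValueError("skip_prob must lie in [0, 1]")
    # Weight the shortened history by skip_prob and the full history by the rest.
    return skip_prob * p_skipped + (1.0 - skip_prob) * p_full


if __name__ == "__main__":
    # Made-up probabilities, combined with the default skip value of 0.5.
    print(skip_ngram_prob(p_full=0.5, p_skipped=0.25, skip_prob=0.5))
```

With a skip probability of 0 this reduces to the standard n-gram estimate,
and with 1 the preceding word is always ignored.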

1. Can only the first preceding word ("a-team") be skipped in this kind 
of skip LM? I first believed all history words could be skipped, except 
for the very last (most distant from w_n), but now I am not sure anymore.

2. In this case, what kind of smoothing goes on under the hood of P'? I
created my skip LM with the following 'ngram-count' parameters:
-vocab %s -prune %s -skip -debug 1 -order 3 -text %s -sort -lm %s
-limit-vocab -tolower
Does that also incorporate backoff and Good-Turing discounting, as it
would without '-skip'?
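
For reference, this is how I understand the stopping rule described for
'-em-iters' and '-em-delta' in the quoted text, as a rough Python sketch.
It assumes "relative change" means |new - old| / |old|; the exact definition
used by ngram-count may differ. The em_step callback and the log-likelihood
values are hypothetical stand-ins, not SRILM functions or real output.

```python
def run_em(em_step, max_iters, delta):
    """Run EM until the relative log-likelihood change drops below delta.

    em_step:   hypothetical callback that performs one EM iteration and
               returns the training-data log likelihood
    max_iters: upper bound on iterations (the role of -em-iters)
    delta:     convergence threshold (the role of -em-delta)
    Returns the number of iterations actually performed.
    """
    prev = None
    iters_done = 0
    for _ in range(max_iters):
        ll = em_step()
        iters_done += 1
        if prev is not None and prev != 0.0:
            # Assumed definition of relative change in log likelihood.
            rel_change = abs((ll - prev) / prev)
            if rel_change < delta:
                break
        prev = ll
    return iters_done


if __name__ == "__main__":
    # Made-up log likelihoods standing in for successive EM iterations.
    fake_lls = iter([-100.0, -90.0, -89.9, -89.89])
    print(run_em(lambda: next(fake_lls), max_iters=10, delta=0.01))
```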