[SRILM User List] ngram-count's ARPA N-gram LM extensions beyond "\end\" marker
Andreas Stolcke
stolcke at icsi.berkeley.edu
Tue Jun 18 16:44:55 PDT 2013
On 6/18/2013 3:39 PM, Sander Maijers wrote:
> On 16-06-13 21:13, Andreas Stolcke wrote:
>> On 6/15/2013 12:39 PM, Sander Maijers wrote:
>>> In the case of an LM created with '-skip', what is the meaning of the
>>> values past "\end\"?
>>>
>>> They are of the form:
>>>
>>> a-team 0.5
>>> a-teens 0.5
>>> a-test 0.5
>>
>> These are the skip probabilities estimated by the model. 0.5 is the
>> default initial value, but after the EM estimation each word would
>> have its own probability of being skipped in the computation of
>> conditional probabilities. With the above values you would get
>>
>> P(w | a b "a-team") = 0.5 P'(w | a b) + 0.5 P'(w | a b "a-team")
>>
>> and so on for all words. Here P' is the probability as determined by a
>> standard n-gram LM.
>> Note: "a-team" is the word right before the word being predicted (w).
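>>
>> As a worked example with made-up numbers: if EM re-estimated the skip
>> probability of "a-team" to 0.2, the interpolation would become
>>
>> P(w | a b "a-team") = 0.2 P'(w | a b) + 0.8 P'(w | a b "a-team")
>>
>> That is, the estimated skip probability weights the prediction from
>> the history with the skipped word omitted, and its complement weights
>> the prediction from the full history.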
>>
>>>
>>>
>>> I do not understand their relation to these 'ngram-count' parameters:
>>>
>>> -init-lm lmfile
>>> Load an LM to initialize the parameters of the skip-N-gram.
>> As it says, you can start the estimation process with a preexisting set
>> of parameters, read from a model file "lmfile".
>>
>>> -skip-init value
>>> The initial skip probability for all words.
>> Alternatively, you can initialize all skip probabilities to the same
>> fixed value.
>>> -em-iters n
>>> The maximum number of EM iterations.
>>> -em-delta d
>>> The convergence criterion for EM: if the relative change in log
>>> likelihood falls below the given value, iteration stops.
>> These are just standard parameters for an EM-type algorithm.
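>>
>> For instance, a complete training command combining these options
>> might look like this (file names and the numeric settings are purely
>> illustrative):
>>
>>     ngram-count -text train.txt -order 3 -skip -skip-init 0.5 \
>>         -em-iters 100 -em-delta 0.001 -lm skip.lm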
>>
>> Andreas
>>
>
> 1. Can only the first preceding word ("a-team") be skipped in this
> kind of skip LM? I first believed all history words could be skipped,
> except for the very last (most distant from w_n), but now I am not
> sure anymore.
No, to keep it simple, the current implementation only considers
skipping the word immediately preceding the word being predicted.
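Schematically (the p_skip notation is mine, not SRILM's), for a
trigram with history ... w1 w2 w3 this means

    P(w | w1 w2 w3) = p_skip(w3) P'(w | w1 w2)
                      + (1 - p_skip(w3)) P'(w | w2 w3)

so only w3, the word immediately before w, can be dropped; words
farther back in the history are never skipped.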
> 2. In this case, what kind of smoothing goes on under the hood of P'?
> I created my skip LM with the following parameters to 'ngram-count':
>
>     -vocab %s -prune %s -skip -debug 1 -order 3 -text %s -sort -lm %s
>     -limit-vocab -tolower
>
> Does that also incorporate backoff and Good-Turing discounting, as it
> would without '-skip'?
Yes, the underlying estimation algorithm (the M-step of the EM
algorithm) is standard backoff ngram estimation.
The only nonstandard aspect is that the ngram counts going into the
estimation are fractional counts, as computed in the E-step.
Therefore, the same limitations apply as with the ngram-count
-float-counts option: you can only use discounting methods that can
handle fractional counts. In particular, the methods based on
counts-of-counts are out, so no Good-Turing (GT) or Kneser-Ney (KN)
discounting. You should get an error message if you try to use them.
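For example, Witten-Bell discounting can handle fractional counts, so
a command along these lines should work (file names are placeholders):

    ngram-count -text train.txt -order 3 -skip -wbdiscount -lm skip.lm

whereas the same command with -kndiscount should be rejected.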
Andreas