[SRILM User List] ngram-count's ARPA N-gram LM extensions beyond "\end\" marker
Andreas Stolcke
stolcke at icsi.berkeley.edu
Tue Jun 18 16:44:55 PDT 2013
On 6/18/2013 3:39 PM, Sander Maijers wrote:
> On 16-06-13 21:13, Andreas Stolcke wrote:
>> On 6/15/2013 12:39 PM, Sander Maijers wrote:
>>> In the case of an LM created with '-skip', what is the meaning of the
>>> values past "\end\"?
>>>
>>> They are of the form:
>>>
>>> a-team 0.5
>>> a-teens 0.5
>>> a-test 0.5
>>
>> These are the skip probabilities estimated by the model. 0.5 is the
>> default initial value, but after the EM estimation each word would
>> have its own probability of being skipped in the computation of
>> conditional probabilities. With the above values you would get
>>
>> P(w | a b "a-team") = 0.5 P'(w | a b) + 0.5 P'(w | a b "a-team")
>>
>> and so on for all words. Here P' is the probability as determined by a
>> standard n-gram LM.
>> Note: "a-team" is the word right before the word being predicted (w).
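>>
>> As a worked example with made-up numbers: if EM re-estimated the skip
>> probability of "a-team" to 0.2, the interpolation would become
>>
>> P(w | a b "a-team") = 0.2 P'(w | a b) + 0.8 P'(w | a b "a-team")
>>
>> That is, the estimated skip probability weights the prediction from
>> the history with the skipped word omitted, and its complement weights
>> the prediction from the full history.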
>>
>>>
>>>
>>> I do not understand their relation to these 'ngram-count' parameters:
>>>
>>> -init-lm lmfile
>>> Load an LM to initialize the parameters of the skip-N-gram.
>> As it says, you can start the estimation process with a preexisting set
>> of parameters, read from a model file "lmfile".
>>
>>> -skip-init value
>>> The initial skip probability for all words.
>> Alternatively, you can initialize all skip probabilities to the same
>> fixed value.
>>> -em-iters n
>>> The maximum number of EM iterations.
>>> -em-delta d
>>> The convergence criterion for EM: if the relative change in log
>>> likelihood falls below the given value, iteration stops.
>> These are just standard parameters for an EM-type algorithm.
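>>
>> For instance, a complete training command combining these options
>> might look like this (file names and the numeric settings are purely
>> illustrative):
>>
>>     ngram-count -text train.txt -order 3 -skip -skip-init 0.5 \
>>         -em-iters 100 -em-delta 0.001 -lm skip.lm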
>>
>> Andreas
>>
>
> 1. Can only the first preceding word ("a-team") be skipped in this
> kind of skip LM? I first believed all history words could be skipped,
> except for the very last (most distant from w_n), but now I am not
> sure anymore.
No, to keep it simple, the current implementation only considers
skipping the word immediately preceding the word being predicted.
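Schematically (the p_skip notation is mine, not SRILM's), for a
trigram with history ... w1 w2 w3 this means

    P(w | w1 w2 w3) = p_skip(w3) P'(w | w1 w2)
                      + (1 - p_skip(w3)) P'(w | w2 w3)

so only w3, the word immediately before w, can be dropped; words
farther back in the history are never skipped.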
> 2. In this case, what kind of smoothing goes on under the hood of P'?
> I created my skip LM with the following parameters to 'ngram-count':
>
>     -vocab %s -prune %s -skip -debug 1 -order 3 -text %s -sort -lm %s
>     -limit-vocab -tolower
>
> Does that also incorporate backoff and Good-Turing discounting, as it
> would without '-skip'?
Yes, the underlying estimation algorithm (the M-step of the EM
algorithm) is standard backoff ngram estimation.
The only nonstandard aspect is that the ngram counts going into the
estimation are fractional counts, as computed in the E-step.
Therefore, the same limitations apply as with the ngram-count
-float-counts option: you can only use discounting methods that can
handle fractional counts. In particular, the methods based on
counts-of-counts are out, so no Good-Turing (GT) or Kneser-Ney (KN)
discounting. You should get an error message if you try to use them.
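For example, Witten-Bell discounting can handle fractional counts, so
a command along these lines should work (file names are placeholders):

    ngram-count -text train.txt -order 3 -skip -wbdiscount -lm skip.lm

whereas the same command with -kndiscount should be rejected.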
Andreas