[SRILM User List] ngram-count's ARPA N-gram LM extensions beyond "\end\" marker
Andreas Stolcke
stolcke at icsi.berkeley.edu
Sun Jun 16 12:13:00 PDT 2013
On 6/15/2013 12:39 PM, Sander Maijers wrote:
> In the case of an LM created with '-skip', what is the meaning of the
> values past "\end\"?
>
> They are of the form:
>
> a-team 0.5
> a-teens 0.5
> a-test 0.5
These are the skip probabilities estimated by the model. 0.5 is the
default initial value, but after EM estimation each word has its own
probability of being skipped in the computation of conditional
probabilities. With the above values you would get
P(w | a b "a-team" ) = 0.5 P'(w | a b) + 0.5 P'(w | a b "a-team" )
and so on for all words. Here P' is the probability as determined by a
standard n-gram LM.
Note: "a-team" is the word right before the word being predicted (w).
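To make the mixture above concrete, here is a minimal Python sketch. It is not SRILM code: the function names are made up for illustration, the base probabilities P' are passed in as plain numbers rather than looked up in a real n-gram LM, and it assumes that the skip probability weights the term in which the preceding word is dropped (with 0.5 both readings give the same result). The parser assumes the lines past "\end\" are simply "word probability" pairs, as in the excerpt quoted above.

```python
def parse_skip_probs(lines):
    """Read 'word prob' lines (assumed format of the section past "\\end\\")."""
    probs = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # ignore blank or malformed lines
            probs[parts[0]] = float(parts[1])
    return probs


def skip_ngram_prob(p_full, p_skipped, skip_prob):
    """Mix two standard n-gram probabilities P'.

    p_full    -- P'(w | a b "a-team"), previous word kept in the context
    p_skipped -- P'(w | a b), previous word skipped
    skip_prob -- probability of skipping the previous word
                 (assumption: it weights the skipped term)
    """
    if not 0.0 <= skip_prob <= 1.0:
        raise ValueError("skip_prob must lie in [0, 1]")
    return skip_prob * p_skipped + (1.0 - skip_prob) * p_full


# Hypothetical numbers, only to show the arithmetic.
skips = parse_skip_probs(["a-team 0.5", "a-teens 0.5", "a-test 0.5"])
p = skip_ngram_prob(p_full=0.2, p_skipped=0.4, skip_prob=skips["a-team"])
```

With a skip probability of 0 this reduces to the standard n-gram estimate, and with 1 the preceding word is always ignored.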
>
>
> I do not understand their relation to these 'ngram-count' parameters:
>
> -init-lm lmfile
> Load an LM to initialize the parameters of the skip-N-gram.
As it says, you can start the estimation process with a preexisting set
of parameters, read from a model file "lmfile".
> -skip-init value
> The initial skip probability for all words.
Alternatively, you can initialize all skip probabilities to the same
fixed value.
> -em-iters n
> The maximum number of EM iterations.
> -em-delta d
> The convergence criterion for EM: if the relative change in log
> likelihood falls below the given value, iteration stops.
These are just standard parameters for an EM-type algorithm.
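As a rough illustration of how two such parameters usually interact in an EM loop, here is a generic Python sketch. It is not taken from the SRILM sources: run_em and em_step are hypothetical names, and the exact formula ngram-count uses for the "relative change in log likelihood" is an assumption here.

```python
def run_em(em_step, max_iters, delta):
    """Generic EM driver (sketch, not SRILM's implementation).

    em_step   -- hypothetical callable: runs one EM iteration and
                 returns the resulting log likelihood
    max_iters -- plays the role of -em-iters
    delta     -- plays the role of -em-delta
    Returns (iterations_run, last_log_likelihood).
    """
    prev = None
    for i in range(1, max_iters + 1):
        ll = em_step()
        # Assumed definition of relative change: |ll - prev| / |prev|.
        if prev is not None and prev != 0.0:
            if abs((ll - prev) / prev) < delta:
                return i, ll  # converged before hitting max_iters
        prev = ll
    return max_iters, prev


# Hypothetical log-likelihood trajectory, only to exercise the loop.
trajectory = iter([-100.0, -90.0, -89.99, -89.989])
iters, final_ll = run_em(lambda: next(trajectory), max_iters=10, delta=0.001)
```

In this made-up run the loop stops at the third iteration, because the change from -90.0 to -89.99 is about 0.01% of the previous value, which is below the 0.1% threshold.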
Andreas