inquiry about SRI Toolkit

Anand Venkataraman anand at speech.sri.com
Tue Jul 16 12:55:34 PDT 2002


Man page for compute-mixed-logprob:

compute-mixed-logprob computes the log probability of a given
corpus of text according to the best mixture of the given
component language models. The interpolation is done fairly: the
corpus is split into two sets (alternate lines go to different
sets), and the mixture coefficients applied to each set are those
estimated with EM on the other set, so neither set is scored with
weights tuned on itself. Up to six language models may be
specified on the command line using the -lm flag. If splitting
the corpus by alternate lines is not the method desired, the user
may explicitly specify the two sets with -sets set1 set2 instead
of giving a single -text corpus option.

The -lm-flags option supplies additional options that are passed
on to ngram during perplexity calculations, for instance when the
language models are class language models and a class file needs
to be specified with -classes classfile. Language model n-gram
orders may likewise be passed on to ngram using -lm-flags
'-order n'. All options intended for ngram must be quoted and
passed to compute-mixed-logprob as a single option. Note that the
supplied ngram options are used for all of the language models
specified.
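The fair interpolation described above can be sketched in Python as
follows. This is only an illustration under stated assumptions, not
the script's actual code: it assumes the per-word probabilities from
each component model are already available as tuples of positive
numbers (in practice they would come from ngram), and the function
names em_weights, mixed_logprob and fair_mixed_logprob are
hypothetical.

```python
import math

def em_weights(probs, iters=50, tol=1e-6):
    """Estimate mixture weights with EM.

    probs: list of tuples, one per word, holding the (positive)
    probability each component model assigns to that word.
    """
    k = len(probs[0])
    w = [1.0 / k] * k  # start from uniform weights
    for _ in range(iters):
        counts = [0.0] * k
        for p in probs:
            mix = sum(wi * pi for wi, pi in zip(w, p))
            for i in range(k):
                # posterior responsibility of component i for this word
                counts[i] += w[i] * p[i] / mix
        new_w = [c / len(probs) for c in counts]
        converged = max(abs(a - b) for a, b in zip(w, new_w)) < tol
        w = new_w
        if converged:
            break
    return w

def mixed_logprob(probs, w):
    """Log10 probability of the words under the weighted mixture."""
    return sum(math.log10(sum(wi * pi for wi, pi in zip(w, p)))
               for p in probs)

def fair_mixed_logprob(lines):
    """lines: list of lines, each a list of per-word probability tuples.

    Alternate lines go to different sets; each set is scored with
    weights estimated on the other set.
    """
    set1 = [word for line in lines[0::2] for word in line]
    set2 = [word for line in lines[1::2] for word in line]
    w_for_set1 = em_weights(set2)  # tuned on set2, applied to set1
    w_for_set2 = em_weights(set1)  # tuned on set1, applied to set2
    return mixed_logprob(set1, w_for_set1) + mixed_logprob(set2, w_for_set2)
```

Because the weights used on each half never see that half, the
combined log probability is not inflated by tuning on the test data.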

Further, the -expt exptID option may be used to specify the
prefix used for all ancillary files created by the program. The
exptID may include a path, and any missing directories in this
path will be created.
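The prefix handling can be pictured with a short Python sketch. It
is an assumption about the behaviour described above, not the
program's own code, and the helper name prepare_prefix is
hypothetical.

```python
import os

def prepare_prefix(expt_id):
    """Create any missing directories in the prefix path and
    return the name of the logfile that uses this prefix."""
    directory = os.path.dirname(expt_id)
    if directory:
        # exist_ok avoids an error when the directory is already there
        os.makedirs(directory, exist_ok=True)
    return expt_id + ".log"
```

With a prefix of 001/mix, this would create the directory 001 if
needed and return 001/mix.log.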

Final output includes the ngram output for each separate set and
a combined output in the same format for both sets. A logfile of
the procedure is written to exptID.log.

Examples:

compute-mixed-logprob -expt 001/mix -text swbd.txt \
    -lm swbd.4bo.gz -lm bn.3bo.gz -lm ch.3bo.gz \
    -lm-flags "-order 4 -classes train400.classes"

compute-mixed-logprob -expt 001/mix \
    -sets swbd-set1.txt swbd-set2.txt \
    -lm swbd.4bo.gz -lm bn.3bo.gz -lm ch.3bo.gz \
    -lm-flags "-order 4 -classes train400.classes"
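
When the command is assembled from a script, the rule that all
ngram options travel as one quoted option is easy to get wrong. The
Python sketch below builds the argument list using only the flags
documented above; the helper name build_command and its parameters
are hypothetical, and the six-model check simply mirrors the limit
stated in the man page.

```python
import shlex

MAX_MODELS = 6  # limit stated in the man page

def build_command(expt_id, lms, text=None, sets=None, lm_flags=None):
    """Return the argument list for one compute-mixed-logprob run."""
    if not 1 <= len(lms) <= MAX_MODELS:
        raise ValueError("between one and six language models expected")
    if (text is None) == (sets is None):
        raise ValueError("give exactly one of text or sets")
    argv = ["compute-mixed-logprob", "-expt", expt_id]
    if text is not None:
        argv += ["-text", text]
    else:
        set1, set2 = sets
        argv += ["-sets", set1, set2]
    for lm in lms:
        argv += ["-lm", lm]
    if lm_flags:
        # all ngram options stay together as a single argument
        argv += ["-lm-flags", lm_flags]
    return argv

argv = build_command(
    "001/mix",
    ["swbd.4bo.gz", "bn.3bo.gz", "ch.3bo.gz"],
    text="swbd.txt",
    lm_flags="-order 4 -classes train400.classes",
)
print(shlex.join(argv))  # shell-quoted form of the command
```

Passing the list to subprocess.run keeps the -lm-flags value as one
argument without any manual quoting; shlex.join is only needed when
the command has to be shown or pasted into a shell.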
