SRILM 1.5.8 released
Andreas Stolcke
stolcke at speech.sri.com
Sun May 10 11:09:29 PDT 2009
The latest version of SRILM is now available from
http://www.speech.sri.com/projects/srilm/download.html .
A list of changes appears below.
Enjoy,
Andreas
1.5.8 10 May 2009
Functionality:
* merge-batch-counts -float-counts option for merging of fractional
counts.
* compare-sclite now includes statistical significance computation
based on a matched-pair Sign test.
* Added a Perl tool to compute the cumulative binomial distribution,
contributed by Brett Kessler and David Gelbart.
* Don't output LM server banner message for ngram -use-server -debug 0.
* The LM::generateSentence() function now takes an optional argument to
specify sentence prefix that is to be used to condition subsequent
word generation (suggested by Alexy Khrabrov). The default is to
condition on <s> as before, or an empty context if no start-of-sentence
tag is defined.
* A new ngram option, -gen-prefixes, reads conditioning prefixes
from a file and generates random sentences based on them.
* New nbest-optimize options that modify the -print-hyps output:
-print-unique-hyps includes only unique hypotheses, and -print-old-ranks
prints the original ranks of hypotheses (from Jing Zheng).
* The -version option reports whether support for compressed files
is available.
* Added merge-batch-counts -l option to control how many files to merge
in each iteration.
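The changelog describes the compare-sclite significance computation only as a matched-pair Sign test, and separately mentions a tool for the cumulative binomial distribution. The following is a minimal Python sketch of how those two pieces fit together, assuming per-utterance error counts for two systems are already available. The function names are hypothetical; this is not the compare-sclite or Perl tool code, just an illustration of the technique.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float = 0.5) -> float:
    """Cumulative binomial distribution: P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def sign_test(errors_a, errors_b):
    """Two-sided matched-pair sign test on paired per-utterance error counts.

    Hypothetical helper: counts the pairs where each system makes fewer
    errors, discards ties, and returns (n_a_better, n_b_better, p_value).
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have equal length")
    a_better = sum(1 for a, b in zip(errors_a, errors_b) if a < b)
    b_better = sum(1 for a, b in zip(errors_a, errors_b) if b < a)
    n = a_better + b_better
    if n == 0:
        # All pairs tied: no evidence either way.
        return a_better, b_better, 1.0
    # Under the null hypothesis each untied pair favors either system with
    # probability 0.5; double the smaller tail for a two-sided test, cap at 1.
    k = min(a_better, b_better)
    p_value = min(1.0, 2.0 * binomial_cdf(k, n, 0.5))
    return a_better, b_better, p_value
```

For example, if system A makes fewer errors on 7 utterances, system B on none, and one utterance is tied, the two-sided p-value is 2 * 0.5^7 = 0.015625.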
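To illustrate what conditioning random sentence generation on a prefix means (the new LM::generateSentence() argument and ngram -gen-prefixes), here is a minimal Python sketch using a hypothetical toy bigram table. BIGRAMS and generate_sentence are invented for illustration and are not SRILM's API; the sketch returns only the newly generated words, and it simplifies the default to always conditioning on <s>.

```python
import random

# Hypothetical toy bigram model: context word -> list of (next_word, prob).
BIGRAMS = {
    "<s>": [("the", 0.6), ("a", 0.4)],
    "the": [("cat", 0.5), ("dog", 0.5)],
    "a": [("cat", 1.0)],
    "cat": [("sat", 0.7), ("</s>", 0.3)],
    "dog": [("</s>", 1.0)],
    "sat": [("</s>", 1.0)],
}


def generate_sentence(model, prefix=None, max_words=20, rng=None):
    """Sample words one at a time, conditioned on the given prefix.

    With no prefix, generation is conditioned on <s>. Returns only the
    words generated after the prefix, stopping at </s> or max_words.
    """
    rng = rng or random.Random()
    history = list(prefix) if prefix else ["<s>"]
    words = []
    for _ in range(max_words):
        choices = model.get(history[-1])
        if not choices:
            break  # unknown context: nothing to sample from
        candidates = [w for w, _ in choices]
        weights = [p for _, p in choices]
        nxt = rng.choices(candidates, weights=weights)[0]
        if nxt == "</s>":
            break
        words.append(nxt)
        history.append(nxt)  # the new word becomes the next context
    return words
```

For instance, generate_sentence(BIGRAMS, prefix=["a"]) always starts with "cat", because that is the only successor of "a" in the toy table.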
Bug fixes:
* ngram-count, NgramLM: disable the Doug Paul smoothing hack (add one
to denominator when smoothing results in 0 backoff mass) in contexts
where the entire vocabulary has been observed.
* nbest-optimize fixes to the -minimum-bleu-reference functionality
(from Jing Zheng).
* Fixed nbest-optimize bug that was causing incorrect log output with
gcc 4.x.
* Output vocabulary index map in binary ngram count and LM format
in numerical index order. This avoids a performance bug whereby
reading the data structures back into the _c binary version could take
a long time due to inefficient insertion order.
* Fixed ngram -counts in combination with -use-server (from Ergun Bicici).
* Fixed memory allocation bug in FLM tag vocabulary handling that could
lead to a crash when interpolating several FLMs.
* Rewrote make-batch-counts scripts to
- avoid problems with limits on command line length
- support systems that don't have compressed file I/O.
* Modified merge-batch-counts script to
- ensure that unmerged files are always merged in the next iteration,
to avoid file size imbalance (suggested by Alex Marin)
- support systems that don't have compressed file I/O.
* Fixed a portability issue with Intel icc version 7.0.
* Fixed compute-sclite to invoke the csrfilt.sh script with the -t option.
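The merge-batch-counts change (leftover files are always merged in the next iteration) can be illustrated with a small scheduling simulation. This Python sketch is a hypothetical model, not the script's actual logic: merge_rounds and per_merge are invented names, and per_merge only loosely plays the role of the -l option, whose exact semantics may differ.

```python
def merge_rounds(files, per_merge=2):
    """Simulate iterative batch merging of count files (hypothetical model).

    Each round merges consecutive groups of `per_merge` files. A single
    leftover file has nothing to merge with in that round, so it is placed
    at the FRONT of the next round's queue, guaranteeing it is merged next
    time instead of repeatedly drifting to the end. Returns the list of
    rounds, each being the list of groups merged in that round.
    """
    if per_merge < 2:
        raise ValueError("per_merge must be at least 2")
    queue = list(files)
    rounds = []
    counter = 0
    while len(queue) > 1:
        groups = [queue[i:i + per_merge] for i in range(0, len(queue), per_merge)]
        leftover = groups.pop() if len(groups[-1]) == 1 else []
        outputs = []
        for _ in groups:
            counter += 1
            outputs.append(f"merged-{counter}")  # name of the merged output
        rounds.append(groups)
        queue = leftover + outputs
    return rounds
```

With five files and per_merge=2, file "e" is left over in the first round and merged first in the second. If leftovers were appended to the end of the queue instead, "e" would be skipped again and only merged at the very end with one large file, which is the kind of size imbalance the change avoids.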