[SRILM User List] SRILM 1.7.3 released
Victor Abrash
victor.abrash at sri.com
Tue Sep 17 15:34:06 PDT 2019
The latest version of SRILM, 1.7.3, is now available from
http://www.speech.sri.com/projects/srilm/download.html
A list of changes appears below.
Functionality:
* Added nbest-oov-counts script to generate OOV counts for nbest hypotheses.
* Added a simple mechanism for weight tying in nbest-rover control files. A system weight of = indicates that it should be tied to the previously listed system. This is useful for reducing the number of free parameters when searching for good system combinations (search-rover-combo).
* Added Map_noKey() and Map_noKeyP() for unsigned long long type, to enable use with size_t on Windows MSVC.
* Output from -version now includes compile-time options.
* Added option ngram -minbackoff to fix up models that have unnormalized probabilities or that are not smoothed.
* Added option ngram -unk-probs to override unknown word probabilities.
* Added nbest-optimize-args-from-rover-control script, convenient for extracting initialization parameters for nbest-optimize from an existing nbest-rover control file.
* Added ngram-count -text-has-weights-last option to allow text input with count values at ends of lines.
* Added nbest-rover -missing-nbest option to treat missing nbest lists as if an empty hypothesis (no words) had been output, rather than simply skipping that nbest list.
* Added nbest-lattice -time-penalty option, implementing a soft constraint on time stamps (when present) during confusion network building and alignment.
* Added nbest-lattice -average-times option, to average word times instead of picking the timing of the highest posterior hypothesis.
* Added nbest-lattice -suppress-vocab option to disallow certain words in posterior decoding.
* New script concat-sausages for chaining word confusion networks together.
* Added nbest-lattice -dump-lattice-alignments option to output mappings between sausage positions and alignment costs.
* Updated Android build for 64-bit development for armv8 using NDK r20 and clang. This almost certainly breaks the 32-bit build for armv7. The last known good 32-bit build is in common/Makefile.core.android.r11c, last built using NDK r11c. To use this, copy Makefile.core.android.r11c to Makefile.core.android. See doc/README.android.
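As a rough illustration of the weight-tying rule for nbest-rover control files listed above, the sketch below resolves a sequence of system weights in which = means "tied to the previously listed system". The function name and the flat list of weight tokens are hypothetical, chosen only for illustration; real control files carry more fields per line and are parsed by nbest-rover itself.

```python
def resolve_tied_weights(tokens):
    """Resolve a list of system-weight tokens (hypothetical helper).

    A token of "=" takes the weight of the previously listed system;
    any other token is parsed as a float and counts as a free parameter.
    Returns (weights, number_of_free_parameters).
    """
    weights = []
    n_free = 0
    for tok in tokens:
        if tok == "=":
            if not weights:
                raise ValueError("the first system has nothing to be tied to")
            weights.append(weights[-1])   # tied: copy the previous weight
        else:
            weights.append(float(tok))    # free parameter
            n_free += 1
    return weights, n_free
```

With four systems and two of them tied, a search over combinations would only need to vary two weights instead of four.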
Bug fixes:
* Added a new tool nbest-rover-helper that combines the functions of the combine-acoustic-scores and nbest-posteriors scripts, doing these computations in double precision and faster. nbest-rover now uses this tool (except when certain options like -nbest-backtrace are used).
* nbest-rover strips DOS end-of-line CR characters from the control file, so they no longer mess up the parsing of the file.
* Rationalized the way ties are broken when decoding word confusion networks. The word with the lowest internal index is now preferred (and the *DELETE* token always comes before all other words), unless the new nbest-lattice option -random-tie-break is given. The output order of alternative word hypotheses to sausage files is always by probability rank first, then by internal index.
* The reverse-ngram-counts script now replaces <s> with </s> and vice-versa, as required for training reverse-direction LMs, and consistent with reverse-text.
* Handle comment lines starting with '##' and empty lines in nbest-rover control files the same way as in File::getline(), i.e., ignore them.
* Fixed the syntax for the nbest-optimize -dynamic-random-series options (now starts with single dash, as described in man page).
* Don't let compute-best-mix complain about word mismatches if <unk> is involved.
* Cast input to isspace() to (unsigned char) to guarantee input is non-negative.
* Fixed memory management problems in MEModel.
* Work around a bug in zlib's gzprintf() printing of very long %s arguments; was causing long word strings not to be output into .gz files.
* Removed word string length limit.
* Removed limit on total line length in outputting ngram count files.
* Zlib updated to version 1.2.11.
* nbest-posteriors ensures that bytelog scores are output in fixed-point format.
* Allow floating point values when parsing bytelog scores in nbest lists.
* More robustness to word sausage input files that have missing data for some positions.
* Fixed a performance bug when nbest-rover is invoked with -output-ctm option.
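To make the reverse-ngram-counts fix above concrete, here is a minimal sketch of what reversing one count line involves. It assumes count lines of the form "w1 w2 ... wN<TAB>count"; the helper name is hypothetical and this is not the script's actual code.

```python
# Sentence-boundary tags trade places when the word order is reversed.
SWAP = {"<s>": "</s>", "</s>": "<s>"}

def reverse_count_line(line):
    """Reverse the words of one N-gram count line and swap <s> with </s>.

    Assumes the count is the last tab-separated field on the line.
    """
    words, count = line.rstrip("\n").rsplit("\t", 1)
    reversed_words = [SWAP.get(w, w) for w in reversed(words.split())]
    return " ".join(reversed_words) + "\t" + count
```

Without the swap, a sentence-initial N-gram such as "<s> hello world" would become "world hello <s>", leaving a start tag at the end of the reversed N-gram; with it, the result is "world hello </s>".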