From wen.wang at sri.com Thu Nov 10 17:32:08 2016
From: wen.wang at sri.com (Wen Wang)
Date: Thu, 10 Nov 2016 17:32:08 -0800
Subject: [Srilm-announce] [SRILM-Announce] SRILM 1.7.2 released
Message-ID: <7e785062-61d4-195d-fea8-83c6e71692ab@sri.com>

The latest version of SRILM, 1.7.2, is now available from
http://www.speech.sri.com/projects/srilm/download.html

A list of changes appears below.

Functionality:

* Added interfaces to Lattice and WordMesh that allow external programs
  to map sausage nodes to their original lattice nodes.
* New VocabDistance subclass StemDistance, comparing words based only on
  their stems.
* New lattice-tool option -stem-dist triggers StemDistance use in
  confusion network alignments, including -add-hyps and -add-refs
  processing.
* Added optional support for keyword spotting (in Lattice.h and
  LatticeIndex.cc) when writing a 1-gram index.
* Added new File field NBestOptions::nbestRttm2; if it exists, write
  (an approximation to) the NBestList2.0 format output.
* Added simple Trellis pruning based on relative thresholding of forward
  probabilities (Trellis::prune()).
* make-big-lm now understands the -ukndiscount option. The
  make-kn-discounts helper script has an option to compute unmodified
  KN discounts.
* The -version option now reports the compiler version used.
* Added ngram-count -write-text option to test conversion of UTF-16
  files to ASCII/UTF-8.
* Added ngram -text-has-weights option to allow weighting sentences in
  ppl computation.
* Added scripts nbest-words and compute-sclite-nbest for conveniently
  computing nbest-optimize -errors information using sclite.
* Added the nbest-optimize -xval-files option to support
  cross-validation.
* Added script search-rover-combo for searching for the best combination
  among a list of systems.
* Added confidence value fields to NBestWordInfo class.
* Added check to compute-best-mix to warn about word label mismatches
  between input files.

Portability:

* Honor TMPDIR environment variable in various scripts.
* Miscellaneous Mac OS X fixes.
* Include BSD rand48 functions so that random sentence generation gives
  the same result on all platforms.

Bug fixes:

* Avoid leaky backoff by mapping very small probability sums to 0 in
  BOW computation. Otherwise unseen ngrams may end up with nonzero
  probabilities in unsmoothed LMs.
* Fixed compare-ppls, compute-best-mix, compute-best-sentence-mix, and
  ppl-from-log to recognize the MSVC representation of -infinity.
* Fixed a bug in the handling of zero prefix probabilities in
  ClassNgram, HiddenNgram and HMMofNgrams.
* Fixed a memory allocation bug that caused the ngram-count-maxent test
  to crash.
* Fixes to lattice-tool rttm nbest output.
* Fix for possible endless loop in lattice-tool -posterior-prune due to
  limited float precision (from Seppo Enarvi).
* Fixed a problem with declarations of Map_nokeyP() that take reference
  arguments and were missing "const"; this was causing a crash in the
  segment tool.
* Workaround for what looks like an optimizer bug in gcc >= 4.9 that
  can cause ngram -prune to core dump.
* Output TextStats quantities (sentence/word counts, log probs,
  perplexities), model parameters, nbest and lattice scores, and other
  quantities with full precision so as to avoid loss of information.
* nbest-optimize -1best now outputs a rover-control file that simulates
  Viterbi decoding (by using a small posterior scale).
* nbest-optimize -errors now tolerates a varying number of reference
  words for the same sentence. This can arise from sclite references
  with alternate word strings.
* Fixed a stupid bug in uniform-classes.gawk script.
* Allow combine-rover-controls to merge control files with the same
  systems in them, adding their weights.
* Updated zlib to version 1.2.8. This fixes a bug whereby gzipped
  output files could end up with zero size (instead of a legal gzipped
  file that results in a zero-length file when decompressed).

Cheers,

Wen