SRILM 1.5.8 released

Sun May 10 11:09:29 PDT 2009

The latest version of SRILM is now available from
http://www.speech.sri.com/projects/srilm/download.html .

A list of changes appears below.

Enjoy,

Andreas

1.5.8   10 May 2009

        Functionality:

        * Added merge-batch-counts -float-counts option for merging
        fractional counts.

        * compare-sclite now includes statistical significance computation
        based on a matched-pair Sign test.

        * Added a Perl tool to compute the cumulative binomial distribution,
        contributed by Brett Kessler and David Gelbart.

        * Don't output LM server banner message for ngram -use-server -debug 0.

        * The LM::generateSentence() function now takes an optional argument
        specifying a sentence prefix that is used to condition subsequent
        word generation (suggested by Alexy Khrabrov).  The default is to
        condition on <s> as before, or on an empty context if no
        start-of-sentence tag is defined.

        * New option ngram -gen-prefixes reads conditioning prefixes
        from a file and generates random sentences based on them.

        * New nbest-optimize options that modify -print-hyps output to
        include only unique hypotheses (-print-unique-hyps) and to print
        the original ranks of hypotheses (-print-old-ranks) (from Jing Zheng).

        * The -version option reports whether support for compressed files
        is available.

        * Added merge-batch-counts -l option to control how many files to
        merge in each iteration.
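
The matched-pair Sign test and the cumulative binomial distribution mentioned
above are closely related: the test counts on how many paired segments each
system does better and asks how likely such a split is under a fair coin.
The following is a minimal Python sketch of that idea, not the code shipped
with SRILM. The function names, the per-segment error lists, and the choice
to drop ties and report a two-sided p-value are all assumptions made for
illustration.

```python
from math import comb


def binomial_cdf(k, n, p=0.5):
    """Cumulative binomial distribution: P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def sign_test(errors_a, errors_b):
    """Two-sided matched-pair sign test on paired error counts.

    errors_a[i] and errors_b[i] are the error counts of systems A and B
    on the same segment i (hypothetical input format).  Ties are dropped.
    """
    wins_a = sum(1 for a, b in zip(errors_a, errors_b) if a < b)
    wins_b = sum(1 for a, b in zip(errors_a, errors_b) if a > b)
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # no informative pairs, no evidence of a difference
    k = min(wins_a, wins_b)
    # Double the one-sided tail probability, capped at 1.
    return min(1.0, 2.0 * binomial_cdf(k, n))


if __name__ == "__main__":
    a = [1, 0, 2, 3, 1, 0]
    b = [2, 1, 3, 4, 2, 0]
    print(sign_test(a, b))
```

In this example system A has fewer errors on five segments and ties on one,
so the test is based on five informative pairs.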

        Bug fixes:

        * ngram-count, NgramLM: disable the Doug Paul smoothing hack (add one
        to denominator when smoothing results in 0 backoff mass) in contexts
        where the entire vocabulary has been observed.

        * nbest-optimize fixes to the -minimum-bleu-reference functionality
        (from Jing Zheng).

        * Fixed nbest-optimize bug that was causing incorrect log output with
        gcc 4.x.

        * Output vocabulary index map in binary ngram count and LM formats
        in numerical index order.  This avoids a performance bug whereby
        reading the data structures back into the _c binary version could
        take a long time due to inefficient insertion order.

        * Fix ngram -counts with -use-server (from Ergun Bicici).

        * Fixed memory allocation bug in FLM tag vocabulary handling that could
        lead to a crash when interpolating several FLMs.

        * Rewrote make-batch-counts scripts to
          - avoid problems with limits on command line length
          - support systems that don't have compressed file I/O.

        * Modified merge-batch-counts script to
          - ensure that unmerged files are always merged in the next iteration,
          to avoid file size imbalance (suggested by Alex Marin)
          - support systems that don't have compressed file I/O.

        * Fixed a portability issue with Intel icc version 7.0.

        * Fixed compute-sclite to invoke the csrfilt.sh script with the
        -t option.
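
The merge-batch-counts items above (a configurable number of files per
iteration, and unmerged files always being merged in the next iteration) can
be pictured with a small scheduling sketch. This is only an illustration in
Python under stated assumptions, not the actual script: the function name
plan_merges, the group_size parameter (standing in for the -l value), and the
generated output names are hypothetical.

```python
def plan_merges(files, group_size=2):
    """Return, per iteration, the groups of files that would be merged.

    A file left over without a partner is carried to the FRONT of the next
    iteration's list, so it is guaranteed to be merged there instead of
    being left over again.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    pending = list(files)
    plan = []
    iteration = 0
    while len(pending) > 1:
        iteration += 1
        groups = [pending[i:i + group_size]
                  for i in range(0, len(pending), group_size)]
        merged, carried, step = [], [], []
        for index, group in enumerate(groups):
            if len(group) == 1:
                carried.append(group[0])  # nothing to merge with yet
            else:
                step.append(group)
                # Hypothetical name for the merged output file.
                merged.append(f"merge-iter{iteration}-{index + 1}")
        pending = carried + merged  # leftovers go first next time
        plan.append(step)
    return plan


if __name__ == "__main__":
    for number, step in enumerate(plan_merges(["a", "b", "c", "d", "e"]), 1):
        print(number, step)
```

Placing carried-over files first means a small leftover file is paired with
another file in the very next pass, which is one way to keep merged file
sizes from drifting apart.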