[SRILM User List] nan in language model

Rico Sennrich rico.sennrich at gmx.ch
Mon Mar 12 06:10:00 PDT 2012


Hi list,

Occasionally, I get 'nan' as probability or backoff weight in LMs
trained with SRILM. This is not expected in an ARPA file and eventually
leads to crashes / undefined behaviour in other programs that use the
model.

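Here are some statistics: the \data\ section of the LM, followed by the
number of nan probabilities and nan backoff weights per n-gram order: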

\data\
ngram 1=2054819
ngram 2=40441708
ngram 3=187680929
ngram 4=382878635
ngram 5=519867931

probability nan:
1 0
2 0
3 0
4 0
5 1233183

backoff nan:
1 0
2 0
3 0
4 415865
5 0
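For anyone who wants to check their own models, per-order counts like the
ones above can be gathered with a small script. What follows is a minimal
sketch in Python, not an SRILM tool. It assumes the standard ARPA layout: a
\data\ header, then \N-grams: sections, where each entry is a log10
probability, N words, and an optional backoff weight. The function names
(count_nans, is_nan, open_lm) are hypothetical. It reads gzip-compressed
files such as hugelm.gz directly, and it treats any field that Python's
float() parses as NaN ("nan", "-nan", "NaN") as a nan value.

```python
import gzip
import math
import re
import sys
from collections import Counter

# Matches section headers such as "\3-grams:" and captures the order.
SECTION = re.compile(r"^\\(\d+)-grams:$")


def is_nan(field):
    """True if the field parses as a float NaN (e.g. 'nan', '-nan')."""
    try:
        return math.isnan(float(field))
    except ValueError:
        return False


def count_nans(lines):
    """Count nan probabilities and backoff weights per n-gram order."""
    prob_nan, backoff_nan = Counter(), Counter()
    order, max_order = None, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = SECTION.match(line)
        if m:
            order = int(m.group(1))
            max_order = max(max_order, order)
            continue
        if line.startswith("\\"):  # \data\ or \end\
            order = None
            continue
        if order is None:  # "ngram N=..." lines in the \data\ header
            continue
        fields = line.split()
        # fields[0] is the log10 probability; fields[1:order+1] are words.
        if is_nan(fields[0]):
            prob_nan[order] += 1
        # A backoff weight is present only if there is one extra field,
        # so a word that happens to be spelled "nan" is never counted.
        if len(fields) == order + 2 and is_nan(fields[-1]):
            backoff_nan[order] += 1
    return prob_nan, backoff_nan, max_order


def open_lm(path):
    """Open a plain or gzip-compressed ARPA file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, encoding="utf-8", errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_lm(sys.argv[1]) as f:
        prob_nan, backoff_nan, max_order = count_nans(f)
    print("probability nan:")
    for n in range(1, max_order + 1):
        print(n, prob_nan[n])
    print("backoff nan:")
    for n in range(1, max_order + 1):
        print(n, backoff_nan[n])
```

Invoked as, e.g., `python count_arpa_nans.py hugelm.gz` (hypothetical script
name), it streams the file line by line, so memory use stays small. On an LM
with over a billion n-grams it will still take a while.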


Here are the training commands and parameters:

make-batch-counts file-list.txt 10 cat /wrk/smt/tmp -order 5

make-big-lm -kndiscount -interpolate -order 5 -read \
tmp/file-list.txt-1.ngrams.gz -unk -lm hugelm.gz

This happened with SRILM 1.5.9 and 1.6.0-beta, and stderr didn't show
any errors/warnings.

best wishes,
Rico
