[SRILM User List] nan in language model
Rico Sennrich
rico.sennrich at gmx.ch
Mon Mar 12 06:10:00 PDT 2012
Hi list,
Occasionally, I get 'nan' as a probability or backoff weight in LMs
trained with SRILM. This is not expected in an ARPA file, and it eventually
leads to crashes or undefined behaviour in other programs that use the
model.
Here are some statistics:
\data\
ngram 1=2054819
ngram 2=40441708
ngram 3=187680929
ngram 4=382878635
ngram 5=519867931
N-grams with 'nan' values, by order:
order   probability   backoff weight
1       0             0
2       0             0
3       0             0
4       0             415865
5       1233183       0
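The post doesn't say how these counts were obtained. As a hypothetical sketch (not an SRILM tool), the Python script below scans an ARPA file, plain or gzipped, and tallies per order the n-grams whose log-probability or backoff weight parses as NaN. It assumes the standard ARPA layout, where each n-gram line holds the log10 probability, the N words, and an optional backoff weight, separated by whitespace. Python's float() accepts "nan" and "-nan" in any case, which covers the usual C printf spellings.

```python
import gzip
import math
import re
import sys
from collections import Counter

# Matches ARPA section headers such as "\3-grams:".
SECTION_RE = re.compile(r"^\\(\d+)-grams:$")


def is_nan(token):
    """True if token parses as a float NaN ("nan", "-nan", "NaN", ...)."""
    try:
        return math.isnan(float(token))
    except ValueError:
        return False


def open_lm(path):
    """Open a plain or gzip-compressed ARPA file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, encoding="utf-8", errors="replace")


def count_nans(lines):
    """Count n-grams with a NaN probability or backoff weight, per order.

    Returns (orders, prob_nan, bow_nan): orders lists the n-gram sections
    seen, and the two Counters map order -> number of NaN entries.
    """
    orders = []
    prob_nan = Counter()
    bow_nan = Counter()
    order = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = SECTION_RE.match(line)
        if m:
            order = int(m.group(1))
            orders.append(order)
            continue
        if line.startswith("\\"):
            # \data\ or \end\: we are outside any n-gram section.
            order = None
            continue
        if order is None:
            continue  # "ngram N=count" lines in the \data\ header
        fields = line.split()
        # Field layout: log10 prob, `order` words, optional backoff weight.
        if is_nan(fields[0]):
            prob_nan[order] += 1
        if len(fields) == order + 2 and is_nan(fields[-1]):
            bow_nan[order] += 1
    return orders, prob_nan, bow_nan


def main(path):
    with open_lm(path) as f:
        orders, prob_nan, bow_nan = count_nans(f)
    print("order  nan-prob  nan-backoff")
    for n in orders:
        print(f"{n:>5}  {prob_nan[n]:>8}  {bow_nan[n]:>11}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as e.g. `python count_arpa_nans.py hugelm.gz` (the script name is hypothetical). It makes a single streaming pass, so memory use stays constant, although a model of this size will take a while to scan.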
Here are the training commands:
make-batch-counts file-list.txt 10 cat /wrk/smt/tmp -order 5
make-big-lm -kndiscount -interpolate -order 5 -read \
tmp/file-list.txt-1.ngrams.gz -unk -lm hugelm.gz
This happened with SRILM 1.5.9 and 1.6.0-beta, and stderr didn't show
any errors/warnings.
best wishes,
Rico