[SRILM User List] LM whose counts are multiplied

shinichiro.hamada shinichiro.hamada at gmail.com
Wed Feb 1 07:01:52 PST 2012

Hello, all.

I want to build a language model from data that has fractional counts. Not
all smoothing methods can handle them, so I plan to multiply each count by 10
and round the result to an integer.
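
A minimal sketch of the scaling step described above. It assumes the count
file has one n-gram per line with the count as the last tab-separated field;
the function name scale_counts is hypothetical, not part of SRILM.

```shell
# scale_counts FACTOR < in.count > out.count
# Hypothetical helper: multiplies the last tab-separated field of each
# line by FACTOR and rounds it to the nearest integer.
# Assumption: one n-gram per line, in the form "w1 w2 ...<TAB>count".
scale_counts() {
    awk -F '\t' -v OFS='\t' -v k="$1" '
        NF >= 2 {
            # int() truncates, so adding 0.5 first rounds half up
            # (valid because counts are non-negative).
            $NF = int($NF * k + 0.5)
            print
        }'
}
```

For example, "scale_counts 10 < a.count > b.count" would produce the scaled
file. Note that with a factor of 10 any count below 0.05 rounds to 0.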

I did a preliminary experiment.

* count file with integer counts: a.count
* the same file with every count multiplied by 10: b.count

ngram-count -read a.count -order 3 -lm a.lm -wbdiscount -wbdiscount1 \
    -wbdiscount2 -wbdiscount3 -interpolate
ngram-count -read b.count -order 3 -lm b.lm -wbdiscount -wbdiscount1 \
    -wbdiscount2 -wbdiscount3 -interpolate

I expected the two language models to be identical, but they differ. Why?
The headers of the two models are shown below.

ngram 1=1055
ngram 2=2240
ngram 3=87

\1-grams: ..

ngram 1=1055
ngram 2=2240
ngram 3=2548

\1-grams: ..
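
One way to pull out and compare the header counts shown above is sketched
here. It assumes the model files list their sizes as "ngram N=COUNT" lines
before the "\1-grams:" section, as in the excerpts; the function names
arpa_header and compare_headers are hypothetical.

```shell
# arpa_header FILE
# Prints the "ngram N=COUNT" lines and stops at the "\1-grams:" section.
arpa_header() {
    awk '/^ngram [0-9]+=/ { print } /^\\1-grams:/ { exit }' "$1"
}

# compare_headers FILE1 FILE2
# Reports whether the two models have the same number of n-grams per
# order, and prints both headers when they do not.
compare_headers() {
    a=$(arpa_header "$1")
    b=$(arpa_header "$2")
    if [ "$a" = "$b" ]; then
        echo "same n-gram counts"
    else
        echo "--- $1"; echo "$a"
        echo "--- $2"; echo "$b"
    fi
}
```

Calling "compare_headers a.lm b.lm" would then show at which order the two
models diverge.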
