[SRILM User List] LM whose counts are multiplied
shinichiro.hamada
shinichiro.hamada at gmail.com
Wed Feb 1 07:01:52 PST 2012
Hello, all.
I want to build a language model from data that has fractional counts. Not
all smoothing methods can handle fractional counts, so I plan to multiply
each count by 10 and round it to an integer.
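A minimal sketch of the scaling step, written as a POSIX shell function around awk. It assumes (this is not stated in the post) that each count-file line holds the N-gram words followed by whitespace and the count as the last field. The function name `scale_counts` is hypothetical. Counts are rounded half up, so very small fractional counts can round to 0.

```shell
#!/bin/sh
# scale_counts FACTOR < in.count > out.count
# Hypothetical helper: multiply the last field (the count) of every line
# by FACTOR and round half up to an integer. The N-gram words and the
# original separators are left untouched; empty lines pass through.
scale_counts() {
  awk -v f="$1" '
    NF > 0 {
      n = int($NF * f + 0.5)     # round half up (counts assumed >= 0)
      sub(/[^ \t]+[ \t]*$/, n)   # replace only the last field
    }
    { print }
  '
}
```

For the experiment below, `scale_counts 10 < a.count > b.count` would produce the multiplied file.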
--
I ran a preliminary experiment.
Files:
* count file with integer counts: a.count
* the same counts, each multiplied by 10: b.count
Commands:
ngram-count -read a.count -order 3 -lm a.lm -wbdiscount -wbdiscount1 -wbdiscount2 -wbdiscount3 -interpolate
ngram-count -read b.count -order 3 -lm b.lm -wbdiscount -wbdiscount1 -wbdiscount2 -wbdiscount3 -interpolate
I expected both runs to produce the same language model, but the models
differ. Why? Here are the headers of the two files:
------------------------
[a.lm]
\data\
ngram 1=1055
ngram 2=2240
ngram 3=87
\1-grams: ..
------------------------
[b.lm]
\data\
ngram 1=1055
ngram 2=2240
ngram 3=2548
\1-grams: ..
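For reference, a minimal sketch of the textbook interpolated Witten-Bell weight lambda(h) = c(h) / (c(h) + T(h)), where c(h) is the total count of context h and T(h) is the number of distinct words seen after h. This is an illustration of the general formula only, not SRILM's implementation, and `wb_lambda` is a hypothetical helper. Multiplying every count by 10 multiplies c(h) by 10 but leaves T(h) unchanged, so the weight is not invariant under scaling.

```shell
#!/bin/sh
# wb_lambda COUNT...
# Hypothetical helper: given the counts of the distinct words that follow
# one context h, print the Witten-Bell weight c(h) / (c(h) + T(h)) placed
# on the maximum-likelihood estimate; T(h) / (c(h) + T(h)) is left for the
# lower-order distribution.
wb_lambda() {
  printf '%s\n' "$@" | awk '
    { c += $1; t++ }                        # accumulate c(h) and T(h)
    END { printf "%.4f\n", c / (c + t) }    # weight on the ML estimate
  '
}
```

For a toy context followed by three distinct words with counts 1, 1 and 2, `wb_lambda 1 1 2` prints 0.5714 (4/7), while the scaled counts give `wb_lambda 10 10 20` printing 0.9302 (40/43), so much less probability mass is left for the lower-order estimate.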