[SRILM User List] counts in ngram-count output
shinichiro.hamada
shinichiro.hamada at gmail.com
Thu Jul 19 16:47:19 PDT 2012
Hi, I have a question about whether my ngram-count output is correct.
I made a fractional word-count file with my own program and ran
ngram-count with Witten-Bell (wb) discounting. The header of the output
is below:
--------------------------
[4gram wb float-count]
ngram-count -read countfile_float -float-counts -order 4 -lm outfile \
-wbdiscount -wbdiscount1 -wbdiscount2 -wbdiscount3
ngram 1=780387
ngram 2=20321
ngram 3=2692
ngram 4=2622
..
--------------------------
I thought higher-order models always have larger counts than lower-order
ones, but that is not the case in the result above. Does this result
indicate that my word-count file has a bug?
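
One way to check the count file itself, independently of ngram-count, is to tally how many distinct n-gram entries it contains per order. The following is only a minimal sketch: it assumes each line holds the words of one n-gram followed by the count as the last whitespace-separated field, and the function name and sample data are hypothetical.

```python
from collections import Counter


def ngrams_per_order(lines):
    """Tally n-gram entries per order in count-file lines.

    Assumption: each non-empty line is "w1 w2 ... wn <count>", with the
    count as the last whitespace-separated field.
    """
    per_order = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        order = len(fields) - 1  # every field except the last is a word
        per_order[order] += 1
    return dict(sorted(per_order.items()))


# Hypothetical sample data, not taken from the real count file.
sample = [
    "a\t2.5",
    "b\t1.5",
    "a b\t1.25",
    "a b a\t0.5",
]
print(ngrams_per_order(sample))
```

For a real file, the same function could be called on an open file handle, e.g. `ngrams_per_order(open("countfile_float"))`, and the tallies compared with the `ngram N=` lines in the model header.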
----------------------------------------------------------------------
For further investigation, I made an integer word-count file by
scaling and truncating (I know this is an inappropriate approximation)
and ran ngram-count with other discounting methods. But in these
results, too, higher-order models do not always have more counts than
lower-order ones.
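
For reference, the scaling-and-truncating step could look roughly like the sketch below. The scale factor of 100, the function name, and the choice to drop entries whose scaled count truncates to zero are all assumptions made for illustration, not a description of the actual program that produced countfile_int.

```python
def scale_and_truncate(lines, scale=100):
    """Convert fractional counts to integer counts by scaling and truncating.

    Assumptions: each line is "w1 ... wn <count>" with the count last;
    `scale` is a hypothetical factor; entries that truncate to zero are
    dropped, which changes how many n-grams remain at each order.
    """
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        *words, count = fields
        scaled = int(float(count) * scale)  # int() truncates toward zero
        if scaled > 0:
            out.append(" ".join(words) + "\t" + str(scaled))
    return out


# Hypothetical sample data.
print(scale_and_truncate(["a b\t1.25", "a\t2.5", "c d e\t0.004"]))
```

Because small fractional counts can truncate to zero, the number of surviving entries per order depends on the scale factor chosen.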
--------------------------
[4gram none int-count]
ngram-count -read countfile_int -order 3 -lm outfile \
-gt1min 0 -gt1max 0 -gt2min 0 -gt2max 0 -gt3min 0 -gt3max 0
ngram 1=780387
ngram 2=871835
ngram 3=1310979
ngram 4=1038980
--------------------------
[4gram gt int-count]
ngram-count -read countfile_int -order 3 -lm outfile \
ngram 1=780387
ngram 2=871835
ngram 3=1170462
ngram 4=1038980
--------------------------
[4gram natural int-count]
ngram-count -read countfile_int -order 3 -lm outfile \
-ndiscount -ndiscount1 -ndiscount2 -ndiscount3
ngram 1=780387
ngram 2=871835
ngram 3=1170339
ngram 4=1038858
Any advice would help me very much. Thank you in advance.
--
Shinichiro Hamada