[SRILM User List] counts in ngram-count output

shinichiro.hamada shinichiro.hamada at gmail.com
Thu Jul 19 16:47:19 PDT 2012


Hi, I have a question about whether my ngram-count output is correct.

I generated a fractional word-count file with my own program and ran 
ngram-count with Witten-Bell (wb) discounting. The header of the 
output is below:

--------------------------
[4gram wb float-count]
ngram-count -read countfile_float -float-counts -order 4 -lm outfile \
 -wbdiscount -wbdiscount1 -wbdiscount2 -wbdiscount3

ngram 1=780387
ngram 2=20321
ngram 3=2692
ngram 4=2622
..
--------------------------

I thought higher-order models always have more n-grams than 
lower-order ones, but that is not the case in the result above. Does 
this indicate that my word-count file has a bug?
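
One way to check the count file itself is to tally its entries per 
order and compare the totals with the "ngram N=" lines in the header. 
The following is only a minimal sketch: it assumes each line of the 
count file holds the n-gram words followed by the count as the last 
whitespace-separated field, and the function name and sample lines 
are hypothetical, not taken from my actual data.

```python
from collections import Counter

def ngram_orders(lines):
    """Tally how many count-file entries exist for each n-gram order."""
    per_order = Counter()
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        # Assumption: the last field is the count, the rest are the words.
        per_order[len(tokens) - 1] += 1
    return dict(per_order)

# Hypothetical sample lines in the assumed "words<TAB>count" layout.
sample = [
    "a\t2.5\n",
    "b\t1.0\n",
    "a b\t1.5\n",
    "a b c\t0.5\n",
]

for order, n in sorted(ngram_orders(sample).items()):
    print(f"ngram {order}={n}")
```

If the per-order totals in the count file are much larger than the 
header values, the entries were presumably dropped while the model 
was estimated rather than missing from the file.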


----------------------------------------------------------------------
For further investigation, I made an integer word-count file by 
scaling and truncating the fractional counts (I know this is a crude 
approximation) and ran ngram-count with other discounting methods. 
But in these results, too, higher-order models do not always have 
more n-grams than lower-order ones.
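
The scaling and truncating step can be sketched as follows. This is 
an illustration only: the scale factor of 10 and the function name 
are hypothetical, not the values my program used. It shows why the 
approximation is crude: any fractional count below 1/scale becomes 0.

```python
def scale_and_truncate(count, scale=10):
    """Turn a fractional count into an integer by scaling, then truncating."""
    # scale=10 is a hypothetical choice for illustration.
    return int(count * scale)

# Hypothetical fractional counts.
for c in (2.5, 1.5, 0.04):
    print(c, "->", scale_and_truncate(c))
```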

--------------------------
[4gram none int-count]
ngram-count -read countfile_int -order 3 -lm outfile \
 -gt1min 0 -gt1max 0 -gt2min 0 -gt2max 0 -gt3min 0 -gt3max 0

ngram 1=780387
ngram 2=871835
ngram 3=1310979
ngram 4=1038980

--------------------------
[4gram gt int-count]
ngram-count -read countfile_int -order 3 -lm outfile

ngram 1=780387
ngram 2=871835
ngram 3=1170462
ngram 4=1038980

--------------------------
[4gram natural int-count]
ngram-count -read countfile_int -order 3 -lm outfile \
 -ndiscount -ndiscount1 -ndiscount2 -ndiscount3

ngram 1=780387
ngram 2=871835
ngram 3=1170339
ngram 4=1038858



Any advice would help me very much. Thank you in advance.

--
Shinichiro Hamada


