[SRILM User List] A problem with ngram-count with option "-text-has-weights"
tuzhaopeng
tuzhaopeng at ict.ac.cn
Mon Mar 22 23:33:29 PDT 2010
Hi People,
I ran into a problem when training a language model with the option "-text-has-weights".
The input text with fractional counts is shown below:
======================================
1 china_H today
1 on_H
1 smuggling_H scale
0.000283545 less_H or
1 under_H
1 's_H last year
0.202422 more_H
0.000283545 more_H
1 less_H more or
1 brought_H
1 crackdown_H the
1.41754e-05 smuggling_H large - scale
1 of_H
0.105263 less_H more or
0.0021736 brought_H more
1.02756e-05 brought_H less
0.202422 been_H
1 been_H
0.105263 been_H
0.0021736 been_H
=======================================
The fractional count and the sentence are separated by a space.
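To make the intended meaning of the weighted input concrete, here is a minimal Python sketch of how I understand the format: the first field of each line is a weight, and every N-gram in the rest of the line is counted with that weight. This is only an illustration under that assumption, not SRILM's actual implementation; the function name weighted_ngram_counts and the explicit <s>/</s> padding are my own choices.

```python
from collections import defaultdict


def weighted_ngram_counts(lines, order=3):
    """Accumulate fractional N-gram counts from 'weight word word ...' lines."""
    counts = defaultdict(float)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        weight = float(fields[0])  # first field is the fractional count
        tokens = ["<s>"] + fields[1:] + ["</s>"]  # assumed sentence padding
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += weight
    return counts


# Three lines taken from the sample input above.
sample = [
    "1 china_H today",
    "0.202422 more_H",
    "0.000283545 more_H",
]
counts = weighted_ngram_counts(sample)
print(counts[("more_H",)])
print(counts[("china_H", "today")])
```

With this reading, the two "more_H" lines contribute their weights additively to the same unigram, which is the behavior I expect from the real counts.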
When I use Kneser-Ney discounting, it fails. The command is:
./ngram-count -text-has-weights test -order 3 -lm test.o3.lm.gz -float-counts -unk -kndiscount
and the error message is:
error in discount estimator for order 1
I then searched the Internet for more information and found that, with the option "-float-counts", only certain discounting methods support non-integer counts (wbdiscount and cdiscount). So I switched to Witten-Bell discounting with the command:
./ngram-count -text-has-weights test -order 3 -lm test.o3.lm.gz -float-counts -unk -wbdiscount -debug 3
and the output is:
using WittenBell for 1-grams
using WittenBell for 2-grams
using WittenBell for 3-grams
warning: distributing 1 left-over probability mass over 2 zeroton words
writing 3 1-grams
writing 0 2-grams
writing 0 3-grams
Everything seems to go well; however, the LM file contains only:
\data\
ngram 1=3
ngram 2=0
ngram 3=0
\1-grams:
-0.30103 </s>
-99 <s>
-0.30103 <unk>
\2-grams:
\3-grams:
\end\
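For anyone who wants to check this kind of output automatically, the following is a small Python sketch that reads the \data\ header of an ARPA-format file and reports which N-gram orders are empty. It is only an illustration based on the header shown above; the function name arpa_ngram_counts is hypothetical and the header text is pasted in as a string rather than read from test.o3.lm.gz.

```python
import re


def arpa_ngram_counts(text):
    """Return {order: count} parsed from the \\data\\ header of an ARPA LM."""
    counts = {}
    in_header = False
    for line in text.splitlines():
        line = line.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if in_header:
            m = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if m:
                counts[int(m.group(1))] = int(m.group(2))
            elif line.startswith("\\"):
                break  # reached the first N-gram section, header is over
    return counts


# Header copied from the LM file shown above.
header = "\\data\\\nngram 1=3\nngram 2=0\nngram 3=0\n\n\\1-grams:\n"
ngram_counts = arpa_ngram_counts(header)
empty_orders = [n for n, c in sorted(ngram_counts.items()) if c == 0]
print(ngram_counts)
print(empty_orders)
```

For the file above, orders 2 and 3 are empty and the only unigrams are the three special tokens, so none of the words from the input text made it into the model.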
So what is the problem? Is there something wrong with the input file or the command line?
Thanks and Regards
Tu Zhaopeng
2010-03-23
---------------------------------------------------
Tu Zhaopeng
Institute of Computing Technology,
Chinese Academy of Sciences
http://nlp.ict.ac.cn/~tuzhaopeng/
---------------------------------------------------