[SRILM User List] A problem with ngram-count with option "-text-has-weights"

Andreas Stolcke stolcke at speech.sri.com
Tue Mar 23 12:21:41 PDT 2010

On 3/22/2010 11:33 PM, tuzhaopeng wrote:
> Hi people,
> I ran into a problem when training a language model with the option
> "-text-has-weights".

> I then looked for more information on the Internet and found that,
> for the option "-float-counts", only certain discounting methods
> support non-integer counts (wbdiscount and cdiscount). So I used
> Witten-Bell discounting with the command:
> ./ngram-count -text-has-weights test -order 3 -lm test.o3.lm.gz -float-counts -unk -wbdiscount -debug 3

There are two problems here:

1) You forgot the -text option before your filename. -text-has-weights
is a switch that does not itself take an argument, so the filename must
be introduced by -text (i.e. "-text test -text-has-weights").
2) With fractional counts, the default minimum counts for retaining
ngrams in the LM still apply. To ensure that all your ngrams end up in
the model, you might want to add these options:

         -gt1min 0 -gt2min 0 -gt3min 0

FYI, the default values are:

         -gt1min 1 -gt2min 1 -gt3min 2
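
Putting both fixes together, the full command would look roughly like the sketch below. It assumes ngram-count is on your PATH and that "test" is the weighted training file from your original command; the NGRAM_ARGS variable is only there for illustration, and every option is one already mentioned in this thread.

```shell
#!/bin/sh
# Sketch only: the original command with -text added in front of the
# filename and the minimum-count cutoffs lowered to 0.
# NGRAM_ARGS is an illustrative variable name, not part of SRILM.
NGRAM_ARGS="-text test -text-has-weights -order 3 -float-counts -unk -wbdiscount -gt1min 0 -gt2min 0 -gt3min 0 -lm test.o3.lm.gz -debug 3"

# Show the command that would be run
echo "ngram-count $NGRAM_ARGS"

# Run it only if the binary is installed and the input file exists
if command -v ngram-count >/dev/null 2>&1 && [ -f test ]; then
    # Unquoted on purpose: the string is split into separate arguments
    ngram-count $NGRAM_ARGS
fi
```

With the cutoffs at 0, the "writing ... 2-grams" and "writing ... 3-grams" lines in the debug output should no longer report 0, provided the input actually contains such ngrams.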


> and the output information is:
> using WittenBell for 1-grams
> using WittenBell for 2-grams
> using WittenBell for 3-grams
> warning: distributing 1 left-over probability mass over 2 zeroton words
> writing 3 1-grams
> writing 0 2-grams
> writing 0 3-grams
