[SRILM User List] A problem with ngram-count with option "-text-has-weights"
Andreas Stolcke
stolcke at speech.sri.com
Tue Mar 23 12:21:41 PDT 2010
On 3/22/2010 11:33 PM, tuzhaopeng wrote:
> Hi People,
> I ran into a problem when training a language model with the option
> "-text-has-weights".
>
> Then I went to look for more information on the Internet, and found that
> for the option "-float-counts", only certain discounting methods
> support non-integer counts (wbdiscount and cdiscount).
Correct.
> So I used -wbdiscount with the command:
> ./ngram-count -text-has-weights test -order 3 -lm test.o3.lm.gz -float-counts -unk -wbdiscount -debug 3
There are two problems here:
1) You forgot the -text option before your filename. -text-has-weights
is a switch that does not itself take an argument.
2) With fractional counts the default minimum counts for retaining
ngrams in the LM still apply. So you might want to add these options to
ensure that all your ngrams end up in the model:
-gt1min 0 -gt2min 0 -gt3min 0
FYI, the default values are:
-gt1min 1 -gt2min 1 -gt3min 2
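Putting both points together, the call would look roughly like the sketch below. The two-line contents of the file "test" and the NGRAM_OPTS variable are made up for illustration; the options themselves are the ones from this thread. The sample lines assume the -text-has-weights input format, where the first field on each line is the weight applied to that line's counts.

```shell
# Hypothetical training file: first field = weight, rest = the sentence.
printf '%s\n' '0.5 a b c' '1.5 a b d' > test

# -text names the input file; -text-has-weights is a bare switch.
# The -gtNmin 0 settings keep ngrams with small fractional counts.
NGRAM_OPTS="-text test -text-has-weights -float-counts -order 3 -unk -wbdiscount"
NGRAM_OPTS="$NGRAM_OPTS -gt1min 0 -gt2min 0 -gt3min 0 -lm test.o3.lm.gz -debug 3"

# Run only if SRILM's ngram-count is on PATH; otherwise just show the command.
if command -v ngram-count >/dev/null 2>&1; then
    # Intentionally unquoted so the option string splits into words.
    ngram-count $NGRAM_OPTS
else
    echo "ngram-count $NGRAM_OPTS"
fi
```

With the minimum counts set to 0, the "writing N 2-grams" and "writing N 3-grams" lines in the debug output should no longer report zero for a non-empty input.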
Andreas
> and the output information is:
> using WittenBell for 1-grams
> using WittenBell for 2-grams
> using WittenBell for 3-grams
> warning: distributing 1 left-over probability mass over 2 zeroton words
> writing 3 1-grams
> writing 0 2-grams
> writing 0 3-grams
>