Where have all the 3-grams gone?
Andreas Stolcke
stolcke at speech.sri.com
Tue Mar 18 14:43:42 PST 2003
In message <Pine.LNX.4.44.0303182241510.28027-100000 at linux14.phonetik.uni-muenchen.de> you wrote:
> Hi Andreas,
>
> experimenting a little with SRILM, I found that ngram-count does not enter
> trigrams into the language model, that occur only once, while it does so
> with bigrams. The command
>
> echo "the man hit the ball" | ngram-count -order 3 -text - -cdiscount3 0.5
> -cdiscount2 0.5 -cdiscount1 0.5 -unk -lm test_C3gram.lm
The default minimum counts are as follows:
1grams 1
2grams 1
3grams 2
4grams 2
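Every trigram in your example occurs only once, which is below the
trigram minimum of 2, so none of them are included. The bigrams also
occur once each, but that meets their minimum of 1.
You can use the -gt1min, -gt2min, etc. options to change these thresholds
at will. (Perhaps counter-intuitively, these options apply to all smoothing
schemes.)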
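
To make the effect of the thresholds concrete, here is a small Python
sketch. It is not SRILM code, only an illustration under some
assumptions: the helper names (count_ngrams, apply_min_counts,
kept_per_order) are made up for this example, the sentence is padded
with <s> and </s> by hand, and an n-gram is simply dropped when its
count is below the minimum for its order. No discounting or backoff is
modeled.

```python
from collections import Counter

# Default minimum counts as listed above (n-gram order -> threshold).
DEFAULT_MIN_COUNTS = {1: 1, 2: 1, 3: 2, 4: 2}


def count_ngrams(tokens, max_order):
    """Count all n-grams of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def apply_min_counts(counts, min_counts):
    """Keep only n-grams whose count reaches the minimum for their order."""
    return {
        ngram: c
        for ngram, c in counts.items()
        if c >= min_counts.get(len(ngram), 1)
    }


def kept_per_order(kept, max_order):
    """Number of distinct surviving n-grams for each order."""
    return {
        n: sum(1 for ngram in kept if len(ngram) == n)
        for n in range(1, max_order + 1)
    }


# Assumption: sentence boundary tags are added by hand for illustration.
tokens = "<s> the man hit the ball </s>".split()
counts = count_ngrams(tokens, 3)

# With the default thresholds, every singleton trigram is dropped.
default_kept = apply_min_counts(counts, DEFAULT_MIN_COUNTS)

# Lowering the trigram minimum to 1 keeps them.
lowered_kept = apply_min_counts(counts, {**DEFAULT_MIN_COUNTS, 3: 1})

print(kept_per_order(default_kept, 3))
print(kept_per_order(lowered_kept, 3))
```

The second call plays the role of lowering the trigram threshold to 1,
which on the ngram-count command line would be the trigram counterpart
of the options named above (-gt3min).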
--Andreas