SRILM help needed
Andreas Stolcke
stolcke at speech.sri.com
Sat Apr 6 14:18:58 PST 2002
Zhu,
the default smoothing algorithm in ngram-count is Good-Turing.
The default parameters (as displayed by ngram-count -help) are:
-gt1min: lower 1gram discounting cutoff
Default value: 1
-gt1max: upper 1gram discounting cutoff
Default value: 1
-gt2min: lower 2gram discounting cutoff
Default value: 1
-gt2max: upper 2gram discounting cutoff
Default value: 7
-gt3min: lower 3gram discounting cutoff
Default value: 2
-gt3max: upper 3gram discounting cutoff
Default value: 7
-gt4min: lower 4gram discounting cutoff
Default value: 2
-gt4max: upper 4gram discounting cutoff
Default value: 7
-gt5min: lower 5gram discounting cutoff
Default value: 2
-gt5max: upper 5gram discounting cutoff
Default value: 7
-gt6min: lower 6gram discounting cutoff
Default value: 2
-gt6max: upper 6gram discounting cutoff
Default value: 7
So all unigrams and bigrams are kept (minimum count 1), but singleton
n-grams of order 3 and higher are discarded (minimum count 2), which is a
pretty standard choice.
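To make the two kinds of cut-offs concrete, here is a minimal Python sketch.
It is not SRILM code: the function names and toy data are made up for
illustration, and it assumes the textbook Katz form of Good-Turing
discounting, so ngram-count may differ in details and edge cases. The lower
cut-off (-gtNmin) decides which n-grams are kept at all. The upper cut-off
(-gtNmax) is the largest count that still gets discounted; higher counts are
left as they are.

```python
from collections import Counter

# Lower cut-offs from the defaults listed above (-gtNmin), keyed by order N.
DEFAULT_GTMIN = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2}


def count_ngrams(tokens, max_order=3):
    """Count all n-grams of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def apply_min_cutoffs(counts, gtmin=DEFAULT_GTMIN):
    """Keep only n-grams whose count reaches the lower cut-off for their order."""
    return {ng: c for ng, c in counts.items() if c >= gtmin[len(ng)]}


def katz_discounts(count_of_counts, gtmax):
    """Return {r: d_r} for 1 <= r <= gtmax (Katz-style Good-Turing).

    count_of_counts[r] is the number of distinct n-grams seen exactly r times.
    Counts above gtmax are not discounted (their factor is implicitly 1).
    """
    n1 = count_of_counts[1]
    a = (gtmax + 1) * count_of_counts[gtmax + 1] / n1
    discounts = {}
    for r in range(1, gtmax + 1):
        # Good-Turing adjusted count for r.
        r_star = (r + 1) * count_of_counts[r + 1] / count_of_counts[r]
        discounts[r] = (r_star / r - a) / (1 - a)
    return discounts


# Toy corpus: the trigram "a b c" occurs twice, all other trigrams once.
counts = count_ngrams("a b c a b c d".split())
kept = apply_min_cutoffs(counts)
dropped = sorted(ng for ng in counts if ng not in kept)
print(dropped)  # only the singleton trigrams are dropped

# Hypothetical count-of-counts table, with gtmax=3 to keep it short.
toy_count_of_counts = {1: 100, 2: 40, 3: 20, 4: 12}
discounts = katz_discounts(toy_count_of_counts, gtmax=3)
print(discounts)
```

With the default lower cut-offs every unigram and bigram in the toy corpus
survives, and only the trigram seen twice is kept.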
I'm not sure I understand your question about hidden-ngram.
It doesn't use any "cut-offs". Cut-offs apply in N-gram model
training; hidden-ngram only uses the model as it is produced by
ngram-count (or some other program).
--Andreas
PS. Your message to srilm-user didn't make it to the list because you are
not a subscriber. As a way to control junk mail, only subscribers can post
to the list. To join, send a message containing "subscribe srilm-user"
to majordomo at speech.sri.com.
------- Forwarded Message
Date: Thu, 4 Apr 2002 20:55:13 -0500 (EST)
From: Zhu Zhang <zhuzhang at umich.edu>
X-X-Sender: zhuzhang at mspacman.gpcc.itd.umich.edu
To: srilm-user at speech.sri.com
Subject: SRILM help needed
Message-ID: <Pine.SOL.4.44.0204042045410.16911-100000 at mspacman.gpcc.itd.umich.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Content-Length: 313
Hi,
Could anybody provide the following info about SRILM, which doesn't seem
to be very clear from the documentation:
- What is the default smoothing algorithm for ngram-count?
- What are the smoothing parameters?
- In hidden-ngram, what are the event cut-off frequencies?
Thanks in advance for any help!
------- End of Forwarded Message