SRILM help needed

Andreas Stolcke stolcke at speech.sri.com
Sat Apr 6 14:18:58 PST 2002


Zhu,

the default smoothing algorithm in ngram-count is Good-Turing.
The default parameters (as displayed by ngram-count -help) are:

 -gt1min:       lower 1gram discounting cutoff
                Default value: 1
 -gt1max:       upper 1gram discounting cutoff
                Default value: 1
 -gt2min:       lower 2gram discounting cutoff
                Default value: 1
 -gt2max:       upper 2gram discounting cutoff
                Default value: 7
 -gt3min:       lower 3gram discounting cutoff
                Default value: 2
 -gt3max:       upper 3gram discounting cutoff
                Default value: 7
 -gt4min:       lower 4gram discounting cutoff
                Default value: 2
 -gt4max:       upper 4gram discounting cutoff
                Default value: 7
 -gt5min:       lower 5gram discounting cutoff
                Default value: 2
 -gt5max:       upper 5gram discounting cutoff
                Default value: 7
 -gt6min:       lower 6gram discounting cutoff
                Default value: 2
 -gt6max:       upper 6gram discounting cutoff
                Default value: 7

So all unigrams and bigrams are kept, but singleton N-grams of higher orders
are discarded (which is a pretty standard choice).
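
To make the effect of the cutoffs concrete, here is a small Python sketch.
It is not SRILM code. The function names and the toy counts are made up for
illustration, and it assumes the usual reading of the options: an N-gram is
kept if its count is at least gtNmin, and Good-Turing discounting is applied
only to counts up to gtNmax.

```python
# Illustrative sketch only; not part of SRILM.
# Default cutoffs for orders 1-6, as listed in the help output above.
GT_MIN = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2}
GT_MAX = {1: 1, 2: 7, 3: 7, 4: 7, 5: 7, 6: 7}


def is_kept(ngram, count, gt_min=GT_MIN):
    """Assumed rule: keep an N-gram if its count reaches the lower cutoff."""
    return count >= gt_min[len(ngram)]


def is_discounted(ngram, count, gt_max=GT_MAX):
    """Assumed rule: counts above the upper cutoff are left undiscounted."""
    return count <= gt_max[len(ngram)]


def apply_min_cutoffs(counts, gt_min=GT_MIN):
    """Return only the N-grams that survive the lower cutoffs."""
    return {ng: c for ng, c in counts.items() if is_kept(ng, c, gt_min)}


# Hypothetical counts, keyed by word tuples.
toy_counts = {
    ("the",): 1,
    ("the", "cat"): 1,
    ("the", "cat", "sat"): 1,   # singleton trigram: dropped by gt3min = 2
    ("the", "cat", "ran"): 2,
}

kept = apply_min_cutoffs(toy_counts)
```

With the defaults, the singleton unigram and bigram survive, while the
singleton trigram does not.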

I'm not sure I understand your question about hidden-ngram.
It doesn't use any "cut-offs".  Cut-offs apply in N-gram model
training; hidden-ngram only uses the model as it is produced by
ngram-count (or some other program).

--Andreas

PS.  Your message to srilm-user didn't make it to the list because you are
not a subscriber.  As a way to control junk mail, only subscribers can post
to the list.  To join, send a message containing "subscribe srilm-user"
to majordomo at speech.sri.com.

------- Forwarded Message

Date: Thu, 4 Apr 2002 20:55:13 -0500 (EST)
From: Zhu Zhang <zhuzhang at umich.edu>
X-X-Sender: zhuzhang at mspacman.gpcc.itd.umich.edu
To: srilm-user at speech.sri.com
Subject: SRILM help needed
Message-ID: <Pine.SOL.4.44.0204042045410.16911-100000 at mspacman.gpcc.itd.umich.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Content-Length: 313


Hi,

Could anybody provide the following info about SRILM, which doesn't seem
to be very clear from the documentation:

-  What is the default smoothing algorithm for ngram-count?
-  What are the smoothing parameters?
-  In hidden-ngram, what are the event cut-off frequencies?

Thanks in advance for any help!

------- End of Forwarded Message



