SRILM 1.4

Andreas Stolcke stolcke at speech.sri.com
Thu Mar 4 13:35:58 PST 2004


> 
> > This would be one solution.  Use ngram-count -read
> > and then ngram -counts.  Just reorder the words in the N-grams to
> > reflect the backoff order you want.
> > 
> 
> So how exactly would I reorder them supposing I wanted to do the backoff 
> as I explained earlier?  Can you just give a concrete example of 
> reordering them...?

This works only if each backoff level drops exactly one of the history
elements.  So if you want to back off

	p(a|b,c,d) -> p(a|b,c) -> p(a|c)

you are dropping history words in the order 3 (d, the farthest), then
1 (b, the nearest), then 2 (c).
To achieve this, extract N-grams (d c b a) from your data and prepare a count
file with

	d b c a	<count>
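
For example, assuming the extracted 4-gram counts sit in a file with one
N-gram per line in the form "d c b a <count>", swapping the two middle
columns gives the reordered counts.  A simple awk call would do (the file
names are just placeholders, and this handles only the full-length 4-grams,
not the shorter N-grams at sentence starts):

	awk '{print $1, $3, $2, $4 "\t" $5}' counts.dcba > counts.dbca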
	
For training (ngram-count) you also need to generate the lower-order counts,
i.e.

	b c a	<count>
	c a	<count>
	a	<count>
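
One way to generate these (again ignoring the shorter N-grams at sentence
starts) is to sum the reordered 4-gram counts over the history word that gets
dropped at each level, and then feed all the counts to ngram-count together.
Something along these lines should work; the file names and the choice of
Witten-Bell discounting are only illustrative:

	awk '{c[$2" "$3" "$4] += $5} END {for (k in c) print k "\t" c[k]}' counts.dbca > counts.bca
	awk '{c[$3" "$4] += $5} END {for (k in c) print k "\t" c[k]}' counts.dbca > counts.ca
	awk '{c[$4] += $5} END {for (k in c) print k "\t" c[k]}' counts.dbca > counts.a
	cat counts.dbca counts.bca counts.ca counts.a > counts.all
	ngram-count -order 4 -read counts.all -lm reordered.lm -wbdiscount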

For testing (ngram -counts) you only need the highest-order counts
(except at the start of a sentence, where the length of the N-grams is
limited by the <s> tag).
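
Concretely, if the counts extracted from your test data (in the same
reordered d b c a order) are in a file, something like the following should
score them against the model; -debug 2 just prints per-N-gram details, and
the file names are again placeholders:

	ngram -order 4 -lm reordered.lm -counts test.counts.dbca -debug 2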

--Andreas