SRILM 1.4
Andreas Stolcke
stolcke at speech.sri.com
Thu Mar 4 13:35:58 PST 2004
>
> > This would be one solution. Use ngram-count -read
> > and then ngram -counts. Just reorder the words in the N-grams to
> > reflect the
> > backoff order you want.
> >
>
> So how exactly would I reorder them supposing I wanted to do the backoff
> as I explained earlier? Can you just give a concrete example of
> reordering them...?
This works only if each backoff level drops exactly one of the history
elements. So if you want to back off
p(a|b,c,d) -> p(a|b,c) -> p(a|c)
you are dropping history words in the order 3 (d, the farthest), then
1 (b, the nearest), then 2 (c).
To achieve this, extract the N-grams (d c b a) from your data and prepare a
count file with
d b c a <count>
For training (ngram-count) you also need to generate the lower-order counts,
i.e.
b c a <count>
c a <count>
a <count>
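
In case a concrete script helps, here is a small Python sketch of that
reordering step. The script name reorder_counts.py, its -train switch, and
the assumed input layout (four words plus a count per line, whitespace
separated) are made up for illustration and are not part of SRILM:

#!/usr/bin/env python
# reorder_counts.py -- a hypothetical helper script, not part of SRILM.
#
# Reads 4-gram counts in text order ("d c b a <count>", one per line),
# reorders the history so that the word to be dropped first comes
# leftmost ("d b c a"), and with -train also accumulates the
# lower-order counts (b c a, c a, a) needed for ngram-count -read.

import sys
from collections import defaultdict

def reorder(d, c, b):
    """Map the text-order history (d, c, b) to backoff order (d, b, c).

    In text order d is farthest from the predicted word and b is
    nearest; we want to drop d first, then b, then c, so c ends up
    next to the predicted word in the reordered N-gram."""
    return (d, b, c)

def main(emit_lower_orders):
    counts = defaultdict(int)
    for line in sys.stdin:
        fields = line.split()
        if len(fields) != 5:          # expect exactly 4 words + a count
            continue
        d, c, b, a = fields[:4]
        count = int(fields[4])
        ngram = reorder(d, c, b) + (a,)
        counts[ngram] += count
        if emit_lower_orders:
            for i in range(1, 4):     # suffixes (b c a), (c a), (a)
                counts[ngram[i:]] += count
    for ngram in sorted(counts):
        sys.stdout.write("%s\t%d\n" % (" ".join(ngram), counts[ngram]))

if __name__ == "__main__":
    main(emit_lower_orders=("-train" in sys.argv[1:]))
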
For testing (ngram -counts) you only need the highest-order counts
(except at the start of a sentence, where the length of the N-grams is
limited by the <s> tag).
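(Hypothetically, using the sketch above: run it with -train on the counts
extracted from the training data before passing them to ngram-count -read,
and run it without -train on the test counts before handing them to
ngram -counts.)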
--Andreas