Ask about the practical usage of SRILM for Machine Translation

Thu Aug 3 10:34:23 PDT 2006

In message <44D221E8.4050406 at idiap.ch> you wrote:
> Andreas Stolcke wrote:
> > In message <44D1E6CD.8070606 at idiap.ch> you wrote:
> >   
> >> Hi every one
> >>
> >> This question is for SRILM - 1.4.1
> >>     
> >
> > Before anything else, please get the latest version (1.5.0) and see if 
> > it solves your problems.
> >
> > --Andreas 
> >
> >   
> Thank you Andreas,
> My 3rd question will be checked once I run on 1.5.0, but these 2 
> questions are version-independent:
> 
> 1. Which combination of the options currently available in ngram-count 
> is the state of the art, and which should I use?

-kndiscount -interpolate
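
As a minimal sketch of how these two options fit into a full command line:
the Python helper below only assembles the argument list for ngram-count.
The helper name, the file names train.txt and train.kn.lm, and the trigram
order are assumptions made for illustration; -order, -text and -lm are the
usual ngram-count options for the ngram order, the training text and the
output model.

```python
import shlex


def ngram_count_command(train_text, lm_out, order=3):
    """Build an ngram-count argument list using interpolated Kneser-Ney.

    train_text, lm_out and order are illustrative parameters, not values
    taken from this thread.
    """
    return [
        "ngram-count",
        "-order", str(order),   # maximum ngram order (assumed: trigram)
        "-text", train_text,    # training corpus, one sentence per line
        "-kndiscount",          # Kneser-Ney discounting
        "-interpolate",         # interpolate with lower-order estimates
        "-lm", lm_out,          # where to write the estimated model
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(ngram_count_command("train.txt", "train.kn.lm")))
```

The list can be passed to subprocess.run() once SRILM is on your PATH.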

> 2. How many words per parameter should I use? (Joshua Goodman, in his 
> tutorial research.microsoft.com/~joshuago/lm-tutorial-v7-handouts.ps, 
> recommends that the ratio of the number of words to the number of 
> parameters be greater than 100 or 1000.)

I'm not sure I agree with Josh's rule if it means reducing the 
size of the model simply based on the total number of ngrams in it.

By reducing the number of parameters (pruning ngrams from the model,
or using a higher minimum count) you are not improving the estimates
of the parameters that remain.  So this is different from other types
of models where there is a set of parameters that is shared among all the 
data.  If you can afford it, you should use all the ngrams in your data in
your model.  When in doubt, try different settings on held-out data
and "cross-validate" your choices.
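
A rough sketch of such a held-out comparison, under stated assumptions:
the helper names, the file names, and the choice to vary the trigram
minimum count with -gt3min are all illustrative. The parser assumes that
"ngram -ppl" reports perplexity in a "ppl= <value>" field; check this
against the output of your SRILM version.

```python
import re


def candidate_commands(train_text, heldout_text, min_counts=(1, 2, 3), order=3):
    """Yield (min_count, train_cmd, eval_cmd) for each candidate setting.

    Each setting trains one model with a different minimum count for the
    highest-order ngrams, then scores it on the held-out text.
    """
    for count in min_counts:
        lm = f"lm.min{count}.lm"  # hypothetical output file name
        train_cmd = [
            "ngram-count", "-order", str(order), "-text", train_text,
            "-kndiscount", "-interpolate",
            f"-gt{order}min", str(count),  # minimum count for this order
            "-lm", lm,
        ]
        eval_cmd = ["ngram", "-order", str(order), "-lm", lm,
                    "-ppl", heldout_text]
        yield count, train_cmd, eval_cmd


def parse_ppl(ngram_output):
    """Extract the perplexity from 'ngram -ppl' output (assumed format)."""
    match = re.search(r"\bppl=\s*([0-9.eE+-]+)", ngram_output)
    if match is None:
        raise ValueError("no 'ppl=' field found in ngram output")
    return float(match.group(1))


def best_setting(ppl_by_setting):
    """Return the setting with the lowest held-out perplexity."""
    return min(ppl_by_setting, key=ppl_by_setting.get)
```

Run each pair of commands (for example with subprocess.run), collect the
parsed perplexities in a dict keyed by setting, and keep the setting that
best_setting() returns.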

If you are using class-based models then you do share parameters between
different ngrams and then a rule of the sort Josh suggested makes sense.

--Andreas