Interpolation vs ngram-merge

Ilya Oparin ioparin at yahoo.co.uk
Mon Jun 11 12:11:36 PDT 2007


I have experience with training LMs on huge data
(hundreds of millions of word forms). If this is the case
for you, it can actually be more efficient (or even the
only feasible option) to interpolate separately trained
LMs rather than to join the counts and train a single
model, due to the time and memory costs.
Moreover, interpolation lets you give the models
different weights and tune those weights on perplexity
over some test data, if the "target speech" for
recognition is already known.
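
To make the interpolation path concrete, here is a rough
sketch using the standard SRILM tools (the file names are
placeholders, and I assume trigram models with default
smoothing; adjust as needed):

    # train a separate LM on each corpus
    ngram-count -order 3 -text corpus1.txt -lm lm1.arpa
    ngram-count -order 3 -text corpus2.txt -lm lm2.arpa

    # score held-out data with each LM; -debug 2 produces
    # the per-word output that compute-best-mix expects
    ngram -order 3 -lm lm1.arpa -ppl heldout.txt -debug 2 > lm1.ppl
    ngram -order 3 -lm lm2.arpa -ppl heldout.txt -debug 2 > lm2.ppl

    # estimate the interpolation weights that minimize
    # perplexity on the held-out data
    compute-best-mix lm1.ppl lm2.ppl

    # build a statically interpolated model; -lambda is the
    # weight of the first (-lm) model
    ngram -order 3 -lm lm1.arpa -mix-lm lm2.arpa -lambda 0.6 \
          -write-lm mixed.arpa

The weight 0.6 above is just an example; plug in whatever
compute-best-mix reports.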

--- Marco Turchi <marco.turchi at gmail.com> wrote:

> Dear experts,
> I have a question for you.
> I have two datasets, and I want to construct a LM
> that covers both of them.
> SRILM offers me two different paths:
> 1) create two different LMs and then interpolate
> them;
> 2) count the n-grams for each dataset, merge these
> counts using ngram-merge, and at the end build the
> final LM (sketched below).
> What are the differences between these methods?
> Can you suggest a paper or book where I can
> understand these differences?
> 
> Thanks a lot,
> Marco
> 
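
For completeness, path (2) above would look roughly like
this (again a sketch with placeholder file names; note
that ngram-merge expects its input counts to be sorted,
hence the -sort flag):

    # write sorted raw counts for each corpus
    ngram-count -order 3 -sort -text corpus1.txt -write counts1
    ngram-count -order 3 -sort -text corpus2.txt -write counts2

    # merge the sorted count files
    ngram-merge -write merged.counts counts1 counts2

    # estimate a single LM from the merged counts
    ngram-count -order 3 -read merged.counts -lm merged.arpa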


Best regards,
Ilya




