[SRILM User List] Generate new model with existed model and text
Andreas Stolcke
stolcke at icsi.berkeley.edu
Wed Aug 9 13:55:51 PDT 2017
On 8/2/2017 12:41 AM, 徐 wrote:
> Hi,
> I trained an LM, then my boss gave me some text and told me to
> strengthen the probability of the n-grams in that text. What I have been
> doing is generating counts from the text, merging them with the old
> n-gram counts, and then retraining the model. Is there some command or
> method to do this faster?
Combining the counts of your main training data with those from the
adaptation data is one approach. There is no shortcut for this: you
have to actually combine the counts (which you can do by just cat'ing
the two files together), then train a new model.
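To illustrate why simply concatenating the count files is enough: the approach depends on repeated entries for the same N-gram being added up when the counts are read back in. The Python sketch below mimics that summation on a few in-memory lines. It is only an illustration, not part of SRILM; the function name merge_counts and the sample counts are made up, and it assumes a plain-text count format with one N-gram per line followed by whitespace and an integer count.

```python
from collections import Counter


def merge_counts(*count_listings):
    """Sum counts for identical N-grams across several count listings.

    Each line is assumed to look like "w1 w2 ... wn<TAB>count".
    """
    totals = Counter()
    for lines in count_listings:
        for line in lines:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            # Split off the trailing count; the rest is the N-gram itself.
            ngram, count = line.rsplit(None, 1)
            totals[ngram] += int(count)
    return totals


# Hypothetical counts from the base corpus and from the adaptation text.
base_counts = ["the cat\t10", "the dog\t4"]
adapt_counts = ["the cat\t2", "the bird\t1"]

merged = merge_counts(base_counts, adapt_counts)
for ngram, count in sorted(merged.items()):
    print(f"{ngram}\t{count}")
```

N-grams that occur in both sources end up with the sum of their counts, and N-grams seen only in the adaptation text are simply added, which is what the retrained model is then estimated from.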
The other approach is to train a separate model on the adaptation data,
then interpolate that model with the base model. This is usually more
convenient because (1) you process the training data for the base model
only once and (2) you can control the influence of the adaptation data
by changing the interpolation weights of the two models.
To interpolate two ngram models use
    ngram -order N -lm BASEMODEL -mix-lm NEWMODEL -lambda WEIGHT -write-lm ADAPTEDMODEL
WEIGHT is the weight of the BASEMODEL, typically something close to 1,
like 0.9, assuming the adaptation data is small compared to the main
training corpus.
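To get a feel for what the weight does, here is a small Python sketch of linear interpolation for a single word probability, assuming the adapted probability is WEIGHT times the base model's probability plus (1 - WEIGHT) times the adaptation model's probability. The function name and the probability values are invented for illustration, and the sketch ignores everything else involved in writing out a merged model (such as recomputing backoff weights).

```python
def interpolate(p_base, p_new, weight):
    """Linearly interpolate two probabilities for the same word and history.

    weight is the weight of the base model; the adaptation model
    receives the remaining 1 - weight.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * p_base + (1.0 - weight) * p_new


# Hypothetical probabilities of one word under each model: rare in the
# base model, frequent in the adaptation text.
p_base = 0.001
p_new = 0.05

for weight in (0.95, 0.9, 0.5):
    print(weight, interpolate(p_base, p_new, weight))
```

Even with a base weight of 0.9, a word that is much more likely in the adaptation text gets a noticeably higher probability than in the base model alone, while words the adaptation data says little about stay close to their base estimates.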
For a comparison of the two LM adaptation approaches and more background
see http://www.sciencedirect.com/science/article/pii/S0167639303001055 .
Make sure you are not adapting on the test data that you use to get a
realistic performance estimate. Otherwise your result will be overly
optimistic and your boss will be disappointed later ;-)
Andreas