[SRILM User List] Generate new model with existed model and text

Wed Aug 9 13:55:51 PDT 2017

On 8/2/2017 12:41 AM, 徐 wrote:
> Hi,
>     I trained an LM, and then my boss gave me some text and told me to
> strengthen the probability of the ngrams in that text. What I have been
> doing is generating counts from the text, merging them with the old
> ngram counts, and then retraining the model. Is there a command or
> method to do this faster?

Combining the counts of your main training data with those from the 
adaptation data is one approach.  There is no shortcut for this: you 
have to actually combine the counts (which you can do by just cat'ing 
the two files together), then train a new model.
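
To illustrate why plain concatenation is enough, here is a minimal Python sketch of what happens when a concatenated count file is read and repeated ngrams have their counts added. It is only an illustration, not SRILM's code: the helper read_counts and the toy count lines are made up for this example, and the "ngram, whitespace, integer count" line layout is assumed.

```python
from collections import Counter


def read_counts(lines):
    """Parse 'w1 ... wN <count>' lines into a Counter keyed by the ngram tuple.

    Hypothetical helper for illustration; repeated ngrams are summed.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        *words, count = fields
        counts[tuple(words)] += int(count)
    return counts


# Toy count "files" (made-up data): base corpus and adaptation text.
base_counts = ["the cat\t5", "cat sat\t2"]
adapt_counts = ["the cat\t1", "sat down\t3"]

# Concatenating the two lists mimics cat'ing the two count files.
merged = read_counts(base_counts + adapt_counts)

for ngram, count in sorted(merged.items()):
    print(" ".join(ngram), count, sep="\t")
```

The ngram seen in both inputs ends up with the sum of its two counts, and the new model is then estimated from those merged counts.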

The other approach is to train a separate model on the adaptation data, 
then interpolate that model with the base model.  This is usually more 
convenient because (1) you process the training data for the base model 
only once and (2) you can control the influence of the adaptation data 
by changing the interpolation weight of the two models.

To interpolate two ngram models use

    ngram -order N -lm BASEMODEL -mix-lm NEWMODEL -lambda WEIGHT -write-lm ADAPTEDMODEL

WEIGHT is the weight of the BASEMODEL, typically something close to 1, 
like 0.9, assuming the adaptation data is small compared to the main 
training corpus.
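
The effect of WEIGHT can be seen in a small worked example. The Python sketch below applies the basic linear interpolation formula, WEIGHT * P_base + (1 - WEIGHT) * P_new, to a single word probability. The function name and the two probabilities are invented for illustration, and the sketch ignores how a merged backoff model is actually built and renormalized.

```python
def interpolate(p_base, p_new, weight):
    """Linear interpolation of two probabilities.

    weight is the weight of the base model; the new (adaptation)
    model receives the remaining 1 - weight.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * p_base + (1.0 - weight) * p_new


# Made-up probabilities of the same word in the same context:
p_base = 0.001  # rare according to the base model
p_new = 0.05    # frequent in the adaptation text

for weight in (1.0, 0.9, 0.5):
    print(f"weight={weight:.1f}  p={interpolate(p_base, p_new, weight):.4f}")
```

With a weight of 0.9 the adapted probability is roughly 0.0059, about six times the base value, so even a small weight on the adaptation model noticeably boosts ngrams that are frequent in the new text.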

For a comparison of the two LM adaptation approaches and more background 
see http://www.sciencedirect.com/science/article/pii/S0167639303001055 .

Make sure you are not adapting on the test data that you use to get a 
realistic performance estimate.  Otherwise your results will be overly 
optimistic and your boss will be disappointed later ;-)

Andreas
