<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 8/2/2017 12:41 AM, 徐 wrote:<br>
</div>
<blockquote type="cite"
cite="mid:3de486a8.951e.15da1e45479.Coremail.xulikui123321@163.com">
<div
style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">
<div>Hi,</div>
<div> I trained an LM. My boss then gave me some texts and
asked me to strengthen the probability of the ngrams in those
texts. What I usually do is generate counts from the new text,
merge them with the old ngram counts, and then retrain a model.
Is there a command or method to do this faster?</div>
</div>
</blockquote>
<br>
Combining the counts of your main training data with those from the
adaptation data is one approach. There is no shortcut for this:
you have to actually combine the counts (which you can do by just
cat'ing the two files together), then train a new model.<br>
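Concretely, the count-merging workflow can be sketched as follows (filenames are illustrative, and this assumes SRILM's ngram-count is on your PATH):

```shell
# Generate ngram counts for each corpus (hypothetical filenames)
ngram-count -order 3 -text base.txt  -write base.counts
ngram-count -order 3 -text adapt.txt -write adapt.counts

# Concatenate the count files; duplicate ngram entries are
# summed when the counts are read back in
cat base.counts adapt.counts > merged.counts

# Retrain a model from the merged counts
ngram-count -order 3 -read merged.counts -lm merged.lm
```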
<br>
The other approach is to train a separate model on the adaptation
data, then interpolate that model with the base model. This is
usually more convenient because (1) you process the training data
for the base model only once and (2) you can control the influence
of the adaptation data by adjusting the interpolation weight.<br>
<br>
To interpolate two ngram models use<br>
<br>
ngram -order N -lm BASEMODEL -mix-lm NEWMODEL
-lambda WEIGHT -write-lm ADAPTEDMODEL<br>
<br>
WEIGHT is the weight of the BASEMODEL, typically something close to
1, like 0.9, assuming the adaptation data is small compared to the
main training corpus.<br>
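For intuition, static interpolation combines the two models' probabilities linearly. A minimal sketch of the arithmetic, with made-up probability values:

```shell
# For each ngram, the interpolated model assigns
#   P_adapted(w|h) = WEIGHT * P_base(w|h) + (1 - WEIGHT) * P_new(w|h)
# Example with WEIGHT=0.9, P_base=0.2, P_new=0.6:
awk 'BEGIN { w = 0.9; printf "%.2f\n", w*0.2 + (1-w)*0.6 }'   # prints 0.24
```

So even with a small weight of 0.1, the adaptation model can noticeably raise the probability of ngrams that are frequent in the adaptation data.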
<br>
For a comparison of the two LM adaptation approaches and more
background see
<a class="moz-txt-link-freetext" href="http://www.sciencedirect.com/science/article/pii/S0167639303001055">http://www.sciencedirect.com/science/article/pii/S0167639303001055</a> .<br>
<br>
Make sure you are not adapting on the test data that you use to get
a realistic performance estimate. Otherwise your result will be
overly optimistic and your boss will be disappointed later ;-)<br>
<br>
Andreas<br>
<br>
</body>
</html>