[SRILM User List] Interpolate trigram Probabilities to an n-gram LM
Andreas Stolcke
stolcke at icsi.berkeley.edu
Tue Sep 24 12:04:00 PDT 2013
On 9/23/2013 1:33 PM, Md. Akmal Haidar wrote:
> Hi,
>
> 1. Is it possible to interpolate some trigram probabilities (say they
> are in file t.txt) with an n-gram LM ?
> SRILM gives results with the warning (no bow for prefix of trigram of
> t.txt).
> -lm n-gram.lm -lambda .9 -mix-lm t.txt -ppl test.txt
> 2. When the trigram probabilities in t.txt change (newt.txt), the
> results are exactly the same as above.
> -lm n-gram.lm -lambda .9 -mix-lm newt.txt -ppl test.txt
>
> Is the above interpolation OK? Are there any other steps that are
> required to interpolate these trigram probabilities with an n-gram LM?
The above would be fine if newt.txt contained a well-formed LM. The
format you generated is incomplete.
As implied by the warning message, for each trigram "a b c" you also
need the history portion ("a b") to be included as a bigram.
Therefore, you should include a line
    -99 a b 0
in a \2-grams: section for every such history (plus the appropriate
ngram count information in the header). You also need a unigram
section containing all words of your vocabulary, with lines like
    -99 a 0
(the final 0's are the log backoff weights).
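
For illustration only, here is a minimal Python sketch of the completion
step described above. It is not part of SRILM; the function name
complete_arpa and its parameters are hypothetical. It assumes the
trigrams are already available as (log10 probability, words) pairs, takes
the vocabulary from the trigrams themselves unless extra words are passed
in, and writes placeholder -99 entries with backoff weight 0 for every
word and every trigram history.

```python
def complete_arpa(trigrams, extra_vocab=(), floor=-99.0):
    """Build ARPA-format text from a trigram-only list.

    trigrams: list of (log10_prob, (w1, w2, w3)) pairs.
    extra_vocab: additional vocabulary words that appear in no trigram.
    floor: placeholder log10 probability for the added unigrams/bigrams.
    """
    # Every word needs a unigram entry.
    vocab = set(extra_vocab)
    for _, words in trigrams:
        vocab.update(words)
    unigrams = sorted(vocab)

    # Every trigram history "a b" needs a bigram entry.
    bigrams = sorted({words[:2] for _, words in trigrams})

    lines = [
        "",
        "\\data\\",
        f"ngram 1={len(unigrams)}",
        f"ngram 2={len(bigrams)}",
        f"ngram 3={len(trigrams)}",
        "",
        "\\1-grams:",
    ]
    # Placeholder probability, then the word, then a log backoff weight of 0.
    lines += [f"{floor:g}\t{w}\t0" for w in unigrams]
    lines += ["", "\\2-grams:"]
    lines += [f"{floor:g}\t{a} {b}\t0" for a, b in bigrams]
    lines += ["", "\\3-grams:"]
    # Highest-order n-grams carry no backoff weight.
    lines += [f"{p}\t{' '.join(ws)}" for p, ws in trigrams]
    lines += ["", "\\end\\"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [(-0.5, ("a", "b", "c")), (-1.0, ("a", "b", "d"))]
    print(complete_arpa(demo), end="")
```

The -99 placeholders are only a starting point; the advice below on
non-zero estimates and renormalization still applies.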
Now, giving 0 (log = -99) probabilities to all your unigrams and bigrams
is suboptimal because there will be cases where you don't have a
matching trigram and then the backoff will result in probability 0.
This is not the end of the world since you presumably are interpolating
with another model that will yield a non-zero probability, but it would
be better to estimate a non-zero probability for those unigrams and
bigrams. If you do, then run the resulting model through
    ngram -lm newt.txt -renorm -write-lm newt-norm.txt
to recompute the backoff weights. Finally, interpolate.
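
To spell out why a zero from the trigram file is not fatal under
interpolation, here is the usual linear-interpolation formula. The
subscripts "main" and "mix" are notation introduced here for the -lm and
-mix-lm models; lambda is the -lambda value, i.e. the weight of the -lm
model.

```latex
p(w \mid h) = \lambda \, p_{\text{main}}(w \mid h)
            + (1 - \lambda) \, p_{\text{mix}}(w \mid h)

% With \lambda = 0.9 and p_{\text{mix}}(w \mid h) = 0:
p(w \mid h) = 0.9 \, p_{\text{main}}(w \mid h)
```

So the result stays non-zero whenever the main model assigns a non-zero
probability, but the 0.1 of weight given to the second model contributes
nothing in those cases.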
Andreas
>
> Format of t.txt/newt.txt
> \data\
> ngram 3=242
> \3-grams:
> ....
> \end\
>
> Thanks
> Best Regards
> Akmal