<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">You are correct: -renorm recomputes only the
backoff weights, assuming the probabilities for each history sum to &lt;=
1.</div>
<div class="moz-cite-prefix">There is no option to rescale the ngram
probabilities themselves.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">However, you are already doing your own
processing to transfer the NN outputs to the ngram model format.
It would be trivial to add a normalization step that sums them up
(for each history), and rescales them if the sum is > 1.</div>
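<div class="moz-cite-prefix">A minimal Python sketch of such a
normalization step, assuming the NN outputs have already been collected
as linear-domain probabilities keyed by (history, word). The function
name rescale_histories and the dictionary layout are hypothetical and
not part of SRILM. ARPA files store log10 probabilities, so the values
would still need converting (for example with math.log10) when the file
is written.</div>

```python
from collections import defaultdict


def rescale_histories(ngrams):
    """Rescale probabilities per history when their sum exceeds 1.

    ngrams: dict mapping (history, word) -> linear probability, where
    history is a tuple of context words (hypothetical layout).
    Histories whose probabilities already sum to 1 or less are unchanged.
    """
    # First pass: total probability mass assigned under each history.
    totals = defaultdict(float)
    for (history, _word), p in ngrams.items():
        totals[history] += p

    # Second pass: divide by the total only where it is greater than 1.
    rescaled = {}
    for (history, word), p in ngrams.items():
        total = totals[history]
        rescaled[(history, word)] = p / total if total > 1.0 else p
    return rescaled


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    example = {
        (("the",), "cat"): 0.6,
        (("the",), "dog"): 0.6,
        (("a",), "cat"): 0.3,
    }
    for key, p in sorted(rescale_histories(example).items()):
        print(key, p)
```

<div class="moz-cite-prefix">After a step like this, running ngram
with -renorm on the resulting file recomputes the backoff weights.</div>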
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">The more serious question is, how much
probability mass should you allocate to unseen ngrams? If the NN
estimates probabilities that sum to 1 you have a normalized model,
but not a very good one because it doesn't anticipate ever seeing
a word that you haven't already seen in that context. So you
should find a way to estimate the "unseen word" probability in
your framework, and then include that in your normalization step.</div>
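<div class="moz-cite-prefix">One way this could look, as a sketch only:
the explicit probabilities for each history are scaled down so they sum
to at most 1 minus a reserved "unseen word" mass. How that mass is
estimated is left open here; the unseen_mass callback, the function
name, and the 0.1 in the demo are placeholders, not recommended
values.</div>

```python
from collections import defaultdict


def normalize_with_unseen_mass(ngrams, unseen_mass):
    """Cap each history's explicit mass at 1 - unseen_mass(history).

    ngrams: dict mapping (history, word) -> linear probability
    (hypothetical layout, history is a tuple of context words).
    unseen_mass: callable returning the mass to reserve for words not
    listed under that history (hypothetical; must be in [0, 1)).
    """
    totals = defaultdict(float)
    for (history, _word), p in ngrams.items():
        totals[history] += p

    result = {}
    for (history, word), p in ngrams.items():
        reserved = unseen_mass(history)
        if reserved >= 1.0 or 0.0 > reserved:
            raise ValueError(f"unseen mass for {history!r} must be in [0, 1)")
        budget = 1.0 - reserved  # mass available to explicitly listed words
        total = totals[history]
        # Scale down only when the explicit probabilities exceed the budget.
        scale = budget / total if total > budget else 1.0
        result[(history, word)] = p * scale
    return result


if __name__ == "__main__":
    # Made-up numbers; a constant 0.1 reserve is purely illustrative.
    example = {
        (("the",), "cat"): 0.6,
        (("the",), "dog"): 0.6,
        (("a",), "cat"): 0.3,
    }
    normalized = normalize_with_unseen_mass(example, lambda history: 0.1)
    for key, p in sorted(normalized.items()):
        print(key, p)
```

<div class="moz-cite-prefix">In a backoff model, the mass left over for
each history is what the backoff weight distributes over the lower-order
estimates, which is the part -renorm takes care of.</div>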
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Andreas<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 8/24/2019 2:31 PM, Van der Merwe, W,
Mnr [<a class="moz-txt-link-abbreviated" href="mailto:20076223@sun.ac.za">20076223@sun.ac.za</a>] wrote:<br>
</div>
<blockquote type="cite"
cite="mid:VI1PR07MB58542DF06EA09900F9E442DE8BA70@VI1PR07MB5854.eurprd07.prod.outlook.com">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<div style="margin: 0px; font-size: 12pt; font-family: Calibri,
Arial, Helvetica, sans-serif; background-color: rgb(255, 255,
255)">
Hi,</div>
<div style="margin: 0px; font-size: 12pt; font-family: Calibri,
Arial, Helvetica, sans-serif; background-color: rgb(255, 255,
255)">
<br>
</div>
<div style="margin: 0px; font-size: 12pt; font-family: Calibri,
Arial, Helvetica, sans-serif; background-color: rgb(255, 255,
255)">
I am a student at Stellenbosch University currently using the
SRILM toolkit for one of my projects. I would like to know if
the toolkit is able to renormalize the probabilities, given an
ARPA file, so that they sum to 1. I've read the documentation
and am aware of the -renorm option; however, I am
not seeking to renormalize backoff weights, only the
probabilities.</div>
<div style="margin: 0px; font-size: 12pt; font-family: Calibri,
Arial, Helvetica, sans-serif; background-color: rgb(255, 255,
255)">
<br>
</div>
<div style="margin: 0px; font-size: 12pt; font-family: Calibri,
Arial, Helvetica, sans-serif; background-color: rgb(255, 255,
255)">
The reason I ask this is that I am writing an ARPA file
myself, taking probabilities produced by a neural network.
Because these probabilities are estimated by a neural net,
they tend not to sum to exactly 1. I am hoping that SRILM
can correct this. Otherwise, I will have to write a script to
brute force it.</div>
<div style="margin: 0px; font-size: 12pt; font-family: Calibri,
Arial, Helvetica, sans-serif; background-color: rgb(255, 255,
255)">
<br>
</div>
<div style="margin: 0px; font-size: 12pt; font-family: Calibri,
Arial, Helvetica, sans-serif; background-color: rgb(255, 255,
255)">
Werner</div>
<br>
</div>
<pre class="moz-quote-pre" wrap="">_______________________________________________
SRILM-User site list
<a class="moz-txt-link-abbreviated" href="mailto:SRILM-User@speech.sri.com">SRILM-User@speech.sri.com</a>
<a class="moz-txt-link-freetext" href="http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user">http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user</a></pre>
</blockquote>
</body>
</html>