[SRILM User List] [External Sender] Renormalising probabilities to 1

Andreas Stolcke stolcke at icsi.berkeley.edu
Sat Aug 24 17:01:11 PDT 2019


You are correct, -renorm normalizes the model assuming the probabilities 
for each history sum up to <= 1.
There is no option to rescale the ngram probabilities themselves.

However, you are already doing your own processing to transfer the NN 
outputs to the ngram model format. It would be trivial to add a 
normalization step that sums them up (for each history), and rescales 
them if the sum is > 1.
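
A minimal sketch of such a normalization step, in Python. It assumes the NN outputs have already been collected into a dict that maps each history (a tuple of words) to a dict of word -> linear probability. The names rescale_if_over and probs are hypothetical and not part of SRILM. ARPA files store log10 probabilities, so the values would still need converting before being written out.

```python
import math

def rescale_if_over(probs, limit=1.0):
    """Return a copy of probs in which every history sums to at most `limit`.

    probs: {history_tuple: {word: linear_probability}}  (assumed layout)
    Histories whose probabilities already sum to <= limit are left unchanged.
    """
    out = {}
    for history, dist in probs.items():
        total = math.fsum(dist.values())
        # Only scale down; never inflate a history that sums to less than limit.
        scale = limit / total if total > limit else 1.0
        out[history] = {word: p * scale for word, p in dist.items()}
    return out

# Toy data: the first history over-sums (1.1), the second does not (0.7).
probs = {
    ("the",): {"cat": 0.6, "dog": 0.5},
    ("a",): {"cat": 0.3, "dog": 0.4},
}
fixed = rescale_if_over(probs)
```

Only histories whose sum exceeds the limit are touched, which matches the "rescale if the sum is > 1" rule described above.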

The more serious question is, how much probability mass should you 
allocate to unseen ngrams?  If the NN estimates probabilities that sum 
to 1 you have a normalized model, but not a very good one because it 
doesn't anticipate ever seeing a word that you haven't already seen in 
that context.  So you should find a way to estimate the "unseen word" 
probability in your framework, and then include that in your 
normalization step.
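
One way to include an "unseen word" mass in the normalization is sketched below in Python. It assumes a value unseen_mass has already been chosen for each history. How to estimate that value is the open question raised above, and the 0.1 used here is purely illustrative. The function names are hypothetical and not part of SRILM.

```python
import math

def normalize_with_unseen(dist, unseen_mass):
    """Scale the seen-word probabilities of one history to sum to 1 - unseen_mass.

    dist: {word: linear_probability} for a single history (assumed layout)
    unseen_mass: probability reserved for words not listed in dist
    """
    if not 0.0 <= unseen_mass < 1.0:
        raise ValueError("unseen_mass must be in [0, 1)")
    total = math.fsum(dist.values())
    if total <= 0.0:
        raise ValueError("history has no probability mass to rescale")
    scale = (1.0 - unseen_mass) / total
    return {word: p * scale for word, p in dist.items()}

def to_log10(dist):
    """Convert linear probabilities to the log10 values used in ARPA files."""
    return {word: math.log10(p) for word, p in dist.items() if p > 0.0}

seen = normalize_with_unseen({"cat": 0.6, "dog": 0.5}, unseen_mass=0.1)
arpa_logprobs = to_log10(seen)
```

Unlike a rescale that only applies when the sum exceeds 1, this version always rescales, so that every history leaves exactly unseen_mass for the backoff weights to distribute.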

Andreas

On 8/24/2019 2:31 PM, Van der Merwe, W, Mnr [20076223 at sun.ac.za] wrote:
> Hi,
>
> I am a student at Stellenbosch University currently using the SRILM 
> toolkit for one of my projects. I would like to know if the toolkit is 
> able to renormalize the probabilities, given an ARPA file, so that 
> they sum to 1. I've read the documentation and am aware of the -renorm 
> parameter option, however, I am not seeking to renormalize backoff 
> weights, only the probabilities.
>
> The reason I ask this is that I am writing an ARPA file myself, taking 
> probabilities produced by a neural network. Because these 
> probabilities are estimated by a neural net, they tend not to sum to 
> 1 perfectly. I am hoping that SRILM can correct this. Otherwise, I 
> will have to write a script to brute force it.
>
> Werner
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://mailman.speech.sri.com/cgi-bin/mailman/listinfo/srilm-user
