The softmax function takes as input a real $n$-vector $x$ and returns the vector $g(x)$ with elements given by

$$g_j(x) = \frac{e^{x_j}}{\sum_{i=1}^n e^{x_i}}, \quad j = 1\colon n. \qquad (1)$$
It arises in machine learning, game theory, and statistics. Since $g_j(x) \in (0,1)$ and $\sum_{j=1}^n g_j(x) = 1$, the softmax function is often used to convert a vector $x$ into a vector of probabilities, with the more positive entries giving the larger probabilities.
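To make the definition concrete, here is a minimal Python sketch (the post itself uses MATLAB; the function name `softmax_naive` is mine) that evaluates softmax straight from the definition:

```python
import math

def softmax_naive(x):
    # Evaluate g_j(x) = e^{x_j} / sum_i e^{x_i} directly from the definition.
    # Fine for moderate inputs; overflow for large components is discussed below.
    e = [math.exp(xj) for xj in x]
    s = sum(e)
    return [ej / s for ej in e]

g = softmax_naive([-1, 0, 1])
print(g)       # each component lies in (0, 1)
print(sum(g))  # the components sum to 1
```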
The softmax function is the gradient of the log-sum-exp function

$$\mathrm{lse}(x) = \log\left(\sum_{i=1}^n e^{x_i}\right),$$

where $\log$ is the natural logarithm; that is, $g(x) = \nabla\,\mathrm{lse}(x)$.
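The gradient relationship can be checked numerically. A small Python sketch (helper names are mine) compares softmax with a central-difference approximation to the gradient of log-sum-exp:

```python
import math

def lse(x):
    # log-sum-exp, shifted by the max for numerical safety
    a = max(x)
    return a + math.log(sum(math.exp(xi - a) for xi in x))

def softmax(x):
    a = max(x)
    e = [math.exp(xi - a) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]

x = [-1.0, 0.0, 1.0]
h = 1e-6
grad = []
for j in range(len(x)):
    xp = x[:]; xp[j] += h
    xm = x[:]; xm[j] -= h
    grad.append((lse(xp) - lse(xm)) / (2 * h))  # d lse / d x_j

print(grad)       # agrees with softmax(x) to high accuracy
print(softmax(x))
```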
The following plots show the two components of softmax for $n = 2$. Note that they are constant on lines $x_1 - x_2 = \mathrm{const}$, as shown by the contours.
Here are some examples:
>> softmax([-1 0 1])
ans =
   9.0031e-02
   2.4473e-01
   6.6524e-01
>> softmax([-1 0 10])
ans =
   1.6701e-05
   4.5397e-05
   9.9994e-01
Note how softmax increases the relative weighting of the larger components over the smaller ones. The MATLAB function
softmax used here is available at https://github.com/higham/logsumexp-softmax.
A concise alternative formula, which removes the denominator of (1) by rewriting it as $e^{\mathrm{lse}(x)}$ and moving it into the numerator, is

$$g_j(x) = e^{x_j - \mathrm{lse}(x)}. \qquad (2)$$
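In the same Python sketch style (names again mine), the alternative formula amounts to exponentiating $x_j$ minus the log-sum-exp:

```python
import math

def lse(x):
    # unshifted log-sum-exp; like the direct softmax formula, it can overflow
    return math.log(sum(math.exp(xi) for xi in x))

def softmax_via_lse(x):
    # g_j(x) = exp(x_j - lse(x)): the denominator has moved into the exponent
    L = lse(x)
    return [math.exp(xj - L) for xj in x]

print(softmax_via_lse([-1, 0, 1]))  # same values as the direct formula
```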
Straightforward evaluation of softmax from either (1) or (2) is not recommended, because of the possibility of overflow. Overflow can be avoided in (1) by shifting the components of $x$, just as for the log-sum-exp function, to obtain

$$g_j(x) = \frac{e^{x_j - a}}{\sum_{i=1}^n e^{x_i - a}}, \qquad (3)$$

where $a = \max_i x_i$. It can be shown that computing softmax via this formula is numerically reliable. The shifted version of (2) tends to be less accurate, so (3) is preferred.
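Here is a Python sketch of the shifted evaluation (function name mine). With inputs like $x = (1000, 1000.5)$ the unshifted formulas overflow in IEEE double precision arithmetic, while the shifted formula does not:

```python
import math

def softmax_shifted(x):
    # Shift by a = max_i x_i so that the largest exponent is exp(0) = 1:
    # no overflow can occur, and the denominator is at least 1.
    a = max(x)
    e = [math.exp(xi - a) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]

x = [1000.0, 1000.5]
# math.exp(1000.0) raises OverflowError, so the unshifted formulas fail here,
# but the shifted formula evaluates without difficulty:
print(softmax_shifted(x))  # approximately [0.3775, 0.6225]
```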
This is a minimal set of references, which contain further useful references within.
- Pierre Blanchard, Desmond J. Higham, and Nicholas J. Higham, Accurately Computing the Log-Sum-Exp and Softmax Functions, IMA J. Numer. Anal., Advance access, 2020.
- Bolin Gao and Lacra Pavel, On the Properties of the Softmax Function with Application in Game Theory and Reinforcement Learning, arXiv:1704.00805, 2018.
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.