The log-sum-exp function takes as input a real $n$-vector $x$ and returns the scalar

$$\mathrm{lse}(x) = \log \sum_{i=1}^n e^{x_i},$$

where $\log$ is the natural logarithm. It provides an approximation to the largest element of $x$, which is given by the max function, $\max(x) = \max_i x_i$. Indeed,

$$e^{\max(x)} \le \sum_{i=1}^n e^{x_i} \le n\, e^{\max(x)},$$

and on taking logs we obtain

$$\max(x) \le \mathrm{lse}(x) \le \max(x) + \log n.$$
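These bounds are easy to check numerically. Here is a minimal Python sketch (the function name `lse` is mine, chosen for illustration):

```python
import math

def lse(x):
    # Naive log-sum-exp: the logarithm of the sum of exponentials.
    return math.log(sum(math.exp(xi) for xi in x))

x = [1.0, 2.0, 3.0]
# Verify max(x) <= lse(x) <= max(x) + log(n).
print(max(x) <= lse(x) <= max(x) + math.log(len(x)))  # True
```

For this $x$, $\mathrm{lse}(x) \approx 3.4076$, which indeed lies between $\max(x) = 3$ and $3 + \log 3 \approx 4.0986$.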
The log-sum-exp function can be thought of as a smoothed version of the max function, because whereas the max function is not differentiable at points where the maximum is achieved in two different components, the log-sum-exp function is infinitely differentiable everywhere. Plots of $\max(x)$ and $\mathrm{lse}(x)$ for $n = 2$ show this connection.
The log-sum-exp function appears in a variety of settings, including statistics, optimization, and machine learning.
For the special case where $x = (t, 0)$, we obtain the function $f(t) = \log(1 + e^t)$, which is known as the softplus function in machine learning. The softplus function approximates the ReLU (rectified linear unit) activation function $\max(t, 0)$ and satisfies, by the bounds above with $n = 2$,

$$\max(t, 0) \le \log(1 + e^t) \le \max(t, 0) + \log 2.$$
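The softplus–ReLU relationship can be spot-checked directly. A Python sketch (function names are mine):

```python
import math

def softplus(t):
    # softplus(t) = log(1 + e^t); log1p keeps accuracy for large negative t.
    # (For very large positive t, math.exp would overflow; values here are modest.)
    return math.log1p(math.exp(t))

def relu(t):
    return max(t, 0.0)

# relu(t) <= softplus(t) <= relu(t) + log 2 holds for every t.
for t in [-5.0, -0.1, 0.0, 0.1, 5.0]:
    assert relu(t) <= softplus(t) <= relu(t) + math.log(2.0)
```

At $t = 0$ the upper bound is attained: $\mathrm{softplus}(0) = \log 2$, while $\mathrm{relu}(0) = 0$.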
Two points are worth noting.
- While $\log(e^{x_1} + e^{x_2}) \ne x_1 + x_2$ in general, we do (trivially) have $\log(e^{x_1} e^{x_2}) = x_1 + x_2$, and more generally $\log \prod_{i=1}^n e^{x_i} = \sum_{i=1}^n x_i$.
- The log-sum-exp function is not to be confused with the exp-sum-log function $\exp \sum_{i=1}^n \log x_i = x_1 x_2 \cdots x_n$, which is defined only for vectors with positive entries.
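To make the distinction concrete: exp-sum-log collapses to a plain product of the entries. A Python sketch (the name `exp_sum_log` is illustrative):

```python
import math

def exp_sum_log(x):
    # exp(sum of logs) is just the product of the entries,
    # and it requires them all to be positive.
    return math.exp(sum(math.log(xi) for xi in x))

print(exp_sum_log([2.0, 3.0, 4.0]))  # 2*3*4 = 24, up to rounding
```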
Here are some examples:
```
>> format long e
>> logsumexp([1 2 3])
ans =
     3.407605964444380e+00
>> logsumexp([1 2 30])
ans =
     3.000000000000095e+01
>> logsumexp([1 2 -3])
ans =
     2.318175429247454e+00
```
The MATLAB function `logsumexp` used here is available at https://github.com/higham/logsumexp-softmax.
Straightforward evaluation of log-sum-exp from its definition is not recommended because of the possibility of overflow. Indeed, $e^x$ overflows for $x \ge 12$, $x \ge 89$, and $x \ge 710$ in IEEE half, single, and double precision arithmetic, respectively. Overflow can be avoided by writing

$$\mathrm{lse}(x) = a + \log \sum_{i=1}^n e^{x_i - a}.$$
We take $a = \max_i x_i$, so that all exponentiations are of nonpositive numbers and therefore overflow is avoided. Any underflows are harmless. A refinement is to write

$$\mathrm{lse}(x) = a + \mathrm{log1p}\Bigl( \sum_{i \ne k} e^{x_i - a} \Bigr), \qquad \mathrm{log1p}(y) = \log(1 + y),$$

where $a = x_k = \max_i x_i$ (if there is more than one such $k$, we can take any of them). Here, log1p is a function provided in MATLAB and various other languages that accurately evaluates $\log(1+y)$ even when $y$ is small, in which case $1 + y$ would suffer a loss of precision if it was explicitly computed.
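A Python sketch of this shifted, log1p-based evaluation (an illustrative reimplementation, not the code from the repository above):

```python
import math

def logsumexp(x):
    # Shift by a largest entry so every exponent is nonpositive,
    # then use log1p on the remaining (possibly tiny) sum.
    k = max(range(len(x)), key=lambda i: x[i])   # index of one largest entry
    a = x[k]
    s = sum(math.exp(x[i] - a) for i in range(len(x)) if i != k)
    return a + math.log1p(s)

print(logsumexp([1.0, 2.0, 3.0]))    # ~3.4076, matching the MATLAB examples
print(logsumexp([1000.0, 1000.1]))   # works, although exp(1000) would overflow
```

Note that only the single index $k$ is excluded from the sum, so repeated maximal entries are still handled correctly.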
Whereas the original formula involves the logarithm of a sum of nonnegative quantities, when $a < 0$ the shifted formula computes $\mathrm{lse}(x)$ as the sum of two terms of opposite sign, so it could potentially suffer from numerical cancellation. It can be shown by rounding error analysis, however, that computing log-sum-exp via the shifted formulas is numerically reliable.
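The shift also rescues inputs whose entries are all large and negative, where the naive formula fails outright. A Python sketch (`logsumexp_shifted` is an illustrative name):

```python
import math

def logsumexp_shifted(x):
    # a = max(x) may be negative here; the result is a plus a nonnegative
    # log term, i.e. a sum of terms of opposite sign when a < 0.
    a = max(x)
    return a + math.log(sum(math.exp(xi - a) for xi in x))

x = [-1000.0, -1001.0]
# Naively, every math.exp(xi) underflows to 0 and log(0) raises an error.
print(logsumexp_shifted(x))  # -1000 + log(1 + 1/e), about -999.6867
```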
This is a minimal set of references, which themselves contain further useful references.
- Pierre Blanchard, Desmond J. Higham, and Nicholas J. Higham, Accurately Computing the Log-Sum-Exp and Softmax Functions, IMA J. Numer. Anal., Advance access, 2020.
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.