The log-sum-exp function takes as input a real $n$-vector $x = [x_1, x_2, \dots, x_n]^T$ and returns the scalar

$\mathrm{lse}(x) = \log \sum_{i=1}^n e^{x_i},$

where $\log$ is the natural logarithm. It provides an approximation to the largest element of $x$, which is given by the max function, $\max(x) = \max_i x_i$. Indeed,

$e^{\max(x)} \le \sum_{i=1}^n e^{x_i} \le n\, e^{\max(x)},$

and on taking logs we obtain

$\max(x) \le \mathrm{lse}(x) \le \max(x) + \log n.$
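For example, for $x = [1~2~3]^T$ we have $\max(x) = 3$ and $\mathrm{lse}(x) = \log(e + e^2 + e^3) \approx 3.4076$, which indeed lies between $\max(x) = 3$ and $\max(x) + \log 3 \approx 4.0986$.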
The log-sum-exp function can be thought of as a smoothed version of the max function, because whereas the max function is not differentiable at points where the maximum is achieved in two different components, the log-sum-exp function is infinitely differentiable everywhere. The following plots of $\max(x)$ and $\mathrm{lse}(x)$ for $x = [x_1~x_2]^T$ show this connection.
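In fact, the gradient of the log-sum-exp function is the softmax function, $\partial\, \mathrm{lse}(x)/\partial x_j = e^{x_j} \big/ \sum_{i=1}^n e^{x_i}$, which is defined and smooth for every $x$.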
The log-sum-exp function appears in a variety of settings, including statistics, optimization, and machine learning.
For the special case where $x = [0~t]^T$, we obtain the function $\log(1 + e^t)$, which is known as the softplus function in machine learning. The softplus function approximates the ReLU (rectified linear unit) activation function $\mathrm{relu}(t) = \max(t, 0)$ and satisfies, by the inequalities above,

$\max(t, 0) \le \log(1 + e^t) \le \max(t, 0) + \log 2.$
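For example, at $t = 0$ we have $\mathrm{relu}(0) = 0$ and $\log(1 + e^0) = \log 2 \approx 0.69$, so the upper bound is attained there.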
Two points are worth noting.
- While $\mathrm{lse}(x) \ne \max(x)$, in general, we do (trivially) have $\mathrm{lse}(x) = \max(x)$ for $n = 1$, and more generally $\mathrm{lse}(x) \approx \max(x)$ whenever the largest component of $x$ greatly exceeds the others, as the example $x = [1~2~30]^T$ below illustrates.
- The log-sum-exp function is not to be confused with the exp-sum-log function: for a vector $x$ with positive components, $\exp\bigl(\sum_{i=1}^n \log x_i\bigr) = x_1 x_2 \cdots x_n$.
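For example, the exp-sum-log of $x = [1~2~3]^T$ is $\exp(\log 1 + \log 2 + \log 3) = 6$, whereas the log-sum-exp of the same vector is about $3.41$, as the first example below shows.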
Here are some examples:
>> format long e
>> logsumexp([1 2 3])
ans =
     3.407605964444380e+00
>> logsumexp([1 2 30])
ans =
     3.000000000000095e+01
>> logsumexp([1 2 -3])
ans =
     2.318175429247454e+00
The MATLAB function logsumexp
used here is available at https://github.com/higham/logsumexp-softmax.
Straightforward evaluation of log-sum-exp from its definition is not recommended, because of the possibility of overflow. Indeed, $e^x$ overflows for $x \ge 12$, $x \ge 89$, and $x \ge 710$ in IEEE half, single, and double precision arithmetic, respectively. Overflow can be avoided by writing

$\sum_{i=1}^n e^{x_i} = e^a \sum_{i=1}^n e^{x_i - a},$

which gives

$\mathrm{lse}(x) = a + \log \sum_{i=1}^n e^{x_i - a}.$

We take $a = \max(x)$, so that all exponentiations are of nonpositive numbers and therefore overflow is avoided. Any underflows are harmless.
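As a rough illustration (plain MATLAB statements, not the more carefully written logsumexp function from the repository above):

x = [1 2 1000];
naive = log(sum(exp(x)))           % Inf, because exp(1000) overflows in double precision
a = max(x);
shifted = a + log(sum(exp(x - a))) % approximately 1000; exp(1-1000) and exp(2-1000) underflow harmlessly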
A refinement is to write

$\mathrm{lse}(x) = a + \mathrm{log1p}\Bigl(\sum_{i \ne k} e^{x_i - a}\Bigr),$

where $a = x_k = \max(x)$ (if there is more than one such $k$, we can take any of them). Here, $\mathrm{log1p}(y) = \log(1 + y)$ is a function provided in MATLAB and various other languages that accurately evaluates $\log(1 + y)$ even when $y$ is small, in which case $1 + y$ would suffer a loss of precision if it was explicitly computed.
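A minimal sketch of this refinement in MATLAB (illustrative only, not the repository's implementation):

x = [1 2 -3];
[a, k] = max(x);        % a = x_k = max(x)
s = exp(x - a);
s(k) = [];              % drop the term exp(x_k - a) = 1, which log1p accounts for
y = a + log1p(sum(s))   % approximately 2.3182, matching the third example above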
Whereas the original formula involves the logarithm of a sum of nonnegative quantities, when $\max(x) < 0$ the shifted formula computes $\mathrm{lse}(x)$ as the sum of two terms of opposite sign, so could potentially suffer from numerical cancellation. It can be shown by rounding error analysis, however, that computing log-sum-exp via the shifted formulas is numerically reliable.
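For example, when every component of $x$ is negative the two terms do have opposite signs, yet the shifted formula remains accurate (again a plain MATLAB illustration):

x = [-1000.5 -1000];
naive = log(sum(exp(x)))           % -Inf, because both exponentials underflow to zero
a = max(x);
shifted = a + log(sum(exp(x - a))) % about -999.53: the sum of a = -1000 and a log term of about 0.47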
References
This is a minimal set of references, which contain further useful references within.
- Pierre Blanchard, Desmond J. Higham, and Nicholas J. Higham, Accurately Computing the Log-Sum-Exp and Softmax Functions, IMA J. Numer. Anal., Advance access, 2020.
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
Related Blog Posts
- What is Numerical Stability? (2020)
This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.
Comments

Nice post! It's nice to know that the sign issue doesn't cause problems, and to see the pulling-out-the-max trick formalized. How does the `scipy.special.logsumexp` implementation fare?
Two variants that I would find interesting:
(1) If we want to approximate $\|x\|_\infty$, then we can use $f(x) = \log \sum_i (e^{x_i} + e^{-x_i})$. In this case we'd pull out $\max_i |x_i|$ instead of $\max_i x_i$, but I'm not sure we'd want to use log1p.
(2) As for log-sum-exp and its relationship to softmax (its derivative), often each $x_i$ is parameterized by a vector $\theta$ and we want the gradient with respect to $\theta$, so we have to modify the softmax formula to include the gradients of the $x_i$ terms. I suspect the naive implementation is not stable, but there ought to be similar tricks.