What Is the Log-Sum-Exp Function?

The log-sum-exp function takes as input a real n-vector x and returns the scalar

\notag      \mathrm{lse}(x) = \log \displaystyle\sum_{i=1}^n \mathrm{e}^{x_i},

where \log is the natural logarithm. It provides an approximation to the largest element of x, which is given by the \max function, \max(x) = \max_i x_i. Indeed,

\notag   \mathrm{e}^{\max(x)} \le \displaystyle\sum_{i=1}^n \mathrm{e}^{x_i}                        \le n \mskip1mu \mathrm{e}^{\max(x)},

and on taking logs we obtain

\notag   \qquad\qquad       \max(x) \le \mathrm{lse}(x) \le \max(x) + \log n.   \qquad\qquad (*)
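These bounds are easy to check numerically. Here is a minimal Python sketch (the function name `lse` is ours, not from the post), evaluating the definition directly for a small vector:

```python
# Numerical check of the bounds max(x) <= lse(x) <= max(x) + log(n).
import math

def lse(x):
    """Log-sum-exp evaluated directly from the definition."""
    return math.log(sum(math.exp(xi) for xi in x))

x = [1.0, 2.0, 3.0]
# For this x: max(x) = 3 and max(x) + log(3) ~ 4.0986.
assert max(x) <= lse(x) <= max(x) + math.log(len(x))
```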

The log-sum-exp function can be thought of as a smoothed version of the max function, because whereas the max function is not differentiable at points where the maximum is achieved in two different components, the log-sum-exp function is infinitely differentiable everywhere. The following plots of \mathrm{lse}(x) and \max(x) for n = 2 show this connection.


The log-sum-exp function appears in a variety of settings, including statistics, optimization, and machine learning.

For the special case where x = [0~t]^T, we obtain the function f(t) = \log(1+\mathrm{e}^t), which is known as the softplus function in machine learning. The softplus function approximates the ReLU (rectified linear unit) activation function \max(t,0) and satisfies, by (*),

\notag     \max(t,0) \le f(t) \le \max(t,0) + \log 2.
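A quick Python sketch of the softplus function and its ReLU bounds (names are ours, not from the post):

```python
# Softplus f(t) = log(1 + e^t), the two-point log-sum-exp lse([0, t]),
# satisfies max(t, 0) <= f(t) <= max(t, 0) + log 2.
import math

def softplus(t):
    return math.log1p(math.exp(t))  # log(1 + e^t)

for t in [-5.0, -0.1, 0.1, 5.0]:
    relu = max(t, 0.0)
    assert relu <= softplus(t) <= relu + math.log(2)
```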

Two points are worth noting.

  • While \log(x_1 + x_2) \ne \log x_1 + \log x_2 in general, we do (trivially) have \log(x_1 + x_2) = \mathrm{lse}(\log x_1,\log x_2), and more generally \log(x_1 + x_2 + \cdots + x_n) = \mathrm{lse}(\log x_1,\log   x_2,\dots,\log x_n).
  • The log-sum-exp function is not to be confused with the exp-sum-log function: \exp \sum_{i=1}^n \log x_i = x_1x_2\dots x_n.
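The first identity is easy to verify numerically; a minimal Python sketch (with our own `lse` helper):

```python
# Check log(x1 + x2 + ... + xn) = lse(log x1, ..., log xn).
import math

def lse(x):
    return math.log(sum(math.exp(xi) for xi in x))

xs = [2.0, 3.0, 5.0]
lhs = math.log(sum(xs))                   # log(10)
rhs = lse([math.log(xi) for xi in xs])
assert abs(lhs - rhs) < 1e-12
```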

Here are some examples:

>> format long e
>> logsumexp([1 2 3])
ans =

>> logsumexp([1 2 30])
ans =

>> logsumexp([1 2 -3])
ans =

The MATLAB function logsumexp used here is available at https://github.com/higham/logsumexp-softmax.
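For readers working in Python, `scipy.special.logsumexp` (mentioned in a comment below) computes the same quantity and also guards against overflow. A sketch reproducing the three examples above, as a cross-check rather than the post's MATLAB code:

```python
# The same three example vectors, evaluated with SciPy's logsumexp.
import numpy as np
from scipy.special import logsumexp

print(logsumexp(np.array([1.0, 2.0, 3.0])))
print(logsumexp(np.array([1.0, 2.0, 30.0])))
print(logsumexp(np.array([1.0, 2.0, -3.0])))
```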

Straightforward evaluation of log-sum-exp from its definition is not recommended, because of the possibility of overflow. Indeed, \exp(x) overflows once x exceeds approximately 11.09, 88.72, and 709.78 in IEEE half, single, and double precision arithmetic, respectively. Overflow can be avoided by writing
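These overflow thresholds can be observed directly with NumPy's half, single, and double precision types:

```python
# exp(x) overflows to infinity once x exceeds log(realmax) for each format.
import numpy as np

with np.errstate(over="ignore"):
    assert np.isinf(np.exp(np.float16(12.0)))    # half:   log(65504)   ~ 11.09
    assert np.isinf(np.exp(np.float32(89.0)))    # single: log(3.40e38) ~ 88.72
    assert np.isinf(np.exp(np.float64(710.0)))   # double: log(1.8e308) ~ 709.78
```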

\notag \begin{aligned}   \mathrm{lse}(x) &= \log \sum_{i=1}^n \mathrm{e}^{x_i}          = \log \sum_{i=1}^n \mathrm{e}^a \mathrm{e}^{x_i - a}          = \log  \left(\mathrm{e}^a\sum_{i=1}^n \mathrm{e}^{x_i - a}\right), \end{aligned}

which gives

\notag     \mathrm{lse}(x) = a + \log\displaystyle\sum_{i=1}^n \mathrm{e}^{x_i - a}.

We take a = \max(x), so that all exponentiations are of nonpositive numbers and therefore overflow is avoided. Any underflows are harmless. A refinement is to write
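The effect of the shift can be illustrated with a small Python sketch (the function names are ours; the MATLAB logsumexp linked above implements the same idea):

```python
# Naive versus shifted evaluation of log-sum-exp.
import math

def lse_naive(x):
    # Direct evaluation: math.exp overflows for large x_i.
    return math.log(sum(math.exp(xi) for xi in x))

def lse_shifted(x):
    # Shift by a = max(x): every exponential has a nonpositive argument.
    a = max(x)
    return a + math.log(sum(math.exp(xi - a) for xi in x))

x = [1000.0, 1000.0]
# lse_naive(x) raises OverflowError, since exp(1000) overflows in double
# precision, whereas the shifted formula returns 1000 + log 2.
print(lse_shifted(x))
```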

\notag     \qquad\qquad     \mathrm{lse}(x) = \max(x) + \mathrm{log1p}\Biggl( \displaystyle\sum_{i=1 \atop i\ne k}^n                                 \mathrm{e}^{x_i - \max(x)}\Biggr),     \qquad\qquad (\#)

where x_k = \max(x) (if there is more than one such k, we can take any of them). Here, \mathrm{log1p}(x) = \log(1+x) is a function provided in MATLAB and various other languages that accurately evaluates \log(1+x) even when x is small, in which case explicitly computing 1+x would incur a loss of precision.
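The refinement (\#) can be sketched in Python as follows (the function name is ours); taking the maximizing term out of the sum and using log1p preserves accuracy when the remaining sum is tiny:

```python
# Shifted log-sum-exp with the maximizing term pulled out, formula (#).
import math

def lse_log1p(x):
    k = max(range(len(x)), key=lambda i: x[i])   # index of a maximal entry
    a = x[k]
    # Sum over the remaining entries; each exponent is nonpositive.
    s = sum(math.exp(xi - a) for i, xi in enumerate(x) if i != k)
    return a + math.log1p(s)

# For x = [1, 2, 30] the residual sum s is about 1e-12; log1p(s)
# retains full accuracy where log(1 + s) would lose digits.
print(lse_log1p([1.0, 2.0, 30.0]))
```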

Whereas the original formula involves the logarithm of a sum of positive quantities, when \max(x) < 0 the shifted formula (\#) computes \mathrm{lse}(x) as the sum of two terms of opposite sign, so it could potentially suffer from numerical cancellation. It can be shown by rounding error analysis, however, that computing log-sum-exp via (\#) is numerically reliable.




This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

One thought on “What Is the Log-Sum-Exp Function?”

  1. Nice post! It’s nice to know the sign issue doesn’t cause problems, and to formalize the pulling-the-max-out trick. How does the `scipy.special.logsumexp` implementation handle this?

    Two variants that I would find interesting:

    (1) if we want to approximate $\|x\|_\infty$, then we can use $f(x) = \log\bigl( \sum_i \mathrm{e}^{x_i} + \mathrm{e}^{-x_i} \bigr)$. In this case, we’d pull out $\max_i |x_i|$ instead of $\max_i x_i$, but I’m not sure we’d want to do log1p

    (2) as for the logsumexp and its relationship to softmax (its derivative), many times each $x_i$ is parameterized by a vector $\theta$ and we want the gradient with respect to $\theta$, so then we have to modify the softmax formula to include the gradients of the $x_i$ terms. I’m thinking the naive implementation is not stable, but there ought to be similar tricks.
