What Is the Nearest Symmetric Matrix?

What is the nearest symmetric matrix to a real, nonsymmetric square matrix A? This question arises whenever a symmetric matrix is approximated by formulas that do not respect the symmetry. For example, a finite difference approximation to a Hessian matrix can be nonsymmetric even though the Hessian is symmetric. In some cases, lack of symmetry is caused by rounding errors. The natural way to symmetrize A is to replace it by (A + A^T)/2. Is this the best choice?

As our criterion of optimality we take that \| A - X \| is minimized over symmetric X for some suitable norm. Fan and Hoffman (1955) showed that (A + A^T)/2 is a solution in any unitarily invariant norm. A norm is unitarily invariant if \|A\| = \|UAV\| for all unitary U and V. Such a norm depends only on the singular values of A, and hence \|A\| = \|A^T\| since A and A^T have the same singular values. The most important examples of unitarily invariant norms are the 2-norm and the Frobenius norm.

The proof that (A+A^T)/2 is optimal is simple. For any symmetric Y,

\notag  \begin{aligned}   \Bigl\| A - \frac{1}{2}(A + A^T) \Bigr\|         &=  \frac{1}{2} \|A - A^T \|          = \frac{1}{2} \| A - Y + Y^T - A^T \| \\         &\le \frac{1}{2} \| A - Y \| + \frac{1}{2} \|(Y - A)^T \| \\         &= \| A - Y\|.   \end{aligned}

Here, the second equality uses the symmetry of Y, the inequality is the triangle inequality, and the final equality uses \|B^T\| = \|B\|, which holds in any unitarily invariant norm.

Simple examples of a matrix and a nearest symmetric matrix are

\notag    A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \;\;    X = \begin{bmatrix} 0 & \frac{1}{2} \\ \frac{1}{2} & 0 \end{bmatrix},\qquad    A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \;\;    X = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
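
The optimality of the symmetric part is easy to check numerically. The following MATLAB snippet is a sketch for illustration (it is not code from this article): it compares the symmetric part of a random A with a nearby symmetric candidate Y in both the Frobenius norm and the 2-norm.

% Compare the symmetric part X of a random A with another symmetric matrix Y.
A = randn(5);
X = (A + A')/2;                          % symmetric part of A
E = randn(5);  Y = X + 0.1*(E + E')/2;   % a nearby symmetric matrix
norm(A - X,'fro') <= norm(A - Y,'fro')   % logical 1 (true)
norm(A - X,2) <= norm(A - Y,2)           % logical 1 (true)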

Note that any A can be written

\notag   A = \frac{1}{2} (A + A^T ) + \frac{1}{2} (A - A^T )  \equiv B + C,

where B and C are the symmetric part and the skew-symmetric part of A, respectively, so the nearest symmetric matrix to A is the symmetric part of A.
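
In MATLAB the splitting takes two lines. The snippet below is a sketch for illustration; it forms the two parts and confirms their defining properties.

A = randn(4);
B = (A + A')/2;                          % symmetric part:      B' = B
C = (A - A')/2;                          % skew-symmetric part: C' = -C
norm(A - (B + C),'fro')                  % zero up to rounding error
norm(B - B','fro'), norm(C + C','fro')   % both zero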

For the Frobenius norm, (A + A^T)/2 is the unique nearest symmetric matrix, which follows from the fact that \|S + K\|_F^2 = \|S\|_F^2 + \|K\|_F^2 for symmetric S and skew-symmetric K. For the 2-norm, however, the nearest symmetric matrix is not unique in general. An example of non-uniqueness is

\notag    A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad    X = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & \frac{1}{2} & 0 \end{bmatrix}, \quad    Y = \begin{bmatrix} \theta & 0 & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & \frac{1}{2} & 0 \end{bmatrix},

for which \|A - X\|_2 = 0.5, and \|A - Y\|_2 = 0.5 for any \theta such that |\theta| \le 0.5.
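
The non-uniqueness is easy to confirm numerically; here is a sketch for illustration.

A = [0 0 0; 0 0 1; 0 0 0];
X = [0 0 0; 0 0 1/2; 0 1/2 0];
norm(A - X,2)                            % 0.5
for theta = [-0.5 -0.25 0 0.25 0.5]
  Y = [theta 0 0; 0 0 1/2; 0 1/2 0];
  norm(A - Y,2)                          % 0.5 for every theta with |theta| <= 0.5
end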

Entirely analogous to the above is the nearest skew-symmetric matrix problem, for which the solution is the skew-symmetric part for any unitarily invariant norm. For complex matrices, these results generalize in the obvious way: (A + A^*)/2 is the nearest Hermitian matrix to A and (A - A^*)/2 is the nearest skew-Hermitian matrix to A in any unitarily invariant norm.
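
In MATLAB, A' is the conjugate transpose A^*, so the complex case is handled by the same two lines; the snippet below is a sketch for illustration.

A = randn(3) + 1i*randn(3);
H = (A + A')/2;    % nearest Hermitian matrix:      H' = H
K = (A - A')/2;    % nearest skew-Hermitian matrix: K' = -K
norm(H - H','fro'), norm(K + K','fro')   % both zero
% (A.' would give the unconjugated transpose A^T.)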

Reference
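
Ky Fan and Alan J. Hoffman, Some Metric Inequalities in the Space of Matrices, Proc. Amer. Math. Soc. 6(1), 111–116, 1955.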

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is the Softmax Function?

The softmax function takes as input a real n-vector x and returns the vector g with elements given by

\notag   \qquad\qquad   g_j(x) =  \displaystyle\frac{\mathrm{e}^{x_j}}{\sum_{i=1}^n \mathrm{e}^{x_i}}, \quad          j=1\colon n.   \qquad\qquad (*)

It arises in machine learning, game theory, and statistics. Since \mathrm{e}^{x_j} \ge 0 and \sum_{j=1}^n g_j = 1, the softmax function is often used to convert a vector x into a vector of probabilities, with the more positive entries giving the larger probabilities.

The softmax function is the gradient of the log-sum-exp function

\notag      \mathrm{lse}(x) = \log\displaystyle\sum_{i=1}^n \mathrm{e}^{x_i},

where \log is the natural logarithm; that is, g_j(x) = (\partial/\partial x_j) \mathrm{lse}(x).
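
This relationship can be checked numerically with finite differences. In the sketch below, lse and g are defined inline for illustration (they are not the functions from the repository mentioned later), and a centered difference approximation to the gradient of lse is compared with softmax.

lse = @(x) log(sum(exp(x)));        % log-sum-exp
g   = @(x) exp(x)/sum(exp(x));      % softmax, as in (*)
x = [-1 0 1];  h = 1e-6;  fd = zeros(1,3);
for j = 1:3
  e = zeros(1,3);  e(j) = h;
  fd(j) = (lse(x + e) - lse(x - e))/(2*h);   % centered difference
end
norm(fd - g(x))                     % about 1e-10: the two agree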

The following plots show the two components of softmax for n = 2. Note that they are constant on lines x_1 - x_2 = \mathrm{constant}, as shown by the contours.

[Figure: the two components of softmax for n = 2 (softmax2d.jpg).]

Here are some examples:

>> softmax([-1 0 1])
ans =
   9.0031e-02
   2.4473e-01
   6.6524e-01
>> softmax([-1 0 10])
ans =
   1.6701e-05
   4.5397e-05
   9.9994e-01

Note how softmax increases the relative weighting of the larger components over the smaller ones. The MATLAB function softmax used here is available at https://github.com/higham/logsumexp-softmax.

A concise alternative formula, which removes the denominator of (*) by rewriting it as the exponential of \mathrm{lse}(x) and moving it into the numerator, is

\notag   \qquad\qquad     g_j = \exp\bigl(x_j - \mathrm{lse}(x)\bigr).   \qquad\qquad (\#)
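
For a small, well-scaled vector the two formulas agree to working precision, as this naive evaluation (a sketch; the overflow issue with both formulas is discussed next) shows.

x = [-1 0 1];
g1 = exp(x)/sum(exp(x));             % formula (*)
g2 = exp(x - log(sum(exp(x))));      % formula (#)
norm(g1 - g2)                        % of order eps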

Straightforward evaluation of softmax from either (*) or (\#) is not recommended, because of the possibility of overflow. Overflow can be avoided in (*) by shifting the components of x, just as for the log-sum-exp function, to obtain

\notag   \qquad\qquad   g_j(x) =  \displaystyle\frac{\mathrm{e}^{x_j-\max(x)}}{\sum_{i=1}^n \mathrm{e}^{x_i-\max(x)}}, \quad             j=1\colon n,   \qquad\qquad (\dagger)

where \max(x) = \max_i x_i. It can be shown that computing softmax via this formula is numerically reliable. The shifted version of (\#) tends to be less accurate, so (\dagger) is preferred.
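
Here is a minimal sketch of an implementation of (\dagger). It is for illustration only; it is not the softmax function from the GitHub repository linked above.

function g = softmax_shifted(x)
%SOFTMAX_SHIFTED Softmax evaluated via the shifted formula (dagger).
%   Example: softmax_shifted([-1 0 1000]) returns approximately [0 0 1],
%   whereas direct use of (*) overflows and produces a NaN.
e = exp(x - max(x));   % all exponents are <= 0, so no overflow
g = e/sum(e);          % underflows in e affect only negligible components
end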

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is the Log-Sum-Exp Function?

The log-sum-exp function takes as input a real n-vector x and returns the scalar

\notag      \mathrm{lse}(x) = \log \displaystyle\sum_{i=1}^n \mathrm{e}^{x_i},

where \log is the natural logarithm. It provides an approximation to the largest element of x, which is given by the \max function, \max(x) = \max_i x_i. Indeed,

\notag   \mathrm{e}^{\max(x)} \le \displaystyle\sum_{i=1}^n \mathrm{e}^{x_i}                        \le n \mskip1mu \mathrm{e}^{\max(x)},

and on taking logs we obtain

\notag   \qquad\qquad       \max(x) \le \mathrm{lse}(x) \le \max(x) + \log n.   \qquad\qquad (*)
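
These bounds are easily checked numerically; in the sketch below, lse is defined inline for illustration (it is not the logsumexp function from the repository mentioned later).

lse = @(x) log(sum(exp(x)));
x = randn(1,10);
[max(x)  lse(x)  max(x) + log(10)]   % the middle value lies between the other two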

The log-sum-exp function can be thought of as a smoothed version of the max function, because whereas the max function is not differentiable at points where the maximum is achieved in two different components, the log-sum-exp function is infinitely differentiable everywhere. The following plots of \mathrm{lse}(x) and \max(x) for n = 2 show this connection.

[Figure: lse(x) and max(x) for n = 2 (lse2d.jpg).]

The log-sum-exp function appears in a variety of settings, including statistics, optimization, and machine learning.

For the special case where x = [0~t]^T, we obtain the function f(t) = \log(1+\mathrm{e}^t), which is known as the softplus function in machine learning. The softplus function approximates the ReLU (rectified linear unit) activation function \max(t,0) and satisfies, by (*),

\notag     \max(t,0) \le f(t) \le \max(t,0) + \log 2.
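
The sketch below checks these bounds on a grid of points and also evaluates softplus through the standard overflow-free rewrite \log(1+\mathrm{e}^t) = \max(t,0) + \mathrm{log1p}(\mathrm{e}^{-|t|}), which is not discussed in this article but follows the same shifting idea.

softplus = @(t) log(1 + exp(t));                      % direct formula
relu     = @(t) max(t,0);
t = linspace(-10,10,101);
all(softplus(t) >= relu(t) & softplus(t) <= relu(t) + log(2))   % logical 1 (true)
softplus_safe = @(t) max(t,0) + log1p(exp(-abs(t)));  % avoids overflow for large t
softplus_safe(800)                                    % 800; log(1 + exp(800)) gives Inf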

Two points are worth noting.

  • While \log(x_1 + x_2) \ne \log x_1 + \log x_2 in general, we do (trivially) have \log(x_1 + x_2) = \mathrm{lse}(\log x_1,\log x_2), and more generally \log(x_1 + x_2 + \cdots + x_n) = \mathrm{lse}(\log x_1,\log x_2,\dots,\log x_n); see the sketch after this list.
  • The log-sum-exp function is not to be confused with the exp-sum-log function: \exp \sum_{i=1}^n \log x_i = x_1x_2\dots x_n.
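
Both identities are easy to confirm numerically; here is a quick sketch with lse defined inline (not the repository function used below).

lse = @(x) log(sum(exp(x)));
x = [2 3 5];
[log(sum(x))  lse(log(x))]    % both equal log(10) = 2.3026
[exp(sum(log(x)))  prod(x)]   % both equal 30 (to working precision)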

Here are some examples:

>> format long e
>> logsumexp([1 2 3])
ans =
     3.407605964444380e+00

>> logsumexp([1 2 30])
ans =
     3.000000000000095e+01

>> logsumexp([1 2 -3])
ans =
     2.318175429247454e+00

The MATLAB function logsumexp used here is available at https://github.com/higham/logsumexp-softmax.

Straightforward evaluation of log-sum-exp from its definition is not recommended, because of the possibility of overflow. Indeed, \exp(x) overflows for x = 12, x = 89, and x = 710 in IEEE half, single, and double precision arithmetic, respectively. Overflow can be avoided by writing

\notag \begin{aligned}   \mathrm{lse}(x) &= \log \sum_{i=1}^n \mathrm{e}^{x_i}          = \log \sum_{i=1}^n \mathrm{e}^a \mathrm{e}^{x_i - a}          = \log  \left(\mathrm{e}^a\sum_{i=1}^n \mathrm{e}^{x_i - a}\right), \end{aligned}

which gives

\notag     \mathrm{lse}(x) = a + \log\displaystyle\sum_{i=1}^n \mathrm{e}^{x_i - a}.

We take a = \max(x), so that all exponentiations are of nonpositive numbers and therefore overflow is avoided. Any underflows are harmless. A refinement is to write

\notag     \qquad\qquad     \mathrm{lse}(x) = \max(x) + \mathrm{log1p}\Biggl( \displaystyle\sum_{i=1 \atop i\ne k}^n                                 \mathrm{e}^{x_i - \max(x)}\Biggr),     \qquad\qquad (\#)

where x_k = \max(x) (if there is more than one such k, we can take any of them). Here, \mathrm{log1p}(x) = \log(1+x) is a function provided in MATLAB and various other languages that accurately evaluates \log(1+x) even when x is small, in which case 1+x would suffer a loss of precision if it was explicitly computed.

Whereas the original formula involves the logarithm of a sum of nonnegative quantities, when \max(x) < 0 the shifted formula (\#) computes \mathrm{lse}(x) as the sum of two terms of opposite sign, so could potentially suffer from numerical cancellation. It can be shown by rounding error analysis, however, that computing log-sum-exp via (\#) is numerically reliable.
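
Here is a minimal sketch of an implementation of (\#). It is for illustration only; it is not the logsumexp function from the GitHub repository linked above.

function y = logsumexp_shifted(x)
%LOGSUMEXP_SHIFTED Log-sum-exp evaluated via the shifted formula (#).
%   Example: logsumexp_shifted([1000 1000]) returns 1000 + log(2),
%   whereas log(sum(exp(x))) overflows and returns Inf.
[a,k] = max(x);         % a = max(x), attained at index k (ties: first such k)
s = exp(x - a);         % all exponents are <= 0, so no overflow
s(k) = 0;               % drop the i = k term, which is exactly 1
y = a + log1p(sum(s));
end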

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.