# What Is the Nearest Symmetric Matrix?

What is the nearest symmetric matrix to a real, nonsymmetric square matrix $A$? This question arises whenever a symmetric matrix is approximated by formulas that do not respect the symmetry. For example, a finite difference approximation to a Hessian matrix can be nonsymmetric even though the Hessian is symmetric. In some cases, lack of symmetry is caused by rounding errors. The natural way to symmetrize $A$ is to replace it by $(A + A^T)/2$. Is this the best choice?

As our criterion of optimality we take that $\| A - X \|$ is minimized over symmetric $X$ for some suitable norm. Fan and Hoffman (1955) showed that $(A + A^T)/2$ is a solution in any unitarily invariant norm. A norm is unitarily invariant if $\|A\| = \|UAV\|$ for all unitary $U$ and $V$. Such a norm depends only on the singular values of $A$, and hence $\|A\| = \|A^T\|$ since $A$ and $A^T$ have the same singular values. The most important examples of unitarily invariant norms are the $2$-norm and the Frobenius norm.

The proof that $(A+A^T)/2$ is optimal is simple. For any symmetric $Y$,

\notag \begin{aligned} \Bigl\| A - \frac{1}{2}(A + A^T) \Bigr\| &= \frac{1}{2} \|A - A^T \| = \frac{1}{2} \| A - Y + Y^T - A^T \| \\ &\le \frac{1}{2} \| A - Y \| + \frac{1}{2} \|(Y - A)^T \| \\ &= \| A - Y\|. \end{aligned}

Simple examples of a matrix and a nearest symmetric matrix are

$\notag A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \;\; X = \begin{bmatrix} 0 & \frac{1}{2} \\ \frac{1}{2} & 0 \end{bmatrix},\qquad A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \;\; X = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$

Note that any $A$ can be written

$\notag A = \frac{1}{2} (A + A^T ) + \frac{1}{2} (A - A^T ) \equiv B + C,$

where $B$ and $C$ are the symmetric part and the skew-symmetric part of $A$, respectively, so the nearest symmetric matrix to $A$ is the symmetric part of $A$.

For the Frobenius norm, $(A + A^T)/2$ is the unique nearest symmetric matrix, which follows from the fact that $\|S + K\|_F^2 = \|S\|_F^2 + \|K\|_F^2$ for symmetric $S$ and skew-symmetric $K$. For the $2$-norm, however, the nearest symmetric matrix is not unique in general. An example of non-uniqueness is

$\notag A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad X = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & \frac{1}{2} & 0 \end{bmatrix}, \quad Y = \begin{bmatrix} \theta & 0 & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & \frac{1}{2} & 0 \end{bmatrix},$

for which $\|A - X\|_2 = 0.5$, and $\|A - Y\|_2 = 0.5$ for any $\theta$ such that $|\theta| \le 0.5$.

Entirely analogous to the above is the nearest skew-symmetric matrix problem, for which the solution is the skew-symmetric part for any unitarily invariant norm. For complex matrices, these results generalize in the obvious way: $(A + A^*)/2$ is the nearest Hermitian matrix to $A$ and $(A - A^*)/2$ is the nearest skew-Hermitian matrix to $A$ in any unitarily invariant norm.

# What Is the Softmax Function?

The softmax function takes as input a real $n$-vector $x$ and returns the vector $g$ with elements given by

$\notag \qquad\qquad g_j(x) = \displaystyle\frac{\mathrm{e}^{x_j}}{\sum_{i=1}^n \mathrm{e}^{x_i}}, \quad j=1\colon n. \qquad\qquad (*)$

It arises in machine learning, game theory, and statistics. Since $\mathrm{e}^{x_j} \ge 0$ and $\sum_{j=1}^n g_j = 1$, the softmax function is often used to convert a vector $x$ into a vector of probabilities, with the more positive entries giving the larger probabilities.

The softmax function is the gradient of the log-sum-exp function

$\notag \mathrm{lse}(x) = \log\displaystyle\sum_{i=1}^n \mathrm{e}^{x_i},$

where $\log$ is the natural logarithm, that is, $g_j(x) = (\partial/\partial x_j) \mathrm{lse}(x)$.

The following plots show the two components of softmax for $n = 2$. Note that they are constant on lines $x_1 - x_2 = \mathrm{constant}$, as shown by the contours.

Here are some examples:

>> softmax([-1 0 1])
ans =
9.0031e-02
2.4473e-01
6.6524e-01
>> softmax([-1 0 10])
ans =
1.6701e-05
4.5397e-05
9.9994e-01


Note how softmax increases the relative weighting of the larger components over the smaller ones. The MATLAB function softmax used here is available at https://github.com/higham/logsumexp-softmax.

A concise alternative formula, which removes the denominator of $(*)$ by rewriting it as the exponential of $\mathrm{lse}(x)$ and moving it into the numerator, is

$\notag \qquad\qquad g_j = \exp\bigl(x_j - \mathrm{lse}(x)\bigr). \qquad\qquad (\#)$

Straightforward evaluation of softmax from either $(*)$ or $(\#)$ is not recommended, because of the possibility of overflow. Overflow can be avoided in $(*)$ by shifting the components of $x$, just as for the log-sum-exp function, to obtain

$\notag \qquad\qquad g_j(x) = \displaystyle\frac{\mathrm{e}^{x_j-\max(x)}}{\sum_{i=1}^n \mathrm{e}^{x_i-\max(x)}}, \quad j=1\colon n. \qquad\qquad (\dagger)$

where $\max(x) = \max_i x_i$. It can be shown that computing softmax via this formula is numerically reliable. The shifted version of $(\#)$ tends to be less accurate, so ($\dagger$) is preferred.

## References

This is a minimal set of references, which contain further useful references within.

# What Is the Log-Sum-Exp Function?

The log-sum-exp function takes as input a real $n$-vector $x$ and returns the scalar

$\notag \mathrm{lse}(x) = \log \displaystyle\sum_{i=1}^n \mathrm{e}^{x_i},$

where $\log$ is the natural logarithm. It provides an approximation to the largest element of $x$, which is given by the $\max$ function, $\max(x) = \max_i x_i$. Indeed,

$\notag \mathrm{e}^{\max(x)} \le \displaystyle\sum_{i=1}^n \mathrm{e}^{x_i} \le n \mskip1mu \mathrm{e}^{\max(x)},$

and on taking logs we obtain

$\notag \qquad\qquad \max(x) \le \mathrm{lse}(x) \le \max(x) + \log n. \qquad\qquad (*)$

The log-sum-exp function can be thought of as a smoothed version of the max function, because whereas the max function is not differentiable at points where the maximum is achieved in two different components, the log-sum-exp function is infinitely differentiable everywhere. The following plots of $\mathrm{lse}(x)$ and $\max(x)$ for $n = 2$ show this connection.

The log-sum-exp function appears in a variety of settings, including statistics, optimization, and machine learning.

For the special case where $x = [0~t]^T$, we obtain the function $f(t) = \log(1+\mathrm{e}^t)$, which is known as the softplus function in machine learning. The softplus function approximates the ReLU (rectified linear unit) activation function $\max(t,0)$ and satisfies, by $(*)$,

$\notag \max(t,0) \le f(t) \le \max(t,0) + \log 2.$

Two points are worth noting.

• While $\log(x_1 + x_2) \ne \log x_1 + \log x_2$, in general, we do (trivially) have $\log(x_1 + x_2) = \mathrm{lse}(\log x_1,\log x_2)$, and more generally $\log(x_1 + x_2 + \cdots + x_n) = \mathrm{lse}(\log x_1,\log x_2,\dots,\log x_n)$.
• The log-sum-exp function is not to be confused with the exp-sum-log function: $\exp \sum_{i=1}^n \log x_i = x_1x_2\dots x_n$.

Here are some examples:

>> format long e
>> logsumexp([1 2 3])
ans =
3.407605964444380e+00

>> logsumexp([1 2 30])
ans =
3.000000000000095e+01

>> logsumexp([1 2 -3])
ans =
2.318175429247454e+00


The MATLAB function logsumexp used here is available at https://github.com/higham/logsumexp-softmax.

Straightforward evaluation of log-sum-exp from its definition is not recommended, because of the possibility of overflow. Indeed, $\exp(x)$ overflows for $x = 12$, $x = 89$, and $x = 710$ in IEEE half, single, and double precision arithmetic, respectively. Overflow can be avoided by writing

\notag \begin{aligned} \mathrm{lse}(x) &= \log \sum_{i=1}^n \mathrm{e}^{x_i} = \log \sum_{i=1}^n \mathrm{e}^a \mathrm{e}^{x_i - a} = \log \left(\mathrm{e}^a\sum_{i=1}^n \mathrm{e}^{x_i - a}\right), \end{aligned}

which gives

$\notag \mathrm{lse}(x) = a + \log\displaystyle\sum_{i=1}^n \mathrm{e}^{x_i - a}.$

We take $a = \max(x)$, so that all exponentiations are of nonpositive numbers and therefore overflow is avoided. Any underflows are harmless. A refinement is to write

$\notag \qquad\qquad \mathrm{lse}(x) = \max(x) + \mathrm{log1p}\Biggl( \displaystyle\sum_{i=1 \atop i\ne k}^n \mathrm{e}^{x_i - \max(x)}\Biggr), \qquad\qquad (\#)$

where $x_k = \max(x)$ (if there is more than one such $k$, we can take any of them). Here, $\mathrm{log1p}(x) = \log(1+x)$ is a function provided in MATLAB and various other languages that accurately evaluates $\log(1+x)$ even when $x$ is small, in which case $1+x$ would suffer a loss of precision if it was explicitly computed.

Whereas the original formula involves the logarithm of a sum of nonnegative quantities, when $\max(x) < 0$ the shifted formula $(\#)$ computes $\mathrm{lse}(x)$ as the sum of two terms of opposite sign, so could potentially suffer from numerical cancellation. It can be shown by rounding error analysis, however, that computing log-sum-exp via $(\#)$ is numerically reliable.

## References

This is a minimal set of references, which contain further useful references within.

# What Is a Modified Cholesky Factorization?

Newton methods for minimizing a function $F:\mathbb{R}^n \to \mathbb{R}$ generate a sequence of points $x_k$, where the step from $x_k$ to $x_{k+1}$ is along a search direction $p_k$ determined from a linear system $G_k p_k = -g_k$, where $g_k = \nabla F(x_k)$ is the gradient and $G_k$ is an approximation to the Hessian matrix $\nabla^2 F(x_k)$. The equation $g_k^Tp_k = - p_k^T G_k p_k$ shows that $G_k$ is a descent direction if $p_k^T G_k p_k > 0$, and in order to guarantee that this condition holds for all $p_k$ we need $G_k$ to be positive definite. But even if $G_k$ is the exact Hessian, positive definiteness is not guaranteed far from a minimizer. We can modify the method to ensure definiteness of $G_k$, as with quasi-Newton methods. Or we can perturb the matrix, if necessary, to make it positive definite. Modified Cholesky factorization perturbs and factorizes the matrix at the same time. It is useful in other situations, too, such as in constructing preconditioners and in bounding the distance to a positive semidefinite matrix.

A modified Cholesky factorization of a symmetric matrix $A$ is a factorization $P(A + E)P^T = LDL^T$, where $P$ is a permutation matrix, $L$ is unit lower triangular, and $D$ is diagonal or block diagonal and positive definite. It follows that $A+E$ is a positive definite matrix.

A natural way to compute a modified Cholesky factorization is to modify the Cholesky factorization algorithm. Cholesky factorization fails when it tries to take the square root of a negative quantity or divide by zero. We can avoid both possibilities by increasing nonpositive pivots when they are encountered. This corresponds to making a diagonal perturbation $E$ to $A$ and computing a Cholesky factorization $A + E = R^T\!R$. However, choosing a suitable $E$ is more difficult than it might seem.

Consider the matrix

$\notag A = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & 1-\epsilon & 2 & 1\\ 1 & 2 & 1 & 1\\ 0 & 1 & 1 & 1 \end{bmatrix}, \quad 0 < \epsilon \ll 1.$

Since Cholesky factorization generates the same sequence of Schur complements as Gaussian elimination, it suffices to consider Gaussian elimination. The diagonal elements of $R$ are the square roots of the pivots. After one step of elimination the reduced matrix is

$\notag A^{(2)} = \left[\begin{array}{c|ccc} 1 & 1 & 1 & 1\\\hline 0 & -\epsilon & 1 & 1\\ 0 & 1 & 0 & 1\\ 0 & 1 & 1 & 1 \end{array}\right],$

and the trailing $3\times 3$ matrix (a Schur complement) is clearly indefinite because the $(2,2)$ entry, which is the next pivot, is negative. We can make the (2,2) entry positive by adding $2\epsilon$ to it:

$\notag A^{(2)} + E = \left[\begin{array}{c|ccc} 1 & 1 & 1 & 1\\\hline 0 & \epsilon & 1 & 1\\ 0 & 1 & 0 & 1\\ 0 & 1 & 1 & 1 \end{array}\right] \quad (E = 2 \mskip1mu\epsilon \mskip1mu e_2e_2^T).$

The next stage of the factorization can complete and it yields

$\notag A^{(3)} = \left[\begin{array}{cc|cc} 1 & 1 & 1 & 1 \\ 0 & \epsilon & 1 & 1 \\\hline\rule{0cm}{18pt} 0 & 0 & -\displaystyle\frac{1}{\epsilon} & 1 - \displaystyle\frac{1}{\epsilon}\\\rule{0cm}{20pt} 0 & 0 &1 - \displaystyle\frac{1}{\epsilon} & 1 - \displaystyle\frac{1}{\epsilon} \end{array}\right],$

The trailing $2\times 2$ submatrix has elements of order $\epsilon^{-1} \gg 1$. Not only will a perturbation of order $\epsilon^{-1}$ be required to the $(3,3)$ element to allow the Cholesky factorization to continue, but the Cholesky factor will have elements of order $\epsilon^{-1/2}$ so numerical stability will likely be lost. Yet the smallest eigenvalue of $A$ is of order $1$, so it should have been possible to make only an $O(1)$ perturbation to $A$ in order for the factorization to succeed.

This example shows that if we are to increase a pivot element then we need a more sophisticated strategy that takes account of the size of the resulting elements of the factors and the effect on later stages of the factorization.

A modified Cholesky factorization should satisfy, as far as possible, four objectives.

• If $A$ is “sufficiently positive definite” then $E$ is zero.
• If $A$ is indefinite, $\|E\|$ is not much larger than

$\notag \min \{\, \|\Delta A\| : A+\Delta A~ \text{is positive semidefinite} \,\}$

for some appropriate norm.

• The matrix $A+E$ is reasonably well conditioned.
• The cost of the algorithm is the same as the cost of standard Cholesky factorization, that is, $n^3/3 + O(n^2)$ flops for an $n\times n$ matrix.

Gill and Murray (1974) gave the first modified Cholesky algorithm, which computes $P(A + E)P^T = LDL^T$ with diagonal $D$ and $E$. It was refined by Gill, Murray, and Wright in 1981. Schnabel and Eskow (1990) gave an algorithm that attempts to produce smaller values of $\|E\|$, partly by exploiting eigenvalue bounds obtained from Gershgorin’s theorem. That algorithm was subsequently improved by Schnabel and Eskow (1999).

A different approach was taken by Cheng and Higham (1998), building on an earlier idea by Bunch and Sorensen. This approach computes a block $\mathrm{LDL^T}$ factorization $PAP^T = LDL^T$, were $P$ is a permutation matrix, $L$ is unit lower triangular, and $D$ is block diagonal with diagonal blocks of size $1$ or $2$. The pivoting strategy is the symmetric rook pivoting strategy of Ashcraft, Grimes, and Lewis (1998), which has the key property of producing a bounded $L$ factor. The cost of pivoting is typically $O(n^2)$ comparisons but can be as large as $O(n^3)$ in the worst case. Cheng and Higham compute the perturbation $\Delta D$ of minimal Frobenius norm such that $D + \Delta D$ has eigenvalues greater than or equal to $\delta$, where $\delta > 0$ is a parameter. The modified Cholesky factorization is then $P(A+E)P^T = L(D+\Delta D)L^T$.

A significant advantage of the block $\mathrm{LDL^T}$ approach is that it is modular: any implementation of the factorization can be used and the modification is simply inexpensive postprocessing of the $D$ factor. The other algorithms are not simple modifications of an $\mathrm{LDL^T}$ factorization and are not straightforward to implement in an efficient way as a block algorithm. Note that in the block $\mathrm{LDL^T}$ approach $E$ is a full matrix and it is not explicitly computed.

Modified Cholesky software is not widely available in libraries. Implementations of the Cheng–Higham algorithm are available in

## Example

We take the $4\times 4$ matrix above with $\epsilon = 10^{-2}$:

$\notag A = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & 0.99 & 2 & 1\\ 1 & 2 & 1 & 1\\ 0 & 1 & 1 & 1 \end{bmatrix}.$

It has eigenvalues

-1.0050e+00  -2.3744e-01   1.0000e+00   4.2325e+00


The Gill–Murray–Wright algorithm computes as $E$ the diagonal matrix with diagonal elements

0   2.0200e+00   2.0000e+00            0


while the Schnabel–Eskow algorithm (1999) computes $E$ with diagonal elements

1.0000e+00   1.0050e+00   1.0050e+00   1.0000e+00


For the Cheng–Higham algorithm with $\delta = (2u)^{1/2} \|A\|_F = 6.7 \times10^{-8}$ (where $u \approx 1.11 \times 10^{-16}$ is the unit roundoff), the perturbed matrix $A+E$ is

1.0000e+00   1.0000e+00   1.0000e+00            0
1.0000e+00   1.4950e+00   1.4975e+00   9.9749e-01
1.0000e+00   1.4975e+00   1.5000e+00   1.0025e+00
0   9.9749e-01   1.0025e+00   2.0100e+00


The Frobenius norms of the perturbations to $A$ are $2.84$, $2.00$, and $1.43$, respectively, and the 2-norm condition numbers are $33.8$, $43.2$, and $4.67 \times 10^8$. The large condition number for the Cheng–Higham algorithm is caused by the value of the parameter $\delta$. With $\delta = 0.1$, the perturbed matrix is

1.0000e+00   1.0000e+00   1.0000e+00            0
1.0000e+00   1.5453e+00   1.4475e+00   9.9724e-01
1.0000e+00   1.4475e+00   1.5497e+00   1.0027e+00
0   9.9724e-01   1.0027e+00   2.1100e+00


at Frobenius norm distance $1.57$ from $A$ and with $2$-norm condition number $327.3$. For comparison, the symmetric matrix with all eigenvalues greater than or equal to $0.1$ that is closest to $A$ in the Frobenius norm is at a distance $1.15$ from $A$.

In general, there is no clear ordering of the different modified Cholesky methods in terms of their ability to satisfy the four criteria.

## References

This is a minimal set of references, which contain further useful references within.

# What Is the Matrix Sign Function?

The matrix sign function is the matrix function corresponding to the scalar function of a complex variable

$\notag \mathrm{sign}(z) = \begin{cases} 1, & \mathop{\mathrm{Re}} z > 0, \\ -1, & \mathop{\mathrm{Re}} z < 0. \end{cases}$

Note that this function is undefined on the imaginary axis. The matrix sign function can be obtained from the Jordan canonical form definition of a matrix function: if $A = XJX^{-1}$ is a Jordan decomposition with $\mathrm{diag}(J) = \mathrm{diag}(\lambda_i)$ then

$\notag S = \mathrm{sign}(A) = X \mathrm{sign}(J)X^{-1} = X \mathrm{diag}(\mathrm{sign}(\lambda_i))X^{-1},$

since all the derivatives of the sign function are zero. The eigenvalues of $S$ are therefore all $\pm 1$. Moreover, $S^2 = I$, so $S$ is an involutory matrix.

The matrix sign function was introduced by Roberts in 1971 as a tool for model reduction and for solving Lyapunov and algebraic Riccati equations. The fundamental property that Roberts employed is that $(I+S)/2$ and $(I-S)/2$ are projectors onto the invariant subspaces associated with the eigenvalues of $A$ in the open right-half plane and open left-half plane, respectively. Indeed without loss of generality we can assume that the eigenvalues of $A$ are ordered so that $J = \mathrm{diag}(J_1,J_2)$, with the eigenvalues of $J_1\in\mathbb{C}^{p\times p}$ in the open left half-plane and those of $J_2\in\mathbb{C}^{q\times q}$ in the open right half-plane ($p + q = n$). Then

$\notag S = X \begin{bmatrix} -I_p & 0 \\ 0 & I_q \end{bmatrix}X^{-1}$

and, writing $X = [X_1~X_2]$, where $X_1$ is $n\times p$ and $X_2$ is $n\times q$, we have

\notag \begin{aligned} \displaystyle\frac{I+S}{2} &= X \begin{bmatrix} 0 & 0 \\ 0 & I_q \end{bmatrix}X^{-1} = X_2 X^{-1}(p+1\colon n,:),\\[\smallskipamount] \displaystyle\frac{I-S}{2} &= X \begin{bmatrix} I_p & 0 \\ 0 & 0 \end{bmatrix}X^{-1} = X_1 X^{-1}(1\colon p,:). \end{aligned}

Also worth noting are the integral representation

$\notag \mathrm{sign}(A) = \displaystyle\frac{2}{\pi} A \int_0^{\infty} (t^2I+A^2)^{-1} \,\mathrm{d}t$

and the concise formula

$\notag \mathrm{sign}(A) = A (A^2)^{-1/2}.$

## Application to Sylvester Equation

To see how the matrix sign function can be used, consider the Sylvester equation

$\notag AX - XB = C, \quad A \in \mathbb{C}^{m\times m}, \; B \in \mathbb{C}^{n\times n}, \; C \in \mathbb{C}^{m\times n}.$

This equation is the $(1,2)$ block of the equation

$\notag \begin{bmatrix} A & -C \\ 0 & B \end{bmatrix} = \begin{bmatrix} I_m & X \\ 0 & I_n \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} \begin{bmatrix} I_m & X \\ 0 & I_n \end{bmatrix}^{-1}.$

If $\mathrm{sign}(A) = I$ and $\mathrm{sign}(B) = -I$ then

$\notag \mathrm{sign} \left( \begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix} \right) = \begin{bmatrix} I_m & X \\ 0 & I_n \end{bmatrix} \begin{bmatrix} I_m & 0 \\ 0 & -I_n \end{bmatrix} \begin{bmatrix} I_m & -X \\ 0 & I_n \end{bmatrix} = \begin{bmatrix} I_m & -2X \\ 0 & -I_n \end{bmatrix},$

so the solution $X$ can be read from the $(1,2)$ block of the sign of the block upper triangular matrix $\bigl[\begin{smallmatrix}A & -C\\ 0& B \end{smallmatrix}\bigr]$. The conditions that $\mathrm{sign}(A)$ and $\mathrm{sign}(B)$ are identity matrices are satisfied for the Lyapunov equation $AX + XA^* = C$ when $A$ is positive stable, that is, when the eigenvalues of $A$ lie in the open right half-plane.

A generalization of this argument shows that the matrix sign function can be used to solve the algebraic Riccati equation $XFX - A^*X - XA - G = 0$, where $F$ and $G$ are Hermitian.

## Application to the Eigenvalue Problem

It is easy to see that $S = \mathrm{sign}(A)$ satisfies $\mathrm{trace}(S) = \mathrm{trace}(\mathrm{diag}(\mathrm{sign}(\lambda_i))) = q - p$, where $p$ and $q$ are the number of eigenvalues in the open left-half plane and open right-half plane, respectively (as above). Since $n = p + q$, we have the formulas

$\notag p = \displaystyle\frac{n - \mathrm{trace}(S)}{2}, \quad q = \displaystyle\frac{n + \mathrm{trace}(S)}{2}.$

More generally, for real $a$ and $b$ with $a < b$,

$\notag \displaystyle\frac{1}{2} \bigl( \mathrm{sign}(A - aI) - \mathrm{sign}(A - bI) \bigr)$

is the number of eigenvalues lying in the vertical strip $\{\, z: \mathop{\mathrm{Re}}z \in(a,b)\,\}$. Formulas also exist to count the number of eigenvalues in rectangles and more complicated regions.

## Computing the Matrix Sign Function

What makes the matrix sign function so interesting and useful is that it can be computed directly without first computing eigenvalues or eigenvectore of $A$. Roberts noted that the iteration

$\notag X_{k+1} = \displaystyle\frac{1}{2} (X_k + X_k^{-1}), \quad X_0 = A,$

converges quadratically to $\mathrm{sign}(A)$. This iteration is Newton’s method applied to the equation $X^2 = I$, with starting matrix $A$. It is one of the rare circumstances in which explicitly inverting matrices is justified!

Various other iterations are available for computing $\mathrm{sign}(A)$. A matrix multiplication-based iteration is the Newton–Schulz iteration

$\notag X_{k+1} = \displaystyle\frac{1}{2} X_k (3I - X_k^2), \quad X_0 = A.$

This iteration is quadratically convergent if $\|I-A^2\| < 1$ for some subordinate matrix norm. The Newton–Schulz iteration is the $[1/0]$ member of a Padé family of rational iterations

$X_{k+1} = X_k p_{\ell m}^{}\left(1-X_k^2\right) q_{\ell m}^{}\left(1-X_k^2\right)^{-1}, \quad X_0 = A,$

where $p_{\ell m}(\xi)/q_{\ell m}(\xi)$ is the $[\ell/m]$ Padé approximant to $(1-\xi)^{-1/2}$ ($p_{\ell m}$ and $q_{\ell m}$ are polynomials of degrees at most $\ell$ and $m$, respectively). The iteration is globally convergent to $\mathrm{sign}(A)$ for $\ell = m-1$ and $\ell = m$, and for $\ell \ge m+1$ it converges when $\|I-A^2\| < 1$, with order of convergence $\ell+m+1$ in all cases.

Although the rate of convergence of these iterations is at least quadratic, and hence asymptotically fast, it can be slow initially. Indeed for $n = 1$, if $|a| \gg 1$ then the Newton iteration computes $x_1 = (a+1/a)/2 \approx a/2$, and so the early iterations make slow progress towards $\pm 1$. Fortunately, it is possible to speed up convergence with the use of scale parameters. The Newton iteration can be replaced by

$\notag X_{k+1} = \displaystyle\frac{1}{2} \bigl(\mu_kX_k + \mu_k^{-1}X_k^{-1}\bigr), \quad X_0 = A,$

with, for example,

$\notag \mu_k = \sqrt{\|X_k^{-1}\|/\|X_k\|}.$

This parameter $\mu_k$ can be computed at no extra cost.

As an example, we took A = gallery('lotkin',4), which has eigenvalues $1.887$, $-1.980\times10^{-1}$, $-1.228\times10^{-2}$, and $-1.441\times10^{-4}$ to four significant figures. After six iterations of the unscaled Newton iteration $X_6$ had an eigenvalue $-100.8$, showing that $X_6$ is far from $\mathrm{sign}(A)$, which has eigenvalues $\pm 1$. Yet when scaled by $\mu_k$ (using the $1$-norm), after six iterations all the eigenvalues of $X$ were within distance $10^{-16}$ of $\pm 1$, and the iteration had converged to within this tolerance.

The Matrix Computation Toolbox contains a MATLAB function signm that computes the matrix sign function. It computes a Schur decomposition then obtains the sign of the triangular Schur factor by a finite recurrence. This function is too expensive for use in applications, but is reliable and is useful for experimentation.

## Relation to Matrix Square Root and Polar Decomposition

The matrix sign function is closely connected with the matrix square root and the polar decomposition. This can be seen through the relations

$\notag \mathrm{sign}\left( \begin{bmatrix} 0 & A \\\ I & 0 \end{bmatrix} \right ) = \begin{bmatrix}0 & A^{1/2} \\ A^{-1/2} & 0 \end{bmatrix}, \\[\smallskipamount]$

for $A$ with no eigenvalues on the nonpositive real axis, and

$\notag \mathrm{sign}\left( \begin{bmatrix} 0 & A \\\ A^* & 0 \end{bmatrix} \right ) = \begin{bmatrix}0 & U \\ U^* & 0 \end{bmatrix},$

for nonsingular $A$, where $A = UH$ is a polar decomposition. Among other things, these relations yield iterations for $A^{1/2}$ and $U$ by applying the iterations above to the relevant block $2n\times 2n$ matrix and reading off the (1,2) block.

## References

This is a minimal set of references, which contain further useful references within.

# What Is a Fractional Matrix Power?

A $p$th root of an $n\times n$ matrix $A$ is a matrix $X$ such that $X^p = A$, and it can be written $X = A^{1/p}$. For a rational number $r = j/k$ (where $j$ and $k$ are integers), defining $A^r$ is more difficult: is it $(A^j)^{1/k}$ or $(A^{1/k})^j$? These two possibilities can be different even for $n = 1$. More generally, how can we define $A^\alpha$ for an arbitrary real number $\alpha$?

Recall, first, that for a nonzero complex scalar $z$ we define $z^\alpha = \mathrm{e}^{\alpha\log z}$, where $\log$ is the principal logarithm: the one taking values in the strip $\{\, z : -\pi < \mathop{\mathrm{Im}} z \le \pi \,\}$. We can generalize this definition to matrices. For a nonsingular matrix $A$ we define

$\notag A^\alpha = \mathrm{e}^{\alpha\log A}.$

Here the logarithm is the principal matrix logarithm, the matrix function built on the principal scalar logarithm, and so the eigenvalues of $\log A$ lie in the strip $\{\, z : -\pi < \mathop{\mathrm{Im}} z \le \pi \,\}$. When $\alpha \in [-1,1]$, the eigenvalues of $A^\alpha$, which are $\lambda^\alpha$ where $\lambda$ is an eigenvalue of $A$, lie in the segment $\{\, z: -|\alpha|\pi < \arg z \le |\alpha|\pi\,\}$ of the complex plane.

The most important special case is $\alpha = 1/p$ for a positive integer $p$, in which case

$\notag A^{1/p} = \mathrm{e}^{\frac{1}{p}\log A}.$

We can check that

$\notag \bigl(A^{1/p}\bigr)^p = \bigl(\mathrm{e}^{\frac{1}{p}\log A}\bigr)^p = \mathrm{e}^{\log A} = A,$

so the definition does indeed produce a $p$th root. The matrix $A^{1/p}$ is called the principal $p$th root.

Returning to the case of rational $p$, we note that

$\notag A^{j/k} = \mathrm{e}^{\frac{j}{k}\log A} = \Bigl(\mathrm{e}^{\frac{1}{k}\log A}\Bigr)^j = (A^{1/k})^j,$

but $(A^{j})^{1/k}$ can be a different matrix. In general, it is not true that $(A^\alpha)^\beta = (A^\beta)^\alpha$ for real $\alpha$ and $\beta$, although for symmetric positive definite matrices this identity does hold because the eigenvalues are real and positive.

An integral expression for $A^\alpha$ valid for $\alpha \in(0,1)$ is

$\notag A^\alpha = \displaystyle\frac{\sin(\alpha\pi)} {\alpha\pi} A \int_0^{\infty}(t^{1/\alpha}I+A)^{-1}\,\mathrm{d}t, \quad \alpha \in(0,1).$

Another representation for $A = t (I - B)$ with $\rho(B) < 1$ is given by the binomial expansion

$\notag A^{\alpha} = t^{\alpha} \displaystyle\sum_{j=0}^\infty {\alpha \choose j}(-B)^j, \quad \rho(B) < 1.$

For $2\times 2$ real matrices of the form

$\notag A = \begin{bmatrix} a & b \\ c & a \end{bmatrix}$

with $bc < 0$ there is an explicit formula for $A^\alpha$. It is easy to see that $A$ has eigenvalues $\lambda_{\pm} = a \pm \mathrm{i} d$, where $d = (-bc)^{1/2}$. Let $\theta = \arg(\lambda_+) \in (0,\pi)$ and $r = |\lambda_+|$. It can be shown that

$\notag A^\alpha = \displaystyle\frac{r^\alpha}{d} \begin{bmatrix} d\cos(\alpha\theta) & b\sin(\alpha\theta) \\ c\sin(\alpha\theta) & d\cos(\alpha\theta) \end{bmatrix}.$

## Computation

The formula $A^\alpha = \mathrm{e}^{\alpha \log A}$ can be used computationally, but it is somewhat indirect in that one must approximate both the exponential and the logarithm. A more direct algorithm based on the Schur decomposition and Padé approximation of the power function is developed by Higham and Lin (2013). MATLAB code is available from MathWorks File Exchange.

If $A$ is diagonalizable, so that $A = XDX^{-1}$ for some nonsingular $X$ with $D = \mathrm{diag}(\lambda_i)$, then $A^\alpha = X D^\alpha X^{-1} = X \mathrm{diag}(\lambda_i^\alpha) X^{-1}$. This formula is safe to use computationally only if $X$ is well conditioned. For defective (non-diagonalizable) $A$ we can express $A^\alpha$ in terms of the Jordan canonical form, but this expression is not useful computationally because the Jordan canonical form cannot be reliably computed.

## Inverse Function

If $X = A^{1/2}$ then $A = X^2$, by the definition of square root. If $X = A^{\alpha}$ does it follow that $A = X^{1/\alpha}$? Clearly, the answer is “no” in general because, for example, $X = A^2$ does not imply $A = X^{1/2}$.

Using the matrix unwinding function it can be shown that $(A^{\alpha})^{1/\alpha} = A$ for $\alpha \in [-1,1]$. Hence the function $g(A) = A^{1/\alpha}$ is the inverse function of $f(A) = A^{\alpha}$ for $\alpha\in[-1,1]$.

## Backward Error

How can we check the quality of an approximation $X$ to $A^{\alpha}$? For $\alpha = 1/p$ we can check the residual $A - X^p$, but for real $p$ there is no natural residual. Instead we can look at the backward error.

For an approximation $X$ to $A^\alpha$, a backward error is a matrix $\Delta A$ such that $X = (A + \Delta A)^\alpha$. Assume that $A$ and $X$ are nonsingular and that $\alpha \in [-1,1]$. Then, as shown in the previous section, $X = (A + \Delta A)^\alpha$ implies $\Delta A = X^{1/\alpha} - A$. Hence the normwise relative backward error is

$\notag \eta(X) = \displaystyle\frac{ \|X^{1/\alpha} - A \|}{\|A\|}, \quad \alpha\in[-1,1].$

## Applications with Stochastic Matrices

An important application of fractional matrix powers is in discrete-time Markov chains, which arise in areas including finance and medicine. A transition matrix for a Markov process is a matrix whose $(i,j)$ element is the probability of moving from state $i$ to state $j$ over a time step. It has nonnegative entries and the rows sum to $1$, so it is a stochastic matrix. In practice, a transition matrix may be estimated for a certain time period, say one year, but a transition matrix for a shorter period, say one month, may be needed. If $A$ is a transition matrix for a time period $p$ then a stochastic $p$th root of $A$ is a transition matrix for a time period a factor $p$ smaller. Therefore $p = 12$ (years to months) and $p = 7$ (weeks to days) are among the values of interest. Unfortunately, $A^{1/p}$ is not necessarily a stochastic matrix. Moreover, $A$ can have a stochastic $p$th root that is not $A^{1/p}$. For example, the stochastic matrix

$\notag A = \left[\begin{array}{ccc} 0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0\\ \end{array} \right]$

has principal square root

$\notag A^{1/2} = \frac{1}{3} \left[\begin{array}{rrr} 2 & 2 & -1 \\ -1 & 2 & 2 \\ 2 & -1 & 2 \end{array} \right],$

but $A^{1/2}$ is not stochastic because of the negative elements. The square root

$\notag Y = \left[\begin{array}{ccc} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0\\ \end{array} \right]$

is stochastic, though. (Interestingly, $A$ is also a square root of $Y$!)

A wide variety configurations is possible as regards existence, nature (primary or nonprimary), and number of stochastic roots. Higham and Lin (2011) delineate the various possibilities that can arise. They note that the stochastic lower triangular matrix

$\notag A_n = \begin{bmatrix} 1 & & & \\ \frac{1}{2} & \frac{1}{2} & & \\ \vdots & \vdots &\ddots & \\ \frac{1}{n} & \frac{1}{n} &\cdots & \frac{1}{n} \end{bmatrix}$

has a stochastic $p$th root, namely $A_n^{1/p}$, for all $p$. For example, to three significant figures,

$\notag A_6^{1/3} = \begin{bmatrix} 1.000 & & & & & \\ 0.206 & 0.794 & & & & \\ 0.106 & 0.201 & 0.693 & & & \\ 0.069 & 0.111 & 0.190 & 0.630 & & \\ 0.050 & 0.075 & 0.109 & 0.181 & 0.585 & \\ 0.039 & 0.056 & 0.076 & 0.107 & 0.172 & 0.550 \\ \end{bmatrix}.$

The existence of stochastic roots of stochastic matrices is connected with the embeddability problem, which asks when a nonsingular stochastic matrix $A$ can be written $A = \mathrm{e}^Q$ for some $Q$ with $q_{ij} \ge 0$ for $i\ne j$ and $\sum_{j=1}^n q_{ij} = 0$, $i=1\colon n$. Kingman showed in 1962 that this condition holds if and only if for every positive integer $p$ there exists a stochastic $X_p$ such that $A = X_p^p$.

## Applications in Fractional Differential Equations

Fractional matrix powers arise in the numerical solution of differential equations of fractional order, especially partial differential equations involving fractional Laplace operators. Here, the problem may be one of computing $A^\alpha b$, in which case for large problems it is preferable to directly approximate $A^\alpha b$, for example by Krylov methods or quadrature methods, rather than to explicitly compute $A^\alpha$.

## References

This is a minimal set of references, which contain further useful references within.

# What Is the Matrix Unwinding Function?

Care is needed when dealing with multivalued functions because identities that hold for positive scalars can fail in the complex plane. For example, it is not always true that $(z-1)^{1/2}(z+1)^{1/2} = (z^2-1)^{1/2}$ or $\log(z_1 z_2) = \log z_1 + \log z_2$ for all $z,z_1,z_2\in\mathbb{C}$. Here, the square root is the principal square root (the one lying in the right half-plane) and the logarithm is the principal logarithm (the one with imaginary part in $(-\pi,\pi]$).

A powerful tool for dealing with multivalued complex functions is the unwinding number, defined for $z\in\mathbb{C}$ by

$\notag \mathcal{U}(z) = \displaystyle\frac{z - \log \mathrm{e}^z}{2\pi \mathrm{i}}.$

The unwinding number provides a correction term for the putative identity $\log \mathrm{e}^z = z$, in that

$\notag \qquad\qquad\qquad\qquad z = \log \mathrm{e}^z + 2\pi \mathrm{i}\, \mathcal{U}(z) \qquad\qquad\qquad\qquad (*)$

for all $z$.

A useful formula for the unwinding number is

$\notag \mathcal{U}(z) = \left\lceil\displaystyle\frac{\mathrm{Im} z - \pi}{2\pi}\right\rceil,$

where $\lceil\cdot\rceil$ is the ceiling function, which returns the smallest integer greater than or equal to its argument. It follows that $\mathcal{U}(z) = 0$ if and only if $\mathrm{Im} z \in (-\pi, \pi]$. Hence $\log \mathrm{e}^z = z$ if and only if $\mathrm{Im} z \in (-\pi, \pi]$.

The unwinding number provides correction terms for various identities. For example, for $z_1,z_2\in\mathbb{C}$, replacing $z$ by $\log z_1 \pm \log z_2$ in $(*$), we have

\notag \begin{aligned} \log z_1 \pm \log z_2 &= \log \bigl( \mathrm{e}^{\log z_1 \pm \log z_2} \bigr) + 2\pi\mathrm{i} \, \mathcal(\log z_1 \pm \log z_2)\\ &= \log \bigl(\mathrm{e}^{\log z_1} \mathrm{e}^{\pm \log z_2} \bigr) + 2\pi\mathrm{i} \, \mathcal(\log z_1 \pm \log z_2)\\ &= \log \bigl( z_1 z_2^{\pm 1} \bigr) + 2\pi\mathrm{i} \, \mathcal(\log z_1 \pm \log z_2). \end{aligned}

This gives the identities

\notag \begin{aligned} \log (z_1 z_2) &= \log z_1 + \log z_2 - 2\pi \mathrm{i}\, \mathcal{U}(\log z_1 +\log z_2), \\ \qquad\qquad\qquad \log (z_1/z_2) &= \log z_1 - \log z_2 - 2\pi \mathrm{i}\,\mathcal{U}(\log z_1 - \log z_2). \qquad\qquad\qquad (\#) \end{aligned}

Note that in textbooks one can find identities such as $\log (z_1 z_2) = \log z_1 + \log z_2$, in which each occurrence of $\log$ is interpreted as a possibly different branch. For computational purposes we want formulas that contain a specific branch, usually the principal branch.

An application of the unwinding number to matrix functions is in computing the logarithm of a $2\times 2$ upper triangular matrix. For any function $f$, we have

$\notag f\left( \begin{bmatrix} \lambda_1 & t_{12} \\ 0 & \lambda_2 \end{bmatrix} \right) = \begin{bmatrix} f(\lambda_1) & t_{12} f[\lambda_1,\lambda_2] \\ 0 & f(\lambda_2) \end{bmatrix},$

where the divided difference

$\notag f[\lambda_1,\lambda_2] = \begin{cases} \displaystyle\frac{f(\lambda_2)-f(\lambda_1)}{\lambda_2-\lambda_1}, & \lambda_1 \ne \lambda_2, \\ f'(\lambda_1), & \lambda_1 = \lambda_2. \end{cases}$

When $\lambda_1 \approx \lambda_2$ this formula suffers from numerical cancellation. For the logarithm, we can rewrite it, using $(\#)$, as

\notag \begin{aligned} \log\lambda_2 - \log\lambda_1 &= \log \left(\frac{\lambda_2}{\lambda_1}\right) + 2\pi \mathrm{i} \, \mathcal{U}(\log \lambda_2 - \log \lambda_1) \\ &= \log \left(\frac{1+z}{1-z}\right) + 2\pi \mathrm{i}\, \mathcal{U}(\log \lambda_2 - \log \lambda_1), \end{aligned}

where $z = (\lambda_2-\lambda_1)/(\lambda_2+\lambda_1)$. Using the hyperbolic arc tangent, defined by

$\notag \mathrm{atanh}(z) = \frac{1}{2}\log\left( \displaystyle\frac{1+z}{1-z} \right),$

we obtain

$\notag f[\lambda_1,\lambda_2] = \displaystyle\frac{2\mathrm{atanh}(z) + 2\pi \mathrm{i}\, \mathcal{U}(\log \lambda_2 - \log \lambda_1)}{\lambda_2-\lambda_1}, \quad \lambda_1 \ne \lambda_2.$

Assuming that we have an accurate $\mathrm{atanh}$ function this formula will provide an accurate value of $f[\lambda_1,\lambda_2]$ provided that $z$ is not too close to $\pm 1$ (the singularities of $\mathrm{atanh}$) and not too large. This formula is used in the MATLAB function logm.

## Matrix Unwinding Function

The unwinding number leads to the matrix unwinding function, defined for $A\in\mathbb{C}^{n\times n}$ by

$\notag \mathcal{U}(A) = \displaystyle\frac{A - \log \mathrm{e}^A}{2\pi \mathrm{i}}.$

Here, $\log$ is the principal matrix logarithm, defined by the property that its eigenvalues have imaginary parts in the interval $(-\pi,\pi]$. It can be shown that $\mathcal{U}(A) = 0$ if and only if the imaginary parts of all the eigenvalues of $A$ lie in the interval $(-\pi, \pi]$. Furthermore, $\mathcal{U}(A)$ is a diagonalizable matrix with integer eigenvalues.

As an example, the matrix

$\notag A = \left[\begin{array}{rrrr} 3 & 1 & -1 & -9\\ -1 & 3 & 9 & -1\\ -1 & -9 & 3 & 1\\ 9 & -1 & -1 & 3 \end{array}\right], \quad \Lambda(A) = \{ 2\pm 8\mathrm{i}, 4 \pm 10\mathrm{i} \}$

has unwinding matrix function

$\notag X = \mathcal{U}(A) = \mathrm{i} \left[\begin{array}{rrrr} 0 & -\frac{1}{2} & 0 & \frac{3}{2}\\ \frac{1}{2} & 0 & -\frac{3}{2} & 0\\ 0 & \frac{3}{2} & 0 & -\frac{1}{2}\\ -\frac{3}{2} & 0 & \frac{1}{2} & 0 \end{array}\right], \quad \Lambda(X) = \{ \pm 1, \pm 2 \}.$

In general, if $A$ is real then $\mathcal{U}(A)$ is pure imaginary as long as $A$ has no eigenvalue with imaginary part that is an odd multiple of $\pi$.

The matrix unwinding function is useful for providing correction terms for matrix identities involving multivalued functions. Here are four useful matrix identities, along with cases in which the correction term vanishes. See Aprahamian and Higham (2014) for proofs.

• For nonsingular $A$ and $\alpha,\beta\in\mathbb{C}$,

$\notag (A^\alpha)^{\beta} = A^{\alpha\beta} \mathrm{e}^{-2\beta\pi \mathrm{i}\, \mathcal{U}(\alpha\log A)}.$

If $\beta$ is an integer then the correction term is $I$. If $\alpha\in(-1,1]$ and $\beta = 1/\alpha$ then $\mathcal{U}(\alpha\log A) = 0$ and so

$\notag (A^\alpha)^{1/\alpha} = A, \quad \alpha \in [-1,1],$

and this equation is obviously true for $\alpha = -1$, too.

• If $A$ and $B$ are nonsingular and $AB = BA$ then

$\notag \log(AB^{\pm1}) = \log A \pm \log B - 2\pi \mathrm{i}\, \mathcal{U}(\log A \pm \log B).$

If $\arg\lambda_i + \arg\mu_i \in(-\pi,\pi]$ for every eigenvalue $\lambda_i$ of $A$ and the corresponding eigenvalue $\mu_i$ of $B$ (there is a correspondence because of the commutativity of $A$ and $B$, which implies that they are simultaneously unitarily diagonalizable), then $\log(AB^{\pm 1}) = \log A \pm \log B$.

• If $A$ and $B$ are nonsingular and $AB = BA$ then for any $\alpha\in\mathbb{C}$,

$\notag (AB)^\alpha = A^\alpha B^\alpha \mathrm{e}^{-2\pi\, \alpha \mathrm{i}\, \mathcal{U}(\log A + \log B)}.$

If $\alpha$ is an integer or the eigenvalues of $A$ and $B$ have arguments in $(-\pi/2,\pi/2]$ then $(AB)^\alpha = A^\alpha B^\alpha$.

• For nonsingular $A$ and $\alpha\in\mathbb{C}$.

$\notag \log A^{\alpha} = \alpha \log A - 2\pi \mathrm{i}\, \mathcal{U}(\alpha \log A).$

If $\alpha\in(-1,1]$ then $\log A^{\alpha} = \alpha \log A$, and this equation also holds for $\alpha = -1$ if $A$ has no negative eigenvalues.

The matrix unwinding function can be computed by an adaptation of the Schur–Parlett algorithm. The algorithm computes a Schur decomposition and re-orders it into a block form with eigenvalues having the same unwinding number in the same diagonal block. The unwinding function of each diagonal block is then a multiple of the identity and the off-diagonal blocks are obtained by the block Parlett recurrence. This approach gives more accurate results than directly evaluating $\mathcal{U}(A)$ from its definition in terms of the matrix logarithm and natrix exponential. A MATLAB code unwindm is available at https://github.com/higham/unwinding

## Matrix Argument Reduction

The matrix unwinding function can be of use computationally for reducing the size of the imaginary parts of the eigenvalues of a matrix. The function

$\notag \mathrm{mod}(A) = A - 2\pi\mathrm{i}\, \mathcal{U}(A)$

has eigenvalues $\lambda$ with $\mathrm{Im} \lambda \in(-\pi,\pi]$. Using $\mathrm{mod}(A)$ in place of $A$ can be useful in computing the matrix exponential by the scaling and squaring algorithm because $\mathrm{e}^A = \mathrm{e}^{\mathrm{mod}(A)}$ and $\mathrm{mod}(A)$ can have a much smaller norm than $A$, giving potential reductions in cost. Combined with the Schur decomposition-based algorithm for $\mathcal{U}(A)$ mentioned above, this idea leads to a numerical method for $\mathrm{e}^A$. See Aprahamian and Higham (2014) for details.

## Round Trip Relations

If you apply a matrix function and then its inverse do you get back to where you started, that is, is $f^{-1}(f(z)) = z$? In principle yes, but if the inverse is multivalued the answer is not immediate. The matrix unwinding function is useful for analyzing such round trip relations. As an example, if $A$ has no eigenvalues with real part of the form $k \pi$ for an integer $k$, then

$\notag \mathrm{acos}(\cos A) = \bigl(A-2\pi\,\mathcal{U} (\mathrm{i} A)\bigr)\mathrm{sign} \bigl(A-2\pi\mathcal{U}( \mathrm{i} A)\bigr).$

Here, $\mathrm{acos}$ is the principal arc cosine defined in Aprahamian and Higham (2016), where this result and analogous results for the arc sine, arc hyperbolic cosine, and arc hyperbolic sine are derived; and $\mathrm{sign}$ is the matrix sign function.

## History

The unwinding number was introduced by Corless, Hare and Jeffrey in 1996, to help implement computer algebra over the complex numbers. It was generalized to the matrix case by Aprahamian and Higham (2014).

## References

This is a minimal set of references, which contain further useful references within.

# What Is a (Non)normal Matrix?

An $n\times n$ matrix is normal if $A^*A = AA^*$, that is, if $A$ commutes with its conjugate transpose. Although the definition is simple to state, its significance is not immediately obvious.

The definition says that the inner product of the $i$th and $j$th columns equals the inner product of the $i$th and $j$th rows for all $i$ and $j$. For $i = j$ this means that the $i$th row and the $i$th column have the same $2$-norm for all $i$. This fact can easily be used to show that a normal triangular matrix must be diagonal. It then follows from the Schur decomposition that $A$ is normal if it is unitarily diagonalizable: $A = QDQ^*$ for some unitary $Q$ and diagonal $D$, where $D$ contains the eigenvalues of $A$ on the diagonal. Thus the normal matrices are those with a complete set of orthonormal eigenvectors.

For a general diagonalizable matrix, $A = XDX^{-1}$, the condition number $\kappa(X) = \|X| \|X^{-1}\|$ can be arbitrarily large, but for a normal matrix $X$ can be taken to have 2-norm condition number $1$. This property makes normal matrices well-behaved for numerical computation.

Many equivalent conditions to $A$ being normal are known: seventy are given by Grone et al. (1987) and a further nineteen are given by Elsner and Ikramov (1998).

The normal matrices include the classes of matrix given in this table:

Real Complex
Diagonal Diagonal
Symmetric Hermitian
Skew-symmetric Skew-Hermitian
Orthogonal Unitary
Circulant Circulant

Circulant matrices are $n\times n$ Toeplitz matrices in which the diagonals wrap around:

$\notag \begin{bmatrix} c_1 & c_n & \dots & c_2 \\ c_2 & c_1 & \dots & \vdots \\ \vdots & \ddots & \ddots & c_n \\ c_n & \dots & c_2 & c_1 \\ \end{bmatrix}.$

They are diagonalized by a unitary matrix known as the discrete Fourier transform matrix, which has $(r,s)$ element $\exp( -2\pi \mathrm{i} (r-1)(s-1) / n )$.

A normal matrix is not necessarily of the form given in the table, even for $n = 2$. Indeed, a $2\times 2$ normal matrix must have one of the forms

$\notag \left[\begin{array}{@{\mskip2mu}rr@{\mskip2mu}} a & b\\ b & c \end{array}\right], \quad \left[\begin{array}{@{}rr@{\mskip2mu}} a & b\\ -b & a \end{array}\right].$

The first matrix is symmetric. The second matrix is of the form $aI + bJ$, where $J = \bigl[\begin{smallmatrix}\!\phantom{-}0 & 1\\\!-1 & 0 \end{smallmatrix}\bigr]$ is skew-symmetric and satisfies $J^2 = -I$, and it has eigenvalues $a \pm \mathrm{i}b$.

It is natural to ask what the commutator $C = AA^*- A^*A$ can look like when $A$ is not normal. One immediate observation is that $C$ has zero trace, so its eigenvalues sum to zero, implying that $C$ is an indefinite Hermitian matrix if it is not zero. Since an indefinite matrix has at least two different nonzero eigenvalues, $C$ cannot be of rank $1$.

In the polar decomposition $A = UH$, where $U$ is unitary and $H$ is Hermitian positive semidefinite, the polar factors commute if and only if $A$ is normal.

The field of values, also known as the numerical range, is defined for $A\in\mathbb{C}^{n\times n}$ by

$F(A) = \biggl\{\, \displaystyle\frac{z^*Az}{z^*z} : 0 \ne z \in \mathbb{C}^n \, \biggr\}.$

The set $F(A)$ is compact and convex (a nontrivial property proved by Toeplitz and Hausdorff), and it contains all the eigenvalues of $A$. Normal matrices have the property that the field of values is the convex hull of the eigenvalues. The next figure illustrates two fields of values, with the eigenvalues plotted as dots. The one on the left is for the nonnormal matrix gallery('smoke',16) and that on the right is for the circulant matrix gallery('circul',x) with x constructed as x = randn(16,1); x = x/norm(x).

## Measures of Nonnormality

How can we measure the degree of nonnormality of a matrix? Let $A$ have the Schur decomposition $A = QTQ^*$, where $Q$ is unitary and $T$ is upper triangular, and write $T = D+M$, where $D = \mathrm{diag}(\lambda_i)$ is diagonal with the eigenvalues of $A$ on its diagonal and $M$ is strictly upper triangular. If $A$ is normal then $M$ is zero, so $\|M\|_F$ is a natural measure of how far $A$ is from being normal. While $M$ depends on $Q$ (which is not unique), its Frobenius norm does not, since $\|A\|_F^2 = \|T\|_F^2 = \|D\|_F^2 + \|M\|_F^2$. Accordingly, Henrici defined the departure from normality by

$\notag \nu(A) = \biggl( \|A\|_F^2 - \displaystyle\sum_{j=1}^n |\lambda_j|^2 \biggr)^{1/2}.$

Henrici (1962) derived an upper bound for $\nu(A)$ and Elsner and Paardekooper (1987) derived a lower bound, both based on the commutator:

$\notag \displaystyle\frac{\|A^*A-AA^*\|_F}{4\|A\|_2} \le \nu(A) \le \Bigl( \displaystyle\frac{n^3-n}{12} \Bigr)^{1/4} \|A^*A-AA^*\|_F^{1/2}.$

The distance to normality is

$\notag d(A) = \min \bigl\{\, \|E\|_F: ~A+E \in \mathbb{C}^{n\times n}~~\mathrm{is~ normal} \,\bigr\}.$

This quantity can be computed by an algorithm of Ruhe (1987). It is trivially bounded above by $\nu(A)$ and is also bounded below by a multiple of it (László, 1994):

$\notag \displaystyle\frac{\nu(A)}{n^{1/2}} \le d(A) \le \nu(A).$

Normal matrices are a particular class of diagonalizable matrices. For diagonalizable matrices various bounds are available that depend on the condition number of a diagonalizing transformation. Since such a transformation is not unique, we take a diagonalization $A = XDX^{-1}$, $D = \mathrm{diag}(\lambda_i)$, with $X$ having minimal 2-norm condition number:

$\kappa_2(X) = \min\{\, \kappa_2(Y): A = YDY^{-1}, ~D~\mathrm{diagonal} \,\}.$

Here are some examples of such bounds. We denote by $\rho(A)$ the spectral radius of $A$, the largest absolute value of any eigenvalue of $A$.

• By taking norms in the eigenvalue-eigenvector equation $Ax = \lambda x$ we obtain $\rho(A) \le \|A\|_2$. Taking norms in $A = XDX^{-1}$ gives $\|A\|_2 \le \kappa_2(X) \|D\|_2 = \kappa_2(X)\rho(A)$. Hence

$\notag \displaystyle\frac{\|A\|_2}{\kappa_2(X)} \le \rho(A) \le \|A\|_2.$

• If $A$ has singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and its eigenvalues are ordered $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$, then (Ruhe, 1975)

$\notag \displaystyle\frac{\sigma_i(A)}{\kappa_2(X)} \le |\lambda_i(A)| \le \kappa_2(X) \sigma_i(A), \quad i = 1\colon n.$

Note that for $i=1$ the previous upper bound is sharper.

• For any real $p > 0$,

$\notag \displaystyle\frac{\rho(A)^p}{\kappa_2(X)} \le \|A^p\|_2 \le \kappa_2(X) \rho(A)^p.$

• For any function $f$ defined on the spectrum of $A$,

$\notag \displaystyle\frac{\max_i|f(\lambda_i)|}{\kappa_2(X)} \le \|f(A)\|_2 \le \max_i|f(\lambda_i)|.$

For normal $A$ we can take $X$ unitary and so all these bounds are equalities. The condition number $\kappa_2(X)$ can therefore be regarded as another measure of non-normality, as quantified by these bounds.

## References

This is a minimal set of references, which contain further useful references within.

# What Is the Matrix Logarithm?

A logarithm of a square matrix $A$ is a matrix $X$ such that $\mathrm{e}^X = A$, where $\mathrm{e}^X$ is the matrix exponential. Just as in the scalar case, the matrix logarithm is not unique, since if $\mathrm{e}^X = A$ then $\mathrm{e}^{X+2k\pi\mathrm{i}I} = A$ for any integer $k$. However, for matrices the nonuniqueness is complicated by the presence of repeated eigenvalues. For example, the matrix

$\notag X(t) = 2\pi \mathrm{i} \begin{bmatrix}1 & -2t & 2t^2 \\ 0 & -1 & -t \\ 0 & 0 & 0 \end{bmatrix}$

is an upper triangular logarithm of the $3\times 3$ identity matrix $I_3$ for any $t$, whereas the obvious logarithms are the diagonal matrices $2\pi\mskip1mu\mathrm{i}\mskip1mu\mathrm{diag}(k_1,k_2,k_3)$, for integers $k_1$, $k_2$, and $k_3$. Notice that the repeated eigenvalue $1$ of $I_3$ has been mapped to the distinct eigenvalues $2\pi$, $-2\pi$, and $0$ in $X(t)$. This is characteristic of nonprimary logarithms, and in some applications such strange logarithms may be required—an example is the embeddability problem for Markov chains.

An important question is whether a nonsingular real matrix has a real logarithm. The answer is that it does if and only if the Jordan canonical form contains an even number of Jordan blocks of each size for every negative eigenvalue. This means, in particular, that if $A$ has an unrepeated negative eigenvalue then it does not have a real logarithm. Minus the $n\times n$ identity matrix has a real logarithm for even $n$ but not for odd $n$. Indeed, for $n=2$,

$\notag X = \pi\left[\begin{array}{@{}rr@{}} 0 & 1\\ -1 & 0 \end{array}\right]$

satisfies $\mathrm{e}^X = -I_2$, as does $Y = ZXZ^{-1}$ for any nonsingular $Z$, since $\mathrm{e}^Y = Z\mathrm{e}^X Z^{-1} = -Z Z^{-1} = -I_2$.

For most practical purposes it is the principal logarithm that is of interest, defined for any matrix with no negative real eigenvalues as the unique logarithm whose eigenvalues lie in the strip $\{\, z : -\pi < \mathrm{Im}(z) < \pi \,\}$. From this point on we assume that $A$ has no negative eigenvalues and that the logarithm is the principal logarithm.

Various explicit representations of the logarithm are available, including

\notag \begin{aligned} \log A &= \int_0^1 (A-I)\bigl[ t(A-I) + I \bigr]^{-1} \,\mathrm{d}t, \\ \log(I+X) &= X - \frac{X^2}{2} + \frac{X^3}{3} - \frac{X^4}{4} + \cdots, \quad \rho(X)<1, \end{aligned}

where the spectral radius $\rho(X) = \max\{\, |\lambda| : \lambda~\textrm{is an eigenvalue of}~X\,\}$. A useful relation is $\log (A^{\alpha}) = \alpha \log A$ for $\alpha\in[-1,1]$, with important special cases $\log (A^{-1}) = - \log A$ and $\log (A^{1/2}) = \frac{1}{2} \log A$ (where the square root is the principal square root). Recurring the latter expression gives, for any positive integer $k$,

$\notag \log(A) = 2^k \log\bigl(A^{1/2^k}\bigr).$

This formula is the basis for the inverse scaling and squaring method for computing the logarithm, which chooses $k$ so that $E = I - A^{1/2^k}$ is small enough that $\log(I + E)$ can be efficiently approximated by Padé approximation. The MATLAB function logm uses the inverse scaling and squaring method together with a Schur decomposition.

## References

This references contains more on the facts above, as well as further references.