# What Is the Matrix Sign Function?

The matrix sign function is the matrix function corresponding to the scalar function of a complex variable

$\notag \mathrm{sign}(z) = \begin{cases} 1, & \mathop{\mathrm{Re}} z > 0, \\ -1, & \mathop{\mathrm{Re}} z < 0. \end{cases}$

Note that this function is undefined on the imaginary axis. The matrix sign function can be obtained from the Jordan canonical form definition of a matrix function: if $A = XJX^{-1}$ is a Jordan decomposition with $\mathrm{diag}(J) = \mathrm{diag}(\lambda_i)$ then

$\notag S = \mathrm{sign}(A) = X \mathrm{sign}(J)X^{-1} = X \mathrm{diag}(\mathrm{sign}(\lambda_i))X^{-1},$

since all the derivatives of the sign function are zero. The eigenvalues of $S$ are therefore all $\pm 1$. Moreover, $S^2 = I$, so $S$ is an involutory matrix.

The matrix sign function was introduced by Roberts in 1971 as a tool for model reduction and for solving Lyapunov and algebraic Riccati equations. The fundamental property that Roberts employed is that $(I+S)/2$ and $(I-S)/2$ are projectors onto the invariant subspaces associated with the eigenvalues of $A$ in the open right-half plane and open left-half plane, respectively. Indeed without loss of generality we can assume that the eigenvalues of $A$ are ordered so that $J = \mathrm{diag}(J_1,J_2)$, with the eigenvalues of $J_1\in\mathbb{C}^{p\times p}$ in the open left half-plane and those of $J_2\in\mathbb{C}^{q\times q}$ in the open right half-plane ($p + q = n$). Then

$\notag S = X \begin{bmatrix} -I_p & 0 \\ 0 & I_q \end{bmatrix}X^{-1}$

and, writing $X = [X_1~X_2]$, where $X_1$ is $n\times p$ and $X_2$ is $n\times q$, we have

$\notag \begin{aligned} \displaystyle\frac{I+S}{2} &= X \begin{bmatrix} 0 & 0 \\ 0 & I_q \end{bmatrix}X^{-1} = X_2 X^{-1}(p+1\colon n,:),\\[\smallskipamount] \displaystyle\frac{I-S}{2} &= X \begin{bmatrix} I_p & 0 \\ 0 & 0 \end{bmatrix}X^{-1} = X_1 X^{-1}(1\colon p,:). \end{aligned}$

Also worth noting are the integral representation

$\notag \mathrm{sign}(A) = \displaystyle\frac{2}{\pi} A \int_0^{\infty} (t^2I+A^2)^{-1} \,\mathrm{d}t$

and the concise formula

$\notag \mathrm{sign}(A) = A (A^2)^{-1/2}.$

## Application to Sylvester Equation

To see how the matrix sign function can be used, consider the Sylvester equation

$\notag AX - XB = C, \quad A \in \mathbb{C}^{m\times m}, \; B \in \mathbb{C}^{n\times n}, \; C \in \mathbb{C}^{m\times n}.$

This equation is the $(1,2)$ block of the equation

$\notag \begin{bmatrix} A & -C \\ 0 & B \end{bmatrix} = \begin{bmatrix} I_m & X \\ 0 & I_n \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} \begin{bmatrix} I_m & X \\ 0 & I_n \end{bmatrix}^{-1}.$

If $\mathrm{sign}(A) = I$ and $\mathrm{sign}(B) = -I$ then

$\notag \mathrm{sign} \left( \begin{bmatrix} A & -C \\ 0 & B \end{bmatrix} \right) = \begin{bmatrix} I_m & X \\ 0 & I_n \end{bmatrix} \begin{bmatrix} I_m & 0 \\ 0 & -I_n \end{bmatrix} \begin{bmatrix} I_m & -X \\ 0 & I_n \end{bmatrix} = \begin{bmatrix} I_m & -2X \\ 0 & -I_n \end{bmatrix},$

so the solution $X$ is $-\frac{1}{2}$ times the $(1,2)$ block of the sign of the block upper triangular matrix $\bigl[\begin{smallmatrix}A & -C\\ 0& B \end{smallmatrix}\bigr]$. The conditions $\mathrm{sign}(A) = I$ and $\mathrm{sign}(B) = -I$ are satisfied for the Lyapunov equation $AX + XA^* = C$ (the case $B = -A^*$) when $A$ is positive stable, that is, when the eigenvalues of $A$ lie in the open right half-plane.
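To make this concrete, here is a short sketch in Python with NumPy and SciPy (the post's own examples use MATLAB; SciPy's `signm` plays the role of a sign-function routine, and the matrices here are made up for illustration). We build a Sylvester equation with known solution and recover it from the sign of the block matrix:

```python
import numpy as np
from scipy.linalg import signm

# Build A X - X B = C with known X, where sign(A) = I and sign(B) = -I.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 3.0]])            # eigenvalues 2, 1, 3: positive stable
B = np.array([[-1.0, 2.0],
              [0.0, -4.0]])                # eigenvalues -1, -4: sign(B) = -I
X_true = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])
C = A @ X_true - X_true @ B

m, n = 3, 2
M = np.block([[A, -C],
              [np.zeros((n, m)), B]])
S = signm(M)                               # sign of the block triangular matrix
X = -S[:m, m:] / 2                         # the (1,2) block of S is -2X
assert np.allclose(X, X_true)
```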

A generalization of this argument shows that the matrix sign function can be used to solve the algebraic Riccati equation $XFX - A^*X - XA - G = 0$, where $F$ and $G$ are Hermitian.

## Application to the Eigenvalue Problem

It is easy to see that $S = \mathrm{sign}(A)$ satisfies $\mathrm{trace}(S) = \mathrm{trace}(\mathrm{diag}(\mathrm{sign}(\lambda_i))) = q - p$, where $p$ and $q$ are the number of eigenvalues in the open left-half plane and open right-half plane, respectively (as above). Since $n = p + q$, we have the formulas

$\notag p = \displaystyle\frac{n - \mathrm{trace}(S)}{2}, \quad q = \displaystyle\frac{n + \mathrm{trace}(S)}{2}.$

More generally, for real $a$ and $b$ with $a < b$,

$\notag \displaystyle\frac{1}{2} \mathrm{trace}\bigl( \mathrm{sign}(A - aI) - \mathrm{sign}(A - bI) \bigr)$

is the number of eigenvalues lying in the vertical strip $\{\, z: \mathop{\mathrm{Re}}z \in(a,b)\,\}$. Formulas also exist to count the number of eigenvalues in rectangles and more complicated regions.
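These counting formulas are easy to try out. The following sketch (Python with NumPy/SciPy, using SciPy's `signm`; the test matrix is made up so the eigenvalue counts are known in advance) counts eigenvalues in half-planes and in a vertical strip via traces of matrix sign functions:

```python
import numpy as np
from scipy.linalg import signm

# Triangular example: eigenvalues -3, -1, 2, 5 can be read off the diagonal.
A = np.diag([-3.0, -1.0, 2.0, 5.0]) + np.triu(np.ones((4, 4)), 1)
n = A.shape[0]

t = np.trace(signm(A)).real
p = round((n - t) / 2)                     # eigenvalues in the left half-plane
q = round((n + t) / 2)                     # eigenvalues in the right half-plane
assert (p, q) == (2, 2)

# Eigenvalues with real part in (a, b) = (0, 3): just the eigenvalue 2.
a, b = 0.0, 3.0
strip = round(np.trace(signm(A - a*np.eye(n)) - signm(A - b*np.eye(n))).real / 2)
assert strip == 1
```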

## Computing the Matrix Sign Function

What makes the matrix sign function so interesting and useful is that it can be computed directly, without first computing the eigenvalues or eigenvectors of $A$. Roberts noted that the iteration

$\notag X_{k+1} = \displaystyle\frac{1}{2} (X_k + X_k^{-1}), \quad X_0 = A,$

converges quadratically to $\mathrm{sign}(A)$. This iteration is Newton’s method applied to the equation $X^2 = I$, with starting matrix $A$. It is one of the rare circumstances in which explicitly inverting matrices is justified!
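A minimal sketch of the Newton iteration in Python/NumPy (the stopping test and tolerance are ad hoc choices for illustration):

```python
import numpy as np

def sign_newton(A, tol=1e-12, maxit=100):
    """Matrix sign function via the Newton iteration X <- (X + inv(X))/2."""
    X = np.array(A, dtype=float)
    for _ in range(maxit):
        X_new = (X + np.linalg.inv(X)) / 2
        if np.linalg.norm(X_new - X, 1) <= tol * np.linalg.norm(X_new, 1):
            return X_new
        X = X_new
    return X

A = np.array([[2.0, 1.0],
              [0.0, -3.0]])                # eigenvalues 2 and -3
S = sign_newton(A)
assert np.allclose(S, [[1.0, 0.4], [0.0, -1.0]])   # eigenvalues +1 and -1
assert np.allclose(S @ S, np.eye(2))               # S is involutory
```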

Various other iterations are available for computing $\mathrm{sign}(A)$. A matrix multiplication-based iteration is the Newton–Schulz iteration

$\notag X_{k+1} = \displaystyle\frac{1}{2} X_k (3I - X_k^2), \quad X_0 = A.$

This iteration is quadratically convergent if $\|I-A^2\| < 1$ for some subordinate matrix norm. The Newton–Schulz iteration is the $[1/0]$ member of a Padé family of rational iterations

$\notag X_{k+1} = X_k p_{\ell m}^{}\left(I-X_k^2\right) q_{\ell m}^{}\left(I-X_k^2\right)^{-1}, \quad X_0 = A,$

where $p_{\ell m}(\xi)/q_{\ell m}(\xi)$ is the $[\ell/m]$ Padé approximant to $(1-\xi)^{-1/2}$ ($p_{\ell m}$ and $q_{\ell m}$ are polynomials of degrees at most $\ell$ and $m$, respectively). The iteration is globally convergent to $\mathrm{sign}(A)$ for $\ell = m-1$ and $\ell = m$, and for $\ell \ge m+1$ it converges when $\|I-A^2\| < 1$, with order of convergence $\ell+m+1$ in all cases.

Although the rate of convergence of these iterations is at least quadratic, and hence asymptotically fast, it can be slow initially. Indeed for $n = 1$, if $|a| \gg 1$ then the Newton iteration computes $x_1 = (a+1/a)/2 \approx a/2$, and so the early iterations make slow progress towards $\pm 1$. Fortunately, it is possible to speed up convergence with the use of scale parameters. The Newton iteration can be replaced by

$\notag X_{k+1} = \displaystyle\frac{1}{2} \bigl(\mu_kX_k + \mu_k^{-1}X_k^{-1}\bigr), \quad X_0 = A,$

with, for example,

$\notag \mu_k = \sqrt{\|X_k^{-1}\|/\|X_k\|}.$

This parameter $\mu_k$ can be computed at no extra cost.

As an example, we took A = gallery('lotkin',4), which has eigenvalues $1.887$, $-1.980\times10^{-1}$, $-1.228\times10^{-2}$, and $-1.441\times10^{-4}$ to four significant figures. After six iterations of the unscaled Newton iteration, $X_6$ had an eigenvalue $-100.8$, showing that $X_6$ is far from $\mathrm{sign}(A)$, whose eigenvalues are all $\pm 1$. Yet with scaling by $\mu_k$ (using the $1$-norm), after six iterations all the eigenvalues of $X_6$ were within distance $10^{-16}$ of $\pm 1$, and the iteration had converged to within this tolerance.
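This experiment can be reproduced in Python (a sketch assuming NumPy/SciPy; the Lotkin matrix is the Hilbert matrix with its first row set to ones, and the loop allows extra iterations beyond the roughly six needed, since the sign matrix is a fixed point of the scaled iteration):

```python
import numpy as np
from scipy.linalg import hilbert

A = hilbert(4)
A[0, :] = 1.0          # Lotkin matrix: Hilbert matrix with first row of ones

X = A.copy()
for k in range(20):    # scaled Newton; converges in about six steps here
    X_inv = np.linalg.inv(X)
    mu = np.sqrt(np.linalg.norm(X_inv, 1) / np.linalg.norm(X, 1))
    X_new = (mu * X + X_inv / mu) / 2
    if np.linalg.norm(X_new - X, 1) <= 1e-12 * np.linalg.norm(X_new, 1):
        X = X_new
        break
    X = X_new

# One eigenvalue of sign(A) is +1 and three are -1, matching the spectrum of A.
evals = np.sort(np.linalg.eigvals(X).real)
assert np.allclose(evals, [-1.0, -1.0, -1.0, 1.0], atol=1e-6)
assert np.allclose(X @ X, np.eye(4), atol=1e-6)
```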

The Matrix Computation Toolbox contains a MATLAB function signm that computes the matrix sign function. It computes a Schur decomposition then obtains the sign of the triangular Schur factor by a finite recurrence. This function is too expensive for use in applications, but is reliable and is useful for experimentation.

## Relation to Matrix Square Root and Polar Decomposition

The matrix sign function is closely connected with the matrix square root and the polar decomposition. This can be seen through the relations

$\notag \mathrm{sign}\left( \begin{bmatrix} 0 & A \\ I & 0 \end{bmatrix} \right) = \begin{bmatrix} 0 & A^{1/2} \\ A^{-1/2} & 0 \end{bmatrix},$

for $A$ with no eigenvalues on the nonpositive real axis, and

$\notag \mathrm{sign}\left( \begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix} \right) = \begin{bmatrix} 0 & U \\ U^* & 0 \end{bmatrix},$

for nonsingular $A$, where $A = UH$ is a polar decomposition. Among other things, these relations yield iterations for $A^{1/2}$ and $U$ by applying the iterations above to the relevant block $2n\times 2n$ matrix and reading off the (1,2) block.
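For instance, the square root relation can be checked directly (a Python/NumPy/SciPy sketch; `signm` and `sqrtm` are SciPy's sign and principal square root routines, and the test matrix is made up):

```python
import numpy as np
from scipy.linalg import signm, sqrtm

A = np.array([[4.0, 1.0],
              [0.0, 9.0]])            # no eigenvalues on the nonpositive real axis
n = A.shape[0]
Z = np.zeros((n, n))
M = np.block([[Z, A],
              [np.eye(n), Z]])

S = signm(M)
R = S[:n, n:]                         # (1,2) block: the principal square root
assert np.allclose(R, sqrtm(A))
assert np.allclose(S[n:, :n], np.linalg.inv(R))   # (2,1) block: A^{-1/2}
```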

## References

This is a minimal set of references, which contain further useful references within.

# What Is a Fractional Matrix Power?

A $p$th root of an $n\times n$ matrix $A$ is a matrix $X$ such that $X^p = A$, and it can be written $X = A^{1/p}$. For a rational number $r = j/k$ (where $j$ and $k$ are integers), defining $A^r$ is more difficult: is it $(A^j)^{1/k}$ or $(A^{1/k})^j$? These two possibilities can be different even for $n = 1$. More generally, how can we define $A^\alpha$ for an arbitrary real number $\alpha$?

Recall, first, that for a nonzero complex scalar $z$ we define $z^\alpha = \mathrm{e}^{\alpha\log z}$, where $\log$ is the principal logarithm: the one taking values in the strip $\{\, z : -\pi < \mathop{\mathrm{Im}} z \le \pi \,\}$. We can generalize this definition to matrices. For a nonsingular matrix $A$ we define

$\notag A^\alpha = \mathrm{e}^{\alpha\log A}.$

Here the logarithm is the principal matrix logarithm, the matrix function built on the principal scalar logarithm, and so the eigenvalues of $\log A$ lie in the strip $\{\, z : -\pi < \mathop{\mathrm{Im}} z \le \pi \,\}$. When $\alpha \in [-1,1]$, the eigenvalues of $A^\alpha$, which are $\lambda^\alpha$ where $\lambda$ is an eigenvalue of $A$, lie in the segment $\{\, z: -|\alpha|\pi < \arg z \le |\alpha|\pi\,\}$ of the complex plane.

The most important special case is $\alpha = 1/p$ for a positive integer $p$, in which case

$\notag A^{1/p} = \mathrm{e}^{\frac{1}{p}\log A}.$

We can check that

$\notag \bigl(A^{1/p}\bigr)^p = \bigl(\mathrm{e}^{\frac{1}{p}\log A}\bigr)^p = \mathrm{e}^{\log A} = A,$

so the definition does indeed produce a $p$th root. The matrix $A^{1/p}$ is called the principal $p$th root.
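This check can be carried out numerically (a sketch assuming NumPy/SciPy; `expm`, `logm`, and `fractional_matrix_power` are SciPy routines, and the matrix is made up):

```python
import numpy as np
from scipy.linalg import expm, logm, fractional_matrix_power

A = np.array([[4.0, 1.0],
              [0.0, 9.0]])
p = 5
X = expm(logm(A) / p)                      # principal pth root A^{1/p}

assert np.allclose(np.linalg.matrix_power(X, p), A)      # X^p = A
assert np.allclose(X, fractional_matrix_power(A, 1 / p))
```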

Returning to the case of a rational power $r = j/k$, we note that

$\notag A^{j/k} = \mathrm{e}^{\frac{j}{k}\log A} = \Bigl(\mathrm{e}^{\frac{1}{k}\log A}\Bigr)^j = (A^{1/k})^j,$

but $(A^{j})^{1/k}$ can be a different matrix. In general, it is not true that $(A^\alpha)^\beta = (A^\beta)^\alpha$ for real $\alpha$ and $\beta$, although for symmetric positive definite matrices this identity does hold because the eigenvalues are real and positive.
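A concrete instance where the two orders of operations disagree (Python/NumPy/SciPy sketch; the rotation matrix is chosen so that squaring moves the eigenvalue arguments out of the principal range):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

# Rotation through 2*pi/3: eigenvalues exp(+-2*pi*i/3).
c, s = np.cos(2 * np.pi / 3), np.sin(2 * np.pi / 3)
A = np.array([[c, -s],
              [s, c]])

X1 = fractional_matrix_power(A, 2 / 3)                             # A^{2/3}
X2 = np.linalg.matrix_power(fractional_matrix_power(A, 1 / 3), 2)  # (A^{1/3})^2
X3 = fractional_matrix_power(np.linalg.matrix_power(A, 2), 1 / 3)  # (A^2)^{1/3}

assert np.allclose(X1, X2)         # A^{j/k} = (A^{1/k})^j
assert not np.allclose(X1, X3)     # but (A^j)^{1/k} differs here
```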

An integral expression for $A^\alpha$ valid for $\alpha \in(0,1)$ is

$\notag A^\alpha = \displaystyle\frac{\sin(\alpha\pi)} {\alpha\pi} A \int_0^{\infty}(t^{1/\alpha}I+A)^{-1}\,\mathrm{d}t, \quad \alpha \in(0,1).$

Another representation for $A = t (I - B)$ with $\rho(B) < 1$ is given by the binomial expansion

$\notag A^{\alpha} = t^{\alpha} \displaystyle\sum_{j=0}^\infty {\alpha \choose j}(-B)^j, \quad \rho(B) < 1.$

For $2\times 2$ real matrices of the form

$\notag A = \begin{bmatrix} a & b \\ c & a \end{bmatrix}$

with $bc < 0$ there is an explicit formula for $A^\alpha$. It is easy to see that $A$ has eigenvalues $\lambda_{\pm} = a \pm \mathrm{i} d$, where $d = (-bc)^{1/2}$. Let $\theta = \arg(\lambda_+) \in (0,\pi)$ and $r = |\lambda_+|$. It can be shown that

$\notag A^\alpha = \displaystyle\frac{r^\alpha}{d} \begin{bmatrix} d\cos(\alpha\theta) & b\sin(\alpha\theta) \\ c\sin(\alpha\theta) & d\cos(\alpha\theta) \end{bmatrix}.$
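The explicit formula is easily verified against a general-purpose routine (a sketch assuming NumPy/SciPy; the numbers $a$, $b$, $c$, $\alpha$ are made up, with $bc < 0$ as required):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

a, b, c, alpha = 1.0, 2.0, -1.0, 0.4         # bc < 0
d = np.sqrt(-b * c)
r, theta = np.hypot(a, d), np.arctan2(d, a)  # lambda_+ = a + i*d = r*e^{i*theta}

# Explicit formula for A^alpha from the text.
X = (r**alpha / d) * np.array(
    [[d * np.cos(alpha * theta), b * np.sin(alpha * theta)],
     [c * np.sin(alpha * theta), d * np.cos(alpha * theta)]])

A = np.array([[a, b],
              [c, a]])
assert np.allclose(X, fractional_matrix_power(A, alpha))
```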

## Computation

The formula $A^\alpha = \mathrm{e}^{\alpha \log A}$ can be used computationally, but it is somewhat indirect in that one must approximate both the exponential and the logarithm. A more direct algorithm based on the Schur decomposition and Padé approximation of the power function is developed by Higham and Lin (2013). MATLAB code is available from MathWorks File Exchange.

If $A$ is diagonalizable, so that $A = XDX^{-1}$ for some nonsingular $X$ with $D = \mathrm{diag}(\lambda_i)$, then $A^\alpha = X D^\alpha X^{-1} = X \mathrm{diag}(\lambda_i^\alpha) X^{-1}$. This formula is safe to use computationally only if $X$ is well conditioned. For defective (non-diagonalizable) $A$ we can express $A^\alpha$ in terms of the Jordan canonical form, but this expression is not useful computationally because the Jordan canonical form cannot be reliably computed.

## Inverse Function

If $X = A^{1/2}$ then $A = X^2$, by the definition of square root. If $X = A^{\alpha}$ does it follow that $A = X^{1/\alpha}$? Clearly, the answer is “no” in general because, for example, $X = A^2$ does not imply $A = X^{1/2}$.

Using the matrix unwinding function it can be shown that $(A^{\alpha})^{1/\alpha} = A$ for $\alpha \in [-1,1]$. Hence the function $g(A) = A^{1/\alpha}$ is the inverse function of $f(A) = A^{\alpha}$ for $\alpha\in[-1,1]$.

## Backward Error

How can we check the quality of an approximation $X$ to $A^{\alpha}$? For $\alpha = 1/p$ we can check the residual $A - X^p$, but for real $p$ there is no natural residual. Instead we can look at the backward error.

For an approximation $X$ to $A^\alpha$, a backward error is a matrix $\Delta A$ such that $X = (A + \Delta A)^\alpha$. Assume that $A$ and $X$ are nonsingular and that $\alpha \in [-1,1]$. Then, as shown in the previous section, $X = (A + \Delta A)^\alpha$ implies $\Delta A = X^{1/\alpha} - A$. Hence the normwise relative backward error is

$\notag \eta(X) = \displaystyle\frac{ \|X^{1/\alpha} - A \|}{\|A\|}, \quad \alpha\in[-1,1].$
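The backward error is straightforward to evaluate in practice (Python/NumPy/SciPy sketch; the Frobenius norm, the test matrix, and the perturbation size are arbitrary choices for illustration):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def backward_error(A, X, alpha):
    """Normwise relative backward error of X as an approximation to A^alpha."""
    return (np.linalg.norm(fractional_matrix_power(X, 1 / alpha) - A)
            / np.linalg.norm(A))

A = np.array([[4.0, 1.0],
              [0.0, 9.0]])
alpha = 0.3
X = fractional_matrix_power(A, alpha)
assert backward_error(A, X, alpha) < 1e-12   # computed power: tiny backward error

X_bad = X + 1e-4                             # perturb every entry
assert backward_error(A, X_bad, alpha) > 1e-5
```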

## Applications with Stochastic Matrices

An important application of fractional matrix powers is in discrete-time Markov chains, which arise in areas including finance and medicine. A transition matrix for a Markov process is a matrix whose $(i,j)$ element is the probability of moving from state $i$ to state $j$ over a time step. It has nonnegative entries and the rows sum to $1$, so it is a stochastic matrix. In practice, a transition matrix may be estimated for a certain time period, say one year, but a transition matrix for a shorter period, say one month, may be needed. If $A$ is a transition matrix for a time period $p$ then a stochastic $p$th root of $A$ is a transition matrix for a time period a factor $p$ smaller. Therefore $p = 12$ (years to months) and $p = 7$ (weeks to days) are among the values of interest. Unfortunately, $A^{1/p}$ is not necessarily a stochastic matrix. Moreover, $A$ can have a stochastic $p$th root that is not $A^{1/p}$. For example, the stochastic matrix

$\notag A = \left[\begin{array}{ccc} 0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0\\ \end{array} \right]$

has principal square root

$\notag A^{1/2} = \frac{1}{3} \left[\begin{array}{rrr} 2 & 2 & -1 \\ -1 & 2 & 2 \\ 2 & -1 & 2 \end{array} \right],$

but $A^{1/2}$ is not stochastic because of the negative elements. The square root

$\notag Y = \left[\begin{array}{ccc} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0\\ \end{array} \right]$

is stochastic, though. (Interestingly, $A$ is also a square root of $Y$!)
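These facts can be checked numerically (a sketch assuming NumPy/SciPy; `sqrtm` computes the principal square root):

```python
import numpy as np
from scipy.linalg import sqrtm

A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
X = sqrtm(A).real                          # principal square root A^{1/2}

assert np.allclose(X, np.array([[2, 2, -1],
                                [-1, 2, 2],
                                [2, -1, 2]]) / 3)
assert X.min() < 0                         # negative entries: not stochastic

Y = np.array([[0., 0., 1.],
              [1., 0., 0.],
              [0., 1., 0.]])
assert np.allclose(Y @ Y, A)               # Y is a stochastic square root of A
assert np.allclose(A @ A, Y)               # and A is a square root of Y
```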

A wide variety of configurations is possible as regards existence, nature (primary or nonprimary), and number of stochastic roots. Higham and Lin (2011) delineate the various possibilities that can arise. They note that the stochastic lower triangular matrix

$\notag A_n = \begin{bmatrix} 1 & & & \\ \frac{1}{2} & \frac{1}{2} & & \\ \vdots & \vdots &\ddots & \\ \frac{1}{n} & \frac{1}{n} &\cdots & \frac{1}{n} \end{bmatrix}$

has a stochastic $p$th root, namely $A_n^{1/p}$, for all $p$. For example, to three significant figures,

$\notag A_6^{1/3} = \begin{bmatrix} 1.000 & & & & & \\ 0.206 & 0.794 & & & & \\ 0.106 & 0.201 & 0.693 & & & \\ 0.069 & 0.111 & 0.190 & 0.630 & & \\ 0.050 & 0.075 & 0.109 & 0.181 & 0.585 & \\ 0.039 & 0.056 & 0.076 & 0.107 & 0.172 & 0.550 \\ \end{bmatrix}.$

The existence of stochastic roots of stochastic matrices is connected with the embeddability problem, which asks when a nonsingular stochastic matrix $A$ can be written $A = \mathrm{e}^Q$ for some $Q$ with $q_{ij} \ge 0$ for $i\ne j$ and $\sum_{j=1}^n q_{ij} = 0$, $i=1\colon n$. Kingman showed in 1962 that this condition holds if and only if for every positive integer $p$ there exists a stochastic $X_p$ such that $A = X_p^p$.

## Applications in Fractional Differential Equations

Fractional matrix powers arise in the numerical solution of differential equations of fractional order, especially partial differential equations involving fractional Laplace operators. Here, the problem may be one of computing $A^\alpha b$, in which case for large problems it is preferable to directly approximate $A^\alpha b$, for example by Krylov methods or quadrature methods, rather than to explicitly compute $A^\alpha$.

## References

This is a minimal set of references, which contain further useful references within.

# What Is the Matrix Unwinding Function?

Care is needed when dealing with multivalued functions because identities that hold for positive scalars can fail in the complex plane. For example, it is not always true that $(z-1)^{1/2}(z+1)^{1/2} = (z^2-1)^{1/2}$ or $\log(z_1 z_2) = \log z_1 + \log z_2$ for all $z,z_1,z_2\in\mathbb{C}$. Here, the square root is the principal square root (the one lying in the right half-plane) and the logarithm is the principal logarithm (the one with imaginary part in $(-\pi,\pi]$).

A powerful tool for dealing with multivalued complex functions is the unwinding number, defined for $z\in\mathbb{C}$ by

$\notag \mathcal{U}(z) = \displaystyle\frac{z - \log \mathrm{e}^z}{2\pi \mathrm{i}}.$

The unwinding number provides a correction term for the putative identity $\log \mathrm{e}^z = z$, in that

$\notag \qquad\qquad\qquad\qquad z = \log \mathrm{e}^z + 2\pi \mathrm{i}\, \mathcal{U}(z) \qquad\qquad\qquad\qquad (*)$

for all $z$.

A useful formula for the unwinding number is

$\notag \mathcal{U}(z) = \left\lceil\displaystyle\frac{\mathrm{Im} z - \pi}{2\pi}\right\rceil,$

where $\lceil\cdot\rceil$ is the ceiling function, which returns the smallest integer greater than or equal to its argument. It follows that $\mathcal{U}(z) = 0$ if and only if $\mathrm{Im} z \in (-\pi, \pi]$. Hence $\log \mathrm{e}^z = z$ if and only if $\mathrm{Im} z \in (-\pi, \pi]$.
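The two expressions for $\mathcal{U}(z)$ are easy to compare numerically (a sketch in Python/NumPy; the sample points are made up):

```python
import numpy as np

def unwinding_def(z):
    """U(z) = (z - log(e^z)) / (2*pi*i), rounded to the nearest integer."""
    return int(round(((z - np.log(np.exp(z))) / (2j * np.pi)).real))

def unwinding_ceil(z):
    """Equivalent ceiling formula: U(z) = ceil((Im z - pi) / (2*pi))."""
    return int(np.ceil((z.imag - np.pi) / (2 * np.pi)))

for z in [1 + 2j, 3 - 5j, -2 + 9j, 0.5 + 14j]:
    assert unwinding_def(z) == unwinding_ceil(z)

assert unwinding_ceil(1 + 2j) == 0         # Im z in (-pi, pi]: log(e^z) = z
assert unwinding_ceil(3 - 5j) == -1
assert unwinding_ceil(-2 + 9j) == 1
```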

The unwinding number provides correction terms for various identities. For example, for $z_1,z_2\in\mathbb{C}$, replacing $z$ by $\log z_1 \pm \log z_2$ in $(*)$, we have

$\notag \begin{aligned} \log z_1 \pm \log z_2 &= \log \bigl( \mathrm{e}^{\log z_1 \pm \log z_2} \bigr) + 2\pi\mathrm{i} \, \mathcal{U}(\log z_1 \pm \log z_2)\\ &= \log \bigl(\mathrm{e}^{\log z_1} \mathrm{e}^{\pm \log z_2} \bigr) + 2\pi\mathrm{i} \, \mathcal{U}(\log z_1 \pm \log z_2)\\ &= \log \bigl( z_1 z_2^{\pm 1} \bigr) + 2\pi\mathrm{i} \, \mathcal{U}(\log z_1 \pm \log z_2). \end{aligned}$

This gives the identities

$\notag \begin{aligned} \log (z_1 z_2) &= \log z_1 + \log z_2 - 2\pi \mathrm{i}\, \mathcal{U}(\log z_1 +\log z_2), \\ \log (z_1/z_2) &= \log z_1 - \log z_2 - 2\pi \mathrm{i}\,\mathcal{U}(\log z_1 - \log z_2). \qquad (\#) \end{aligned}$

Note that in textbooks one can find identities such as $\log (z_1 z_2) = \log z_1 + \log z_2$, in which each occurrence of $\log$ is interpreted as a possibly different branch. For computational purposes we want formulas that contain a specific branch, usually the principal branch.

An application of the unwinding number to matrix functions is in computing the logarithm of a $2\times 2$ upper triangular matrix. For any function $f$, we have

$\notag f\left( \begin{bmatrix} \lambda_1 & t_{12} \\ 0 & \lambda_2 \end{bmatrix} \right) = \begin{bmatrix} f(\lambda_1) & t_{12} f[\lambda_1,\lambda_2] \\ 0 & f(\lambda_2) \end{bmatrix},$

where the divided difference

$\notag f[\lambda_1,\lambda_2] = \begin{cases} \displaystyle\frac{f(\lambda_2)-f(\lambda_1)}{\lambda_2-\lambda_1}, & \lambda_1 \ne \lambda_2, \\ f'(\lambda_1), & \lambda_1 = \lambda_2. \end{cases}$

When $\lambda_1 \approx \lambda_2$ this formula suffers from numerical cancellation. For the logarithm, we can rewrite it, using $(\#)$, as

$\notag \begin{aligned} \log\lambda_2 - \log\lambda_1 &= \log \left(\frac{\lambda_2}{\lambda_1}\right) + 2\pi \mathrm{i} \, \mathcal{U}(\log \lambda_2 - \log \lambda_1) \\ &= \log \left(\frac{1+z}{1-z}\right) + 2\pi \mathrm{i}\, \mathcal{U}(\log \lambda_2 - \log \lambda_1), \end{aligned}$

where $z = (\lambda_2-\lambda_1)/(\lambda_2+\lambda_1)$. Using the hyperbolic arc tangent, defined by

$\notag \mathrm{atanh}(z) = \frac{1}{2}\log\left( \displaystyle\frac{1+z}{1-z} \right),$

we obtain

$\notag f[\lambda_1,\lambda_2] = \displaystyle\frac{2\mathrm{atanh}(z) + 2\pi \mathrm{i}\, \mathcal{U}(\log \lambda_2 - \log \lambda_1)}{\lambda_2-\lambda_1}, \quad \lambda_1 \ne \lambda_2.$

Assuming that we have an accurate $\mathrm{atanh}$ function this formula will provide an accurate value of $f[\lambda_1,\lambda_2]$ provided that $z$ is not too close to $\pm 1$ (the singularities of $\mathrm{atanh}$) and not too large. This formula is used in the MATLAB function logm.
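A sketch of this divided difference formula in Python/NumPy (the eigenvalue pairs are made up: one pair tests the cancellation-prone close-eigenvalue case, the other a case where the unwinding correction is essential):

```python
import numpy as np

def unwinding(z):
    """U(z) via the ceiling formula."""
    return int(np.ceil((z.imag - np.pi) / (2 * np.pi)))

def log_divided_difference(l1, l2):
    """f[l1, l2] for f = log, via atanh plus an unwinding correction."""
    z = (l2 - l1) / (l2 + l1)
    u = unwinding(np.log(l2) - np.log(l1))
    return (2 * np.arctanh(z) + 2j * np.pi * u) / (l2 - l1)

# Close eigenvalues: avoids the cancellation in log(l2) - log(l1).
l1, l2 = 1.0 + 0j, 1.0 + 1e-9 + 0j
assert abs(log_divided_difference(l1, l2) - 1.0) < 1e-6

# Arguments near +/-pi: here U = -1 and the correction term is essential.
l1, l2 = np.exp(3j), np.exp(-3j)
dd = log_divided_difference(l1, l2)
assert abs(dd - (np.log(l2) - np.log(l1)) / (l2 - l1)) < 1e-12
```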

## Matrix Unwinding Function

The unwinding number leads to the matrix unwinding function, defined for $A\in\mathbb{C}^{n\times n}$ by

$\notag \mathcal{U}(A) = \displaystyle\frac{A - \log \mathrm{e}^A}{2\pi \mathrm{i}}.$

Here, $\log$ is the principal matrix logarithm, defined by the property that its eigenvalues have imaginary parts in the interval $(-\pi,\pi]$. It can be shown that $\mathcal{U}(A) = 0$ if and only if the imaginary parts of all the eigenvalues of $A$ lie in the interval $(-\pi, \pi]$. Furthermore, $\mathcal{U}(A)$ is a diagonalizable matrix with integer eigenvalues.

As an example, the matrix

$\notag A = \left[\begin{array}{rrrr} 3 & 1 & -1 & -9\\ -1 & 3 & 9 & -1\\ -1 & -9 & 3 & 1\\ 9 & -1 & -1 & 3 \end{array}\right], \quad \Lambda(A) = \{ 2\pm 8\mathrm{i}, 4 \pm 10\mathrm{i} \}$

has matrix unwinding function

$\notag X = \mathcal{U}(A) = \mathrm{i} \left[\begin{array}{rrrr} 0 & -\frac{1}{2} & 0 & \frac{3}{2}\\ \frac{1}{2} & 0 & -\frac{3}{2} & 0\\ 0 & \frac{3}{2} & 0 & -\frac{1}{2}\\ -\frac{3}{2} & 0 & \frac{1}{2} & 0 \end{array}\right], \quad \Lambda(X) = \{ \pm 1, \pm 2 \}.$

In general, if $A$ is real then $\mathcal{U}(A)$ is pure imaginary as long as $A$ has no eigenvalue with imaginary part that is an odd multiple of $\pi$.

The matrix unwinding function is useful for providing correction terms for matrix identities involving multivalued functions. Here are four useful matrix identities, along with cases in which the correction term vanishes. See Aprahamian and Higham (2014) for proofs.

• For nonsingular $A$ and $\alpha,\beta\in\mathbb{C}$,

$\notag (A^\alpha)^{\beta} = A^{\alpha\beta} \mathrm{e}^{-2\beta\pi \mathrm{i}\, \mathcal{U}(\alpha\log A)}.$

If $\beta$ is an integer then the correction term is $I$. If $\alpha\in(-1,1]$ and $\beta = 1/\alpha$ then $\mathcal{U}(\alpha\log A) = 0$ and so

$\notag (A^\alpha)^{1/\alpha} = A, \quad \alpha \in [-1,1],$

and this equation is obviously true for $\alpha = -1$, too.

• If $A$ and $B$ are nonsingular and $AB = BA$ then

$\notag \log(AB^{\pm1}) = \log A \pm \log B - 2\pi \mathrm{i}\, \mathcal{U}(\log A \pm \log B).$

If $\arg\lambda_i + \arg\mu_i \in(-\pi,\pi]$ for every eigenvalue $\lambda_i$ of $A$ and the corresponding eigenvalue $\mu_i$ of $B$ (there is a correspondence because of the commutativity of $A$ and $B$, which implies that they are simultaneously unitarily triangularizable), then $\log(AB^{\pm 1}) = \log A \pm \log B$.

• If $A$ and $B$ are nonsingular and $AB = BA$ then for any $\alpha\in\mathbb{C}$,

$\notag (AB)^\alpha = A^\alpha B^\alpha \mathrm{e}^{-2\pi\, \alpha \mathrm{i}\, \mathcal{U}(\log A + \log B)}.$

If $\alpha$ is an integer or the eigenvalues of $A$ and $B$ have arguments in $(-\pi/2,\pi/2]$ then $(AB)^\alpha = A^\alpha B^\alpha$.

• For nonsingular $A$ and $\alpha\in\mathbb{C}$,

$\notag \log A^{\alpha} = \alpha \log A - 2\pi \mathrm{i}\, \mathcal{U}(\alpha \log A).$

If $\alpha\in(-1,1]$ then $\log A^{\alpha} = \alpha \log A$, and this equation also holds for $\alpha = -1$ if $A$ has no negative eigenvalues.

The matrix unwinding function can be computed by an adaptation of the Schur–Parlett algorithm. The algorithm computes a Schur decomposition and reorders it into a block form with eigenvalues having the same unwinding number in the same diagonal block. The unwinding function of each diagonal block is then a multiple of the identity and the off-diagonal blocks are obtained by the block Parlett recurrence. This approach gives more accurate results than directly evaluating $\mathcal{U}(A)$ from its definition in terms of the matrix logarithm and matrix exponential. A MATLAB code unwindm is available at https://github.com/higham/unwinding.

## Matrix Argument Reduction

The matrix unwinding function can be of use computationally for reducing the size of the imaginary parts of the eigenvalues of a matrix. The function

$\notag \mathrm{mod}(A) = A - 2\pi\mathrm{i}\, \mathcal{U}(A)$

has eigenvalues $\lambda$ with $\mathrm{Im} \lambda \in(-\pi,\pi]$. Using $\mathrm{mod}(A)$ in place of $A$ can be useful in computing the matrix exponential by the scaling and squaring algorithm because $\mathrm{e}^A = \mathrm{e}^{\mathrm{mod}(A)}$ and $\mathrm{mod}(A)$ can have a much smaller norm than $A$, giving potential reductions in cost. Combined with the Schur decomposition-based algorithm for $\mathcal{U}(A)$ mentioned above, this idea leads to a numerical method for $\mathrm{e}^A$. See Aprahamian and Higham (2014) for details.
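A small illustration of argument reduction (a Python/NumPy/SciPy sketch; $\mathcal{U}(A)$ is evaluated directly from its definition via `expm` and `logm`, which is fine for this well-behaved example even though the Schur-based algorithm is preferred in general, and the matrix is made up):

```python
import numpy as np
from scipy.linalg import expm, logm

def unwinding_matrix(A):
    """Matrix unwinding function U(A) = (A - log(e^A)) / (2*pi*i)."""
    return (A - logm(expm(A))) / (2j * np.pi)

A = np.array([[0.0, 9.0],
              [-9.0, 0.0]])                # eigenvalues +/- 9i
U = unwinding_matrix(A)
A_mod = A - 2j * np.pi * U                 # mod(A): eigenvalues +/- (9 - 2*pi)i

assert np.allclose(expm(A_mod), expm(A))   # e^{mod(A)} = e^A
assert np.linalg.norm(A_mod) < np.linalg.norm(A)   # smaller norm
assert np.allclose(np.sort(np.linalg.eigvals(U).real), [-1.0, 1.0])
```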

## Round Trip Relations

If you apply a matrix function and then its inverse do you get back to where you started, that is, is $f^{-1}(f(z)) = z$? In principle yes, but if the inverse is multivalued the answer is not immediate. The matrix unwinding function is useful for analyzing such round trip relations. As an example, if $A$ has no eigenvalues with real part of the form $k \pi$ for an integer $k$, then

$\notag \mathrm{acos}(\cos A) = \bigl(A-2\pi\,\mathcal{U} (\mathrm{i} A)\bigr)\mathrm{sign} \bigl(A-2\pi\mathcal{U}( \mathrm{i} A)\bigr).$

Here, $\mathrm{acos}$ is the principal arc cosine defined in Aprahamian and Higham (2016), where this result and analogous results for the arc sine, arc hyperbolic cosine, and arc hyperbolic sine are derived; and $\mathrm{sign}$ is the matrix sign function.

## History

The unwinding number was introduced by Corless, Hare and Jeffrey in 1996, to help implement computer algebra over the complex numbers. It was generalized to the matrix case by Aprahamian and Higham (2014).

## References

This is a minimal set of references, which contain further useful references within.

# What Is the Matrix Logarithm?

A logarithm of a square matrix $A$ is a matrix $X$ such that $\mathrm{e}^X = A$, where $\mathrm{e}^X$ is the matrix exponential. Just as in the scalar case, the matrix logarithm is not unique, since if $\mathrm{e}^X = A$ then $\mathrm{e}^{X+2k\pi\mathrm{i}I} = A$ for any integer $k$. However, for matrices the nonuniqueness is complicated by the presence of repeated eigenvalues. For example, the matrix

$\notag X(t) = 2\pi \mathrm{i} \begin{bmatrix}1 & -2t & 2t^2 \\ 0 & -1 & -t \\ 0 & 0 & 0 \end{bmatrix}$

is an upper triangular logarithm of the $3\times 3$ identity matrix $I_3$ for any $t$, whereas the obvious logarithms are the diagonal matrices $2\pi\mskip1mu\mathrm{i}\mskip1mu\mathrm{diag}(k_1,k_2,k_3)$, for integers $k_1$, $k_2$, and $k_3$. Notice that the repeated eigenvalue $1$ of $I_3$ has been mapped to the distinct eigenvalues $2\pi\mathrm{i}$, $-2\pi\mathrm{i}$, and $0$ in $X(t)$. This is characteristic of nonprimary logarithms, and in some applications such strange logarithms may be required—an example is the embeddability problem for Markov chains.

An important question is whether a nonsingular real matrix has a real logarithm. The answer is that it does if and only if the Jordan canonical form contains an even number of Jordan blocks of each size for every negative eigenvalue. This means, in particular, that if $A$ has an unrepeated negative eigenvalue then it does not have a real logarithm. Minus the $n\times n$ identity matrix has a real logarithm for even $n$ but not for odd $n$. Indeed, for $n=2$,

$\notag X = \pi\left[\begin{array}{@{}rr@{}} 0 & 1\\ -1 & 0 \end{array}\right]$

satisfies $\mathrm{e}^X = -I_2$, as does $Y = ZXZ^{-1}$ for any nonsingular $Z$, since $\mathrm{e}^Y = Z\mathrm{e}^X Z^{-1} = -Z Z^{-1} = -I_2$.
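Both claims are quick to verify (a sketch assuming NumPy/SciPy; the particular $Z$ is an arbitrary nonsingular matrix chosen for illustration):

```python
import numpy as np
from scipy.linalg import expm

X = np.pi * np.array([[0.0, 1.0],
                      [-1.0, 0.0]])
assert np.allclose(expm(X), -np.eye(2))    # a real logarithm of -I_2

Z = np.array([[1.0, 2.0],
              [0.0, 1.0]])                 # any nonsingular Z works
Y = Z @ X @ np.linalg.inv(Z)
assert np.allclose(expm(Y), -np.eye(2))    # similarity gives more real logarithms
```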

For most practical purposes it is the principal logarithm that is of interest, defined for any nonsingular matrix with no negative real eigenvalues as the unique logarithm whose eigenvalues lie in the strip $\{\, z : -\pi < \mathrm{Im}(z) < \pi \,\}$. From this point on we assume that $A$ has no negative eigenvalues and that the logarithm is the principal logarithm.

Various explicit representations of the logarithm are available, including

$\notag \begin{aligned} \log A &= \int_0^1 (A-I)\bigl[ t(A-I) + I \bigr]^{-1} \,\mathrm{d}t, \\ \log(I+X) &= X - \frac{X^2}{2} + \frac{X^3}{3} - \frac{X^4}{4} + \cdots, \quad \rho(X)<1, \end{aligned}$

where the spectral radius $\rho(X) = \max\{\, |\lambda| : \lambda~\textrm{is an eigenvalue of}~X\,\}$. A useful relation is $\log (A^{\alpha}) = \alpha \log A$ for $\alpha\in[-1,1]$, with important special cases $\log (A^{-1}) = - \log A$ and $\log (A^{1/2}) = \frac{1}{2} \log A$ (where the square root is the principal square root). Recurring the latter expression gives, for any positive integer $k$,

$\notag \log(A) = 2^k \log\bigl(A^{1/2^k}\bigr).$

This formula is the basis for the inverse scaling and squaring method for computing the logarithm, which chooses $k$ so that $E = A^{1/2^k} - I$ is small enough that $\log(I + E)$ can be efficiently approximated by Padé approximation. The MATLAB function logm uses the inverse scaling and squaring method together with a Schur decomposition.
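The following toy version of inverse scaling and squaring (a Python/NumPy/SciPy sketch; for simplicity it approximates $\log(I+E)$ by a truncated Mercator series rather than the Padé approximant used in practice, and the values of $k$ and the number of terms are arbitrary) agrees with SciPy's `logm`:

```python
import numpy as np
from scipy.linalg import sqrtm, logm

def log_iss(A, k=6, terms=25):
    """Toy inverse scaling and squaring: log(A) = 2^k log(A^(1/2^k)),
    with log(I + E) summed as a truncated series."""
    n = A.shape[0]
    X = A
    for _ in range(k):
        X = sqrtm(X)                   # k repeated principal square roots
    E = X - np.eye(n)                  # now norm(E) is small
    L = np.zeros_like(E)
    E_pow = np.eye(n)
    for j in range(1, terms + 1):      # E - E^2/2 + E^3/3 - ...
        E_pow = E_pow @ E
        L += ((-1) ** (j + 1) / j) * E_pow
    return 2**k * L                    # undo the square roots

A = np.array([[4.0, 1.0],
              [0.0, 9.0]])
assert np.allclose(log_iss(A), logm(A))
```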

## References

These references contain more on the facts above, as well as further references.

# What Is a Matrix Function?

If you give an array as the argument to a mathematical function in a programming language or problem solving environment you are likely to receive the result of applying the function to each component of the array. For example, here MATLAB exponentiates the integers from 0 to 3:

```
>> A = [0 1; 2 3], X = exp(A)
A =
     0     1
     2     3
X =
   1.0000e+00   2.7183e+00
   7.3891e+00   2.0086e+01
```


When the array represents a matrix this componentwise evaluation is not useful, as it does not obey the rules of matrix algebra. To see what properties we might require, consider the matrix square root. This function is notable historically because Cayley considered it in the 1858 paper in which he introduced matrix algebra.

A square root of an $n\times n$ matrix $A$ is a matrix $X$ such that $X^2 = A$. Any square root $X$ satisfies

$AX = X^2 X = X X^2 = XA,$

so $X$ commutes with $A$. Also, $A^* = (X^2)^* = (X^*)^2$, so $X^*$ is a square root of $A^*$ (here, the superscript $*$ denotes the conjugate transpose).

Furthermore, for any nonsingular matrix $Z$ we have

$(Z^{-1}XZ)^2 = Z^{-1}X^2 Z = Z^{-1}A Z.$

If we choose $Z$ as a matrix that takes $A$ to its Jordan canonical form $J$ then we have $(Z^{-1}XZ)^2 = J$, so that $Z^{-1}XZ$ is a square root of $J$, or in other words $X$ can be expressed as $X = Z \sqrt{J} Z^{-1}$.

Generalizing from these properties of the matrix square root it is reasonable to ask that a function $f(A)$ of a square matrix $A$ satisfies

• $f(A)$ commutes with $A$,
• $f(A^*) = f(A)^*$,
• $f(A) = Z f(J)Z^{-1}$, where $A$ has the Jordan canonical form $A = ZJZ^{-1}$.

(These are of course not the only requirements; indeed $f(A) \equiv I$ satisfies all three conditions.)

Many definitions of $f(A)$ can be given that satisfy these and other natural conditions. We will describe just three definitions (a notable omission is a definition in terms of polynomial interpolation).

## Power Series

For a function $f$ that has a power series expansion we can define $f(A)$ by substituting $A$ for the variable. It can be shown that the matrix series converges for matrices whose eigenvalues lie within the radius of convergence of the scalar power series. For example,

\begin{aligned} \cos(A) &= I - \displaystyle\frac{A^2}{2!} + \frac{A^4}{4!} - \frac{A^6}{6!} + \cdots,\\ \log(I+A) &= A - \displaystyle\frac{A^2}{2} + \frac{A^3}{3} - \frac{A^4}{4} + \cdots, \quad \rho(A)<1, \end{aligned}

where $\rho$ denotes the spectral radius. This definition is natural for functions that have a power series expansion, but it is rather limited in its applicability.
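To make this concrete, here is a minimal pure-Python sketch (the examples in this post use MATLAB; the helper names here are my own) that sums the cosine series for a small matrix. For a diagonal matrix the result is just the scalar cosine applied to the diagonal entries, which gives an easy sanity check.

```python
import math

def matmul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cosm_taylor(A, terms=20):
    """Approximate cos(A) = I - A^2/2! + A^4/4! - ... by truncation."""
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]  # running sum, starts at I
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # current power A^(2k)
    for k in range(1, terms):
        P = matmul(matmul(P, A), A)            # P = A^(2k)
        c = (-1) ** k / math.factorial(2 * k)  # Taylor coefficient
        S = [[S[i][j] + c * P[i][j] for j in range(n)] for i in range(n)]
    return S

# For a diagonal matrix the result is cos applied to the diagonal entries.
A = [[0.5, 0.0], [0.0, 1.0]]
C = cosm_taylor(A)
print(C[0][0], C[1][1])  # approx cos(0.5) and cos(1.0)
```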

## Jordan Canonical Form Definition

If $A$ has the Jordan canonical form

$Z^{-1}AZ = J = \mathrm{diag}(J_1, J_2, \dots, J_p),$

where

$J_k = J_k(\lambda_k) = \begin{bmatrix} \lambda_k & 1 & & \\ & \lambda_k & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_k \end{bmatrix} = \lambda_k I + N \in \mathbb{C}^{m_k\times m_k},$

then

$f(A) := Z f(J) Z^{-1} = Z \mathrm{diag}(f(J_k)) Z^{-1},$

where

$f(J_k) := \begin{bmatrix} f(\lambda_k) & f'(\lambda_k) & \dots & \displaystyle\frac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!} \\ & f(\lambda_k) & \ddots & \vdots \\ & & \ddots & f'(\lambda_k) \\ & & & f(\lambda_k) \end{bmatrix}.$

Notice that $f(J_k)$ has the derivatives of $f$ along its diagonals in the upper triangle. Write $J_k = \lambda_k I + N$, where $N$ is zero apart from a superdiagonal of 1s. The formula for $f(J_k)$ is just the Taylor series expansion

$f(J_k) = f(\lambda_k I + N) = f(\lambda_k)I + f'(\lambda_k)N + \displaystyle\frac{f''(\lambda_k)}{2!}N^2 + \cdots + \displaystyle\frac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!}N^{m_k-1}.$

The series is finite because $N^{m_k} = 0$: as $N$ is powered up the superdiagonal of 1s moves towards the right-hand corner until it eventually disappears.

An immediate consequence of this formula is that $f(A)$ is defined only if the necessary derivatives exist: for each eigenvalue $\lambda_k$ we need the existence of the derivatives at $\lambda_k$ of order up to one less than the size of the largest Jordan block in which $\lambda_k$ appears.
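As a check on the formula for $f(J_k)$, the following pure-Python sketch (helper names my own) builds $\exp(J_k)$ from the derivative formula, using the fact that every derivative of $\exp$ at $\lambda$ equals $\mathrm{e}^\lambda$, and compares it with a plain truncated exponential series applied to the full block.

```python
import math

def expm_jordan_block(lam, m):
    """exp of an m-by-m Jordan block J = lam*I + N via the finite Taylor
    series in N: all derivatives of exp at lam equal exp(lam).
    Entry (i, j) of f(J) is f^{(j-i)}(lam) / (j-i)! for j >= i, else 0."""
    e = math.exp(lam)
    return [[e / math.factorial(j - i) if j >= i else 0.0 for j in range(m)]
            for i in range(m)]

F = expm_jordan_block(2.0, 3)

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Compare with a plain truncated series exp(J) = sum_k J^k / k!.
J = [[2.0, 1.0, 0.0], [0.0, 2.0, 1.0], [0.0, 0.0, 2.0]]
S = [[float(i == j) for j in range(3)] for i in range(3)]
P = [[float(i == j) for j in range(3)] for i in range(3)]
for k in range(1, 30):
    P = matmul(P, J)
    S = [[S[i][j] + P[i][j] / math.factorial(k) for j in range(3)] for i in range(3)]
print(F[0][2], S[0][2])  # both approx exp(2)/2
```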

When $A$ is a diagonalizable matrix the definition simplifies considerably: if $A = ZDZ^{-1}$ with $D = \mathrm{diag}(\lambda_i)$ then $f(A) = Zf(D)Z^{-1} = Z \mathrm{diag}(f(\lambda_i)) Z^{-1}$.
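For the diagonalizable case, here is a small pure-Python illustration (an illustrative sketch, not library code) using a $2\times 2$ symmetric matrix whose eigendecomposition is known in closed form.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# A = [[2, 1], [1, 2]] has eigenvalues 1 and 3 with orthonormal eigenvectors
# (1, -1)/sqrt(2) and (1, 1)/sqrt(2), so A = Z D Z^T with D = diag(1, 3).
s = 1.0 / math.sqrt(2.0)
Z = [[s, s], [-s, s]]
Zt = [[Z[j][i] for j in range(2)] for i in range(2)]  # Z orthogonal: Z^{-1} = Z^T

def fun_of_A(f):
    """f(A) = Z f(D) Z^T with f applied to the eigenvalues."""
    fD = [[f(1.0), 0.0], [0.0, f(3.0)]]
    return matmul(matmul(Z, fD), Zt)

X = fun_of_A(math.sqrt)  # the principal square root of A
X2 = matmul(X, X)
print(X2)                # approx [[2, 1], [1, 2]]
```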

## Cauchy Integral Formula

For a function $f$ analytic on and inside a closed contour $\Gamma$ that encloses the spectrum of $A$ the matrix $f(A)$ can be defined by the Cauchy integral formula

$f(A) := \displaystyle\frac{1}{2\pi \mathrm{i}} \int_\Gamma f(z) (zI-A)^{-1} \,\mathrm{d}z.$

This formula is mathematically elegant and can be used to provide short proofs of certain theoretical results.
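The formula can also be evaluated numerically. The pure-Python sketch below (names my own) applies the trapezoidal rule, which is spectrally accurate for periodic integrands, on a circular contour enclosing the spectrum; for $f(z)=z^2$ the result should reproduce $A^2$.

```python
import cmath

def inv2(M):
    """Inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def cauchy_fun(A, f, center, radius, npts=64):
    """Approximate f(A) = (1/(2*pi*i)) * contour integral of f(z)(zI - A)^{-1}
    over the circle z = center + radius*e^{i*theta}, by the trapezoidal rule."""
    F = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(npts):
        w = cmath.exp(2j * cmath.pi * k / npts)
        z = center + radius * w
        R = inv2([[z - A[0][0], -A[0][1]], [-A[1][0], z - A[1][1]]])
        for i in range(2):
            for j in range(2):
                # dz = i*radius*w*dtheta cancels the 1/(2*pi*i) up to radius*w/npts
                F[i][j] += radius * w * f(z) * R[i][j] / npts
    return F

A = [[2.0, 1.0], [0.0, 3.0]]  # eigenvalues 2 and 3
F = cauchy_fun(A, lambda z: z * z, center=2.5, radius=2.0)
print(F)  # approx A^2 = [[4, 5], [0, 9]] (tiny imaginary parts from rounding)
```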

## Equivalence of Definitions

These and several other definitions turn out to be equivalent under suitable conditions. This was recognized by Rinehart, who wrote in 1955

“There have been proposed in the literature since 1880 eight distinct definitions of a matric function, by Weyr, Sylvester and Buchheim, Giorgi, Cartan, Fantappiè, Cipolla, Schwerdtfeger and Richter … All of the definitions except those of Weyr and Cipolla are essentially equivalent.”

## Computing a Matrix Function in MATLAB

In MATLAB, matrix functions are distinguished from componentwise array evaluation by the postfix “m” on a function name. Thus expm, logm, and sqrtm are matrix function counterparts of exp, log, and sqrt. Compare the example at the start of this article with

>> A = [0 1; 2 3], X = expm(A)
A =
0     1
2     3
X =
5.2892e+00   8.4033e+00
1.6807e+01   3.0499e+01


The MATLAB function funm calculates $f(A)$ for general matrix functions $f$ (subject to some restrictions).

## References

This is a minimal set of references, which contain further useful references within.

# What Is the Matrix Exponential?

The exponential of a square matrix $A$ is defined by the power series (introduced by Laguerre in 1867)

$\mathrm{e}^A = I + A + \displaystyle\frac{A^2}{2!} + \frac{A^3}{3!} + \cdots.$

That the series converges follows from the convergence of the series for scalars. Various other formulas are available, such as

$\mathrm{e}^A = \displaystyle\lim_{s\to \infty} (I+A/s)^s.$

The matrix exponential is always nonsingular and $(\mathrm{e}^A)^{-1} = \mathrm{e}^{-A}$.

Much interest lies in the connection between $\mathrm{e}^{A+B}$ and $\mathrm{e}^A \mathrm{e}^B$. It is easy to show that $\mathrm{e}^{A+B} = \mathrm{e}^A \mathrm{e}^B$ if $A$ and $B$ commute, but commutativity is not necessary for the equality to hold. Series expansions are available that relate $\mathrm{e}^{A+B}$ to $\mathrm{e}^A \mathrm{e}^B$ for general $A$ and $B$, including the Baker–Campbell–Hausdorff formula and the Zassenhaus formula, both of which involve the commutator $[A,B] = AB - BA$. For Hermitian $A$ and $B$ the inequality $\mathrm{trace}(\mathrm{e}^{A+B}) \le \mathrm{trace}(\mathrm{e}^A \mathrm{e}^B)$ was proved independently by Golden and Thompson in 1965.
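The commuting case is easy to check numerically. Below is a pure-Python sketch in which a crude truncated Taylor series stands in for a proper exponential routine, applied with $B = A^2$, which trivially commutes with $A$.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(A, terms=30):
    """Truncated Taylor series for exp(A); adequate here since ||A|| is small."""
    n = len(A)
    X = [[float(i == j) for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        P = matmul(P, A)
        X = [[X[i][j] + P[i][j] / math.factorial(k) for j in range(n)] for i in range(n)]
    return X

A = [[0.1, 0.4], [0.0, 0.3]]
B = matmul(A, A)  # B = A^2 commutes with A
ApB = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
lhs = expm_taylor(ApB)                        # e^{A+B}
rhs = matmul(expm_taylor(A), expm_taylor(B))  # e^A e^B
print(lhs, rhs)  # agree, since A and B commute
```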

Especially important is the relation

$\mathrm{e}^A = \bigl(\mathrm{e}^{A/2^s}\bigr)^{2^s},$

for integer $s$, which is used in the scaling and squaring method for computing the matrix exponential.
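The following pure-Python sketch (a toy stand-in for the real algorithm, which uses Padé approximants and chooses $s$ from a backward error bound) illustrates the idea: approximate $\mathrm{e}^{A/2^s}$ by a short Taylor series, then square the result $s$ times.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_scaling_squaring(A, s=8, terms=12):
    """Sketch of scaling and squaring: approximate exp(A/2^s) by a truncated
    Taylor series, then square the result s times."""
    n = len(A)
    B = [[A[i][j] / 2 ** s for j in range(n)] for i in range(n)]
    X = [[float(i == j) for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        P = matmul(P, B)
        X = [[X[i][j] + P[i][j] / math.factorial(k) for j in range(n)] for i in range(n)]
    for _ in range(s):  # undo the scaling: exp(A) = (exp(A/2^s))^(2^s)
        X = matmul(X, X)
    return X

# Nilpotent example with a known exponential: exp([[0,1],[0,0]]) = [[1,1],[0,1]].
E = expm_scaling_squaring([[0.0, 1.0], [0.0, 0.0]])
print(E)
```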

Another important property of the matrix exponential is that it maps skew-symmetric matrices to orthogonal ones. Indeed if $A = - A^T$ then

$\bigl(\mathrm{e}^A\bigr)^{-1} = \mathrm{e}^{-A} = \mathrm{e}^{A^T} = \bigl(\mathrm{e}^A\bigr)^T.$

This is a special case of the fact that the exponential maps elements of a Lie algebra into the corresponding Lie group.
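A quick numerical illustration (a pure-Python sketch, with a truncated Taylor series standing in for expm): exponentiate a $2\times2$ skew-symmetric matrix and verify that the result is orthogonal. Here the exponential is the rotation $\begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix}$.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(A, terms=25):
    """Truncated Taylor series for exp(A); adequate here since ||A|| is small."""
    n = len(A)
    X = [[float(i == j) for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        P = matmul(P, A)
        X = [[X[i][j] + P[i][j] / math.factorial(k) for j in range(n)] for i in range(n)]
    return X

t = 0.7
A = [[0.0, t], [-t, 0.0]]  # skew-symmetric: A = -A^T
Q = expm_taylor(A)         # the rotation [[cos t, sin t], [-sin t, cos t]]
Qt = [[Q[j][i] for j in range(2)] for i in range(2)]
print(matmul(Qt, Q))       # approx identity, confirming Q is orthogonal
```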

The matrix exponential plays a fundamental role in linear ordinary differential equations (ODEs). The vector ODE

$\displaystyle\frac{dy}{dt} = A y, \quad y(0) = c$

has solution $y(t) = \mathrm{e}^{At} c$, while the solution of the ODE in $n \times n$ matrices

$\displaystyle\frac{dY}{dt} = AY + YB, \quad Y(0) = C$

is $Y(t) = \mathrm{e}^{At}C\mathrm{e}^{Bt}$.
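These solution formulas can be checked numerically. The sketch below (pure Python, with a truncated-series exponential standing in for a proper expm, and matrices chosen arbitrarily for illustration) compares a central difference of $Y(t) = \mathrm{e}^{At}C\mathrm{e}^{Bt}$ with the right-hand side $AY + YB$.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(A, terms=30):
    """Truncated Taylor series for exp(A); fine for the small norms used here."""
    n = len(A)
    X = [[float(i == j) for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        P = matmul(P, A)
        X = [[X[i][j] + P[i][j] / math.factorial(k) for j in range(n)] for i in range(n)]
    return X

A = [[0.3, 1.0], [0.0, -0.2]]
B = [[0.1, 0.0], [0.5, 0.4]]
C = [[1.0, 2.0], [3.0, 4.0]]

def Y(t):
    """Y(t) = exp(A t) C exp(B t)."""
    eAt = expm_taylor([[A[i][j] * t for j in range(2)] for i in range(2)])
    eBt = expm_taylor([[B[i][j] * t for j in range(2)] for i in range(2)])
    return matmul(matmul(eAt, C), eBt)

# Central difference of Y at t = 0.5 versus the right-hand side A*Y + Y*B.
t, h = 0.5, 1e-5
Yp, Ym, Yt = Y(t + h), Y(t - h), Y(t)
lhs = [[(Yp[i][j] - Ym[i][j]) / (2 * h) for j in range(2)] for i in range(2)]
AY = matmul(A, Yt)
YB = matmul(Yt, B)
rhs = [[AY[i][j] + YB[i][j] for j in range(2)] for i in range(2)]
print(max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2)))  # small
```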

In control theory, the matrix exponential is used in converting from continuous time dynamical systems to discrete time ones. Another application of the matrix exponential is in centrality measures for nodes in networks.

Many methods have been proposed for computing the matrix exponential. See the references for details.

## References

This is a minimal set of references, which contain further useful references within.

# What Is a Matrix Square Root?

A square root of an $n\times n$ matrix $A$ is any matrix $X$ such that $X^2 = A$.

For a scalar $a$ ($n = 1$), there are two square roots (which are equal if $a = 0$), and they are real if and only if $a$ is real and nonnegative. For $n \ge 2$, depending on the matrix there can be no square roots, finitely many, or infinitely many. The matrix

$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$

is easily seen to have no square roots. The matrix

$D = \mathrm{diag}(1,2) = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$

has four square roots, $\mathrm{diag}(\pm 1, \pm\sqrt{2})$. The identity matrix

$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

has infinitely many square roots (namely the $2\times 2$ involutory matrices), including $\mathrm{diag}(\pm 1, \pm 1)$, the lower triangular matrix

$\begin{bmatrix} 1 & 0 \\ 1 & -1 \end{bmatrix},$

and any symmetric orthogonal matrix, such as

$\begin{bmatrix} \cos \theta & \sin \theta \\ \sin \theta & -\cos \theta \end{bmatrix}, \quad \theta \in[0,2\pi]$

(which is a Householder matrix). Clearly, a square root of a diagonal matrix need not be diagonal.

The matrix square root of most practical interest is the one whose eigenvalues lie in the right half-plane, which is called the principal square root, written $A^{1/2}$. If $A$ is nonsingular and has no eigenvalues on the negative real axis then $A$ has a unique principal square root. For the diagonal matrix $D$ above, $D^{1/2} = \mathrm{diag}(1,\sqrt{2})$.

A symmetric positive definite matrix has a unique symmetric positive definite square root. Indeed if $A$ is symmetric positive definite then it has a spectral decomposition $A = QDQ^T$, where $Q$ is orthogonal and $D$ is diagonal with positive diagonal elements, and then $A^{1/2} = Q D^{1/2}Q^T$ is also symmetric positive definite.

If $A$ is nonsingular then it has at least $2^s$ square roots, where $s$ is the number of distinct eigenvalues. The existence of a square root of a singular matrix depends on the Jordan structure of the zero eigenvalues.

In some contexts involving symmetric positive definite matrices $A$, such as Kalman filtering, a matrix $Y$ such that $A = Y^TY$ is called a square root, but this is not the standard meaning.

When $A$ has structure one can ask whether a square root having the same structure, or some other related structure, exists. Results are known for (for example)

• stochastic matrices,
• $M$-matrices,
• skew-Hamiltonian matrices,
• centrosymmetric matrices, and
• matrices from an automorphism group.

An important distinction is between square roots of $A$ that can be expressed as a polynomial in $A$ (primary square roots) and those that cannot. Square roots of the latter type arise when $A$ has repeated eigenvalues and two copies of an eigenvalue are mapped to different square roots. In some contexts, a nonprimary square root may be the natural choice. For example, consider the matrix

$G(\theta) = \begin{bmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{bmatrix}, \quad \theta \in[0,2\pi],$

which represents a rotation through an angle $\theta$ radians clockwise. The natural square root of $G(\theta)$ is $G(\theta/2)$. For $\theta = \pi$, this gives the square root

$G(\pi/2) = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$

of

$G(\pi) = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}.$
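This is easy to confirm numerically, for example in a few lines of Python:

```python
import math

def G(theta):
    """The rotation matrix from the text."""
    return [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

X = G(math.pi / 2)  # [[0, 1], [-1, 0]] up to rounding
X2 = matmul(X, X)
print(X2)           # approx G(pi) = [[-1, 0], [0, -1]]
```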

The matrix square root arises in many applications, often in connection with other matrix problems such as the polar decomposition, matrix geometric means, Markov chains (roots of transition matrices), quadratic matrix equations, and generalized eigenvalue problems. Most often the matrix is symmetric positive definite, but square roots of nonsymmetric matrices are also needed. Among modern applications, the matrix square root can be found in recent papers on machine learning.

## References

This is a minimal set of references, which contain further useful references within.

# Improved MATLAB Function Sqrtm

The MATLAB function sqrtm, for computing a square root of a matrix, first appeared in the 1980s. It was improved in MATLAB 5.3 (1999) and again in MATLAB 2015b. In this post I will explain how the recent changes have brought significant speed improvements.

Recall that every $n$-by-$n$ nonsingular matrix $A$ has a square root: a matrix $X$ such that $X^2 = A$. In fact, it has at least two square roots, $\pm X$, and possibly infinitely many. These two extremes occur when $A$ has a single block in its Jordan form (two square roots) and when an eigenvalue appears in two or more blocks in the Jordan form (infinitely many square roots).

In practice, it is usually the principal square root that is wanted, which is the one whose eigenvalues lie in the right-half plane. This square root is unique if $A$ has no real negative eigenvalues. We make it unique in all cases by taking the square root of a negative number to be the one with nonnegative imaginary part.

The original sqrtm transformed $A$ to Schur form and then applied a recurrence of Parlett, designed for general matrix functions; in fact it simply called the MATLAB funm function of that time. This approach can be unstable when $A$ has nearly repeated eigenvalues. I pointed out the instability in a 1999 report A New Sqrtm for MATLAB and provided a replacement for sqrtm that employs a recurrence derived specifically for the square root by Björck and Hammarling in 1983. The latter recurrence is perfectly stable. My function also provided an estimate of the condition number of the matrix square root.

The importance of sqrtm has grown over the years because logm (for the matrix logarithm) depends on it, as do codes for other matrix functions, for computing arbitrary powers of a matrix and inverse trigonometric and inverse hyperbolic functions.

For a triangular matrix $T$, the cost of the recurrence for computing $T^{1/2}$ is the same as the cost of computing $T^{-1}$, namely $n^3/3$ flops. But while the inverse of a triangular matrix is a level 3 BLAS operation, and so has been very efficiently implemented in libraries, the square root computation is not in the level 3 BLAS standard. As a result, my sqrtm implemented the Björck–Hammarling recurrence in M-code as a triply nested loop and was rather slow.

The new sqrtm introduced in MATLAB 2015b contains a new implementation of the Björck–Hammarling recurrence that, while still in M-code, is much faster. Here is a comparison of the underlying function sqrtm_tri (contained in toolbox/matlab/matfun/private/sqrtm_tri.m) with the relevant piece of code extracted from the old sqrtm. Shown are execution times in seconds for random triangular matrices on a quad-core Intel Core i7 processor.

| $n$  | sqrtm_tri | old sqrtm |
|------|-----------|-----------|
| 10   | 0.0024    | 0.0008    |
| 100  | 0.0017    | 0.014     |
| 1000 | 0.45      | 31.2      |

For $n=10$, the new code is slower. But for $n=100$ we already have a factor 8 speedup, rising to a factor 69 for $n=1000$. The slowdown for $n=10$ is due to a combination of two reasons: the new code is more general, since it supports the real Schur form, and it contains function calls that incur overheads for small $n$.

How does sqrtm_tri work? It uses a recursive partitioning technique. It writes

$T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \\ \end{bmatrix}$

and notes that

$T^{1/2} = \begin{bmatrix} T_{11}^{1/2} & X_{12} \\ 0 & T_{22}^{1/2} \\ \end{bmatrix},$

where $T_{11}^{1/2} X_{12} + X_{12} T_{22}^{1/2} = T_{12}$. In this way the task of computing the square root of $T$ is reduced to the tasks of recursively computing the square roots of the smaller matrices $T_{11}$ and $T_{22}$ and then solving the Sylvester equation for $X_{12}$. The Sylvester equation is solved using an LAPACK routine, for efficiency. If you’d like to take a look at the code, type edit private/sqrtm_tri at the MATLAB prompt. For more on this recursive scheme for computing square roots of triangular matrices see Blocked Schur Algorithms for Computing the Matrix Square Root (2013) by Deadman, Higham and Ralha.
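The recursion is simple enough to sketch in a few lines. The pure-Python code below is my own illustrative version, not the MATLAB source; it assumes $R_{11}(i,i) + R_{22}(j,j) \ne 0$, which holds for the principal square root of a matrix with no eigenvalues on the closed negative real axis. It splits a triangular matrix, recurses on the diagonal blocks, and solves the Sylvester equation by substitution.

```python
import cmath

def sqrtm_tri_rec(T):
    """Recursive sketch of the sqrtm_tri idea for an upper triangular matrix T
    (a list of lists, possibly complex): split T in half, take square roots of
    the diagonal blocks recursively, then solve the triangular Sylvester
    equation R11*X + X*R22 = T12 for the off-diagonal block."""
    n = len(T)
    if n == 1:
        return [[cmath.sqrt(T[0][0])]]
    p = n // 2
    q = n - p
    R11 = sqrtm_tri_rec([row[:p] for row in T[:p]])
    R22 = sqrtm_tri_rec([row[p:] for row in T[p:]])
    B = [row[p:] for row in T[:p]]  # T12
    X = [[0j] * q for _ in range(p)]
    # Substitution: R11 and R22 are upper triangular, so entry (i, j) can be
    # solved once the rows below i and the columns to the left of j are known.
    for i in range(p - 1, -1, -1):
        for j in range(q):
            s = B[i][j]
            s -= sum(R11[i][k] * X[k][j] for k in range(i + 1, p))
            s -= sum(X[i][k] * R22[k][j] for k in range(j))
            X[i][j] = s / (R11[i][i] + R22[j][j])
    # Assemble the block upper triangular result.
    R = [[0j] * n for _ in range(n)]
    for i in range(p):
        for j in range(p):
            R[i][j] = R11[i][j]
        for j in range(q):
            R[i][p + j] = X[i][j]
    for i in range(q):
        for j in range(q):
            R[p + i][p + j] = R22[i][j]
    return R

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

T = [[4.0, 2.0, 1.0], [0.0, 9.0, 3.0], [0.0, 0.0, 16.0]]
R = sqrtm_tri_rec(T)
R2 = matmul(R, R)
print(R2)  # approx T
```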

The sqrtm in MATLAB 2015b includes two further refinements.

• For real matrices it uses the real Schur form, which means that all computations are carried out in real arithmetic, giving a speed advantage and ensuring that the result will not be contaminated by imaginary parts at the roundoff level.
• It estimates the 1-norm condition number of the matrix square root instead of the 2-norm condition number, which allows it to exploit the normest1 function.

Finally, I note that the product of two triangular matrices is also not a level-3 BLAS routine, yet it, too, is needed in matrix function codes. A proposal for it to be included in the Intel Math Kernel Library was made recently by Peter Kandolf, and I strongly support the proposal.

# The Improved MATLAB Functions Expm and Logm

The matrix exponential is a ubiquitous matrix function, important both for theory and for practical computation. The matrix logarithm, an inverse to the exponential, is also increasingly used (see my earlier post, 400 Years of Logarithms).

MATLAB R2015b introduced new versions of the expm and logm functions. The Release Notes say

The algorithms for expm, logm, and sqrtm show increased accuracy, with logm and sqrtm additionally showing improved performance.

The help text for expm and logm is essentially unchanged from the previous versions, so what’s different about the new functions? (I will discuss sqrtm in a future post.)

The answer is that both functions make use of new backward error bounds that can be much smaller than the old ones for very nonnormal matrices, and so help to avoid a phenomenon known as overscaling. The key change is that when bounding a matrix power series $p(X) = a_0 I + a_1 X + a_2 X^2 + \cdots$, instead of bounding the $k$th term $a_k X^k$ by $|a_k| \|X\|^k$, a potentially smaller bound is used.

This is best illustrated by example. Suppose we want to bound $\|X^{12}\|$ and are not willing to compute $X^{12}$ but are willing to compute lower powers of $X$. We have $12 = 6 \times 2 = 4 \times 3$, so $\|X^{12}\|$ is bounded by each of the terms $(\|X^2\|)^6$, $(\|X^3\|)^4$, $(\|X^4\|)^3$, and $(\|X^6\|)^2$. But it is easy to see that $(\|X^6\|)^2 \le (\|X^2\|)^6$ and $(\|X^6\|)^2 \le (\|X^3\|)^4$, so we can discard two of the bounds, ending up with

$\|X^{12}\| \le \min( \|X^4\|^3, \|X^6\|^2 ).$

This argument can be generalized so that every power of $X$ is bounded in terms of the norms of $X^p$ for values of $p$ up to some small, fixed value. The gains can be significant. Consider the matrix

$X = \begin{bmatrix}1 & 100 \\ 0 & 1 \end{bmatrix}.$

We have $\|X\|^{12} \approx 10^{24}$, but

$X^k = \begin{bmatrix}1 & 100k \\ 0 & 1 \end{bmatrix},$

so the bound above is roughly $\|X^{12}\| \le \min(6.4\times 10^{7}, 3.6\times 10^{5}) = 3.6 \times 10^{5}$, which is a significant improvement.
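The quantities involved are easy to compute, for example in Python with the 1-norm (maximum absolute column sum):

```python
def matpow(A, k):
    """A^k for a 2x2 matrix by repeated multiplication."""
    X = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        X = [[sum(X[i][m] * A[m][j] for m in range(2)) for j in range(2)]
             for i in range(2)]
    return X

def norm1(A):
    """Matrix 1-norm: maximum absolute column sum."""
    return max(sum(abs(A[i][j]) for i in range(2)) for j in range(2))

X = [[1.0, 100.0], [0.0, 1.0]]
naive = norm1(X) ** 12                  # 101^12, about 1.1e24
better = min(norm1(matpow(X, 4)) ** 3,  # 401^3, about 6.4e7
             norm1(matpow(X, 6)) ** 2)  # 601^2, about 3.6e5
actual = norm1(matpow(X, 12))           # 1201
print(naive, better, actual)
```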

One way to understand what is happening is to note the inequality

$\rho(X) \le \| X^k\| ^{1/k} \le \|X\|,$

where $\rho$ is the spectral radius (the largest modulus of any eigenvalue). The upper bound corresponds to the usual analysis. The lower bound cannot, in general, be used to bound the norm of the power series. The middle term is what we are using, and as $k\to\infty$ it converges to the lower bound, which can be arbitrarily smaller than the upper bound.

What is the effect of these bounds on the algorithms in expm and logm? Both algorithms make use of Padé approximants, which are good only for small-normed matrices, so the algorithms begin by reducing the norm of the input matrix. Backward error bounds derived by bounding a power series as above guide the norm reduction and if the bounds are weak then the norm is reduced too much, which can result in loss of accuracy in floating point arithmetic—the phenomenon of overscaling. The new bounds greatly reduce the chance of overscaling.

In his blog post A Balancing Act for the Matrix Exponential, Cleve Moler describes a badly scaled 3-by-3 matrix for which the original expm suffers from overscaling and a loss of accuracy, but notes that the new algorithm does an excellent job.

The new logm has another fundamental change: it applies inverse scaling and squaring and Padé approximation to the whole triangular Schur factor, whereas the previous logm applied this technique to the individual diagonal blocks in conjunction with the Parlett recurrence.

For more on the algorithms underlying the new codes see these papers. The details of how the norm of a matrix power series is bounded are given in Section 4 of the first paper.

# Principal Values of Inverse Cosine and Related Functions

I’ve recently been working, with Mary Aprahamian, on theory and algorithms for the matrix inverse sine and cosine and their hyperbolic counterparts. Of course, in order to treat the matrix functions we first need a good understanding of the scalar case. We found that, as regards practical computation, the literature is rather confusing. The reason can be illustrated with the logarithm of a complex number.

Consider the question of whether the equation

$\log(z_1 z_2) = \log z_1 + \log z_2$

is valid. In many textbooks this equation is stated as is, but with the (often easily overlooked) proviso that each occurrence of $\log$ denotes a particular branch of the logarithm—possibly different in each case. In other words, the equation is true for the multivalued function that includes all branches.

In practice, however, we are usually interested in the principal logarithm, defined as the one for which the complex argument of $\log z$ lies in the interval $(-\pi,\pi]$ (or possibly some other specific branch). Now the equation is not always true. A correction term that makes the equation valid can be expressed in terms of the unwinding number introduced by Corless, Hare, and Jeffrey in 1996, which is discussed in my earlier post Making Sense of Multivalued Matrix Functions with the Matrix Unwinding Function.

The definition of principal logarithm given in the previous paragraph is standard. But for the inverse (hyperbolic) cosine and sine it is difficult to find clear definitions of principal values, especially over the complex plane. Some authors define these inverse functions in terms of the principal logarithm. Care is required here, since seemingly equivalent formulas can yield different results (one reason is that $(z^2-1)^{1/2}$ is not equivalent to $(z-1)^{1/2}(z+1)^{1/2}$ for complex $z$). This is a good way to proceed, but working out the ranges of the principal functions from these definitions is not trivial.

In our paper we give diagrams that summarize four kinds of information about the principal inverse functions acos, asin, acosh, and asinh.

• The branch points.
• The branch cuts, marked by solid lines.
• The domain and range, shaded gray and extending to infinity in the obvious directions.
• The values attained on the branch cuts: the value on the cut is the limit of the values of the function as $z$ approaches the cut from the side without the hashes.

The figures are below. Once we know the principal values we can address questions analogous to the log question, but now for identities relevant to the four inverse functions.

For more, including an explanation of the figures in words and all the details of the matrix case—including answers to questions such as “when is $\mathrm{acos}(\cos A)$ equal to $A$?”—see our recent EPrint Matrix Inverse Trigonometric and Inverse Hyperbolic Functions: Theory and Algorithms.