## What is Numerical Stability?

Numerical stability concerns how errors introduced during the execution of an algorithm affect the result. It is a property of an algorithm rather than the problem being solved. I will assume that the errors under consideration are rounding errors, but in principle the errors can be from any source.

Consider a scalar function $y = f(x)$ of a scalar $x$. We regard $x$ as the input data and $y$ as the output. The forward error of a computed approximation $\widehat{y}$ to $y$ is the relative error $|y - \widehat{y}|/|y|$. The backward error of $\widehat{y}$ is

$\min\left\{\, \displaystyle\frac{|\Delta x|}{|x|}: \widehat{y} = f(x+\Delta x) \,\right\}.$

If $\widehat{y}$ has a small backward error then it is the exact answer for a slightly perturbed input. Here, “small” is interpreted relative to the floating-point arithmetic, so a small backward error means one of the form $cu$ for a modest constant $c$, where $u$ is the unit roundoff.

An algorithm that always produces a small backward error is called backward stable. In a backward stable algorithm the errors introduced during the algorithm have the same effect as a small perturbation in the data. If the backward error is the same size as any uncertainty in the data then the algorithm produces as good a result as we can expect.
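As a small illustration (a sketch of ours, not part of the original discussion), we can measure both errors for a computed exponential: since $y = \mathrm{e}^x$ implies $x = \log y$, the backward error of $\widehat{y}$ is $|\log\widehat{y} - x|/|x|$.

```python
import math
from decimal import Decimal, getcontext

x = 2.0
y_hat = math.exp(x)  # computed approximation to y = e^x

# Forward error, using 50-digit Decimal arithmetic as the reference.
getcontext().prec = 50
y_ref = Decimal(x).exp()
forward_error = float(abs((Decimal(y_hat) - y_ref) / y_ref))

# Backward error: find dx with y_hat = exp(x + dx), i.e. dx = log(y_hat) - x.
dx = math.log(y_hat) - x
backward_error = abs(dx) / abs(x)

print(forward_error, backward_error)  # both of order the unit roundoff
```

Both quantities come out around $10^{-16}$, consistent with `math.exp` being backward (and forward) stable for this input.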

If $x$ undergoes a relative perturbation of size $u$ then $y = f(x)$ can change by as much as $\mathrm{cond}_f(x) u$, where

$\mathrm{cond}_f(x) = \lim_{\epsilon\to0} \displaystyle\sup_{|\Delta x| \le \epsilon |x|} \displaystyle\frac{|f(x+\Delta x) - f(x)|}{\epsilon|f(x)|}$

is the condition number of $f$ at $x$. An algorithm that always produces $\widehat{y}$ with a forward error of order $\mathrm{cond}_f(x) u$ is called forward stable.
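For differentiable $f$ the condition number reduces to $\mathrm{cond}_f(x) = |x f'(x)/f(x)|$. A quick sketch (the function and names are our own choices) evaluating this for the logarithm, which is ill conditioned near $x = 1$:

```python
import math

def cond(f, fprime, x):
    """Relative condition number |x f'(x) / f(x)| of a scalar function."""
    return abs(x * fprime(x) / f(x))

# cond(log, x) = 1/|log x|, which blows up as x -> 1.
print(cond(math.log, lambda t: 1.0 / t, 1.0001))  # about 1e4
print(cond(math.log, lambda t: 1.0 / t, 10.0))    # modest, about 0.43
```

So even a backward stable evaluation of $\log x$ near $x = 1$ must be expected to lose about four significant digits here.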

The definition of $\mathrm{cond}_f(x)$ implies that a backward stable algorithm is automatically forward stable. The converse is not true. An example of an algorithm that is forward stable but not backward stable is Gauss–Jordan elimination for solving a linear system.

If $\widehat{y}$ satisfies

$\widehat{y} + \Delta y = f(x+\Delta x), \quad |\Delta y| \le \epsilon |\widehat{y}|, \quad |\Delta x| \le \epsilon |x|,$

with $\epsilon$ small in the sense described above, then the algorithm for computing $y$ is mixed backward–forward stable. Such an algorithm produces almost the right answer for a slightly perturbed input. The following diagram illustrates the previous equation.

With these definitions in hand, we can turn to the meaning of the term numerically stable. Depending on the context, numerical stability can mean that an algorithm is (a) backward stable, (b) forward stable, or (c) mixed backward–forward stable.

For some problems, backward stability is difficult or impossible to achieve, so numerical stability has meaning (b) or (c). For example, let $Z = xy^T$, where $x$ and $y$ are $n$-vectors. Backward stability would require the computed $\widehat{Z}$ to satisfy $\widehat{Z} = (x+\Delta x)(y+\Delta y)^T$ for some small $\Delta x$ and $\Delta y$, meaning that $\widehat{Z}$ is a rank-$1$ matrix. But the computed $\widehat{Z}$ contains $n^2$ independent rounding errors and is very unlikely to have rank $1$.
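We can see this numerically (a sketch; the tolerance is our own): the singular values of a computed outer product show that rounding pushes it slightly away from rank $1$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = rng.standard_normal(100)

Z = np.outer(x, y)  # each of the n^2 entries has its own rounding error
s = np.linalg.svd(Z, compute_uv=False)

# An exactly rank-1 matrix would have s[1] == 0; here s[1]/s[0] reflects
# the independent rounding errors in the entries.
print(s[1] / s[0])  # of order the unit roundoff
```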

We briefly mention some other relevant points.

• What is the data? For a linear system $Ax = b$ the data is $A$ or $b$, or both. For a nonlinear function we need to consider whether problem parameters are data; for example, for $f(x) = \cos(3x^2+2)$ are the $3$ and the $2$ constants or are they data, like $x$, that can be perturbed?
• We have measured perturbations in a relative sense. Absolute errors can also be used.
• For problems whose inputs are matrices or vectors we need to use norms or componentwise measures.
• Some algorithms are only conditionally numerically stable, that is, they are numerically stable for some subset of problems. For example, the normal equations method for solving a linear least squares problem is forward stable for problems with a large residual.

## References

This is a minimal set of references, which contain further useful references within.

## What is the Polar Decomposition?

A polar decomposition of $A\in\mathbb{C}^{m \times n}$ with $m \ge n$ is a factorization $A = UH$, where $U\in\mathbb{C}^{m \times n}$ has orthonormal columns and $H\in\mathbb{C}^{n \times n}$ is Hermitian positive semidefinite. This decomposition is a generalization of the polar representation $z = r \mathrm{e}^{\mathrm{i}\theta}$ of a complex number, where $H$ corresponds to $r\ge 0$ and $U$ to $\mathrm{e}^{\mathrm{i}\theta}$. When $A$ is real, $H$ is symmetric positive semidefinite. When $m = n$, $U$ is a square unitary matrix (orthogonal for real $A$).

We have $A^*A = H^*U^*UH = H^2$, so $H = (A^*\!A)^{1/2}$, which is the unique positive semidefinite square root of $A^*A$. When $A$ has full rank, $H$ is nonsingular and $U = AH^{-1}$ is unique, and in this case $U$ can be expressed as

$U = \displaystyle\frac{2}{\pi} A \displaystyle\int_{0}^{\infty} (t^2I+A^*\!A)^{-1}\, \mathrm{d}t.$

An example of a polar decomposition is

$A = \left[\begin{array}{@{}rr} 4 & 0\\ -5 & -3\\ 2 & 6 \end{array}\right] = \sqrt{2}\left[\begin{array}{@{}rr} \frac{1}{2} & -\frac{1}{6}\\[\smallskipamount] -\frac{1}{2} & -\frac{1}{6}\\[\smallskipamount] 0 & \frac{2}{3} \end{array}\right] \cdot \sqrt{2}\left[\begin{array}{@{\,}rr@{}} \frac{9}{2} & \frac{3}{2}\\[\smallskipamount] \frac{3}{2} & \frac{9}{2} \end{array}\right] \equiv UH.$

For an example with a rank-deficient matrix consider

$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},$

for which $A^*A = \mathrm{diag}(0,1,1)$ and so $H = \mathrm{diag}(0,1,1)$. The equation $A = UH$ then implies that

$U = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \theta & 0 & 0 \end{bmatrix}, \quad |\theta| = 1,$

so $U$ is not unique.

The polar factor $U$ has the important property that it is a closest matrix with orthonormal columns to $A$ in any unitarily invariant norm. Hence the polar decomposition provides an optimal way to orthogonalize a matrix. This method of orthogonalization is used in various applications, including in quantum chemistry, where it is called Löwdin orthogonalization. Most often, though, orthogonalization is done through QR factorization, trading optimality for a faster computation.

An important application of the polar decomposition is to the orthogonal Procrustes problem¹

$\min \bigl\{\, \|A-BW\|_F: W \in \mathbb{C}^{n\times n},\; W^*W = I \,\bigr\},$

where $A,B\in\mathbb{C}^{m\times n}$ and the norm is the Frobenius norm $\|A\|_F^2 = \sum_{i,j} |a_{ij}|^2$. This problem, which arises in factor analysis and in multidimensional scaling, asks how closely a unitary transformation of $B$ can reproduce $A$. Any solution is a unitary polar factor of $B^*\!A$, and there is a unique solution if $B^*\!A$ is nonsingular. Another application of the polar decomposition is in 3D graphics transformations. Here, the matrices are $3\times 3$ and the polar decomposition can be computed by exploiting a relationship with quaternions.
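A sketch of solving the Procrustes problem via the SVD, which yields the unitary polar factor of $B^*\!A$ (the data here is synthetic, constructed so the optimal $W$ is known):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))
W_exact, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = B @ W_exact  # by construction, W_exact reproduces A exactly

# The solution is the unitary polar factor of B^*A, obtained from its SVD.
P, s, Vh = np.linalg.svd(B.T @ A)
W = P @ Vh

print(np.linalg.norm(A - B @ W))  # essentially zero
```

Since $B$ has full rank, $B^*\!A$ is nonsingular and the recovered `W` agrees with `W_exact` to rounding error.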

For a square nonsingular matrix $A$, the unitary polar factor $U$ can be computed by a Newton iteration:

$X_{k+1} = \displaystyle\frac{1}{2} (X_k + X_k^{-*}), \quad X_0 = A.$

The iterates $X_k$ converge quadratically to $U$. This is just one of many iterations for computing $U$ and much work has been done on the efficient implementation of these iterations.
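A minimal sketch of this iteration for a real nonsingular matrix (the function name and stopping test are our own); in the real case $X^{-*} = (X^{-1})^T$:

```python
import numpy as np

def polar_newton(A, tol=1e-12, maxit=50):
    """Unitary polar factor of a real nonsingular A by the Newton iteration."""
    X = np.array(A, dtype=float)
    for _ in range(maxit):
        X_new = 0.5 * (X + np.linalg.inv(X).T)  # X_{k+1} = (X_k + X_k^{-*})/2
        if np.linalg.norm(X_new - X, "fro") <= tol * np.linalg.norm(X_new, "fro"):
            return X_new
        X = X_new
    return X

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
U = polar_newton(A)
print(np.linalg.norm(U.T @ U - np.eye(4)))  # essentially zero
```

The Hermitian factor is then recovered as $H = U^*\!A$, which should be symmetric positive definite to rounding error.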

If $A = P \Sigma Q^*$ is a singular value decomposition (SVD), where $P\in\mathbb{C}^{m\times n}$ has orthonormal columns, $Q\in\mathbb{C}^{n\times n}$ is unitary, and $\Sigma$ is square and diagonal with nonnegative diagonal elements, then

$A = PQ^* \cdot Q \Sigma Q^* \equiv UH,$

where $U$ has orthonormal columns and $H$ is Hermitian positive semidefinite. So a polar decomposition can be constructed from an SVD. The converse is true: if $A = UH$ is a polar decomposition and $H = Q\Sigma Q^*$ is a spectral decomposition ($Q$ unitary, $\Sigma$ diagonal) then $A = (UQ)\Sigma Q^* \equiv P \Sigma Q^*$ is an SVD. This latter relation is the basis of a method for computing the SVD that first computes the polar decomposition by a matrix iteration then computes the eigensystem of $H$, and which is extremely fast on distributed-memory manycore computers.
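The SVD-to-polar construction is immediate to sketch in NumPy; we verify it on the $3\times 2$ example given earlier:

```python
import numpy as np

A = np.array([[4.0, 0.0], [-5.0, -3.0], [2.0, 6.0]])

P, sigma, Vh = np.linalg.svd(A, full_matrices=False)
U = P @ Vh                       # orthonormal columns
H = Vh.T @ np.diag(sigma) @ Vh   # Hermitian (here symmetric) psd factor

print(np.allclose(U @ H, A))                                             # True
print(np.allclose(H, np.sqrt(2) * np.array([[4.5, 1.5], [1.5, 4.5]])))   # True
```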

The nonuniqueness of the polar decomposition for rank deficient $A$, and the lack of a satisfactory definition of a polar decomposition for $m < n$, are overcome in the canonical polar decomposition, defined for any $m$ and $n$. Here, $A = UH$ with $U$ a partial isometry, $H$ Hermitian positive semidefinite, and $U^*U = HH^+$. The superscript “$+$” denotes the Moore–Penrose pseudoinverse and a partial isometry can be characterized as a matrix $U$ for which $U^+ = U^*$.

Generalizations of the (canonical) polar decomposition have been investigated in which the properties of $U$ and $H$ are defined with respect to a general, possibly indefinite, scalar product.

## References

This is a minimal set of references, which contain further useful references within.

## Footnotes

1. Procrustes: an ancient Greek robber who tied his victims to an iron bed, stretching their legs if too short for it, and lopping them if too long.

## What Is a Symmetric Positive Definite Matrix?

A real $n\times n$ matrix $A$ is symmetric positive definite if it is symmetric ($A$ is equal to its transpose, $A^T$) and

$x^T\!Ax > 0 \quad \mbox{for all nonzero vectors}~x.$

By making particular choices of $x$ in this definition we can derive the inequalities

\begin{alignedat}{2} a_{ii} &>0 \quad&&\mbox{for all}~i,\\ |a_{ij}| &< \sqrt{ a_{ii} a_{jj} } \quad&&\mbox{for all}~i\ne j. \end{alignedat}

Satisfying these inequalities is not sufficient for positive definiteness. For example, the matrix

$A = \begin{bmatrix} 1 & 3/4 & 0 \\ 3/4 & 1 & 3/4 \\ 0 & 3/4 & 1 \\ \end{bmatrix}$

satisfies all the inequalities but $x^T\!Ax < 0$ for $x = [1,{}-\!\sqrt{2},~1]^T$.

A sufficient condition for a symmetric matrix to be positive definite is that it has positive diagonal elements and is diagonally dominant, that is, $a_{ii} > \sum_{j \ne i} |a_{ij}|$ for all $i$.

The definition requires the positivity of the quadratic form $x^T\!Ax$. Sometimes this condition can be confirmed from the definition of $A$. For example, if $A = B^T\!B$ and $B$ has linearly independent columns then $x^T\!Ax = (Bx)^T Bx > 0$ for $x\ne 0$. Generally, though, this condition is not easy to check.

Two equivalent conditions to $A$ being symmetric positive definite are

• every leading principal minor $\det(A_k)$, where the submatrix $A_k = A(1\colon k, 1 \colon k)$ comprises the intersection of rows and columns $1$ to $k$, is positive,
• the eigenvalues of $A$ are all positive.

The first condition implies, in particular, that $\det(A) > 0$, which also follows from the second condition since the determinant is the product of the eigenvalues.

Here are some other important properties of symmetric positive definite matrices.

• $A^{-1}$ is positive definite.
• $A$ has a unique symmetric positive definite square root $X$, where a square root is a matrix $X$ such that $X^2 = A$.
• $A$ has a unique Cholesky factorization $A = R^T\!R$, where $R$ is upper triangular with positive diagonal elements.

Sources of positive definite matrices include statistics, since nonsingular correlation matrices and covariance matrices are symmetric positive definite, and finite element and finite difference discretizations of differential equations.

Examples of symmetric positive definite matrices, of which we display only the $4\times 4$ instances, are the Hilbert matrix

$H_4 = \left[\begin{array}{@{\mskip 5mu}c*{3}{@{\mskip 15mu} c}@{\mskip 5mu}} 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\[6pt] \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5}\\[6pt] \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6}\\[6pt] \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7}\\[6pt] \end{array}\right],$

the Pascal matrix

$P_4 = \left[\begin{array}{@{\mskip 5mu}c*{3}{@{\mskip 15mu} r}@{\mskip 5mu}} 1 & 1 & 1 & 1\\ 1 & 2 & 3 & 4\\ 1 & 3 & 6 & 10\\ 1 & 4 & 10 & 20 \end{array}\right],$

and minus the second difference matrix, which is the tridiagonal matrix

$S_4 = \left[\begin{array}{@{\mskip 5mu}c*{3}{@{\mskip 15mu} r}@{\mskip 5mu}} 2 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & -1 & 2 \end{array}\right].$

All three of these matrices have the property that $a_{ij}$ is non-decreasing along the diagonals. However, if $A$ is positive definite then so is $P^TAP$ for any permutation matrix $P$, so any symmetric reordering of the rows or columns is possible without changing the definiteness.

A $4\times 4$ symmetric positive definite matrix that was often used as a test matrix in the early days of digital computing is the Wilson matrix

$W = \begin{bmatrix} 5 &7& 6& 5 \\ 7 &10& 8& 7 \\ 6& 8& 10& 9 \\ 5& 7& 9 &10 \end{bmatrix}.$

What is the best way to test numerically whether a symmetric matrix is positive definite? Computing the eigenvalues and checking their positivity is reliable, but slow. The fastest method is to attempt to compute a Cholesky factorization and declare the matrix positive definite if the factorization succeeds. This is a reliable test even in floating-point arithmetic. If the matrix is not positive definite the factorization typically breaks down in the early stages, and so gives a quick negative answer.
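A sketch of this test in NumPy (`np.linalg.cholesky` raises `LinAlgError` when the factorization fails), tried on the two example matrices above:

```python
import numpy as np

def is_spd(A):
    """Test a symmetric matrix for positive definiteness via Cholesky."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

# The 3x3 matrix with 3/4 off-diagonal entries fails, despite satisfying
# the element inequalities; the second difference matrix S4 passes.
A = np.array([[1, 0.75, 0], [0.75, 1, 0.75], [0, 0.75, 1]])
S4 = np.diag([2.0] * 4) + np.diag([-1.0] * 3, 1) + np.diag([-1.0] * 3, -1)
print(is_spd(A), is_spd(S4))  # False True
```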

Symmetric block matrices

$C = \begin{bmatrix} A & X\\ X^T & B \end{bmatrix}$

often appear in applications. If $A$ is nonsingular then we can write

$\begin{bmatrix} A & X\\ X^T & B \end{bmatrix} = \begin{bmatrix} I & 0\\ X^TA^{-1} & I \end{bmatrix} \begin{bmatrix} A & 0\\ 0 & B-X^TA^{-1}X \end{bmatrix} \begin{bmatrix} I & A^{-1}X\\ 0 & I \end{bmatrix},$

which shows that $C$ is congruent to a block diagonal matrix, which is positive definite when its diagonal blocks are. It follows that $C$ is positive definite if and only if both $A$ and $B - X^TA^{-1}X$ are positive definite. The matrix $B - X^TA^{-1}X$ is called the Schur complement of $A$ in $C$.
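A small numerical check of this equivalence (the blocks are arbitrary choices of ours):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[3.0, 0.0], [0.0, 3.0]])
X = np.eye(2)

S = B - X.T @ np.linalg.solve(A, X)  # Schur complement of A in C
C = np.block([[A, X], [X.T, B]])

# C is positive definite iff A and S both are.
print(np.linalg.eigvalsh(C).min() > 0, np.linalg.eigvalsh(S).min() > 0)  # True True
```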

We mention two determinantal inequalities. If the block matrix $C$ above is positive definite then $\det(C) \le \det(A) \det(B)$ (Fischer’s inequality). Applying this inequality recursively gives Hadamard’s inequality for a symmetric positive definite $A$:

$\det(A) \le a_{11}a_{22} \dots a_{nn},$

with equality if and only if $A$ is diagonal.

Finally, we note that if $x^TAx \ge 0$ for all $x\ne0$, so that the quadratic form is allowed to be zero, then the symmetric matrix $A$ is called symmetric positive semidefinite. Some, but not all, of the properties above generalize in a natural way. An important difference is that semidefiniteness is equivalent to all principal minors, of which there are $2^{n-1}$, being nonnegative; it is not enough to check the $n$ leading principal minors. Consider, as an example, the matrix

$\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 0 \end{bmatrix},$

which has leading principal minors $1$, $0$, and $0$ and a negative eigenvalue.

A complex $n\times n$ matrix $A$ is Hermitian positive definite if it is Hermitian ($A$ is equal to its conjugate transpose, $A^*$) and $x^*Ax > 0$ for all nonzero vectors $x$. Everything we have said above generalizes to the complex case.

## References

This is a minimal set of references, which contain further useful references within.

## What Is the Growth Factor for Gaussian Elimination?

Gaussian elimination is the process of reducing an $n \times n$ matrix to upper triangular form by elementary row operations. It consists of $n$ stages, in the $k$th of which multiples of row $k$ are added to later rows to eliminate elements below the diagonal in the $k$th column. The result of Gaussian elimination (assuming it succeeds) is a factorization $A = LU$, where $L$ is unit lower triangular (lower triangular with ones on the diagonal) and $U$ is upper triangular. This factorization reduces the solution of a linear system $Ax = b$ to the solution of two triangular systems.

James Wilkinson showed in the early 1960s that the numerical stability of Gaussian elimination depends on the growth factor

$\rho_n = \displaystyle\frac{\max_{i,j,k} |a_{ij}^{(k)}|} {\max_{i,j}|a_{ij}|} \ge 1,$

where the $a_{ij}^{(k)}$ are the elements at the start of the $k$th stage of Gaussian elimination, with $A = \bigl(a_{ij}^{(1)}\bigr)$. Specifically, he obtained a bound for the backward error of the computed solution that is proportional to $\rho_nu$, where $u$ is the unit roundoff of the floating-point arithmetic. Wilkinson’s analysis focused attention on the size of the growth factor.

If no row interchanges are done during the elimination then $\rho_n$ can be arbitrarily large, as is easily seen for $n = 2$. For some specific types of matrix more can be said.

• If $A$ is symmetric positive definite then $\rho_n = 1$. In this case one would normally use Cholesky factorization instead of LU factorization.
• If $A$ is nonsingular and totally nonnegative (every submatrix has nonnegative determinant) then $\rho_n = 1$.
• If $A$ is diagonally dominant by rows or columns then $\rho_n \le 2$.
• If $A$ is complex and its real and imaginary parts are both symmetric positive definite then $\rho_n < 3$.

The entry in the $(k,k)$ position at the start of stage $k$ of Gaussian elimination is called the pivot. By swapping rows and columns, any element with row and column indices greater than or equal to $k$ can be brought into the pivot position and used as the pivot. A pivoting strategy is a strategy for interchanging rows and columns at the start of each stage in order to choose a good sequence of pivots. For dense matrices the main criterion is that the pivoting strategy keeps the growth factor small, while for sparse matrices minimizing fill-in (a zero entry becoming nonzero) is also important.

## Partial Pivoting

Partial pivoting interchanges rows to ensure that the pivot element is the largest in magnitude in its column. Wilkinson showed that partial pivoting ensures $\rho_n \le 2^{n-1}$ and that equality is attained for matrices of the form illustrated for $n = 4$ by

$\left[\begin{array}{@{\mskip3mu}rrrr} 1 & 0 & 0 & 1 \\ -1 & 1 & 0 & 1 \\ -1 & -1 & 1 & 1 \\ -1 & -1 & -1 & 1 \\ \end{array} \right].$

This matrix is a special case of a larger class of matrices for which equality is attained (Higham and Higham, 1989). Exponential growth can occur for matrices arising in practical applications, as shown by Wright for the multiple shooting method for two-point boundary value problems and Foster for a quadrature method for solving Volterra integral equations. Remarkably, though, $\rho_n$ is usually of modest size in practice. Explaining this phenomenon is one of the outstanding problems in numerical analysis.
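The growth can be reproduced with a minimal implementation of our own of Gaussian elimination with partial pivoting that tracks $\rho_n$, applied to the matrix above for $n = 8$, where $\rho_n = 2^{n-1} = 128$:

```python
import numpy as np

def growth_factor_pp(A):
    """Growth factor rho_n of GE with partial pivoting on A."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    amax0 = np.abs(A).max()
    amax = amax0
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))  # partial pivoting: row swap
        A[[k, p]] = A[[p, k]]
        mult = A[k + 1:, k] / A[k, k]
        A[k + 1:, k + 1:] -= np.outer(mult, A[k, k + 1:])
        A[k + 1:, k] = 0.0
        amax = max(amax, np.abs(A).max())
    return amax / amax0

n = 8
W = np.tril(-np.ones((n, n)), -1) + np.eye(n)
W[:, -1] = 1.0  # Wilkinson's worst-case matrix for partial pivoting

print(growth_factor_pp(W))  # 128.0, i.e. 2^(n-1)
```

The last column doubles at every stage of the elimination, so the final element reaches $2^{n-1}$.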

## Rook Pivoting

Rook pivoting interchanges rows and columns to ensure that the pivot element is the largest in magnitude in both its row and its column. Foster has shown that

$\rho_n \le 1.5 n^{\frac{3}{4} \log n}.$

This bound grows much more slowly than the bound for partial pivoting, but it can still be large. In practice, the bound is pessimistic.

## Complete Pivoting

Complete pivoting chooses as pivot the largest element in magnitude in the remaining part of the matrix. Wilkinson (1961) showed that

$\rho_n \le n^{1/2} (2\cdot 3^{1/2} \cdots n^{1/(n-1)})^{1/2} \sim c \, n^{1/2} n^{\frac{1}{4}\log n}.$

This bound grows much more slowly than that for rook pivoting, but again it is pessimistic. Wilkinson noted in 1965 that no matrix had been discovered for which $\rho_n > n$. Much interest has focused on the question of whether growth of $n$ can be exceeded and what is the largest possible growth. For a Hadamard matrix, $\rho_n \ge n$, but such matrices do not exist for all $n$. Evidence suggests that the growth factor for complete pivoting on a Hadamard matrix is exactly $n$, and this has been proved for $n = 12$ and $n = 16$.

That $\rho_n > n$ is possible was shown in the early 1990s by Gould (in floating-point arithmetic) and Edelman (in exact arithmetic), through a $13\times 13$ matrix with growth $13.02$.

## Growth Factor Bounds for Any Pivoting Strategy

While much attention has focused on investigating the growth factor for particular pivoting strategies, some results are known that provide growth factor lower bounds for any pivoting strategy. Higham and Higham (1989) obtained a result that implies that for the symmetric orthogonal matrix

$S = \sqrt{ \displaystyle\frac{2}{n+1} } \biggl( \sin\Bigl( \displaystyle\frac{ij\pi}{n+1} \Bigr) \biggr)_{i,j=1}^n$

the lower bound

$\rho_n(S)\ge \displaystyle\frac{n+1}{2}$

holds for any pivoting strategy. Recently, Higham, Higham, and Pranesh (2020) have shown that

$\rho_n(Q) \gtrsim \displaystyle\frac{n}{4\log n}$

with high probability for large $n$ when $Q$ is a random orthogonal matrix from the Haar distribution. They show that the same bound holds for matrices that have only one singular value different from $1$ and have singular vector matrices that are random orthogonal matrices from the Haar distribution. These “randsvd matrices” can be generated as gallery('randsvd',n,kappa,2) in MATLAB.

## Experimentation

If you want to experiment with growth factors in MATLAB, you can use the function gep from the Matrix Computation Toolbox. Examples:

>> rng(0); A = randn(1000);
>> [L,U,P,Q,rho] = gep(A,'p'); rho  % Partial pivoting.
rho =
1.9437e+01
>> [L,U,P,Q,rho] = gep(A,'r'); rho  % Rook pivoting.
rho =
9.9493e+00
>> [L,U,P,Q,rho] = gep(A,'c'); rho  % Complete pivoting.
rho =
6.8363e+00
>> A = gallery('randsvd',1000,1e1,2);
>> [L,U,P,Q,rho] = gep(A,'c'); rho  % Complete pivoting.
rho =
5.4720e+01


## References

This is a minimal set of references, which contain further useful references within.

## What Is Stochastic Rounding?

In finite precision arithmetic the result of an elementary arithmetic operation does not generally lie in the underlying number system, $F$, so it must be mapped back into $F$ by the process called rounding. The most common choice is round to nearest, whereby the nearest number in $F$ to the given number is chosen; this is the default in IEEE standard floating-point arithmetic.

Round to nearest is deterministic: given the same number it always produces the same result. A form of rounding that randomly rounds to the next larger or next smaller number was proposed by Barnes, Cooke-Yarborough, and Thomas (1951), Forsythe (1959), and Hull and Swenson (1966). Now called stochastic rounding, it comes in two forms. The first form rounds up or down with equal probability $1/2$.

We focus here on the second form of stochastic rounding, which is more interesting. To describe it we let $x\notin F$ be the given number and let $x_1$ and $x_2$ with $x_1 < x_2$ be the adjacent numbers in $F$ that are the candidates for the result of rounding. The following diagram shows the situation where $x$ lies just beyond the half-way point between $x_1$ and $x_2$:

We round up to $x_2$ with probability $(x-x_1)/(x_2-x_1)$ and down to $x_1$ with probability $(x_2-x)/(x_2-x_1)$; note that these probabilities sum to $1$. The probability of rounding to each of $x_1$ and $x_2$ is proportional to one minus the relative distance of $x$ from that number.

If stochastic rounding chooses to round to $x_1$ in the diagram then it has committed a larger error than round to nearest, which rounds to $x_2$. Indeed if we focus on floating-point arithmetic then the result of the rounding is

$\mathrm{f\kern.2ptl}(x) = x (1+\delta), \quad |\delta|\le 2u,$

where $u$ is the unit roundoff. For round to nearest the same result is true with the smaller bound $|\delta|\le u$. In addition, certain results that hold for round to nearest, such as $\mathrm{f\kern.2ptl}(x) = -\mathrm{f\kern.2ptl}(-x)$ and $\mathrm{f\kern.2ptl}(x/\sqrt{x^2+y^2}) \le 1$, can fail for stochastic rounding. What, then, is the benefit of stochastic rounding?

It is not hard to show that the expected value of the result of stochastically rounding $x$ is $x$ itself (this is obvious if $x = (x_1+x_2)/2$, for example), so the expected error is zero. Hence stochastic rounding maintains, in a statistical sense, some of the information that is discarded by a deterministic rounding scheme. This property leads to some important benefits, as we now explain.
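A toy model of this second form of stochastic rounding on a uniformly spaced grid (the grid spacing stands in for the distance between adjacent floating-point numbers; the names are ours):

```python
import random

def stochastic_round(x, spacing=2.0**-10):
    """Round x onto the grid of multiples of `spacing`, rounding up with
    probability proportional to the distance from the lower grid point."""
    x1 = (x // spacing) * spacing  # adjacent grid points x1 <= x < x2
    x2 = x1 + spacing
    prob_up = (x - x1) / (x2 - x1)
    return x2 if random.random() < prob_up else x1

random.seed(0)
x = 0.1
samples = [stochastic_round(x) for _ in range(100000)]
print(sum(samples) / len(samples))  # close to 0.1: zero expected error
```

Each individual sample is one of the two grid neighbours of $x$, yet the sample mean recovers $x$ to within the sampling error.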

In certain situations, round to nearest produces correlated rounding errors that cause systematic error growth. This can happen when we form the inner product of two long vectors $x$ and $y$ of nonnegative elements. If the elements all lie in $[0,1]$ then clearly the partial sum can grow monotonically as more and more terms are accumulated until at some point all the remaining terms “drop off the end” of the computed sum and do not change it—the sum stagnates. This phenomenon was observed and analyzed by Huskey and Hartree as long ago as 1949 in solving differential equations on the ENIAC. Stochastic rounding avoids stagnation by rounding up rather than down some of the time. The next figure gives an example. Here, $x$ and $y$ are $n$-vectors sampled from the uniform $[0,1]$ distribution, the inner product is computed in IEEE standard half precision floating-point arithmetic ($u \approx 4.9\times 10^{-4}$), and the backward errors are plotted for increasing $n$. From about $n = 1000$, the errors for stochastic rounding (SR, in orange) are smaller than those for round to nearest (RTN, in blue), the latter quickly reaching 1.
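Stagnation under round to nearest is easy to reproduce (a sketch using NumPy's fp16 type; exact numbers will vary): recursively summing 50,000 uniform $[0,1]$ values in half precision gets stuck at a few thousand, far below the true sum of about 25,000.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=50_000)

s = np.float16(0)
for xi in x.astype(np.float16):
    s = np.float16(s + xi)  # round to nearest after every addition

# Once s is large enough, each xi is below half the spacing of the fp16
# numbers near s, so s + xi rounds back to s: the sum stagnates.
print(float(s), x.sum())  # stagnated value vs. true sum of about 25000
```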

The figure is explained by recent results of Connolly, Higham, and Mary (2020). The key property is that the rounding errors generated by stochastic rounding are mean independent (a weaker version of independence). As a consequence, an error bound proportional to $\sqrt{n}u$ can be shown to hold with high probability. With round to nearest we have only the usual worst-case error bound, which is proportional to $nu$. In the figure above, the solid black line is the standard backward error bound $nu$ while the dashed black line is $\sqrt{n}u$. In this example the error with round to nearest is not bounded by a multiple of $\sqrt{n}u$, as the blue curve crosses the dashed black one. The underlying reason is that the rounding errors for round to nearest are correlated (they are all negative beyond a certain point in the sum), whereas those for stochastic rounding are uncorrelated.

Another notable property of stochastic rounding is that the expected value of the computed inner product is equal to the exact inner product. The same is true more generally for matrix–vector and matrix–matrix products and the solution of triangular systems.

Stochastic rounding is proving popular in neural network training and inference when low precision arithmetic is used, because it avoids stagnation.

Implementing stochastic rounding efficiently is not straightforward, as the definition involves both a random number and an accurate value of the result that we are trying to compute. Possibilities include modifying low level arithmetic routines (Hopkins et al., 2020) and exploiting the augmented operations in the latest version of the IEEE standard floating-point arithmetic (Fasi and Mikaitis, 2020).

Hardware is starting to become available that supports stochastic rounding, including the Intel Loihi neuromorphic chip, the Graphcore Intelligence Processing Unit (intended to accelerate machine learning), and the SpiNNaker2 chip.

Stochastic rounding can be done in MATLAB using the chop function written by me and Srikara Pranesh. This function is intended for experimentation rather than efficient computation.

## References

This is a minimal set of references, which contain further useful references within.

## What Is the Hilbert Matrix?

The Hilbert matrix $H_n = (h_{ij})$ is the $n\times n$ matrix with $h_{ij} = 1/(i+j-1)$. For example,

$H_4 = \left[\begin{array}{@{\mskip 5mu}c*{3}{@{\mskip 15mu} c}@{\mskip 5mu}} 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\[6pt] \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5}\\[6pt] \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6}\\[6pt] \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7}\\[6pt] \end{array}\right].$

It is probably the most famous test matrix and its conditioning and other properties were extensively studied in the early days of digital computing, especially by John Todd.

The Hilbert matrix is symmetric and it is a Hankel matrix (constant along the anti-diagonals). Less obviously, it is symmetric positive definite (all its eigenvalues are positive) and totally positive (every submatrix has positive determinant). Its condition number grows rapidly with $n$; indeed for the 2-norm the asymptotic growth rate is $\kappa_2(H_n) \sim \mathrm{e}^{3.5n}$. An underlying reason for the ill conditioning is that the Hilbert matrix is obtained when least squares polynomial approximation is done using the monomial basis, and this is known to be an ill-conditioned polynomial basis.

The inverse of $H_n$ has integer entries, with $(i,j)$ entry

$(-1)^{i+j} (i+j-1) \displaystyle{ n+i-1 \choose n-j } { n+j-1 \choose n-i } { i+j-2 \choose i-1 }^2.$

For example,

$H_4^{-1} = \left[\begin{array}{rrrr} 16 & -120 & 240 & -140 \\ -120 & 1200 & -2700 & 1680 \\ 240 & -2700 & 6480 & -4200 \\ -140 & 1680 & -4200 & 2800 \\ \end{array}\right].$

The alternating sign pattern is a consequence of total positivity.

The determinant is also known explicitly:

$\det(H_n) = \displaystyle\frac{ (1!\, 2!\, \cdots (n-1)!\, )^4 }{ 1!\, 2!\, \cdots (2n-1)! } \sim 2^{-2n^2}.$

The Hilbert matrix is infinitely divisible, which means that the matrix with $(i,j)$ element $h_{ij}^t$ is positive semidefinite for all nonnegative real numbers $t$.

Other interesting matrices can be formed from the Hilbert matrix. The Lotkin matrix, A = gallery('lotkin',n) in MATLAB, is the Hilbert matrix with the first row replaced by $1$s, making it nonsymmetric. Like the Hilbert matrix, it is ill conditioned and has an inverse with integer entries. For the $n\times n$ matrix with $(i,j)$ entry $1/(i+j)$, which is a submatrix of $H_{n+1}$, the largest singular value tends to $\pi$ as $n\to\infty$, with error decaying slowly as $O(1/\log n)$, as was shown by Taussky.

The Hilbert matrix is not a good test matrix, for several reasons. First, it is too ill conditioned, being numerically singular in IEEE double precision arithmetic for $n = 12$. Second, it is too special: the definiteness and total positivity, together with the fact that it is a special case of a Cauchy matrix (which has elements $1/(x_i + y_j)$), mean that certain quantities associated with it can be calculated more accurately than for a general positive definite matrix. The third reason why the Hilbert matrix is not a good test matrix is that most of its elements cannot be stored exactly in floating-point arithmetic, so the matrix actually stored is a perturbation of $H_n$—and since $H_n$ is ill conditioned this means (for example) that the inverse of the stored matrix can be very different from the exact inverse. A way around this might be to work with the integer matrix $H_n^{-1}$, but its elements are exactly representable in double precision only for $n\le 12$.

MATLAB has functions hilb and invhilb for the Hilbert matrix and its inverse. How to efficiently form the Hilbert matrix in MATLAB is an interesting question. The hilb function essentially uses the code

j = 1:n;
H = 1./(j'+j-1)


This code is more concise and efficient than two nested loops would be. The second line may appear to be syntactically incorrect, because j'+j adds a column vector to a row vector. However, the MATLAB implicit expansion feature expands j' into an $n\times n$ matrix in which every column is j', expands j into an $n\times n$ matrix in which every row is j, then adds the two matrices. It is not hard to see that the Hilbert matrix is the result of these two lines of code.
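The same construction works in NumPy via broadcasting, which plays the role of MATLAB's implicit expansion (a sketch; the condition number check uses $n = 10$):

```python
import numpy as np

n = 10
j = np.arange(1, n + 1)
H = 1.0 / (j[:, None] + j[None, :] - 1)  # h_ij = 1/(i+j-1) by broadcasting

print(H[0, 0], H[3, 3])          # 1.0 and 1/7
print(np.linalg.cond(H) > 1e12)  # ill conditioned already for n = 10
```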

## References

This is a minimal set of references, which contain further useful references within.


## What Is a Fréchet Derivative?

Let $U$ and $V$ be Banach spaces (complete normed vector spaces). The Fréchet derivative of a function $f:U \to V$ at $X\in U$ is a linear mapping $L:U\to V$ such that

$f(X+E) - f(X) - L(X,E) = o(\|E\|)$

for all $E\in U$. The notation $L(X,E)$ should be read as “the Fréchet derivative of $f$ at $X$ in the direction $E$”. The Fréchet derivative may not exist, but if it does exist then it is unique. When $U = V = \mathbb{R}$, the Fréchet derivative is just the usual derivative of a scalar function: $L(x,e) = f'(x)e$.

As a simple example, consider $U = V = \mathbb{R}^{n\times n}$ and $f(X) = X^2$. From the expansion

$f(X+E) - f(X) = XE + EX + E^2$

we deduce that $L(X,E) = XE + EX$, the first-order part of the expansion. If $X$ commutes with $E$ then $L(X,E) = 2XE = 2EX$.
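We can confirm numerically that $XE + EX$ really is the Fréchet derivative of $X^2$: the remainder $f(X+E) - f(X) - L(X,E)$ is exactly $E^2$, so it shrinks quadratically as $E$ shrinks. A Python/NumPy sketch (random matrices are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
E0 = rng.standard_normal((n, n))

def remainder(t):
    """Norm of f(X+E) - f(X) - L(X,E) for f(X) = X^2, E = t*E0."""
    E = t * E0
    R = (X + E) @ (X + E) - X @ X - (X @ E + E @ X)  # equals E @ E
    return np.linalg.norm(R)

# Shrinking E by 10 shrinks the remainder by about 100: o(||E||).
r1, r2 = remainder(1e-2), remainder(1e-3)
print(r1 / r2)  # about 100
```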

More generally, it can be shown that if $f$ has the power series expansion $f(x) = \sum_{i=0}^\infty a_i x^i$ with radius of convergence $r$ then for $X,E\in\mathbb{R}^{n\times n}$ with $\|X\| < r$, the Fréchet derivative is

$L(X,E) = \displaystyle\sum_{i=1}^\infty a_i \displaystyle\sum_{j=1}^i X^{j-1} E X^{i-j}.$

An explicit formula for the Fréchet derivative of the matrix exponential, $f(A) = \mathrm{e}^A$, is

$L(A,E) = \displaystyle\int_0^1 \mathrm{e}^{A(1-s)} E \mathrm{e}^{As} \, ds.$
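This derivative can be evaluated without explicit quadrature: a standard result says that $\mathrm{e}^{M}$ for the block upper triangular matrix $M = \bigl[\begin{smallmatrix} A & E \\ 0 & A\end{smallmatrix}\bigr]$ has $L(A,E)$ as its (1,2) block. A Python/SciPy sketch that cross-checks this against a finite difference of `expm`:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
E = rng.standard_normal((n, n))

# L(A,E) is the (1,2) block of expm([[A, E], [0, A]]).
M = np.block([[A, E], [np.zeros((n, n)), A]])
L = expm(M)[:n, n:]

# Cross-check with a central finite difference of expm.
h = 1e-5
L_fd = (expm(A + h * E) - expm(A - h * E)) / (2 * h)
print(np.linalg.norm(L - L_fd) / np.linalg.norm(L))  # small, O(h^2)
```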

Like the scalar derivative, the Fréchet derivative satisfies sum and product rules: if $g$ and $h$ are Fréchet differentiable at $A$ then

\begin{alignedat}{2} f &= \alpha g + \beta h &&\;\Rightarrow\; L_f(A,E) = \alpha L_g(A,E) + \beta L_h(A,E),\\ f &= gh &&\;\Rightarrow\; L_f(A,E) = L_g(A,E) h(A) + g(A) L_h(A,E). \end{alignedat}

A key requirement of the definition of Fréchet derivative is that $L(X,E)$ must satisfy the defining equation for all $E$. This is what makes the Fréchet derivative different from the Gâteaux derivative (or directional derivative), which is the mapping $G:U \to V$ given by

$G(X,E) = \lim_{t\to0} \displaystyle\frac{f(X+t E)-f(X)}{t} = \frac{\mathrm{d}}{\mathrm{d}t}\Bigr|_{t=0} f(X + tE).$

Here, the limit only needs to exist in the particular direction $E$. If the Fréchet derivative exists at $X$ then it is equal to the Gâteaux derivative, but the converse is not true.

A natural definition of condition number of $f$ is

$\mathrm{cond}(f,X) = \lim_{\epsilon\to0} \displaystyle\sup_{\|\Delta X\| \le \epsilon \|X\|} \displaystyle\frac{\|f(X+\Delta X) - f(X)\|}{\epsilon\|f(X)\|},$

and it can be shown that $\mathrm{cond}$ is given in terms of the Fréchet derivative by

$\mathrm{cond}(f,X) = \displaystyle\frac{\|L(X)\| \|X\|}{\|f(X)\|},$

where

$\|L(X)\| = \sup_{Z\ne0}\displaystyle\frac{\|L(X,Z)\|}{\|Z\|}.$

For matrix functions, the Fréchet derivative has a number of interesting properties, one of which is that the eigenvalues of $L(X)$ are the divided differences

$f[\lambda_i,\lambda_j] = \begin{cases} \dfrac{ f(\lambda_i)-f(\lambda_j) }{\lambda_i - \lambda_j}, & \lambda_i\ne\lambda_j, \\ f'(\lambda_i), & \lambda_i=\lambda_j, \end{cases}$

for $1\le i,j \le n$, where the $\lambda_i$ are the eigenvalues of $X$. We can check this formula in the case $f(X) = X^2$. Let $(\lambda,u)$ be an eigenpair of $X$ and $(\mu,v)$ an eigenpair of $X^T$, so that $Xu = \lambda u$ and $X^Tv = \mu v$, and let $E = uv^T$. Then

$L(X,E) = XE + EX = Xuv^T + uv^TX = (\lambda + \mu) uv^T.$

So $uv^T$ is an eigenvector of $L(X)$ with eigenvalue $\lambda + \mu$. But $f[\lambda,\mu] = (\lambda^2-\mu^2)/(\lambda - \mu) = \lambda + \mu$ (whether or not $\lambda$ and $\mu$ are distinct).
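The eigenvalue property can also be checked numerically. With the column-stacking operator $\mathrm{vec}$, the map $E \mapsto XE + EX$ is represented by the Kronecker matrix $I\otimes X + X^T\otimes I$, whose eigenvalues should be the pairwise sums $\lambda_i + \lambda_j$. A Python/NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
X = rng.standard_normal((n, n))

# Represent L(X): E -> XE + EX as an n^2-by-n^2 matrix acting on
# vec(E) (column-stacking convention).
K = np.kron(np.eye(n), X) + np.kron(X.T, np.eye(n))

lam = np.linalg.eigvals(X)
pair_sums = (lam[:, None] + lam[None, :]).ravel()  # f[l_i, l_j] for f(x) = x^2
eigK = np.linalg.eigvals(K)

# Every pairwise sum should appear among the eigenvalues of K.
dist = max(np.min(np.abs(eigK - s)) for s in pair_sums)
print(dist)  # close to zero
```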

## References

This is a minimal set of references, which contain further useful references within.

## Six Useful LaTeX Packages

Here are six $\LaTeX$ packages that, while they are perhaps not among the best known, I have found to be very useful. All of them are probably available in your $\LaTeX$ distribution, but in case they are not I give links to the packages on CTAN (the Comprehensive TeX Archive Network). A PDF file containing a demonstration of the examples below is available here.

## Mathtools

Mathtools extends the amsmath package with various additions, particularly for fine-tuning the typesetting of equations, and it also fixes a few problems with amsmath. To give just one example, the standard amsmath bmatrix environment centres its columns:

A = \begin{bmatrix}
12 & 3  & 0 \\
0  & 10 &-1
\end{bmatrix}


For matrices with negative elements, such as this one, the columns can look better right justified. The bmatrix* environment provided by mathtools allows a column alignment parameter to be given, so we can right justify the columns as follows:

A = \begin{bmatrix*}[r]
12 & 3  & 0 \\
0  & 10 &-1
\end{bmatrix*}


Unfortunately, only one alignment parameter can be given, so this command does not allow different columns to have different alignments (to achieve this, we can use the array environment instead).
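For mixed alignments the standard array environment does the job; for example, to right justify only the first column of the matrix above:

```latex
A = \left[\begin{array}{rcc}
12 & 3  & 0 \\
0  & 10 & -1
\end{array}\right]
```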

The mathtools package is regularly updated.

## Url

The url package provides the \url command, which acts like \verb except that it allows line breaks to be made when the argument is typeset. I use url whenever I need to typeset a URL or a path, in particular in my BibTeX bib files. I call the package with the hyphens option, so that line breaks will be allowed at hyphens. Example:

\usepackage[hyphens]{url}
...
\footnote{\url{https://icl.bitbucket.io/hpl-ai/}}
\url{http://blogs.mathworks.com/cleve/2013/01/07/george-forsythe/}


An advantage over the \verb command is that \url (or, if necessary, \urldef) can be used inside another command, such as \footnote, as in this example.

## Upref

By loading the upref package you ensure that every \ref or \eqref reference is set in an upright font. This is normal typesetting practice, but it is not enforced by all $\LaTeX$ classes. It is enforced by the SIAM class, for example.

## Siunitx

The siunitx package typesets SI units according to the official rules, with just the right spacing between the number and the unit. For example, typing

\si{kg.m/s^2}

produces $\mathrm{kg}\,\mathrm{m}/\mathrm{s}^2$.


Even if you do not need to type SI units, you might find useful the package’s ability to round and format numbers.

## Upquote

The upquote package forces left and right quotes to be typed as straight (vertical) quotes in \verb and verbatim environments. This is usually what is required in program listings. We used upquote in MATLAB Guide.

## Fancyvrb

The fancyvrb package provides a more powerful replacement for the verbatim environment, allowing the font to be specified, the environment to be boxed, an external file to be included in a verbatim environment, and much more. Here is an example that formats the contents of the environment in small blue type:

\begin{Verbatim}[fontsize=\small,formatcom=\color{blue}]
>> syms x, int(1/(1-x^4))
ans =
atan(x)/2 + atanh(x)/2
\end{Verbatim}


The second example includes an external file, numbers the lines, and puts horizontal lines before and after the code:

\renewcommand{\theFancyVerbLine}%
{\textcolor{red}{\small\arabic{FancyVerbLine}}}
\VerbatimInput[frame=lines,numbers=left,fontsize=\small]%
{d:/matlab/gen/gf.m}


## Other packages

I have previously written about the enumitem and booktabs packages, which I also strongly recommend.

## What Is the Adjugate of a Matrix?

The adjugate of an $n\times n$ matrix $A$ is defined by

$\mathrm{adj}(A) = \bigl( (-1)^{i+j} \det(A_{ji}) \bigr),$

where $A_{pq}$ denotes the submatrix of $A$ obtained by deleting row $p$ and column $q$. It is the transposed matrix of cofactors. The adjugate is sometimes called the (classical) adjoint and is sometimes written as $A^\mathrm{A}$.

The adjugate satisfies

$A \mathrm{adj}(A) = \mathrm{adj}(A) A = \det(A)I,$

so for nonsingular $A$,

$\mathrm{adj}(A) = \det(A) A^{-1},$

and this equation is often used to give a formula for $A^{-1}$ in terms of determinants.

Since the adjugate is a scalar multiple of the inverse, we would expect it to share some properties of the inverse. Indeed we have

\begin{aligned} \mathrm{adj}(AB) &= \mathrm{adj}(B) \mathrm{adj}(A),\\ \mathrm{adj}(A^*) &= \mathrm{adj}(A)^*. \end{aligned}

These properties can be proved by first assuming the matrices are nonsingular—in which case they follow from properties of the determinant and the inverse—and then using continuity of the entries of $\mathrm{adj}(A)$ and the fact that every matrix is the limit of a sequence of nonsingular matrices. Another property that can be proved in a similar way is

$\mathrm{adj}\bigl(\mathrm{adj}(A)\bigr) = (\det A)^{n-2} A.$

If $A$ has rank $n-2$ or less then $\mathrm{adj}(A)$ is just the zero matrix. Indeed in this case every $(n-1)\times(n-1)$ submatrix of $A$ is singular, so $\det(A_{ji}) \equiv 0$ in the definition of $\mathrm{adj}(A)$. In particular, $\mathrm{adj}(0) = 0$ and $\mathrm{adj}(xy^*) = 0$ for any rank-1 matrix $xy^*$.

Less obviously, if $\mathrm{rank}(A) = n-1$ then $\mathrm{adj}(A)$ has rank 1. This can be seen from the formula below that expresses the adjugate in terms of the singular value decomposition (SVD).

If $x,y\in\mathbb{C}^n$, then for nonsingular $A$,

\begin{aligned} \det(A + xy^*) &= \det\bigl( A(I + A^{-1}x y^*) \bigr)\\ &= \det(A) \det( I + A^{-1}x y^*)\\ &= \det(A) (1+ y^*A^{-1}x)\\ &= \det(A) + y^* (\det(A)A^{-1})x\\ &= \det(A) + y^* \mathrm{adj}(A)x. \end{aligned}

Again it follows by continuity that for any $A$,

$\det(A + xy^*) = \det(A) + y^* \mathrm{adj}(A)x.$

This expression is useful when we need to deal with a rank-1 perturbation to a singular matrix. A related identity is

$\det\left( \begin{bmatrix} A & x \\ y^* & \alpha \\ \end{bmatrix}\right) = \alpha \det(A) - y^* \mathrm{adj}(A) x.$
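The rank-1 expansion $\det(A + xy^*) = \det(A) + y^* \mathrm{adj}(A)x$ is easy to verify numerically, using $\mathrm{adj}(A) = \det(A)A^{-1}$ for a (generic, hence nonsingular) random matrix. A Python/NumPy sketch with real vectors, so $y^* = y^T$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))  # almost surely nonsingular
x = rng.standard_normal((n, 1))
y = rng.standard_normal((n, 1))

adjA = np.linalg.det(A) * np.linalg.inv(A)

lhs = np.linalg.det(A + x @ y.T)
rhs = np.linalg.det(A) + (y.T @ adjA @ x).item()
print(abs(lhs - rhs))  # small
```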

Let $A = U\Sigma V^*$ be an SVD, where $U$ and $V$ are unitary and $\Sigma = \mathrm{diag}(\sigma_i)$, with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$. Then

$\mathrm{adj}(A) = \mathrm{adj}(V^*) \mathrm{adj}(\Sigma) \mathrm{adj}(U) = \mathrm{adj}(V)^* \mathrm{adj}(\Sigma) \mathrm{adj}(U).$

It is easy to see that $D = \mathrm{adj}(\Sigma)$ is diagonal, with

$d_{kk} = \displaystyle\prod_{i=1\atop i\ne k }^n \sigma_i.$

Since $U$ and $V$ are unitary and hence nonsingular,

\begin{aligned} \mathrm{adj}(A) &= \mathrm{adj}(V)^* \mathrm{adj}(\Sigma) \mathrm{adj}(U)\\ &= \overline{\det(V)} V^{-*} \mathrm{adj}(\Sigma) \det(U) U^{-1}\\ &= \overline{\det(V)} V \mathrm{adj}(\Sigma) \det(U) U^*\\ &= \overline{\det(V)} \det(U) \cdot V \mathrm{adj}(\Sigma) U^*. \end{aligned}

If $\mathrm{rank}(A) = n-1$ then $\sigma_n = 0$ and so

$\mathrm{adj}(A) = \rho\, \sigma_1 \sigma_2 \cdots \sigma_{n-1}v_n^{} u_n^*,$

where $\rho = \overline{\det(V)} \det(U)$ has modulus $1$ and $v_n$ and $u_n$ are the last columns of $V$ and $U$, respectively.

The definition of $\mathrm{adj}(A)$ does not provide a good means for computing it, because the determinant computations are prohibitively expensive. The following MATLAB function (which is very similar to the function adjoint in the Symbolic Math Toolbox) uses the SVD formula.

function X = adj(A)
%ADJ    Adjugate of a square matrix.
%   X = ADJ(A) computes the adjugate of the square matrix A via
%   the singular value decomposition.

n = length(A);
[U,S,V] = svd(A);
D = zeros(n);
for i=1:n
    d = diag(S);
    d(i) = 1;          % replace ith singular value by 1, so that
    D(i,i) = prod(d);  % D(i,i) is the product of all the others
end
X = conj(det(V))*det(U)*V*D*U';


Note that, in common with most SVD codes, the svd function does not return $\det(U)$ and $\det(V)$, so we must compute them. This function is numerically stable, as shown by Stewart (1998).

Finally, we note that Richter (1954) and Mirsky (1956) obtained the Frobenius norm bound

$\|\mathrm{adj}(A) \|_F \le \displaystyle\frac{ \|A\|_F^{n-1} }{ n^{(n-2)/2} }.$
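This bound is attained at the identity matrix, for which both sides equal $n^{1/2}$, and it can be tested numerically using $\mathrm{adj}(A) = \det(A)A^{-1}$. A Python/NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))  # almost surely nonsingular
adjA = np.linalg.det(A) * np.linalg.inv(A)

lhs = np.linalg.norm(adjA, 'fro')
rhs = np.linalg.norm(A, 'fro')**(n - 1) / n**((n - 2) / 2)
print(lhs <= rhs)  # the Richter--Mirsky bound holds
```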

## References

This is a minimal set of references, which contain further useful references within.

Update (August 10, 2020): added the second displayed equation.


## What Is a Matrix Function?

If you give an array as the argument to a mathematical function in a programming language or problem solving environment you are likely to receive the result of applying the function to each component of the array. For example, here MATLAB exponentiates the integers from 0 to 3:

>> A = [0 1; 2 3], X = exp(A)
A =
0     1
2     3
X =
1.0000e+00   2.7183e+00
7.3891e+00   2.0086e+01


When the array represents a matrix this componentwise evaluation is not useful, as it does not obey the rules of matrix algebra. To see what properties we might require, consider the matrix square root. This function is notable historically because Cayley considered it in the 1858 paper in which he introduced matrix algebra.

A square root of an $n\times n$ matrix $A$ is a matrix $X$ such that $X^2 = A$. Any square root $X$ satisfies

$AX = X^2 X = X X^2 = XA,$

so $X$ commutes with $A$. Also, $A^* = (X^2)^* = (X^*)^2$, so $X^*$ is a square root of $A^*$ (here, the superscript $*$ denotes the conjugate transpose).

Furthermore, for any nonsingular matrix $Z$ we have

$(Z^{-1}XZ)^2 = Z^{-1}X^2 Z = Z^{-1}A Z.$

If we choose $Z$ as a matrix that takes $A$ to its Jordan canonical form $J$ then we have $(Z^{-1}XZ)^2 = J$, so that $Z^{-1}XZ$ is a square root of $J$, or in other words $X$ can be expressed as $X = Z \sqrt{J} Z^{-1}$.

Generalizing from these properties of the matrix square root it is reasonable to ask that a function $f(A)$ of a square matrix $A$ satisfies

• $f(A)$ commutes with $A$,
• $f(A^*) = f(A)^*$,
• $f(A) = Z f(J)Z^{-1}$, where $A$ has the Jordan canonical form $A = ZJZ^{-1}$.

(These are of course not the only requirements; indeed $f(A) \equiv I$ satisfies all three conditions.)
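As a quick numerical illustration for the square root: `scipy.linalg.sqrtm` computes a matrix $X$ with $X^2 = A$, and we can check that it commutes with $A$. A Python/SciPy sketch, using a random symmetric positive definite $A$ so that the principal square root is well behaved:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(5)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)  # symmetric positive definite

X = sqrtm(A)
print(np.linalg.norm(X @ X - A))      # X is a square root of A
print(np.linalg.norm(A @ X - X @ A))  # X commutes with A
```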

Many definitions of $f(A)$ can be given that satisfy these and other natural conditions. We will describe just three definitions (a notable omission is a definition in terms of polynomial interpolation).

## Power Series

For a function $f$ that has a power series expansion we can define $f(A)$ by substituting $A$ for the variable. It can be shown that the matrix series will be convergent for matrices whose eigenvalues lie within the radius of convergence of the scalar power series. For example,

\begin{aligned} \cos(A) &= I - \displaystyle\frac{A^2}{2!} + \frac{A^4}{4!} - \frac{A^6}{6!} + \cdots,\\ \log(I+A) &= A - \displaystyle\frac{A^2}{2} + \frac{A^3}{3} - \frac{A^4}{4} + \cdots, \quad \rho(A)<1, \end{aligned}

where $\rho$ denotes the spectral radius. This definition is natural for functions that have a power series expansion, but it is rather limited in its applicability.

## Jordan Canonical Form Definition

If $A$ has the Jordan canonical form

$Z^{-1}AZ = J = \mathrm{diag}(J_1, J_2, \dots, J_p),$

where

$J_k = J_k(\lambda_k) = \begin{bmatrix} \lambda_k & 1 & & \\ & \lambda_k & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_k \end{bmatrix} = \lambda_k I + N \in \mathbb{C}^{m_k\times m_k},$

then

$f(A) := Z f(J) Z^{-1} = Z \mathrm{diag}(f(J_k)) Z^{-1},$

where

$f(J_k) := \begin{bmatrix} f(\lambda_k) & f'(\lambda_k) & \dots & \displaystyle\frac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!} \\ & f(\lambda_k) & \ddots & \vdots \\ & & \ddots & f'(\lambda_k) \\ & & & f(\lambda_k) \end{bmatrix}.$

Notice that $f(J_k)$ has the derivatives of $f$ along its diagonals in the upper triangle. Write $J_k = \lambda_k I + N$, where $N$ is zero apart from a superdiagonal of 1s. The formula for $f(J_k)$ is just the Taylor series expansion

$f(J_k) = f(\lambda_k I + N) = f(\lambda_k)I + f'(\lambda_k)N + \displaystyle\frac{f''(\lambda_k)}{2!}N^2 + \cdots + \displaystyle\frac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!}N^{m_k-1}.$

The series is finite because $N^{m_k} = 0$: as $N$ is powered up the superdiagonal of 1s moves towards the right-hand corner until it eventually disappears.
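The finite Taylor series can be verified numerically for $f = \exp$ on a single Jordan block, where every derivative of $f$ equals $\mathrm{e}^{\lambda}$. A Python/SciPy sketch:

```python
import numpy as np
from scipy.linalg import expm
from math import exp, factorial

lam, m = 0.7, 4
J = lam * np.eye(m) + np.diag(np.ones(m - 1), 1)  # m-by-m Jordan block

# Taylor formula: f^{(j)}(lam)/j! on the jth superdiagonal (N^j).
F = sum(exp(lam) / factorial(j) * np.diag(np.ones(m - j), j)
        for j in range(m))

print(np.linalg.norm(expm(J) - F))  # close to zero
```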

An immediate consequence of this formula is that $f(A)$ is defined only if the necessary derivatives exist: for each eigenvalue $\lambda_k$ we need the existence of the derivatives at $\lambda_k$ of order up to one less than the size of the largest Jordan block in which $\lambda_k$ appears.

When $A$ is a diagonalizable matrix the definition simplifies considerably: if $A = ZDZ^{-1}$ with $D = \mathrm{diag}(\lambda_i)$ then $f(A) = Zf(D)Z^{-1} = Z \mathrm{diag}(f(\lambda_i)) Z^{-1}$.

## Cauchy Integral Formula

For a function $f$ analytic on and inside a closed contour $\Gamma$ that encloses the spectrum of $A$ the matrix $f(A)$ can be defined by the Cauchy integral formula

$f(A) := \displaystyle\frac{1}{2\pi \mathrm{i}} \int_\Gamma f(z) (zI-A)^{-1} \,\mathrm{d}z.$

This formula is mathematically elegant and can be used to provide short proofs of certain theoretical results.
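The integral can also be approximated numerically with the trapezoidal rule on a circular contour, which converges geometrically for periodic analytic integrands. A Python/SciPy sketch with $f = \exp$; the radius $r = 5$ is an assumption chosen to enclose the spectrum of this particular random $A$:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
n = 3
A = rng.standard_normal((n, n))

# Trapezoidal rule on z = r*exp(i*theta) with m equispaced points.
# With z'(theta) = i*z, the 1/(2*pi*i) factor and the step 2*pi/m
# combine into an overall factor 1/m.
r, m = 5.0, 200
F = np.zeros((n, n), dtype=complex)
for k in range(m):
    zk = r * np.exp(2j * np.pi * k / m)
    F += np.exp(zk) * zk * np.linalg.inv(zk * np.eye(n) - A)
F /= m

print(np.linalg.norm(F - expm(A)) / np.linalg.norm(expm(A)))  # tiny
```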

## Equivalence of Definitions

These and several other definitions turn out to be equivalent under suitable conditions. This was recognized by Rinehart, who wrote in 1955

“There have been proposed in the literature since 1880 eight distinct definitions of a matric function, by Weyr, Sylvester and Buchheim, Giorgi, Cartan, Fantappiè, Cipolla, Schwerdtfeger and Richter … All of the definitions except those of Weyr and Cipolla are essentially equivalent.”

## Computing a Matrix Function in MATLAB

In MATLAB, matrix functions are distinguished from componentwise array evaluation by the postfix “m” on a function name. Thus expm, logm, and sqrtm are matrix function counterparts of exp, log, and sqrt. Compare the example at the start of this article with

>> A = [0 1; 2 3], X = expm(A)
A =
0     1
2     3
X =
5.2892e+00   8.4033e+00
1.6807e+01   3.0499e+01


The MATLAB function funm calculates $f(A)$ for general matrix functions $f$ (subject to some restrictions).

## References

This is a minimal set of references, which contain further useful references within.