# What is the Kronecker Product?

The Kronecker product of two matrices $A\in\mathbb{C}^{m\times n}$ and $B\in\mathbb{C}^{p\times q}$ (also called the tensor product) is the $mp\times nq$ matrix1

$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \dots & a_{1n}B \\ a_{21}B & a_{22}B & \dots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \dots & a_{mn}B \end{bmatrix}.$

In other words, $A\otimes B$ is the block $m\times n$ matrix with $(i,j)$ block $a_{ij}B$. For example,

$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \otimes \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{bmatrix} = \left[\begin{array}{ccc|ccc} a_{11}b_{11} & a_{11}b_{12} & a_{11}b_{13} & a_{12}b_{11} & a_{12}b_{12} & a_{12}b_{13} \\ a_{11}b_{21} & a_{11}b_{22} & a_{11}b_{23} & a_{12}b_{21} & a_{12}b_{22} & a_{12}b_{23} \\\hline a_{21}b_{11} & a_{21}b_{12} & a_{21}b_{13} & a_{22}b_{11} & a_{22}b_{12} & a_{22}b_{13} \\ a_{21}b_{21} & a_{21}b_{22} & a_{21}b_{23} & a_{22}b_{21} & a_{22}b_{22} & a_{22}b_{23} \end{array}\right].$

Notice that the entries of $A\otimes B$ comprise every possible product $a_{ij}b_{rs}$, which is not the case for the usual matrix product $AB$ when it is defined. Indeed if $A$ and $B$ are $n\times n$ then

• $AB$ is $n\times n$ and contains sums of $n^3$ of the products $a_{ij}b_{rs}$,
• $A\otimes B$ is $n^2\times n^2$ and contains all $n^4$ products $a_{ij}b_{rs}$.

Two key properties of the Kronecker product are

\begin{aligned} (A\otimes B)^* &= A^* \otimes B^*, \label{KP1} \\ (A\otimes B)(C\otimes D)&= AC \otimes BD. \label{KP2} \end{aligned}

The second equality implies that when $x_i$ is an eigenvector of $A\in\mathbb{C}^{m\times m}$ with eigenvalue $\lambda_i$ and $y_j$ is an eigenvector of $B\in\mathbb{C}^{n\times n}$ with eigenvalue $\mu_j$ then

$(A\otimes B)(x_i\otimes y_j) = (Ax_i \otimes By_j) = (\lambda_i x_i \otimes \mu_j y_j) = \lambda_i\mu_j ( x_i \otimes y_j),$

so that $\lambda_i\mu_j$ is an eigenvalue of $A\otimes B$ with eigenvector $x_i \otimes y_j$. In fact, the $mn$ eigenvalues of $A\otimes B$ are precisely $\lambda_i\mu_j$ for $i = 1\colon m$ and $j = 1\colon n$.

Kronecker product structure arises in image deblurring models in which the blur is separable, that is, the blur in the horizontal direction can be separated from the blur in the vertical direction. Kronecker products also arise in the construction of Hadamard matrices. Recall that a Hadamard matrix is a matrix of $\pm1$s whose rows and columns are mutually orthogonal. If $H_n$ is an $n\times n$ Hadamard matrix then

$\begin{bmatrix} \phantom{-} 1 & 1 \\ -1 & 1 \end{bmatrix} \otimes H_n = \begin{bmatrix} \phantom{-} H_n & H_n \\ -H_n & H_n \end{bmatrix}$

is a $2n\times 2n$ Hadamard matrix.

The practical significance of Kronecker product structure is that it allows computations on a large matrix to be reduced to computations on smaller matrices. For example, suppose $A$ and $B$ are Hermitian positive definite matrices and $C = A \otimes B$, which can be shown to be Hermitian positive definite from the properties mentioned above. If $A = R^*R$ and $B = S^*S$ are Cholesky factorizations then

$A \otimes B = (R^*R) \otimes (S^*S) = (R^*\otimes S^*) (R \otimes S) = (R \otimes S)^* (R \otimes S),$

so $R\otimes S$, which is easily seen to be triangular with positive diagonal elements, is the Cholesky factor of $A\otimes B$. If $A$ and $B$ are $n\times n$ then forming $A\otimes B$ and computing its Cholesky factorization costs $O(n^6)$ flops, whereas $R$ and $S$ can be computed in $O(n^3)$ flops. If we want to solve a linear system $(A \otimes B)x = b$ this can be done using $R$ and $S$ without explicitly form $R\otimes S$.

## Vec Operator

The vec operator stacks the columns of a matrix into one long vector: if $A = [a_1,a_2,\dots,a_m]$ then $\mathrm{vec}(A) = [a_1^T a_2^T \dots a_m^T]^T$. The vec operator and the Kronecker product interact nicely: for any $A$, $X$, and $B$ for which the product $AXB$ is defined,

$\mathrm{vec}(AXB) = (B^T \otimes A) \mathrm{vec}(X).$

This relation allows us to express a linear system $AXB = C$ in the usual form “$Ax=b$”.

## Kronecker Sum

The Kronecker sum of $A\in\mathbb{C}^{m\times m}$ and $B\in\mathbb{C}^{n\times n}$ is defined by $A\oplus B = A \otimes I_n + I_m \otimes B$. The eigenvalues of $A\oplus B$ are $\lambda_{ij} = \lambda_i(A) + \lambda_j(B)$, $i=1\colon m$, $j=1\colon n$, where the $\lambda_i(A)$ are the eigenvalues of $A$ and the $\lambda_j(B)$ are those of $B$.

The Kronecker sum arises when we apply the vec operator to the matrix $AX + XB$:

$\mathrm{vec}(AX + XB) = (I_m \otimes A + B^T \otimes I_n) \mathrm{vec}(X) = (B^T \oplus A) \mathrm{vec}(X).$

Kronecker sum structure also arises in finite difference discretizations of partial differential equations, such as when Poisson’s equation is discretized on a square by the usual five-point operator.

## Vec-Permutation Matrix

Since for $A\in\mathbb{C}^{m\times n}$ the vectors $\mathrm{vec}(A)$ and $\mathrm{vec}(A^T)$ contain the same $mn$ elements in different orders, we must have

$\mathrm{vec}(A^T) = \Pi_{m,n} \mathrm{vec}(A),$

for some $mn\times mn$ permutation matrix $\Pi_{m,n}$. This matrix is called the the vec-permutation matrix, and is also known as the commutation matrix.

Kronecker multiplication is not commutative, that is, $A \otimes B \ne B \otimes A$ in general, but $A \otimes B$ and $B \otimes A$ do contain the same elements in different orders. In fact, the two matrices are related by row and column permutations: if $A\in\mathbb{C}^{m\times n}$ and $B\in\mathbb{C}^{p\times q}$ then

$(A\otimes B)\Pi_{n,q} = \Pi_{m,p} (B\otimes A).$

This relation can be obtained as follows: for $X\in\mathbb{C}^{n\times q}$,

\begin{aligned} (B \otimes A)\mathrm{vec}(X) &= \mathrm{vec}(AXB^T)\\ &= \Pi_{p,m} \mathrm{vec}(BX^TA^T)\\ &= \Pi_{p,m} (A\otimes B)\mathrm{vec}(X^T)\\ &= \Pi_{p,m} (A\otimes B) \Pi_{n,q}\mathrm{vec}(X). \end{aligned}

Since these equalities hold for all $X$, we have $B\otimes A = \Pi_{p,m} (A\otimes B) \Pi_{n,q}$, from which the relation follows on using $\Pi_{m,n}\Pi_{n,m} = I$, which can be obtained by replacing $A$ by $A^T$ in the definition of vec.

An explicit expression for the the vec-permutation matrix is

$\Pi_{m,n} = \displaystyle\sum_{i=1}^m \sum_{j=1}^n (e_i^{} e_j^T) \otimes (e_j^{} e_i^T),$

where $e_i$ is the $i$th unit vector.

The following plot shows the sparsity patterns of all the vec permutation matrices $\Pi_{m,n}$ with $mn = 120$, where the title of each subplot is $(m,n)$.

## MATLAB

In MATLAB the Kronecker product $A\otimes B$ can be computed as kron(A,B) and $\mathrm{vec}(A)$ is obtained by indexing with a colon: A(:). Be careful using kron as it can generate very large matrices!

## Historical Note

The Kronecker product is named after Leopold Kronecker (1823–1891). Henderson et al. (1983) suggest that it should be called the Zehfuss product, after Johann Georg Zehfuss (1832–1891), who obtained the result $\det(A\otimes B) = \det(A)^n \det(B)^m$ for $A\in\mathbb{C}^{m\times m}$ and $B\in\mathbb{C}^{n\times n}$ in 1858.

## References

This is a minimal set of references, which contain further useful references within.

## Footnotes:

1

The $\otimes$ symbol is typed in $\LaTeX$ as \otimes.

# What Is the Gerstenhaber Problem?

When Cayley introduced matrix algebra in 1858, he did much more than merely arrange numbers in a rectangular array. His definitions of addition, multiplication, and inversion produced an algebraic structure that has proved to be immensely useful, and which still holds many mysteries today.

An $n\times n$ matrix has $n^2$ parameters, so the vector space $\mathbb{R}^{n\times n}$ of real matrices has dimension $n^2$, with a basis given by the matrices $E_{ij} = e_i^{}e_j^T$, where $e_i$ is the vector that is zero except for a $1$ in the $i$th entry. In his original paper, Cayley noticed the important property that the powers of a particular matrix $A$ can never span $\mathbb{R}^{n\times n}$. The Cayley–Hamilton theorem says that $A$ satisfies its own characteristic equation, that is, $p(A) = 0$ where $p(t) = \det(t I - A)$ is the characteristic polynomial of $A$. This means that $A^n$ can be expressed as a linear combination of $I$, $A$, …, $A^{n-1}$, so the powers of $A$ span a vector space of dimension at most $n$.

Gerstenhaber proved a generalization of this property in 1961: if $A$ and $B$ are two commuting $n\times n$ matrices then the matrices $A^iB^j$, for all nonnegative $i$ and $j$, generate a vector space of dimension at most $n$. This result is much more difficult to prove than the Cayley–Hamilton theorem. Gerstenhaber’s proof was based on algebraic geometry, but purely matrix-theoretic proofs have been given.

A natural question, called the Gerstenhaber problem, is: does this result extend to three matrices, that is, does the vector space

$S_n = \{\, A^iB^jC^k: 0\le i,j,k \le n-1 \,\}$

have dimension at most $n$ for any $n\times n$ matrices $A$, $B$, and $C$ that commute with each other? (We can limit the powers to $n-1$ by the Cayley–Hamilton theorem.) The problem is defined over any field, but here I focus on the reals.

Before considering the three matrix case let us note that the answer is known to be “no” for four commuting $4\times 4$ matrices $A$, $B$, $C$, and $D$. Indeed let

$A = e_1^{}e_3^T, \quad B = e_1^{}e_4^T, \quad C = e_2^{}e_3^T, \quad D = e_2^{}e_4^T$

and note that all possible products of two of these matrices are zero, so the matrices commute pairwise. Yet $I = A^0, A, B, C, D$ are clearly five linearly independent matrices. Hence Gerstenhaber’s result does not extend to four matrices. It also does not extend to five or more matrices because it is known that the failure of the result for one value of $n$ implies failure for all larger $n$. The question, then, is whether the three matrix case is like the two matrix case or the case for four or more matrices.

A great deal of effort has been put into proving or disproving the Gerstenhaber problem, so far without success. Here are two known facts.

• The result holds for all $n\le 11$.
• By a 1905 result of Schur, the dimension of $S_n$ is at most $1 + \lfloor n^2/4\rfloor$, which is less than $n^2$ but still much bigger than $n$. (Here, $\lfloor \cdot \rfloor$ is the floor function.)

One approach to investigating this problem is to look for a counterexample computationally. For some $n\ge 12$, choose three commuting $n\times n$ matrices $A$, $B$, and $C$, select $m\ge n$ monomials

$X_i = A^{i_p} B^{j_p} C^{k_p}, \quad 1\le p \le m,$

form the matrix

$Y = [\mathrm{vec}(X_1), \mathrm{vec}(X_2), \dots, \mathrm{vec}(X_m)],$

where $\mathrm{vec}$ converts a matrix into a vector by stacking the columns on top of each other, then compute $\mathrm{rank}(Y)$, which is a lower bound on $\mathrm{dim}(S_n)$, and check whether it is greater than $n$.

This simple-minded approach has some obvious difficulties. How do we choose $A$, $B$, and $C$? How do we choose the powers? How do we avoid overflow and underflow and compute a reliable rank, given that we might be dealing with large powers of large matrices?

Holbrook and O’Meara (2020), who have written several papers on the Gerstenhaber problem, which they call the GP problem, state that they “firmly believe the GP will turn out to have a negative answer” and they have developed a sophisticated approach to searching for a case with $\mathrm{dim}(S_n) > n$, which they call a “Eureka”. They first note that $A$, $B$ and $C$ can be assumed to be nilpotent. This means that all three matrices must be defective, because a nondefective nilpotent matrix is zero. Next they note that since commuting matrices are simultaneously unitarily triangularizable, $A$, $B$, and $C$ can be assumed to be strictly upper triangular. Then they note that $A$ can be assumed to be in Weyr canonical form.

The Weyr canonical form is a dual of the Jordan canonical form in which the Jordan matrix is replaced by a Weyr matrix, which is a direct sum of basic Weyr matrices. A basic Weyr matrix has one distinct eigenvalue and is upper block bidiagonal with diagonal blocks that are multiples of the identity matrix and superdiagonal blocks that are rectangular identity matrices. The difference between Jordan and Weyr matrices is illustrated by the example

$J(\lambda) = \left[\begin{array}{ccc|cc} \lambda & 1 & 0 &&\\ 0 & \lambda & 1 &&\\ 0 & 0 & \lambda &&\\\hline & & &\lambda & 1\\ & & & 0 &\lambda \end{array}\right], \quad W(\lambda) = \left[\begin{array}{cc|cc|c} \lambda & 0 & 1 & 0&\\ 0 & \lambda & 0 & 1&\\\hline & & \lambda & 0 &1\\ & & 0 &\lambda & 0\\\hline & & & 0 &\lambda \end{array}\right],$

where the Jordan matrix $J(\lambda)$ and Weyr matrix $W(\lambda)$ are related by $J(\lambda) = P^TW(\lambda)P$ for some permutation matrix $P$. There is an elegant way of relating the block sizes in the Jordan and Weyr matrices via a Young diagram. Form an array whose $k$th column contains $j_k$ dots, where $j_k$ is the size of the $j$th diagonal block of $J$:

$\begin{array}{c|ccc} & j_1 & j_2\\\hline w_1 & \bullet & \bullet\\ w_2 & \bullet & \bullet\\ w_3 & \bullet & \end{array}$

Then the number of dots in the $k$th row is the size of the $k$th diagonal block in the Weyr form.

The reason for using the Weyr form is that whereas any matrix that commutes with a Jordan matrix has Toeplitz blocks, any matrix that commutes with a Weyr matrix is block upper triangular and is uniquely determined by the first block row. By choosing a Weyr form for $A$, commuting matrices $B$ and $C$ can be built up in a systematic way.

Thanks to a result of O’Meara (2020) it suffices to compute modulo a prime $p$, so the computations can be done in exact arithmetic, removing the need to worry about rounding errors or overflow.

Holbrook and O’Meara have mostly tried matrices up to dimension 50, but they feel that “the Loch Ness monster probably lives in deeper water, closer to $100\times 100$”. Their MATLAB codes (40 nicely commented but not optimized M-files) are available on request from the address given in their preprint. If you have the time to spare you might want to experiment with the codes and try to find a Eureka.

## References

This is a minimal set of references, which contain further useful references within.

# What Is a Cholesky Factorization?

The Cholesky factorization of a symmetric positive definite matrix $A$ is the factorization $A = R^T\!R$, where $R$ is upper triangular with positive diagonal elements. It is a generalization of the property that a positive real number has a unique positive square root. The Cholesky factorization always exists and the requirement that the diagonal of $R$ be positive ensures that it is unique.

As an example, the Cholesky factorization of the $4\times 4$ matrix with $(i,j)$ element $\gcd(i,j)$ (gallery('gcdmat',4) in MATLAB) is

$G_4 = \left[\begin{array}{cccc} 1 & 1 & 1 & 1\\ 1 & 2 & 1 & 2\\ 1 & 1 & 3 & 1\\ 1 & 2 & 1 & 4 \end{array}\right] = \left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0\\ 1 & 0 & \sqrt{2} & 0\\ 1 & 1 & 0 & \sqrt{2} \end{array}\right] \left[\begin{array}{cccc} 1 & 1 & 1 & 1\\ 0 & 1 & 0 & 1\\ 0 & 0 & \sqrt{2} & 0\\ 0 & 0 & 0 & \sqrt{2} \end{array}\right].$

The Cholesky factorization of an $n\times n$ matrix contains $n-1$ other Cholesky factorizations within it: $A_k = R_k^TR_k$, $k=1:n-1$, where $A_k = A(1\colon k,1\colon k)$ is the leading principal submatrix of order $k$. For example, for $G_4$ with $k = 2$, $\bigl[\begin{smallmatrix}1 & 1\\ 1& 2\end{smallmatrix}\bigr]= \bigl[\begin{smallmatrix}1 & 0\\ 1& 1\end{smallmatrix}\bigr] \bigl[\begin{smallmatrix}1 & 1\\ 0& 1\end{smallmatrix}\bigr]$.

Inverting the Cholesky equation gives $A^{-1} = R^{-1} R^{-T}$, which implies the interesting relation that the $(n,n)$ element of $A^{-1}$ is $r_{nn}^{-2}$. So $G_4^{-1}$ has $(4,4)$ element $1/2$. We also have $\det(A) = \det(R)^2 = (r_{11}\dots r_{nn})^2$, so $\det(G_4) = 4$ for this matrix.

The Cholesky factorization is named after André-Louis Cholesky (1875–1918), a French military officer involved in geodesy and surveying in Crete and North Africa, who developed it for solving the normal equations arising in least squares problems.

The existence and uniqueness of the factorization can be proved by induction on $n$, and the proof also serves as a method for computing it. Partition $A\in\mathbb{R}^{n\times n}$ as

$A = \begin{bmatrix} A_{n-1} & c \\ c^T & \alpha \end{bmatrix}.$

We know that $A_{n-1}$ is positive definite (any principal submatrix of a positive definite matrix is easily shown to be positive definite). Assume that $A_{n-1}$ has a unique Cholesky factorization $A_{n-1} = R_{n-1}^TR_{n-1}$ and define the upper triangular matrix

$R = \begin{bmatrix} R_{n-1} & r \\ 0 & \beta \end{bmatrix}.$

Then

$R^T\!R = \begin{bmatrix} R_{n-1}^T R_{n-1} & R_{n-1}^T r \\ r^T R_{n-1} & r^Tr + \beta^2 \end{bmatrix},$

which equals $A$ if and only if

\begin{aligned} R_{n-1}^T r &= c, \\ r^Tr + \beta^2 &= \alpha. \end{aligned}

The first equation has a unique solution since $R_{n-1}$ is nonsingular. Then the second equation gives $\beta^2 = \alpha - r^Tr$. It remains to check that there is a unique real, positive $\beta$ satisfying this equation. From the inequality

$0 < \det(A) = \det(R^T) \det(R) = \det(R_{n-1})^2 \beta^2$

we see that $\beta^2>0$, hence there is a unique $\beta>0$. This completes the inductive step.

The quantity $\alpha - r^Tr = \alpha - c^T A_{n-1}^{-1}c$ is the Schur complement of $A_{n-1}$ in $A$. The essential reason why Cholesky factorization works is that the Schur complements of a positive definite matrix are themselves positive definite. Indeed we have the congruence

$\begin{bmatrix} I & 0 \\ -c^T A_{n-1}^{-1} & 1 \end{bmatrix} \begin{bmatrix} A_{n-1} & c \\ c^T & \alpha \end{bmatrix} \begin{bmatrix} I & -A_{n-1}^{-1}c\\ 0 & 1 \end{bmatrix} = \begin{bmatrix} A_{n-1} & 0 \\ 0 & \alpha - c^T A_{n-1}^{-1} c \end{bmatrix},$

and since congruences preserve definiteness it follows that $\alpha - c^T A_{n-1}^{-1} c > 0$.

In textbooks it is common to see element-level equations for computing the Cholesky factorization, which come from directly solving the matrix equation $A = R^T\!R$. If we equate elements on both sides, taking them a column at a time, starting with $a_{11} = r_{11}^2$, then the following algorithm is obtained:

$\begin{array}{l} 1~ \mbox{for}~j=1:n \\ 2~ \quad \mbox{for}~ i=1:j-1 \\ 3~ \qquad r_{ij} = \bigl(a_{ij} - \sum_{k=1}^{i-1} r_{ki}r_{kj} \bigr) /r_{ii} \\ 4~ \quad \mbox{end} \\ 5~ \quad r_{jj} = \bigl(a_{jj} - \sum_{k=1}^{j-1} r_{kj}^2\bigr)^{1/2} \\ 6~ \mbox{end} \end{array}$

What happens if this algorithm is executed on a general (indefinite) symmetric matrix that is, one that has both positive and negative eigenvalues? Line 5 will either attempt to take the square root of a negative number for some $j$ or it will produce $r_{jj} = 0$ and on the next iteration of the loop we will have a division by zero. In floating-point arithmetic it is possible for the algorithm to fail for a positive definite matrix, but only if it is numerically singular: the algorithm is guaranteed to run to completion if the condition number $\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2$ is safely less than $u^{-1}$, where $u$ is the unit roundoff.

The MATLAB function chol normally returns an error message if the factorization fails. But a second output argument can be requested, which is set to the number of the stage on which the factorization failed, or to zero if the factorization succeeded. In the case of failure, the partially computed $R$ factor, returned in the first argument, can be used to compute a direction of negative curvature for $A$, which is a vector $z$ such that $z^T\!Az < 0$. Here is an example. The code

n = 8;
A = gallery('lehmer',n) - 0.3*eye(n); % Indefinite matrix.
[R,p] = chol(A)
z = [-R\(R'\A(1:p-1,p)); 1; zeros(n-p,1)];
neg_curve = z'*A*z


produces the output

R =
8.3666e-01   5.9761e-01   3.9841e-01
0   5.8554e-01   7.3193e-01
0            0   7.4536e-02
p =
4
neg_curve =
-9.1437e+00


Cholesky factorization has excellent numerical stability. The computed factor $\widehat{R}$ satisfies

$A + \Delta A = \widehat{R}^T\widehat{R}, \quad \|\Delta A\|_2 \le c_nu\|A\|_2,$

where $c_n$ is a constant. Unlike for LU factorization there is no possibility of element growth; indeed for $i\le j$,

$r_{ij}^2 \le \displaystyle \sum_{k=1}^j r_{kj}^2 = a_{ij},$

so the elements of $R$ are nicely bounded relative to those of $A$.

Once we have a Cholesky factorization we can use it to solve a linear system $Ax = b$, by solving the lower triangular system $R^Ty = b$ and then the upper triangular system $Rx = y$. The computed solution $\widehat{x}$ can be shown to satisfy

$(A+\Delta A)\widehat{x} = b, \quad \|\Delta A\|_2 \le d_nu\|A\|_2$

where $d_n$ is another constant. Thus $\widehat{x}$ has a small backward error.

Finally, what if $A$ is only symmetric positive semidefinite, that is, $x^T\!Ax \ge 0$ for all $x$, so that $A$ is possibly singular? There always exists a Cholesky factorization (we can take the $R$ factor in the QR factorization of the positive semidefinite square root of $A$) but it may not be unique. For example, we have

$A = \left[\begin{array}{cccc} 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1\\ 1 & 1 & 2 & 2\\ 1 & 1 & 2 & 4\\ \end{array}\right] = \left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0\\ 1 & 1 & x & y\\ \end{array}\right] \left[\begin{array}{cccc} 1 & 1 & 1 & 1\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & x\\ 0 & 0 & 0 & y\\ \end{array}\right] = R^T\!R$

for any $x$ and $y$ such that $x^2 + y^2 = 2$. Note that $A$ has rank 3 but $R$ has two zero diagonal elements. When $A$ is positive semidefinite of rank $r$ what is usually wanted is a Cholesky factorization in which $R$ is zero in its last $n-r$ rows. Such a factorization can be obtained by using complete pivoting, which at each stage permutes the largest remaining diagonal element into the pivot position, which gives a factorization

$\Pi^T\mskip-5mu A\Pi = R^T\!R, \quad R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix}, \quad R_{11}\in\mathbb{R}^{r \times r},$

where $\Pi$ is a permutation matrix. For example, with $\Pi$ the identity matrix with its columns in reverse order,

$\Pi^T\mskip-5mu A\Pi = \left[\begin{array}{cccc} 4 & 2 & 1 & 1\\ 2 & 2 & 1 & 1\\ 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 \end{array}\right] = \left[\begin{array}{cccc} 2 & 0 & 0 & 0\\ 1 & 1 & 0 & 0\\ \frac{1}{2} & \frac{1}{2} & \frac{\sqrt{2}}{2} & 0\\[2pt] \frac{1}{2} & \frac{1}{2} & \frac{\sqrt{2}}{2} & 0 \end{array}\right] \left[\begin{array}{cccc} 2 & 1 & \frac{1}{2} & \frac{1}{2}\\[2pt] 0 & 1 & \frac{1}{2} & \frac{1}{2}\\[2pt] 0 & 0 & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\ 0 & 0 & 0 & 0 \end{array}\right],$

which clearly displays the rank of $A$.

## References

This is a minimal set of references, which contain further useful references within.

# What is Numerical Stability?

Numerical stability concerns how errors introduced during the execution of an algorithm affect the result. It is a property of an algorithm rather than the problem being solved. I will assume that the errors under consideration are rounding errors, but in principle the errors can be from any source.

Consider a scalar function $y = f(x)$ of a scalar $x$. We regard $x$ as the input data and $y$ as the output. The forward error of a computed approximation $\widehat{y}$ to $y$ is the relative error $|y - \widehat{y}|/|y|$. The backward error of $\widehat{y}$ is

$\min\left\{\, \displaystyle\frac{|\Delta x|}{|x|}: \widehat{y} = f(x+\Delta x) \,\right\}.$

If $\widehat{y}$ has a small backward error then it is the exact answer for a slightly perturbed input. Here, “small’ is interpreted relative to the floating-point arithmetic, so a small backward error means one of the form $cu$ for a modest constant $c$, where $u$ is the unit roundoff.

An algorithm that always produces a small backward error is called backward stable. In a backward stable algorithm the errors introduced during the algorithm have the same effect as a small perturbation in the data. If the backward error is the same size as any uncertainty in the data then the algorithm produces as good a result as we can expect.

If $x$ undergoes a relative perturbation of size $u$ then $y = f(x)$ can change by as much as $\mathrm{cond}_f(x) u$, where

$\mathrm{cond}_f(x) = \lim_{\epsilon\to0} \displaystyle\sup_{|\Delta x| \le \epsilon |x|} \displaystyle\frac{|f(x+\Delta x) - f(x)|}{\epsilon|f(x)|}$

is the condition number of $f$ at $x$. An algorithm that always produces $\widehat{y}$ with a forward error of order $\mathrm{cond}_f(x) u$ is called forward stable.

The definition of $\mathrm{cond}_f(x)$ implies that a backward stable algorithm is automatically forward stable. The converse is not true. An example of an algorithm that is forward stable but not backward stable is Gauss–Jordan elimination for solving a linear system.

If $\widehat{y}$ satisfies

$\widehat{y} + \Delta y = f(x+\Delta x), \quad |\Delta y| \le \epsilon |\widehat{y}|, \quad |\Delta x| \le \epsilon |x|,$

with $\epsilon$ small in the sense described above, then the algorithm for computing $y$ is mixed backward–forward stable. Such an algorithm produces almost the right answer for a slightly perturbed input. The following diagram illustrates the previous equation.

With these definitions in hand, we can turn to the meaning of the term numerically stable. Depending on the context, numerical stability can mean that an algorithm is (a) backward stable, (b) forward stable, or (c) mixed backward–forward stable.

For some problems, backward stability is difficult or impossible to achieve, so numerical stability has meaning (b) or (c). For example, let $Z = xy^T$, where $x$ and $y$ are $n$-vectors. Backward stability would require the computed $\widehat{Z}$ to satisfy $\widehat{Z} = (x+\Delta x)(y+\Delta y)^T$ for some small $\Delta x$ and $\Delta y$, meaning that $\widehat{Z}$ is a rank-$1$ matrix. But the computed $\widehat{Z}$ contains $n^2$ independent rounding errors and is very unlikely to have rank $1$.

We briefly mention some other relevant points.

• What is the data? For a linear system $Ax = b$ the data is $A$ or $b$, or both. For a nonlinear function we need to consider whether problem parameters are data; for example, for $f(x) = \cos(3x^2+2)$ are the $3$ and the $2$ constants or are they data, like $x$, that can be perturbed?
• We have measured perturbations in a relative sense. Absolute errors can also be used.
• For problems whose inputs are matrices or vectors we need to use norms or componentwise measures.
• Some algorithms are only unconditionally numerically stable, that is, they are numerically stable for some subset of problems. For example, the normal equations method for solving a linear least squares problem is forward stable for problems with a large residual.

## References

This is a minimal set of references, which contain further useful references within.