# What Is a Pseudo-Orthogonal Matrix?

A matrix $Q\in\mathbb{R}^{n\times n}$ is pseudo-orthogonal if

$\notag Q^T \Sigma Q = \Sigma, \qquad (1)$

where $\Sigma = \mathrm{diag}(\pm 1)$ is a signature matrix. A matrix $Q$ satisfying (1) is also known as a $J$-orthogonal matrix, where $J$ is another notation for a signature matrix. Of course, if $\Sigma = I$ then $Q$ is orthogonal.

It is easy to show that $Q^T$ is also pseudo-orthogonal. Furthermore, $Q$ is clearly nonsingular and it satisfies

$\notag Q = \Sigma Q^{-T}\Sigma. \qquad (2)$

Since $\Sigma$ is orthogonal, this equation implies that $\|Q\|_\ell = \|Q^{-T}\|_\ell = \|Q^{-1}\|_\ell$ and hence that

$\notag \kappa_p(Q) = \|Q\|_\ell^2, \quad \ell = 2,F. \qquad(3).$

What are some examples of pseudo-orthogonal matrices? For $n = 2$ and $\Sigma = \left[\begin{smallmatrix}1 & 0 \\ 0 & -1 \end{smallmatrix}\right]$, $Q$ is of the form

$\notag Q = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad ab - cd = 0, \quad a^2 - c^2 = 1, \quad b^2 - d^2 = -1,$

which includes the matrices

$\notag Q = \begin{bmatrix} \cosh \theta & -\sinh\theta \\ -\sinh\theta & \cosh\theta \end{bmatrix}, \quad \theta\in\mathbb{R}. \qquad (4)$

The Lorentz group, representing symmetries of the spacetime of special relativity, corresponds to $4\times 4$ matrices with $\Sigma = \mathrm{diag}(1,1,1,-1)$.

Equation (2) shows that $Q$ is similar to the inverse of its transpose and hence (since every matrix is similar to its transpose) similar to its inverse. It follows that if $\lambda$ is an eigenvalue of $Q$ then $\lambda^{-1}$ is also an eigenvalue and it has the same algebraic and geometric multiplicities as $\lambda$.

By permuting rows and columns in (1) we can arrange that

$\notag \Sigma = \Sigma_{p,q} := \begin{bmatrix} I_p & 0 \\ 0 & -I_q \end{bmatrix}. \qquad (5)$

We assume that $\Sigma$ has this form throughout the rest of this article. This form of $\Sigma$ allows us to conveniently characterize matrices that are both orthogonal and pseudo-orthogonal. Such a matrix must satisfy $\Sigma Q = Q\Sigma$, which means that $Q = \mathrm{diag}(Q_{11},Q_{22})$, and any such orthogonal matrix is pseudo-orthogonal.

## Applications

Pseudo-orthogonal matrices arise in hyperbolic problems, that is, problems where there is an underlying indefinite scalar product or weight matrix. An example is the problem of downdating the Cholesky factorization, where in the simplest case we have the Cholesky factorization $C = R^T\!R$ of a symmetric positive definite $C\in\mathbb{R}^{n\times n}$ and want the Cholesky factorization of $\widetilde{C} = C - zz^T$, which is assumed to be symmetric positive definite. A more general downdating problem is that we are given

$\notag A = \begin{array}[b]{cc} \left[\begin{array}{@{}c@{}} A_1\\ A_2 \end{array}\right] & \mskip-22mu\ \begin{array}{l} \scriptstyle p \\ \scriptstyle q \end{array} \end{array}, \quad p\ge n,$

and the Cholesky factorization $A^T\!A = R^T\!R$ and wish to obtain the Cholesky factor $S$ of $A_1^TA_1 = R^T\!R - A_2^TA_2$. Note that $R$ and $S$ are $n\times n$. This task arises when we solve a regression problem after the observations corresponding to $A_2$ have been removed. The simple case above corresponds to removing one row ($q = 1$). Assuming that $q \ll p$, we would like to obtain $S$ cheaply from $R$, and numerical stability considerations dictate that we should avoid explicitly forming $A_1^TA_1$. If we can find a pseudo-orthogonal matrix $Q$ such that

$\notag Q \begin{bmatrix} R \\ A_2 \end{bmatrix} = \begin{bmatrix} S \\ 0 \end{bmatrix}, \qquad (6)$

with $\Sigma$ given by (5) and $S\in\mathbb{R}^{n\times n}$ upper triangular, then

$\notag A_1^TA_1 = \begin{bmatrix} R \\ A_2 \end{bmatrix}^T \Sigma \begin{bmatrix} R \\ A_2 \end{bmatrix} = \begin{bmatrix} R \\ A_2 \end{bmatrix}^T Q^T \Sigma Q \begin{bmatrix} R \\ A_2 \end{bmatrix} = \begin{bmatrix} S \\ 0 \end{bmatrix}^T \Sigma \begin{bmatrix} S \\ 0 \end{bmatrix} = S^T\!S,$

so $S$ is the desired Cholesky factor.

The factorization (6) is called a hyperbolic QR factorization and it can be computed by using hyperbolic rotations to zero out the elements of $A_2$. A $2\times2$ hyperbolic rotation has the form (4), and an $n\times n$ hyperbolic rotation is an identity matrix with a $2\times 2$ hyperbolic rotation embedded in it at the intersection of rows and columns $i$ and $j$, for some $i$ and $j$.

In general, a hyperbolic QR factorization of $A\in\mathbb{R}^{m\times n}$ with $m = p+q$ and $p\ge n$ has the form $QA = \left[\begin{smallmatrix} R \\ 0 \end{smallmatrix}\right]$ with $Q$ pseudo-orthogonal with respect to $\Sigma = \Sigma_{p,q}$ and $R \in\mathbb{R}^{n\times n}$ upper triangular. The factorization exists if $A^T\Sigma A$ is positive definite.

Another hyperbolic problem is the indefinite least squares problem

$\notag \min_x \,(b-Ax)^T \Sigma (b-Ax), \qquad (7)$

where $A\in\mathbb{R}^{m\times n}$, $m\ge n$, and $b\in\mathbb{R}^m$ are given, and $\Sigma = \Sigma_{p,q}$ with $m = p + q$. For $p=0$ or $q=0$ we have the standard least squares (LS) problem and the quadratic form is definite, while for $pq>0$ the problem is to minimize a genuinely indefinite quadratic form. This problem arises, for example, in the area of optimization known as $H^{\infty}$ smoothing.

The normal equations for (7) are $A^T\Sigma Ax = A^T\Sigma b$, and since the Hessian matrix of the quadratic objective function in (7) is $A^T\Sigma A$ it follows that the indefinite least squares problem has a unique solution if and only if $A^T\Sigma A$ is positive definite. To solve the problem we can use a hyperbolic QR factorization $QA = \left[\begin{smallmatrix} R \\ 0 \end{smallmatrix}\right]$ to write

\notag \begin{aligned} A^T\Sigma A &= A^T Q^T \Sigma Q A = \begin{bmatrix} R \\ 0 \end{bmatrix}^T \begin{bmatrix} I_p & 0 \\ 0 & -I_q \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix} = R^T\!R, \\ A^T\Sigma b &= A^T Q^T\Sigma Q b = \begin{bmatrix} R \\ 0 \end{bmatrix}^T \! \Sigma Q b. \end{aligned}

Solving the problem now reduces to solving the triangular system $Rx = d$, where $d$ comprises the first $n$ components of $Qb$. The same equation can also be obtained without using the normal equations by substituting the hyperbolic QR factorization into (7).

## The Exchange Operator

A simple technique exists for converting pseudo-orthogonal matrices into orthogonal matrices and vice versa. Let $A\in\mathbb{R}^{n\times n}$ with $n = p + q$, partition

$\notag A = \mskip5mu \begin{array}[b]{@{\mskip-20mu}c@{\mskip0mu}c@{\mskip-1mu}c@{}} & \mskip10mu\scriptstyle p & \scriptstyle q \\ \mskip15mu \begin{array}{r} \scriptstyle p \\ \scriptstyle q \end{array}~ & \multicolumn{2}{c}{\mskip-15mu \left[\begin{array}{c@{~}c@{~}} A_{11} & A_{12}\\ A_{21} & A_{22} \end{array}\right] } \end{array}, \qquad (8)$

and assume $A_{11}$ is nonsingular. The exchange operator is defined by

$\notag \mathrm{exc}(A) = \begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1}A_{12} \\ A_{21}A_{11}^{-1} & A_{22} -A_{21}A_{11}^{-1}A_{12} \end{bmatrix}.$

It is easy to see that the exchange operator is involutory, that is,

$\notag \mathrm{exc}(\mathrm{exc}(A)) = A,$

and moreover (recalling that $\Sigma$ is given by (5)) that

$\notag \mathrm{exc}(\Sigma A\Sigma) = \Sigma \mathrm{exc}(A)\Sigma = \mathrm{exc}(A^T)^T. \qquad(9)$

The next result gives a formula for the inverse of $\mathrm{exc}(A)$.

Lemma 1. Let $A\in\mathbb{R}^{n\times n}$ with $A_{11}$ nonsingular. If $A$ is nonsingular and $\mathrm{exc}(A^{-1})$ exists then $\mathrm{exc}(A)$ is nonsingular and $\mathrm{exc}(A)^{-1} = \mathrm{exc}(A^{-1})$.

Proof. Consider the equation

$\notag y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = Ax.$

By solving the first equation for $x_1$ and then eliminating $x_1$ from the second equation we obtain

$\notag \begin{bmatrix} x_1 \\ y_2 \end{bmatrix} = \mathrm{exc}(A) \begin{bmatrix} y_1 \\ x_2 \end{bmatrix}. \qquad (10)$

By the same argument applied to $x = A^{-1}y$, we have

$\notag \begin{bmatrix} y_1 \\ x_2 \end{bmatrix} = \mathrm{exc}(A^{-1}) \begin{bmatrix} x_1 \\ y_2 \end{bmatrix}.$

Hence for any $x_1$ and $y_2$ there is a unique $y_1$ and $x_2$, which implies by (10) that $\mathrm{exc}(A)$ is nonsingular and that $\mathrm{exc}(A)^{-1} = \mathrm{exc}(A^{-1})$. $~\square$

Now we will show that the exchange operator maps pseudo-orthogonal matrices to orthogonal matrices and vice versa.

Theorem 2. Let $A\in\mathbb{R}^{n\times n}$. If $A$ is pseudo-orthogonal then $\mathrm{exc}(A)$ is orthogonal. If $A$ is orthogonal and $A_{11}$ is nonsingular then $\mathrm{exc}(A)$ is pseudo-orthogonal.

Proof. If $A$ is pseudo-orthogonal then $A_{11}^TA_{11} = I + A_{21}^TA_{21}$, which implies that $A_{11}$ is nonsingular. Since $\Sigma A^T\Sigma = A^{-1}$, it follows that $A^{-1}$ also has a nonsingular $(1,1)$ block and so $\mathrm{exc}(A^{-1})$ exists. Furthermore, using Lemma 1, $\mathrm{exc}(\Sigma A^T\Sigma) = \mathrm{exc}(A^{-1}) = \mathrm{exc}(A)^{-1}$. But (9) shows that $\mathrm{exc}(\Sigma A^T\Sigma) = \mathrm{exc}(A)^T$, and we conclude that $\mathrm{exc}(A)$ is orthogonal.

Assume now that $A$ is orthogonal with $A_{11}$ nonsingular. Then $\mathrm{exc}(A^T) = \mathrm{exc}(A^{-1})$ exists and Lemma 1 shows that $\mathrm{exc}(A)$ is nonsingular and $\mathrm{exc}(A)^{-1} = \mathrm{exc}(A^{-1}) = \mathrm{exc}(A^T)$. Hence, using (9),

$I = \mathrm{exc}(A^T) \mathrm{exc}(A) = \Sigma\mathrm{exc}(A)^T\Sigma \cdot \mathrm{exc}(A),$

which shows that $\mathrm{exc}(A)$ is pseudo-orthogonal. $~\square$

This MATLAB example uses the exchange operator to convert an orthogonal matrix obtained from a Hadamard matrix into a pseudo-orthogonal matrix.

>> p = 2; n = 4;
>> A = hadamard(n)/sqrt(n), Sigma = blkdiag(eye(p),-eye(n-p))
A =
5.0000e-01   5.0000e-01   5.0000e-01   5.0000e-01
5.0000e-01  -5.0000e-01   5.0000e-01  -5.0000e-01
5.0000e-01   5.0000e-01  -5.0000e-01  -5.0000e-01
5.0000e-01  -5.0000e-01  -5.0000e-01   5.0000e-01
Sigma =
1     0     0     0
0     1     0     0
0     0    -1     0
0     0     0    -1
>> Q = exc(A,p), Q'*Sigma*Q
Q =
1     1    -1     0
1    -1     0    -1
1     0    -1    -1
0     1    -1     1
ans =
1     0     0     0
0     1     0     0
0     0    -1     0
0     0     0    -1

The code uses the function

function X = exc(A,p)
%EXC     Exchange operator.
%   EXC(A,p) is the result of applying the exchange operator to
%   the square matrix A, which is regarded as a block 2-by-2
%   matrix with leading block of dimension p.
%   p defaults to floor(n)/2.

[m,n] = size(A);
if m ~= n, error('Matrix must be square.'), end
if nargin < 2, p = floor(n/2); end

A11 = A(1:p,1:p);
A12 = A(1:p,p+1:n);
A21 = A(p+1:n,1:p);
A22 = A(p+1:n,p+1:n);

X21 = A11\A12;
X = [inv(A11) -X21;
A21/A11  A22-A21*X21];

## Hyperbolic CS Decomposition

For an orthogonal matrix expressed in block $2\times 2$ form there is a close relationship between the singular value decompositions (SVDs) of the blocks, as revealed by the CS decomposition (see What Is the CS Decomposition?). An analogous decomposition holds for a pseudo-orthogonal matrix. Let $Q\in\mathbb{R}^{n \times n}$ be pseudo-orthogonal with respect to $\Sigma$ in (5), and suppose that $Q$ is partitioned as

$\notag Q = \begin{array}[b]{@{\mskip33mu}c@{\mskip-16mu}c@{\mskip-10mu}c@{}} \scriptstyle p & \scriptstyle n-p & \\ \multicolumn{2}{c}{ \left[\begin{array}{c@{~}c@{~}} Q_{11}& Q_{12} \\ Q_{21}& Q_{22} \\ \end{array}\right]} & \mskip-12mu\ \begin{array}{c} \scriptstyle p \\ \scriptstyle n-p \end{array} \end{array}, \quad p \le \displaystyle\frac{n}{2}.$

Then there exist orthogonal matrices $U_1,V_1\in\mathbb{R}^{p \times p}$ and $U_2,V_2\in\mathbb{R}^{q \times q}$ such that

$\notag \begin{bmatrix} U_1^T & 0\\ 0 & U_2^T \end{bmatrix} \begin{bmatrix} Q_{11} & Q_{12}\\ Q_{21} & Q_{22} \end{bmatrix} \begin{bmatrix} V_1 & 0\\ 0 & V_2 \end{bmatrix} = \begin{array}[b]{@{\mskip35mu}c@{\mskip30mu}c@{\mskip-10mu}c@{}c} \scriptstyle p & \scriptstyle p & \scriptstyle n-2p & \\ \multicolumn{3}{c}{ \left[\begin{array}{c@{~}|c@{~}c} C & -S & 0 \\ \hline -S & C & 0 \\ 0 & 0 & I_{n-2p} \end{array}\right]} & \mskip-12mu \begin{array}{c} \scriptstyle p \\ \scriptstyle p \\ \scriptstyle n-2p \end{array} \end{array}, \qquad (11)$

where $C = \mathrm{diag}(c_i)$, $S = \mathrm{diag}(s_i)$, and $C^2 - S^2 = I$, with $c_i > s_i \ge 0$ for all $i$. This is the hyperbolic CS decomposition, and it can be proved by applying the CS decomposition of an orthogonal matrix to $\mathrm{exc}(Q)$.

The leading principal submatrix $\left[\begin{smallmatrix}C & -S \\ -S & C \end{smallmatrix}\right]$ in (11) generalizes the $2\times 2$ matrix (4), and in fact it can be permuted into a direct sum of such matrices.

Note that the matrix on the right in (11) is symmetric positive definite. Therefore the singular values of $Q$ are the eigenvalues of that matrix, namely

$\notag c_1 \pm s_1, \dots, c_p \pm s_p; \quad 1~\mathrm{with~multiplicity~}n - 2p.$

Since $c_i^2 - s_i^2 = 1$ for all $i$, the first $2p$ singular values occur in reciprocal pairs, hence the largest and smallest singular values satisfy $\sigma_1 = \sigma_n^{-1}\ge 1$ (with strict inequality unless $p = 0$). This gives another proof of (3).

## Numerical Stability

While an orthogonal matrix is perfectly conditioned, a pseudo-orthogonal matrix can be arbitrarily ill conditioned, as follows from (3). For example, the MATLAB function gallery('randjorth') produces a random pseudo-orthogonal matrix with a default condition number of sqrt(1/eps).

>> rng(1); A = gallery('randjorth',2,2) % p = 2, n = 4
A =
2.9984e+03  -4.2059e+02   1.5672e+03  -2.5907e+03
1.9341e+03  -2.6055e+03   3.1565e+03  -7.5210e+02
3.1441e+03  -6.2852e+02   1.8157e+03  -2.6427e+03
1.6870e+03  -2.5633e+03   3.0204e+03  -5.4157e+02
>> cond(A)
ans =
6.7109e+07

This means that algorithms that use pseudo-orthogonal matrices are potentially numerically unstable. Therefore algorithms need to be carefully constructed and rounding error analysis must be done to ensure that an appropriate form of numerical stability is obtained.

## Notes

Pseudo-orthogonal matrices form the automorphism group of the scalar product defined by $\langle x,y\rangle = x^T\Sigma y$ for $x,y\in\mathbb{R}^n$. More results for pseudo-orthogonal matrices can be obtained as special cases of results for automorphism groups of general scalar products. See, for example, Mackey, Mackey, and Tisseur (2006).

For $\Sigma \ne \pm I$ the set of pseudo-orthogonal matrices is known to have four connected components, a topological property that can be proved using the hyperbolic CS decomposition (Motlaghian, Armandnejad, and Hall, 2018).

One can define pseudo-unitary matrices in an analogous way, as $Q\in\mathbb{C}^{n\times n}$ such that $Q^*\Sigma Q = \Sigma$. These correspond to the automorphism group of the scalar product $\langle x,y\rangle = x^*\Sigma y$ for $x,y\in\mathbb{C}^n$. The results we have discussed generalize in a straightforward way to pseudo-unitary matrices.

The exchange operator is also known as the principal pivot transform and as the sweep operator in statistics. Tsatsomeros (2000) gives a survey of its properties

The hyperbolic CS decomposition was derived by Lee (1948) and, according to Lee, was present in work of Autonne (1912).

## References

This is a minimal set of references, which contain further useful references within.

# What Is an LU Factorization?

An LU factorization of an $n\times n$ matrix $A$ is a factorization $A = LU$, where $L$ is unit lower triangular and $U$ is upper triangular. “Unit” means that $L$ has ones on the diagonal. Example:

$\notag \left[\begin{array}{rrrr} 3 & -1 & 1 & 1\\ -1 & 3 & 1 & -1\\ -1 & -1 & 3 & 1\\ 1 & 1 & 1 & 3 \end{array}\right] = \left[\begin{array}{rrrr} 1 & 0 & 0 & 0\\ -\frac{1}{3} & 1 & 0 & 0\\ -\frac{1}{3} & -\frac{1}{2} & 1 & 0\\ \frac{1}{3} & \frac{1}{2} & 0 & 1 \end{array}\right] \left[\begin{array}{rrrr} 3 & -1 & 1 & 1\\ 0 & \frac{8}{3} & \frac{4}{3} & -\frac{2}{3}\\ 0 & 0 & 4 & 1\\ 0 & 0 & 0 & 3 \end{array}\right]. \qquad (1)$

An LU factorization simplifies the solution of many problems associated with linear systems. In particular, solving a linear system $Ax = b$ reduces to solving the triangular systems $Ly = b$ and $Ux = y$, since then $b = L(Ux)$.

For a given $A$, an LU factorization may or may not exist, and if it does exist it may not be unique. Conditions for existence and uniqueness are given in the following result (see Higham, 2002, Thm. 9.1 for a proof). Denote by $A_k = A(1\colon k,1\colon k)$ the leading principal submatrix of $A$ of order $k$.

Theorem 1. The matrix $A\in\mathbb{R}^{n\times n}$ has a unique LU factorization if and only if $A_k$ is nonsingular for $k=1\colon n-1$. If $A_k$ is singular for some $1\le k \le n-1$ then the factorization may exist, but if so it is not unique.

Note that the (non)singularity of $A$ plays no role in Theorem 1. However, if $A$ is nonsingular and has an LU factorization then the factorization is unique. Indeed if $A$ has LU factorizations $A = L_1U_1 = L_2U_2$ then the $U_i$ are necessarily nonsingular and so $L_2^{-1}L_1 = U_2U_1^{-1}$. The left side of this equation is unit lower triangular and the right side is upper triangular; therefore both sides must equal the identity matrix, which means that $L_1 = L_2$ and $U_1 = U_2$, as required.

Equating leading principal submatrices in $A = LU$ gives $A_k = L_k U_k$, which implies that $\det(A_k) = \det(U_k) = u_{11} u_{22} \dots u_{kk}$. Hence $u_{kk} = \det(A_k)/\det(A_{k-1})$. In fact, such determinantal formulas hold for all the elements of $L$ and $U$:

\notag \begin{aligned} \ell_{ij} &= \frac{ \det\bigl( A( [1:j-1, \, i], 1:j ) \bigr) }{ \det( A_j ) }, \quad i > j, \\ u_{ij} &= \frac{ \det\bigl( A( 1:i, [1:i-1, \, j] ) \bigr) } { \det( A_{i-1} ) }, \quad i \le j. \end{aligned}

Here, $A(u,v)$, where $u$ and $v$ are vectors of subscripts, denotes the submatrix formed from the intersection of the rows indexed by $u$ and the columns indexed by $v$.

## Relation with Gaussian Elimination

LU factorization is intimately connected with Gaussian elimination. Recall that Gaussian elimination transforms a matrix $A^{(1)} = A\in\mathbb{R}^{n\times n}$ to upper triangular form $U = A^{(n)}$ in $n-1$ stages. At the $k$th stage, multiples of row $k$ are added to the later rows to eliminate the elements below the diagonal in column $k$, using the formulas

$\notag a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik} a_{kj}^{(k)}, \quad i = k+1 \colon n, \; j = k+1 \colon n,$

where the quantities $m_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)}$ are the multipliers. Of course each $a_{kk}^{(k)}$ must be nonzero for these formulas to be defined, and this is connected with the conditions of Theorem 1, since $u_{kk} = a_{kk}^{(k)}$. The final $U$ is the upper triangular LU factor, with $u_{ij} = a_{ij}^{(i)}$ for $j\ge i$, and $\ell_{ij} = m_{ij}$ for $i > j$, that is, the multipliers make up the $L$ factor (for a proof of these properties see any textbook on numerical linear algebra).

The matrix factorization viewpoint is well established as a powerful paradigm for thinking and computing. Separating the computation of LU factorization from its application is beneficial. For example, given $A = LU$ we saw above how to solve $Ax = b$. If we need to solve for another right-hand side $b_2$ we can just solve $Ly_2 = b_2$ and $Ux_2 = y_2$, re-using the LU factorization. Similarly, solving $A^Tz = c$ reduces to solving the triangular systems $U^T w = c$ and $L^Tz = w$.

## Computation

An LU factorization can be computed by directly solving for the components of $L$ and $U$ in the equation $A = LU$. Indeed because $L$ has unit diagonal the first row of $U$ is the same as the first row of $A$, and $a_{k1} = \ell_{k1} u_{11} = \ell_{k1} a_{11}$ then determines the first column of $L$. One can go on to determine the $k$th row of $U$ and the $k$th row of $L$, for $k = 2\colon n$. This leads to the Doolittle method, which involves inner products of partial rows of $L$ and partial columns of $U$.

Given the equivalence between LU factorization and Gaussian elimination we can also employ the Gaussian elimination equations:

$\notag \begin{array}{l} \%~kji~\mathrm{form~of~LU~factorization.}\\ \mbox{for}~k=1:n-1 \\ \qquad \mbox{for}~ j=k+1:n \\ \qquad \qquad \mbox{for}~ i=k+1:n \\ \qquad\qquad\qquad a_{ij}^{(k+1)} = a_{ij}^{(k)} - a_{ik}^{(k)}a_{kj}^{(k)} / a_{kk}^{(k)}\\ \qquad\qquad\mbox{end}\\ \qquad\mbox{end}\\ \mbox{end}\\ \end{array}$

This $kji$ ordering of the loops in the factorization is the basis of early Fortran implementations of LU factorization, such as that in LINPACK. The inner loop travels down the columns of $A$, accessing contiguous elements of $A$ since Fortran stores arrays by column. Interchanging the two inner loops gives the $kij$ ordering, which updates the matrix a row at a time, and is appropriate for a language such as C that stores arrays by row.

The $ijk$ and $jik$ orderings correspond to the Doolittle method. The last two of the $3! = 6$ orderings are the $ikj$ and $jki$ orderings, to which we will return later.

## Schur Complements

For $A\in\mathbb{R}^{n\times n}$ with $\alpha = a_{11} \ne 0$ we can write

$\notag A = \begin{bmatrix} \alpha & b^T \\ c & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ c/\alpha & I_{n-1} \end{bmatrix} \begin{bmatrix} \alpha & b^T \\ 0 & D - cb^T/\alpha \end{bmatrix} = : L_1U_1. \qquad (2)$

The $(n-1)\times (n-1)$ matrix $S = D - cb^T/\alpha$ is called the Schur complement of $\alpha$ in $A$.

The first row and column of $L_1$ and $U_1$ have the correct forms for a unit lower triangular matrix and an upper triangular matrix, respectively. If we can find an LU factorization $S = L_2U_2$ then

$\notag A = \begin{bmatrix} 1 & 0 \\ c/\alpha & L_2 \end{bmatrix} \begin{bmatrix} \alpha & b^T \\ 0 & U_2 \end{bmatrix}$

is an LU factorization of $A$. Note that this is simply another way to express the $kji$ algorithm above.

For several matrix structures it is immediate that $\alpha \ne 0$. If we can show that the Schur complement inherits the same structure then it follows by induction that we can compute the factorization for $S$, and so an LU factorization of $A$ exists. Classes of matrix for which $a_{11} \ne 0$ and $A$ being in the class implies the Schur complement $S$ is also in the class include

• symmetric positive definite matrices,
• $M$-matrices,
• matrices (block) diagonally dominant by rows or columns.

(The proofs of these properties are nontrivial.) Note that the matrix (1) is row diagonally dominant, as is its $U$ factor, as must be the case since its rows are contained in Schur complements.

We say that $A$ has upper bandwidth $q$ if $a_{ij} = 0$ for $j>i+q$ and lower bandwidth $p$ if $a_{ij} = 0$ for $i>j+p$. Another use of (2) is to show that $L$ and $U$ inherit the bandwidths of $A$.

Theorem 2. Let $A\in\mathbb{R}^{n\times n}$ have lower bandwidth $p$ and upper bandwidth $q$. If $A$ has an LU factorization then $L$ has lower bandwidth $p$ and $U$ has upper bandwidth $q$.

Proof. In (2), the first column of $L_1$ and the first row of $U_1$ have the required structure and $S$ has upper bandwidth $q$ and lower bandwidth $p$, since $c$ and $b$ have only $p$ and $q$ nonzero components, respectively. The result follows by induction.

## Block Implementations

In order to achieve high performance on modern computers with their hierarchical memories, LU factorization is implemented in a block form expressed in terms of matrix multiplication and the solution of multiple right-hand side triangular systems. We describe two block forms of LU factorization. First, consider a block form of (2) with block size $p$, where $A_{11}$ is $p \times p$:

$\notag A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 \\ L_{21}& I_{n-p} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ 0 & S \end{bmatrix}.$

Here, $S$ is the Schur complement of $A_{11}$ in $A$, given by $S = A_{22} - A_{21}A_{11}^{-1}A_{12}$. This leads to the following algorithm:

1. Factor $A_{11} = L_{11}U_{11}$.
2. Solve $L_{11}U_{12} = A_{12}$ for $U_{12}$.
3. Solve $L_{21}U_{11} = A_{21}$ for $L_{21}$.
4. Form $S = A_{22}-L_{21}U_{12}$.
5. Repeat steps 1â€“4 on $S$ to obtain $S = L_{22}U_{22}$.

The factorization on step 1 can be done by any form of LU factorization. This algorithm is known as a right-looking algorithm, since it accesses data to the right of the block being worked on (in particular, at each stage lines 2 and 4 access the last few columns of the matrix).

An alternative algorithm can derived by considering a block $3\times 3$ partitioning, in which we assume we have already computed the first block column of $L$ and $U$:

$\notag A = \begin{bmatrix} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 & 0 \\ L_{21} & L_{22}& 0 \\ L_{31} & L_{32} & I \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} & \times \\ 0 & U_{22} & \times \\ 0 & 0 & \times \end{bmatrix}.$

We now compute the middle block column of $L$ and $U$, comprising $p$ columns, by

1. Solve $L_{11}U_{12} = A_{12}$ for $U_{12}$.
2. Factor $A_{22}-L_{21}U_{12} = L_{22}U_{22}$.
3. Solve $L_{32}U_{22} = A_{32} - L_{31}U_{12}$ for $L_{32}$.
4. Repartition so that the first two block columns become a single block column and repeat steps 1â€“4.

This algorithm corresponds to the $jki$ ordering. Note that the Schur complement is updated only a block column at a time. Because the algorithm accesses data only to the left of the block column being worked on, it is known as a left-looking algorithm.

Our description of these block algorithms emphasizes the mathematical ideas. The implementation details, especially for the left-looking algorithm, are not trivial. The optimal choice of block size $p$ will depend on the machine, but $p$ is typically in the range $64$$512$.

An important point is that all these different forms of LU factorization, no matter which $ijk$ ordering or which value of $p$, carry out the same operations. The only difference is the order in which the operations are performed (and the order in which the data is accessed). Even the rounding errors are the same for all versions (assuming the use of “plain vanilla” floating-point arithmetic).

## Rectangular Matrices

Although it is most commonly used for square matrices, LU factorization is defined for rectangular matrices, too. If $A\in\mathbb{R}^{m\times n}$ then the factorization has the form $A = LU$ with $L\in\mathbb{R}^{m\times m}$ lower triangular and $U\in\mathbb{R}^{m\times n}$ upper trapezoidal. The conditions for existence and uniqueness of an LU factorization of $A$ are the same as those for $A(1\colon p, 1\colon p)$, where $p = \min(m,n)$.

## Block LU Factorization

Another form of LU factorization relaxes the structure of $L$ and $U$ from triangular to block triangular, with $L$ having identity matrices on the diagonal:

$\notag L = \begin{bmatrix} I & & & \\ L_{21} & I & & \\ \vdots & & \ddots & \\ L_{m1} & \dots & L_{m,m-1} & I \end{bmatrix}, \quad U = \begin{bmatrix} U_{11} & U_{12} & \dots & U_{1m} \\ & U_{22} & & \vdots \\ & & \ddots & U_{m-1,m} \\ & & & U_{mm} \end{bmatrix}.$

Note that $U$ is not, in general, upper triangular.

An example of a block LU factorization is

$\notag A = \left[ \begin{array}{rr|rr} 0 & 1 & 1 & 1 \\ -1 & 1 & 1 & 1 \\\hline -2 & 3 & 4 & 2 \\ -1 & 2 & 1 & 3 \\ \end{array} \right] = \left[ \begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\\hline 1 & 2 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ \end{array} \right] \left[ \begin{array}{rr|rr} 0 & 1 & 1 & 1 \\ -1 & 1 & 1 & 1 \\\hline 0 & 0 & 1 & -1 \\ 0 & 0 & -1 & 1 \\ \end{array} \right].$

LU factorization fails on $A$ because of the zero $(1,1)$ pivot. This block LU factorization corresponds to using the leading $2\times 2$ principal submatrix of $A$ to eliminate the elements in the $(3\colon 4,1\colon 2)$ submatrix. In the context of a linear system $Ax=b$, we have effectively solved for the variables $x_1$ and $x_2$ in terms of $x_3$ and $x_4$ and then substituted for $x_1$ and $x_2$ in the last two equations.

Conditions for the existence of a block LU factorization are analogous to, but less stringent than, those for LU factorization in Theorem 1.

Theorem 3. The matrix $A\in\mathbb{R}^{n\times n}$ has a unique block LU factorization if and only if the first $m-1$ leading principal block submatrices of $A$ are nonsingular.

The conditions in Theorem 3 can be shown to be satisfied if $A$ is block diagonally dominant by rows or columns.

Note that to solve a linear system $Ax = b$ using a block LU factorization we need to solve $Ly = b$ and $Ux = y$, but the latter system is not triangular and requires the solution of systems $U_{ii}x_i = y_i$ involving the diagonal blocks of $U$, which would normally be done by (standard) LU factorization.

## Sensitivity

If $A$ has a unique LU factorization then for a small enough perturbation $\Delta A$ an LU factorization $A + \Delta A = (L + \Delta L)(U + \Delta U)$ exists. To first order, this equation is $\Delta A = \Delta L U + L \Delta U$, which gives

$\notag L^{-1}\Delta A \mskip2mu U^{-1} = L^{-1}\Delta L + \Delta U \mskip2mu U^{-1}.$

Since $\Delta L$ is strictly lower triangular and $\Delta U$ is upper triangular, we have, to first order,

$\notag \Delta L = L \mskip 1mu \mathrm{tril}( L^{-1}\Delta A U^{-1} ), \quad \Delta U = \mathrm{triu}( L^{-1}\Delta A U^{-1} )U,$

where $\mathrm{tril}$ denotes the strictly lower triangular part and $\mathrm{triu}$ the strictly upper triangular part. Clearly, the sensitivity of the LU factors depends on the inverses of $L$ and $U$. However, in most situations, such as when we are solving a linear system $Ax = b$, it is the backward stability of the LU factors, not their sensitivity, that is relevant.

## Pivoting and Numerical Stability

Since not all matrices have an LU factorization, we need the option of applying row and column interchanges to ensure that the pivots are nonzero unless the column in question is already in triangular form.

In finite precision computation it is important that computed LU factors $\widehat L$ and $\widehat U$ are numerically stable in the sense that $\widehat L \widehat U = A + \Delta A$ with $\|\Delta A\|\le c_n u \|A\|$, where $c_n$ is a constant and $u$ is the unit roundoff. For certain matrix properties, such as diagonal dominance by rows or columns, numerical stability is guaranteed, but in general it is necessary to incorporate row interchanges, or row or column interchanges, in order to obtain a stable factorization.

See What Is the Growth Factor for Gaussian Elimination? for details of pivoting strategies and see Randsvd Matrices with Large Growth Factors for some recent research on growth factors.

## References

This is a minimal set of references, which contain further useful references within.

# What’s New in MATLAB R2021a?

In this post I discuss some of the new features in MATLAB R2021a. As usual in this series, I focus on a few of the features most relevant to my interests. See the release notes for a detailed list of the many changes in MATLAB and its toolboxes.

## Name=Value Syntax

In function calls that accept “name, value” pairs, separated by a comma, the values can now be specified with an equals sign. Example:

x = linspace(0,2*pi,100); y = tan(x);

% Existing syntax
plot(x,y,'Color','red','LineWidth',2)
plot(x,y,"Color","red","LineWidth",2)

% New syntax
plot(x,y,Color = "red",LineWidth = 2)
lw = 2; plot(x,y,Color = "red",LineWidth = lw)

Note that the string can be given as a character vector in single quotes or as a string array in double quotes (string arrays were introduced in R2016b).

There are some limitations, including that all name=value arguments must appear after any comma separated pairs and after any positional arguments (arguments that must be passed to a function in a specific order).

## Eigensystem of Skew-Symmetric Matrix

For skew-symmetric and skew-Hermitian matrices, the eig function now guarantees that the matrix of eigenvectors is unitary (to machine precision) and that the computed eigenvalues are pure imaginary. The code

rng(2); n = 5; A = gallery('randsvd',n,-1e3,2); A = 1i*A;
[V,D] = eig(A);
unitary_test = norm(V'*V-eye(n),1)
norm_real_part = norm(real(D),1)

produces

% R2020b
unitary_test =
9.6705e-01
norm_real_part =
8.3267e-17

% R2021a
unitary_test =
1.9498e-15
norm_real_part =
0

For this matrix MATLAB R2020b produces an eigenvector matrix that is far from being unitary and eigenvalues with a nonzero (but tiny) real part, whereas MATLAB R2021a produces real eigenvalues and eigenvectors that are unitary to machine precision.

## Performance Improvements

Among the reported performance improvements are faster matrix multiplication for large sparse matrices and faster solution of multiple right-hand systems with a sparse coefficient matrix, both resulting from added support for multithreading.

## Symbolic Math Toolbox

An interesting addition to the Symbolic Math Toolbox is the symmatrix class, which represents a symbolic matrix. An example of usage is

>> A = symmatrix('A',[2 2]); B = symmatrix('B',[2 2]); whos A B
Name      Size            Bytes  Class        Attributes

A         2x2                 8  symmatrix
B         2x2                 8  symmatrix

>> X = A*B, Y = symmatrix2sym(X), whos X Y
X =
A*B
Y =
[A1_1*B1_1 + A1_2*B2_1, A1_1*B1_2 + A1_2*B2_2]
[A2_1*B1_1 + A2_2*B2_1, A2_1*B1_2 + A2_2*B2_2]
Name      Size            Bytes  Class        Attributes

X         2x2                 8  symmatrix
Y         2x2                 8  sym

The range of functions that can be applied to a symmatrix is as follows:

>> methods symmatrix

Methods for class symmatrix:

cat             isempty         mpower          symmatrix2sym
conj            isequal         mrdivide        tan
cos             isequaln        mtimes          times
ctranspose      kron            norm            trace
det             latex           plus            transpose
diff            ldivide         power           uminus
disp            length          pretty          uplus
display         log             rdivide         vertcat
eq              matlabFunction  sin
exp             minus           size

Static methods:

empty

In order to invert A*B in this example, or find its eigenvalues, use inv(Y) or eig(Y).

# Fifty “What Is” Articles

Last week I posted the fiftieth in my “What Is” series of articles. I began the series just over a year ago, in March 2020. The original aim was to provide “brief descriptions of important concepts in numerical analysis and related areas, with a focus on topics that arise in my research”, and the articles were meant to be short, widely accessible, and contain a minimum of mathematical symbols, equations, and citations. I have largely kept to these aims, though for some topics there is a lot to say and I have been more lengthy.

The articles are also available in PDF form on GitHub.

Below is a list of all the “What Is” articles published at the time of writing, in alphabetical order.

If there is a topic you would like me to cover, please put it in the comments below.

# What is a Diagonally Dominant Matrix?

Matrices arising in applications often have diagonal elements that are large relative to the off-diagonal elements. In the context of a linear system this corresponds to relatively weak interactions between the different unknowns. We might expect a matrix with a large diagonal to be assured of certain properties, such as nonsingularity. However, to ensure nonsingularity it is not enough for each diagonal element to be the largest in its row. For example, the matrix

$\notag \left[\begin{array}{rrr} 3 & -1 & -2\\ -2 & 3 & -1\\ -2 & -1 & 3 \end{array}\right] \qquad (1)$

is singular because $[1~1~1]^T$ is a null vector. A useful definition of a matrix with large diagonal requires a stronger property.

A matrix $A\in\mathbb{C}^{n\times n}$ is diagonally dominant by rows if

$\notag |a_{ii}| \ge \displaystyle\sum_{j\ne i} |a_{ij}|, \quad i=1\colon n. \qquad (2)$

It is strictly diagonally dominant by rows if strict inequality holds in (2) for all $i$. $A$ is (strictly) diagonally dominant by columns if $A^T$ is (strictly) diagonally dominant by rows.

Diagonal dominance on its own is not enough to ensure nonsingularity, as the matrix (1) shows. Strict diagonal dominance does imply nonsingularity, however.

Theorem 1.

If $A\in\mathbb{C}^{n\times n}$ is strictly diagonally dominant by rows or columns then it is nonsingular.

Proof. Since $A$ is nonsingular if and only if $A^T$ is nonsingular, it suffices to consider diagonal dominance by rows. For any nonzero $x$ let $y = Ax$ and choose $k$ so that $|x_k| = \|x\|_{\infty}$. Then the $k$th equation of $y = Ax$ can be written

$\notag a_{kk}x_k = y_k - \displaystyle\sum_{j\ne k} a_{kj}x_j,$

which gives

$\notag |a_{kk}|\|x\|_{\infty} = |a_{kk}||x_k| \le |y_k| + \displaystyle\sum_{j\ne k} |a_{kj}||x_j| \le |y_k| + \|x\|_\infty \displaystyle\sum_{j\ne k} |a_{kj}|.$

Using (2), we have

$\notag |y_k| \ge \|x\|_{\infty} \Bigl(|a_{kk}| - \displaystyle\sum_{j\ne k} |a_{kj}|\Bigr) > 0. \qquad (3)$

Therefore $y\ne0$ and so $A$ is nonsingular. $~\square$

Diagonal dominance plus two further conditions is enough to ensure nonsingularity. We need the notion of irreducibility. A matrix $A\in\mathbb{R}^{n\times n}$ is irreducible if there does not exist a permutation matrix $P$ such that

$\notag P^TAP = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}$

with $A_{11}$ and $A_{22}$ square matrices. Irreducibility is equivalent to the directed graph of $A$ being strongly connected.

Theorem 2.

If $A\in\mathbb{C}^{n\times n}$ is irreducible and diagonally dominant by rows with strict inequality in $(2)$ for some $i$ then it is nonsingular.

Proof. The proof is by contradiction. Suppose there exists $x\ne 0$ such that $Ax = 0$. Define

$\notag G = \{\, j: |x_j| = \|x\|_{\infty} \,\}, \quad H = \{\, j: |x_j| < \|x\|_{\infty} \,\}.$

The $i$th equation of $Ax = 0$ can be written

$\notag a_{ii}x_i = - \displaystyle\sum_{j\ne i} a_{ij}x_j = - \displaystyle\sum_{j\in G \atop j\ne i } a_{ij}x_j - \displaystyle\sum_{j\in H \atop j\ne i } a_{ij}x_j. \qquad (4)$

Hence for $i = r\in G$,

$\notag |a_{rr}| \le \displaystyle\sum_{j\in G \atop j\ne r } |a_{rj}| + \displaystyle\sum_{j\in H \atop j\ne r } |a_{rj}|\frac{|x_j|}{\|x\|_\infty}.$

The set $H$ is nonempty, because if it were empty then we would have $|x_j| = \|x\|_\infty$ for all $j$ and if there is strict inequality in $(2)$ for $i = m$, then putting $i = m$ in (4) would give $|a_{mm}| \le \sum_{j\ne m} |a_{mj}| |x_j|/|x_m| = \sum_{j\ne m} |a_{mj}|$, which is a contradiction. Hence as long as $a_{rj}\ne0$ for some $j\in H$, we obtain $|a_{rr}| < \sum_{j\ne r } |a_{rj}|$, which contradicts the diagonal dominance. Therefore we must have $a_{rj}= 0$ for all $j\in H$ and all $r\in G$. This means that all the rows indexed by $G$ have zeros in the columns indexed by $H$, which means that $A$ is reducible. This is a contradiction, so $A$ must be nonsingular. $~\square$

The obvious analogue of Theorem 2 holds for column diagonal dominance.

As an example, the $n\times n$ symmetric tridiagonal matrix (minus the second difference matrix)

$\notag T_n = \left[\begin{array}{@{\mskip 5mu}c*{4}{@{\mskip 15mu} r}@{\mskip 5mu}} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & \ddots & \\ & & \ddots & \ddots & -1\\ & & & -1 & 2 \end{array}\right], \qquad (5)$

is row diagonally dominant with strict inequality in the first and last diagonal dominance relations. It can also be shown to be irreducible and so it is nonsingular by Theorem 2. If we replace $t_{11}$ or $t_{nn}$ by $1$, then $T$ remains nonsingular by the same argument. What if we replace both $t_{11}$ and $t_{nn}$ by $1$? We can answer this question by using an observation of Strang. If we define the rectangular matrix

$\notag L_n = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & -1 & \ddots & \\ & & \ddots & 1 \\ & & & -1 \end{bmatrix} \in\mathbb{R}^{(n+1)\times n}$

then $T_n = L_n^T L_n$ and

$\notag \widetilde{T}_{n+1} = \begin{bmatrix} 1 &-1 & & & \\ -1 & 2 & \ddots & & \\ & \ddots & \ddots & -1 & \\ & & -1 & 2 & -1\\ & & & -1 & 1 \end{bmatrix} = L_n L_n^T \in \mathbb{R}^{(n+1) \times (n+1)}.$

Since in general $AB$ and $BA$ have the same nonzero eigenvalues, we conclude that $\Lambda(\widetilde{T}_{n+1}) = \Lambda(T_n) \cup \{0\}$, where $\Lambda(\cdot)$ denotes the spectrum. Hence $T_n$ is symmetric positive definite and $\widetilde{T}_n$ is singular and symmetric positive semidefinite.

## Relation to Gershgorin’s Theorem

Theorem 1 can be used to obtain information about the location of the eigenvalues of a matrix. Indeed if $\lambda$ is an eigenvalue of $A$ then $A - \lambda I$ is singular and hence cannot be strictly diagonally dominant, by Theorem 1. So $|a_{ii}-\lambda| > \sum_{j\ne i} |a_{ij}|$ cannot be true for all $i$. Gershgorin’s theorem is simply a restatement of this fact.

Theorem 3 (Gershgorin’s theorem).

The eigenvalues of $A\in\mathbb{C}^{n\times n}$ lie in the union of the $n$ discs in the complex plane

$\notag D_i = \Big\{ z\in\mathbb{C}: |z-a_{ii}| \le \displaystyle\sum_{j\ne i} |a_{ij}|\Big\}, \quad i=1\colon n.$

If $A$ is symmetric with positive diagonal elements and satisfies the conditions of Theorem 1 or Theorem 2 then it is positive definite. Indeed the eigenvalues are real and so in Gershgorin’s theorem the discs are intervals and $a_{ii} - z \le |z-a_{ii}| \le \sum_{j\ne i}^n |a_{ij}|$, so $z \ge |a_{ii}| - \sum_{j\ne i}^n |a_{ij}| \ge 0$, so the eigenvalues are nonnegative, and hence positive since nonzero. This provides another proof that the matrix $T_n$ in (5) is positive definite.

## Generalized Diagonal Dominance

In some situations $A$ is not diagonally dominant but a row or column scaling of it is. For example, the matrix

$\notag A = \begin{bmatrix} 1 & 1 & 0 \\ 2/3 & 2 & 1/4 \\ 2/3 & 1/2 & 1 \end{bmatrix}$

is not diagonally dominant by rows or columns but

$\notag A \, \mathrm{diag}(3,2,4) = \begin{bmatrix} 3 & 2 & 0 \\ 2 & 4 & 1 \\ 2 & 1 & 4 \end{bmatrix}$

is strictly diagonally dominant by rows.

A matrix $A\in\mathbb{C}^{n\times n}$ is generalized diagonally dominant by rows if $AD$ is diagonally dominant by rows for some diagonal matrix $D = \mathrm{diag}(d_i)$ with $d_i > 0$ for all $i$, that is, if

$\notag |a_{ii}|d_i \ge \displaystyle\sum_{j\ne i} |a_{ij}|d_j, \quad i=1\colon n. \qquad (6)$

It is easy to see that if $A$ is irreducible and there is strictly inequality in (6) for some $i$ then $A$ is nonsingular by Theorem 2.

It can be shown that $A$ is generalized diagonally dominant by rows if and only if it is an $H$-matrix, where an $H$-matrix is a matrix for which the comparison matrix $M(A)$, defined by

$\notag M(A) = (m_{ij}), \quad m_{ij} = \begin{cases} |a_{ii}|, & i=j, \\ -|a_{ij}|, & i\ne j, \end{cases}$

is an $M$-matrix (see What Is an M-Matrix?).

## Block Diagonal Dominance

A matrix $A\in\mathbb{C}^{n\times n}$ is block diagonally dominant by rows if, for a given norm and block $m\times m$ partitioning $A = (A_{ij})$, the diagonal blocks $A_{jj}$ are all nonsingular and

$\notag \displaystyle\sum_{j\ne i} \|A_{ij}\| \le \|A_{ii}^{-1}\|^{-1}, \quad i = 1\colon m. \label{bdd}$

$A$ is block diagonally dominant by columns if $A^T$ is block diagonally dominant by rows. If the blocks are all $1\times 1$ then block diagonal dominance reduces to the usual notion of diagonal dominance. Block diagonal dominance holds for certain block tridiagonal matrices arising in the discretization of PDEs.

Analogues of Theorems 1 and 2 giving conditions under which block diagonal dominance implies nonsingularity are given by Feingold and Varga (1962).

## Bounding the Inverse

If a matrix is strictly diagonally dominant then we can bound its inverse in terms of the minimum amount of diagonal dominance. For full generality, we state the bound in terms of generalized diagonal dominance.

Theorem 4.

If $A\in\mathbb{C}^{n\times n}$ and $AD$ is strictly diagonally dominant by rows for a diagonal matrix $D = \mathrm{diag}(d_i)$ with $d_i > 0$ for all $i$, then

$\notag \|A^{-1}\|_\infty \le \displaystyle\frac{\|D\|_{\infty}}{\alpha},$

where $\alpha = \min_i (|a_{ii}|d_i - \sum_{j\ne i} |a_{ij}|d_j)$.

Proof. Assume first that $D = I$. Let $y$ satisfy $\|A^{-1}\|_{\infty} = \|A^{-1}y\|_{\infty} / \|y\|_{\infty}$ and let $x = A^{-1}y$. Applying (3) gives $\|A^{-1}\|_{\infty} = \|x\|_{\infty} / \|y\|_{\infty} \le \alpha^{-1}$. The result is obtained on applying this bound to $AD$ and using $\|A^{-1}\|_{\infty} \le \|D\|_{\infty} \|(AD)^{-1}\|_{\infty}$. $~\square$.

Another bound for $A^{-1}$ when $A$ is strictly diagonally dominant by rows can be obtained by writing $A = D(I - E)$, where $D = \mathrm{diag}(a_{ii})$, $e_{ii} = 0$, and $e_{ij} = -a_{ij}/a_{ii}$ for $i\ne j$. It is easy to see that $\|E\|_\infty < 1$, which gives another proof that $A$ is nonsingular. Then

\notag \begin{aligned} |A^{-1}| &= |(I-E)^{-1}D^{-1}| = |I + E + E^2 + \cdots | |D^{-1}|\\ &\le (I + |E| + |E|^2 + \cdots ) |D|^{-1}\\ &= (I - |E|)^{-1} |D|^{-1}\\ &= M(A)^{-1}. \end{aligned}

This bound implies that $M(A)^{-1} \ge 0$, so in view of its sign pattern $M(A)$ is an $M$-matrix, which essentially proves one direction of the $H$-matrix equivalence in the previous section. The same bound holds if $A$ is diagonally dominant by columns, by writing $A = (I-E)D$.

An upper bound also holds for block diagonal dominance.

Theorem 5.

If $A\in\mathbb{C}^{n\times n}$ is block diagonally dominant by rows then

$\notag \|A^{-1}\|_\infty \le \displaystyle\frac{1}{\alpha}.$

where $\alpha = \min_i ( \|A_{ii}^{-1}\|^{-1} - \sum_{j\ne i} \|A_{ij}\| )$.

It is interesting to note that the inverse of a strictly row diagonally dominant matrix enjoys a form of diagonal dominance, namely that the largest element in each column is on the diagonal.

Theorem 6.

If $A\in\mathbb{C}^{n\times n}$ is strictly diagonally dominant by rows then $B = A^{-1}$ satisfies $|b_{ij}| < |b_{jj}|$ for all $i\ne j$.

Proof. For $i\ne j$ we have $\sum_{k=1}^n a_{ik}b_{kj} = 0$. Let $\beta_j = \max_k |b_{kj}|$. Taking absolute values in $a_{ii}b_{ij} = -\sum_{k\ne i}a_{ik}b_{kj}$ gives

$\notag |a_{ii}||b_{ij}| \le \beta_j \sum_{k\ne i} |a_{ik}| < \beta_j |a_{ii}|,$

or $|b_{ij}| < \beta_j$, since $a_{ii} \ne 0$. This inequality holds for all $i\ne j$, so we must have $\beta_j = |b_{jj}|$, which gives the result.

## Historical Remarks

Theorems 1 and 2 have a long history and have been rediscovered many times. Theorem 1 was first stated by LÃ©vy (1881) with additional assumptions. In a short but influential paper, Taussky (1949) pointed out the recurring nature of the theorems and gave simple proofs (our proof of Theorem 2 is Taussky’s). Schneider (1977) attributes the surge in interest in matrix theory in the 1950s and 1960s to Taussky’s paper and a few others by her, Brauer, Ostrowski, and Wielandt. The history of Gershgorin’s theorem (published in 1931) is intertwined with that of Theorems 1 and 2; see Varga’s 2004 book for details.

Theorems 4 and 5 are from Varah (1975) and Theorem 6 is from Ostrowski (1952).

## References

This is a minimal set of references, which contain further useful references within.

# Bounds for the Norm of the Inverse of a Triangular Matrix

In many situations we need to estimate or bound the norm of the inverse of a matrix, for example to compute an error bound or to check whether an iterative process is guaranteed to converge. This is the same problem as bounding the condition number $\kappa(A) = \|A\| \|A^{-1}\|$, assuming $\|A\|$ is easy to compute or estimate. Here, we focus on triangular matrices. The bounds we derive can be applied to a general matrix if an LU or QR factorization is available.

We denote by $\|\cdot\|$ any matrix norm, and we take the consistency condition $\|AB\| \le \|A\| \|B\|$ as one of the defining properties of a matrix norm.

It will be useful to note that

$\notag \left[\begin{array}{crrrr} 1 & -\theta & -\theta & -\theta & -\theta\\ & 1 & -\theta & -\theta & -\theta\\ & & 1 & -\theta & -\theta\\ & & & 1 & -\theta\\ & & & & 1 \end{array}\right]^{-1} = \left[\begin{array}{ccccc} 1 & \theta & \theta(1+\theta) & \theta(1+\theta)^2 & \theta(1+\theta)^3\\ & 1 & \theta & \theta(1+\theta) & \theta(1+\theta)^2\\ & & 1 & \theta & \theta(1+\theta)\\ & & & 1 & \theta\\ & & & & 1 \end{array}\right]$

and that more generally the inverse of the $n\times n$ upper triangular matrix $T(\theta)$ with

$\notag (T(\theta))_{ij} = \begin{cases} 1, & i=j, \\ -\theta, & i

is given by

$\notag \bigl(T(\theta)^{-1}\bigr)_{ij} = \begin{cases} 1, & i=j, \\ \theta(1+\theta)^{j-i-1}, & j > i. \end{cases} \qquad (2)$

## Lower Bound

First, we consider a general matrix $A\in\mathbb{C}^{n\times n}$ and let $\lambda$ be an eigenvalue with $|\lambda| = \rho(A)$ (the spectral radius) and $x$ a corresponding eigenvector. With $X = xe^T \in\mathbb{C}^{n\times n}$, where $e$ is the vector of ones, $AX = \lambda X$, so

$\notag |\lambda| \|X\| = \|\lambda X\| = \| AX \| \le \|A\| \|X\|,$

which implies $|\lambda| \le \|A\|$ since $X\ne 0$. Hence

$\notag \|A\| \ge \rho(A).$

Let $T$ be a triangular matrix. Applying the latter bound to $T^{-1}$, whose eigenvalues are its diagonal entries $t_{ii}^{-1}$, gives

$\notag \|T^{-1}\| \ge \displaystyle\frac{1}{\min_i |t_{ii}|}. \qquad (3)$

Combining this bound with the analogous bound for $\|T\|$ gives

$\notag \kappa(T) \ge \displaystyle\frac{\max_i |t_{ii}|}{\min_i |t_{ii}|}. \qquad (4)$

We note that commonly used norms satisfy $\|A\| \ge \max_{i,j}|a_{ij}|$, which yields another proof of (3) and (4).

For any $x$ and $y$ such that $y = Tx$ we have the lower bound $\|x\| / \|y\| \le \| T^{-1} \|$. We can choose $y$ and then solve the triangular system $Tx = y$ for $x$ to obtain the lower bound. Condition number estimation techniques, which we will describe in another article, provide ways to choose $y$ that usually yield estimates of $\| T^{-1} \|$ correct to within an order of magnitude.

For the $2$-norm, we can choose $y$ and then compute $x = (T^TT)^{-k}y$ by repeated triangular solves, obtaining the lower bound $(\|x\|_2 / \|y\|_2)^{\frac{1}{2k}} \le \| T^{-1} \|_2$. This bound is simply the power method applied to $(T^TT)^{-1}$.

## Upper Bounds

Let $T\in\mathbb{C}^{n\times n}$ be an upper triangular matrix. The upper bounds for $\|T^{-1}\|$ that we will discuss depend only on the absolute values of the elements of $T$. This limits the ability of the bounds to distinguish between well-conditioned and ill-conditioned matrices. For example, consider

$\notag \begin{gathered} T_1 = \left[\begin{array}{crrrr} 1 & -2 & -2 & -2 & -2\\ & 1 & -2 & -2 & -2\\ & & 1 & -2 & -2\\ & & & 1 & -2\\ & & & & 1 \end{array}\right], \quad T_1^{-1} = \left[\begin{array}{ccccc} 1 & 2 & 6 & 18 & 54\\ & 1 & 2 & 6 & 18\\ & & 1 & 2 & 6\\ & & & 1 & 2\\ & & & & 1 \end{array}\right], \\ T_2 = \left[\begin{array}{ccccc} 1 & 2 & 2 & 2 & 2\\ & 1 & 2 & 2 & 2\\ & & 1 & 2 & 2\\ & & & 1 & 2\\ & & & & 1 \end{array}\right], \quad T_2^{-1} = \left[\begin{array}{crrrr} 1 & -2 & 2 & -2 & 2\\ & 1 & -2 & 2 & -2\\ & & 1 & -2 & 2\\ & & & 1 & -2\\ & & & & 1 \end{array}\right]. \end{gathered}$

The bounds for $T_1^{-1}$ and $T_2^{-1}$ will be the same, yet the inverses are of different sizes (the more so as the dimension increases).

Let $D = \mathrm{diag}(T)$ and write

$\notag T = D(I - N),$

where $N$ is strictly upper triangular and hence nilpotent with $N^n = 0$. Then

$\notag T^{-1} = (I + N + N^2 + \cdots + N^{n-1}) D^{-1}.$

Taking absolute values and using the triangle inequality gives

$\notag |T^{-1}| \le (I + |N| + |N|^2 + \cdots + |N|^{n-1}) |D|^{-1}, \qquad(5)$

where the inequalities hold elementwise.

The comparison matrix $M(A)$ associated with a general $A\in\mathbb{C}^{n \times n}$ is the matrix with

$\notag (M(A))_{ij} = \begin{cases} |a_{ii}|, & i=j, \\ -|a_{ij}|, & i\ne j. \end{cases}$

It is not hard to see that $M(T)$ is upper triangular with $M(T) = |D| (I - |N|)$ and so the bound (5) is

$\notag |T^{-1}| \le M(T)^{-1}.$

If we replace every element above the diagonal of $M(T)$ by the most negative off-diagonal element in its row we obtain the upper triangular matrix $W(T)$ with

$\notag (W(T))_{ij} = \begin{cases} |t_{ii}|, & i=j, \\ -\max_{i+1\le k\le n}|t_{ik}|, & i

Then $W(T) = |D| (I - |N_1|)$, where $|N| \le |N_1|$, so

\notag \begin{aligned} M(T)^{-1} &= (I + |N| + |N|^2 + \cdots + |N|^{n-1}) |D|^{-1}\\ & \le (I + |N_1| + |N_1|^2 + \cdots + |N_1|^{n-1}) |D|^{-1} = W(T)^{-1}. \end{aligned}

Finally, let $Z(T) = \min_i|t_{ii}|(I - |N_2|)$, where $N_2$ is strictly upper triangular with every element above the diagonal equal to the maximum element of $|N_1|$, that is,

$\notag (Z(T))_{ij} = \begin{cases} \alpha, & i=j, \\ -\alpha\beta, & i

Then

\notag \begin{aligned} W(T)^{-1} &= (I + |N_1| + |N_1|^2 + \cdots + |N_1|^{n-1}) |D|^{-1} \\ &\le \alpha^{-1} (I + |N_2| + |N_2|^2 + \cdots + |N_2|^{n-1}) = Z(T)^{-1}. \end{aligned}

We note that $M(T)$, $W(T)$, and $Z(T)$ are all nonsingular $M$-matrices. We summarize the bounds.

Theorem 1.

If $T\in\mathbb{C}^{n\times n}$ is a nonsingular upper triangular matrix then

$\notag |T^{-1}| \le M(T)^{-1} \le W(T)^{-1} \le Z(T)^{-1}. \qquad (6)$

We make two remarks.

• The bounds (6) are equally valid for lower triangular matrices as long as the maxima in the definitions of $W(T)$ and $Z(T)$ are taken over columns instead of rows.
• We could equally well have written $A = (I-N)D$. The comparison matrix $M(T) = (I - |N|)|D|$ is unchanged, and (6) continues to hold as long as the maxima in the definitions of $W(T)$ and $Z(T)$ are taken over columns rather than rows.

It follows from the theorem that

$\notag \|T^{-1}\| \le \|M(T)^{-1}\| \le \|W(T)^{-1}\| \le \|Z(T)^{-1}\|$

for the 1-, 2-, and $\infty$-norms and the Frobenius norm. Now $M(T)$, $W(T)$, and $Z(T)$ all have nonnegative inverses, and for a matrix $A$ with nonnegative inverse we have $\|A^{-1}\|_{\infty} = \|A^{-1}e\|_{\infty}$. Hence

\notag \begin{aligned} \|T^{-1}\|_{\infty} &\le \|M(T)^{-1}e\|_{\infty} \le \|W(T)^{-1}e\|_{\infty} \le \|Z(T)^{-1}e\|_{\infty}\\ O(n^3) \hskip10pt & \hskip35pt O(n^2) \hskip65pt O(n) \hskip65pt O(1) \end{aligned}

where the big-Oh expressions show the asymptotic cost in flops of evaluating each term by solving the relevant triangular system. As the bounds become less expensive to compute they become weaker. The quantity $\|Z(T)^{-1}\|_p$ can be explicitly evaluated for $p = \infty$, using $(2)$. It has the same value for $p = 1$, and since $\|A\|_2 \le (\|A\|_1\|A\|_{\infty})^{1/2}$ we have

$\notag \|T^{-1}\|_p \le \displaystyle\frac{ (\beta + 1)^{n-1}}{\alpha}, \quad p = 1,2,\infty. \qquad(7)$

This bound is an equality for $p = 1,\infty$ for the matrix $T(\theta)$ in (1).

For the Frobenius norm, evaluating $\|Z(T)^{-1}\|_F$, and using $\|A\|_2 \le \|A\|_F$, gives

$\notag \|T^{-1}\|_{2,F} \le \displaystyle\frac{ \bigl( (\beta + 1)^{2n} + 2n(\beta + 2) - 1 \bigr)^{1/2}} {\alpha(\beta + 2)}. \qquad(8)$

For the $2$-norm, either of (7) and (8) can be the smaller bound depending on $\beta$.

For the special case of a bidiagonal matrix $B$ it is easy to show that $|B^{-1}| = M(B)^{-1}$, and so $\|B^{-1}\|_{\infty} = \|M(B)^{-1}\|_{\infty} = \|M(B)^{-1}e\|_{\infty}$ can be computed exactly in $O(n)$ flops.

These upper bounds can be arbitrarily weak, even for a fixed $n$, as shown by the example

$\notag T(\theta) = \begin{bmatrix} \theta^{-1} & 1 & 1 \\ 0 & \theta^{-1} & \theta^{-1} \\ 0 & 0 & \theta^{-2} \end{bmatrix}, \quad \theta > 0,$

for which

$\notag T(\theta)^{-1} = \begin{bmatrix} \theta & -\theta^2 & 0 \\ 0 & \theta & -\theta^2 \\ 0 & 0 & \theta^2 \end{bmatrix}, \quad M(T(\theta))^{-1} = \begin{bmatrix} \theta & \theta^2 & 2\theta^3 \\ 0 & \theta & \theta^2 \\ 0 & 0 & \theta^2 \end{bmatrix}.$

As $\theta\to\infty$, $\|M(T(\theta))^{-1}\|_{\infty} /\|T(\theta)^{-1}\|_{\infty} \approx 2\theta$. On the other hand, the overestimation is bounded as a function of $n$ for triangular matrices resulting from certain pivoting strategies.

Theorem 1.

Suppose the upper triangular matrix $T\in\mathbb{C}^{n\times n}$ satisfies

$\notag |t_{ii}| \ge |t_{ij}|, \quad j>i. \qquad (9)$

Then, for the $1$-, $2$-, and $\infty$-norms,

$\notag \displaystyle\frac{1}{\min_i|t_{ii}|} \le \|T^{-1}\| \le \|M(T)^{-1}\| \le \|W(T)^{-1}\| \le \|Z(T)^{-1}\| \le \displaystyle\frac{2^{n-1}}{{\min_i|t_{ii}|}}.$

Proof. The first four inequalities are a combination of (3) and (6). The fifth inequality is obtained from the expression (7) for $\|Z(T)^{-1}\|$ with $\beta = 1$.

Condition (9) is satisfied for the triangular factors from QR factorization with column pivoting and for the transpose of the unit lower triangular factors from LU factorization with any form of pivoting.

The upper bounds we have described have been derived independently by several authors, as explained by Higham (2002).

# What Is a Companion Matrix?

A companion matrix $C\in\mathbb{C}^{n\times n}$ is an upper Hessenberg matrix of the form

$\notag C = \begin{bmatrix} a_{n-1} & a_{n-2} & \dots &\dots & a_0 \\ 1 & 0 & \dots &\dots & 0 \\ 0 & 1 & \ddots & & \vdots \\ \vdots & & \ddots & 0 & 0 \\ 0 & \dots & \dots & 1 & 0 \end{bmatrix}.$

Alternatively, $C$ can be transposed and permuted so that the coefficients $a_i$ appear in the first or last column or the last row. By expanding the determinant about the first row it can be seen that

$\notag \det(\lambda I - C) = \lambda^n - a_{n-1}\lambda^{n-1} - \cdots - a_1\lambda - a_0, \qquad (*)$

so the coefficients in the first row of $C$ are the coefficients of its characteristic polynomial. (Alternatively, in $\lambda I - C$ add $\lambda^{n-j}$ times the $j$th column to the last column for $j = 1:n-1$, to obtain $p(\lambda)e_1$ as the new last column, and expand the determinant about the last column.) MacDuffee (1946) introduced the term “companion matrix” as a translation from the German “Begleitmatrix”.

Setting $\lambda = 0$ in $(*)$ gives $\det(C) = (-1)^{n+1} a_0$, so $C$ is nonsingular if and only if $a_0 \ne 0$. The inverse is

$\notag C^{-1} = \begin{bmatrix} 0 & 1 & 0 &\dots& 0 \cr 0 & 0 & 1 &\ddots& 0 \cr \vdots & & \ddots & \ddots & 0\cr \vdots & & & \ddots & 1\cr \displaystyle\frac{1}{a_0} & -\displaystyle\frac{a_{n-1}}{a_0} & -\displaystyle\frac{a_{n-2}}{a_0} & \dots & -\displaystyle\frac{a_1}{a_0} \end{bmatrix}.$

Note that $P^{-1}C^{-1}P$ is in companion form, where $P = I_n(n:-1:1,:)$ is the reverse identity matrix, and the coefficients are those of the polynomial $-\lambda^n p(1/\lambda)$, whose roots are the reciprocals of those of $p$.

A companion matrix has some low rank structure. It can be expressed as a unitary matrix plus a rank-$1$ matrix:

$\notag C = \begin{bmatrix} 0 & 0 & \dots &\dots & 1 \\ 1 & 0 & \dots &\dots & 0 \\ 0 & 1 & \ddots & & 0 \\ \vdots & & \ddots & 0 & 0 \\ 0 & \dots & \dots & 1 & 0 \end{bmatrix} + e_1 \begin{bmatrix} a_{n-1} & a_{n-2} & \dots & a_0-1 \end{bmatrix}. \qquad (1)$

Also, $C^{-T}$ differs from $C$ in just the first and last columns, so $C^{-T} = C + E$, where $E$ is a rank-$2$ matrix.

If $\lambda$ is an eigenvalue of $C$ then $[\lambda^{n-1}, \lambda^{n-2}, \dots, \lambda, 1]^T$ is a corresponding eigenvector. The last $n-1$ rows of $\lambda I - C$ are clearly linearly independent for any $\lambda$, which implies that $C$ is nonderogatory, that is, no two Jordan blocks in the Jordan canonical form contain the same eigenvalue. In other words, the characteristic polynomial is the same as the minimal polynomial.

The MATLAB function compan takes as input a vector $[p_1,p_2, \dots, p_{n+1}]$ of the coefficients of a polynomial, $p_1x^n + p_2 x^{n-1} + \cdots + p_n x + p_{n+1}$, and returns the companion matrix with $a_{n-1} = -p_2/p_1$, â€¦, $a_0 = -p_{n+1}/p_1$.

Perhaps surprisingly, the singular values of $C$ have simple representations, found by Kenney and Laub (1988):

\notag \begin{aligned} \sigma_1^2 &= \displaystyle\frac{1}{2} \left( \alpha + \sqrt{\alpha^2 - 4 a_0^2} \right), \\ \sigma_i^2 &= 1, \qquad i=2\colon n-1, \\ \sigma_n^2 &= \displaystyle\frac{1}{2} \left( \alpha - \sqrt{\alpha^2 - 4 a_0^2} \right), \end{aligned}

where $\alpha = 1 + a_0^2 + \cdots + a_{n-1}^2$. These formulae generalize to block companion matrices, as shown by Higham and Tisseur (2003).

## Applications

Companion matrices arise naturally when we convert a high order difference equation or differential equation to first order. For example, consider the Fibonacci numbers $1$, $1$, $2$, $3$, $5$, $\dots$, which satisfy the recurrence $f_n = f_{n-1} + f_{n-2}$ for $n \ge 2$, with $f_0 = f_1 = 1$. We can write

$\notag \begin{bmatrix} f_n \\ f_{n-1} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} f_{n-1} \\ f_{n-2} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^2 \begin{bmatrix} f_{n-2} \\ f_{n-3} \end{bmatrix} = \cdots = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^{n-1} \begin{bmatrix} f_{1} \\ f_{0} \end{bmatrix},$

where $\left[\begin{smallmatrix}1 & 1 \\ 1 & 0 \end{smallmatrix}\right]$ is a companion matrix. This expression can be used to compute $f_n$ in $O(\log_2n)$ operations by computing the matrix power using binary powering.

As another example, consider the differential equation

$\notag y''' = b_2 y'' + b_1 y' + b_0 y.$

Define new variables

$z_1 = y'', \quad z_2 = y', \quad z_3 = y.$

Then

\notag \begin{aligned} z_1' &= b_2 z_1 + b_1 z_2 + b_0 z_3,\\ z_2' &= z_1\\ z_3' &= z_2, \end{aligned}

or

$\notag \begin{bmatrix} z_1 \\ z_2\\ z_3 \end{bmatrix}' = \begin{bmatrix} b_2 & b_1 & b_0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} z_1 \\ z_2\\ z_3 \end{bmatrix},$

so the third order scalar equation has been converted into a first order system with a companion matrix as coefficient matrix.

## Computing Polynomial Roots

The MATLAB function roots takes as input a vector of the coefficients of a polynomial and returns the roots of the polynomial. It computes the eigenvalues of the companion matrix associated with the polynomial using the eig function. As Moler (1991) explained, MATLAB used this approach starting from the first version of MATLAB, but it does not take advantage of the structure of the companion matrix, requiring $O(n^3)$ flops and $O(n^2)$ storage instead of the $O(n^2)$ flops and $O(n)$ storage that should be possible given the structure of $C$. Since the early 2000s much research has aimed at deriving methods that achieve this objective, but numerically stable methods proved elusive. Finally, a backward stable algorithm requiring $O(n^2)$ flops and $O(n)$ storage was developed by Aurentz, Mach, Vandebril, and Watkins (2015). It uses the QR algorithm and exploits the unitary plus low rank structure shown in (1). Here, backward stability means that the computed roots are the eigenvalues of $C + \Delta C$ for some $\Delta C$ with $\|\Delta C\| \le c_n u \|C\|$. It is not necessarily the case that the computed roots are the exact roots of a polynomial with coefficients $a_i + \Delta a_i$ with $|\Delta a_i| \le c_n u \max_i |a_i|$ for all $i$.

## Rational Canonical Form

It is an interesting observation that

$\notag \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & -a_3\\ 0 & 1 & -a_3 & -a_2 \\ 1 & -a_3 & -a_2 & -a_1 \end{bmatrix} \begin{bmatrix} a_3 & a_2 & a_1 & a_0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & -a_3 & 0 \\ 1 & -a_3 & -a_2 & 0 \\ 0 & 0 & 0 & a_0 \\ \end{bmatrix}.$

Multiplying by the inverse of the matrix on the left we express the $4\times 4$ companion matrix as the product of two symmetric matrices. The obvious generalization of this factorization to $n\times n$ matrices shows that we can write

$\notag C = S_1S_2, \quad S_1 = S_1^T, \quad S_2= S_2^T. \qquad (2)$

We need the rational canonical form of a matrix, described in the next theorem, which Halmos (1991) calls “the deepest theorem of linear algebra”. Let $\mathbb{F}$ denote the field $\mathbb{R}$ or $\mathbb{C}$.

Theorem 1 (rational canonical form).

If $A\in\mathbb{F}^{n\times n}$ then $A = X^{-1} C X$ where $X\in\mathbb{F}^{n\times n}$ is nonsingular and $C = \mathrm{diag}(C_i)\in\mathbb{F}^{n\times n}$, with each $C_i$ a companion matrix.

The theorem says that every matrix is similar over the underlying field to a block diagonal matrix composed of companion matrices. Since we do not need it, we have omitted from the statement of the theorem the description of the $C_i$ in terms of the irreducible factors of the characteristic polynomial. Combining the factorization (2) and Theorem 1 we obtain

\notag \begin{aligned} A &= X^{-1}CX = X^{-1} S_1S_2 X \\ & = X^{-1}S_1X^{-T} \cdot X^T S_2 X \\ & \equiv \widetilde{S}_1 \widetilde{S}_2,\quad \widetilde{S}_1 = \widetilde{S}_1^T, \quad \widetilde{S}_2 = \widetilde{S}_2^T. \end{aligned}

Since $S_1$ is nonsingular, and since $S_2$ can alternatively be taken nonsingular by considering the factorization of $A^T$, this proves a theorem of Frobenius.

Theorem 2 (Frobenius, 1910).

For any $A\in\mathbb{F}^{n\times n}$ there exist symmetric $S_1,S_2\in\mathbb{F}^{n\times n}$, either one of which can be taken nonsingular, such that $A = S_1 S_2$.

Note that if $A = S_1S_2$ with the $S_i$ symmetric then $AS_1 = S_1S_2S_1 = S_1A^T = (AS_1)^T$, so $AS_1$ is symmetric. Likewise, $S_2A$ is symmetric.

## Factorization

Fiedler (2003) noted that a companion matrix can be factorized into the product of $n$ simpler factors, $n-1$ of them being the identity matrix with a $2\times 2$ block placed on the diagonal, and he used this factorization to determine a matrix $\widetilde{C}$ similar to $C$. For $n = 5$ it is

$\notag \widetilde{C} = \begin{bmatrix} a_4 & a_3 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & a_2 & 0 & a_1 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & a_0 & 0 \end{bmatrix} = \left[\begin{array}{cc|cc|c} a_4 & 1 & & & \\ 1 & 0 & & & \\\hline & & a_2 & 1 & \\ & & 1 & 0 & \\\hline & & & & a_0 \end{array}\right] \left[\begin{array}{c|cc|cc} 1 & & & & \\\hline & a_3 & 1 & & \\ & 1 & 0 & & \\\hline & & & a_1 & 1 \\ & & & 1 & 0 \end{array}\right].$

In general, Fielder’s construction yields an $n\times n$ pentadiagonal matrix $\widetilde{C}$ that is not simply a permutation similarity of $C$. The fact that $\widetilde{C}$ has block diagonal factors opens the possibility of obtaining new methods for finding the eigenvalues of $C$. This line of research has been extensively pursued in the context of polynomial eigenvalue problems (see Mackey, 2013).

## Generalizations

The companion matrix is associated with the monomial basis representation of the characteristic polynomial. Other polynomial bases can be used, notably orthogonal polynomials, and this leads to generalizations of the companion matrix having coefficients on the main diagonal and the subdiagonal and superdiagonal. Good (1961) calls the matrix resulting from the Chebyshev basis a colleague matrix. Barnett (1981) calls the matrices corresponding to orthogonal polynomials comrade matrices, and for a general polynomial basis he uses the term confederate matrices. Generalizations of the properties of companion matrices can be derived for these classes of matrices.

## Bounds for Polynomial Roots

Since the roots of a polynomial are the eigenvalues of the associated companion matrix, or a Fiedler matrix similar to it, or indeed the associated comrade matrix or confederate matrix, one can obtain bounds on the roots by applying any available bounds for matrix eigenvalues. For example, since any eigenvalue $\lambda$ of matrix $A$ satisfies $|\lambda| \le \|A\|$, by taking the $1$-norm and the $\infty$-norm of the companion matrix $C$ we find that any root $\lambda$ of the polynomial $(*)$ satisfies

\notag \begin{aligned} |\lambda| &\le \max\bigl(|a_0|, 1 + \max_{j = 1:n-1} |a_j| \bigr), \\ |\lambda| &\le \max(1, |a_{n-1}| + |a_{n-2}| + \cdots + |a_0|), \end{aligned}

either of which can be the smaller. A rich variety of such bounds is available, and these techniques extend to matrix polynomials and the corresponding block companion matrices.

## References

This is a minimal set of references, which contain further useful references within.

# What Is an M-Matrix?

An $M$-matrix is a matrix $A\in\mathbb{R}^{n \times n}$ of the form

$\notag A = sI - B, \quad \mathrm{where}~B \ge 0~\mathrm{and}~s > \rho(B). \qquad (*)$

Here, $\rho(B)$ is the spectral radius of $B$, that is, the largest modulus of any eigenvalue of $B$, and $B \ge 0$ denotes that $B$ has nonnegative entries. An $M$-matrix clearly has nonpositive off-diagonal elements. It also has positive diagonal elements, which can be shown using the result that

$\notag \rho(A) = \lim_{k\to\infty} \|A^k\|^{1/k} \qquad (\dagger)$

for any consistent matrix norm:

$\notag s > \rho(B) = \lim_{k\to\infty} \|B^k\|_{\infty}^{1/k} \ge \lim_{k\to\infty} \|\mathrm{diag}(b_{ii})^k\|_{\infty}^{1/k} = \max_i b_{ii}.$

Although the definition of an $M$-matrix does not specify $s$, we can set it to $\max_i a_{ii}$. Indeed let $s$ satisfy $(*)$ and set $\widetilde{s} = \max_i a_{ii}$ and $\widetilde{B} = \widetilde{s}I - A$. Then $\widetilde{B} \ge 0$, since $\widetilde{b}_{ii} = \widetilde{s} - a_{ii} \ge 0$ and $\widetilde{b}_{ij} = -a_{ij} = b_{ij} \ge 0$ for $i \ne j$. Furthermore, for a nonnegative matrix the spectral radius is an eigenvalue, by the Perronâ€“Frobenius theorem, so $\rho(B)$ is an eigenvalue of $B$ and $\rho(\widetilde{B)}$ is an eigenvalue of $\widetilde{B}$. Hence $\rho(\widetilde{B}) = \rho( (\widetilde{s}-s)I + B) = \widetilde{s} -s + \rho(B) < \widetilde{s}$.

The concept of $M$-matrix was introduced by Ostrowski in 1937. $M$-matrices arise in a variety of scientific settings, including in finite difference methods for PDEs, input-output analysis in economics, and Markov chains in stochastic processes.

An immediate consequence of the definition is that the eigenvalues of an $M$-matrix lie in the open right-half plane, which means that $M$-matrices are special cases of positive stable matrices. Hence an $M$-matrix is nonsingular and the determinant, being the product of the eigenvalues, is positive. Moreover, since $C = s^{-1}B$ satisfies $\rho(C) < 1$,

$\notag A^{-1} = s^{-1}(I - C)^{-1} = s^{-1}(I + C + C^2 + \cdots) \ge 0.$

In fact, nonnegativity of the inverse characterizes $M$-matrices. Define

$\notag Z_n = \{ \, A \in \mathbb{R}^{n\times n}: a_{ij} \le 0, \; i\ne j \,\}.$

Theorem 1.

A nonsingular matrix $A\in Z_n$ is an $M$-matrix if and only if $A^{-1} \ge 0$.

Sometimes an $M$-matrix is defined to be a matrix with nonpositive off-diagonal elements and a nonnegative inverse. In fact, this condition is just one of a large number of conditions equivalent to a matrix with nonpositive off-diagonal elements being an $M$-matrix, fifty of which are given in Berman and Plemmons (1994, Chap. 6).

It is easy to check from the definitions, or using Theorem 1, that a triangular matrix $T$ with positive diagonal and nonpositive off-diagonal is an $M$-matrix. An example is

$\notag T_4 = \left[\begin{array}{@{\mskip 5mu}c*{3}{@{\mskip 15mu} r}@{\mskip 5mu}} 1 & -1 & -1 & -1 \\ & 1 & -1 & -1 \\ & & 1 & -1 \\ & & & 1 \end{array}\right], \quad T_4^{-1} = \begin{bmatrix} 1 & 1 & 2 & 4\\ & 1 & 1 & 2\\ & & 1 & 1\\ & & & 1 \end{bmatrix}.$

## Bounding the Norm of the Inverse

An $M$-matrix can be constructed from any nonsingular triangular matrix by taking the comparison matrix. The comparison matrix associated with a general $B\in\mathbb{R}^{n \times n}$ is the matrix

$\notag M(B) = (m_{ij}), \quad m_{ij} = \begin{cases} |b_{ii}|, & i=j, \\ -|b_{ij}|, & i\ne j. \end{cases}$

For a nonsingular triangular $T$, $M(T)$ is an $M$-matrix, and it easy to show that

$\notag |T^{-1}| \le |M(T)^{-1}|,$

where the absolute value is taken componentwise. This bound, and weaker related bounds, can be useful for cheaply bounding the norm of the inverse of a triangular matrix. For example, with $e$ denoting the vector of ones, since $M(T)^{-1}$ is nonnegative we have

$\notag \|T^{-1}\|_{\infty} \le \|M(T)^{-1}\|_{\infty} = \|M(T)^{-1}e\|_{\infty},$

and $\|M(T)^{-1}e\|_{\infty}$ can be computed in $O(n^2)$ flops by solving a triangular system, whereas computing $T^{-1}$ costs $O(n^3)$ flops.

More generally, if we have an LU factorization $PA = LU$ of an $M$-matrix $A\in\mathbb{R}^{n \times n}$ then, since $A^{-1} \ge 0$,

$\notag \|A^{-1}\|_{\infty} = \|A^{-1}e\|_{\infty} = \|U^{-1}L^{-1}Pe\|_{\infty} = \|U^{-1}L^{-1}e\|_{\infty}.$

Therefore the norm of the inverse can be computed in $O(n^2)$ flops with two triangular solves, instead of the $O(n^3)$ flops that would be required if $A^{-1}$ were to be formed explicitly.

## Connections with Symmetric Positive Definite Matrices

There are many analogies between M-matrices and symmetric positive definite matrices. For example, every principal submatrix of a symmetric positive definite matrix is symmetric positive definite and every principal submatrix of an $M$-matrix is an $M$-matrix. Indeed if $\widetilde{B}$ is a principal submatrix of a nonnegative $B$ then $\rho(\widetilde{B}) \le \rho(B)$, which follows from $(\dagger)$ for the $\infty$-norm (for example). Hence on taking principal submatrices in $(*)$ we have $s > \rho(\widetilde{B})$ with the same $s$.

A symmetric $M$-matrix is known as a Stieltjes matrix, and such a matrix is positive definite. An example of a Stieltjes matrix is minus the second difference matrix (the tridiagonal matrix arising from a central difference discretization of a second derivative), illustrated for $n = 4$ by

$\notag A_4 = \left[\begin{array}{@{\mskip 5mu}c*{3}{@{\mskip 15mu} r}@{\mskip 5mu}} 2 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & -1 & 2 \end{array}\right], \quad A_4^{-1} = \begin{bmatrix} \frac{4}{5} & \frac{3}{5} & \frac{2}{5} & \frac{1}{5}\\[\smallskipamount] \frac{3}{5} & \frac{6}{5} & \frac{4}{5} & \frac{2}{5}\\[\smallskipamount] \frac{2}{5} & \frac{4}{5} & \frac{6}{5} & \frac{3}{5}\\[\smallskipamount] \frac{1}{5} & \frac{2}{5} & \frac{3}{5} & \frac{4}{5} \end{bmatrix}.$

## LU Factorization

Since the leading principal submatrices of an $M$-matrix $A$ have positive determinant it follows that $A$ has an LU factorization with $U$ having positive diagonal elements. However, more is true, as the next result shows.

Theorem 2.

An $M$-matrix $A$ has an LU factorization $A = LU$ in which $L$ and $U$ are $M$-matrices.

Proof. We can write

$\notag A = \begin{array}[b]{@{\mskip27mu}c@{\mskip-20mu}c@{\mskip-10mu}c@{}} \scriptstyle 1 & \scriptstyle n-1 & \\ \multicolumn{2}{c}{ \left[\begin{array}{c@{~}c@{~}} \alpha & b^T \\ c^T & E \\ \end{array}\right]} & \mskip-14mu\ \begin{array}{c} \scriptstyle 1 \\ \scriptstyle n-1 \end{array} \end{array}, \quad \alpha > 0, \quad b\le 0, \quad c \le 0.$

The first stage of LU factorization is

$\notag A = \begin{bmatrix} \alpha & b^T \\ c & E \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \alpha^{-1}c & I \end{bmatrix} \begin{bmatrix} \alpha & b^T \\ 0 & S \end{bmatrix} = L_1U_1, \quad S = E - \alpha^{-1} c\mskip1mu b^T,$

where $S$ is the Schur complement of $\alpha$ in $A$. The first column of $L_1$ and the first row of $U_1$ are of the form required for a triangular $M$-matrix. We have

$\notag A^{-1} = U_1^{-1}L_1^{-1} = \begin{bmatrix} \alpha^{-1} & -\alpha^{-1}b^TS^{-1} \\ 0 & S^{-1} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -\alpha^{-1}c & I \end{bmatrix} = \begin{bmatrix} \times & \times \\ \times & S^{-1} \end{bmatrix}.$

Since $A^{-1} \ge 0$ it follows that $S^{-1} \ge 0$. It is easy to see that $S\in Z_n$, and hence Theorem $1$ shows that $S$ is an $M$-matrix. The result follows by induction.

The question now arises of what can be said about the numerical stability of LU factorization of an $M$-matrix. To answer it we use another characterization of $M$-matrices, that $DA$ is strictly diagonally dominant by columns for some diagonal matrix $D = \mathrm{diag}(d_i)$ with $d_i>0$ for all $i$, that is,

$\notag d_j|a_{jj}| > \sum_{i\ne j} d_i |a_{ij}|, \quad j=1\colon n.$

(This condition can also be written as $d^TA > 0$ because of the sign pattern of $A$.) Matrices that are diagonally dominant by columns have the properties that an LU factorization without pivoting exists, the growth factor $\rho_n \le 2$, and partial pivoting does not require row interchanges. The effect of row scaling on LU factorization is easy to see:

$\notag A = LU \;\Rightarrow\; DA = DLD^{-1} \cdot DU \equiv \widetilde{L} \widetilde{U},$

where $\widetilde{L}$ is unit lower triangular, so that $\widetilde{L}$ and $\widetilde{U}$ are the LU factors of $DA$. It is easy to see that the growth factor bound of $2$ for a matrix diagonally dominant by columns translates into the bound

$\notag \rho_n \le 2\mskip1mu\displaystyle\frac{\max_i d_i}{\min_i d_i} \qquad(\ddagger)$

for an $M$-matrix, which was obtained by Funderlic, Neumann, and Plemmons (1982). Unfortunately, this bound can be large. Consider the matrix

$\notag A = \begin{bmatrix} \epsilon& 0& -1\\ -1& 1& -1\\ 0& 0& 1 \end{bmatrix} \in Z_3, \quad 0 < \epsilon < 1.$

We have

$\notag A^{-1} = \begin{bmatrix} \displaystyle\frac{1}{\epsilon}& 0& \displaystyle\frac{1}{\epsilon}\\[\bigskipamount] \displaystyle\frac{1}{\epsilon}& 1 & \displaystyle\frac{1 + \epsilon}{\epsilon}\\[\bigskipamount] 0& 0& 1 \end{bmatrix} \ge 0,$

so $A$ is an $M$-matrix. The $(2,3)$ element of the LU factor $U$ of $A$ is $-1 - 1/\epsilon$, which means that

$\notag \rho_3 \ge \displaystyle\frac{1}{\epsilon} + 1,$

and this lower bound can be arbitrarily large. One can verify experimentally that numerical instability is possible when $\rho_3$ is large, in that the computed LU factors have a large relative residual. We conclude that pivoting is necessary for numerical stability in LU factorization of $M$-matrices.

## Stationery Iterative Methods

A stationery iterative method for solving a linear system $Ax = b$ is based on a splitting $A = M - N$ with $M$ nonsingular, and has the form $Mx_{k+1} = Nx_k + b$. This iteration converges for all starting vectors $x_0$ if $\rho(M^{-1}N) < 1$. Much interest has focused on regular splittings, which are defined as ones for which $M^{-1}\ge 0$ and $N \ge 0$. An $M$-matrix has the important property that $\rho(M^{-1}N) < 1$ for every regular splitting, and it follows that the Jacobi iteration, the Gauss-Seidel iteration, and the successive overrelaxation (SOR) iteration (with $0 < \omega \le 1$) are all convergent for $M$-matrices.

## Matrix Square Root

The principal square root $A^{1/2}$ of an $M$-matrix $A$ is an $M$-matrix, and it is the unique such square root. An expression for $A^{1/2}$ follows from $(*)$:

\notag \begin{aligned} A^{1/2} &= s^{1/2}(I - C)^{1/2} \quad (C = s^{-1}B, ~\rho(C) < 1),\\ &= s^{1/2} \sum_{j=0}^{\infty} {\frac{1}{2} \choose j} (-C)^j. \end{aligned}

This expression does not necessarily provide the best way to compute $A^{1/2}$.

## Singular M-Matrices

The theory of $M$-matrices extends to the case where the condition on $s$ is relaxed to $s \ge \rho(B)$ in $(*)$, though the theory is more complicated and extra conditions such as irreducibility are needed for some results. Singular $M$-matrices occur in Markov chains (Berman and Plemmons, 1994, Chapter 8), for example. Let $P$ be the transition matrix of a Markov chain. Then $P$ is stochastic, that is, nonnegative with unit row sums, so $Pe = e$. A nonnegative vector $y$ with $y^Te = 1$ such that $y^T P = y^T$ is called a stationary distribution vector and is of interest for describing the properties of the Markov chain. To compute $y$ we can solve the singular system $Ay = (I - P^T)y = 0$. Clearly, $A\in Z_n$ and $\rho(P) = 1$, so $A$ is a singular $M$-matrix.

## H-Matrices

A more general concept is that of $H$-matrix: $A\in\mathbb{R}^{n \times n}$ is an $H$-matrix if the comparison matrix $M(A)$ is an $M$-matrix. Many results for $M$-matrices extend to $H$-matrices. For example, for an $H$-matrix with positive diagonal elements the principal square root exists and is the unique square root that is an $H$-matrix with positive diagonal elements. Also, the growth factor bound $(\ddagger)$ holds for any $H$-matrix for which $DA$ is diagonally dominant by columns.

## References

This is a minimal set of references, which contain further useful references within.

# Eigenvalue Inequalities for Hermitian Matrices

The eigenvalues of Hermitian matrices satisfy a wide variety of inequalities. We present some of the most useful and explain their implications. Proofs are omitted, but as Parlett (1998) notes, the proofs of the Courantâ€“Fischer, Weyl, and Cauchy results are all consequences of the elementary fact that if the sum of the dimensions of two subspaces of $\mathbb{C}^n$ exceeds $n$ then the subspaces have a nontrivial intersection.

The eigenvalues of a Hermitian matrix $A\in\mathbb{C}^{n\times n}$ are real and we order them $\lambda_n\le \lambda_{n-1} \le \cdots \le \lambda_1$. Note that in some references, such as Horn and Johnson (2013), the reverse ordering is used, with $\lambda_n$ the largest eigenvalue. When it is necessary to specify what matrix $\lambda_k$ is an eigenvalue of we write $\lambda_k(A)$: the $k$th largest eigenvalue of $A$. All the following results also hold for symmetric matrices over $\mathbb{R}^{n\times n}$.

The function $f(x) = x^*Ax/x^*x$ is the quadratic form $x^*Ax$ for $A$ evaluated on the unit sphere, since $f(x) = f(x/\|x\|_2)$. As $A$ is Hermitian it has a spectral decomposition $A = Q\Lambda Q^*$, where $Q$ is unitary and $\Lambda = \mathrm{diag}(\lambda_i)$. Then

$f(x) = \displaystyle\frac{x^*Q\Lambda Q^*x}{x^*x} = \displaystyle\frac{y^*\Lambda y}{y^*y} = \displaystyle\frac{\sum_{i=1}^{n}\lambda_i y_i^2} {\sum_{i=1}^{n}y_i^2} \quad (y = Q^*x),$

from which is it clear that

$\notag \lambda_n = \displaystyle\min_{x\ne0} \displaystyle\frac{x^*Ax}{x^*x}, \quad \lambda_1 = \displaystyle\max_{x\ne0} \displaystyle\frac{x^*Ax}{x^*x}, \qquad(*)$

with equality when $x$ is an eigenvector corresponding to $\lambda_n$ and $\lambda_1$, respectively, This characterization of the extremal eigenvalues of $A$ as the extrema of $f$ is due to Lord Rayleigh (John William Strutt), and $f(x)$ is called a Rayleigh quotient. The intermediate eigenvalues correspond to saddle points of $f$.

## Courantâ€“Fischer Theorem

The Courantâ€“Fischer theorem (1905) states that every eigenvalue of a Hermitian matrix $A\in\mathbb{C}^{n\times n}$ is the solution of both a min-max problem and a max-min problem over suitable subspaces of $\mathbb{C}^n$.

Theorem (Courantâ€“Fischer).

For a Hermitian $A\in\mathbb{C}^{n\times n}$,

\notag \begin{aligned} \lambda_k &= \min_{\dim(S)=n-k+1} \, \max_{0\ne x\in S} \frac{x^*Ax}{x^*x}\\ &= \max_{\dim(S)= k} \, \min_{0\ne x\in S} \frac{x^*Ax}{x^*x}, \quad k=1\colon n. \end{aligned}

Note that the equalities $(*)$ are special cases of these characterizations.

In general there is no useful formula for the eigenvalues of a sum $A+B$ of Hermitian matrices. However, the Courantâ€“Fischer theorem yields the upper and lower bounds

$\notag \lambda_k(A) + \lambda_n(B) \le \lambda_k(A+B) \le \lambda_k(A) + \lambda_1(B), \qquad (1)$

from which it follows that

$\notag \max_k|\lambda_k(A+B)-\lambda_k(A)| \le \max(|\lambda_n(B)|,|\lambda_1(B)|) = \|B\|_2.$

This inequality shows that the eigenvalues of a Hermitian matrix are well conditioned under perturbation. We can rewrite the inequality in the symmetric form

$\notag \max_k |\lambda_k(A)-\lambda_k(B)| \le \|A-B\|_2.$

If $B$ is positive semidefinite then (1) gives

$\notag \lambda_k(A) \le \lambda_k(A + B), \quad k = 1\colon n, \qquad (2)$

while if $B$ is positive definite then strict inequality holds for all $i$. These bounds are known as the Weyl monotonicity theorem.

## Weyl’s Inequalities

Weyl’s inequalities (1912) bound the eigenvalues of $A+B$ in terms of those of $A$ and $B$.

Theorem (Weyl).

For Hermitian $A,B\in\mathbb{C}^{n\times n}$ and $i,j = 1\colon n$,

\notag \begin{aligned} \lambda_{i+j-1}(A+B) &\le \lambda_i(A) + \lambda_j(B), \quad i+j \le n+1, \qquad (3)\\ \lambda_i(A) + \lambda_j(B) &\le \lambda_{i+j-n}(A+B). \quad i+j \ge n+1, \qquad (4) \end{aligned}

The Weyl inequalities yield much information about the effect of low rank perturbations. Consider a positive semidefinite rank-$1$ perturbation $B = zz^*$. Inequality (3) with $j = 1$ gives

$\notag \lambda_i(A+B) \le \lambda_i(A) + z^*z, \quad i = 1\colon n$

(which also follows from (1)). Inequality (3) with $j = 2$, combined with (2), gives

$\notag \lambda_{i+1}(A) \le \lambda_{i+1}(A + zz^*) \le \lambda_i(A), \quad i = 1\colon n-1. \qquad (5)$

These inequalities confine each eigenvalue of $A + zz^*$ to the interval between two adjacent eigenvalues of $A$; the eigenvalues of $A + zz^*$ are said to interlace those of $A$. The following figure illustrates the case $n = 4$, showing a possible configuration of the eigenvalues $\lambda_i$ of $A$ and $\mu_i$ of $A + zz^*$.

A specific example, in MATLAB, is

>> n = 4; eig_orig = 5:5+n-1
>> D = diag(eig_orig); eig_pert = eig(D + ones(n))'
eig_orig =
5     6     7     8
eig_pert =
5.2961e+00   6.3923e+00   7.5077e+00   1.0804e+01

Since $\mathrm{trace}(A + zz^*) = \mathrm{trace}(A) + z^*z$ and the trace is the sum of the eigenvalues, we can write

$\notag \lambda_i(A + zz^*) = \lambda_i(A) + \theta_i z^*z,$

where the $\theta_i$ are nonnegative and sum to $1$. If we greatly increase $z^*z$, the norm of the perturbation, then most of the increase in the eigenvalues is concentrated in the largest, since (5) bounds how much the smaller eigenvalues can change:

>> eig_pert = eig(D + 100*ones(n))'
eig_pert =
5.3810e+00   6.4989e+00   7.6170e+00   4.0650e+02

More generally, if $B$ has $p$ positive eigenvalues and $q$ negative eigenvalues then (3) with $j = p+1$ gives

$\notag \lambda_{i+p}(A+B) \le \lambda_i(A), \quad i = 1\colon n-p,$

while (4) with $j = n-q$ gives

$\notag \lambda_i(A) \le \lambda_{i-q}(A + B), \quad i = q+1\colon n.$

So the inertia of $B$ (the number of negative, zero, and positive eigenvalues) determines how far the eigenvalues can move as measured relative to the indexes of the eigenvalues of $A$.

An important implication of the last two inequalities is for the case $A = I$, for which we have

\notag \begin{aligned} \lambda_{i+p}(I+B) &\le 1, \quad i = 1 \colon n-p, \\ \lambda_{i-q}(I+B) &\ge 1, \quad i = q+1 \colon n. \end{aligned}

Exactly $p+q$ eigenvalues appear in one of these inequalities and $n-(p+q)$ appear in both. Therefore $n - (p+q)$ of the eigenvalues are equal to $1$ and so only $\mathrm{rank}(B) = p+q$ eigenvalues can differ from $1$. So perturbing the identity matrix by a Hermitian matrix of rank $r$ changes at most $r$ of the eigenvalues. (In fact, it changes exactly $r$ eigenvalues, as can be seen from a spectral decomposition.)

Finally, if $B$ has rank $r$ then $\lambda_{r+1}(B) \le 0$ and $\lambda_{n-r}(B) \ge 0$ and so taking $j = r+1$ in (3) and $j = n-r$ in (4) gives

\notag \begin{aligned} \lambda_{i+r}(A+B) &\le \lambda_i(A), ~~\qquad\qquad i = 1\colon n-r, \\ \lambda_i(A) &\le \lambda_{i-r}(A + B), ~~\quad i = r+1\colon n. \end{aligned}

## Cauchy Interlace Theorem

The Cauchy interlace theorem relates the eigenvalues of successive leading principal submatrices of a Hermitian matrix. We denote the leading principal submatrix of $A$ of order $k$ by $A_k = A(1\colon k, 1\colon k)$.

Theorem (Cauchy).

For a Hermitian $A\in\mathbb{C}^{n\times n}$,

$\notag \lambda_{i+1}(A_{k+1}) \le \lambda_i(A_k) \le \lambda_i(A_{k+1}), \quad i = 1\colon k, \quad k=1\colon n-1.$

The theorem says that the eigenvalues of $A_k$ interlace those of $A_{k+1}$ for all $k$. Two immediate implications are that (a) if $A$ is Hermitian positive definite then so are all its leading principal submatrices and (b) appending a row and a column to a Hermitian matrix does not decrease the largest eigenvalue or increase the smallest eigenvalue.

Since eigenvalues are unchanged under symmetric permutations of the matrix, the theorem can be reformulated to say that the eigenvalues of any principal submatrix of order $n-1$ interlace those of $A$. A generalization to principal submatrices of order $n-\ell$ is given in the next result.

Theorem.

If $B$ is a principal submatrix of order $n-\ell$ of a Hermitian $A\in\mathbb{C}^{n\times n}$ then

$\notag \lambda_{i+\ell}(A) \le \lambda_i(B) \le \lambda_i(A), \quad i=1\colon n-\ell.$

## Majorization Results

It follows by taking $x$ to be a unit vector $e_i$ in the formula $\lambda_1 = \max_{x\ne0} x^*Ax/(x^*x)$ that $\lambda_1 \ge a_{ii}$ for all $i$. And of course the trace of $A$ is the sum of the eigenvalues: $\sum_{i=1}^n a_{ii} = \sum_{i=1}^n \lambda_i$. These relations are the first and last in a sequence of inequalities relating sums of eigenvalues to sums of diagonal elements obtained by Schur in 1923.

Theorem (Schur).

For a Hermitian $A\in\mathbb{C}^{n\times n}$,

$\notag \displaystyle\sum_{i=1}^k \lambda_i \ge \displaystyle\sum_{i=1}^k \widetilde{a}_{ii}, \quad k=1\colon n,$

where $\{\widetilde{a}_{ii}\}$ is the set of diagonal elements of $A$ arranged in decreasing order: $\widetilde{a}_{11} \ge \cdots \ge \widetilde{a}_{nn}$.

These inequalities say that the vector $[\lambda_1,\dots,\lambda_n]$ of eigenvalues majorizes the ordered vector $[\widetilde{a}_{11},\dots,\widetilde{a}_{nn}]$ of diagonal elements.

An interesting special case is a correlation matrix, a symmetric positive semidefinite matrix with unit diagonal, for which the inequalities are

$\notag \lambda_1 \ge 1, \quad \lambda_1+ \lambda_2\ge 2, \quad \dots, \quad \lambda_1+ \lambda_2 + \cdots + \lambda_{n-1} \ge n-1,$

and $\lambda_1+ \lambda_2 + \cdots + \lambda_n = n$. Here is an illustration in MATLAB.

>> n = 5; rng(1); A = gallery('randcorr',n);
>> e = sort(eig(A)','descend'), partial_sums = cumsum(e)
e =
2.2701e+00   1.3142e+00   9.5280e-01   4.6250e-01   3.6045e-04
partial_sums =
2.2701e+00   3.5843e+00   4.5371e+00   4.9996e+00   5.0000e+00

Ky Fan (1949) proved a majorization relation between the eigenvalues of $A$, $B$, and $A+B$:

$\notag \displaystyle\sum_{i=1}^k \lambda_i(A+B) \le \displaystyle\sum_{i=1}^k \lambda_i(A) + \displaystyle\sum_{i=1}^k \lambda_i(B), \quad k = 1\colon n.$

For $k = 1$, the inequality is the same as the upper bound of (1), and for $k = n$ it is an equality: $\mathrm{trace}(A+B) = \mathrm{trace}(A) + \mathrm{trace}(B)$.

## Ostrowski’s Theorem

For a Hermitian $A$ and a nonsingular $X$, the transformation $A\to X^*AX$ is a congruence transformation. Sylvester’s law of inertia says that congruence transformations preserve the inertia. A result of Ostrowski (1959) goes further by providing bounds on the ratios of the eigenvalues of the original and transformed matrices.

Theorem (Ostrowski).

For a Hermitian $A\in \mathbb{C}^{n\times n}$ and $X\in\mathbb{C}^{n\times n}$,

$\lambda_k(X^*AX) = \theta_k \lambda_k(A), \quad k=1\colon n,$

where $\lambda_n(X^*X) \le \theta_k \le \lambda_1(X^*X)$.

If $X$ is unitary then $X^*X = I$ and so Ostrowski’s theorem reduces to the fact that a congruence with a unitary matrix is a similarity transformation and so preserves eigenvalues. The theorem shows that the further $X$ is from being unitary the greater the potential change in the eigenvalues.

Ostrowski’s theorem can be generalized to the situation where $X$ is rectangular (Higham and Cheng, 1998).

## Interrelations

The results we have described are strongly interrelated. For example, the Courantâ€“Fischer theorem and the Cauchy interlacing theorem can be derived from each other, and Ostrowski’s theorem can be proved using the Courantâ€“Fischer Theorem.

# SIAM CSE21 Minisymposium on Reduced Precision Arithmetic and Stochastic Rounding

This minisymposium took place at the SIAM Conference on Computational Science and Engineering, March 2, 2021. This page makes available slides from some of the talks.

Minisymposium description: Reduced precision floating-point arithmetic, such as IEEE half precision and bfloat16, is increasingly available in hardware. Low precision computations promise major increases in speed and reductions in data communication costs, but they also bring an increased risk of overflow, underflow, and loss of accuracy. One way to improve the results of low precision computations is to use stochastic rounding instead of round to nearest, and this is proving popular in machine learning. This minisymposium will discuss recent advances in exploitation and analysis of reduced precision arithmetic and stochastic rounding.

Algorithms for Stochastically Rounded Elementary Arithmetic Operations in IEEE 754 Floating-Point Arithmetic Massimiliano Fasi, Ã–rebro University, Sweden; Mantas Mikaitis, University of Manchester, United Kingdom. Abstract. Slides.

Reduced Precision Elementary Functions Jean-Michel Muller, ENS Lyon, France. Abstract. Slides.

Effect of Reduced Precision and Stochastic Rounding in the Numerical Solution of Parabolic Equations Matteo Croci and Michael B. Giles, University of Oxford, United Kingdom. Abstract. Slides.

Stochastic Rounding and its Probabilistic Backward Error Analysis Michael P. Connolly and Nicholas J. Higham, University of Manchester, United Kingdom; Theo Mary, Sorbonne UniversitÃ©s and CNRS, France. Abstract. Slides.

Stochastic Rounding in Weather and Climate Models Milan Kloewer, Edmund Paxton, and Matthew Chantry, University of Oxford, United Kingdom Abstract. Slides.