Bounds for the Matrix Condition Number

We present a selection of bounds for the condition number $\kappa(A) = \|A\| \|A^{-1}\|$ of a nonsingular matrix $A\in\mathbb{C}^{n\times n}$ in terms of quantities that might be known or can be estimated.

General Matrices

From the inequality $\|A\| \ge \rho(A)$ , for any matrix norm, where $\rho(A)$ is the spectral radius (the largest magnitude of any eigenvalue of $A$ ) we have

$\notag \kappa(A) \ge \rho(A) \rho(A^{-1}). \qquad (1)$

For the $2$ -norm, this bound is an equality for a normal matrix (one for which $A^*A = AA^*$ ), but it can be arbitrarily weak for nonnormal matrices.
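To see how weak (1) can be for a nonnormal matrix, here is a minimal Python sketch. The test matrix $\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$ and the helper name `kappa2_shear` are illustrative assumptions, not taken from the references. Both eigenvalues equal $1$, so (1) gives only $\kappa_2(A) \ge 1$, yet the condition number grows like $t^2$.

```python
import math

def kappa2_shear(t):
    """2-norm condition number of [[1, t], [0, 1]] (illustrative helper).

    The squared singular values are the eigenvalues of A^T A, which has
    trace 2 + t^2 and determinant 1.
    """
    S = 2.0 + t * t
    lam_max = (S + math.sqrt(S * S - 4.0)) / 2.0  # largest eigenvalue of A^T A
    lam_min = 1.0 / lam_max                        # the two eigenvalues multiply to 1
    return math.sqrt(lam_max / lam_min)

spectral_bound = 1.0 * 1.0  # rho(A) * rho(A^{-1}); every eigenvalue is 1
for t in (0.0, 10.0, 1000.0):
    print(f"t = {t:7.1f}: bound (1) = {spectral_bound}, kappa_2 = {kappa2_shear(t):.6e}")
```

For $t = 0$ the matrix is the identity (normal) and the bound is attained.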

Guggenheimer, Edelman, and Johnson (1995) obtain the bound

$\notag \kappa_2(A) < \displaystyle\frac{2}{|\det(A)|} \left( \frac{\|A\|_F}{n^{1/2}} \right)^n. \qquad (2)$

The proof of the bound applies the arithmetic–geometric mean inequality to the $n$ numbers $\sigma_1^2/2, \sigma_1^2/2, \sigma_2^2, \sigma_3^2, \dots, \sigma_{n-1}^2$ , where the $\sigma_i$ are the singular values of $A$ . This bound can be arbitrarily weak, but it is an approximate equality when $\sigma_1,\sigma_2, \dots, \sigma_{n-1}$ are of similar order of magnitude.

Merikoski, Urpala, Virtanen, Tam, and Uhlig (1997) obtain the bound

$\notag \kappa_2(A) \le \left(\displaystyle\frac{1+x}{1-x}\right)^{1/2}, \quad x = \sqrt{1 - (n/\|A\|_F^2)^n |\det(A)|^2 }. \qquad (3)$

Their proof uses a more refined application of the arithmetic–geometric mean inequality, and they show that this bound is the smallest that can be obtained based on $\|A\|_F$ , $\det(A)$ , and $n$ only. Hence (3) is no larger than (2), and they show that it can be smaller by no more than 1.5. Since (3) is the square root of the bound (4) below applied to $A^*A$ , whose eigenvalues are the $\sigma_i^2$ , equality holds in (3) if and only if $\sigma_2^2 = \sigma_3^2 = \cdots = \sigma_{n-1}^2 = (\sigma_1^2 + \sigma_n^2)/2$ .

As an example, the following table shows the bounds for three random $25\times 25$ matrices with $\kappa_2(A) = 10$ , generated by gallery('randsvd') with three different singular value distributions.

Mode	(2)	(3)
One large singular value	9.88e+07	9.88e+07
One small singular value	1.21e+01	1.20e+01
Geometrically distributed singular values	5.71e+04	5.71e+04

We note that for larger $\kappa_2(A)$ the formula (3) is prone to overflow, which can be avoided by evaluating it in higher precision arithmetic.
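Because (2) and (3) depend on $A$ only through $\|A\|_F$ , $|\det(A)|$ , and $n$ , they can be evaluated directly from the singular values. The Python sketch below does this for two of the distributions in the table, assuming the "one large" and "one small" modes have singular values $(1, 0.1, \dots, 0.1)$ and $(1, \dots, 1, 0.1)$ . The function name `bounds_2_and_3` is hypothetical. As an alternative to higher precision, the sketch uses the identity $(1+x)/(1-x) = (1+x)^2/(1-x^2)$ so that $1-x$ is never formed explicitly; this is an illustrative workaround, not the approach suggested above.

```python
import math

def bounds_2_and_3(sigma):
    """Evaluate bounds (2) and (3) for a matrix with singular values sigma."""
    n = len(sigma)
    fro2 = sum(s * s for s in sigma)             # ||A||_F^2
    log_det = sum(math.log(s) for s in sigma)    # log |det(A)|
    b2 = 2.0 * math.exp(0.5 * n * math.log(fro2 / n) - log_det)
    # y = 1 - x^2 = (n / ||A||_F^2)^n |det(A)|^2, formed in log space.
    y = min(1.0, math.exp(n * math.log(n / fro2) + 2.0 * log_det))
    x = math.sqrt(1.0 - y)
    # sqrt((1+x)/(1-x)) rewritten as (1+x)/sqrt(1-x^2).
    b3 = (1.0 + x) / math.sqrt(y)
    return b2, b3

small = [1.0] * 24 + [0.1]   # one small singular value, kappa_2 = 10
large = [1.0] + [0.1] * 24   # one large singular value, kappa_2 = 10
for name, sigma in (("one small", small), ("one large", large)):
    b2, b3 = bounds_2_and_3(sigma)
    print(f"{name}: (2) = {b2:.2e}, (3) = {b3:.2e}")
```

If $y$ underflows to zero the division fails, so this rewriting delays the problem rather than removing it.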

Hermitian Positive Definite Matrices

Merikoski et al. (1997) also give a version of (3) for Hermitian positive definite $A\in\mathbb{C}^{n\times n}$ :

$\kappa_2(A) \le \displaystyle\frac{1+x}{1-x}, \quad x = \sqrt{1 - (n/\mathrm{trace}(A))^n \det(A) }. \qquad (4)$

This is the smallest bound that can be obtained based on $\mathrm{trace}(A)$ , $\det(A)$ , and $n$ only. Equality holds in (4) if and only if the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ of $A$ satisfy $\lambda_2 = \lambda_3 = \cdots = \lambda_{n-1} = (\lambda_1 + \lambda_n)/2$ . We can rewrite this upper bound as

$\displaystyle\frac{1+x}{1-x} = \frac{(1+x)^2}{1-x^2} < \frac{4}{1-x^2},$

which gives the weaker bound

$\notag \kappa_2(A) < \displaystyle\frac{4}{\det(A)} \Bigl(\displaystyle\frac{\mathrm{trace}(A)}{n}\Bigr)^n. \qquad (5)$

This bound is analogous to (2) and is up to a factor $4$ larger than (4), this factor being attained for $A = I$ .

If $\mathrm{trace}(A) = n$ then (4) reduces to

$\notag \begin{aligned} \kappa_2(A) &\le \displaystyle\frac{1 + \sqrt{1-\det(A)}}{1 - \sqrt{1-\det(A)}} =\displaystyle\frac{\bigl(1 + \sqrt{1-\det(A)}\,\bigr)^2}{\det(A)} \qquad(6)\\ &< \displaystyle\frac{4}{\det(A)}. \end{aligned}$

These bounds hold for any positive definite matrix with unit diagonal, that is, any nonsingular correlation matrix.

We can sometimes get a sharper bound than (4) and (5) by writing $A = DCD$ , where $D = \mathrm{diag}(a_{ii}^{1/2})$ and $c_{ii} \equiv 1$ (thus $C$ is a correlation matrix), using

$\notag \kappa_2(A) \le \kappa_2(D)^2 \kappa_2(C) = \displaystyle\frac{\max_i a_{ii}}{\min_i a_{ii}} \kappa_2(C), \qquad (7)$

and bounding $\kappa_2(C)$ using (6). For example, for the $5\times 5$ Pascal matrix

$\notag P_5 = \left[\begin{array}{ccccc} 1 & 1 & 1 & 1 & 1\\ 1 & 2 & 3 & 4 & 5\\ 1 & 3 & 6 & 10 & 15\\ 1 & 4 & 10 & 20 & 35\\ 1 & 5 & 15 & 35 & 70 \end{array}\right]$

the condition number is $\kappa_2(P_5) = 8.52 \times 10^3$ . The bounds from (4) and (5) are both $1.22 \times 10^7$ , whereas combining (6) and (7) gives a bound of $4.70 \times 10^6$ .
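The Pascal matrix figures can be reproduced from the trace, determinant, and diagonal alone. The following Python sketch is an illustration under stated assumptions: the helper names `det`, `one_minus_x2`, and `bound4` are hypothetical, the determinant is computed exactly with rational arithmetic, and (4) is evaluated in the equivalent form $(1+x)^2/(1-x^2)$ .

```python
from fractions import Fraction
from math import comb, prod, sqrt

def det(A):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in A]
    n, d = len(M), Fraction(1)
    for k in range(n):
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            return Fraction(0)
        if p != k:
            M[k], M[p] = M[p], M[k]
            d = -d
        d *= M[k][k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= f * M[k][j]
    return d

def one_minus_x2(n, trace, determinant):
    """The quantity 1 - x^2 = (n/trace(A))^n det(A) appearing in (4)."""
    return float(determinant) * (n / trace) ** n

def bound4(n, trace, determinant):
    y = one_minus_x2(n, trace, determinant)
    x = sqrt(1.0 - y)
    return (1.0 + x) ** 2 / y          # equals (1+x)/(1-x)

P5 = [[comb(i + j, i) for j in range(5)] for i in range(5)]
diag = [P5[i][i] for i in range(5)]
d = det(P5)
b4 = bound4(5, sum(diag), d)
b5 = 4.0 / one_minus_x2(5, sum(diag), d)
# Scale to unit diagonal: det(C) = det(A) / prod(a_ii), trace(C) = n, so (6) applies.
b67 = max(diag) / min(diag) * bound4(5, 5, d / prod(diag))
print(f"(4): {b4:.2e}  (5): {b5:.2e}  (6) with (7): {b67:.2e}")
```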

Notes

Many other condition number bounds are available in the literature. All have their pros and cons and any bound based on limited information such as traces of powers of $A$ and the determinant will be potentially very weak.

A drawback of the bounds (2)–(6) is that they require $\det(A)$ . Sometimes the determinant is easily computable, as for a Vandermonde matrix, or can be bounded: for example, $|\det(A)| \ge 1$ for a nonsingular matrix with integer entries. If a Cholesky, LU, or QR factorization of $A$ is available then $|\det(A)|$ is easily computable, but in this case a good order of magnitude estimate of the condition number can be cheaply computed using condition estimation techniques (Higham, 2002, Chapter 15).

The bounds (3) and (4) are used by Higham and Lettington (2021) in investigating the most ill conditioned $4\times 4$ symmetric matrices with integer elements bounded by $10$ ; see What Is the Wilson Matrix?

References

This is a minimal set of references, which contain further useful references within.

Heinrich Guggenheimer, Alan Edelman, and Charles Johnson, A Simple Estimate of the Condition Number of a Linear System, College Math. J. 26, 2–5, 1995.
Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002.
Nicholas J. Higham and Matthew C. Lettington, Optimizing and Factorizing the Wilson Matrix, to appear in Amer. Math. Monthly, 2021.
Jorma Kaarlo Merikoski, Uoti Urpala, Ari Virtanen, Tin-Yau Tam, and Frank Uhlig, A Best Upper Bound for the 2-Norm Condition Number of a Matrix, Linear Algebra Appl. 254, 355–365, 1997.

What Is the Wilson Matrix?

The $4\times 4$ matrix

$\notag W = \begin{bmatrix} 5 & 7 & 6 & 5 \\ 7 & 10 & 8 & 7 \\ 6 & 8 & 10 & 9 \\ 5 & 7 & 9 & 10 \end{bmatrix}$

appears in a 1946 paper by Morris, in which it is described as having been “devised by Mr. T. S. Wilson.” The matrix is symmetric positive definite with determinant $1$ and inverse

$\notag W^{-1} = \begin{bmatrix} 68 & -41 & -17 & 10\\ -41 & 25 & 10 & -6\\ -17 & 10 & 5 & -3\\ 10 & -6 & -3 & 2 \end{bmatrix},$

so it is moderately ill conditioned with $\kappa_2(W) = \|W\|_2 \|W^{-1}\|_2 \approx 2.98409\times 10^3$ . This little matrix has been used as an example and for test purposes in many research papers and books over the years, in particular by John Todd, who described it as “the notorious matrix $W$ of T. S. Wilson”.
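The stated inverse and condition number can be checked with a few lines of Python. This is a sketch under assumptions: the helpers `matmul` and `dominant_eig` are hypothetical names, and power iteration is used only because $W$ and $W^{-1}$ are symmetric positive definite, so that $\|W\|_2$ and $\|W^{-1}\|_2$ are their largest eigenvalues. It is not how the quoted value was obtained.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def dominant_eig(A, iters=200):
    """Largest eigenvalue of a symmetric positive definite A by power iteration."""
    v = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        lam = sum(wi * vi for wi, vi in zip(w, v)) / sum(vi * vi for vi in v)  # Rayleigh quotient
        scale = max(abs(wi) for wi in w)
        v = [wi / scale for wi in w]
    return lam

W = [[5, 7, 6, 5], [7, 10, 8, 7], [6, 8, 10, 9], [5, 7, 9, 10]]
W_inv = [[68, -41, -17, 10], [-41, 25, 10, -6], [-17, 10, 5, -3], [10, -6, -3, 2]]

identity = [[int(i == j) for j in range(4)] for i in range(4)]
print(matmul(W, W_inv) == identity)        # exact integer check of the inverse
kappa = dominant_eig(W) * dominant_eig(W_inv)
print(f"kappa_2(W) ~ {kappa:.5e}")
```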

Rutishauser (1968) stated that “the famous Wilson matrix is not a very striking example of an ill-conditioned matrix”, on the basis that $\kappa_2(A)\le 40{,}000$ for a “positive definite symmetric $4\times 4$ matrix with integer elements not exceeding $10$ ” and he gave the positive definite matrix

$\notag A_0 = \begin{bmatrix} 10 & 1 & 4 & 0 \\ 1 & 10 & 5 & -1 \\ 4 & 5 & 10 & 7 \\ 0 & -1 & 7 & 9 \end{bmatrix}, \quad A_0^{-1} =\begin{bmatrix} 105& 167 & -304 & 255\\ 167 & 266 & -484 & 406\\ -304 & -484 & 881 & -739\\ 255 & 406 & -739 & 620 \end{bmatrix},$

for which $\kappa_2(A_0) = 3.57924\times 10^4$ . The matrix $A_0$ is therefore a factor 12 more ill conditioned than $W$ . Rutishauser did not give a proof of the stated bound.

Moler (2018) asked how ill-conditioned $W$ is relative to matrices in the set

$\notag \begin{aligned} \mathcal{S} &= \{\, A\in\mathbb{R}^{4\times 4}: A=A^T \mathrm{~is~nonsingular~with~integer~entries}\nonumber\\ & \hspace{2.9cm} \mathrm{between~1~and~10} \,\}. \end{aligned}$

He generated one million random matrices from $\mathcal{S}$ and found that about 0.21 percent of them had a larger condition number than $W$ . The matrix with the largest condition number was the indefinite matrix

$\notag A_1 = \begin{bmatrix} 1 & 3 & 10 & 10\\ 3 & 4 & 8 & 9\\ 10 & 8 & 3 & 9\\ 10 & 9 & 9 & 3 \end{bmatrix}, \quad A_1^{-1} = \begin{bmatrix} 573 & -804 & 159 & 25\\ -804 & 1128 & -223 & -35\\ 159 & -223 & 44 & 7\\ 25 & -35 & 7 & 1 \end{bmatrix},$

for which $\kappa_2(A_1) \approx 4.80867\times 10^4$ . How far is this matrix from being a worst case?

As the Wilson matrix is positive definite, we are also interested in how ill conditioned a matrix in the set

$\notag \begin{aligned} \mathcal{P} &= \{\, A\in\mathbb{R}^{4\times 4}: A=A^T \mathrm{~is~symmetric~positive~definite ~with~integer~entries}\nonumber\\ & \hspace{2.9cm} \mathrm{between~1~and~10} \,\} \end{aligned}$

can be.

Condition Number Bounds

We first consider bounds on $\kappa_2(A)$ for $A \in \mathcal{S}$ . It is possible to obtain a bound from first principles by using the relation $A^{-1} = \mathrm{adj}(A)/\det(A)$ , where $\mathrm{adj}(A)$ is the adjugate matrix, along with the fact that $|\det(A)| \ge 1$ since $A$ has integer entries. Higham and Lettington (2021) found that the smallest bound they could obtain came from a bound of Merikoski et al. (1997): for nonsingular $B\in\mathbb{R}^{n\times n}$ ,

$\notag \kappa_2(B) \le \left(\displaystyle\frac{1+x}{1-x}\right)^{1/2}, \quad x = \sqrt{1 - (n/\|B\|_F^2)^n |\det(B)|^2 }.$

Applying this bound to $A\in\mathcal{S}$ , using the fact that $(1+x)/(1-x)$ is monotonically increasing for $x\in(0,1)$ , gives

$\notag \kappa_2(A) \le 2.97606\dots \times 10^5 =: \beta_S, \quad A\in\mathcal{S}. \qquad (1)$

Another result from Merikoski et al. (1997) gives, for symmetric positive definite $C\in\mathbb{R}^{n\times n}$ ,

$\notag \kappa_2(C) \le \displaystyle\frac{1+x}{1-x}, \quad x = \sqrt{1 - (n/\mathrm{trace}(C))^n \det(C) }.$

For $A\in\mathcal{P}$ , since $\det(A) \ge 1$ we have $x \le \sqrt{1 - (1/10)^4}$ , and hence

$\notag \kappa_2(A) \le 3.99980 \times 10^4 =: \beta_P, \quad A\in\mathcal{P}. \qquad (2)$

Recall that Rutishauser’s bound is $4\times 10^4$ . The bounds (1) and (2) remain valid if we modify the definitions of $\mathcal{S}$ and $\mathcal{P}$ to allow zero elements (note that Rutishauser’s matrix $A_0$ has a zero element).
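The value of $\beta_P$ follows from simple arithmetic: the bound increases with $x$ , and $x$ is largest when the trace is as large as possible ( $4 \times 10$ ) and the determinant as small as possible ( $1$ ). A short Python check, with variable names chosen here for illustration:

```python
import math

n, max_trace, min_det = 4, 4 * 10, 1
y = min_det * (n / max_trace) ** n     # smallest possible value of 1 - x^2
x = math.sqrt(1.0 - y)
beta_P = (1.0 + x) / (1.0 - x)
print(f"x = {x:.8f}, beta_P = {beta_P:.5e}")
print(f"beta_P / kappa_2(A_0) = {beta_P / 3.57924e4:.2f}")
```

The result lies just below Rutishauser's $4\times 10^4$ and within a factor of about $1.12$ of $\kappa_2(A_0)$ .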

Experiment

The sets $\mathcal{S}$ and $\mathcal{P}$ are large: $\mathcal{S}$ has on the order of $10^{10}$ elements. Exhaustively searching over the sets in reasonable time is possible with a carefully optimized code. Higham and Lettington (2021) use a MATLAB code that loops over all symmetric matrices with integer elements between $1$ and $10$ and

evaluates $\det(A)$ from an explicit expression (exactly computed for such matrices) and discards $A$ if the matrix is singular;
computes the eigenvalues $\lambda_i$ of $A$ and obtains the condition number as $\kappa_2(A) = \max_i |\lambda_i|/\min_i |\lambda_i|$ (since $A$ is symmetric); and
for $\mathcal{P}$ , checks whether $A$ is positive definite by checking whether the smallest eigenvalue is positive.

The code is available at https://github.com/higham/wilson-opt.
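The three steps above can be sketched on a scaled-down problem. The following Python code is not the authors' MATLAB code: it searches $2\times 2$ symmetric matrices $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$ with integer entries between $1$ and $10$ , an assumption made so that the eigenvalues have a closed form and the loop finishes instantly. The function name `search_2x2` is hypothetical.

```python
import math
from itertools import product

def search_2x2(lo=1, hi=10):
    """Most ill conditioned nonsingular and positive definite 2x2 cases."""
    best_S = (0.0, None)   # analogue of the set S (nonsingular)
    best_P = (0.0, None)   # analogue of the set P (positive definite)
    for a, b, c in product(range(lo, hi + 1), repeat=3):
        det = a * c - b * b                 # exact integer determinant
        if det == 0:
            continue                        # discard singular matrices
        mid, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
        eigs = (mid - rad, mid + rad)       # eigenvalues of [[a, b], [b, c]]
        kappa = max(abs(e) for e in eigs) / min(abs(e) for e in eigs)
        if kappa > best_S[0]:
            best_S = (kappa, (a, b, c))
        if min(eigs) > 0 and kappa > best_P[0]:   # positive definite check
            best_P = (kappa, (a, b, c))
    return best_S, best_P

best_S, best_P = search_2x2()
print("nonsingular:      ", best_S)
print("positive definite:", best_P)
```

In this small case, as in the $4\times 4$ results, the maximizers have determinant $\pm 1$ .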

The maximum over $\mathcal{S}$ is attained for

$\notag A_2 = \begin{bmatrix} 2 & 7 & 10 & 10\\ 7 & 10 & 10 & 9\\ 10 & 10 & 10 & 1\\ 10 & 9 & 1 & 9 \end{bmatrix}, \quad A_2^{-1} = \begin{bmatrix} 640 & -987 & 323 & 240\\ -987 & 1522 & -498 & -370\\ 323 & -498 & 163 & 121\\ 240 & -370 & 121 & 90 \end{bmatrix},$

which has $\kappa_2(A_2) \approx 7.6119 \times 10^4$ and determinant $-1$ . The maximum over $\mathcal{P}$ is attained for

$\notag A_3 = \begin{bmatrix} 9 & 1 & 1 & 5\\ 1 & 10 & 1 & 9\\ 1 & 1 & 10 & 1\\ 5 & 9 & 1 & 10 \end{bmatrix}, \quad A_3^{-1} = \begin{bmatrix} 188 & 347 & -13 & -405\\ 347 & 641 & -24 & -748\\ -13 & -24 & 1 & 28\\ -405 & -748 & 28 & 873 \end{bmatrix},$

which has $\kappa_2(A_3) \approx 3.5529 \times 10^4$ and determinant $1$ . Obviously, symmetric permutations of these matrices are also optimal.

The following table summarizes the condition numbers of the matrices discussed and how close they are to the bounds.

Matrix $A$	Comment	$\kappa_2(A)$	$\beta_S/\kappa_2(A)$	$\beta_P/\kappa_2(A)$
$W$	Wilson matrix	$2.98409\times 10^3$	99.73	13.40
$A_0$	Rutishauser’s matrix	$3.57924\times 10^4$	8.31	1.12
$A_1$	By random sampling	$4.80867\times 10^4$	6.19	—
$A_2$	Optimal matrices in $\mathcal{S}$	$7.61190\times 10^4$	3.91	—
$A_3$	Optimal matrices in $\mathcal{P}$	$3.55286\times 10^4$	8.38	1.13

Clearly, the bounds are reasonably sharp.

We do not know how Wilson constructed his matrix or to what extent he tried to maximize the condition number subject to the matrix entries being small integers. One possibility is that he constructed it via the factorization in the next section.

Integer Factorization

The Cholesky factor of the Wilson matrix is

$\notag R = \begin{bmatrix} \sqrt{5} & \frac{7\,\sqrt{5}}{5} & \frac{6\,\sqrt{5}}{5} & \sqrt{5}\\[\smallskipamount] 0 & \frac{\sqrt{5}}{5} & -\frac{2\,\sqrt{5}}{5} & 0\\[\smallskipamount] 0 & 0 & \sqrt{2} & \frac{3\,\sqrt{2}}{2}\\[\smallskipamount] 0 & 0 & 0 & \frac{\sqrt{2}}{2} \end{bmatrix} \quad (W = R^TR).$

Apart from the zero $(2,4)$ element, it is unremarkable. If we factor out the diagonal then we obtain the $LDL^T$ factorization, which has rational elements:

$\notag L = \begin{bmatrix}1 & 0 & 0 & 0\\ \frac{7}{5} & 1 & 0 & 0\\ \frac{6}{5} & -2 & 1 & 0\\ 1 & 0 & \frac{3}{2} & 1 \end{bmatrix}, \quad D = \begin{bmatrix}5 & 0 & 0 & 0\\ 0 & \frac{1}{5} & 0 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & \frac{1}{2} \end{bmatrix} \quad (W = LDL^T).$
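The $LDL^T$ factors can be confirmed in exact rational arithmetic. A minimal Python check, with variable names chosen here for illustration:

```python
from fractions import Fraction as F

W = [[5, 7, 6, 5], [7, 10, 8, 7], [6, 8, 10, 9], [5, 7, 9, 10]]
L = [[1, 0, 0, 0],
     [F(7, 5), 1, 0, 0],
     [F(6, 5), -2, 1, 0],
     [1, 0, F(3, 2), 1]]
D = [5, F(1, 5), 2, F(1, 2)]

# (L D L^T)_{ij} = sum_k L_ik * D_k * L_jk, evaluated exactly.
LDLt = [[sum(L[i][k] * D[k] * L[j][k] for k in range(4)) for j in range(4)]
        for i in range(4)]
print(LDLt == W)
```

The entries of $D$ are the squares of the diagonal entries of the Cholesky factor $R$ .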

Suppose we drop the requirement of triangularity and ask whether the Wilson matrix has a factorization $W = Z^T\!Z$ with a $4\times4$ matrix $Z$ of integers. It is known that every symmetric positive definite $n\times n$ matrix $A$ of integers with determinant $1$ has a factorization $A = Z^T\!Z$ with $Z$ an $n\times n$ matrix of integers as long as $n \le 7$ , but examples are known for $n = 8$ for which the factorization does not exist. This result is mentioned by Taussky (1961) and goes back to Hermite, Minkowski, and Mordell. Higham and Lettington (2021) found the integer factor

$\notag Z_0 = \begin{bmatrix} 2 & 3 & 2 & 2\\ 1 & 1 & 2 & 1\\ 0 & 0 & 1 & 2\\ 0 & 0 & 1 & 1 \end{bmatrix}$

of $W$ , which is block upper triangular so can be thought of as a block Cholesky factor. Higham, Lettington, and Schmidt (2021) draw on recent research that links the existence of such factorizations to number-theoretic considerations of quadratic forms to show that for the existence of an integer solution $Z$ to $A = Z^TZ$ it is necessary that a certain quadratic equation in $n$ variables has an integer solution. In the case of the Wilson matrix the equation is

$2 w^2+x_1^2+x_1 x_2+x_1 x_3+x_2^2+x_2 x_3+x_3^2=952.$

The authors solve this equation computationally and find $Z_0$ and two rational factors:

$\notag Z_1=\left[ \begin{array}{cccc} \frac{1}{2} & 1 & 0 & 1 \\ \frac{3}{2} & 2 & 3 & 3 \\ \frac{1}{2} & 1 & 0 & 0 \\ \frac{3}{2} & 2 & 1 & 0 \\ \end{array} \right], \quad Z_2=\left[ \begin{array}{@{\mskip2mu}rrrr} \frac{3}{2} & 2 & 2 & 2 \\ \frac{3}{2} & 2 & 2 & 1 \\ \frac{1}{2} & 1 & 1 & 2 \\ -\frac{1}{2} & -1 & 1 & 1 \\ \end{array} \right].$

They show that these matrices are the only factors $Z\in\frac{1}{16}\mathbb{Z}^{4\times 4}$ of $W$ up to left multiplication by integer orthogonal matrices.
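That $Z_0$ , $Z_1$ , and $Z_2$ are all factors of $W$ is easy to confirm exactly. The Python sketch below uses a hypothetical helper `gram` that forms $Z^TZ$ with rational arithmetic.

```python
from fractions import Fraction as F

def gram(Z):
    """Return Z^T Z computed exactly (Z is a list of rows)."""
    cols = list(zip(*Z))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

W = [[5, 7, 6, 5], [7, 10, 8, 7], [6, 8, 10, 9], [5, 7, 9, 10]]
h = F(1, 2)
Z0 = [[2, 3, 2, 2], [1, 1, 2, 1], [0, 0, 1, 2], [0, 0, 1, 1]]
Z1 = [[h, 1, 0, 1], [3 * h, 2, 3, 3], [h, 1, 0, 0], [3 * h, 2, 1, 0]]
Z2 = [[3 * h, 2, 2, 2], [3 * h, 2, 2, 1], [h, 1, 1, 2], [-h, -1, 1, 1]]

for name, Z in (("Z0", Z0), ("Z1", Z1), ("Z2", Z2)):
    print(name, gram(Z) == W)
```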

Conclusions

The Wilson matrix has provided sterling service throughout the digital computer era as a convenient symmetric positive definite matrix for use in textbook examples and for testing algorithms. The recent discovery of its integer factorization has led to the development of new theory on when general $n\times n$ integer matrices $A$ can be factored as $A = Z^TZ$ (when $A$ is symmetric positive definite) or $A = Z^2$ (a problem also considered in Higham, Lettington, and Schmidt (2021)), with integer $Z$ .

Olga Taussky Todd wrote in 1961 that “matrices with integral elements have been studied for a very long time and an enormous number of problems arise, both theoretical and practical.” We wonder what else can be learned from the Wilson matrix and other integer test matrices.

References

This is a minimal set of references, which contain further useful references within.

Nicholas J. Higham and Matthew C. Lettington, Optimizing and Factorizing the Wilson Matrix, to appear in Amer. Math. Monthly, 2021.
Nicholas J. Higham, Matthew C. Lettington, and Karl Michael Schmidt, Integer Matrix Factorisations, Superalgebras and the Quadratic Form Obstruction, Linear Algebra Appl. 622, 250–267, 2021.
Cleve B. Moler, Reviving Wilson’s Matrix, 2018.
H. Rutishauser, On Test Matrices, 7 (no. 165), 349-365, in: Programmation en Mathematiques Numeriques, Besancon, 1968.
Olga Taussky, Some Computational Problems Involving Integral Matrices, Journal of Research of the National Bureau of Standards Section B—Mathematical Sciences 65(1), 15–17, 1961.

What Is a Rank-Revealing Factorization?

In many applications a matrix $A\in\mathbb{R}^{m\times n}$ has less than full rank, that is, $r = \mathrm{rank}(A) < \min(m,n)$ . Sometimes, $r$ is known, and a full-rank factorization $A = GH$ with $G\in\mathbb{R}^{m \times r}$ and $H\in\mathbb{R}^{r \times n}$ , both of rank $r$ , is given—especially when $r = 1$ or $r = 2$ . Often, though, the rank $r$ is not known. Moreover, rather than being of exact rank $r$ , $A$ is merely close to a rank $r$ matrix because of errors from various possible sources.

What is usually wanted is a factorization that displays how close $A$ is to having particular ranks and provides an approximation to the range space of a lower rank matrix. The ultimate tool for providing this information is the singular value decomposition (SVD)

$\notag A = U\Sigma V^T, \quad \Sigma = \mathrm{diag}(\sigma_1,\dots, \sigma_p)\in\mathbb{R}^{m\times n},$

where $p = \min(m,n)$ , $\sigma_1\ge \sigma_2\ge \cdots \ge \sigma_p \ge 0$ , and $U\in\mathbb{R}^{m\times m}$ and $V\in\mathbb{R}^{n\times n}$ are orthogonal. The Eckart–Young theorem says that

$\notag \min_{\mathrm{rank}(B) = k} \|A-B\|_q = \begin{cases} \sigma_{k+1}, & q = 2, \\ \Bigl(\sum_{i=k+1}^r \sigma_i^2\Bigr)^{1/2}, & q = F, \end{cases}$

and that the minimum is attained at

$\notag A_k = U \Sigma_k V^T, \quad \Sigma_k = \mathrm{diag}(\sigma_1, \dots, \sigma_k, 0, \dots, 0),$

so $A_k$ is the best rank- $k$ approximation to $A$ in both the $2$ -norm and the Frobenius norm.

Although the SVD is expensive to compute, it may not be significantly more expensive than alternative factorizations. However, the SVD is expensive to update when a row or column is added to or removed from the matrix, as happens repeatedly in signal processing applications.

Many different definitions of a rank-revealing factorization have been given, and they usually depend on a particular matrix factorization. We will use the following general definition.

Definition 1. A rank-revealing factorization (RRF) of $A\in\mathbb{R}^{m\times n}$ is a factorization

$\notag A = XDY^T, \quad X\in\mathbb{R}^{m\times p}, \quad D\in\mathbb{R}^{p\times p}, \quad Y\in\mathbb{R}^{n\times p},$

where $p \le \min(m,n)$ , $D$ is diagonal and nonsingular, and $X$ and $Y$ are well conditioned.

An RRF concentrates the rank deficiency and ill conditioning of $A$ into the diagonal matrix $D$ . An RRF clearly exists, because the SVD is one, with $X$ and $Y$ having orthonormal columns and hence being perfectly conditioned. Justification for this definition comes from a version of Ostrowski’s theorem, which shows that

$\notag \sigma_i(A) = \theta_i \sigma_i(D), \quad i = 1\colon \min(m,n), \qquad (1)$

where $\sigma_p(X)\sigma_p(Y) \le \theta_i \le \sigma_1(X) \sigma_1(Y)$ . Hence as long as $X$ and $Y$ are well conditioned, the singular values of $D$ are good order of magnitude approximations to those of $A$ , up to a scale factor.

Without loss of generality we can assume that

$\notag D = \mathrm{diag}(d_i), \quad |d_1| \ge |d_2| \ge \cdots \ge |d_p|$

(since $XDY^T = XP\cdot P^TDP \cdot P^T Y^T$ for any permutation matrix $P$ and the second expression is another RRF). For $\widetilde{A}_k = X \mathrm{diag}(d_1,\dots,d_k,0,\dots,0)Y^T$ we have

$\notag \|A - \widetilde{A}_k\| \le \|X\| \|Y\| \|\mathrm{diag}(0,\dots,0,d_{k+1},\dots,d_p)\|,$

so $A$ is within distance of order $|d_{k+1}|$ from the rank- $k$ matrix $\widetilde{A}_k$ , which is the same order as the distance to the nearest rank- $k$ matrix if $|d_{k+1}| \approx \sigma_{k+1}$ .

Definition 1 is a strong requirement, since it requires all the singular values of $A$ to be well approximated by the (scaled) diagonal elements of $D$ . We will investigate below how it compares with another definition of RRF.

Numerical Rank

An RRF helps to determine the numerical rank, which we now define.

Definition 2. For a given $\epsilon > 0$ the numerical rank of $A$ is the largest integer $k$ such that $\sigma_k > \epsilon$ .

By the Eckart–Young theorem, the numerical rank is the smallest rank attained over all $A+E$ with $\|E\|_2 \le \epsilon$ . For the numerical rank to be meaningful in the sense that it is unchanged if $\epsilon$ is perturbed slightly, we need $\epsilon$ not to be too close to $\sigma_k$ or $\sigma_{k+1}$ , which means that there must be a significant gap between these two singular values.
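Definition 2 and the role of the gap can be illustrated with a short Python sketch. The singular values below are made-up numbers chosen for illustration, and `numerical_rank` is a hypothetical helper name.

```python
def numerical_rank(sigma, eps):
    """Largest k with sigma_k > eps, for sigma sorted in decreasing order."""
    return sum(1 for s in sigma if s > eps)

sigma = [2.0, 1.0, 1e-3, 1e-9]   # illustrative singular values

# A threshold inside the large gap between 1e-3 and 1e-9 gives a stable answer.
print(numerical_rank(sigma, 1e-6), numerical_rank(sigma, 1e-7))
# A threshold close to a singular value does not: a small change in eps changes the rank.
print(numerical_rank(sigma, 0.9e-3), numerical_rank(sigma, 1.1e-3))
```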

QR Factorization

One might attempt to compute an RRF by using a QR factorization $A = QR$ , where $Q\in\mathbb{R}^{m\times n}$ has orthonormal columns, $R\in\mathbb{R}^{n\times n}$ is upper triangular, and we assume that $m\ge n$ . In Definition 1, we can take

$\notag X = Q, \quad D = \mathrm{diag}(R), \quad Y^T = D^{-1}R. \qquad (*)$

However, it is easy to see that QR factorization in its basic form is flawed as a means for computing an RRF. Consider the matrix

$\notag A = \begin{bmatrix} 0 & 1 & 0 &0\\ 0 & 0 & 1 &0 \\ 0 & 0 & 0 &1\\ 0 & 0 & 0 &0 \end{bmatrix},$

which is a Jordan block with zero eigenvalue. This matrix is its own QR factorization ( $R = A$ ), and the prescription $(*)$ gives $D = 0$ , so $A \ne XDY^T$ . The essential problem is that the diagonal of $R$ has no connection with the nonzero singular values of $A$ . What is needed are column permutations: $A\Pi = \mathrm{diag}(1,1,1,0)$ for the permutation matrix $\Pi$ that reorders $[a_1,a_2,a_3,a_4]$ to $[a_2,a_3,a_4,a_1]$ , and this is a perfect RRF with $X = Y = I$ .
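The effect of the column permutation on the Jordan block can be seen directly. A minimal Python illustration, with zero-based column indices and variable names chosen here:

```python
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]

diag_R = [A[i][i] for i in range(4)]          # A is its own R factor
order = [1, 2, 3, 0]                          # columns [a_2, a_3, a_4, a_1]
A_perm = [[row[j] for j in order] for row in A]

print(diag_R)                                 # all zeros: no rank information
for row in A_perm:
    print(row)                                # diag(1, 1, 1, 0)
```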

For a less trivial example, consider the matrix

$\notag A = \left[\begin{array}{rrrr} 1 & 1 &\theta &0\\ 1 & -1 & 2 &1 \\ 1 & 0 &1+\theta &-1\\ 1 &-1 & 2 &-1 \end{array}\right], \quad \theta = 10^{-8}. \qquad (\dagger)$

Computing the QR factorization we obtain

R =
  -2.0000e+00   5.0000e-01  -2.5000e+00   5.0000e-01
            0   1.6583e+00  -1.6583e+00  -1.5076e-01
            0            0  -4.2640e-09   8.5280e-01
            0            0            0  -1.4142e+00

The $(3,3)$ element tells us that $A$ is within distance about $4\times 10^{-9}$ of being rank deficient and so has a singular value bounded above by this quantity, but it does not provide any information about the next larger singular value. Moreover, in $(*)$ , $\kappa_2(Y)$ is of order $10^{16}$ for this factorization. We need any small diagonal elements to be in the bottom right-hand corner, and to achieve this we need to introduce column permutations to move the “dependent columns” to the end.

QR Factorization With Column Pivoting

A common method for computing an RRF is QR factorization with column pivoting, which for a matrix $A\in\mathbb{R}^{m\times n}$ with $m\ge n$ computes a factorization $A\Pi = QR$ , where $\Pi$ is a permutation matrix, $Q\in\mathbb{R}^{m\times n}$ has orthonormal columns, and $R\in\mathbb{R}^{n\times n}$ is upper triangular and satisfies the inequalities

$\notag |r_{kk}|^2 \ge \displaystyle\sum_{i=k}^j |r_{ij}|^2, \quad j=k+1\colon n, \quad k=1\colon n. \qquad (2)$

In particular,

$\notag |r_{11}| \ge |r_{22}| \ge \cdots \ge |r_{nn}|. \qquad(3)$

If $|r_{kk}| \ge \epsilon \ge |r_{k+1,k+1}|$ with $\epsilon > 0$ then we can write

$\notag R = \begin{array}[b]{@{\mskip33mu}c@{\mskip-16mu}c@{\mskip-10mu}c@{}} \scriptstyle k & \scriptstyle n-k & \\ \multicolumn{2}{c}{ \left[\begin{array}{c@{~}c@{~}} R_{11}& R_{12} \\ 0 & R_{22} \\ \end{array}\right]} & \mskip-12mu\ \begin{array}{c} \scriptstyle k \\ \scriptstyle n-k \end{array} \end{array}, \qquad(4)$

with

$\notag \|R_{22}\|_2 \le \|R_{22}\|_F \le 2^{-1/2}(n-k+1)\epsilon.$

Hence $R$ is within $2$ -norm distance $2^{-1/2}(n-k+1)\epsilon$ of the rank- $k$ matrix $\left[\begin{smallmatrix} R_{11} & R_{12} \\ 0 & 0 \end{smallmatrix}\right]$ . Note that if $Q = [Q_1~Q_2]$ is partitioned conformally with $R$ in (4) then

$\notag A\Pi = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \\ \end{bmatrix} = Q_1 \begin{bmatrix} R_{11} & R_{12} \end{bmatrix} + \begin{bmatrix} 0 & Q_2 R_{22} \end{bmatrix},$

so $\| A\Pi - Q_1 [R_{11}~R_{12}]\|_2 \le \|R_{22}\|_2$ , which means that $Q_1$ provides an $O(\epsilon)$ approximation to the range of $A$ .

To assess how good an RRF this factorization is (with $p = n$ ) we write it as

$\notag A = QR\Pi^T = Q D Y^T, \quad D = \mathrm{diag}(r_{ii}), \quad Y^T = D^{-1}R \Pi^T. \quad (\#)$

Applying (1) gives

$\notag \sigma_i(A) = \theta_i \sigma_i(D), \quad i = 1\colon p, \qquad (5)$

where $\sigma_n(Y)\le \theta_i \le \sigma_1(Y)$ , since $Q$ has orthonormal columns and so has unit singular values. Now $D^{-1}R$ has unit diagonal and, in view of (2), its off-diagonal elements are bounded by $1$ . Therefore $\sigma_1(Y) = \|Y\|_2 \le \|Y\|_F \le (n(n+1)/2)^{1/2}$ . On the other hand, $\sigma_p(Y)^{-1} \le 2^{n-1}$ by Theorem 1 in Bounds for the Norm of the Inverse of a Triangular Matrix. Therefore

$\notag 2^{1-n} \le \theta_i \le (n(n+1)/2)^{1/2}.$

The lower bound is an approximate equality for small $\tau$ for the triangular matrix

$\notag R_n(\tau) = \mathrm{diag}(1,s,\dots,s^{n-1}) \begin{bmatrix} 1 & -c & -c & \dots & -c \\ & 1 & -c & \dots & -c \\ & & \ddots &\ddots & \vdots \\ & & &\ddots & -c \\ & & & & 1 \end{bmatrix}, \quad c=\cos\tau, \quad s=\sin\tau,$

devised by Kahan, which is invariant under QR factorization with column pivoting. Therefore QR factorization with column pivoting is not guaranteed to reveal the rank, and indeed it can fail to do so by an exponentially large factor.
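The behaviour of Kahan's matrix can be explored without an SVD. The Python sketch below makes two checks for $n = 10$ and an angle of $0.2$ (both illustrative choices): that the pivoting inequalities (2) already hold, so no interchanges are needed in exact arithmetic, and that $|r_{nn}|$ overestimates $\sigma_n$ by a large factor. The second check uses $\sigma_n(R) = 1/\|R^{-1}\|_2 \le 1/\|R^{-1}e_n\|_2$ , so it yields only a lower bound on $|r_{nn}|/\sigma_n$ . The helper names are hypothetical.

```python
import math

def kahan(n, tau):
    """Kahan's upper triangular matrix as a list of rows."""
    c, s = math.cos(tau), math.sin(tau)
    return [[(s ** i) * (1.0 if j == i else (-c if j > i else 0.0))
             for j in range(n)] for i in range(n)]

def last_column_of_inverse(R):
    """Solve R v = e_n by back substitution."""
    n = len(R)
    v = [0.0] * n
    for i in range(n - 1, -1, -1):
        rhs = (1.0 if i == n - 1 else 0.0) - sum(R[i][j] * v[j] for j in range(i + 1, n))
        v[i] = rhs / R[i][i]
    return v

n, tau = 10, 0.2
R = kahan(n, tau)

# Inequalities (2): they hold with equality here, so allow for rounding.
pivoting_ok = all(
    R[k][k] ** 2 >= (1 - 1e-12) * sum(R[i][j] ** 2 for i in range(k, j + 1))
    for k in range(n) for j in range(k + 1, n))

v = last_column_of_inverse(R)
ratio_lower_bound = abs(R[n - 1][n - 1]) * math.sqrt(sum(x * x for x in v))
print(pivoting_ok, ratio_lower_bound, 2 ** (n - 1))
```

The computed lower bound is within a small factor of the worst case $2^{n-1}$ allowed by the bounds on $\theta_i$ .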

For the matrix $(\dagger)$ , QR with column pivoting reorders $A$ to $A\Pi = [a_3,~a_4,~a_2,~a_1]$ and yields

R =
  -3.0000e+00   3.3333e-01   1.3333e+00  -1.6667e+00
            0  -1.6997e+00   2.6149e-01   2.6149e-01
            0            0   1.0742e+00   1.0742e+00
            0            0            0   3.6515e-09

This $R$ suggests a numerical rank of $3$ for $\epsilon = 10^{-8}$ (say). In fact, this factorization provides a very good RRF, as in $(\#)$ we have $\kappa_2(Y) \approx 3.4$ .
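The pivoted factorization of $(\dagger)$ can be reproduced, up to the signs of the rows of $R$ , with a short Python sketch. It uses modified Gram-Schmidt with column pivoting rather than the Householder-based algorithm found in standard libraries, purely to keep the code self-contained; the function name `qr_column_pivoting` is hypothetical.

```python
import math

def qr_column_pivoting(A):
    """R factor and column order from modified Gram-Schmidt with column pivoting."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]   # working columns
    perm = list(range(n))
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Pivot: bring the remaining column of largest 2-norm to position k.
        p = max(range(k, n), key=lambda j: sum(x * x for x in cols[j]))
        cols[k], cols[p] = cols[p], cols[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k):
            R[i][k], R[i][p] = R[i][p], R[i][k]
        R[k][k] = math.sqrt(sum(x * x for x in cols[k]))
        q = [x / R[k][k] for x in cols[k]]
        # Orthogonalize the remaining columns against q.
        for j in range(k + 1, n):
            R[k][j] = sum(qi * x for qi, x in zip(q, cols[j]))
            cols[j] = [x - R[k][j] * qi for x, qi in zip(cols[j], q)]
    return R, perm

theta = 1e-8
A = [[1, 1, theta, 0],
     [1, -1, 2, 1],
     [1, 0, 1 + theta, -1],
     [1, -1, 2, -1]]
R, perm = qr_column_pivoting(A)
print([j + 1 for j in perm])                       # column order, one-based
print([f"{R[i][i]:.4e}" for i in range(4)])        # diagonal of R
```

With Gram-Schmidt the diagonal of $R$ is positive, whereas the output above has mixed signs; the magnitudes are what matter for revealing the rank.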

QR Factorization with Other Pivoting Choices

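Consider a QR factorization $A\Pi = QR$ with triangular factor partitioned as

$\notag R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix}, \quad R_{11}\in\mathbb{R}^{k\times k}. \qquad (6)$

We have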

$\notag \begin{aligned} \sigma_{\min}(R_{11}) &\le \sigma_k(A), \quad \qquad (7)\\ \sigma_{\max}(R_{22}) &\ge \sigma_{k+1}(A), ~\qquad (8) \end{aligned}$

where (7) is from singular value interlacing inequalities and (8) follows from the Eckart-Young theorem, since setting $R_{22}$ to zero gives a rank- $k$ matrix. Suppose $A$ has numerical rank $k$ and $\sigma_{k+1} \ll \sigma_k$ . We would like to be able to detect this situation from $R$ , so clearly we need

$\notag \sigma_{\min}(R_{11}) \approx \sigma_k(A), \quad \sigma_{\max}(R_{22}) \approx \sigma_{k+1}(A). \qquad (9)$

In view of the inequalities (7) and (8) this means that we wish to choose $\Pi$ to maximize $\sigma_{\min}(R_{11})$ and minimize $\sigma_{\max}(R_{22})$ .

Some theoretical results are available on the existence of such QR factorizations. First, we give a result that shows that for $k = n-1$ the approximations in (9) can hold to within a factor $n^{1/2}$ .

Theorem 1. For $A\in\mathbb{R}^{m\times n}$ with $m\ge n$ there exists a permutation $\Pi$ such that $A$ has the QR factorization $A\Pi = QR$ with $|r_{nn}| \le n^{1/2}\sigma_n(A)$ and $\sigma_{\min}(R_{11}) \ge n^{-1/2} \sigma_{n-1}(A)$ , where $R_{11} = R(1\colon n-1, 1\colon n-1)$ .

Proof. Let $Av = \sigma_n u$ , with $\|v\|_2 = \|u\|_2 = 1$ and let $\Pi^T$ be such that $\widetilde{v} = \Pi^Tv$ satisfies $|\widetilde{v}_n| = \|\widetilde{v}\|_{\infty}$ . Then if $A\Pi = QR$ is a QR factorization,

$\notag \begin{aligned} \sigma_n = \| \sigma_n u \|_2 = \| Av \|_2 = \| QR\Pi^Tv \|_2 = \| R\mskip1mu \widetilde{v} \|_2 \ge | r_{nn} \widetilde{v}_n | \ge n^{-1/2} | r_{nn} |, \end{aligned}$

since $\|\widetilde{v}\|_2 = 1$ , which yields the result.

Next, we write $\Pi = [\Pi_1~\pi]$ , where $\pi\in\mathbb{R}^n$ , and partition

$\notag R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \\ \end{bmatrix}$

with $R_{11}\in\mathbb{R}^{(n-1)\times (n-1)}$ . Then

$A\Pi_1 = Q \begin{bmatrix} R_{11} \\ 0 \end{bmatrix}$

implies $\sigma_{\min}(A\Pi_1) = \sigma_{\min}(R_{11})$ . On the other hand, if $A = U\Sigma V^T$ is an SVD with $U\in\mathbb{R}^{m\times n}$ , $\Sigma = \mathrm{diag}(D_1,\sigma_n)\in\mathbb{R}^{n\times n}$ , and $V = [V_1~v]$ then

$\notag A\Pi_1 = U\Sigma V^T \Pi_1 = U \begin{bmatrix} D_1 & 0 \\ 0 & \sigma_n \\ \end{bmatrix} \begin{bmatrix} V_1^T \\ v^T \end{bmatrix} \Pi_1 = U \begin{bmatrix} D_1V_1^T \\ \sigma_n v^T \end{bmatrix} \Pi_1,$

so

$\notag \sigma_{\min}(A\Pi_1) = \sigma_{\min}\left( \begin{bmatrix} D_1V_1^T \\ \sigma_n v^T \end{bmatrix} \Pi_1 \right) \ge \sigma_{\min}(D_1V_1^T\Pi_1) \ge \sigma_{n-1}\sigma_{\min}(V_1^T\Pi_1).$

Finally, we note that we can partition the orthogonal matrix $V^T\Pi$ as

$\notag V^T\Pi = \begin{bmatrix} V_1^T\Pi_1 & V_1^T\pi \\ v^T\Pi_1 & v^T\pi \end{bmatrix},$

and the CS decomposition implies that

$\notag \sigma_{\min}(V_1^T\Pi_1) = \sigma_{\min}(v^T\pi) = |v^T\pi| = |\widetilde{v}_n| \ge n^{-1/2}.$

Hence $\sigma_{\min}(R_{11}) \ge n^{-1/2} \sigma_{n-1}$ , as required. $~\square$

Theorem 1 is a special case of the next result of Hong and Pan (1992).

Theorem 2. For $A\in\mathbb{R}^{m\times n}$ with $m\ge n$ and any $k$ there exists a permutation matrix $\Pi$ such that $A$ has the QR factorization $A\Pi = QR$ where, with $R$ partitioned as in (6),

$\notag \sigma_{\max}(R_{22}) \le f(k,n) \sigma_{k+1}(A), \quad \sigma_{\min}(R_{11}) \ge f(k,n)^{-1} \sigma_k(A),$

where $f(k,n) = (k(n-k) + \min(k,n-k))^{1/2}$ .

The proof of Theorem 2 is constructive and chooses $\Pi$ to move a submatrix of maximal determinant of $V_2$ to the bottom of $V_2$ , where $V_2$ comprises the last $n-k$ columns of the matrix of right singular vectors.

Theorem 2 shows the existence of an RRF up to the factor $f(k,n) \le (n+1)/2$ , but it does not provide an efficient algorithm for computing one.
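Two properties of $f(k,n)$ used above, namely that $f(n-1,n) = n^{1/2}$ (so Theorem 1 is the case $k = n-1$ ) and that $f(k,n) \le (n+1)/2$ , can be checked numerically. A small Python sketch, with the range of $n$ chosen arbitrarily:

```python
import math

def f(k, n):
    """The factor f(k, n) from Theorem 2."""
    return math.sqrt(k * (n - k) + min(k, n - k))

for n in range(2, 51):
    assert f(n - 1, n) == math.sqrt(n)                       # Theorem 1 as a special case
    assert all(f(k, n) <= (n + 1) / 2 for k in range(1, n))  # modest growth with n

print(f(5, 10), (10 + 1) / 2)
```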

Much work has been done on algorithms that choose the permutation matrix $\Pi$ in a different way to column pivoting or post-process a QR factorization with column pivoting, with the aim of satisfying (9) at reasonable cost. Typically, these algorithms involve estimating singular values and singular vectors. We are not aware of any algorithm that is guaranteed to satisfy (9) and requires only $O(n^3)$ flops.

UTV Decomposition

By applying Householder transformations on the right, a QR factorization with column pivoting can be turned into a complete orthogonal decomposition of $A\in\mathbb{R}^{m\times n}$ , which has the form

$\notag A = U \begin{bmatrix} T & 0 \\ 0 & 0 \end{bmatrix} V^T, \qquad (10)$

where $T\in\mathbb{R}^{r \times r}$ is upper triangular and $U\in\mathbb{R}^{m\times m}$ and $V\in\mathbb{R}^{n\times n}$ are orthogonal. Stewart (1998) calls (10) with $T$ upper triangular or lower triangular a UTV decomposition and he defines a rank-revealing UTV decomposition of numerical rank $r$ by

$\notag \begin{aligned} A &= U \begin{bmatrix} T & F \\ 0 & G \end{bmatrix} V^T, \qquad T\in\mathbb{R}^{r \times r}, \\ & \sigma_r(T) \approx \sigma_r(A), \quad \|F\|_F^2 + \|G\|_F^2 \approx \sigma_{r+1}^2 + \cdots + \sigma_n^2. \end{aligned}$

The UTV decomposition is easy to update (when a row is added) and downdate (when a row is removed) using Givens rotations and it is suitable for parallel implementation. Initial determination of the UTV decomposition can be done by applying the updating algorithm as the rows are brought in one at a time.
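The construction of the complete orthogonal decomposition (10) can be sketched in MATLAB as follows. This is an illustration under stated assumptions, not Stewart's algorithm: the test matrix and the rank tolerance are arbitrary choices, a second QR factorization stands in for the Householder transformations applied on the right, and the resulting $T$ is lower triangular rather than upper triangular.

```matlab
% Sketch: complete orthogonal decomposition from QR with column pivoting.
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12];     % 4-by-3 example of rank 2 (an assumption)
[m,n] = size(A);
[Q,R,P] = qr(A);                         % A*P = Q*R with P a permutation matrix
tol = 1e-10*abs(R(1,1));                 % ad hoc threshold for the numerical rank
r = sum(abs(diag(R)) > tol);
[Z,W] = qr(R(1:r,:)');                   % R(1:r,:)' = Z*W, where W = [T1; 0]
T = W(1:r,:)';                           % r-by-r lower triangular block
U = Q;  V = P*Z;                         % orthogonal factors
M = zeros(m,n);  M(1:r,1:r) = T;         % the middle factor [T 0; 0 0]
err = norm(A - U*M*V')/norm(A);
fprintf('numerical rank = %d, relative residual = %.1e\n', r, err)
```

The residual includes the rows of $R$ that were discarded as negligible, so it is small only when the tolerance correctly identifies the numerical rank.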

LU Factorization

Instead of QR factorization we can build an RRF from an LU factorization with pivoting. For $A\in\mathbb{R}^{m\times n}$ with $m\ge n$ , let

$\notag \Pi_1 A \Pi_2 = LU = \begin{bmatrix} L_{11} & 0 \\ L_{12} & L_{22} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12}\\ 0 & U_{22} \end{bmatrix},$

where $\Pi_1$ and $\Pi_2$ are permutation matrices, $L$ and $U$ are $m\times n$ lower and $n\times n$ upper triangular, respectively, and $L_{11}$ and $U_{11}$ are $k\times k$ . Analogously to (7) and (8), we always have $\sigma_{\min}(L_{11}U_{11}) \le \sigma_k(A)$ and $\sigma_{\max}(L_{22}U_{22}) \ge \sigma_{k+1}(A)$ . With a suitable pivoting strategy we can hope that $\sigma_{\min}(L_{11}U_{11}) \approx \sigma_k(A)$ and $\sigma_{\max}(L_{22}U_{22}) \approx \sigma_{k+1}(A)$ .

A result of Pan (2000) shows that an RRF based on LU factorization always exists up to a modest factor $f(k,n)$ . This is the analogue for LU factorization of Theorem 2.

Theorem 3. For $A\in\mathbb{R}^{m\times n}$ with $m\ge n$ and any $k$ there exist permutation matrices $\Pi_1$ and $\Pi_2$ such that

$\notag \Pi_1 A \Pi_2 = LU = \begin{bmatrix} L_{11} & 0 \\ L_{12} & I_{m-k,n-k} \end{bmatrix} \begin{array}[b]{@{\mskip33mu}c@{\mskip-16mu}c@{\mskip-10mu}c@{}} \scriptstyle k & \scriptstyle n-k & \\ \multicolumn{2}{c}{ \left[\begin{array}{c@{~}c@{~}} U_{11}& U_{12} \\ 0 & U_{22} \\ \end{array}\right]} & \mskip-12mu\ \begin{array}{c} \scriptstyle k \\ \scriptstyle n-k \end{array} \end{array},$

where $L_{11}$ is unit lower triangular, $U_{11}$ is upper triangular, and

$\notag \sigma_{\max}(U_{22}) \le f(k,n) \sigma_{k+1}(A), \quad \sigma_{\min}(L_{11}U_{11}) \ge f(k,n)^{-1} \sigma_k(A),$

where $f(k,n) = k(n-k) + 1$ .

Again the proof is constructive, but the permutations it chooses are too expensive to compute. In practice, complete pivoting often yields a good RRF.

In terms of Definition 1, an RRF has

$\notag X = \Pi_1^TL, \quad D = \mathrm{diag}(u_{ii}), \quad Y^T = D^{-1}U\Pi_2^T. \qquad (\ddagger)$

For the matrix ( $\dagger$ ), the $U$ factor for LU factorization without pivoting is

U =
   1.0000e+00   1.0000e+00   1.0000e-08            0
            0  -2.0000e+00   2.0000e+00   1.0000e+00
            0            0   5.0000e-09  -1.5000e+00
            0            0            0  -2.0000e+00

As for QR factorization without pivoting, an RRF is not obtained from $(\ddagger)$ . However, with complete pivoting we obtain

U =
   2.0000e+00   1.0000e+00  -1.0000e+00   1.0000e+00
            0  -2.0000e+00            0            0
            0            0   1.0000e+00   1.0000e+00
            0            0            0  -5.0000e-09

which yields a very good RRF $(\ddagger)$ with $\kappa_2(X) = 3.5$ and $\kappa_2(Y) = 3.4$ .
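For readers who want to experiment, here is a minimal MATLAB sketch of LU factorization with complete pivoting that then forms the factors $X = \Pi_1^TL$ , $D = \mathrm{diag}(u_{ii})$ , and $Y^T = D^{-1}U\Pi_2^T$ , so that $A = XDY^T$ . The test matrix (a tiny perturbation of the rank-3 matrix `magic(4)`) is a hypothetical stand-in, not the matrix used above, and the loop is written for clarity rather than efficiency.

```matlab
% Sketch: LU with complete pivoting, P1*A*P2 = L*U, then X, D, Y.
A = magic(4) + 1e-9*eye(4);              % nearly rank 3 (a hypothetical test matrix)
n = size(A,1);
U = A;  L = eye(n);  P1 = eye(n);  P2 = eye(n);
for k = 1:n-1
    % Locate the entry of largest magnitude in the active submatrix.
    S = abs(U(k:n,k:n));
    [~,idx] = max(S(:));
    [i,j] = ind2sub(size(S),idx);
    i = i+k-1;  j = j+k-1;
    % Swap rows k and i, and columns k and j, recording the permutations.
    U([k i],:) = U([i k],:);  P1([k i],:) = P1([i k],:);
    L([k i],1:k-1) = L([i k],1:k-1);
    U(:,[k j]) = U(:,[j k]);  P2(:,[k j]) = P2(:,[j k]);
    % Compute the multipliers and update the active submatrix.
    L(k+1:n,k) = U(k+1:n,k)/U(k,k);
    U(k+1:n,k:n) = U(k+1:n,k:n) - L(k+1:n,k)*U(k,k:n);
end
U = triu(U);                             % discard rounding residue below the diagonal
D = diag(diag(U));
X = P1'*L;  Yt = (D\U)*P2';              % A = X*D*Yt
resid = norm(A - X*D*Yt)/norm(A);
fprintf('|u_kk| = %s\n', mat2str(abs(diag(U))',3))
fprintf('cond(X) = %.2f, cond(Y) = %.2f, residual = %.1e\n', cond(X), cond(Yt), resid)
```

For this matrix the last diagonal element of $U$ is tiny compared with the others, which is the behavior one hopes for from a rank-revealing factorization.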

Notes

QR factorization with column pivoting is difficult to implement efficiently, as the criterion for choosing the pivots requires the norms of the active parts of the remaining columns and this requires a significant amount of data movement. In recent years, randomized RRF algorithms have been developed that use projections with random matrices to make pivot decisions based on small sample matrices and thereby reduce the amount of data movement. See, for example, Martinsson et al. (2019).

References

This is a minimal set of references, which contain further useful references within.

Shivkumar Chandrasekaran and Ilse C. F. Ipsen, On Rank-Revealing Factorisations, SIAM J. Matrix Anal. Appl. 15 (2), 592–622, 1994.
Y. P. Hong and C.-T. Pan, Rank-Revealing QR Factorizations and the Singular Value Decomposition, Math. Comp. 58 (197), 213–232, 1992.
P. G. Martinsson, G. Quintana-Orti, and N. Heavner, randUTV: A Blocked Randomized Algorithm for Computing a Rank-Revealing UTV Factorization, ACM Trans. Math. Software 45(1), 4:1–4:26, 2019.
C.-T. Pan, On the Existence and Computation of Rank-Revealing LU Factorizations, Linear Algebra Appl. 316, 199–222, 2000.
G. W. Stewart, Matrix Algorithms. Volume I: Basic Decompositions, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1998.

My Blog Workflow

This blog, which I started in 2013, is hosted on wordpress.com. WordPress sites are both websites and blogs, and in late 2020 I moved my website over to this same domain name (nhigham.com).

The whole site is maintained as Emacs Org mode files that I export to WordPress using the Emacs org2blog package. I do not use the WordPress editor, except occasionally to copy the html for a figure back into an Org file for tweaking and re-exporting as html (usually in order to have text next to an image, as here).

WordPress supports $\LaTeX$ math mode, which must be typed within single dollar signs with the word “latex” right after the first dollar sign. Emacs Org mode supports $\LaTeX$ equations and Org2blog does an excellent job of exporting to WordPress with the necessary conversions. For example, to produce the displayed equation

$\notag X = \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix}^{-1}$

I type

\begin{equation}\notag
 X =  \begin{bmatrix}
       a_{11} & a_{12} \\ 
       a_{12} & a_{22} 
      \end{bmatrix}^{-1}
\end{equation}

in my source and this is converted into

[Image: the resulting WordPress markup]

in WordPress. (This fragment is included as an image because if I include the text directly I cannot stop WordPress interpreting it as $\LaTeX$ !)

The beauty of this workflow is that I can export this same source to html, $\LaTeX$ (then PDF), or even a Word file using Org mode’s export dispatcher. This is how I produce the PDF versions of the posts in the What Is series.

Advantages of this workflow are

I can work almost entirely in Emacs and avoid using the WordPress editor.
Org mode files are very readable and easy to edit.
Org2blog automatically uploads images into the WordPress media library and links to them.
Raw html can be included.

Drawbacks of the workflow are

$\LaTeX$ macros cannot be defined, so $\LaTeX$ commands must always be typed in full.
Displayed equations must be typed in an equation environment for reliable results.
All math is formatted in text style, so \displaystyle must be put in front of every \frac, \sum, etc. to obtain display style (so that formulas in fractions are not set in a smaller font, for example).
Inline math has awkward vertical spacing.
In the current version of org2blog, equation* (for an unnumbered equation) is not supported.

The last three disadvantages would be avoided if the $\LaTeX$ was interpreted by MathJax, but this requires a MathJax plugin, and the Business Plan is needed to be able to install plugins (I have the Premium plan).

Here is what the Emacs Org mode source code looks like for the post What Is the Sylvester Equation? After the properties drawer, which contains information about the post on WordPress, the text looks like standard $\LaTeX$ , with the exception that a comment line begins with the # symbol rather than %.

Singular Value Inequalities

Recall that the singular value decomposition (SVD) of a matrix $A \in\mathbb{C}^{m\times n}$ is a factorization $A = U\Sigma V^*$ , where $U\in\mathbb{C}^{m\times m}$ and $V\in\mathbb{C}^{n\times n}$ are unitary and $\Sigma = \mathrm{diag}(\sigma_1,\dots, \sigma_p)\in\mathbb{R}^{m\times n}$ , with $\sigma_1\ge \sigma_2\ge \cdots \ge \sigma_p \ge 0$ , where $p = \min(m,n)$ . We sometimes write $\sigma_i(A)$ to specify the matrix to which the singular value belongs.

A standard technique for obtaining singular value inequalities for $A$ is to apply eigenvalue inequalities to the Hermitian positive semidefinite matrices $A^*A$ or $AA^*$ , whose eigenvalues are the squares of the singular values of $A$ , or to the Hermitian matrix

$\notag \begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix}, \qquad (1)$

whose eigenvalues are plus and minus the singular values of $A$ together with $|m-n|$ zero eigenvalues if $m\ne n$ .
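The eigenvalue property of the matrix (1) is easy to confirm numerically. The following MATLAB sketch uses an arbitrary $3\times 2$ matrix, chosen purely for illustration.

```matlab
% Eigenvalues of [0 A; A' 0] versus the singular values of A.
A = [1 2; 3 4; 5 6];                     % 3-by-2 example (an assumption)
[m,n] = size(A);
H = [zeros(m) A; A' zeros(n)];           % Hermitian matrix of order m+n
ev = sort(eig(H),'descend');
sv = svd(A);
expected = sort([sv; -sv; zeros(abs(m-n),1)],'descend');
err = norm(ev - expected);
fprintf('difference between the two spectra = %.1e\n', err)
```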

We begin with a variational characterization of singular values.

Theorem 1. For $A\in\mathbb{C}^{m\times n}$ ,

$\notag \begin{aligned} \sigma_k &= \min_{\dim(S)=n-k+1} \, \max_{0\ne x\in S} \frac{\|Ax\|_2}{\|x\|_2}\\ &= \max_{\dim(S)= k} \, \min_{0\ne x\in S} \frac{\|Ax\|_2}{\|x\|_2}, \quad k=1\colon \min(m,n), \end{aligned}$

where $S\subseteq \mathbb{C}^n$ .

Proof. The result is obtained by applying the Courant–Fischer theorem (a variational characterization of eigenvalues) to $A^*A$ . $~\square$

As a special case of Theorem 1, we have

$\notag \sigma_1 = \displaystyle\max_{x \ne 0}\frac{ \|Ax\|_2 }{ \|x\|_2 }, \qquad (2)$

and, for $m\ge n$ ,

$\notag \sigma_n = \displaystyle\min_{x \ne 0}\frac{ \|Ax\|_2 }{ \|x\|_2 }. \qquad (3)$

The expression in the theorem can be rewritten using $\|x\|_2 = \max_{y\ne 0}|y^*x|/\|y\|_2$ (the equality case in the Cauchy–Schwarz inequality). For example, (2) is equivalent to

$\notag \sigma_1 = \displaystyle\max_{0\ne x\in \mathbb{C}^n\atop 0 \ne y \in \mathbb{C}^m} \displaystyle\frac{|y^*Ax|}{\|x\|_2\|y\|_2}.$
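A small MATLAB experiment illustrates (2), (3), and the bilinear form of (2). It is only a sketch: the matrix and the number of random trial vectors are arbitrary choices, and random sampling can show only that the ratios stay within the stated limits, with the extremes attained at singular vectors.

```matlab
% Ratios ||Ax||/||x|| lie between sigma_n and sigma_1 (here m >= n).
A = [1 2; 3 4; 5 6];                     % example matrix (an assumption)
sv = svd(A);
X = randn(2,1000);                       % random trial vectors as columns
ratios = sqrt(sum((A*X).^2,1)) ./ sqrt(sum(X.^2,1));
[U,~,V] = svd(A);
r1 = norm(A*V(:,1))/norm(V(:,1));        % ratio at the first right singular vector
rn = norm(A*V(:,end))/norm(V(:,end));    % ratio at the last right singular vector
bil = abs(U(:,1)'*A*V(:,1));             % |y'*A*x| with unit x and y
fprintf('sigma_1 = %.4f, max sampled ratio = %.4f\n', sv(1), max(ratios))
fprintf('sigma_n = %.4f, min sampled ratio = %.4f\n', sv(end), min(ratios))
```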

Our first perturbation result bounds the change in a singular value.

Theorem 2. For $A,B\in\mathbb{C}^{m\times n}$ ,

$\notag |\sigma_i(A) - \sigma_i(B)| \le \|A - B \|_2, \quad i = 1\colon \min(m,n). \qquad (4)$

Proof. The bound is obtained by applying the corresponding result for the Hermitian eigenvalue problem to (1). $~\square$

The bound (4) says that the singular values of a matrix are well conditioned under additive perturbation. Now we consider multiplicative perturbations.
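The additive bound (4) can be checked on a small example. In this MATLAB sketch the matrix $A$ and the perturbation are arbitrary illustrative choices.

```matlab
% Check |sigma_i(A) - sigma_i(B)| <= ||A - B||_2 for a small perturbation.
A = [1 2; 3 4; 5 6];                     % example matrix (an assumption)
E = 1e-3*[1 -1; 2 0; -1 1];              % perturbation (an assumption)
B = A + E;
change = abs(svd(A) - svd(B));           % changes in the singular values
bound = norm(A - B, 2);
fprintf('max change = %.2e, bound = %.2e\n', max(change), bound)
```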

The next result is an analogue for singular values of Ostrowski’s theorem for eigenvalues.

Theorem 3. For $A\in \mathbb{C}^{m\times n}$ and nonsingular $X\in\mathbb{C}^{n\times n}$ and $Y\in\mathbb{C}^{m\times m}$ ,

$\notag \sigma_i(Y^*AX) = \theta_i \sigma_i(A), \quad i = 1\colon \min(m,n), \qquad (5)$

where $\sigma_n(X)\sigma_m(Y) \le \theta_i \le \sigma_1(X) \sigma_1(Y)$ .

A corollary of this result is

$\notag |\sigma_i(A) - \sigma_i(Y^*AX)| \le \sigma_i(A) \epsilon, \quad i = 1\colon \min(m,n), \qquad (6)$

where $\epsilon = \max(\|X^*X - I\|_2,\|Y^*Y - I\|_2)$ . The bounds (5) and (6) are intuitively reasonable, because unitary transformations preserve singular values and the bounds quantify in different ways how close $X$ and $Y$ are to being unitary.
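Here is a minimal MATLAB sketch that checks the bounds on $\theta_i$ in Theorem 3 and the relative bound (6). The matrices $X$ and $Y$ are small perturbations of the identity, chosen arbitrarily for illustration.

```matlab
% Multiplicative perturbation: sigma_i(Y'*A*X) = theta_i*sigma_i(A).
A = [1 2; 3 4; 5 6];                     % m = 3, n = 2 (an assumption)
X = eye(2) + 1e-2*[1 2; -1 1];           % nearly orthogonal (an assumption)
Y = eye(3) + 1e-2*[0 1 -1; 2 1 0; 1 -1 1];
theta = svd(Y'*A*X) ./ svd(A);
lower = min(svd(X))*min(svd(Y));         % sigma_n(X)*sigma_m(Y)
upper = max(svd(X))*max(svd(Y));         % sigma_1(X)*sigma_1(Y)
epsilon = max(norm(X'*X - eye(2)), norm(Y'*Y - eye(3)));
relchange = abs(svd(A) - svd(Y'*A*X)) ./ svd(A);
fprintf('theta in [%.4f, %.4f]; bounds [%.4f, %.4f]\n', ...
        min(theta), max(theta), lower, upper)
fprintf('max relative change = %.2e, epsilon = %.2e\n', max(relchange), epsilon)
```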

Next, we have an interlacing property.

Theorem 4. Let $A\in\mathbb{C}^{m\times n}$ , $A_k = A(:,1\colon k)$ , and $q = \min(m,k)$ . Then

$\notag \sigma_{i+1}(A_{k+1}) \le \sigma_i(A_k) \le \sigma_i(A_{k+1}), \quad i=1\colon q, \quad k = 1\colon n-1,$

where we define $\sigma_{q+1}(A_{k+1}) = 0$ if $m < k+1$ .

Proof. The result is obtained by applying the Cauchy interlace theorem to $A^*A$ , noting that $A_k^*A_k$ is the leading principal submatrix of order $k$ of $A^*A$ . $~\square$

An analogous result holds with rows playing the role of columns (just apply Theorem 4 to $A^*$ ).

Theorem 4 encompasses two different cases, which we illustrate with $i = q$ and $k = n-1$ . The first case is $m \ge n$ , so that $q = n-1$ and

$\notag \sigma_n(A) \le \sigma_{n-1}(A_{n-1}) \le \sigma_{n-1}(A).$

The second case is $m < n$ , so $q = m$ and

$\notag 0 \le \sigma_m(A_{n-1}) \le \sigma_m(A).$

Therefore Theorem 4 shows that removing a column from $A$ does not increase any singular value and that when $m\ge n$ no singular value decreases below $\sigma_n(A)$ . However, when $m < n$ the smallest singular value of $A_{n-1}$ may be less than the smallest singular value of $A$ .

Here is a numerical example. Note that transposing $A$ does not change its singular values.

>> rng(1), A = rand(5,4); % Case 1.
>> B = A(:,1:end-1); sv_A = svd(A)', sv_B = svd(B)'
sv_A =
   1.7450e+00   6.4492e-01   5.5015e-01   3.2587e-01
sv_B =
   1.5500e+00   5.8472e-01   3.6128e-01
>> A = A'; B = A(:,1:end-1); sv_B = svd(B)' % Case 2
sv_B =
   1.7098e+00   6.0996e-01   4.6017e-01   1.0369e-01

By applying Theorem 4 repeatedly we find that if we partition $A = [A_{11}~A_{12}]$ then $\sigma_i(A_{11}) \le \sigma_i(A)$ for all $i$ for which the left-hand side is defined.

References

Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.
Ilse Ipsen, Relative Perturbation Results for Matrix Eigenvalues and Singular Values, Acta Numerica 7, 151–201, 1998.

What Is a Pseudo-Orthogonal Matrix?

A matrix $Q\in\mathbb{R}^{n\times n}$ is pseudo-orthogonal if

$\notag Q^T \Sigma Q = \Sigma, \qquad (1)$

where $\Sigma = \mathrm{diag}(\pm 1)$ is a signature matrix. A matrix $Q$ satisfying (1) is also known as a $J$ -orthogonal matrix, where $J$ is another notation for a signature matrix. Of course, if $\Sigma = I$ then $Q$ is orthogonal.

It is easy to show that $Q^T$ is also pseudo-orthogonal. Furthermore, $Q$ is clearly nonsingular and it satisfies

$\notag Q = \Sigma Q^{-T}\Sigma. \qquad (2)$

Since $\Sigma$ is orthogonal, this equation implies that $\|Q\|_\ell = \|Q^{-T}\|_\ell = \|Q^{-1}\|_\ell$ and hence that

$\notag \kappa_\ell(Q) = \|Q\|_\ell^2, \quad \ell = 2,F. \qquad (3)$

What are some examples of pseudo-orthogonal matrices? For $n = 2$ and $\Sigma = \left[\begin{smallmatrix}1 & 0 \\ 0 & -1 \end{smallmatrix}\right]$ , $Q$ is of the form

$\notag Q = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad ab - cd = 0, \quad a^2 - c^2 = 1, \quad b^2 - d^2 = -1,$

which includes the matrices

$\notag Q = \begin{bmatrix} \cosh \theta & -\sinh\theta \\ -\sinh\theta & \cosh\theta \end{bmatrix}, \quad \theta\in\mathbb{R}. \qquad (4)$
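The properties (1)–(3) can be verified numerically for the matrix (4). In this MATLAB sketch the value of $\theta$ is an arbitrary choice.

```matlab
% Hyperbolic rotation: check (1), (2), and (3) for the 2-norm.
theta = 0.7;                             % arbitrary parameter (an assumption)
Sigma = diag([1 -1]);
Q = [cosh(theta) -sinh(theta); -sinh(theta) cosh(theta)];
res1 = norm(Q'*Sigma*Q - Sigma);         % residual in (1)
res2 = norm(Q - Sigma*(inv(Q)')*Sigma);  % residual in (2)
gap3 = abs(cond(Q) - norm(Q)^2);         % (3): kappa_2(Q) = ||Q||_2^2
fprintf('res1 = %.1e, res2 = %.1e, gap3 = %.1e\n', res1, res2, gap3)
fprintf('norm(Q) = %.4f, exp(theta) = %.4f\n', norm(Q), exp(theta))
```

Since $Q$ is symmetric with eigenvalues $\cosh\theta \pm \sinh\theta = e^{\pm\theta}$ , its 2-norm is $e^{|\theta|}$ , so the condition number grows exponentially with $|\theta|$ .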

The Lorentz group, representing symmetries of the spacetime of special relativity, corresponds to $4\times 4$ matrices with $\Sigma = \mathrm{diag}(1,1,1,-1)$ .

Equation (2) shows that $Q$ is similar to the inverse of its transpose and hence (since every matrix is similar to its transpose) similar to its inverse. It follows that if $\lambda$ is an eigenvalue of $Q$ then $\lambda^{-1}$ is also an eigenvalue and it has the same algebraic and geometric multiplicities as $\lambda$ .

By permuting rows and columns in (1) we can arrange that

$\notag \Sigma = \Sigma_{p,q} := \begin{bmatrix} I_p & 0 \\ 0 & -I_q \end{bmatrix}. \qquad (5)$

We assume that $\Sigma$ has this form throughout the rest of this article. This form of $\Sigma$ allows us to conveniently characterize matrices that are both orthogonal and pseudo-orthogonal. Such a matrix must satisfy $\Sigma Q = Q\Sigma$ , which means that $Q = \mathrm{diag}(Q_{11},Q_{22})$ , and any such orthogonal matrix is pseudo-orthogonal.

Applications

Pseudo-orthogonal matrices arise in hyperbolic problems, that is, problems where there is an underlying indefinite scalar product or weight matrix. An example is the problem of downdating the Cholesky factorization, where in the simplest case we have the Cholesky factorization $C = R^T\!R$ of a symmetric positive definite $C\in\mathbb{R}^{n\times n}$ and want the Cholesky factorization of $\widetilde{C} = C - zz^T$ , which is assumed to be symmetric positive definite. A more general downdating problem is that we are given

$\notag A = \begin{array}[b]{cc} \left[\begin{array}{@{}c@{}} A_1\\ A_2 \end{array}\right] & \mskip-22mu\ \begin{array}{l} \scriptstyle p \\ \scriptstyle q \end{array} \end{array}, \quad p\ge n,$

and the Cholesky factorization $A^T\!A = R^T\!R$ and wish to obtain the Cholesky factor $S$ of $A_1^TA_1 = R^T\!R - A_2^TA_2$ . Note that $R$ and $S$ are $n\times n$ . This task arises when we solve a regression problem after the observations corresponding to $A_2$ have been removed. The simple case above corresponds to removing one row ( $q = 1$ ). Assuming that $q \ll p$ , we would like to obtain $S$ cheaply from $R$ , and numerical stability considerations dictate that we should avoid explicitly forming $A_1^TA_1$ . If we can find a pseudo-orthogonal matrix $Q$ such that

$\notag Q \begin{bmatrix} R \\ A_2 \end{bmatrix} = \begin{bmatrix} S \\ 0 \end{bmatrix}, \qquad (6)$

with $\Sigma$ given by (5) and $S\in\mathbb{R}^{n\times n}$ upper triangular, then

$\notag A_1^TA_1 = \begin{bmatrix} R \\ A_2 \end{bmatrix}^T \Sigma \begin{bmatrix} R \\ A_2 \end{bmatrix} = \begin{bmatrix} R \\ A_2 \end{bmatrix}^T Q^T \Sigma Q \begin{bmatrix} R \\ A_2 \end{bmatrix} = \begin{bmatrix} S \\ 0 \end{bmatrix}^T \Sigma \begin{bmatrix} S \\ 0 \end{bmatrix} = S^T\!S,$

so $S$ is the desired Cholesky factor.

The factorization (6) is called a hyperbolic QR factorization and it can be computed by using hyperbolic rotations to zero out the elements of $A_2$ . A $2\times2$ hyperbolic rotation has the form (4), and an $n\times n$ hyperbolic rotation is an identity matrix with a $2\times 2$ hyperbolic rotation embedded in it at the intersection of rows and columns $i$ and $j$ , for some $i$ and $j$ .
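For the simplest case $q = 1$ , the downdating process can be sketched in MATLAB as follows. This is a minimal illustration using plain hyperbolic rotations, with a small test matrix and vector chosen arbitrarily; it makes no claim to be the numerically preferable way of applying the rotations.

```matlab
% Sketch: downdate the Cholesky factor of C to that of C - z*z'.
C = [4 2 2; 2 5 3; 2 3 6];               % symmetric positive definite (an assumption)
z = [1; 0.5; -0.5];                      % C - z*z' must be positive definite
R = chol(C);                             % C = R'*R
n = size(C,1);
S = R;  w = z';                          % w plays the role of A_2 in (6)
for k = 1:n
    % Hyperbolic rotation [c -s; -s c] that zeros w(k) using S(k,k).
    t = sqrt(S(k,k)^2 - w(k)^2);         % real when C - z*z' is positive definite
    c = S(k,k)/t;  s = w(k)/t;           % c^2 - s^2 = 1
    rowS   = c*S(k,k:n) - s*w(k:n);
    w(k:n) = -s*S(k,k:n) + c*w(k:n);
    S(k,k:n) = rowS;
end
err = norm(S'*S - (C - z*z'))/norm(C);
fprintf('relative error in S''*S = %.1e, norm of final w = %.1e\n', err, norm(w))
```

Each rotation preserves $S^TS - w^Tw$ , so once $w$ has been reduced to zero the upper triangular $S$ satisfies $S^TS = R^TR - zz^T$ .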

In general, a hyperbolic QR factorization of $A\in\mathbb{R}^{m\times n}$ with $m = p+q$ and $p\ge n$ has the form $QA = \left[\begin{smallmatrix} R \\ 0 \end{smallmatrix}\right]$ with $Q$ pseudo-orthogonal with respect to $\Sigma = \Sigma_{p,q}$ and $R \in\mathbb{R}^{n\times n}$ upper triangular. The factorization exists if $A^T\Sigma A$ is positive definite.

Another hyperbolic problem is the indefinite least squares problem

$\notag \min_x \,(b-Ax)^T \Sigma (b-Ax), \qquad (7)$

where $A\in\mathbb{R}^{m\times n}$ , $m\ge n$ , and $b\in\mathbb{R}^m$ are given, and $\Sigma = \Sigma_{p,q}$ with $m = p + q$ . For $p=0$ or $q=0$ we have the standard least squares (LS) problem and the quadratic form is definite, while for $pq>0$ the problem is to minimize a genuinely indefinite quadratic form. This problem arises, for example, in the area of optimization known as $H^{\infty}$ smoothing.

The normal equations for (7) are $A^T\Sigma Ax = A^T\Sigma b$ , and since the Hessian matrix of the quadratic objective function in (7) is $2A^T\Sigma A$ it follows that the indefinite least squares problem has a unique solution if and only if $A^T\Sigma A$ is positive definite. To solve the problem we can use a hyperbolic QR factorization $QA = \left[\begin{smallmatrix} R \\ 0 \end{smallmatrix}\right]$ to write

$\notag \begin{aligned} A^T\Sigma A &= A^T Q^T \Sigma Q A = \begin{bmatrix} R \\ 0 \end{bmatrix}^T \begin{bmatrix} I_p & 0 \\ 0 & -I_q \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix} = R^T\!R, \\ A^T\Sigma b &= A^T Q^T\Sigma Q b = \begin{bmatrix} R \\ 0 \end{bmatrix}^T \! \Sigma Q b. \end{aligned}$

Solving the problem now reduces to solving the triangular system $Rx = d$ , where $d$ comprises the first $n$ components of $Qb$ . The same equation can also be obtained without using the normal equations by substituting the hyperbolic QR factorization into (7).
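To make the problem concrete, here is a minimal MATLAB sketch that solves a tiny indefinite least squares problem directly from the normal equations. It does not use a hyperbolic QR factorization; the normal equations serve only as a simple check on what the solution must satisfy, and the data $A$ , $b$ , $p$ , and $q$ are arbitrary illustrative choices.

```matlab
% Tiny indefinite least squares problem solved via the normal equations.
A = [2 0; 0 2; 1 0; 0 1];  b = [1; 2; 3; 4];   % illustrative data (assumptions)
p = 2;  q = 2;
Sigma = blkdiag(eye(p), -eye(q));
M = A'*Sigma*A;                          % must be positive definite
[Rc,flag] = chol(M);                     % flag == 0 signals positive definiteness
x = M\(A'*Sigma*b);                      % solution of the normal equations
obj = @(v) (b - A*v)'*Sigma*(b - A*v);   % objective function in (7)
grad = -2*A'*Sigma*(b - A*x);            % gradient of the objective at x
fprintf('flag = %d, norm of gradient = %.1e, objective = %.4f\n', ...
        flag, norm(grad), obj(x))
```

For these data $A^T\Sigma A = 3I$ and $A^T\Sigma b = [-1,~0]^T$ , so $x = [-1/3,~0]^T$ .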

The Exchange Operator

A simple technique exists for converting pseudo-orthogonal matrices into orthogonal matrices and vice versa. Let $A\in\mathbb{R}^{n\times n}$ with $n = p + q$ , partition

$\notag A = \mskip5mu \begin{array}[b]{@{\mskip-20mu}c@{\mskip0mu}c@{\mskip-1mu}c@{}} & \mskip10mu\scriptstyle p & \scriptstyle q \\ \mskip15mu \begin{array}{r} \scriptstyle p \\ \scriptstyle q \end{array}~ & \multicolumn{2}{c}{\mskip-15mu \left[\begin{array}{c@{~}c@{~}} A_{11} & A_{12}\\ A_{21} & A_{22} \end{array}\right] } \end{array}, \qquad (8)$

and assume $A_{11}$ is nonsingular. The exchange operator is defined by

$\notag \mathrm{exc}(A) = \begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1}A_{12} \\ A_{21}A_{11}^{-1} & A_{22} -A_{21}A_{11}^{-1}A_{12} \end{bmatrix}.$

It is easy to see that the exchange operator is involutory, that is,

$\notag \mathrm{exc}(\mathrm{exc}(A)) = A,$

and moreover (recalling that $\Sigma$ is given by (5)) that

$\notag \mathrm{exc}(\Sigma A\Sigma) = \Sigma \mathrm{exc}(A)\Sigma = \mathrm{exc}(A^T)^T. \qquad(9)$

The next result gives a formula for the inverse of $\mathrm{exc}(A)$ .

Lemma 1. Let $A\in\mathbb{R}^{n\times n}$ with $A_{11}$ nonsingular. If $A$ is nonsingular and $\mathrm{exc}(A^{-1})$ exists then $\mathrm{exc}(A)$ is nonsingular and $\mathrm{exc}(A)^{-1} = \mathrm{exc}(A^{-1})$ .

Proof. Consider the equation

$\notag y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = Ax.$

By solving the first equation for $x_1$ and then eliminating $x_1$ from the second equation we obtain

$\notag \begin{bmatrix} x_1 \\ y_2 \end{bmatrix} = \mathrm{exc}(A) \begin{bmatrix} y_1 \\ x_2 \end{bmatrix}. \qquad (10)$

By the same argument applied to $x = A^{-1}y$ , we have

$\notag \begin{bmatrix} y_1 \\ x_2 \end{bmatrix} = \mathrm{exc}(A^{-1}) \begin{bmatrix} x_1 \\ y_2 \end{bmatrix}.$

Hence for any $x_1$ and $y_2$ there is a unique $y_1$ and $x_2$ , which implies by (10) that $\mathrm{exc}(A)$ is nonsingular and that $\mathrm{exc}(A)^{-1} = \mathrm{exc}(A^{-1})$ . $~\square$
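The involutory property, the identities (9), and Lemma 1 can all be checked numerically. This MATLAB sketch assumes $p = q = 2$ and an arbitrary test matrix, and it evaluates the exchange operator through the equivalent formula $\mathrm{exc}(A) = (E_1 + E_2A)(E_1A + E_2)^{-1}$ , where $E_1 = \mathrm{diag}(I_p,0)$ and $E_2 = \mathrm{diag}(0,I_q)$ are names introduced here for illustration; the formula can be verified by block multiplication.

```matlab
% Numerical check of exc(exc(A)) = A, the identities (9), and Lemma 1.
p = 2;  q = 2;
E1 = blkdiag(eye(p), zeros(q));          % selects the first p components
E2 = blkdiag(zeros(p), eye(q));          % selects the last q components
Sigma = E1 - E2;
exc = @(A) (E1 + E2*A)/(E1*A + E2);      % exchange operator (requires A11 nonsingular)
A = [4 1 2 0; -1 3 0 1; 2 0 5 1; 0 1 -1 6];   % test matrix (an assumption)
X = exc(A);
err_invol = norm(exc(X) - A);                         % involutory property
err_9a = norm(exc(Sigma*A*Sigma) - Sigma*X*Sigma);    % first equality in (9)
err_9b = norm(Sigma*X*Sigma - exc(A')');              % second equality in (9)
err_lemma = norm(inv(X) - exc(inv(A)));               % Lemma 1
fprintf('%.1e  %.1e  %.1e  %.1e\n', err_invol, err_9a, err_9b, err_lemma)
```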

Now we will show that the exchange operator maps pseudo-orthogonal matrices to orthogonal matrices and vice versa.

Theorem 2. Let $A\in\mathbb{R}^{n\times n}$ . If $A$ is pseudo-orthogonal then $\mathrm{exc}(A)$ is orthogonal. If $A$ is orthogonal and $A_{11}$ is nonsingular then $\mathrm{exc}(A)$ is pseudo-orthogonal.

Proof. If $A$ is pseudo-orthogonal then $A_{11}^TA_{11} = I + A_{21}^TA_{21}$ , which implies that $A_{11}$ is nonsingular. Since $\Sigma A^T\Sigma = A^{-1}$ , it follows that $A^{-1}$ also has a nonsingular $(1,1)$ block and so $\mathrm{exc}(A^{-1})$ exists. Furthermore, using Lemma 1, $\mathrm{exc}(\Sigma A^T\Sigma) = \mathrm{exc}(A^{-1}) = \mathrm{exc}(A)^{-1}$ . But (9) shows that $\mathrm{exc}(\Sigma A^T\Sigma) = \mathrm{exc}(A)^T$ , and we conclude that $\mathrm{exc}(A)$ is orthogonal.

Assume now that $A$ is orthogonal with $A_{11}$ nonsingular. Then $\mathrm{exc}(A^T) = \mathrm{exc}(A^{-1})$ exists and Lemma 1 shows that $\mathrm{exc}(A)$ is nonsingular and $\mathrm{exc}(A)^{-1} = \mathrm{exc}(A^{-1}) = \mathrm{exc}(A^T)$ . Hence, using (9),

$I = \mathrm{exc}(A^T) \mathrm{exc}(A) = \Sigma\mathrm{exc}(A)^T\Sigma \cdot \mathrm{exc}(A),$

which shows that $\mathrm{exc}(A)$ is pseudo-orthogonal. $~\square$

This MATLAB example uses the exchange operator to convert an orthogonal matrix obtained from a Hadamard matrix into a pseudo-orthogonal matrix.

>> p = 2; n = 4;
>> A = hadamard(n)/sqrt(n), Sigma = blkdiag(eye(p),-eye(n-p))
A =
   5.0000e-01   5.0000e-01   5.0000e-01   5.0000e-01
   5.0000e-01  -5.0000e-01   5.0000e-01  -5.0000e-01
   5.0000e-01   5.0000e-01  -5.0000e-01  -5.0000e-01
   5.0000e-01  -5.0000e-01  -5.0000e-01   5.0000e-01
Sigma =
     1     0     0     0
     0     1     0     0
     0     0    -1     0
     0     0     0    -1
>> Q = exc(A,p), Q'*Sigma*Q
Q =
     1     1    -1     0
     1    -1     0    -1
     1     0    -1    -1
     0     1    -1     1
ans =
     1     0     0     0
     0     1     0     0
     0     0    -1     0
     0     0     0    -1

The code uses the function

function X = exc(A,p)
%EXC     Exchange operator.
%   EXC(A,p) is the result of applying the exchange operator to 
%   the square matrix A, which is regarded as a block 2-by-2 
%   matrix with leading block of dimension p.  
%   p defaults to floor(n/2).

[m,n] = size(A);
if m ~= n, error('Matrix must be square.'), end
if nargin < 2, p = floor(n/2); end

A11 = A(1:p,1:p);
A12 = A(1:p,p+1:n);
A21 = A(p+1:n,1:p);
A22 = A(p+1:n,p+1:n);

X21 = A11\A12;
X = [inv(A11) -X21;
     A21/A11  A22-A21*X21];

Hyperbolic CS Decomposition

For an orthogonal matrix expressed in block $2\times 2$ form there is a close relationship between the singular value decompositions (SVDs) of the blocks, as revealed by the CS decomposition (see What Is the CS Decomposition?). An analogous decomposition holds for a pseudo-orthogonal matrix. Let $Q\in\mathbb{R}^{n \times n}$ be pseudo-orthogonal with respect to $\Sigma$ in (5), and suppose that $Q$ is partitioned as

$\notag Q = \begin{array}[b]{@{\mskip33mu}c@{\mskip-16mu}c@{\mskip-10mu}c@{}} \scriptstyle p & \scriptstyle n-p & \\ \multicolumn{2}{c}{ \left[\begin{array}{c@{~}c@{~}} Q_{11}& Q_{12} \\ Q_{21}& Q_{22} \\ \end{array}\right]} & \mskip-12mu\ \begin{array}{c} \scriptstyle p \\ \scriptstyle n-p \end{array} \end{array}, \quad p \le \displaystyle\frac{n}{2}.$

Then there exist orthogonal matrices $U_1,V_1\in\mathbb{R}^{p \times p}$ and $U_2,V_2\in\mathbb{R}^{q \times q}$ such that

$\notag \begin{bmatrix} U_1^T & 0\\ 0 & U_2^T \end{bmatrix} \begin{bmatrix} Q_{11} & Q_{12}\\ Q_{21} & Q_{22} \end{bmatrix} \begin{bmatrix} V_1 & 0\\ 0 & V_2 \end{bmatrix} = \begin{array}[b]{@{\mskip35mu}c@{\mskip30mu}c@{\mskip-10mu}c@{}c} \scriptstyle p & \scriptstyle p & \scriptstyle n-2p & \\ \multicolumn{3}{c}{ \left[\begin{array}{c@{~}|c@{~}c} C & -S & 0 \\ \hline -S & C & 0 \\ 0 & 0 & I_{n-2p} \end{array}\right]} & \mskip-12mu \begin{array}{c} \scriptstyle p \\ \scriptstyle p \\ \scriptstyle n-2p \end{array} \end{array}, \qquad (11)$

where $C = \mathrm{diag}(c_i)$ , $S = \mathrm{diag}(s_i)$ , and $C^2 - S^2 = I$ , with $c_i > s_i \ge 0$ for all $i$ . This is the hyperbolic CS decomposition, and it can be proved by applying the CS decomposition of an orthogonal matrix to $\mathrm{exc}(Q)$ .

The leading principal submatrix $\left[\begin{smallmatrix}C & -S \\ -S & C \end{smallmatrix}\right]$ in (11) generalizes the $2\times 2$ matrix (4), and in fact it can be permuted into a direct sum of such matrices.

Note that the matrix on the right in (11) is symmetric positive definite. Therefore the singular values of $Q$ are the eigenvalues of that matrix, namely

$\notag c_1 \pm s_1, \dots, c_p \pm s_p; \quad 1~\mathrm{with~multiplicity~}n - 2p.$

Since $c_i^2 - s_i^2 = 1$ for all $i$ , the first $2p$ singular values occur in reciprocal pairs, hence the largest and smallest singular values satisfy $\sigma_1 = \sigma_n^{-1}\ge 1$ (with equality if and only if $S = 0$ ). This gives another proof of (3).
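The singular value structure can be illustrated with the matrix on the right-hand side of (11) itself. In this MATLAB sketch the values of $s_i$ are arbitrary, and we take $p = 2$ and $n = 5$ .

```matlab
% Singular values of the hyperbolic CS form with p = 2, n = 5.
s = [0.5; 2];  c = sqrt(1 + s.^2);       % c_i^2 - s_i^2 = 1 (values are assumptions)
C = diag(c);  S = diag(s);
Q = blkdiag([C -S; -S C], 1);            % right-hand side of (11)
Sigma = blkdiag(eye(2), -eye(3));
res = norm(Q'*Sigma*Q - Sigma);          % Q is pseudo-orthogonal
sv = svd(Q);
expected = sort([c+s; c-s; 1],'descend');
fprintf('residual = %.1e, sigma_1*sigma_n = %.12f\n', res, sv(1)*sv(end))
```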

Numerical Stability

While an orthogonal matrix is perfectly conditioned, a pseudo-orthogonal matrix can be arbitrarily ill conditioned, as follows from (3). For example, the MATLAB function gallery('randjorth') produces a random pseudo-orthogonal matrix with a default condition number of sqrt(1/eps).

>> rng(1); A = gallery('randjorth',2,2) % p = 2, n = 4
A =
   2.9984e+03  -4.2059e+02   1.5672e+03  -2.5907e+03
   1.9341e+03  -2.6055e+03   3.1565e+03  -7.5210e+02
   3.1441e+03  -6.2852e+02   1.8157e+03  -2.6427e+03
   1.6870e+03  -2.5633e+03   3.0204e+03  -5.4157e+02
>> cond(A)
ans =
   6.7109e+07

This means that algorithms that use pseudo-orthogonal matrices are potentially numerically unstable. Therefore algorithms need to be carefully constructed and rounding error analysis must be done to ensure that an appropriate form of numerical stability is obtained.

Notes

Pseudo-orthogonal matrices form the automorphism group of the scalar product defined by $\langle x,y\rangle = x^T\Sigma y$ for $x,y\in\mathbb{R}^n$ . More results for pseudo-orthogonal matrices can be obtained as special cases of results for automorphism groups of general scalar products. See, for example, Mackey, Mackey, and Tisseur (2006).

For $\Sigma \ne \pm I$ the set of pseudo-orthogonal matrices is known to have four connected components, a topological property that can be proved using the hyperbolic CS decomposition (Motlaghian, Armandnejad, and Hall, 2018).

One can define pseudo-unitary matrices in an analogous way, as $Q\in\mathbb{C}^{n\times n}$ such that $Q^*\Sigma Q = \Sigma$ . These correspond to the automorphism group of the scalar product $\langle x,y\rangle = x^*\Sigma y$ for $x,y\in\mathbb{C}^n$ . The results we have discussed generalize in a straightforward way to pseudo-unitary matrices.

The exchange operator is also known as the principal pivot transform and as the sweep operator in statistics. Tsatsomeros (2000) gives a survey of its properties.

The hyperbolic CS decomposition was derived by Lee (1948) and, according to Lee, was present in work of Autonne (1912).

References

This is a minimal set of references, which contain further useful references within.

Adam Bojanczyk, Nicholas J. Higham and Harikrishna Patel, Solving the Indefinite Least Squares Problem by Hyperbolic QR Factorization, SIAM J. Matrix Anal. Appl. 24(3), 914–931, 2003.
Nicholas J. Higham, $J$ -Orthogonal Matrices: Properties and Generation, SIAM Rev. 45(3), 504–519, 2003.
H. C. Lee, Canonical Factorization of Pseudo-Unitary Matrices, Proc. London Math. Soc. s2-50, 230–241, 1948.
D. Steven Mackey, Niloufer Mackey, and Françoise Tisseur, Structured Factorizations in Scalar Product Spaces, SIAM J. Matrix Anal. Appl. 27(3), 821–850, 2006.
Sara M. Motlaghian, Ali Armandnejad, and Frank J. Hall, Topological Properties of $J$ -Orthogonal Matrices, Linear and Multilinear Algebra 66(12), 2524–2533, 2018.
Michael Stewart and G. W. Stewart, On Hyperbolic Triangularization: Stability and Pivoting, SIAM J. Matrix Anal. Appl. 19(4), 847–860, 1998.
Michael J. Tsatsomeros, Principal Pivot Transforms: Properties and Applications, Linear Algebra Appl. 307, 151–165, 2000.

What Is an LU Factorization?

An LU factorization of an $n\times n$ matrix $A$ is a factorization $A = LU$ , where $L$ is unit lower triangular and $U$ is upper triangular. “Unit” means that $L$ has ones on the diagonal. Example:

$\notag \left[\begin{array}{rrrr} 3 & -1 & 1 & 1\\ -1 & 3 & 1 & -1\\ -1 & -1 & 3 & 1\\ 1 & 1 & 1 & 3 \end{array}\right] = \left[\begin{array}{rrrr} 1 & 0 & 0 & 0\\ -\frac{1}{3} & 1 & 0 & 0\\ -\frac{1}{3} & -\frac{1}{2} & 1 & 0\\ \frac{1}{3} & \frac{1}{2} & 0 & 1 \end{array}\right] \left[\begin{array}{rrrr} 3 & -1 & 1 & 1\\ 0 & \frac{8}{3} & \frac{4}{3} & -\frac{2}{3}\\ 0 & 0 & 4 & 1\\ 0 & 0 & 0 & 3 \end{array}\right]. \qquad (1)$

An LU factorization simplifies the solution of many problems associated with linear systems. In particular, solving a linear system $Ax = b$ reduces to solving the triangular systems $Ly = b$ and $Ux = y$ , since then $b = L(Ux)$ .
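Using the factors in (1), the two triangular solves look like this in MATLAB. The right-hand side is an arbitrary choice, made so that the solution is the vector of ones.

```matlab
% Solve Ax = b with the LU factors from (1).
A = [3 -1 1 1; -1 3 1 -1; -1 -1 3 1; 1 1 1 3];
L = [1 0 0 0; -1/3 1 0 0; -1/3 -1/2 1 0; 1/3 1/2 0 1];
U = [3 -1 1 1; 0 8/3 4/3 -2/3; 0 0 4 1; 0 0 0 3];
b = [4; 2; 2; 6];                        % equals A*ones(4,1) (an assumption)
y = L\b;                                 % forward substitution: Ly = b
x = U\y;                                 % back substitution:    Ux = y
fprintf('residual = %.1e\n', norm(A*x - b))
```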

For a given $A$ , an LU factorization may or may not exist, and if it does exist it may not be unique. Conditions for existence and uniqueness are given in the following result (see Higham, 2002, Thm. 9.1 for a proof). Denote by $A_k = A(1\colon k,1\colon k)$ the leading principal submatrix of $A$ of order $k$ .

Theorem 1. The matrix $A\in\mathbb{R}^{n\times n}$ has a unique LU factorization if and only if $A_k$ is nonsingular for $k=1\colon n-1$ . If $A_k$ is singular for some $1\le k \le n-1$ then the factorization may exist, but if so it is not unique.

Note that the (non)singularity of $A$ plays no role in Theorem 1. However, if $A$ is nonsingular and has an LU factorization then the factorization is unique. Indeed if $A$ has LU factorizations $A = L_1U_1 = L_2U_2$ then the $U_i$ are necessarily nonsingular and so $L_2^{-1}L_1 = U_2U_1^{-1}$ . The left side of this equation is unit lower triangular and the right side is upper triangular; therefore both sides must equal the identity matrix, which means that $L_1 = L_2$ and $U_1 = U_2$ , as required.
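Two $2\times 2$ examples, chosen here for illustration, show what can happen when $A_1$ is singular. For

$\notag A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

no LU factorization exists, because equating the $(1,1)$ and $(2,1)$ elements in $A = LU$ gives $u_{11} = 0$ and $\ell_{21}u_{11} = 1$ , which is impossible. On the other hand,

$\notag \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \ell_{21} & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 0 & 1 - \ell_{21} \end{bmatrix}$

holds for every $\ell_{21}$ , so an LU factorization exists but is not unique. The second matrix is singular, consistent with the uniqueness property for nonsingular matrices.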

Equating leading principal submatrices in $A = LU$ gives $A_k = L_k U_k$ , which implies that $\det(A_k) = \det(U_k) = u_{11} u_{22} \dots u_{kk}$ . Hence $u_{kk} = \det(A_k)/\det(A_{k-1})$ . In fact, such determinantal formulas hold for all the elements of $L$ and $U$ :

$\notag \begin{aligned} \ell_{ij} &= \frac{ \det\bigl( A( [1:j-1, \, i], 1:j ) \bigr) }{ \det( A_j ) }, \quad i > j, \\ u_{ij} &= \frac{ \det\bigl( A( 1:i, [1:i-1, \, j] ) \bigr) } { \det( A_{i-1} ) }, \quad i \le j. \end{aligned}$

Here, $A(u,v)$ , where $u$ and $v$ are vectors of subscripts, denotes the submatrix formed from the intersection of the rows indexed by $u$ and the columns indexed by $v$ .
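The determinantal formulas can be checked against the factors in (1). This MATLAB sketch computes the diagonal of $U$ from ratios of leading principal minors and evaluates one off-diagonal element of each factor; it is meant only as a check, since determinants are not a practical way to compute the factorization.

```matlab
% Check the determinantal formulas for the matrix in (1).
A = [3 -1 1 1; -1 3 1 -1; -1 -1 3 1; 1 1 1 3];
n = size(A,1);
d = ones(n+1,1);                         % d(k+1) = det(A_k), with det(A_0) = 1
for k = 1:n
    d(k+1) = det(A(1:k,1:k));
end
u_diag = d(2:end)./d(1:end-1);           % u_kk = det(A_k)/det(A_{k-1})
l32 = det(A([1 3],1:2))/det(A(1:2,1:2)); % l_ij with i = 3, j = 2
u23 = det(A(1:2,[1 3]))/det(A(1,1));     % u_ij with i = 2, j = 3
fprintf('u_kk: %s\n', mat2str(u_diag',5))
fprintf('l_32 = %.4f, u_23 = %.4f\n', l32, u23)
```

The results agree with (1): the diagonal of $U$ is $3$ , $8/3$ , $4$ , $3$ , and $\ell_{32} = -1/2$ , $u_{23} = 4/3$ .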

Relation with Gaussian Elimination

LU factorization is intimately connected with Gaussian elimination. Recall that Gaussian elimination transforms a matrix $A^{(1)} = A\in\mathbb{R}^{n\times n}$ to upper triangular form $U = A^{(n)}$ in $n-1$ stages. At the $k$ th stage, multiples of row $k$ are added to the later rows to eliminate the elements below the diagonal in column $k$ , using the formulas

$\notag a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik} a_{kj}^{(k)}, \quad i = k+1 \colon n, \; j = k+1 \colon n,$

where the quantities $m_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)}$ are the multipliers. Of course each $a_{kk}^{(k)}$ must be nonzero for these formulas to be defined, and this is connected with the conditions of Theorem 1, since $u_{kk} = a_{kk}^{(k)}$ . The final $U$ is the upper triangular LU factor, with $u_{ij} = a_{ij}^{(i)}$ for $j\ge i$ , and $\ell_{ij} = m_{ij}$ for $i > j$ , that is, the multipliers make up the $L$ factor (for a proof of these properties see any textbook on numerical linear algebra).
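The correspondence between the multipliers and $L$ can be checked on a small example. The following Python sketch is illustrative only: the function names `lu_gauss` and `matmul` are our own, no pivoting is done (so nonzero pivots are assumed), and exact rational arithmetic is used so that the comparison $LU = A$ is free of rounding errors.

```python
from fractions import Fraction

def lu_gauss(A):
    """LU factorization by Gaussian elimination without pivoting.

    Returns (L, U), where L is unit lower triangular and holds the
    multipliers m_ik, and U is the final reduced matrix A^(n).
    A zero pivot raises ZeroDivisionError.
    """
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]      # working copy A^(k)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n - 1):                              # stage k
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]                       # multiplier m_ik
            L[i][k] = m
            U[i][k] = Fraction(0)                       # eliminated entry
            for j in range(k + 1, n):
                U[i][j] -= m * U[k][j]
    return L, U

def matmul(X, Y):
    """Plain triple-loop matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
L, U = lu_gauss(A)
print(matmul(L, U) == A)
```

For this $A$ the multipliers are $m_{21} = 2$, $m_{31} = 4$, and $m_{32} = 3$, and the pivots are $2$, $1$, and $2$.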

The matrix factorization viewpoint is well established as a powerful paradigm for thinking and computing. Separating the computation of LU factorization from its application is beneficial. For example, given $A = LU$ we saw above how to solve $Ax = b$ . If we need to solve for another right-hand side $b_2$ we can just solve $Ly_2 = b_2$ and $Ux_2 = y_2$ , re-using the LU factorization. Similarly, solving $A^Tz = c$ reduces to solving the triangular systems $U^T w = c$ and $L^Tz = w$ .
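As a sketch of this re-use, the following Python fragment solves two systems with the same factors and then a transposed system. The helper names (`forward_sub`, `back_sub`, `transpose`) are our own, and the factors are those of the illustrative matrix $A = [2~1~1;\, 4~3~3;\, 8~7~9]$, which is not taken from the article.

```python
from fractions import Fraction

def forward_sub(L, b):
    """Solve L y = b for lower triangular L (the diagonal need not be 1)."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (Fraction(b[i]) - s) / L[i][i]
    return y

def back_sub(U, y):
    """Solve U x = y for upper triangular U."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(y[i]) - s) / U[i][i]
    return x

def transpose(M):
    return [list(col) for col in zip(*M)]

# LU factors of A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]], computed once.
L = [[1, 0, 0], [2, 1, 0], [4, 3, 1]]
U = [[2, 1, 1], [0, 1, 1], [0, 0, 2]]

# Two right-hand sides share the same factors.
x1 = back_sub(U, forward_sub(L, [4, 10, 24]))
x2 = back_sub(U, forward_sub(L, [1, 1, 1]))

# A^T z = c: solve U^T w = c (lower triangular), then L^T z = w (upper).
w = forward_sub(transpose(U), [14, 11, 13])
z = back_sub(transpose(L), w)
```

Each additional right-hand side costs only two triangular solves, with no further factorization.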

Computation

An LU factorization can be computed by directly solving for the components of $L$ and $U$ in the equation $A = LU$ . Indeed because $L$ has unit diagonal the first row of $U$ is the same as the first row of $A$ , and $a_{k1} = \ell_{k1} u_{11} = \ell_{k1} a_{11}$ then determines the first column of $L$ . One can go on to determine the $k$ th row of $U$ and the $k$ th column of $L$ , for $k = 2\colon n$ . This leads to the Doolittle method, which involves inner products of partial rows of $L$ and partial columns of $U$ .
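A minimal Python sketch of the Doolittle method follows. The function name `doolittle` is our own, no pivoting is done, and exact rational arithmetic is assumed for clarity; the inner products of partial rows of $L$ with partial columns of $U$ appear as the two `sum` expressions.

```python
from fractions import Fraction

def doolittle(A):
    """Doolittle LU: for each k, compute row k of U, then column k of L."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n):
        # Row k of U: u_kj = a_kj - sum_{s<k} l_ks u_sj, for j >= k.
        for j in range(k, n):
            U[k][j] = Fraction(A[k][j]) - sum(L[k][s] * U[s][j] for s in range(k))
        # Column k of L: l_ik = (a_ik - sum_{s<k} l_is u_sk) / u_kk, for i > k.
        for i in range(k + 1, n):
            r = Fraction(A[i][k]) - sum(L[i][s] * U[s][k] for s in range(k))
            L[i][k] = r / U[k][k]
    return L, U

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
L, U = doolittle(A)
```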

Given the equivalence between LU factorization and Gaussian elimination we can also employ the Gaussian elimination equations:

$\notag \begin{array}{l} \%~kji~\mathrm{form~of~LU~factorization.}\\ \mbox{for}~k=1:n-1 \\ \qquad \mbox{for}~ j=k+1:n \\ \qquad \qquad \mbox{for}~ i=k+1:n \\ \qquad\qquad\qquad a_{ij}^{(k+1)} = a_{ij}^{(k)} - a_{ik}^{(k)}a_{kj}^{(k)} / a_{kk}^{(k)}\\ \qquad\qquad\mbox{end}\\ \qquad\mbox{end}\\ \mbox{end}\\ \end{array}$

This $kji$ ordering of the loops in the factorization is the basis of early Fortran implementations of LU factorization, such as that in LINPACK. The inner loop travels down the columns of $A$ , accessing contiguous elements of $A$ since Fortran stores arrays by column. Interchanging the two inner loops gives the $kij$ ordering, which updates the matrix a row at a time, and is appropriate for a language such as C that stores arrays by row.

The $ijk$ and $jik$ orderings correspond to the Doolittle method. The last two of the $3! = 6$ orderings are the $ikj$ and $jki$ orderings, to which we will return later.
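To make the loop orderings concrete, here is a Python sketch of the $kji$, $kij$, and $jki$ forms. Each overwrites a copy of $A$ with the multipliers below the diagonal and $U$ on and above it. The function names are our own and no pivoting is done; this is an illustration rather than a tuned implementation.

```python
from fractions import Fraction

def lu_kji(A):
    """kji ordering: the innermost loop runs down column j."""
    n = len(A)
    a = [[Fraction(x) for x in row] for row in A]
    for k in range(n - 1):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]               # store multiplier m_ik
        for j in range(k + 1, n):
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a

def lu_kij(A):
    """kij ordering: the innermost loop runs along row i."""
    n = len(A)
    a = [[Fraction(x) for x in row] for row in A]
    for k in range(n - 1):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a

def lu_jki(A):
    """jki ordering: column j is updated using all earlier columns."""
    n = len(A)
    a = [[Fraction(x) for x in row] for row in A]
    for j in range(n):
        for k in range(j):
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
        for i in range(j + 1, n):
            a[i][j] /= a[j][j]               # form multipliers in column j
    return a

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
print(lu_kji(A) == lu_kij(A) == lu_jki(A))
```

In all three functions each element $a_{ij}$ receives the same sequence of updates, for $k = 1, 2, \dots$ in increasing order; only the order in which different elements are visited changes.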

Schur Complements

For $A\in\mathbb{R}^{n\times n}$ with $\alpha = a_{11} \ne 0$ we can write

$\notag A = \begin{bmatrix} \alpha & b^T \\ c & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ c/\alpha & I_{n-1} \end{bmatrix} \begin{bmatrix} \alpha & b^T \\ 0 & D - cb^T/\alpha \end{bmatrix} = : L_1U_1. \qquad (2)$

The $(n-1)\times (n-1)$ matrix $S = D - cb^T/\alpha$ is called the Schur complement of $\alpha$ in $A$ .

The first row and column of $L_1$ and $U_1$ have the correct forms for a unit lower triangular matrix and an upper triangular matrix, respectively. If we can find an LU factorization $S = L_2U_2$ then

$\notag A = \begin{bmatrix} 1 & 0 \\ c/\alpha & L_2 \end{bmatrix} \begin{bmatrix} \alpha & b^T \\ 0 & U_2 \end{bmatrix}$

is an LU factorization of $A$ . Note that this is simply another way to express the $kji$ algorithm above.
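The recursion implied by (2) can be written down almost verbatim. The Python sketch below (the function name `lu_schur` is our own; nonzero pivots and exact rational arithmetic are assumed) forms the Schur complement, factorizes it recursively, and assembles $L$ and $U$ as in the last displayed equation.

```python
from fractions import Fraction

def lu_schur(A):
    """Recursive LU via (2): peel off the first row and column, recurse on S."""
    n = len(A)
    alpha = Fraction(A[0][0])
    if n == 1:
        return [[Fraction(1)]], [[alpha]]
    b = [Fraction(x) for x in A[0][1:]]              # first row, right of alpha
    c = [Fraction(A[i][0]) for i in range(1, n)]     # first column, below alpha
    # Schur complement S = D - c b^T / alpha.
    S = [[Fraction(A[i + 1][j + 1]) - c[i] * b[j] / alpha
          for j in range(n - 1)] for i in range(n - 1)]
    L2, U2 = lu_schur(S)
    L = [[Fraction(1)] + [Fraction(0)] * (n - 1)]
    L += [[c[i] / alpha] + L2[i] for i in range(n - 1)]
    U = [[alpha] + b]
    U += [[Fraction(0)] + U2[i] for i in range(n - 1)]
    return L, U

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
L, U = lu_schur(A)
```

A zero $\alpha$ at any level of the recursion (for a matrix of order greater than one) raises `ZeroDivisionError`, reflecting the breakdown of the factorization.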

For several matrix structures it is immediate that $\alpha \ne 0$ . If we can show that the Schur complement inherits the same structure then it follows by induction that we can compute the factorization for $S$ , and so an LU factorization of $A$ exists. Classes of matrix for which $a_{11} \ne 0$ and $A$ being in the class implies the Schur complement $S$ is also in the class include

symmetric positive definite matrices,
$M$ -matrices,
matrices (block) diagonally dominant by rows or columns.

(The proofs of these properties are nontrivial.) Note that the matrix (1) is row diagonally dominant, as is its $U$ factor, as must be the case since its rows are contained in Schur complements.

We say that $A$ has upper bandwidth $q$ if $a_{ij} = 0$ for $j>i+q$ and lower bandwidth $p$ if $a_{ij} = 0$ for $i>j+p$ . Another use of (2) is to show that $L$ and $U$ inherit the bandwidths of $A$ .

Theorem 2. Let $A\in\mathbb{R}^{n\times n}$ have lower bandwidth $p$ and upper bandwidth $q$ . If $A$ has an LU factorization then $L$ has lower bandwidth $p$ and $U$ has upper bandwidth $q$ .

Proof. In (2), the first column of $L_1$ and the first row of $U_1$ have the required structure and $S$ has upper bandwidth $q$ and lower bandwidth $p$ , since $c$ and $b$ have at most $p$ and $q$ nonzero components, respectively, all in their leading positions, so that $cb^T$ is nonzero only in its leading $p\times q$ block. The result follows by induction.

Block Implementations

In order to achieve high performance on modern computers with their hierarchical memories, LU factorization is implemented in a block form expressed in terms of matrix multiplication and the solution of multiple right-hand side triangular systems. We describe two block forms of LU factorization. First, consider a block form of (2) with block size $p$ , where $A_{11}$ is $p \times p$ :

$\notag A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 \\ L_{21}& I_{n-p} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ 0 & S \end{bmatrix}.$

Here, $S$ is the Schur complement of $A_{11}$ in $A$ , given by $S = A_{22} - A_{21}A_{11}^{-1}A_{12}$ . This leads to the following algorithm:

1. Factor $A_{11} = L_{11}U_{11}$ .

2. Solve $L_{11}U_{12} = A_{12}$ for $U_{12}$ .

3. Solve $L_{21}U_{11} = A_{21}$ for $L_{21}$ .

4. Form $S = A_{22}-L_{21}U_{12}$ .

5. Repeat steps 1–4 on $S$ to obtain $S = L_{22}U_{22}$ .

The factorization on step 1 can be done by any form of LU factorization. This algorithm is known as a right-looking algorithm, since it accesses data to the right of the block being worked on (in particular, at each stage steps 2 and 4 access the last few columns of the matrix).
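The right-looking algorithm can be sketched in Python as follows. This is a mathematical illustration only: the names `lu_unblocked` and `block_lu_right` are our own, exact rational arithmetic is assumed, and the triangular solves and the Schur complement update are written as explicit loops rather than as calls to optimized matrix kernels, which is where a real implementation gains its speed.

```python
from fractions import Fraction

def lu_unblocked(A):
    """Gaussian elimination without pivoting; returns (L, U)."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def block_lu_right(A, p):
    """Right-looking block LU with block size p (no pivoting)."""
    n = len(A)
    a = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(0)] * n for _ in range(n)]
    for s in range(0, n, p):
        e = min(s + p, n)
        # Step 1: factor the diagonal block A11 = L11 U11.
        L11, U11 = lu_unblocked([row[s:e] for row in a[s:e]])
        for i in range(s, e):
            for j in range(s, e):
                L[i][j] = L11[i - s][j - s]
                U[i][j] = U11[i - s][j - s]
        # Step 2: solve L11 U12 = A12 by forward substitution on each column.
        for j in range(e, n):
            for i in range(s, e):
                U[i][j] = a[i][j] - sum(L[i][k] * U[k][j] for k in range(s, i))
        # Step 3: solve L21 U11 = A21, one row of L21 at a time.
        for i in range(e, n):
            for j in range(s, e):
                r = a[i][j] - sum(L[i][k] * U[k][j] for k in range(s, j))
                L[i][j] = r / U[j][j]
        # Step 4: Schur complement update S = A22 - L21 U12.
        for i in range(e, n):
            for j in range(e, n):
                a[i][j] -= sum(L[i][k] * U[k][j] for k in range(s, e))
    return L, U

A = [[1, 2, 1, 3], [2, 5, 4, 7], [1, 5, 9, 7], [2, 5, 8, 12]]
print(all(block_lu_right(A, p) == lu_unblocked(A) for p in (1, 2, 3, 4)))
```

The loop over `s` plays the role of step 5: after the update in step 4, the trailing part of `a` holds the Schur complement, on which the same four steps are repeated.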

An alternative algorithm can be derived by considering a block $3\times 3$ partitioning, in which we assume we have already computed the first block column of $L$ and $U$ :

$\notag A = \begin{bmatrix} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 & 0 \\ L_{21} & L_{22}& 0 \\ L_{31} & L_{32} & I \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} & \times \\ 0 & U_{22} & \times \\ 0 & 0 & \times \end{bmatrix}.$

We now compute the middle block column of $L$ and $U$ , comprising $p$ columns, by

1. Solve $L_{11}U_{12} = A_{12}$ for $U_{12}$ .

2. Factor $A_{22}-L_{21}U_{12} = L_{22}U_{22}$ .

3. Solve $L_{32}U_{22} = A_{32} - L_{31}U_{12}$ for $L_{32}$ .

4. Repartition so that the first two block columns become a single block column and repeat steps 1–4.

This algorithm corresponds to the $jki$ ordering. Note that the Schur complement is updated only a block column at a time. Because the algorithm accesses data only to the left of the block column being worked on, it is known as a left-looking algorithm.

Our description of these block algorithms emphasizes the mathematical ideas. The implementation details, especially for the left-looking algorithm, are not trivial. The optimal choice of block size $p$ will depend on the machine, but $p$ is typically in the range $64$ to $512$ .

An important point is that all these different forms of LU factorization, no matter which $ijk$ ordering or which value of $p$ , carry out the same operations. The only difference is the order in which the operations are performed (and the order in which the data is accessed). Even the rounding errors are the same for all versions (assuming the use of “plain vanilla” floating-point arithmetic).

Rectangular Matrices

Although it is most commonly used for square matrices, LU factorization is defined for rectangular matrices, too. If $A\in\mathbb{R}^{m\times n}$ then the factorization has the form $A = LU$ with $L\in\mathbb{R}^{m\times m}$ lower triangular and $U\in\mathbb{R}^{m\times n}$ upper trapezoidal. The conditions for existence and uniqueness of an LU factorization of $A$ are the same as those for $A(1\colon p, 1\colon p)$ , where $p = \min(m,n)$ .

Block LU Factorization

Another form of LU factorization relaxes the structure of $L$ and $U$ from triangular to block triangular, with $L$ having identity matrices on the diagonal:

$\notag L = \begin{bmatrix} I & & & \\ L_{21} & I & & \\ \vdots & & \ddots & \\ L_{m1} & \dots & L_{m,m-1} & I \end{bmatrix}, \quad U = \begin{bmatrix} U_{11} & U_{12} & \dots & U_{1m} \\ & U_{22} & & \vdots \\ & & \ddots & U_{m-1,m} \\ & & & U_{mm} \end{bmatrix}.$

Note that $U$ is not, in general, upper triangular.

An example of a block LU factorization is

$\notag A = \left[ \begin{array}{rr|rr} 0 & 1 & 1 & 1 \\ -1 & 1 & 1 & 1 \\\hline -2 & 3 & 4 & 2 \\ -1 & 2 & 1 & 3 \\ \end{array} \right] = \left[ \begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\\hline 1 & 2 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ \end{array} \right] \left[ \begin{array}{rr|rr} 0 & 1 & 1 & 1 \\ -1 & 1 & 1 & 1 \\\hline 0 & 0 & 1 & -1 \\ 0 & 0 & -1 & 1 \\ \end{array} \right].$

LU factorization fails on $A$ because of the zero $(1,1)$ pivot. This block LU factorization corresponds to using the leading $2\times 2$ principal submatrix of $A$ to eliminate the elements in the $(3\colon 4,1\colon 2)$ submatrix. In the context of a linear system $Ax=b$ , we have effectively solved for the variables $x_1$ and $x_2$ in terms of $x_3$ and $x_4$ and then substituted for $x_1$ and $x_2$ in the last two equations.
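The example can be reproduced in a few lines of Python. In this sketch (helper names `matmul` and `inv2` are our own) the block multiplier is $L_{21} = A_{21}A_{11}^{-1}$ and the trailing block of $U$ is the Schur complement $S = A_{22} - L_{21}A_{12}$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain triple-loop matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[0, 1, 1, 1], [-1, 1, 1, 1], [-2, 3, 4, 2], [-1, 2, 1, 3]]
A11 = [row[:2] for row in A[:2]]
A12 = [row[2:] for row in A[:2]]
A21 = [row[:2] for row in A[2:]]
A22 = [row[2:] for row in A[2:]]

L21 = matmul(A21, inv2(A11))                  # block multiplier
L21_A12 = matmul(L21, A12)
S = [[A22[i][j] - L21_A12[i][j] for j in range(2)] for i in range(2)]

# Assemble the block factors and confirm that their product is A.
L = [[1, 0, 0, 0], [0, 1, 0, 0], L21[0] + [1, 0], L21[1] + [0, 1]]
U = [A11[0] + A12[0], A11[1] + A12[1], [0, 0] + S[0], [0, 0] + S[1]]
print(matmul(L, U) == A)
```

Note that $S$ is singular here, which does not prevent the block factorization from existing.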

Conditions for the existence of a block LU factorization are analogous to, but less stringent than, those for LU factorization in Theorem 1.

Theorem 3. The matrix $A\in\mathbb{R}^{n\times n}$ has a unique block LU factorization if and only if the first $m-1$ leading principal block submatrices of $A$ are nonsingular.

The conditions in Theorem 3 can be shown to be satisfied if $A$ is block diagonally dominant by rows or columns.

Note that to solve a linear system $Ax = b$ using a block LU factorization we need to solve $Ly = b$ and $Ux = y$ , but the latter system is not triangular and requires the solution of systems $U_{ii}x_i = y_i$ involving the diagonal blocks of $U$ , which would normally be done by (standard) LU factorization.

Sensitivity

If $A$ has a unique LU factorization then for a small enough perturbation $\Delta A$ an LU factorization $A + \Delta A = (L + \Delta L)(U + \Delta U)$ exists. To first order, this equation is $\Delta A = \Delta L U + L \Delta U$ , which gives

$\notag L^{-1}\Delta A \mskip2mu U^{-1} = L^{-1}\Delta L + \Delta U \mskip2mu U^{-1}.$

Since $\Delta L$ is strictly lower triangular and $\Delta U$ is upper triangular, we have, to first order,

$\notag \Delta L = L \mskip 1mu \mathrm{tril}( L^{-1}\Delta A U^{-1} ), \quad \Delta U = \mathrm{triu}( L^{-1}\Delta A U^{-1} )U,$

where $\mathrm{tril}$ denotes the strictly lower triangular part and $\mathrm{triu}$ the upper triangular part (including the diagonal). Clearly, the sensitivity of the LU factors depends on the inverses of $L$ and $U$ . However, in most situations, such as when we are solving a linear system $Ax = b$ , it is the backward stability of the LU factors, not their sensitivity, that is relevant.

Pivoting and Numerical Stability

Since not all matrices have an LU factorization, we need the option of applying row and column interchanges to ensure that the pivots are nonzero unless the column in question is already in triangular form.

In finite precision computation it is important that computed LU factors $\widehat L$ and $\widehat U$ are numerically stable in the sense that $\widehat L \widehat U = A + \Delta A$ with $\|\Delta A\|\le c_n u \|A\|$ , where $c_n$ is a constant and $u$ is the unit roundoff. For certain matrix properties, such as diagonal dominance by rows or columns, numerical stability is guaranteed, but in general it is necessary to incorporate row interchanges, or both row and column interchanges, in order to obtain a stable factorization.

See What Is the Growth Factor for Gaussian Elimination? for details of pivoting strategies and see Randsvd Matrices with Large Growth Factors for some recent research on growth factors.

References

This is a minimal set of references, which contain further useful references within.

Jack J. Dongarra, Iain S. Duff, Danny C. Sorensen, and Henk A. Van der Vorst, Numerical Linear Algebra for High-Performance Computers, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1998. (For different implementations of LU factorization.)
Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002.

What’s New in MATLAB R2021a?

In this post I discuss some of the new features in MATLAB R2021a. As usual in this series, I focus on a few of the features most relevant to my interests. See the release notes for a detailed list of the many changes in MATLAB and its toolboxes.

Name=Value Syntax

In function calls that accept “name, value” pairs, separated by a comma, the values can now be specified with an equals sign. Example:

x = linspace(0,2*pi,100); y = tan(x);

% Existing syntax
plot(x,y,'Color','red','LineWidth',2)
plot(x,y,"Color","red","LineWidth",2)

% New syntax
plot(x,y,Color = "red",LineWidth = 2)
lw = 2; plot(x,y,Color = "red",LineWidth = lw)

Note that the string can be given as a character vector in single quotes or as a string array in double quotes (string arrays were introduced in R2016b).

There are some limitations, including that all name=value arguments must appear after any comma separated pairs and after any positional arguments (arguments that must be passed to a function in a specific order).

Eigensystem of Skew-Symmetric Matrix

For skew-symmetric and skew-Hermitian matrices, the eig function now guarantees that the matrix of eigenvectors is unitary (to machine precision) and that the computed eigenvalues are pure imaginary. The code

rng(2); n = 5; A = gallery('randsvd',n,-1e3,2); A = 1i*A; 
[V,D] = eig(A); 
unitary_test = norm(V'*V-eye(n),1)
norm_real_part = norm(real(D),1)

produces

% R2020b
unitary_test =
   9.6705e-01
norm_real_part =
   8.3267e-17

% R2021a
unitary_test =
   1.9498e-15
norm_real_part =
     0

For this matrix MATLAB R2020b produces an eigenvector matrix that is far from being unitary and eigenvalues with a nonzero (but tiny) real part, whereas MATLAB R2021a produces pure imaginary eigenvalues and eigenvectors that are unitary to machine precision.

Performance Improvements

Among the reported performance improvements are faster matrix multiplication for large sparse matrices (based on the use of GraphBLAS) and faster solution of multiple right-hand side systems with a sparse coefficient matrix, both resulting from added support for multithreading.

Symbolic Math Toolbox

An interesting addition to the Symbolic Math Toolbox is the symmatrix class, which represents a symbolic matrix. An example of usage is

>> A = symmatrix('A',[2 2]); B = symmatrix('B',[2 2]); whos A B
  Name      Size            Bytes  Class        Attributes

  A         2x2                 8  symmatrix              
  B         2x2                 8  symmatrix              

>> X = A*B, Y = symmatrix2sym(X), whos X Y
X =
A*B
Y =
[A1_1*B1_1 + A1_2*B2_1, A1_1*B1_2 + A1_2*B2_2]
[A2_1*B1_1 + A2_2*B2_1, A2_1*B1_2 + A2_2*B2_2]
  Name      Size            Bytes  Class        Attributes

  X         2x2                 8  symmatrix              
  Y         2x2                 8  sym

The range of functions that can be applied to a symmatrix is as follows:

>> methods symmatrix

Methods for class symmatrix:

adjoint         horzcat         mldivide        symmatrix       
cat             isempty         mpower          symmatrix2sym   
conj            isequal         mrdivide        tan             
cos             isequaln        mtimes          times           
ctranspose      kron            norm            trace           
det             latex           plus            transpose       
diff            ldivide         power           uminus          
disp            length          pretty          uplus           
display         log             rdivide         vertcat         
eq              matlabFunction  sin             
exp             minus           size            

Static methods:

empty

In order to invert A*B in this example, or find its eigenvalues, use inv(Y) or eig(Y).

Fifty “What Is” Articles

Last week I posted the fiftieth in my “What Is” series of articles. I began the series just over a year ago, in March 2020. The original aim was to provide “brief descriptions of important concepts in numerical analysis and related areas, with a focus on topics that arise in my research”, and the articles were meant to be short, widely accessible, and contain a minimum of mathematical symbols, equations, and citations. I have largely kept to these aims, though for some topics there is a lot to say and I have been more lengthy.

The articles are also available in PDF form on GitHub.

Below is a list of all the “What Is” articles published at the time of writing, in alphabetical order.

If there is a topic you would like me to cover, please put it in the comments below.

What is a Diagonally Dominant Matrix?

Matrices arising in applications often have diagonal elements that are large relative to the off-diagonal elements. In the context of a linear system this corresponds to relatively weak interactions between the different unknowns. We might expect a matrix with a large diagonal to be assured of certain properties, such as nonsingularity. However, to ensure nonsingularity it is not enough for each diagonal element to be the largest in its row. For example, the matrix

$\notag \left[\begin{array}{rrr} 3 & -1 & -2\\ -2 & 3 & -1\\ -2 & -1 & 3 \end{array}\right] \qquad (1)$

is singular because $[1~1~1]^T$ is a null vector. A useful definition of a matrix with large diagonal requires a stronger property.

A matrix $A\in\mathbb{C}^{n\times n}$ is diagonally dominant by rows if

$\notag |a_{ii}| \ge \displaystyle\sum_{j\ne i} |a_{ij}|, \quad i=1\colon n. \qquad (2)$

It is strictly diagonally dominant by rows if strict inequality holds in (2) for all $i$ . $A$ is (strictly) diagonally dominant by columns if $A^T$ is (strictly) diagonally dominant by rows.
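The definition is easy to test numerically. The Python sketch below (function names are our own) computes the row margins $|a_{ii}| - \sum_{j\ne i}|a_{ij}|$ for the matrix (1), which are all zero, so (1) is diagonally dominant by rows but not strictly so.

```python
def row_margins(A):
    """Return |a_ii| - sum_{j != i} |a_ij| for each row i."""
    n = len(A)
    return [abs(A[i][i]) - sum(abs(A[i][j]) for j in range(n) if j != i)
            for i in range(n)]

def is_diag_dominant_by_rows(A, strict=False):
    """Check (2), with strict inequality in every row if strict=True."""
    if strict:
        return all(m > 0 for m in row_margins(A))
    return all(m >= 0 for m in row_margins(A))

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[3, -1, -2], [-2, 3, -1], [-2, -1, 3]]     # the matrix (1)
print(row_margins(A))
print(is_diag_dominant_by_rows(A), is_diag_dominant_by_rows(A, strict=True))

# The vector of ones is a null vector, confirming that (1) is singular.
print([sum(row) for row in A])
```

Dominance by columns can be tested by applying the same functions to `transpose(A)`.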

Diagonal dominance on its own is not enough to ensure nonsingularity, as the matrix (1) shows. Strict diagonal dominance does imply nonsingularity, however.

Theorem 1.

If $A\in\mathbb{C}^{n\times n}$ is strictly diagonally dominant by rows or columns then it is nonsingular.

Proof. Since $A$ is nonsingular if and only if $A^T$ is nonsingular, it suffices to consider diagonal dominance by rows. For any nonzero $x$ let $y = Ax$ and choose $k$ so that $|x_k| = \|x\|_{\infty}$ . Then the $k$ th equation of $y = Ax$ can be written

$\notag a_{kk}x_k = y_k - \displaystyle\sum_{j\ne k} a_{kj}x_j,$

which gives

$\notag |a_{kk}|\|x\|_{\infty} = |a_{kk}||x_k| \le |y_k| + \displaystyle\sum_{j\ne k} |a_{kj}||x_j| \le |y_k| + \|x\|_\infty \displaystyle\sum_{j\ne k} |a_{kj}|.$

Using (2), we have

$\notag |y_k| \ge \|x\|_{\infty} \Bigl(|a_{kk}| - \displaystyle\sum_{j\ne k} |a_{kj}|\Bigr) > 0. \qquad (3)$

Therefore $y\ne0$ and so $A$ is nonsingular. $~\square$

Diagonal dominance plus two further conditions is enough to ensure nonsingularity. We need the notion of irreducibility. A matrix $A\in\mathbb{R}^{n\times n}$ is irreducible if there does not exist a permutation matrix $P$ such that

$\notag P^TAP = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}$

with $A_{11}$ and $A_{22}$ square matrices. Irreducibility is equivalent to the directed graph of $A$ being strongly connected.
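The graph characterization gives a simple test for irreducibility. In the Python sketch below (the function name `is_irreducible` is our own) the directed graph has an edge from $i$ to $j$ whenever $a_{ij} \ne 0$ with $i\ne j$, and strong connectivity is checked by searching from one node in the graph and in the graph with all edges reversed.

```python
def is_irreducible(A):
    """True if the directed graph of A is strongly connected."""
    n = len(A)

    def reaches_all(adj):
        # Depth-first search from node 0.
        seen, stack = {0}, [0]
        while stack:
            i = stack.pop()
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        return len(seen) == n

    forward = [[j for j in range(n) if j != i and A[i][j] != 0] for i in range(n)]
    reverse = [[j for j in range(n) if j != i and A[j][i] != 0] for i in range(n)]
    return reaches_all(forward) and reaches_all(reverse)

tridiagonal = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
block_triangular = [[1, 1, 1], [0, 1, 1], [0, 1, 1]]
print(is_irreducible(tridiagonal), is_irreducible(block_triangular))
```

The second matrix already has the block triangular form of the definition (with $P = I$), so it is reducible.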

Theorem 2.

If $A\in\mathbb{C}^{n\times n}$ is irreducible and diagonally dominant by rows with strict inequality in $(2)$ for some $i$ then it is nonsingular.

Proof. The proof is by contradiction. Suppose there exists $x\ne 0$ such that $Ax = 0$ . Define

$\notag G = \{\, j: |x_j| = \|x\|_{\infty} \,\}, \quad H = \{\, j: |x_j| < \|x\|_{\infty} \,\}.$

The $i$ th equation of $Ax = 0$ can be written

$\notag a_{ii}x_i = - \displaystyle\sum_{j\ne i} a_{ij}x_j = - \displaystyle\sum_{j\in G \atop j\ne i } a_{ij}x_j - \displaystyle\sum_{j\in H \atop j\ne i } a_{ij}x_j. \qquad (4)$

Hence for $i = r\in G$ ,

$\notag |a_{rr}| \le \displaystyle\sum_{j\in G \atop j\ne r } |a_{rj}| + \displaystyle\sum_{j\in H \atop j\ne r } |a_{rj}|\frac{|x_j|}{\|x\|_\infty}.$

The set $H$ is nonempty, because if it were empty then we would have $|x_j| = \|x\|_\infty$ for all $j$ and if there is strict inequality in $(2)$ for $i = m$ , then putting $i = m$ in (4) would give $|a_{mm}| \le \sum_{j\ne m} |a_{mj}| |x_j|/|x_m| = \sum_{j\ne m} |a_{mj}|$ , which is a contradiction. Hence as long as $a_{rj}\ne0$ for some $j\in H$ , we obtain $|a_{rr}| < \sum_{j\ne r } |a_{rj}|$ , which contradicts the diagonal dominance. Therefore we must have $a_{rj}= 0$ for all $j\in H$ and all $r\in G$ . This means that all the rows indexed by $G$ have zeros in the columns indexed by $H$ , which means that $A$ is reducible. This is a contradiction, so $A$ must be nonsingular. $~\square$

The obvious analogue of Theorem 2 holds for column diagonal dominance.

As an example, the $n\times n$ symmetric tridiagonal matrix (minus the second difference matrix)

$\notag T_n = \left[\begin{array}{@{\mskip 5mu}c*{4}{@{\mskip 15mu} r}@{\mskip 5mu}} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & \ddots & \\ & & \ddots & \ddots & -1\\ & & & -1 & 2 \end{array}\right], \qquad (5)$

is row diagonally dominant with strict inequality in the first and last diagonal dominance relations. It can also be shown to be irreducible and so it is nonsingular by Theorem 2. If we replace $t_{11}$ or $t_{nn}$ by $1$ , then $T_n$ remains nonsingular by the same argument. What if we replace both $t_{11}$ and $t_{nn}$ by $1$ ? We can answer this question by using an observation of Strang. If we define the rectangular matrix

$\notag L_n = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & -1 & \ddots & \\ & & \ddots & 1 \\ & & & -1 \end{bmatrix} \in\mathbb{R}^{(n+1)\times n}$

then $T_n = L_n^T L_n$ and

$\notag \widetilde{T}_{n+1} = \begin{bmatrix} 1 &-1 & & & \\ -1 & 2 & \ddots & & \\ & \ddots & \ddots & -1 & \\ & & -1 & 2 & -1\\ & & & -1 & 1 \end{bmatrix} = L_n L_n^T \in \mathbb{R}^{(n+1) \times (n+1)}.$

Since in general $AB$ and $BA$ have the same nonzero eigenvalues, we conclude that $\Lambda(\widetilde{T}_{n+1}) = \Lambda(T_n) \cup \{0\}$ , where $\Lambda(\cdot)$ denotes the spectrum. Hence $T_n$ is symmetric positive definite and $\widetilde{T}_n$ is singular and symmetric positive semidefinite.

Relation to Gershgorin’s Theorem

Theorem 1 can be used to obtain information about the location of the eigenvalues of a matrix. Indeed if $\lambda$ is an eigenvalue of $A$ then $A - \lambda I$ is singular and hence cannot be strictly diagonally dominant, by Theorem 1. So $|a_{ii}-\lambda| > \sum_{j\ne i} |a_{ij}|$ cannot be true for all $i$ . Gershgorin’s theorem is simply a restatement of this fact.

Theorem 3 (Gershgorin’s theorem).

The eigenvalues of $A\in\mathbb{C}^{n\times n}$ lie in the union of the $n$ discs in the complex plane

$\notag D_i = \Big\{ z\in\mathbb{C}: |z-a_{ii}| \le \displaystyle\sum_{j\ne i} |a_{ij}|\Big\}, \quad i=1\colon n.$

If $A$ is symmetric with positive diagonal elements and satisfies the conditions of Theorem 1 or Theorem 2 then it is positive definite. Indeed the eigenvalues are real and so in Gershgorin’s theorem the discs are intervals and $a_{ii} - z \le |z-a_{ii}| \le \sum_{j\ne i}^n |a_{ij}|$ , so $z \ge |a_{ii}| - \sum_{j\ne i}^n |a_{ij}| \ge 0$ , so the eigenvalues are nonnegative, and hence positive since nonzero. This provides another proof that the matrix $T_n$ in (5) is positive definite.
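As an illustration, the following Python sketch computes the Gershgorin discs of $T_n$ in (5), which for a real symmetric matrix are intervals on the real line. The function names are our own. For comparison it uses the standard closed form $2 - 2\cos(k\pi/(n+1))$, $k = 1\colon n$, for the eigenvalues of $T_n$, which is a known result not stated in this article.

```python
import math

def gershgorin_discs(A):
    """Return (center, radius) pairs, one per row of A."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]

def second_difference(n):
    """The tridiagonal matrix T_n of (5)."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0
             for j in range(n)] for i in range(n)]

n = 6
discs = gershgorin_discs(second_difference(n))
lower = min(c - r for c, r in discs)     # left end of the union of intervals
upper = max(c + r for c, r in discs)     # right end of the union of intervals

# Known eigenvalues of T_n (assumed closed form, see the lead-in).
eigs = [2 - 2 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]
print(lower, upper, min(eigs), max(eigs))
```

The discs give only the enclosing interval $[0,4]$; it is the nonsingularity from Theorem 2 that excludes the endpoint $0$.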

Generalized Diagonal Dominance

In some situations $A$ is not diagonally dominant but a row or column scaling of it is. For example, the matrix

$\notag A = \begin{bmatrix} 1 & 1 & 0 \\ 2/3 & 2 & 1/4 \\ 2/3 & 1/2 & 1 \end{bmatrix}$

is not diagonally dominant by rows or columns but

$\notag A \, \mathrm{diag}(3,2,4) = \begin{bmatrix} 3 & 2 & 0 \\ 2 & 4 & 1 \\ 2 & 1 & 4 \end{bmatrix}$

is strictly diagonally dominant by rows.

A matrix $A\in\mathbb{C}^{n\times n}$ is generalized diagonally dominant by rows if $AD$ is diagonally dominant by rows for some diagonal matrix $D = \mathrm{diag}(d_i)$ with $d_i > 0$ for all $i$ , that is, if

$\notag |a_{ii}|d_i \ge \displaystyle\sum_{j\ne i} |a_{ij}|d_j, \quad i=1\colon n. \qquad (6)$

It is easy to see that if $A$ is irreducible and there is strict inequality in (6) for some $i$ then $A$ is nonsingular by Theorem 2.

It can be shown that $A$ is generalized diagonally dominant by rows if and only if it is an $H$ -matrix, where an $H$ -matrix is a matrix for which the comparison matrix $M(A)$ , defined by

$\notag M(A) = (m_{ij}), \quad m_{ij} = \begin{cases} |a_{ii}|, & i=j, \\ -|a_{ij}|, & i\ne j, \end{cases}$

is an $M$ -matrix (see What Is an M-Matrix?).
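The comparison matrix gives a compact way to evaluate the margins in (6): the $i$ th component of $M(A)d$ is exactly $|a_{ii}|d_i - \sum_{j\ne i}|a_{ij}|d_j$. The Python sketch below (function names are our own) applies this to the $3\times 3$ example of this section with $d = (3,2,4)$.

```python
from fractions import Fraction

def comparison_matrix(A):
    """M(A): absolute values on the diagonal, minus absolute values elsewhere."""
    n = len(A)
    return [[abs(A[i][j]) if i == j else -abs(A[i][j]) for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[Fraction(1), Fraction(1), Fraction(0)],
     [Fraction(2, 3), Fraction(2), Fraction(1, 4)],
     [Fraction(2, 3), Fraction(1, 2), Fraction(1)]]

M = comparison_matrix(A)
unscaled = matvec(M, [1, 1, 1])   # margins in (2): a negative entry means failure
scaled = matvec(M, [3, 2, 4])     # margins in (6) with d = (3, 2, 4)
print(unscaled, scaled)
```

The negative third component of `unscaled` shows that $A$ is not diagonally dominant by rows, while the positive entries of `scaled` show that $A\,\mathrm{diag}(3,2,4)$ is strictly so.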

Block Diagonal Dominance

A matrix $A\in\mathbb{C}^{n\times n}$ is block diagonally dominant by rows if, for a given norm and block $m\times m$ partitioning $A = (A_{ij})$ , the diagonal blocks $A_{jj}$ are all nonsingular and

$\notag \displaystyle\sum_{j\ne i} \|A_{ij}\| \le \|A_{ii}^{-1}\|^{-1}, \quad i = 1\colon m.$

$A$ is block diagonally dominant by columns if $A^T$ is block diagonally dominant by rows. If the blocks are all $1\times 1$ then block diagonal dominance reduces to the usual notion of diagonal dominance. Block diagonal dominance holds for certain block tridiagonal matrices arising in the discretization of PDEs.

Analogues of Theorems 1 and 2 giving conditions under which block diagonal dominance implies nonsingularity are given by Feingold and Varga (1962).

Bounding the Inverse

If a matrix is strictly diagonally dominant then we can bound its inverse in terms of the minimum amount of diagonal dominance. For full generality, we state the bound in terms of generalized diagonal dominance.

Theorem 4.

If $A\in\mathbb{C}^{n\times n}$ and $AD$ is strictly diagonally dominant by rows for a diagonal matrix $D = \mathrm{diag}(d_i)$ with $d_i > 0$ for all $i$ , then

$\notag \|A^{-1}\|_\infty \le \displaystyle\frac{\|D\|_{\infty}}{\alpha},$

where $\alpha = \min_i (|a_{ii}|d_i - \sum_{j\ne i} |a_{ij}|d_j)$ .

Proof. Assume first that $D = I$ . Let $y$ satisfy $\|A^{-1}\|_{\infty} = \|A^{-1}y\|_{\infty} / \|y\|_{\infty}$ and let $x = A^{-1}y$ . Applying (3) gives $\|A^{-1}\|_{\infty} = \|x\|_{\infty} / \|y\|_{\infty} \le \alpha^{-1}$ . The result is obtained on applying this bound to $AD$ and using $\|A^{-1}\|_{\infty} \le \|D\|_{\infty} \|(AD)^{-1}\|_{\infty}$ . $~\square$
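To see how sharp the bound of Theorem 4 is on a small case, the following Python sketch evaluates both sides exactly for the $3\times 3$ matrix of the section on generalized diagonal dominance, with $D = \mathrm{diag}(3,2,4)$. The helper names `inverse` and `inf_norm` are our own; the inverse is formed by Gauss-Jordan elimination in rational arithmetic purely for the purpose of the check.

```python
from fractions import Fraction

def inverse(A):
    """Gauss-Jordan inverse in exact arithmetic (A assumed nonsingular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        piv = next(i for i in range(k, n) if M[i][k] != 0)
        M[k], M[piv] = M[piv], M[k]
        pk = M[k][k]
        M[k] = [x / pk for x in M[k]]
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [x - f * y for x, y in zip(M[i], M[k])]
    return [row[n:] for row in M]

def inf_norm(A):
    """Maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

A = [[Fraction(1), Fraction(1), Fraction(0)],
     [Fraction(2, 3), Fraction(2), Fraction(1, 4)],
     [Fraction(2, 3), Fraction(1, 2), Fraction(1)]]
d = [3, 2, 4]
n = len(A)

alpha = min(abs(A[i][i]) * d[i]
            - sum(abs(A[i][j]) * d[j] for j in range(n) if j != i)
            for i in range(n))
bound = Fraction(max(d)) / alpha          # ||D||_inf / alpha
actual = inf_norm(inverse(A))
print(actual, bound)
```

Here $\alpha = 1$ and $\|D\|_\infty = 4$, so the bound is $4$, while $\|A^{-1}\|_\infty = 25/11 \approx 2.27$.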

Another bound for $A^{-1}$ when $A$ is strictly diagonally dominant by rows can be obtained by writing $A = D(I - E)$ , where $D = \mathrm{diag}(a_{ii})$ , $e_{ii} = 0$ , and $e_{ij} = -a_{ij}/a_{ii}$ for $i\ne j$ . It is easy to see that $\|E\|_\infty < 1$ , which gives another proof that $A$ is nonsingular. Then

$\notag \begin{aligned} |A^{-1}| &= |(I-E)^{-1}D^{-1}| = |I + E + E^2 + \cdots | |D^{-1}|\\ &\le (I + |E| + |E|^2 + \cdots ) |D|^{-1}\\ &= (I - |E|)^{-1} |D|^{-1}\\ &= M(A)^{-1}. \end{aligned}$

This bound implies that $M(A)^{-1} \ge 0$ , so in view of its sign pattern $M(A)$ is an $M$ -matrix, which essentially proves one direction of the $H$ -matrix equivalence in the previous section. The same bound holds if $A$ is diagonally dominant by columns, by writing $A = (I-E)D$ .

An upper bound also holds for block diagonal dominance.

Theorem 5.

If $A\in\mathbb{C}^{n\times n}$ is block diagonally dominant by rows then

$\notag \|A^{-1}\|_\infty \le \displaystyle\frac{1}{\alpha},$

where $\alpha = \min_i ( \|A_{ii}^{-1}\|^{-1} - \sum_{j\ne i} \|A_{ij}\| )$ .

It is interesting to note that the inverse of a strictly row diagonally dominant matrix enjoys a form of diagonal dominance, namely that the largest element in each column is on the diagonal.

Theorem 6.

If $A\in\mathbb{C}^{n\times n}$ is strictly diagonally dominant by rows then $B = A^{-1}$ satisfies $|b_{ij}| < |b_{jj}|$ for all $i\ne j$ .

Proof. For $i\ne j$ we have $\sum_{k=1}^n a_{ik}b_{kj} = 0$ . Let $\beta_j = \max_k |b_{kj}|$ . Taking absolute values in $a_{ii}b_{ij} = -\sum_{k\ne i}a_{ik}b_{kj}$ gives

$\notag |a_{ii}||b_{ij}| \le \beta_j \sum_{k\ne i} |a_{ik}| < \beta_j |a_{ii}|,$

or $|b_{ij}| < \beta_j$ , since $a_{ii} \ne 0$ . This inequality holds for all $i\ne j$ , so we must have $\beta_j = |b_{jj}|$ , which gives the result.
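Theorem 6 can be observed on a small example. The Python sketch below (helper name `inverse` is our own; the matrix is the scaled matrix $A\,\mathrm{diag}(3,2,4)$ from the section on generalized diagonal dominance, which is strictly diagonally dominant by rows) inverts the matrix exactly and checks that in every column of the inverse the diagonal element is strictly the largest in magnitude.

```python
from fractions import Fraction

def inverse(A):
    """Gauss-Jordan inverse in exact arithmetic (A assumed nonsingular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        piv = next(i for i in range(k, n) if M[i][k] != 0)
        M[k], M[piv] = M[piv], M[k]
        pk = M[k][k]
        M[k] = [x / pk for x in M[k]]
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [x - f * y for x, y in zip(M[i], M[k])]
    return [row[n:] for row in M]

A = [[3, 2, 0], [2, 4, 1], [2, 1, 4]]     # strictly diagonally dominant by rows
B = inverse(A)
n = len(A)

# |b_ij| < |b_jj| for all i != j: the diagonal dominates each column of B.
column_max_on_diagonal = all(abs(B[i][j]) < abs(B[j][j])
                             for j in range(n) for i in range(n) if i != j)
print(column_max_on_diagonal)
```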

Historical Remarks

Theorems 1 and 2 have a long history and have been rediscovered many times. Theorem 1 was first stated by Lévy (1881) with additional assumptions. In a short but influential paper, Taussky (1949) pointed out the recurring nature of the theorems and gave simple proofs (our proof of Theorem 2 is Taussky’s). Schneider (1977) attributes the surge in interest in matrix theory in the 1950s and 1960s to Taussky’s paper and a few others by her, Brauer, Ostrowski, and Wielandt. The history of Gershgorin’s theorem (published in 1931) is intertwined with that of Theorems 1 and 2; see Varga’s 2004 book for details.

Theorems 4 and 5 are from Varah (1975) and Theorem 6 is from Ostrowski (1952).

References

This is a minimal set of references, which contain further useful references within.

David G. Feingold and Richard S. Varga, Block Diagonally Dominant Matrices and Generalizations of the Gerschgorin Circle Theorem, Pacific J. Math. 12(4), 1241–1250, 1962.
A. M. Ostrowski, Note on Bounds for Determinants with Dominant Principal Diagonal, Proc. Amer. Math. Soc. 3, 26–30, 1952.
Hans Schneider, Olga Taussky-Todd’s Influence on Matrix Theory and Matrix Theorists: A Discursive Personal Tribute, Linear and Multilinear Algebra 5, 197–224, 1977.
Olga Taussky, A Recurring Theorem on Determinants, Amer. Math. Monthly 56(10), 672–676, 1949.
J. M. Varah, A Lower Bound for the Smallest Singular Value of a Matrix, Linear Algebra Appl. 11, 3–5, 1975.
Richard Varga, Geršgorin and His Circles, Springer-Verlag, Berlin, 2004.
