# What Is the Trace of a Matrix?

The trace of an $n\times n$ matrix is the sum of its diagonal elements: $\mathrm{trace}(A) = \sum_{i=1}^n a_{ii}$. The trace is linear, that is, $\mathrm{trace}(A+B) = \mathrm{trace}(A) + \mathrm{trace}(B)$, and $\mathrm{trace}(A) = \mathrm{trace}(A^T)$.
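These properties can be checked directly with a short sketch in Python, using plain nested lists for matrices; the example matrices are arbitrary choices, not from the text.

```python
# A minimal sketch of the trace, its linearity, and trace(A) = trace(A^T).

def trace(A):
    """Sum of the diagonal elements of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

S = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]  # A + B
T = [[A[j][i] for j in range(2)] for i in range(2)]            # A^T

print(trace(A))                         # 5
print(trace(S) == trace(A) + trace(B))  # True: linearity
print(trace(T) == trace(A))             # True: trace(A^T) = trace(A)
```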

A key fact is that the trace is also the sum of the eigenvalues. To see this, consider the characteristic polynomial $p(t) = \det(t I - A) = t^n + a_{n-1}t^{n-1} + \dots + a_1 t + a_0$. The roots of $p$ are the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ of $A$, so $p$ can be factorized as

$\notag p(t) = (t - \lambda_1) (t - \lambda_2) \dots (t - \lambda_n),$

and so $a_{n-1} = -(\lambda_1 + \lambda_2 + \cdots + \lambda_n)$. The Laplace expansion of $\det(t I - A)$ shows that the coefficient of $t^{n-1}$ is $-(a_{11} + a_{22} + \cdots + a_{nn})$. Equating these two expressions for $a_{n-1}$ gives

$\notag \mathrm{trace}(A) = \displaystyle\sum_{i=1}^n \lambda_i. \qquad\qquad(1)$

A consequence of (1) is that any transformation that preserves the eigenvalues preserves the trace. Therefore the trace is unchanged under similarity transformations: $\mathrm{trace}(X^{-1}AX) = \mathrm{trace}(A)$ for any nonsingular $X$.

As an example of how the trace can be useful, suppose $A$ is a symmetric and orthogonal $n\times n$ matrix, so that its eigenvalues are $\pm 1$. If there are $p$ eigenvalues $1$ and $q$ eigenvalues $-1$ then $\mathrm{trace}(A) = p - q$ and $n = p + q$. Therefore $p = (n + \mathrm{trace}(A))/2$ and $q = (n - \mathrm{trace}(A))/2$.
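As a numerical illustration (the example is my own, not from the text), a Householder reflector $I - 2vv^T/(v^Tv)$ is symmetric and orthogonal with a single eigenvalue $-1$, so the formulas should give $p = n-1$ and $q = 1$:

```python
# Check the p, q formulas on a Householder reflector H = I - 2*v*v^T/(v^T*v),
# which is symmetric and orthogonal; the vector v is an arbitrary choice.

n, v = 3, [1.0, 2.0, 2.0]
vtv = sum(x * x for x in v)
H = [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vtv
      for j in range(n)] for i in range(n)]

tr = sum(H[i][i] for i in range(n))  # trace(H) = n - 2 for a reflector
p = round((n + tr) / 2)              # number of eigenvalues +1
q = round((n - tr) / 2)              # number of eigenvalues -1
print(p, q)                          # 2 1
```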

Another important property is that for an $m\times n$ matrix $A$ and an $n\times m$ matrix $B$,

$\notag \mathrm{trace}(AB) = \mathrm{trace}(BA) \qquad\qquad(2)$

(despite the fact that $AB \ne BA$ in general). The proof is simple:

$\notag \begin{aligned} \mathrm{trace}(AB) &= \displaystyle\sum_{i=1}^m (AB)_{ii} = \sum_{i=1}^m \sum_{k=1}^n a_{ik} b_{ki} = \sum_{k=1}^n \sum_{i=1}^m b_{ki} a_{ik} \\ & = \sum_{k=1}^n (BA)_{kk} = \mathrm{trace}(BA). \end{aligned}$

This simple fact can have non-obvious consequences. For example, consider the equation $AX - XA = I$ in $n\times n$ matrices. Taking the trace gives $0 = \mathrm{trace}(AX) - \mathrm{trace}(XA) = \mathrm{trace}(AX - XA) = \mathrm{trace}(I) = n$, which is a contradiction. Therefore the equation has no solution.
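The vanishing trace of the commutator can be checked numerically; the random example matrices below are an arbitrary illustration.

```python
# A quick check that the commutator AX - XA always has zero trace,
# which is why AX - XA = I is impossible.
import random

n = 3
random.seed(0)
A = [[random.random() for _ in range(n)] for _ in range(n)]
X = [[random.random() for _ in range(n)] for _ in range(n)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

AX, XA = matmul(A, X), matmul(X, A)
tr = sum(AX[i][i] - XA[i][i] for i in range(n))
print(abs(tr) < 1e-12)  # True: trace(AX - XA) = 0, whereas trace(I) = n
```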

The relation (2) gives $\mathrm{trace}(ABC) = \mathrm{trace}((AB)C) = \mathrm{trace}(C(AB)) = \mathrm{trace}(CAB)$ for $n\times n$ matrices $A$, $B$, and $C$, that is,

$\notag \mathrm{trace}(ABC) = \mathrm{trace}(CAB). \qquad\qquad(3)$

So we can cyclically permute terms in a matrix product without changing the trace.

As an example of the use of (2) and (3), if $x$ and $y$ are $n$-vectors then $\mathrm{trace}(xy^T) = \mathrm{trace}(y^Tx) = y^Tx$. If $A$ is an $n\times n$ matrix then $\mathrm{trace}(xy^TA)$ can be evaluated without forming the matrix $xy^TA$ since, by (3), $\mathrm{trace}(xy^TA) = \mathrm{trace}(y^TAx) = y^TAx$.
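A sketch of this cheap evaluation, with arbitrary example data; the direct computation forms $xy^TA$ explicitly for comparison.

```python
# Evaluate trace(x*y^T*A) as y^T*A*x without forming the n-by-n matrix.

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
A = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0],
     [4.0, 0.0, 5.0]]
n = 3

# Direct: form M = x*y^T*A and take its trace (O(n^2) storage).
M = [[x[i] * sum(y[k] * A[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]
direct = sum(M[i][i] for i in range(n))

# Cheap: y^T*(A*x) via one matrix-vector product and one inner product.
Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
cheap = sum(y[i] * Ax[i] for i in range(n))

print(abs(direct - cheap) < 1e-12)  # True
```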

The trace is useful in calculations with the Frobenius norm of an $m\times n$ matrix:

$\notag \|A\|_F = \left(\displaystyle\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\right)^{1/2} = \bigl(\mathrm{trace}(A^*A)\bigr)^{1/2},$

where $*$ denotes the conjugate transpose. For example, we can generalize the formula $|x+\mathrm{i}y|^2 = x^2 + y^2$ for a complex number to an $m\times n$ matrix $A$ by splitting $A$ into its Hermitian and skew-Hermitian parts:

$\notag A = \frac{1}{2}(A+A^*) + \frac{1}{2}(A-A^*) \equiv B + C,$

where $B = B^*$ and $C = -C^*$. Then

$\notag \begin{aligned} \|A\|_F^2 = \|B + C\|_F^2 &= \mathrm{trace}((B+C)^*(B+C))\\ &= \mathrm{trace}(B^*B + C^*C) + \mathrm{trace}(B^*C + C^*B)\\ &= \mathrm{trace}(B^*B + C^*C) + \mathrm{trace}(BC - CB)\\ &= \mathrm{trace}(B^*B + C^*C)\\ &= \|B\|_F^2 + \|C\|_F^2. \end{aligned}$
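For a real matrix the splitting is into symmetric and skew-symmetric parts, and the identity can be checked numerically; the example matrix is arbitrary.

```python
# Check ||A||_F^2 = ||B||_F^2 + ||C||_F^2 for the symmetric part B and
# skew-symmetric part C of a real matrix A.

A = [[1.0, 2.0], [3.0, 4.0]]
n = 2
B = [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]
C = [[(A[i][j] - A[j][i]) / 2 for j in range(n)] for i in range(n)]

def frob2(M):
    """Squared Frobenius norm."""
    return sum(M[i][j] ** 2 for i in range(n) for j in range(n))

print(abs(frob2(A) - (frob2(B) + frob2(C))) < 1e-12)  # True
```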

If a matrix is not explicitly known but we can compute matrix–vector products with it then the trace can be estimated by

$\notag \mathrm{trace}(A) \approx x^TAx,$

where the vector $x$ has elements independently drawn from the standard normal distribution with mean $0$ and variance $1$. The expectation of this estimate is

$\notag \begin{aligned} E\bigl( x^TAx \bigr) &= E\bigl( \mathrm{trace}(x^TAx) \bigr) = E\bigl( \mathrm{trace}(Axx^T) \bigr) = \mathrm{trace} \bigl( E(Axx^T) \bigr) \\ &= \mathrm{trace}\bigl(A E(xx^T) \bigr) = \mathrm{trace}(A), \end{aligned}$

since $E(x_ix_j) = 0$ for $i\ne j$ and $E(x_i^2) = 1$ for all $i$, so $E(xx^T) = I$. This stochastic estimate, which is due to Hutchinson, is therefore unbiased.
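A minimal sketch of the estimator, averaging over many sample vectors to reduce the variance; the matrix and sample count are arbitrary choices for illustration.

```python
# Hutchinson's stochastic trace estimator: average x^T*A*x over vectors
# with independent standard normal entries.
import random

random.seed(42)
n = 4
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 5.0]]   # trace(A) = 14

def quad_form(A, x):
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Ax[i] for i in range(n))

samples = 20000
est = sum(quad_form(A, [random.gauss(0.0, 1.0) for _ in range(n)])
          for _ in range(samples)) / samples
print(round(est, 1))  # close to 14 for a large sample
```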

## Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

# What Is a Diagonalizable Matrix?

A matrix $A \in\mathbb{C}^{n\times n}$ is diagonalizable if there exists a nonsingular matrix $X\in\mathbb{C}^{n\times n}$ such that $X^{-1}AX$ is diagonal. In other words, a diagonalizable matrix is one that is similar to a diagonal matrix.

The condition $X^{-1}AX = D = \mathrm{diag}(\lambda_i)$ is equivalent to $AX = XD$ with $X$ nonsingular, that is, $Ax_i = \lambda_ix_i$, $i=1\colon n$, where $X = [x_1,x_2,\dots, x_n]$. Hence $A$ is diagonalizable if and only if it has a complete set of linearly independent eigenvectors.

A Hermitian matrix is diagonalizable because the eigenvectors can be taken to be mutually orthogonal. The same is true for a normal matrix (one for which $A^*A = AA^*$). A matrix with distinct eigenvalues is also diagonalizable.

Theorem 1.

If $A\in\mathbb{C}^{n\times n}$ has distinct eigenvalues then it is diagonalizable.

Proof. Let $A$ have eigenvalues $\lambda_1,\lambda_2,\dots,\lambda_n$ with corresponding eigenvectors $x_1,x_2,\dots,x_n$. Suppose that $y = \sum_{i=1}^n \alpha_i x_i = 0$ for some $\alpha_1,\alpha_2,\dots,\alpha_n$. Then

$\notag \begin{aligned} 0 &= (A-\lambda_2 I)\cdots (A-\lambda_n I)y = \sum_{i=1}^n \alpha_i (A-\lambda_2 I)\cdots (A-\lambda_n I) x_i\\ &= \sum_{i=1}^n \alpha_i (\lambda_i-\lambda_2)\cdots(\lambda_i-\lambda_n)x_i = \alpha_1 (\lambda_1-\lambda_2)\cdots(\lambda_1-\lambda_n)x_1, \end{aligned}$

which implies $\alpha_1 = 0$ since $\lambda_1\ne \lambda_j$ for $j\ge 2$ and $x_1\ne0$. Premultiplying $y = \sum_{i=2}^n \alpha_i x_i = 0$ by $\prod_{j=3}^n(A-\lambda_j I)$ shows, in the same way, that $\alpha_2 = 0$. Continuing in this way we find that $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$. Therefore the $x_i$ are linearly independent and hence $A$ is diagonalizable.

A matrix can have repeated eigenvalues and be diagonalizable, as diagonal matrices with repeated diagonal entries show. What is needed for diagonalizability is that every $k$-times repeated eigenvalue has $k$ linearly independent eigenvectors associated with it. Equivalently, the algebraic and geometric multiplicities of every eigenvalue must be equal, that is, the eigenvalues must all be semisimple. Another equivalent condition is that the degree of the minimal polynomial is equal to the number of distinct eigenvalues.

The simplest example of a matrix that is not diagonalizable is $\bigl[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr]$. This matrix is a $2\times 2$ Jordan block with the eigenvalue $0$. Diagonalizability is easily understood in terms of the Jordan canonical form: $A$ is diagonalizable if and only if all the Jordan blocks in its Jordan form are $1\times 1$.

Most matrices are diagonalizable, in the sense that the diagonalizable matrices are dense in $\mathbb{C}^{n\times n}$, that is, any matrix in $\mathbb{C}^{n\times n}$ is arbitrarily close to a diagonalizable matrix. This property is useful because it can be convenient to prove a result by first proving it for diagonalizable matrices and then arguing that by continuity the result holds for a general matrix.

Is a rank-$1$ matrix $A = xy^*$ diagonalizable, where $x,y\in\mathbb{C}^n$ are nonzero? There are $n-1$ zero eigenvalues with eigenvectors any set of linearly independent vectors orthogonal to $y$. If $y^*x \ne 0$ then $y^*x$ is the remaining eigenvalue, with eigenvector $x$, which is linearly independent of the eigenvectors for $0$, and $A$ is diagonalizable. If $y^*x = 0$ then all the eigenvalues of $A$ are zero and so $A$ cannot be diagonalizable, as the only diagonalizable matrix whose eigenvalues are all zero is the zero matrix. For the matrix $\bigl[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr]$ mentioned above, $x = \bigl[{1 \atop 0} \bigr]$ and $y = \bigl[{0 \atop 1} \bigr]$, so $y^*x = 0$, confirming that this matrix is not diagonalizable.
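The key identity behind this discussion, $A^2 = (y^Tx)A$ for $A = xy^T$ (so that $y^Tx$ is the only possibly nonzero eigenvalue), can be verified directly; the example vectors are arbitrary.

```python
# Check A^2 = (y^T*x)*A for a rank-1 matrix A = x*y^T with y^T*x nonzero.

x = [1.0, 2.0]
y = [3.0, -1.0]
n = 2
A = [[x[i] * y[j] for j in range(n)] for i in range(n)]
ytx = sum(y[i] * x[i] for i in range(n))  # the nonzero eigenvalue, here 1

A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
ok = all(abs(A2[i][j] - ytx * A[i][j]) < 1e-12
         for i in range(n) for j in range(n))
print(ok, ytx)  # True 1.0
```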


# What Is a Stochastic Matrix?

A stochastic matrix is an $n\times n$ matrix with nonnegative entries and unit row sums. If $A\in\mathbb{R}^{n\times n}$ is stochastic then $Ae = e$, where $e = [1,1,\dots,1]^T$ is the vector of ones. This means that $e$ is an eigenvector of $A$ corresponding to the eigenvalue $1$.

The identity matrix is stochastic, as is any permutation matrix. Here are some other examples of stochastic matrices:

$\notag \begin{aligned} A_n &= n^{-1}ee^T, \quad \textrm{in particular}~~ A_3 = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt] \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt] \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}, \qquad (1)\\ B_n &= \frac{1}{n-1}(ee^T -I), \quad \textrm{in particular}~~ B_3 = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2}\\[2pt] \frac{1}{2} & 0 & \frac{1}{2}\\[2pt] \frac{1}{2} & \frac{1}{2} & 0 \end{bmatrix}, \qquad (2)\\ C_n &= \begin{bmatrix} 1 & & & \\ \frac{1}{2} & \frac{1}{2} & & \\ \vdots & \vdots &\ddots & \\ \frac{1}{n} & \frac{1}{n} &\cdots & \frac{1}{n} \end{bmatrix}. \qquad (3)\\ \end{aligned}$

For any matrix $A$, the spectral radius $\rho(A) = \max\{ |\lambda| : \lambda \mathrm{~is~an~eigenvalue~of~} A \}$ satisfies $\rho(A) \le \|A\|$ for any consistent matrix norm. For a stochastic matrix, taking the $\infty$-norm (the maximum row sum of absolute values) gives $\rho(A) \le 1$, and since we know that $1$ is an eigenvalue, $\rho(A) = 1$. It can be shown that $1$ is a semisimple eigenvalue, that is, if there are $k$ eigenvalues equal to $1$ then there are $k$ linearly independent eigenvectors corresponding to $1$ (Meyer, 2000, p. 696).

A lower bound on the spectral radius can be obtained from Gershgorin’s theorem. The $i$th Gershgorin disc is defined by $|\lambda - a_{ii}| \le \sum_{j\ne i} |a_{ij}| = 1 - a_{ii}$, which implies $|\lambda| \ge 2a_{ii}-1$. Every eigenvalue $\lambda$ lies in the union of the $n$ discs and so must satisfy

$\notag 2\min_i a_{ii}-1 \le |\lambda| \le 1.$

The lower bound is positive if $A$ is strictly diagonally dominant by rows.

If $A$ and $B$ are stochastic then $AB$ is nonnegative and $ABe = Ae = e$, so $AB$ is stochastic. In particular, any positive integer power of $A$ is stochastic. Does $A^k$ converge as $k\to\infty$? The answer is that it does, and the limit is stochastic, as long as $1$ is the only eigenvalue of modulus $1$, and this will be the case if all the elements of $A$ are positive (by Perron’s theorem). The simplest example of non-convergence is the stochastic matrix

$\notag A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$

which has eigenvalues $1$ and $-1$. Since $A^2 = I$, all even powers are equal to $I$ and all odd powers are equal to $A$. For the matrix (1), $A_n^k = A_n$ for all $k$, while for (2), $B_n^k \to A_n$ as $k\to\infty$. For (3), $C_n^k$ converges to the matrix with $1$ in every entry of the first column and zeros everywhere else.
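The convergence $B_n^k \to A_n$ claimed above for the matrix (2) can be observed numerically for $n = 3$; this is a small self-contained check.

```python
# Repeated multiplication by B_3 = (ee^T - I)/2 to illustrate B_3^k -> A_3,
# the matrix with every entry equal to 1/3.

n = 3
B = [[0.0 if i == j else 0.5 for j in range(n)] for i in range(n)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = B
for _ in range(30):
    P = matmul(P, B)   # P = B^31 after the loop

ok = all(abs(P[i][j] - 1/3) < 1e-8 for i in range(n) for j in range(n))
print(ok)  # True: every entry is essentially 1/3
```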

Stochastic matrices arise in Markov chains. A transition matrix for a finite homogeneous Markov chain is a matrix whose $(i,j)$ element is the probability of moving from state $i$ to state $j$ over a time step. It has nonnegative entries and the rows sum to $1$, so it is a stochastic matrix. In applications including finance and healthcare, a transition matrix may be estimated for a certain time period, say one year, but a transition matrix for a shorter period, say one month, may be needed. If $A$ is a transition matrix for a time period $p$ then a stochastic $p$th root of $A$, which is a stochastic solution $X$ of the equation $X^p = A$, is a transition matrix for a time period a factor $p$ smaller. Therefore $p = 12$ (years to months) and $p = 7$ (weeks to days) are among the values of interest. Unfortunately, a stochastic $p$th root may not exist. For example, the matrix

$\notag \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$

has no $p$th roots at all, let alone stochastic ones. Yet many stochastic matrices do have stochastic roots. The matrix (3) has a stochastic $p$th root for all $p$, as shown by Higham and Lin (2011), who give a detailed analysis of $p$th roots of stochastic matrices. For example, to three decimal places,

$\notag C_6^{1/7} = \begin{bmatrix} 1.000 & & & & & \\ 0.094 & 0.906 & & & & \\ 0.043 & 0.102 & 0.855 & & & \\ 0.027 & 0.050 & 0.103 & 0.820 & & \\ 0.019 & 0.032 & 0.052 & 0.103 & 0.795 & \\ 0.014 & 0.023 & 0.034 & 0.053 & 0.102 & 0.774 \\ \end{bmatrix}.$

A stochastic matrix is sometimes called a row-stochastic matrix to distinguish it from a column-stochastic matrix, which is a nonnegative matrix for which $e^TA = e^T$ (so that $A^T$ is row-stochastic). A matrix that is both row-stochastic and column-stochastic is called doubly stochastic. A permutation matrix is an example of a doubly stochastic matrix. If $U$ is a unitary matrix then the matrix with $a_{ij} = |u_{ij}|^2$ is doubly stochastic. A magic square scaled by the magic sum is also doubly stochastic. For example,

>> M = magic(4), A = M/sum(M(1,:)) % A is doubly stochastic.
M =
16     2     3    13
5    11    10     8
9     7     6    12
4    14    15     1
A =
4.7059e-01   5.8824e-02   8.8235e-02   3.8235e-01
1.4706e-01   3.2353e-01   2.9412e-01   2.3529e-01
2.6471e-01   2.0588e-01   1.7647e-01   3.5294e-01
1.1765e-01   4.1176e-01   4.4118e-01   2.9412e-02
>> [sum(A) sum(A')]
ans =
1     1     1     1     1     1     1     1
>> eig(A)'
ans =
1.0000e+00   2.6307e-01  -2.6307e-01   8.5146e-18



# What Is the Inertia of a Matrix?

The inertia of a real symmetric $n\times n$ matrix $A$ is a triple, written $\mathrm{In}(A) = (i_+(A),i_-(A),i_0(A))$, where $i_+(A)$ is the number of positive eigenvalues of $A$, $i_-(A)$ is the number of negative eigenvalues of $A$, and $i_0(A)$ is the number of zero eigenvalues of $A$.

The rank of $A$ is $i_+(A) + i_-(A)$. The difference $i_+(A) - i_-(A)$ is called the signature.

In general it is not possible to determine the inertia by inspection, but some deductions can be made. If $A$ has both positive and negative diagonal elements then $i_+(A) \ge 1$ and $i_-(A) \ge 1$, since $e_i^TAe_i = a_{ii}$. But in general the diagonal elements do not tell us much about the inertia. For example, here is a matrix that has positive diagonal elements but only one positive eigenvalue (and this example works for any $n$):

>> n = 4; A = -eye(n) + 2*ones(n), eigA = eig(sym(A))'
A =
1     2     2     2
2     1     2     2
2     2     1     2
2     2     2     1
eigA =
[-1, -1, -1, 7]


A congruence transformation of a symmetric matrix $A$ is a transformation $A \to X^T\!AX$ for a nonsingular matrix $X$. The result is clearly symmetric. Sylvester’s law of inertia (1852) says that the inertia is preserved under congruence transformations.

Theorem 1 (Sylvester’s law of inertia).

If $A\in\mathbb{R}^{n\times n}$ is symmetric and $X\in\mathbb{R}^{n\times n}$ is nonsingular then $\mathrm{In}(A) = \mathrm{In}(X^T\!AX)$.

Sylvester’s law gives a way to determine the inertia without computing eigenvalues: find a congruence transformation that transforms $A$ to a matrix whose inertia can be easily determined. A factorization $PAP^T = LDL^T$ does the job, where $P$ is a permutation matrix, $L$ is unit lower triangular, and $D$ is diagonal. Then $\mathrm{In}(A) = \mathrm{In}(D)$, and $\mathrm{In}(D)$ can be read off the diagonal of $D$. This factorization does not always exist, and if it does exist its computation can be numerically unstable. A block $LDL^T$ factorization, in which $D$ is block diagonal with diagonal blocks of size $1$ or $2$, always exists, and its computation is numerically stable with a suitable pivoting strategy such as symmetric rook pivoting.
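As an illustration only, here is a bare-bones $LDL^T$ factorization without pivoting (so it assumes all pivots are nonzero and is not the numerically stable block algorithm just described), from which the inertia is read off the diagonal of $D$; the example matrix is arbitrary.

```python
# Inertia of a symmetric matrix via an unpivoted LDL^T factorization.

def ldl_inertia(A):
    """Return (n_+, n_-, n_0) for symmetric A, assuming the unpivoted
    LDL^T factorization exists (all pivots d[k] nonzero)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for k in range(n):
        d[k] = A[k][k] - sum(L[k][j] ** 2 * d[j] for j in range(k))
        L[k][k] = 1.0
        for i in range(k + 1, n):
            L[i][k] = (A[i][k]
                       - sum(L[i][j] * d[j] * L[k][j] for j in range(k))) / d[k]
    pos = sum(1 for di in d if di > 0)
    neg = sum(1 for di in d if di < 0)
    return pos, neg, n - pos - neg

A = [[4.0, 2.0, 0.0],
     [2.0, -3.0, 1.0],
     [0.0, 1.0, 5.0]]
print(ldl_inertia(A))  # (2, 1, 0): two positive, one negative eigenvalue
```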

For the matrix above we can compute a block $LDL^T$ factorization using the MATLAB ldl function:

>> [L,D,P] = ldl(A); D
D =
1.0000e+00   2.0000e+00            0            0
2.0000e+00   1.0000e+00            0            0
0            0  -1.6667e+00            0
0            0            0  -1.4000e+00


Since the leading 2-by-2 block of $D$ has negative determinant and hence one positive eigenvalue and one negative eigenvalue, it follows that $A$ has one positive eigenvalue and three negative eigenvalues.

A congruence transformation preserves the signs of the eigenvalues but not their magnitudes. A result of Ostrowski (1959) bounds the ratios of the eigenvalues of the original and transformed matrices. Let the eigenvalues of a symmetric matrix be ordered $\lambda_n \le \cdots \le \lambda_1$.

Theorem (Ostrowski).

For a symmetric $A\in \mathbb{R}^{n\times n}$ and $X\in\mathbb{R}^{n\times n}$,

$\notag \lambda_k(X^TAX) = \theta_k \lambda_k(A), \quad k=1\colon n,$

where $\lambda_n(X^TX) \le \theta_k \le \lambda_1(X^TX)$.

The theorem shows that the further $X$ is from being orthogonal the greater the potential change in the eigenvalues.

Finally, we note that everything here generalizes to complex Hermitian matrices by replacing transpose by conjugate transpose.


# What Is Gershgorin’s Theorem?

For a given $n\times n$ matrix, Gershgorin’s theorem defines $n$ discs in the complex plane whose union contains the eigenvalues of the matrix. The theorem can provide approximations to eigenvalues. It can also provide qualitative information, such as that all the eigenvalues lie in a particular half-plane.

Theorem 1 (Gershgorin’s theorem).

The eigenvalues of $A\in\mathbb{C}^{n\times n}$ lie in the union of the $n$ discs in the complex plane

$\notag D_i = \Big\{ z\in\mathbb{C}: |z-a_{ii}| \le \displaystyle\sum_{j\ne i} |a_{ij}|\Big\}, \quad i=1\colon n.$

Proof. Let $\lambda$ be an eigenvalue of $A$ with corresponding eigenvector $x$, and let $|x_k| = \|x\|_\infty$. From the $k$th equation in $Ax=\lambda x$ we have

$\notag \sum_{j=1 \atop j\ne k}^n a_{kj}x_j = (\lambda-a_{kk}) x_k.$

Hence

$\notag |\lambda-a_{kk}| \le \sum_{j=1 \atop j\ne k}^n |a_{kj}||x_j|/|x_k|,$

and since $|x_j|/|x_k|\le 1$ it follows that $\lambda$ belongs to the $k$th disc, $D_k$.

The Gershgorin discs $D_i$ are defined in terms of a summation over the rows of $A$, but since the eigenvalues of $A$ are the same as those of $A^T$ the same result holds with summation over the columns.

A consequence of the theorem is that if $0$ does not belong to any of the Gershgorin discs then $A$ is nonsingular. This is equivalent to the well-known result that a strictly diagonally dominant matrix is nonsingular.
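A sketch of the disc computation and the nonsingularity test; the example matrix is an arbitrary strictly diagonally dominant one.

```python
# Gershgorin discs as (center, radius) pairs, plus the nonsingularity test:
# if no disc contains 0 then the matrix is nonsingular.

def gershgorin_discs(A):
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]

A = [[4.0, 1.0, 1.0],
     [0.5, -5.0, 1.0],
     [1.0, 1.0, 6.0]]
discs = gershgorin_discs(A)
print(discs)  # [(4.0, 2.0), (-5.0, 1.5), (6.0, 2.0)]

nonsingular = all(abs(c) > r for c, r in discs)  # 0 lies outside every disc
print(nonsingular)  # True
```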

Another consequence of the theorem is that if $A$ is symmetric and all the discs lie in the open right half-plane then the eigenvalues are positive and so $A$ is positive definite. This condition is equivalent to $A$ having positive diagonal elements and being strictly diagonally dominant.

The Gershgorin discs for the $5\times 5$ matrix

$\notag \left[\begin{array}{ccccc} 5/4 & 1 & 3/4 & 1/2 & 1/4 \\ 1 & 0 & 0 & 0 & 0\\ -1 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 3 & 0\\ 0 & 0 & 0 & 1/2 & 5 \end{array}\right]$

are shown here:

The eigenvalues—three real and one complex conjugate pair—are the black dots. It happens that each disc contains an eigenvalue, but this is not always the case. For the matrix

$\notag \left[\begin{array}{cc} 2 & -1\\ 2 & 0 \end{array}\right]$

the discs are

and the blue disc does not contain an eigenvalue. The next result, which is proved by a continuity argument, provides additional information that increases the utility of Gershgorin’s theorem. In particular it says that if a disc is disjoint from the other discs then it contains an eigenvalue.

Theorem 2.

If $k$ of the Gershgorin discs of $A\in\mathbb{C}^{n\times n}$ are disjoint from the other discs then their union contains $k$ eigenvalues of $A$.

Theorem 2 tells us that the rightmost disc in our $5\times 5$ example contains one eigenvalue, $\lambda$, since it is disjoint from the other discs, and the union of the other four discs contains four eigenvalues. Furthermore, $\lambda$ must be real, because if not it occurs in a complex conjugate pair since the matrix is real, and as the disc is symmetric about the real axis $\overline{\lambda}$ would also lie in the disc, contradicting Theorem 2.

Gershgorin’s theorem is most useful for matrices that are close to being diagonal. A technique that can produce improved eigenvalue estimates is to apply the theorem to $D^{-1}AD$ for some nonsingular diagonal matrix $D$. This similarity transformation does not change the eigenvalues but it does change the discs, and the aim is to choose $D$ to reduce the radii of the discs. Consider our $5\times 5$ example. We know that there is one eigenvalue $\lambda$ in the rightmost disc and that it is real, so $4.5 \le \lambda \le 5.5$. For $D = \mathrm{diag}(1,1,1,1,4)$ the rightmost disc shrinks and remains disjoint from the others and we obtain the sharper bounds $4.875 \le \lambda \le 5.125$. The discs for $D^{-1}AD$ are shown here:
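A sketch of the scaled-disc computation in exact rational arithmetic, using the relation $(D^{-1}AD)_{ij} = a_{ij}d_j/d_i$ with the scaling $d = (1,1,1,1,4)$, which shrinks the last disc of the $5\times 5$ example to radius $1/8$.

```python
# Compute the last Gershgorin disc of D^{-1}*A*D for the 5x5 example.
from fractions import Fraction as F

A = [[F(5, 4), F(1), F(3, 4), F(1, 2), F(1, 4)],
     [F(1), F(0), F(0), F(0), F(0)],
     [F(-1), F(1), F(0), F(0), F(0)],
     [F(0), F(0), F(1), F(3), F(0)],
     [F(0), F(0), F(0), F(1, 2), F(5)]]
d = [F(1), F(1), F(1), F(1), F(4)]
n = 5

B = [[A[i][j] * d[j] / d[i] for j in range(n)] for i in range(n)]
center = B[4][4]
radius = sum(abs(B[4][j]) for j in range(n) if j != 4)
print(center - radius, center + radius)  # 39/8 41/8, i.e., 4.875 and 5.125
```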

## Notes

Most books on matrix analysis or numerical linear algebra include Gershgorin’s theorem.

Eigenvalue inclusion regions have been developed with discs replaced by more complicated shapes, such as Brauer’s ovals of Cassini.

Varga’s 2004 book is devoted to Gershgorin’s theorem and related results. It reproduces Gershgorin’s 1931 paper in an appendix.

## References

• S. Gerschgorin. Über die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk SSSR, 1:749–754, 1931.
• Richard S. Varga. Geršgorin and His Circles. Springer-Verlag, Berlin, Germany, 2004.


# What Is a Nilpotent Matrix?

An $n\times n$ matrix $A$ is nilpotent if $A^k =0$ for some positive integer $k$. A nonzero nilpotent matrix cannot have all its entries positive, for then every power would have all positive entries; zero or negative entries are needed for cancellation to take place in the matrix powers. The smallest $k$ for which $A^k =0$ is called the index of nilpotency. The index does not exceed $n$, as we will see below.

Here are some examples of nilpotent matrices.

$\notag \begin{aligned} A_1 &= \begin{bmatrix}0 & 1 \\ 0 & 0 \end{bmatrix}, \quad A_1^2=0,\\ A_2 &= \begin{bmatrix}0 & 1 & 1\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{bmatrix}, \quad A_2^3 = 0,\\ A_3 &= \begin{bmatrix}1 & -1 \\ 1 & -1 \end{bmatrix}, \quad A_3^2 = 0,\\ A_4 &= \begin{bmatrix}2 & 2 & 4\\ -4 & -4 & -8\\ 1 & 1 & 2 \end{bmatrix}, \quad A_4^2 = 0. \end{aligned}$

Matrix $A_1$ is the $2\times 2$ instance of the upper bidiagonal $p\times p$ matrix

$\notag N = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix}, \qquad (1)$

for which

$\notag N^2 = \begin{bmatrix} 0 & 0 & 1 & & \\ & 0 & \ddots & \ddots & \\ & & & \ddots & 1 \\ & & & \ddots & 0 \\ & & & & 0 \end{bmatrix}, \quad \dots, \quad N^{p-1} = \begin{bmatrix} 0 & 0 & \dots & 0 & 1 \\ & 0 & \ddots & & 0 \\ & & \ddots & \ddots & \vdots \\ & & & 0 & 0 \\ & & & & 0 \end{bmatrix}$

and $N^p = 0$. The superdiagonal of ones moves up to the right with each increase in the index of the power until it disappears off the top right corner of the matrix.
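This behavior of the powers of $N$ can be verified directly; here $p = 4$ is an arbitrary choice.

```python
# Powers of the p x p shift matrix N of (1): N^(p-1) is nonzero, N^p = 0.

p = 4
N = [[1 if j == i + 1 else 0 for j in range(p)] for i in range(p)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(p)) for j in range(p)]
            for i in range(p)]

Nk = N
for _ in range(p - 2):
    Nk = matmul(Nk, N)   # now Nk = N^(p-1)
print(any(v != 0 for row in Nk for v in row))  # True: N^(p-1) is nonzero
Np = matmul(Nk, N)
print(all(v == 0 for row in Np for v in row))  # True: N^p = 0
```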

Matrix $A_4$ has rank $1$ and was constructed using a general formula: if $A = xy^T$ with $y^Tx = 0$ then $A^2 = xy^T xy^T = (y^Tx) xy^T = 0$. We simply took orthogonal vectors $x =[2, -4, 1]^T$ and $y = [1, 1, 2]^T$.

If $A$ is nilpotent then every eigenvalue is zero, since $Ax = \lambda x$ with $x\ne 0$ implies $0 = A^kx = \lambda^k x$, or $\lambda = 0$. Consequently, the trace and determinant of a nilpotent matrix are both zero.

If $A$ is nilpotent and Hermitian or symmetric, or more generally normal ($A^*A = AA^*$), then $A = 0$, since such a matrix has a spectral decomposition $A = Q \mathrm{diag}(\lambda_i)Q^*$ and the matrix $\mathrm{diag}(\lambda_i)$ is zero. It is only for nonnormal matrices that nilpotency is a nontrivial property, and the best way to understand it is with the Jordan canonical form (JCF). The JCF of a matrix with only zero eigenvalues has the form $A = XJX^{-1}$, where $J = \mathrm{diag}(J_{m_1},J_{m_2}, \dots, J_{m_p})$ and $J_{m_i}$ is of the form (1), so that $J_{m_i}^{m_i} = 0$. It follows that the index of nilpotency is $k = \max\{\,m_i : i=1\colon p\,\} \le n$.

What is the rank of an $n\times n$ nilpotent matrix $A$? The minimum possible rank is $0$, attained for the zero matrix. The maximum possible rank is $n-1$, attained when the JCF of $A$ has just one Jordan block of size $n$. Any rank between $0$ and $n-1$ is possible: rank $j$ is attained when there is a Jordan block of size $j+1$ and all other blocks are $1\times 1$.

Finally, while a nilpotent matrix is obviously not invertible, like every matrix it has a Moore–Penrose pseudoinverse. The pseudoinverse of a Jordan block with eigenvalue zero is just the transpose of the block: $N^+ = N^T$ for $N$ in (1).


# What Is an Eigenvalue?

An eigenvalue of a square matrix $A$ is a scalar $\lambda$ such that $Ax = \lambda x$ for some nonzero vector $x$. The vector $x$ is an eigenvector of $A$ and it has the distinction of being a direction that is not changed on multiplication by $A$.

An $n\times n$ matrix has $n$ eigenvalues, counted with multiplicity. This can be seen by noting that $Ax = \lambda x$ is equivalent to $(\lambda I - A) x = 0$, which means that $\lambda I - A$ is singular, since $x\ne 0$. Hence $\det(\lambda I - A) = 0$. But

$\notag \det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1 \lambda + a_0$

is a scalar polynomial of degree $n$ (the characteristic polynomial of $A$) with nonzero leading coefficient and so has $n$ roots, which are the eigenvalues of $A$. Since $\det(\lambda I - A) = \det( (\lambda I - A)^T) = \det(\lambda I - A^T)$, the eigenvalues of $A^T$ are the same as those of $A$.

A real matrix may have complex eigenvalues, but they appear in complex conjugate pairs. Indeed $Ax = \lambda x$ implies $\overline{A}\overline{x} = \overline{\lambda} \overline{x}$, so if $A$ is real then $\overline{\lambda}$ is an eigenvalue of $A$ with eigenvector $\overline{x}$.

Here are some $2\times 2$ matrices and their eigenvalues.

$\notag \begin{aligned} A_1 &= \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \lambda = 1,1,\\ A_2 &= \begin{bmatrix}0 & 1 \\ 0 & 0 \end{bmatrix}, \quad \lambda = 0,0,\\ A_3 &= \begin{bmatrix}0 & 1 \\ -1 & 0 \end{bmatrix}, \quad \lambda = \mathrm{i}, -\mathrm{i}. \end{aligned}$

Note that $A_1$ and $A_2$ are upper triangular, that is, $a_{ij} = 0$ for $i>j$. For such a matrix the eigenvalues are the diagonal elements.
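For the $2\times 2$ case the eigenvalues can be computed from the characteristic polynomial $t^2 - \mathrm{trace}(A)\,t + \det(A) = 0$; here is a small sketch applied to the three matrices above.

```python
# Eigenvalues of a 2x2 matrix via the quadratic formula applied to the
# characteristic polynomial; cmath handles the complex case.
import cmath

def eig2(A):
    t = A[0][0] + A[1][1]                       # trace
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # determinant
    s = cmath.sqrt(t * t - 4 * d)
    return (t + s) / 2, (t - s) / 2

print(eig2([[1, 0], [0, 1]]))   # eigenvalues 1, 1
print(eig2([[0, 1], [0, 0]]))   # eigenvalues 0, 0
print(eig2([[0, 1], [-1, 0]]))  # eigenvalues i, -i
```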

A symmetric matrix ($A^T = A$) or Hermitian matrix ($A^* = A$, where $A^* = \overline{A}^T$) has real eigenvalues. A proof is $Ax = \lambda x \Rightarrow x^*A^* = \overline{\lambda} x^*$ so premultiplying the first equation by $x^*$ and postmultiplying the second by $x$ gives $x^*Ax = \lambda x^*x$ and $x^*Ax = \overline{\lambda} x^*x$, which means that $(\lambda-\overline{\lambda})x^*x = 0$, or $\lambda=\overline{\lambda}$ since $x^*x \ne 0$. The matrix $A_1$ above is symmetric.

A skew-symmetric matrix ($A^T = -A$) or skew-Hermitian complex matrix ($A^* = -A$) has pure imaginary eigenvalues. A proof is similar to the Hermitian case: $Ax = \lambda x \Rightarrow -x^*A = x^*A^* = \overline{\lambda} x^*$ and so $x^*Ax$ is equal to both $\lambda x^*x$ and $-\overline{\lambda} x^*x$, so $\lambda = -\overline{\lambda}$. The matrix $A_3$ above is skew-symmetric.

In general, the eigenvalues of a matrix $A$ can lie anywhere in the complex plane, subject to restrictions based on matrix structure such as symmetry or skew-symmetry, but they are restricted to the disc centered at the origin with radius $\|A\|$, because every eigenvalue satisfies $|\lambda| \le \|A\|$ for any consistent matrix norm $\|\cdot\|$.

Here are some example eigenvalue distributions, computed in MATLAB. (The eigenvalues are computed at high precision using the Advanpix Multiprecision Computing Toolbox in order to ensure that rounding errors do not affect the plots.) The second and third matrices are real, so the eigenvalues are symmetrically distributed about the real axis. (The first matrix is complex.)

Although this article is about eigenvalues we need to say a little more about eigenvectors. An $n\times n$ matrix $A$ with distinct eigenvalues has $n$ linearly independent eigenvectors. Indeed it is diagonalizable: $A = XDX^{-1}$ for some nonsingular matrix $X$ with $D = \mathrm{diag}(\lambda_i)$ the matrix of eigenvalues. If we write $X$ in terms of its columns as $X = [x_1,x_2,\dots,x_n]$ then $AX = XD$ is equivalent to $Ax_i = \lambda _i x_i$, $i=1\colon n$, so the $x_i$ are eigenvectors of $A$. The matrices $A_1$ and $A_3$ above both have two linearly independent eigenvectors.

If there are repeated eigenvalues there can be fewer than $n$ linearly independent eigenvectors. The matrix $A_2$ above has only one eigenvector: the vector $\left[\begin{smallmatrix}1 \\ 0 \end{smallmatrix}\right]$ (or any nonzero scalar multiple of it). This matrix is a Jordan block. The matrix $A_1$ shows that a matrix with repeated eigenvalues can nevertheless have a full set of linearly independent eigenvectors.

Here are some questions about eigenvalues.

• What matrix decompositions reveal eigenvalues? The answer is the Jordan canonical form and the Schur decomposition. The Jordan canonical form shows how many linearly independent eigenvectors are associated with each eigenvalue.
• Can we obtain better bounds on where eigenvalues lie in the complex plane? Many results are available, of which the most well-known is Gershgorin’s theorem.
• How can we compute eigenvalues? Various methods are available. The QR algorithm is widely used and is applicable to all types of eigenvalue problems.

Finally, we note that the concept of eigenvalue is more general than just for matrices: it extends to nonlinear operators on finite or infinite dimensional spaces.

## References

Many books include treatments of eigenvalues of matrices. We give just three examples.

• Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013.
• Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013.
• Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.


# What Is a Symmetric Indefinite Matrix?

A symmetric indefinite matrix $A$ is a symmetric matrix for which the quadratic form $x^TAx$ takes both positive and negative values. By contrast, for a positive definite matrix $x^TAx > 0$ for all nonzero $x$ and for a negative definite matrix $x^TAx < 0$ for all nonzero $x$.

A neat way to express the indefiniteness of $A$ is to say that there exist vectors $x$ and $y$ for which $(x^TAx)(y^TAy) < 0$.

A symmetric indefinite matrix has both positive and negative eigenvalues and in some sense is a typical symmetric matrix. For example, a random symmetric matrix is usually indefinite:

>> rng(3); B = rand(4); A = B + B'; eig(A)'
ans =
-8.9486e-01  -6.8664e-02   1.1795e+00   3.9197e+00


In general it is difficult to tell whether a symmetric matrix is indefinite or definite, but there is one easy-to-spot sufficient condition for indefiniteness: if the matrix has a zero diagonal element whose row contains a nonzero element then it is indefinite. Indeed if $a_{kk} = 0$ then $e_k^TAe_k = a_{kk} = 0$, where $e_k$ is the $k$th unit vector, so $A$ cannot be positive definite or negative definite. The existence of a nonzero element in the row of the zero diagonal element rules out the matrix being positive semidefinite ($x^TAx \ge 0$ for all $x$) or negative semidefinite ($x^TAx \le 0$ for all $x$).
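As a quick numerical illustration (a made-up example in Python, not from the original post), a symmetric matrix with a zero diagonal entry and a nonzero entry in that row has eigenvalues of both signs:

```python
import numpy as np

# A made-up symmetric matrix with a_11 = 0 and a nonzero entry in row 1:
# by the argument above it must be indefinite.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 2.0, 3.0],
              [0.0, 3.0, 4.0]])
evals = np.linalg.eigvalsh(A)
assert evals.min() < 0 < evals.max()   # eigenvalues of both signs
```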

An example of a symmetric indefinite matrix is a saddle point matrix, which has the block $2\times 2$ form

$\notag C = \begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix},$

where $A$ is symmetric positive definite and $B\ne0$. When $A$ is the identity matrix, $C$ is the augmented system matrix associated with the least squares problem $\min_y \|B^Ty - d\|_2$. Another example is the $n\times n$ reverse identity matrix $J_n$, illustrated by

$\notag J_4 = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix},$

which has eigenvalues $\pm1$ (exercise: how many $1$s and how many $-1$s?). A third example is a Toeplitz tridiagonal matrix with zero diagonal:

>> A = full(gallery('tridiag',5,1,0,1)), eig(sym(A))'
A =
     0     1     0     0     0
     1     0     1     0     0
     0     1     0     1     0
     0     0     1     0     1
     0     0     0     1     0
ans =
[-1, 0, 1, 3^(1/2), -3^(1/2)]
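The $\pm1$ eigenvalue counts of the reverse identity $J_n$ above can be read off from the trace, since $J_n$ is symmetric and orthogonal: there are $p = (n + \mathrm{trace}(J_n))/2$ eigenvalues $1$ and $q = (n - \mathrm{trace}(J_n))/2$ eigenvalues $-1$. A Python sketch:

```python
import numpy as np

# J_n is symmetric and orthogonal, so its eigenvalues are +/-1 and the
# trace gives the counts: p = (n + trace)/2 ones and q = (n - trace)/2
# minus-ones.  trace(J_n) is 0 for even n and 1 for odd n.
for n in (4, 5):
    J = np.fliplr(np.eye(n))                 # reverse identity J_n
    t = np.trace(J)
    p, q = (n + t) / 2, (n - t) / 2
    evals = np.linalg.eigvalsh(J)
    assert p == np.sum(np.isclose(evals, 1))
    assert q == np.sum(np.isclose(evals, -1))
```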


How can we exploit symmetry in solving a linear system $Ax = b$ with a symmetric indefinite matrix $A$? A Cholesky factorization does not exist, but we could try to compute a factorization $A = LDL^T$, where $L$ is unit lower triangular and $D$ is diagonal with both positive and negative diagonal entries. However, this factorization does not always exist and if it does, its computation in floating-point arithmetic can be numerically unstable. The simplest example of nonexistence is the matrix

$\notag \begin{bmatrix} 0 & 1\\ 1 & 1 \end{bmatrix} \ne \begin{bmatrix} 1 & 0 \\ \ell_{21} & 1 \end{bmatrix} \begin{bmatrix} d_{11} & 0 \\ 0 & d_{22}\end{bmatrix} \begin{bmatrix} 1 & \ell_{21}\\ 0 & 1\end{bmatrix}.$

The way round this is to allow $D$ to have $2 \times 2$ blocks. We can compute a block $\mathrm{LDL^T}$ factorization $PAP^T = LDL^T$, where $P$ is a permutation matrix, $L$ is unit lower triangular, and $D$ is block diagonal with diagonal blocks of size $1$ or $2$. Various pivoting strategies, which determine $P$, are possible, but the recommended one is the symmetric rook pivoting strategy of Ashcraft, Grimes, and Lewis (1998), which has the key property of producing a bounded $L$ factor. Solving $Ax = b$ now reduces to substitutions with $L$ and a solve with $D$, which involves solving $2\times 2$ linear systems for the $2\times 2$ blocks and doing divisions for the $1\times 1$ blocks (scalars).

MATLAB implements $\mathrm{LDL^T}$ factorization in its ldl function. Here is an example using Anymatrix:

>> A = anymatrix('core/blockhouse',4), [L,D,P] = ldl(A), eigA = eig(A)'
A =
  -4.0000e-01  -8.0000e-01  -2.0000e-01   4.0000e-01
  -8.0000e-01   4.0000e-01  -4.0000e-01  -2.0000e-01
  -2.0000e-01  -4.0000e-01   4.0000e-01  -8.0000e-01
   4.0000e-01  -2.0000e-01  -8.0000e-01  -4.0000e-01
L =
   1.0000e+00            0            0            0
            0   1.0000e+00            0            0
   5.0000e-01  -8.3267e-17   1.0000e+00            0
  -2.2204e-16  -5.0000e-01            0   1.0000e+00
D =
  -4.0000e-01  -8.0000e-01            0            0
  -8.0000e-01   4.0000e-01            0            0
            0            0   5.0000e-01  -1.0000e+00
            0            0  -1.0000e+00  -5.0000e-01
P =
     1     0     0     0
     0     1     0     0
     0     0     1     0
     0     0     0     1
eigA =
  -1.0000e+00  -1.0000e+00   1.0000e+00   1.0000e+00


Notice the $2\times 2$ blocks on the diagonal of $D$, each of which contains one negative eigenvalue and one positive eigenvalue. The eigenvalues of $D$ are not the same as those of $A$, but since $A$ and $D$ are congruent they have the same number of positive, zero, and negative eigenvalues.
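The factorization and the congruence argument can also be reproduced in Python, as a sketch, with SciPy's scipy.linalg.ldl, which computes a block $\mathrm{LDL^T}$ factorization via LAPACK (using Bunch–Kaufman partial pivoting rather than rook pivoting); the matrix here is a made-up symmetric indefinite example:

```python
import numpy as np
from scipy.linalg import ldl

# Block LDL^T factorization of a made-up symmetric indefinite matrix.
A = np.array([[0.0, 1.0, 2.0],
              [1.0, 1.0, 3.0],
              [2.0, 3.0, 1.0]])
L, D, perm = ldl(A)                  # L[perm] is unit lower triangular
assert np.allclose(L @ D @ L.T, A)   # the factorization reproduces A
# A and D are congruent, so by Sylvester's law of inertia they have the
# same numbers of positive, zero, and negative eigenvalues:
sA = np.sort(np.sign(np.linalg.eigvalsh(A)))
sD = np.sort(np.sign(np.linalg.eigvalsh(D)))
assert np.array_equal(sA, sD)
```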


# What Is a Toeplitz Matrix?

A matrix $T\in\mathbb{C}^{n\times n}$ is a Toeplitz matrix if $t_{ij} = t_{i-j}$, so that $T$ is defined by the $2n-1$ parameters $t_{1-n},\dots,t_{n-1}$. A Toeplitz matrix has constant diagonals. For $n = 4$:

$\notag T = \begin{bmatrix} t_0 & t_{-1} & t_{-2} & t_{-3}\\ t_1 & t_0 & t_{-1} & t_{-2}\\ t_2 & t_1 & t_0 & t_{-1}\\ t_3 & t_2 & t_1 & t_0 \end{bmatrix}.$

Toeplitz matrices arise in various problems, including analysis of time series, discretization of constant coefficient differential equations, and discretization of convolution equations $\int a(t-s)x(s)\,\mathrm{d} s = b(t)$.

Since a Toeplitz matrix depends on just $2n-1$ parameters, it is reasonable to expect that a linear system $Tx = b$ can be solved in fewer than the $O(n^3)$ flops that LU factorization would require. Indeed, methods are available that require only $O(n^2)$ flops; see Golub and Van Loan (2013) for details.
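A sketch of such an $O(n^2)$ solve, using SciPy's solve_toeplitz (based on Levinson recursion) on a symmetric positive definite Toeplitz matrix with entries $\rho^{|i-j|}$, $\rho = 1/2$:

```python
import numpy as np
from scipy.linalg import toeplitz, solve_toeplitz

# Solve Tx = b in O(n^2) flops by Levinson recursion.  T is the symmetric
# positive definite Toeplitz matrix with entries (1/2)^|i-j|.
n = 8
c = 0.5 ** np.arange(n)        # first column (and, by symmetry, first row)
b = np.ones(n)
x = solve_toeplitz(c, b)
assert np.allclose(toeplitz(c) @ x, b)
```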

Upper triangular Toeplitz matrices can be written in the form

$\notag T = \sum_{j=1}^n t_{1-j} N^{j-1}, \quad N = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix},$

where $N$ is upper bidiagonal with a superdiagonal of ones and $N^n = 0$. It follows that the product of two upper triangular Toeplitz matrices is again upper triangular Toeplitz, upper triangular Toeplitz matrices commute, and $T^{-1}$ is also an upper triangular Toeplitz matrix (assuming $t_0$ is nonzero, so that $T$ is nonsingular).
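These closure properties are easy to check numerically; the following Python sketch builds two random upper triangular Toeplitz matrices as polynomials in $N$ and verifies that they commute and that their product has constant diagonals:

```python
import numpy as np

# Two upper triangular Toeplitz matrices built as polynomials in the
# shift matrix N; their product is Toeplitz and they commute.
n = 5
N = np.diag(np.ones(n - 1), 1)      # ones on the superdiagonal, N^n = 0
rng = np.random.default_rng(1)
a = rng.standard_normal(n)
b = rng.standard_normal(n)
T1 = sum(a[j] * np.linalg.matrix_power(N, j) for j in range(n))
T2 = sum(b[j] * np.linalg.matrix_power(N, j) for j in range(n))
assert np.allclose(T1 @ T2, T2 @ T1)                 # commutativity
for k in range(1 - n, n):                            # constant diagonals
    d = np.diag(T1 @ T2, k)
    assert np.allclose(d, d[0])
```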

Tridiagonal Toeplitz matrices arise frequently:

$\notag T(c,d,e) = \begin{bmatrix} d & e & & \\ c & d & \ddots & \\ & \ddots & \ddots & e \\ & & c & d \end{bmatrix} \in\mathbb{C}^{n\times n}.$

The eigenvalues of $T(c,d,e)$ are

$\notag d + 2 (ce)^{1/2} \cos\biggl( \displaystyle\frac{k \pi}{n+1} \biggr), \quad k = 1:n.$
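A quick Python check of this formula against a numerical eigensolver, for the case $c = e = 1$, $d = 0$ (with $n = 5$, matching the earlier tridiagonal example):

```python
import numpy as np

# Tridiagonal Toeplitz matrix T(c,d,e) with c = e = 1, d = 0, and its
# eigenvalues d + 2*sqrt(c*e)*cos(k*pi/(n+1)), k = 1:n.
n, c, d, e = 5, 1.0, 0.0, 1.0
T = (c * np.diag(np.ones(n - 1), -1) + d * np.eye(n)
     + e * np.diag(np.ones(n - 1), 1))
k = np.arange(1, n + 1)
formula = d + 2 * np.sqrt(c * e) * np.cos(k * np.pi / (n + 1))
assert np.allclose(np.sort(np.linalg.eigvalsh(T)), np.sort(formula))
```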

The Kac–Murdock–Szegö matrix is the symmetric Toeplitz matrix

$\notag A(\rho) = \begin{bmatrix} 1 & \rho & \rho^2 & \dots & \rho^{n-1} \\ \rho & 1 & \rho & \dots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \rho \\ \rho^{n-1} & \rho^{n-2} & \dots & \rho & 1 \end{bmatrix}.$

In MATLAB, a Toeplitz matrix can be constructed using toeplitz(c,r), which produces the matrix with first column c and first row r. Example:

>> n = 5; A = toeplitz(1:n,[1 -2:-1:-n])
A =
     1    -2    -3    -4    -5
     2     1    -2    -3    -4
     3     2     1    -2    -3
     4     3     2     1    -2
     5     4     3     2     1


## References

• Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013. Section 4.7.


# What Is a Circulant Matrix?

An $n\times n$ circulant matrix is defined by $n$ parameters, the elements in the first row, and each subsequent row is a cyclic shift forward of the one above:

$\notag C = \begin{bmatrix} c_1 & c_2 & \dots & c_n \\ c_n & c_1 & \dots & \vdots \\ \vdots & \ddots & \ddots & c_2 \\ c_2 & \dots & c_n & c_1 \\ \end{bmatrix}.$

Circulant matrices have the important property that they are diagonalized by the discrete Fourier transform matrix

$\notag F_n = \Bigl(\exp( -2\pi \mathrm{i} (r-1)(s-1) / n )\Bigr)_{r,s=1}^n,$

which satisfies $F_n^*F_n = nI$, so that $n^{-1/2}F_n$ is a unitary matrix. ($F_n$ is a Vandermonde matrix with points the roots of unity.) Specifically,

$\notag F_n C F_n^{-1} = D = \mathrm{diag}(d_i). \qquad(1)$

Hence circulant matrices are normal ($C^*C = CC^*$). Moreover, the eigenvalues are given by $d = F_n Ce_1$, where $e_1$ is the first unit vector.

Note that one particular eigenpair is immediate, since $Ce = \bigl(\sum_{i=1}^n c_i\bigr) e$, where $e$ is the vector of ones.

The factorization (1) enables a circulant linear system to be solved in $O(n\log n)$ flops, since multiplication by $F_n$ can be done using the fast Fourier transform.
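A minimal Python sketch of this $O(n\log n)$ solve: the eigenvalues of a circulant matrix are the FFT of its first column, so $Cx = b$ becomes a pointwise division in Fourier space. (Note that SciPy's circulant builds the matrix from its first column; the matrix defined above from its first row is its transpose.)

```python
import numpy as np
from scipy.linalg import circulant

# FFT-based circulant solve: the eigenvalues of C are the FFT of its
# first column, so Cx = b is a pointwise division in Fourier space.
n = 8
col = np.r_[4.0, 1.0, np.zeros(n - 3), 1.0]  # diagonally dominant, nonsingular
C = circulant(col)                           # built from the first column
b = np.arange(1.0, n + 1)
x = np.fft.ifft(np.fft.fft(b) / np.fft.fft(col)).real
assert np.allclose(C @ x, b)
```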

A particular circulant matrix is the (up) shift matrix $K_n$, the $4\times 4$ version of which is

$\notag K_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}.$

It is not hard to see that

$C = c_1 I + c_2 K_n + c_3K_n^2 + \cdots + c_n K_n^{n-1}.$

Since powers of $K_n$ commute, it follows that any two circulant matrices commute (this is also clear from (1)). Furthermore, the sum and product of two circulant matrices is a circulant matrix and the inverse of a nonsingular circulant matrix is a circulant matrix.

One important use of circulant matrices is to act as preconditioners for Toeplitz linear systems. Several methods have been proposed for constructing a circulant matrix from a Toeplitz matrix, including copying the central diagonals and wrapping them around and finding the nearest circulant matrix to the Toeplitz matrix. See Chan and Ng (1996) or Chan and Jin (2017) for a summary of work on circulant preconditioners for Toeplitz systems.

An interesting circulant matrix is anymatrix('core/circul_binom',n) in the Anymatrix collection, which is the $n\times n$ circulant matrix whose first row has $i$th element $\binom{n}{i-1}$. The eigenvalues are $(1 + w^i)^n - 1$, $i=1:n$, where $w = \exp(2\pi\mathrm{i}/n)$. The matrix is singular when $n$ is a multiple of 6, in which case the null space has dimension 2. Example:

>> n = 6; C = anymatrix('core/circul_binom',n), svd(C)
C =
     1     6    15    20    15     6
     6     1     6    15    20    15
    15     6     1     6    15    20
    20    15     6     1     6    15
    15    20    15     6     1     6
     6    15    20    15     6     1
ans =
   6.3000e+01
   2.8000e+01
   2.8000e+01
   1.0000e+00
   2.0244e-15
   7.6607e-16
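A Python check of the stated eigenvalue formula, and of the dimension-2 null space for $n = 6$ (two of the values $(1+w^i)^n - 1$ vanish, consistent with the two tiny singular values in the output above):

```python
import numpy as np
from math import comb
from scipy.linalg import circulant

# Circulant matrix whose first row holds the binomial coefficients C(n, i-1).
n = 6
row = np.array([comb(n, i) for i in range(n)], dtype=float)
C = circulant(row).T                 # transpose: SciPy builds from a column
w = np.exp(2j * np.pi / n)
formula = np.array([(1 + w**i)**n - 1 for i in range(1, n + 1)])
evals = np.linalg.eigvals(C)
assert np.allclose(np.sort_complex(formula), np.sort_complex(evals))
assert np.sum(np.abs(formula) < 1e-8) == 2    # n = 6: null space dim is 2
```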


## Notes

A classic reference on circulant matrices is Davis (1994).
