What Is the Spectral Radius of a Matrix?

The spectral radius $\rho(A)$ of a square matrix $A\in\mathbb{C}^{n\times n}$ is the largest absolute value of any eigenvalue of $A$ :

$\notag \rho(A) = \max\{\, |\lambda|: \lambda~ \mbox{is an eigenvalue of}~ A\,\}.$

For Hermitian matrices (or more generally normal matrices, those satisfying $AA^* = A^*A$ ) the spectral radius is just the $2$ -norm, $\rho(A) = \|A\|_2$ . What follows is most interesting for nonnormal matrices.

Two classes of matrices for which the spectral radius is known are as follows.

Unitary matrices ( $U^*U=I$ ): these have all their eigenvalues on the unit circle and so $\rho(U) = 1$ .
Nilpotent matrices ( $A^p = 0$ for some positive integer $p$ ): such matrices have only zero eigenvalues, so $\rho(A) = 0$ , even though $\|A\|$ can be arbitrarily large.

The spectral radius of $A$ is not necessarily an eigenvalue, though it is if $A$ is nonnegative (see below).

Bounds

The spectral radius is bounded above by any consistent matrix norm (one satisfying $\|AB\| \le \|A\|\|B\|$ for all $A$ and $B$ ), for example any matrix $p$ -norm.

Theorem 1. For any $A\in\mathbb{C}^{n\times n}$ and any consistent matrix norm,

$\notag \rho(A) \le \|A\|.$

However, the spectral radius is not a norm and it can be zero when the $p$ -norms are of order $1$ , as illustrated by the nilpotent matrix

$\notag A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \;\Rightarrow\; \rho(A) = 0, \quad \|A\|_p = 1, \; 1 \le p \le \infty.$

Limit of Norms of Powers

The spectral radius can be expressed as a limit of norms of matrix powers.

Theorem 2 (Gelfand). For $A\in\mathbb{C}^{n\times n}$ and any matrix norm, $\rho(A) = \lim_{k\to\infty}\|A^k\|^{1/k}$ .

The theorem implies that for large enough $k$ , $\|A^k\|^{1/k}$ is a good approximation to $\rho(A)$ . The use of this formula with repeated squaring of $A$ has been analyzed by Friedland (1991) and Wilkinson (1965). However, squaring requires $O(n^3)$ flops for a full matrix, so this approach is expensive.

Spectral Radius of a Product

Little can be said about the spectral radius of a product of two matrices. The example

$\notag A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad AB = 0$

shows that we can have $\rho(AB) = 0$ when $\rho(A) = \rho(B) = 1$ . On the other hand, for

$\notag A = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}, \quad AB = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$

we have $1 = \rho(A)\rho(B) < \rho(AB) = 2$ .

Condition Number Lower Bound

For a nonsingular matrix $A$ , by applying Theorem 1 to $A$ and $A^{-1}$ we obtain

$\notag \begin{aligned}\notag \kappa(A) &= \|A\| \|A^{-1}\| \ge \rho(A) \rho(A^{-1})\\ &= \frac{ \max_i |\lambda_i| } { \min_i |\lambda_i| }. \end{aligned}$

This bound can be very weak for nonnormal matrices.

Power Boundedness

In many situations we wish to know whether the powers of a matrix converge to zero. The spectral radius gives a necessary and sufficient condition for convergence.

Theorem 3. For $A\in\mathbb{C}^{n\times n}$ , $A^k\to0$ as $k\to\infty$ if and only if $\rho(A) < 1$ .

The proof of Theorem 3 is straightforward if $A$ is diagonalizable and it can be done in general using the Jordan canonical form.

Computing the Spectral Radius

In this pseudocode, norm(x) denotes the $2$ -norm of x.

Choose n-vector q_0 such that norm(q_0) = 1.
for k=1,2,...                                         
    z_k = A q_{k-1}            % Apply A.
    q_k = z_k / norm(z_k)      % Normalize.     
    mu_k = q_k^*Aq_k           % Rayleigh quotient.
end

The normalization is to avoid overflow and underflow and the $|\mu_k|$ are approximations to $\rho(A)$ .

If $q_0$ has a nontrivial component in the direction of the eigenvector corresponding to the dominant eigenvalue then the power method converges linearly, with a constant that depends on the ratio of the spectral radius to the magnitude of the next largest eigenvalue in magnitude.

Here is an example where the power method converges quickly, thanks to the substantial ratio of $6.49$ between the spectral radius and next largest eigenvalue in magnitude.

>> rng(1); A = rand(4); eig_abs = abs(eig(A)), q = rand(4,1); 
>> for k = 1:5, q = A*q; q = q/norm(q); mu = q'*A*q; 
>>              fprintf('%1.0f %7.4e\n',k,mu)
>> end
eig_abs =
   1.3567e+00
   2.0898e-01
   2.5642e-01
   1.9492e-01
1 1.4068e+00
2 1.3559e+00
3 1.3580e+00
4 1.3567e+00
5 1.3567e+00

Nonnegative Matrices

For real matrices with nonnegative elements, much more is known about the spectral radius. Perron–Frobenius theory says that if $A\in\mathbb{R}^{n\times n}$ is nonnegative then $\rho(A)$ is an eigenvalue of $A$ and there is a nonnegative eigenvector $x$ such that $Ax = \rho(A)x$ . This is why if you generate a random matrix in MATLAB using the rand function, which produces random matrices with elements on $(0,1)$ , there is always an eigenvalue equal to the spectral radius:

>> rng(1);  A = rand(4); e = sort(eig(A))
e =
  -2.0898e-01
   1.9492e-01
   2.5642e-01
   1.3567e+00

If $A$ is stochastic, that is, it is nonnegative and has unit row sums, then $\rho(A) = 1$ . Indeed $e = [1,1,\dots,1]^T$ is an eigenvector corresponding to the eigenvalue $1$ , so $\rho(A) \ge 1$ , but $\rho(A) \le \|A\|_{\infty} = 1$ , by Theorem 1.

References

Shmuel Friedland, Revisiting Matrix Squaring, Linear Algebra Appl., 154–156, 59-63, 1991.
J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, 1965, pp.~615–617.

What Is a Hessenberg Matrix?

An $n\times n$ upper Hessenberg matrix $H$ has the property that $h_{ij} = 0$ for $i>j+1$ . For $n=4$ , the structure is

$\notag H = \begin{bmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \end{bmatrix}.$

$H$ is an upper triangular matrix with an extra subdiagonal.

A lower Hessenberg matrix $H$ is the transpose of an upper Hessenberg matrix. In the rest of this article, the Hessenberg matrices are upper Hessenberg.

Hessenberg matrices play a key role in the QR algorithm for computing the eigenvalues of a general matrix. The first step of the algorithm is to reduce $A$ to Hessenberg form by unitary Householder transformations and then carry out the QR iteration on the Hessenberg form.

Because $H$ is so close to being triangular, its LU factorization and QR factorization can both be computed in $O(n^2)$ flops. For example, the first stage of LU factorization needs to eliminate just one element, by adding a multiple of row $1$ to row $2$ (assuming $h_{11}\ne 0$ ):

$\notag H = \begin{bmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \end{bmatrix} \quad \to \quad \begin{bmatrix} \times & \times & \times & \times \\ \fbox{0} & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \end{bmatrix}.$

Partial pivoting can be used and adds no extra flops. For QR factorization, Givens rotations are used and just two rows need to be rotated on each step.

If the subdiagonal is nonzero, that is, $h_{i+1,i}\ne 0$ for all $i$ , then $H$ is said to be unreduced. In this case, $\mathrm{rank}(H - \lambda I ) \ge n-1$ for any $\lambda$ , which means that there is one linearly independent eigenvector associated with each eigenvalue of $H$ (equivalently, no eigenvalue appears in more than one Jordan block in the Jordan canonical form of $H$ ).

Unitary Hessenberg Matrices

A unitary Hessenberg matrix $H$ with positive subdiagonal elements can be represented in terms of $2n-1$ real parameters (the Schur parametrization), which are essentially the angles in the Givens QR factorization (note that in the QR factorization $H = QR$ , $R$ is triangular and unitary and hence diagonal). The MATLAB function gallery('randhess',n) generates random real orthogonal Hessenberg matrices by choosing the Schur parameters randomly:

>> n = 4; rng(1), H = gallery('randhess',n)
H =
  -8.6714e-01  -9.2330e-02  -4.8943e-01   3.5172e-04
  -4.9807e-01   1.6075e-01   8.5211e-01  -6.1236e-04
            0   9.8267e-01  -1.8538e-01   1.3322e-04
            0            0  -7.1864e-04  -1.0000e+00
>> check_unitary = norm(H'*H-eye(n))
check_unitary =
   1.1757e-16

Examples

A famous Hessenberg matrix with some interesting properties is the Frank Matrix. For example, the eigenvalues appear in reciprocal pairs $(\lambda,1/\lambda)$ . For $n = 5$ :

>> F = gallery('frank',5)
F =
     5     4     3     2     1
     4     4     3     2     1
     0     3     3     2     1
     0     0     2     2     1
     0     0     0     1     1

>> e = eig(F); sort([e 1./e])
ans =
   9.9375e-02   9.9375e-02
   2.8117e-01   2.8117e-01
   1.0000e+00   1.0000e+00
   3.5566e+00   3.5566e+00
   1.0063e+01   1.0063e+01

Another well-known Hessenberg matrix is the Grcar matrix, a Toeplitz matrix of the form

>> G = gallery('grcar',5)
G =
     1     1     1     1     0
    -1     1     1     1     1
     0    -1     1     1     1
     0     0    -1     1     1
     0     0     0    -1     1

It has interesting pseudospectra.

Companion matrices are upper Hessenberg with a unit subdiagonal:

$\notag C = \begin{bmatrix} a_{n-1} & a_{n-2} & \dots &\dots & a_0 \\ 1 & 0 & \dots &\dots & 0 \\ 0 & 1 & \ddots & & \vdots \\ \vdots & & \ddots & 0 & 0 \\ 0 & \dots & \dots & 1 & 0 \end{bmatrix}.$

References

W. B. Gragg, The QR Algorithm for Unitary Hessenberg Matrices, J. Comp. Appl. Math., 16 (1986), pp. 1-8.
Lloyd Trefethen and Mark Embree, Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators, Princeton University Press, Princeton, NJ, USA, 2005. For the Grcar matrix.

The Power of Bidiagonal Matrices

An upper bidiagonal matrix

$\notag B = \begin{bmatrix} b_{11} & b_{12} & & \\ & b_{22} & \ddots & \\ & & \ddots & b_{n-1,n} \\ & & & b_{nn} \end{bmatrix} \in\mathbb{C}^{n\times n}$

depends on just $2n-1$ parameters, which appear on the main diagonal and the superdiagonal.

Such matrices arise commonly, for example as the $U$ factor and the transpose of the $L$ factor in the LU factorization of tridiagonal matrices, and as the intermediate matrix in the computation of the singular value decomposition by the Golub–Reinsch algorithm.

Bidiagonal matrices have many interesting properties, especially when we consider products of them. We describe some of their properties here.

Inverse

Consider the inverse of a nonsingular $4 \times 4$ bidiagonal matrix:

$\notag \begin{bmatrix} a & x & 0 & 0 \\ & b & y & 0 \\ & & c & z \\ & & & d \end{bmatrix}^{-1} = \begin{bmatrix} \frac{1}{a} & -\frac{x}{a\,b} & \frac{x\,y}{a\,b\,c} & -\frac{x\,y\,z}{a\,b\,c\,d} \\[3pt] & \frac{1}{b} & -\frac{y}{b\,c} & \frac{y\,z}{b\,c\,d} \\[3pt] & & \frac{1}{c} & -\frac{z}{c\,d} \\[3pt] & & & \frac{1}{d} \end{bmatrix}.$

Notice that every element in the upper triangle is a product of off-diagonal elements of $B$ and inverses of diagonal elements, that the superdiagonals have alternating signs attached, and that there are no additions. These properties hold for general $n$ .

The consequences include:

the absolute values of the elements of $B^{-1}$ depend on $|B| = (|b_{ij}|)$ , as there is no cancellation in the formulas for $B^{-1}$ (unlike for general triangular matrices),
the singular values of $B$ depend only on $|B|$ (less obvious, and provable by a scaling argument).

Matrix Function

There is a simple formula for any function of $B$ , not just the inverse. Here, the $f[\dots]$ term is a divided difference.

Theorem 1. If $B\in\mathbb{C}^{n\times n}$ is upper bidiagonal then $F = f(B)$ is upper triangular with $f_{ii} = f(t_{ii})$ and

$\notag f_{ij} = b_{i,i+1} b_{i+1,i+2} \dots b_{j-1,j}\, f[b_{ii},b_{i+1,i+1}, \dots, b_{jj}], \quad j > i.$

A special case is the formula for a function of an $m\times m$ Jordan block:

$\notag f\left( \begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix} \right) = \begin{bmatrix} f(\lambda) & f'(\lambda) & \dots & \displaystyle\frac{f^{(m-1)}(\lambda)}{(m-1)!} \\ & f(\lambda) & \ddots & \vdots \\ & & \ddots & f'(\lambda) \\ & & & f(\lambda) \end{bmatrix}.$

Condition Number of Product of Bidiagonals

Perhaps the most interesting result is that if we have a factorization of a matrix $A\in\mathbb{C}^{n\times n}$ into a product of $2n-1$ bidiagonal matrices then we can compute the condition number $\kappa_{\infty}(A) = \|A\|_{\infty} \|A^{-1}\|_{\infty}$ in $O(n^2)$ flops and to high accuracy in floating-point arithmetic. Every matrix has such a factorization and in some cases the entries of the factors are known as explicit formulas. For example, we can compute the condition number of the Hilbert matrix $H_n$ to high accuracy in $O(n^2)$ flops.

$n$	$\kappa_{\infty}(H_n)$	Relative error for fast algorithm
4	2.84e4	1.28e-16
8	3.39e10	2.25e-16
16	5.06e22	3.67e-17
32	1.36e47	1.75e-15
64	1.10e96	1.77e-15

Any attempt to compute $\kappa_{\infty}(H_n)$ by explicitly forming $H_n$ is doomed to failure by the rounding errors incurred in the formation, no matter what algorithm is used for the computation.

All this analysis, and much more, is contained in

Nicholas J. Higham, The Power of Bidiagonal Matrices, ArXiv:2311.06609, November 2023.

which is based on the Hans Schneider Prize talk that I gave at The 25th Conference of the International Linear Algebra Society, Madrid, Spain, June 12-16, 2023.

What Is an Invariant Subspace?

A subspace $S$ of $\mathbb{C}^n$ is an invariant subspace for $A\in\mathbb{C}^{n \times n}$ if $AS \subseteq S$ , that is, if $x\in S$ implies $Ax \in S$ .

Here are some examples of invariant subspaces.

$\{0\}$ and $\mathbb{C}^n$ are trivially invariant subspaces of any $A$ .
The null space $\mathrm{null}(A) = \{ x: Ax = 0\}$ is an invariant subspace of $A$ because $x\in\mathrm{null}(A)$ implies $Ax = 0\in\mathrm{null}(A)$ .
If $x$ is an eigenvector of $A$ then $\mathrm{span}(x) = \{\, \alpha x: \alpha\in\mathbb{C} \,\}$ is a $1$ -dimensional invariant subspace, since $A\alpha x = \lambda \alpha x \in S$ , where $\lambda$ is the eigenvalue corresponding to $x$ .
The matrix

$\notag \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$

has a one-dimensional invariant subspace $\mathrm{span}(e_1)$ and a two-dimensional invariant subspace $\mathrm{span}(e_1, e_2)$ , where $e_i$ denotes the $i$ th column of the identity matrix.

Let $x_1,x_2, \dots,x_p\in\mathbb{C}^n$ be linearly independent vectors. Then $S = \mathrm{span}(x_1,x_2, \dots,x_p)$ is an invariant subspace of $A$ if and only if $Ax_i\in S$ for $i=1\colon p$ . Writing $X = [x_1,x_2,\dots,x_p]\in\mathbb{C}^{n \times p}$ , this condition can be expressed as

$\notag AX = XB, \qquad(1)$

for some $B\in\mathbb{C}^{p \times p}$ .

If $p = n$ in (1) then $AX = XB$ with $X$ square and nonsingular, so $X^{-1}AX = B$ , that is, $A$ and $B$ are similar.

Eigenvalue Relations

We denote by $\Lambda(A)$ the spectrum (set of eigenvalues) of $A$ and by $A^+$ the pseudoinverse of $A$ .

Theorem.

Let $A\in\mathbb{C}^{n\times n}$ and suppose that (1) holds for some full-rank $X\in\mathbb{C}^{n\times p}$ and $B\in\mathbb{C}^{p\times p}$ . Then $\Lambda(B)\subseteq\Lambda(A)$ . Furthermore, if $(\lambda,x)$ is an eigenpair of $A$ with $x\in\mathrm{range}(X)$ then $(\lambda,X^+x)$ is an eigenpair of $B$ .

Proof. If $(\lambda,z)$ is an eigenpair of $B$ then $AXz = XBz = \lambda Xz$ , and $X z\ne 0$ since the columns of $X$ are independent, so $(\lambda,Xz)$ is an eigenpair of $A$ .

If $(\lambda,x)$ is an eigenpair of $A$ with $x\in\mathrm{range}(X)$ then $x = Xz$ for some $z\ne0$ , and $z = X^+x$ , since $X$ being full rank implies that $X^+X = I$ . Hence

$\notag \lambda x = Ax = AXz = XBz.$

Multiplying on the left by $X^+$ gives $\lambda z = Bz$ , so $(\lambda,z)$ is an eigenpair of $B$ .

Block Triangularization

Assuming that $X$ in (1) has full rank $p$ we can choose $Y\in\mathbb{C}^{p \times (n-p)}$ so that $W = [X,\,Y]$ is nonsingular. Let $W^{-1} = \left[\begin{smallmatrix} G \\ H \end{smallmatrix}\right]$ and note that $W^{-1}W = I$ implies $GX = I$ and $HX = 0$ . We have

$\notag W^{-1}AW = \begin{bmatrix} G \\H \end{bmatrix} [AX,\, AY] = \begin{bmatrix} G \\H \end{bmatrix} [XB,\, AY] = \begin{bmatrix} B & GAY\\ 0 & HAY \end{bmatrix}, \qquad (2)$

which is block upper triangular. This construction is used in the proof of the Schur decomposition with $p=1$ , $x$ an eigenvector of unit $2$ -norm, and $W$ chosen to be unitary.

The Schur Decomposition

Suppose $A\in\mathbb{C}^{n \times n}$ has the Schur decomposition $Q^*AQ = R$ , where $Q$ is unitary and $R$ is upper triangular. Then $AQ = QR$ and writing $Q = [Q_1,\,Q_2]$ , where $Q_1$ is $n\times p$ , and

$\notag R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \\ \end{bmatrix},$

where $R_{11}$ is $p\times p$ , we have $A Q_1 = Q_1 R_{11}$ . Hence $Q_1$ is an invariant subspace of $A$ corresponding to the eigenvalues of $A$ that appear on the diagonal of $R_{11}$ . Since $p$ can take any value from $1$ to $n$ , the Schur decomposition provides a nested sequence of invariant subspaces of $A$ .

Notes and References

Many books on numerical linear algebra or matrix analysis contain material on invariant subspaces, for example

David S. Watkins. Fundamentals of Matrix Computations Third edition, Wiley, New York, USA, 2010.

The ultimate reference is perhaps the book by Gohberg, Lancaster, and Rodman, which has an accessible introduction but is mostly at the graduate textbook or research monograph level.

Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2006 (unabridged republication of book first published by Wiley in 1986).

What Is a Submatrix?

A submatrix of a matrix is another matrix obtained by forming the intersection of certain rows and columns, or equivalently by deleting certain rows and columns. More precisely, let $A$ be an $m\times n$ matrix and let $1\le i_1 < i_2 < \cdots < i_p \le m$ and $1\le j_1 < j_2 < \cdots < j_q \le n$ . Then the $p\times q$ matrix $B$ with $b_{rs} = a_{i_rj_s}$ is the submatrix of $A$ comprising the elements at the intersection of the rows indexed by $i_1,\dots,i_p$ and the columns indexed by $j_1,\dots,j_q$ . For example, for the matrix

$\notag \begin{bmatrix} ~\fbox{1} & \fbox{2} & 3\mskip2mu \\ ~~4 & 5 & 6\mskip2mu~ \\ ~\fbox{7} & \fbox{8} & 9~\mskip2mu \end{bmatrix} = \begin{bmatrix} ~\fbox{1} & \fbox{2} & 3\mskip2mu~ \\[1pt] ~\fbox{4} & 5 & 6\mskip2mu~ \\ 7 & \fbox{8} & 9\mskip2mu \end{bmatrix},$

shown with four elements highlighted in two different ways,

$\notag \begin{bmatrix} 1 & 2 \\ 7 & 8 \\ \end{bmatrix}$

is a submatrix (the intersection of rows $1$ and $3$ and columns $1$ and $2$ , or what is left after deleting row $2$ and column $3$ ), but

$\notag \begin{bmatrix} 1 & 2 \\ 4 & 8 \\ \end{bmatrix}$

is \emph{not} a submatrix.

Submatrices include the $mn$ matrix elements and the matrix itself, but there are many of intermediate size: an $m\times n$ matrix has $(2^m-1)(2^n-1)$ submatrices in total (counting both square and nonsquare submatrices).

If $p = q$ and $i_k = j_k$ , $k = 1\colon p$ , then $B$ is a principal submatrix of $A$ , which is a submatrix symmetrically located about the diagonal. If, in addition, $i_k = k$ , $k=1\colon p$ , then $B$ is a leading principal submatrix of $A$ , which is one situated in the top left corner of $A$ .

The determinant of a square submatrix is called a minor. The Laplace expansion of the determinant expresses the determinant as a weighted sum of minors.

The Colon Notation

In various programming languages, notably MATLAB, and in numerical linear algebra, a colon notation is used to denote submatrices consisting of contiguous rows and columns.

For integers $p$ and $q$ we denote by $p\colon q$ the sequence $p,p+1,\dots,q$ . Thus $i = 1\colon n$ is another way of writing $i = 1,2,\dots,n$ .

We write $A(p\colon q,r \colon s)$ for the submatrix of $A$ comprising the intersection of rows $p$ to $q$ and columns $r$ to $s$ , that is,

$A(p\colon q,r \colon s) = \begin{bmatrix}a_{pr} & \cdots & a_{ps} \\ \vdots & \ddots & \vdots\cr a_{qr} & \cdots & a_{qs} \end{bmatrix}.$

We can think of $A(p\colon q,r \colon s)$ as a projection of $A$ using the corresponding rows and columns of the identity matrix:

$\notag A(p\colon q,r \colon s) = I(p\colon q,:) \,A \,I(:,r\colon s).$

As special cases, $A(k,\colon)$ denotes the $k$ th row of $A$ and $A(\colon,k)$ the $k$ th column of $A$ .

Here are some examples of using the colon notation to extract submatrices in MATLAB. Rows and columns can be indexed by a range using the colon notation or by specifying the required indices in a vector. The matrix used is from the Anymatrix collection.

>> A = anymatrix('core/beta',5)
A =
     1     2     3     4     5
     2     6    12    20    30
     3    12    30    60   105
     4    20    60   140   280
     5    30   105   280   630

>> A(3:4, [2 4 5]) % Rectangular submatrix.
ans =
    12    60   105
    20   140   280

>> A(1:2,4:5)      % Square, but nonprincipal, submatrix.
ans =
     4     5
    20    30

>> A([3 5],[3 5])  % Principal submatrix.
ans =
    30   105
   105   630

Block Matrices

Submatrices are intimately associated with block matrices, which are matrices in which the elements are themselves matrices. For example, a $4\times 4$ matrix $A$ can be regarded as a block $2\times 2$ matrix, where each element is a $2\times 2$ submatrix of $A$ :

$\notag A = \left[\begin{array}{cc|cc} a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24}\\\hline a_{31} & a_{32} & a_{33} & a_{34}\\ a_{41} & a_{42} & a_{43} & a_{44}\\ \end{array}\right] = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$

where

$\notag A_{11} = A(1\colon2,1\colon2) = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},$

and likewise for the other three blocks.

What Is a Flop?

A flop is one of the elementary arithmetic operations $+$ , $-$ , $*$ , $/$ carried out on floating-point numbers. For example, evaluating the expression $(b - ax)/c$ takes three flops. A square root, which occurs infrequently in numerical computation, is also counted as one flop.

As an example, the computation of the inner product $x^Ty$ of two $n$ -vectors $x$ and $y$ can be written

s = x(1)*y(1)
for i = 2:n
    s = s + x(i)*y(i)
end

which requires $n$ multiplications and $n-1$ additions, or $2n-1$ flops. A matrix multiplication $AB$ of two $n\times n$ matrices involves computing the inner products of every row of $A$ with every column of $B$ , that is $n^2$ inner products, costing $2n^3 - n^2$ flops. As we are usually interested in flop counts for large dimensions we retain only the highest order terms, so we regard $AB$ as costing $2n^3$ flops.

The number of flops required by a numerical computation is one measure of its cost. However, it ignores the cost of data movement, which can be as large as, or even larger, than the cost of floating-point arithmetic for large computations implemented on modern machines with hierarchical memories. Nevertheless, flop counts provide a useful way to compare the cost of competing algorithms when most of the flops occur in similar types of operations, such as matrix multiplications, and they can be used to choose between algorithm variants (López et al., 2022).

This table summarizes some flop counts, where $\alpha$ is a scalar, $x,y$ are $n$ -vectors, and $A,B$ are $n\times n$ matrices, and $A^{-1}$ is computed via LU factorization.

Computation	Cost
$x + \alpha y$	$2n$ flops
$x^Ty$	$2n$ flops
$Ax$	$2n^2$ flops
$AB$	$2n^3$ flops
$A^{-1}$	$2n^3$ flops

As the table suggests, most standard problems involving $n\times n$ matrices can be solved with a cost of order $n^3$ flops or less, so the interest is in the exponent ( $1$ , $2$ , or $3$ ) of the dominant term and the multiplicative constant. For $m\times n$ matrices there can be several potentially dominant terms $m^kn^\ell$ and comparing competing algorithms is more complicated.

Early Usage

Moler and Van Loan give a different definition of “flop” in their classic paper “Nineteen Dubious Ways to Compute the Exponential of a Matrix” (1978), and their definition was adopted in the flop command in the original Fortran version of MATLAB, which counts the number of flops executed in the most recent command or since MATLAB started. In the 1981 MATLAB Users’ Guide, Cleve Moler explains that

One flop is one execution of a Fortran statement like S = S + X(I)*Y(I) or Y(I) = Y(I) + T*X(I). In other words, it consists of one floating point multiplication, together with one floating point addition and the associated indexing and storage reference operations.

This original definition attempted to account for indexing and storage costs in the computer implementation of an algorithm as well as the cost of the floating-point arithmetic. It was motivated by the fact that in linear algebra computations most arithmetic appears in statements of the form s = s + x*y, which appear in evaluating inner products and in adding one multiple of a vector to another.

In the LINPACK Users’ Guide (1979, App. B) flops were used to normalize timing data, but the word “flop” was not used.

The widespread adoption of the flop was ensured by its use in Golub and Van Loan’s book Matrix Computations (1981). In the 1989 second edition of the book a flop was redefined as in our first paragraph and this definition quickly became the standard usage. The Fortran statements given by Moler now involve two flops.

The MATLAB flop function survived until around 2000, when MATLAB started using LAPACK in place of LINPACK for its core linear algebra computations and it became no longer feasible to keep track of the number of flops executed.

It is interesting to note that an operation of the form S + X(I)*Y(I) that was used in the original definition of flop is now supported in the hardware of certain processors. The operation is called a fused multiply-add and is done in the same time as one multiplication or addition and with one rounding error.

Flops as a Measure of Computer Performance

Another usage of “flop”, in the plural, is in measuring computer performance, with “flops” meaning floating-point operations per second and having a prefix specifying a power of ten. Thus we have the terms megaflops, gigaflops, through to exaflops ( $10^{18}$ floating-point operations per second) for today’s fastest computers.

The earliest citation given in the Oxford English Dictionary for this use of the word “flops” is in a 1977 paper by Calahan, who writes

The most common vector performance measure is the number of floating point operations per second (FLOPS), obtained by dividing the number of floating point operations—known from the algorithmic complexity—by the computation time.

The term “MFLOPS” for “million floating point operations per second” is used in an earlier Cray-1 evaluation report (Keller, 1976).

If you are aware of any earlier references please put them in the comments below.

Dictionary Definitions

The word “flop” appears in general dictionaries, but there is some variation in whether it appears under “flop” or “flops”, in capitalization, and whether it means an operation or an execution rate.

Dictionary	Term	Definition
Oxford English Dictionary (online, 2023)	FLOP	“A floating-point operation per second”
Concise Oxford English Dictionary (11th ed., 2004)	flop	“Floating-point operations per second”
The Chambers Dictionary (13th ed., 2014)	flop	“A floating-point operation”
Collins English Dictionary (13th ed., 2018)	flops or FLOPS	“Floating-point operations per second”
The American Heritage Dictionary of the English Language (5th ed., 2018)	flops or flop	“A measure of the speed of a computer in operations per second, especially operations involving floating-point numbers”
Merriam-Webster (online, 2023)	flops	“A unit of measure for calculating the speed of a computer equal to one floating-point operation per second”

Notes and References

I thank Jack Dongarra, Cleve Moler, and Charlie Van Loan for comments on the history of flops.

D. A. Calahan, Algorithmic and Architectural Issues Related to Vector Processors, in: Large Engineering Systems. Proceedings of the International Symposium held at the University of Manitoba, Winnipeg, Manitoba, Canada August 9–12, 1976, Alvin Wexler, ed., Pergamon Press, Oxford, 1977.
T. Keller, CRAY-1 Evaluation. Final Report, Report LA-6456-MS, Los Alamos Scientific Laboratory, Los Alamos, New Mexico, USA, December 1976.
Francisco López, Lars Karlsson, and Paolo Bientinesi, FLOPs as a Discriminant for Dense Linear Algebra Algorithms, in Proceedings of the 51st International Conference on Parallel Processing, ACM Press, 2022.
Cleve B. Moler and Charles F. Van Loan, Nineteen Dubious Ways to Compute the Exponential of a Matrix, SIAM Rev. 20(4), 801–836. 1978.

Hamilton, the Quaternions, and Creativity

Complex numbers have the form

$\notag a + \mathrm{i}b, \quad \mathrm{i}^2 = -1,$

where $\mathrm{i}$ is the imaginary unit. Quaternions contain two more imaginary units, $\mathrm{j}$ and $\mathrm{k}$ :

$\notag a + \mathrm{i}b + \mathrm{j}c + \mathrm{k}d, \quad \mathrm{i}^2 = \mathrm{j}^2 = \mathrm{k}^2 = \mathrm{ijk} = -1.$

Sir William Rowan Hamilton discovered the quaternions in 1843 as he walked along the Royal Canal in Dublin, and he famously carved the formulas on a stone of Brougham Bridge. He had for some time been trying to extend the “couplets” $a + \mathrm{i}b$ to “triplets” $a + \mathrm{i}b + \mathrm{j}c$ , but he realized that the requirement $|z_1z_2| = |z_1||z_2|$ for the product of two such numbers could not hold. His key insight was to realize that a fourth imaginary unit, $\mathrm{k}$ , was required and that multiplication would be noncommutative: $z_1z_2 \ne z_2z_1$ in general.

Sir William Rowan Hamilton. Etching after J. Kirkwood. Credit: Wellcome Library, London. Wellcome Images. Source.

We do not know exactly how Hamilton discovered the quaternions, though his writings provide insights. One way in which he could have constructed the quaternions is by starting with the complex numbers and asking for each of their properties “How might this be different?” Four terms rather than two and noncommutative multiplication are the differences. This approach of identifying the key features of a problem and asking, for each one, “How might this be different?”, is a powerful process for creativity and discovery—a process that anyone can apply to any problem.

For more on the discovery of the quaternions and on how to generate ideas see the article Hamilton’s Discovery of the Quaternions and Creativity in Mathematics by Dennis Sherwood and I in the LMS Newsletter.

What Is the Pseudoinverse of a Matrix?

The pseudoinverse is an extension of the concept of the inverse of a nonsingular square matrix to singular matrices and rectangular matrices. It is one of many generalized inverses, but the one most useful in practice as it has a number of special properties.

The pseudoinverse of a matrix $A\in\mathbb{C}^{m\times n}$ is an $n \times m$ matrix $X$ that satisfies the Moore–Penrose conditions

$\notag \begin{array}{rcrc} \mathrm{(1)} & AXA = A, \; & \mathrm{(2)} & XAX=X,\\ \mathrm{(3)} & (AX)^* = AX, \; & \mathrm{(4)} & (XA)^* = XA. \end{array}$

Here, the superscript $*$ denotes the conjugate transpose. It can be shown that there is a unique $X$ satisfying these equations. The pseudoinverse is denoted by $A^+$ ; some authors write $A^\dagger$ .

The most important property of the pseudoinverse is that for any system of linear equations $Ax = b$ (overdetermined or underdetermined) $x = A^+b$ minimizes $\|Ax - b\|_2$ and has the minimum $2$ -norm over all minimizers. In other words, the pseudoinverse provides the minimum $2$ -norm least squares solution to $Ax = b$ .

The pseudoinverse can be expressed in terms of the singular value decomposition (SVD). If $A = U\Sigma V^*$ is an SVD, where the $m\times m$ matrix $U$ and $n\times n$ matrix $V$ are unitary and $\Sigma = \mathrm{diag}(\sigma_1,\dots , \sigma_p)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots =\sigma_p = 0$ (so that $\mathrm{rank}(A) = r$ ) with $p = \min(m,n)$ , then

$\notag A^+ = V\mathrm{diag}(\sigma_1^{-1},\dots,\sigma_r^{-1},0,\dots,0)U^*, \qquad (1)$

where the diagonal matrix is $n\times m$ . This formula gives an easy way to prove many identities satisfied by the pseudoinverse. In MATLAB, the function pinv computes $A^+$ using this formula.

From the Moore–Penrose conditions or (1) it can be shown that $(A^+)^+ = A$ and $(A^*)^+ = (A^+)^*$ .

For full rank $A$ we have the concise formulas

$\notag A^+ = \begin{cases} (A^*A) ^{-1}A^*, & \textrm{if}~\mathrm{rank}(A) = n \le m, \\ A^*(AA^*)^{-1}, & \textrm{if}~ \mathrm{rank}(A) = m \le n. \end{cases} \qquad (2)$

Consequently,

$\notag A^+A = I_n ~~\mathrm{if}~ \mathrm{rank}(A) = n, \qquad AA^+ = I_m ~~\mathrm{if}~ \mathrm{rank}(A) = m.$

Some special cases are worth noting.

The pseudoinverse of a zero $m\times n$ matrix is the zero $n\times m$ matrix.
The pseudoinverse of a nonzero vector $x\in\mathbb{C}^n$ is $x^*/(x^*x)$ .
For $x,y\in\mathbb{C}^n$ , $(xy^*)^+ = (y^*)^+ x^+$ and if $x$ and $y$ are nonzero then $(xy^*)^+ = yx^*/ (x^*x\cdot y^*y)$ .
The pseudoinverse of a Jordan block with eigenvalue $0$ is the transpose:

$\notag \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ \end{bmatrix}^+ = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$

The pseudoinverse differs from the usual inverse in various respects. For example, the pseudoinverse of a triangular matrix is not necessarily triangular (here we are using MATLAB with the Symbolic Math Toolbox):

>> A = sym([1 1 1; 0 0 1; 0 0 1]), X = pinv(A)
A =
[1, 1, 1]
[0, 0, 1]
[0, 0, 1]
X =
[1/2, -1/4, -1/4]
[1/2, -1/4, -1/4]
[  0,  1/2,  1/2]

Furthermore, it is not generally true that $(AB)^+ = B^+A^+$ for $A\in\mathbb{C}^{m\times n}$ and $B\in\mathbb{C}^{n\times p}$ . A sufficient condition for this equality to hold is that $\mathrm{rank}(A) = \mathrm{rank}(B) = n$ .

It is not usually necessary to compute the pseudoinverse, but if it is required it can be obtained using (1) or (2) or from the Newton–Schulz iteration

$\notag X_{k+1} = 2X_k - X_kAX_k, \quad X_0 = \alpha A^*,$

for which $X_k \to A^+$ as $k\to\infty$ if $0 < \alpha < 2/\|A\|_2^2$ . The convergence is at an asymptotically quadratic rate. However, about $2\log_2\kappa_2(A)$ iterations are required to reach the asymptotic phase, where $\kappa_2(A) = \|A\|_2 \|A^+\|_2$ , so the iteration is slow to converge when $A$ is ill conditioned.

Notes and References

The pseudoinverse was first introduced by Eliakim Moore in 1920 and was independently discovered by Roger Penrose in 1955. For more on the pseudoinverse see, for example, Ben-Israel and Greville (2003) or Campbell and Meyer (2009). For analysis of the Newton–Schulz iteration see Pan and Schreiber (1991).

Adi Ben-Israel and Thomas N. E. Greville, Generalized Inverses: Theory and Applications, second edition, Springer-Verlag, New York, 2003
Stephen Campbell and Carl Meyer, Generalized Inverses of Linear Transformations, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2009. published (Originally published by Pitman in 1979.)
Victor Pan and Robert Schreiber, An Improved Newton Iteration for the Generalized Inverse of a Matrix, with Applications, SIAM J. Sci. Statist. Comput. 12 (5), 1109–1130, 1991.

What Is the Numerical Range of a Matrix?

The numerical range of a matrix $A\in\mathbb{C}^{n\times n}$ , also known as the field of values, is the set of complex numbers

$W(A) = \biggl\{\, \displaystyle\frac{z^*Az}{z^*z}: 0\ne z\in\mathbb{C}^n \,\biggr\}.$

The set $W(A)$ is compact and convex (a nontrivial property proved by Toeplitz and Hausdorff), and it contains all the eigenvalues of $A$ . For normal matrices it is the convex hull of the eigenvalues. For a Hermitian matrix, $W(A)$ is a segment of the real axis, while for a skew-Hermitian matrix it is a segment of the imaginary axis.

The following figure plots in blue the boundaries of the the numerical ranges of nine $16\times 16$ matrices, with the eigenvalues shown as black dots. They were plotted using the function fv in the Matrix Computation Toolbox. The matrices are from Anymatrix.

A method for computing the boundary of the numerical range is based on the observation that for any $z\in\mathbb{C}$ ,

$\notag z^*Az = \underbrace{z^* \frac{1}{2}(A+A^*) z}_{\mathrm{\phantom{1234}real\phantom{1234}}} + \underbrace{z^* \frac{1}{2}(A-A^*) z}_{\mathrm{pure~imaginary}} \qquad (1)$

This implies that the real part of $z^*Az/z^*z$ lies between the largest and smallest eigenvalues of the Hermitian matrix $(A+A^*)/2$ , which define a vertical strip in which the numerical range lies. Since $W(\mathrm{e}^{\mathrm{i}\theta} A) = \mathrm{e}^{\mathrm{i}\theta} W(A)$ , we can apply the same reasoning to the rotated matrix $A(\theta) = \mathrm{e}^{\mathrm{i}\theta} A$ , and taking a range of $\theta\in[0,\pi]$ we obtain an approximation the boundary of the numerical range.

The quantity

$\notag \omega(A) = \max \bigl\{\, \mathrm{Re}\, w : w \in W(A) \,\bigr\}$

is called the numerical abscissa, and by (1), $\omega(A)$ is the largest eigenvalue of the Hermitian matrix $(A+A^*)/2$ . The numerical abscissa determines the rate of growth of $\|\mathrm{e}^{tA}\|$ for small positive $t$ .

Associated with the numerical range is the numerical radius

$\notag r(A) = \max\{\, |w| : w \in W(A) \,\}.$

Note that $r(A^*) = r(A)$ . Also, $r(A) \ge \rho(A)$ , where $\rho$ is the spectral radius (the largest absolute value of any eigenvalue), since $W(A)$ contains the eigenvalues of $A$ .

The numerical radius differs by at most a factor $2$ from the $2$ -norm:

$\notag \frac{1}{2} \|A\|_2 \le r(A) \le \|A\|_2. \qquad (2)$

When $A$ is normal, $r(A) = \|A\|_2$ .

The numerical radius is a matrix norm, but not a consistent norm (that is, $r(AB) \le r(A)r(B)$ does not hold in general). However, it is it true that

$\notag r(A^k) \le r(A)^k, \quad k\ge 1.$

Combining this with with the lower bound in (2) gives

$\notag \|A^k\|_2 \le 2 r(A)^k, \quad k \ge 1,$

so if we know $r(A)$ then we can bound $\|A^k\|_2$ for all $k$ .

The numerical radius can be characterized as the solution of an optimization problem over the Hermitian matrices:

$\notag r(A) = - \max\biggl\{\, \lambda_{\min}\biggl( \begin{bmatrix} Z\!\! & A \\ A^*\!\! & -Z \end{bmatrix} \biggr): Z=Z^*\in\mathbb{C}^{n\times n} \,\biggr\}.$

Notes and References

For proofs of the results given here see Horn and Johnson (2013) or Horn and Johnson (1991), and see the latter for details of the algorithm for computing the numerical range. See Benzi (2020) for a discussion of applications of the numerical range in numerical analysis.

Michele Benzi, Some Uses of the Field of Values in Numerical Analysis, Bollettino dell’Unione Matematica Italiana 14, 159–177, 2020.
Roger Horn and Charles Johnson, Topics in Matrix Analysis, Cambridge University Press, 1991. Chapter 1.
Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.

Wilkinson’s Rounding Errors Book Reprinted by SIAM

James Wilkinson’s 1963 book Rounding Errors in Algebraic Processes has been hugely influential. It came at a time when the effects of rounding errors on numerical computations in finite precision arithmetic had just starting to be understood, largely due to Wilkinson’s pioneering work over the previous two decades. The book gives a uniform treatment of error analysis of computations with polynomials and matrices and it is notable for making use of backward errors and condition numbers and thereby distinguishing problem sensitivity from the stability properties of any particular algorithm.

The book was originally published by Her Majesty’s Stationery Office in the UK and Prentice-Hall in the USA. It was reprinted by Dover in 1994 but has been out of print for some time. SIAM has now reprinted the book in its Classics in Applied Mathematics series. It is available from the SIAM Bookstore and, for those with access to it, the SIAM Institutional Book Collection.

I was asked to write a foreword to the book, which I include below. The photo shows Wilkinson lecturing at Argonne National Laboratory. For more about Wilkinson see the James Hardy Wilkinson web page.

Foreword

Rounding Errors in Algebraic Processes was the first book to give detailed analyses of the effects of rounding errors on a variety of key computations involving polynomials and matrices.

The book builds on James Wilkinson’s 20 years of experience in using the ACE and DEUCE computers at the National Physical Laboratory in Teddington, just outside London. The original design for the ACE was prepared by Alan Turing in 1945, and after Turing left for Cambridge Wilkinson led the group that continued to design and build the machine and its software. The ACE made its first computations in 1950. A Cambridge educated mathematician, Wilkinson was perfectly placed to develop and analyze numerical algorithms on the ACE and the DEUCE (the commercial version of the ACE). The intimate access he had to the computers (which included the ability to observe intermediate computed quantities on lights on the console) helped Wilkinson to develop a deep understanding of the numerical methods he was using and their behavior in finite precision arithmetic.

The principal contribution of the book is the analysis of the effects of rounding errors on algorithms, using backward error analysis (then in its infancy) or forward error analysis as appropriate. The book laid the foundations for the error analysis of algebraic processes on digital computers.

Three types of computer arithmetic are analyzed: floating-point arithmetic, fixed-point arithmetic (in which numbers are represented as a number on a fixed interval such as $[-1,1]$ with a fixed scale factor), and block floating-point arithmetic, which is a hybrid of floating-point arithmetic and fixed-point arithmetic. Although floating-point arithmetic dominates today’s computational landscape, fixed-point arithmetic is widely used in digital signal processing and block floating-point arithmetic is enjoying renewed interest in machine learning.

A notable feature of the book is its careful consideration of problem sensitivity, as measured by condition numbers, and this approach inspired future generations of numerical analysis textbooks.

Wilkinson recognized that the worst-case error bounds he derived are pessimistic and he noted that more realistic bounds are obtained by taking the square roots of the dimension-dependent terms (see pages 26, 52, 102, 151). In recent years, rigorous results have been derived that support Wilkinson’s intuition: they show that bounds with the square roots of the constants hold with high probability under certain probabilistic assumptions on the rounding errors.

It is entirely fitting that Rounding Errors in Algebraic Processes is reprinted in the SIAM Classics series. The book is a true classic, it continues to be well-cited, and it will remain a valuable reference for years to come.

Bounds

Limit of Norms of Powers

Spectral Radius of a Product

Condition Number Lower Bound

Power Boundedness

Computing the Spectral Radius

Nonnegative Matrices

References

Related Blog Posts

Share this:

Unitary Hessenberg Matrices

Examples

References

Related Blog Posts

Share this:

Inverse

Matrix Function

Condition Number of Product of Bidiagonals

Share this:

Eigenvalue Relations

Block Triangularization

The Schur Decomposition

Notes and References

Related Blog Posts

Share this:

The Colon Notation

Block Matrices

Related Blog Posts

Share this:

Early Usage

Flops as a Measure of Computer Performance

Dictionary Definitions

Notes and References

Related Blog Posts

Share this:

Share this:

Notes and References

Related Blog Posts

Share this:

Notes and References

Related Blog Posts

Share this:

Foreword

Share this: