Wilkinson’s Rounding Errors Book Reprinted by SIAM

James Wilkinson’s 1963 book Rounding Errors in Algebraic Processes has been hugely influential. It came at a time when the effects of rounding errors on numerical computations in finite precision arithmetic had just started to be understood, largely due to Wilkinson’s pioneering work over the previous two decades. The book gives a uniform treatment of error analysis of computations with polynomials and matrices, and it is notable for making use of backward errors and condition numbers and thereby distinguishing problem sensitivity from the stability properties of any particular algorithm.

The book was originally published by Her Majesty’s Stationery Office in the UK and Prentice-Hall in the USA. It was reprinted by Dover in 1994 but has been out of print for some time. SIAM has now reprinted the book in its Classics in Applied Mathematics series. It is available from the SIAM Bookstore and, for those with access to it, the SIAM Institutional Book Collection.

I was asked to write a foreword to the book, which I include below. The photo shows Wilkinson lecturing at Argonne National Laboratory. For more about Wilkinson see the James Hardy Wilkinson web page.

[Photo: James Wilkinson lecturing at Argonne National Laboratory.]

Foreword

Rounding Errors in Algebraic Processes was the first book to give detailed analyses of the effects of rounding errors on a variety of key computations involving polynomials and matrices.

The book builds on James Wilkinson’s 20 years of experience in using the ACE and DEUCE computers at the National Physical Laboratory in Teddington, just outside London. The original design for the ACE was prepared by Alan Turing in 1945, and after Turing left for Cambridge, Wilkinson led the group that continued to design and build the machine and its software. The ACE made its first computations in 1950. A Cambridge-educated mathematician, Wilkinson was perfectly placed to develop and analyze numerical algorithms on the ACE and the DEUCE (the commercial version of the ACE). The intimate access he had to the computers (which included the ability to observe intermediate computed quantities on lights on the console) helped Wilkinson to develop a deep understanding of the numerical methods he was using and their behavior in finite precision arithmetic.

The principal contribution of the book is the analysis of the effects of rounding errors on algorithms, using backward error analysis (then in its infancy) or forward error analysis as appropriate. The book laid the foundations for the error analysis of algebraic processes on digital computers.

Three types of computer arithmetic are analyzed: floating-point arithmetic, fixed-point arithmetic (in which numbers are represented on a fixed interval such as [-1,1], with a fixed scale factor), and block floating-point arithmetic, which is a hybrid of floating-point and fixed-point arithmetic. Although floating-point arithmetic dominates today’s computational landscape, fixed-point arithmetic is widely used in digital signal processing and block floating-point arithmetic is enjoying renewed interest in machine learning.

A notable feature of the book is its careful consideration of problem sensitivity, as measured by condition numbers, and this approach inspired future generations of numerical analysis textbooks.

Wilkinson recognized that the worst-case error bounds he derived are pessimistic and he noted that more realistic bounds are obtained by taking the square roots of the dimension-dependent terms (see pages 26, 52, 102, 151). In recent years, rigorous results have been derived that support Wilkinson’s intuition: they show that bounds with the square roots of the constants hold with high probability under certain probabilistic assumptions on the rounding errors.

It is entirely fitting that Rounding Errors in Algebraic Processes is reprinted in the SIAM Classics series. The book is a true classic, it continues to be well-cited, and it will remain a valuable reference for years to come.

What Is the Schur Complement of a Matrix?

For an n\times n matrix

\notag   A =   \begin{bmatrix} A_{11} & A_{12} \\                   A_{21} & A_{22}   \end{bmatrix} \qquad (1)

with nonsingular (1,1) block A_{11}, the Schur complement of A_{11} in A is A_{22} - A_{21}A_{11}^{-1}A_{12}. It is denoted by A/A_{11}. The block with respect to which the Schur complement is taken need not be the (1,1) block. Assuming that A_{22} is nonsingular, the Schur complement of A_{22} in A is A_{11} - A_{12}A_{22}^{-1}A_{21}.

More generally, for an index vector \alpha = [i_1,i_2,\dots,i_k], where 1\le i_1 < i_2 < \cdots < i_k \le n, the Schur complement of A(\alpha,\alpha) in A is A(\alpha^c,\alpha^c) - A(\alpha^c,\alpha) A(\alpha,\alpha)^{-1} A(\alpha,\alpha^c), where \alpha^c is the complement of \alpha (the vector of indices not in \alpha). Most often, it is the Schur complement of A_{11} in A that is of interest, and the general case can be reduced to it by row and column permutations that move A(\alpha,\alpha) to the (1,1) block of A.
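
As a quick illustration, here is a minimal sketch in Python with NumPy (the function name schur_complement and the 0-based indexing are my own choices, not taken from any particular library) that computes the Schur complement for a general index vector; the leading block case A/A_{11} is recovered by taking \alpha to be the first k indices.

    import numpy as np

    def schur_complement(A, alpha):
        # Schur complement of A(alpha,alpha) in A; alpha holds 0-based indices.
        n = A.shape[0]
        alpha_c = np.setdiff1d(np.arange(n), alpha)     # complement of alpha
        A_aa = A[np.ix_(alpha, alpha)]
        return (A[np.ix_(alpha_c, alpha_c)]
                - A[np.ix_(alpha_c, alpha)] @ np.linalg.solve(A_aa, A[np.ix_(alpha, alpha_c)]))

    # Leading block case: Schur complement of the k-by-k (1,1) block A_11 in A.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    k = 2
    S = schur_complement(A, np.arange(k))               # A/A_11 = A_22 - A_21 A_11^{-1} A_12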

Schur Complements in Gaussian Elimination

The reduced submatrices in Gaussian elimination are Schur complements. Write an n\times n matrix A with nonzero (1,1) element \alpha as

\notag   A =   \begin{bmatrix} \alpha & a^*   \\                    b     & C   \end{bmatrix}    =   \begin{bmatrix}  1            & 0      \\                    b/\alpha     & I_{n-1}   \end{bmatrix}   \begin{bmatrix}  \alpha       & a^*    \\                     0           & C - ba^*/\alpha   \end{bmatrix}.

This factorization represents the first step of Gaussian elimination.
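
Here is a short numerical check of this one-step factorization (a sketch of my own in Python with NumPy, for a random real matrix):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    A = rng.standard_normal((n, n))
    alpha, a, b, C = A[0, 0], A[0, 1:], A[1:, 0], A[1:, 1:]

    L1 = np.eye(n)
    L1[1:, 0] = b / alpha                     # multipliers of the first elimination step
    U1 = np.zeros((n, n))
    U1[0, 0], U1[0, 1:] = alpha, a
    U1[1:, 1:] = C - np.outer(b, a) / alpha   # Schur complement of alpha in A
    print(np.allclose(L1 @ U1, A))            # True: the factorization holds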

After k stages of Gaussian elimination we have computed the following factorization, in which the (1,1) blocks are k\times k:

\notag   \begin{bmatrix} L_{11} & 0      \\                   L_{21} & I   \end{bmatrix}   \begin{bmatrix} A_{11} & A_{12} \\                   A_{21} & A_{22}   \end{bmatrix}  =   \begin{bmatrix} U_{11} & U_{12}      \\                      0   & S   \end{bmatrix},

where the first matrix on the left is the product of the elementary transformations that reduce the first k columns to upper triangular form. Equating (2,1) and (2,2) blocks in this equation gives L_{21}A_{11} + A_{21} = 0 and L_{21}A_{12} + A_{22} = S. Hence L_{21} = -A_{21}A_{11}^{-1} and S = A_{22} + L_{21}A_{12} = A_{22} - A_{21}A_{11}^{-1}A_{12}, which is the Schur complement of A_{11} in A.
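
The following sketch (mine, in Python with NumPy) carries out k steps of Gaussian elimination without pivoting on a random matrix and confirms that the reduced trailing block agrees with A_{22} - A_{21}A_{11}^{-1}A_{12}:

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 6, 3
    A = rng.standard_normal((n, n))

    W = A.copy()
    for j in range(k):                        # k steps of Gaussian elimination, no pivoting
        W[j+1:, j] /= W[j, j]                 # store the multipliers
        W[j+1:, j+1:] -= np.outer(W[j+1:, j], W[j, j+1:])

    S_ge = W[k:, k:]                          # reduced submatrix after k steps
    S = A[k:, k:] - A[k:, :k] @ np.linalg.solve(A[:k, :k], A[:k, k:])
    print(np.allclose(S_ge, S))               # True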

The next step of the elimination, which zeros out the first column of S below the diagonal, succeeds as long as the (1,1) element of S is nonzero (or the whole first column of S is zero, in which case there is nothing to do and A is singular).

For a number of matrix structures, such as

  • Hermitian (or real symmetric) positive definite matrices,
  • totally positive matrices,
  • matrices diagonally dominant by rows or columns,
  • M-matrices,

one can show that the Schur complement inherits the structure. For these four structures the (1,1) element of the matrix is nonzero, so the success of Gaussian elimination (or equivalently, the existence of an LU factorization) is guaranteed. In the first three cases the preservation of structure is also the basis for a proof of the numerical stability of Gaussian elimination.
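
As a small check of the first and third items, the sketch below (mine, in Python with NumPy; the helper row_dominant is just for this illustration) builds a symmetric positive definite matrix and a matrix that is diagonally dominant by rows and verifies that their Schur complements inherit the respective structure.

    import numpy as np

    rng = np.random.default_rng(3)
    n, k = 6, 2

    def schur(A, k):
        # Schur complement of the leading k-by-k block.
        return A[k:, k:] - A[k:, :k] @ np.linalg.solve(A[:k, :k], A[:k, k:])

    def row_dominant(M):
        d = np.abs(np.diag(M))
        return np.all(d >= np.sum(np.abs(M), axis=1) - d)

    # Symmetric positive definite matrix: its Schur complement has positive eigenvalues.
    B = rng.standard_normal((n, n))
    A_spd = B @ B.T + n * np.eye(n)
    print(np.all(np.linalg.eigvalsh(schur(A_spd, k)) > 0))   # True

    # Matrix diagonally dominant by rows: dominance is inherited by the Schur complement.
    C = rng.standard_normal((n, n))
    A_dd = C + np.diag(2 * np.sum(np.abs(C), axis=1))
    print(row_dominant(A_dd), row_dominant(schur(A_dd, k)))  # True True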

Inverse of a Block Matrix

The Schur complement arises in formulas for the inverse of a block 2\times 2 matrix. If A_{11} is nonsingular, we can write

\notag   A = \begin{bmatrix}A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}     = \begin{bmatrix}I & 0 \\ A_{21}A_{11}^{-1} & I \end{bmatrix}       \begin{bmatrix}A_{11} & A_{12} \\ 0 & S \end{bmatrix}, \quad                       S = A_{22} - A_{21}A_{11}^{-1}A_{12}.  \qquad (2)

If S is nonsingular then inverting gives

\notag \begin{aligned}   A^{-1} &=       \begin{bmatrix}A_{11}^{-1} & -A_{11}^{-1}A_{12}S^{-1} \\                     0 & S^{-1}                     \end{bmatrix}                         \begin{bmatrix}I & 0 \\ -A_{21}A_{11}^{-1} & I                           \end{bmatrix} \notag\\         &=       \begin{bmatrix}A_{11}^{-1} + A_{11}^{-1}A_{12}S^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}S^{-1} \\                -S^{-1}A_{21}A_{11}^{-1} & S^{-1}       \end{bmatrix}. \end{aligned}

So S^{-1} is the (2,2) block of A^{-1}.

One can obtain an analogous formula for A^{-1} in terms of the Schur complement of A_{22} in A, in which the inverse of the Schur complement is the (1,1) block of A^{-1}. More generally, we have (A/A(\alpha,\alpha))^{-1} = A^{-1}(\alpha^c,\alpha^c).
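
A quick numerical confirmation of the general identity (a sketch of my own in Python with NumPy, using an arbitrary index set \alpha):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 6
    A = rng.standard_normal((n, n))
    alpha = np.array([1, 3])                              # 0-based indices
    alpha_c = np.setdiff1d(np.arange(n), alpha)

    A_aa = A[np.ix_(alpha, alpha)]
    S = (A[np.ix_(alpha_c, alpha_c)]
         - A[np.ix_(alpha_c, alpha)] @ np.linalg.solve(A_aa, A[np.ix_(alpha, alpha_c)]))

    Ainv = np.linalg.inv(A)
    print(np.allclose(np.linalg.inv(S), Ainv[np.ix_(alpha_c, alpha_c)]))   # True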

Test for Positive Definiteness

For Hermitian matrices the Schur complement provides a test for positive definiteness. Suppose

\notag   A =   \begin{bmatrix} A_{11} & A_{12} \\                   A_{12}^* & A_{22}   \end{bmatrix}

with A_{11} positive definite. The factorization

\notag  A =  \begin{bmatrix} I & 0 \\ A_{12}^*A_{11}^{-1} & I \end{bmatrix}       \begin{bmatrix} A_{11} & 0 \\ 0 & S \end{bmatrix}        \begin{bmatrix} I & A_{11}^{-1}A_{12} \\ 0 & I \end{bmatrix}, \quad    S = A_{22} - A_{12}^*A_{11}^{-1}A_{12},

shows that A is congruent to the block diagonal matrix \mathrm{diag}(A_{11},S), and since congruences preserve inertia (the number of positive, zero, and negative eigenvalues), A is positive definite if and only if \mathrm{diag}(A_{11},S) is positive definite, that is, if and only if S is positive definite. The same equivalence holds with “definite” replaced by “semidefinite” (with A_{11} still required to be positive definite).
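
The sketch below (mine, in Python with NumPy; the helper is_pos_def simply attempts a Cholesky factorization) illustrates the test on a symmetric positive definite matrix.

    import numpy as np

    def is_pos_def(M):
        # A Hermitian matrix is positive definite iff it has a Cholesky factorization.
        try:
            np.linalg.cholesky(M)
            return True
        except np.linalg.LinAlgError:
            return False

    rng = np.random.default_rng(5)
    n, k = 6, 3
    B = rng.standard_normal((n, n))
    A = B @ B.T + 0.1 * np.eye(n)                         # symmetric positive definite

    A11, A12, A22 = A[:k, :k], A[:k, k:], A[k:, k:]
    S = A22 - A12.T @ np.linalg.solve(A11, A12)
    print(is_pos_def(A), is_pos_def(A11) and is_pos_def(S))   # True True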

Generalized Schur Complement

For A in (1) with a possibly singular (1,1) block A_{11} we can define the generalized Schur complement A_{22} - A_{21}A_{11}^+A_{12}, where “+” denotes the Moore-Penrose pseudo-inverse. The generalized Schur complement is useful in the context of Hermitian positive semidefinite matrices, as the following result shows.

Theorem 1.

If

\notag        A = \begin{bmatrix} A_{11} & A_{12} \\                            A_{12}^* & A_{22}            \end{bmatrix} \in\mathbb{C}^{n\times n}

is Hermitian then A is positive semidefinite if and only if \mathrm{range}(A_{12}) \subseteq \mathrm{range}(A_{11}) and A_{11} and the generalized Schur complement S = A_{22} - A_{12}^*A_{11}^+ A_{12} are both positive semidefinite.

If A_{11} is positive definite then the range condition is trivially satisfied.
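
Here is a sketch (mine, in Python with NumPy) of Theorem 1 for a positive semidefinite matrix with a singular (1,1) block: the range condition holds and the generalized Schur complement, formed with the pseudo-inverse, is positive semidefinite.

    import numpy as np

    rng = np.random.default_rng(6)
    n, r, k = 6, 3, 4
    X = rng.standard_normal((n, r))
    A = X @ X.T                                           # positive semidefinite of rank r < n

    A11, A12, A22 = A[:k, :k], A[:k, k:], A[k:, k:]       # A11 is singular (rank <= r < k)
    S = A22 - A12.T @ np.linalg.pinv(A11) @ A12           # generalized Schur complement

    print(np.linalg.matrix_rank(np.hstack([A11, A12]))
          == np.linalg.matrix_rank(A11))                  # True: range(A12) lies in range(A11)
    print(np.all(np.linalg.eigvalsh(S) >= -1e-10))        # True: S is positive semidefinite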

Notes and References

The term “Schur complement” was coined by Haynsworth in 1968, thereby focusing attention on this important form of matrix and spurring many subsequent papers that explore its properties and applications. The name “Schur” was chosen because a 1917 determinant lemma of Schur says that if A_{11} is nonsingular then

\notag    \det\left(        \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \right)      = \det(A_{11}) \det(A_{22} - A_{21}A_{11}^{-1} A_{12})      = \det(A_{11}) \det(A/A_{11}),

which is obtained by taking determinants in (2).
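
A short numerical check of the determinant identity (a sketch of my own in Python with NumPy):

    import numpy as np

    rng = np.random.default_rng(7)
    n, k = 6, 3
    A = rng.standard_normal((n, n))
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    S = A22 - A21 @ np.linalg.solve(A11, A12)
    print(np.isclose(np.linalg.det(A), np.linalg.det(A11) * np.linalg.det(S)))   # True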

For much more about the Schur complement see Zhang (2005) or Horn and Johnson (2013).