What Is a Correlation Matrix?

In linear algebra terms, a correlation matrix is a symmetric positive semidefinite matrix with unit diagonal. In other words, it is a symmetric matrix with ones on the diagonal whose eigenvalues are all nonnegative.

The term comes from statistics. If x_1, x_2, \dots, x_n are column vectors with m elements, each vector containing samples of a random variable, then the corresponding n\times n covariance matrix V has (i,j) element

v_{ij} = \mathrm{cov}(x_i,x_j) = \displaystyle\frac{1}{n-1}            (x_i - \overline{x}_i)^T (x_j - \overline{x}_j),

where \overline{x}_i is the mean of the elements in x_i. If v has nonzero diagonal elements then we can scale the diagonal to 1 to obtain the corresponding correlation matrix

C = D^{-1/2} V D^{-1/2},

where D = \mathrm{diag}(v_{ii}). The (i,j) element c_{ij} = v_{ii}^{-1/2} v_{ij} v_{jj}^{-1/2} is the correlation between the variables x_i and x_j.

Here are a few facts.

  • The elements of a correlation matrix lie on the interval [-1, 1].
  • The eigenvalues of a correlation matrix lie on the interval [0,n].
  • The eigenvalues of a correlation matrix sum to n (since the eigenvalues of a matrix sum to its trace).
  • The maximal possible determinant of a correlation matrix is 1.

It is usually not easy to tell whether a given matrix is a correlation matrix. For example, the matrix

A = \begin{bmatrix}      1  &  1 &   0\\      1  &  1 &   1\\      0  &  1 &   1      \end{bmatrix}

is not a correlation matrix: it has eigenvalues -0.4142, 1.0000, 2.4142. The only value of a_{13} and a_{31} that makes A a correlation matrix is 1.

A particularly simple class of correlation matrices is the one-parameter class A_n with every off-diagonal element equal to w, illustrated for n = 3 by

A_3 = \begin{bmatrix}      1  &  w &   w\\      w  &  1 &   w\\      w  &  w &   1      \end{bmatrix}.

The matrix A_n is a correlation matrix for -1/(n-1) \le w \le 1.

In some applications it is required to generate random correlation matrices, for example in Monte-Carlo simulations in finance. A method for generating random correlation matrices with a specified eigenvalue distribution was proposed by Bendel and Mickey (1978); Davies and Higham (2000) give improvements to the method. This method is implemented in the MATLAB function gallery('randcorr').

Obtaining or estimating correlations can be difficult in practice. In finance, market data is often missing or stale; different assets may be sampled at different time points (e.g., some daily and others weekly); and the matrices may be generated from different parametrized models that are not consistent. Similar problems arise in many other applications. As a result, correlation matrices obtained in practice may not be positive semidefinite, which can lead to undesirable consequences such as an investment portfolio with negative risk.

In risk management and insurance, matrix entries may be estimated, prescribed by regulations or assigned by expert judgement, but some entries may be unknown.

Two problems therefore commonly arise in connection with correlation matrices.

Nearest Correlation Matrix

Here, we have an approximate correlation matrix A that has some negative eigenvalues and we wish to replace it by the nearest correlation matrix. The natural choice of norm is the Frobenius norm, \|A\|_F = \bigl(\sum_{i,j} a_{ij}^2\bigr)^{1/2}, so we solve the problem

\min \{ \, \|A-C\|_F: C~\textrm{is a correlation matrix} \,\}.

We may also have a requirement that certain elements of C remain fixed. And we may want to weight some elements more than others, by using a weighted Frobenius norm. These are convex optimization problems and have a unique solution that can be computed using the alternating projections method (Higham, 2002) or a Newton algorithm (Qi and Sun, 2006; Borsdorf and Higham, 2010).

Another variation requires C to have factor structure, which means that the off-diagonal agrees with that of a rank-k matrix for some given k (Borsdorf, Higham, and Raydan, 2010). Yet another variation imposes a constraint that C has a certain rank or a rank no larger than a certain value. These problems are non-convex, because of the objective function and the rank constraint, respectively.

Another approach that can be used for restoring definiteness, although it does not in general produce the nearest correlation matrix, is shrinking, which constructs a convex linear combination A = \alpha C + (1-\alpha)M, where M is a target correlation matrix (Higham, Strabić, and Šego, 2016). Shrinking can readily incorporate fixed blocks and weighting.

Correlation Matrix Completion

Here, we have a partially specified matrix and we wish to complete it, that is, fill in the missing elements in order to obtain a correlation matrix. It is known that a completion is possible for any set of specified entries if the associate graph is chordal (Grone et al., 1994). In general, if there is one completion there are many, but there is a unique one of maximal determinant, which is elegantly characterized by the property that the inverse contains zeros in the positions of the unspecified entries.


This is a minimal set of references, and they cite further useful references.

Related Blog Posts

Posted in what-is | Leave a comment

What Is a Hadamard Matrix?

A Hadamard matrix is an n\times n matrix with elements \pm 1 and mutually orthogonal columns. For example,

\left[\begin{array}{rrrr}       1 & 1 & 1 & 1\\       1 & -1 & 1 & -1\\       1 & 1 & -1 & -1\\       1 & -1 & -1 & 1 \end{array}\right]

is a Hadamard matrix.

A necessary condition for an n\times n Hadamard matrix to exist with n > 2 is that n is divisible by 4, but it is not known if a Hadamard matrix exists for every such n.

A Hadamard matrix of order 428 was found for the first time in 2005. The smallest multiple of 4 for which a Hadamard matrix has not been found is 668.

A Hadamard matrix satisfies H^T H = nI, so H^{-1} = n^{-1}H^T. It also follows that \det(H) = \pm n^{n/2}. Hadamard’s inequality states that for an n\times n real matrix A, |\det(A)| \le \prod_{k=1}^n \|a_k\|_2, where a_k is the kth column of A. A Hadamard matrix achieves equality in this inequality (as does any matrix with orthogonal columns).

Hadamard matrices can be generated with a recursive (Kronecker product) construction: if H is a Hadamard matrix then so is

\left[\begin{array}{rr}          H & H\\          H & -H   \end{array}\right].

So starting with a Hadamard matrix of size m, one can build up matrices of size 2^km for k = 1,2,\dots. The MATLAB hadamard function uses this technique. It includes the following Hadamard matrix of order 12, for which we simply display the signs of the elements:

\left[\begin{array}{rrrrrrrrrrrr} {}+ & + & + & + & + & 1 & + & + & + & + & + & +\\ {}+ & - & + & - & + & + & + & - & - & - & + & -\\ {}+ & - & - & + & - & + & + & + & - & - & - & +\\ {}+ & + & - & - & + & - & + & + & + & - & - & -\\ {}+ & - & + & - & - & + & - & + & + & + & - & -\\ {}+ & - & - & + & - & - & + & - & + & + & + & -\\ {}+ & - & - & - & + & - & - & + & - & + & + & +\\ {}+ & + & - & - & - & + & - & - & + & - & + & +\\ {}+ & + & + & - & - & - & + & - & - & + & - & +\\ {}+ & + & + & + & - & - & - & + & - & - & + & -\\ {}+ & - & + & + & + & - & - & - & + & - & - & +\\ {}+ & + & - & + & + & + & - & - & - & + & - & - \end{array}\right].

Hadamard matrices have applications in optimal design theory, coding theory, and graph theory.

In numerical analysis, Hadamard matrices are of interest because when LU factorization is performed on them they produce a growth factor of at least n, for any form of pivoting. Evidence suggests that the growth factor for complete pivoting is exactly n, but this has not been proved. It has been proved that any n\times n Hadamard matrix has growth factor n for complete pivoting for n = 12 and n = 16.

An interesting property of Hadamard matrices is that the p-norm (the matrix norm subordinate to the vector p-norm) is known explicitly for all p:

\|H\|_p = \max\bigl( n^{1/p}, n^{1-1/p} \bigr), \quad 1\le p\le \infty.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Posted in what-is | Leave a comment

What Is an Orthogonal Matrix?

A real, square matrix Q is orthogonal if Q^TQ = QQ^T = I (the identity matrix). Equivalently, Q^{-1} = Q^T. The columns of an orthogonal matrix are orthonormal, that is, they have 2-norm (Euclidean length) 1 and are mutually orthogonal. The same is true of the rows.

Important examples of orthogonal matrices are rotations and reflectors. A 2\times 2 rotation matrix has the form

\begin{bmatrix}             c & s \\             -s& c \\      \end{bmatrix},       \quad c^2 + s^2 = 1.

For such a matrix, c = \cos \theta and s = \sin \theta for some \theta, and the multiplication y = Qx for a 2\times 1 vector x represents a rotation through an angle \theta radians. An n\times n rotation matrix is formed by embedding the 2\times 2 matrix into the identity matrix of order n.

A Householder reflector is a matrix of the form H = I - 2uu^T/(u^Tu), where u is a nonzero n-vector. It is orthogonal and symmetric. When applied to a vector it reflects the vector about the hyperplane orthogonal to v. For n = 2, such a matrix has the form

\begin{bmatrix}     c &  s \\      s& -c \\      \end{bmatrix}, \quad c^2 + s^2 = 1.

Here is the 4\times 4 Householder reflector corresponding to v = [1,1,1,1]^T/2:

\frac{1}{2}         \left[\begin{array}{@{\mskip2mu}rrrr@{\mskip2mu}}                        1 &   -1 &   -1 &   -1\\                       -1 &    1 &   -1 &   -1\\                       -1 &   -1 &    1 &   -1\\                       -1 &   -1 &   -1 &    1\\        \end{array}\right].

This is 1/2 times a Hadamard matrix.

Various explicit formulas are known for orthogonal matrices. For example, the n\times n matrices with (i,j) elements

q_{ij} = \displaystyle\frac{2}{\sqrt{2n+1}}        \sin \left(\displaystyle\frac{2ij\pi}{2n+1}\right)


q_{ij} =              \sqrt{\displaystyle\frac{2}{n}}\cos              \left(\displaystyle\frac{(i-1/2)(j-1/2)\pi}{n} \right)

are orthogonal. These and other orthogonal matrices, as well as diagonal scalings of orthogonal matrices, are constructed by the MATLAB function gallery('orthog',...).

Here are some properties of orthogonal matrices.

  • All the eigenvalues are on the unit circle, that is, they have modulus 1.
  • All the singular values are 1.
  • The 2-norm condition number is 1, so orthogonal matrices are perfectly conditioned.
  • Multiplication by an orthogonal matrix preserves Euclidean length: \|Qx\|_2 = \|x\|_2 for any vector x.
  • The determinant of an orthogonal matrix is \pm 1. A rotation has determinant 1 while a reflection has determinant -1.

Orthogonal matrices can be generated from skew-symmetric ones. If S is skew-symmetric (S = -S^T) then \exp(S) (the matrix exponential) is orthogonal and the Cayley transform (I-S)(I+S)^{-1} is orthogonal as long as S has no eigenvalue equal to -1.

Unitary matrices are complex square matrices Q for which Q^*Q = QQ^* = I, where Q^* is the conjugate transpose of Q. They have analogous properties to orthogonal matrices.

Related Blog Posts

  • What Is a Hadamard Matrix? (2020)—forthcoming
  • What Is a Random Orthogonal Matrix? (2020)—forthcoming

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Posted in what-is | Leave a comment

What Is a Generalized Inverse?

The matrix inverse is defined only for square nonsingular matrices. A generalized inverse is an extension of the concept of inverse that applies to square singular matrices and rectangular matrices. There are many definitions of generalized inverses, all of which reduce to the usual inverse when the matrix is square and nonsingular.

A large class of generalized inverses of an m \times n matrix A can be defined in terms of the Moore–Penrose conditions, in which X is n\times m:

\begin{array}{rcrc}  \mathrm{(1)}   &    AXA = A,  \; & \mathrm{(2)} & XAX=X,\\  \mathrm{(3)} & (AX)^* = AX, \; & \mathrm{(4)} & (XA)^* = XA. \end{array}

Here, the superscript * denotes the conjugate transpose. A 1-inverse is any X satisfying condition (1), a (1,3)-inverse is any X satisfying conditions (1) and (3), and so on for any subset of the four conditions.

Condition (1) implies that if Ax = b then A(Xb) = A (XAx) = AXAx = Ax = b, so Xb solves the equation, meaning that any 1-inverse is an equation-solving inverse. Condition (2) implies that X = 0 if A = 0.

A (1,3) inverse can be shown to provide a least squares solution to an inconsistent linear system. A (1,4) inverse can be shown to provide the minimum 2-norm solution of a consistent linear system (where the 2-norm is defined by \|x\|_2 = (x^*x)^{1/2}).

There is not a unique matrix satisfying any one, two, or three of the Moore–Penrose conditions. But there is a unique matrix satisfying all four of the conditions, and it is called the Moore-Penrose pseudoinverse, denoted by A^+ or A^{\dagger}. For any system of linear equations Ax = b, x = A^+b minimizes \|Ax - b\|_2 and has the minimum 2–norm over all minimizers.

The pseudoinverse can be expressed in terms of the singular value decomposition (SVD). If A = U\Sigma V^* is an SVD, where the m\times m matrix U and n\times n matrix V are orthogonal, and \Sigma = \mathrm{diag}(\sigma_1,\dots , \sigma_n) with \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots =\sigma_n = 0 (so that \mathrm{rank}(A) = r), then

A^+ = V\mathrm{diag}(\sigma_1^{-1},\dots,\sigma_r^{-1},0,\dots,0)U^*.

In MATLAB, the function pinv computes A^+ using this formula. If \mathrm{rank}(A) = n then the concise formula A^+ = (A^*A)^{-1}A^* holds.

For square matrices, the Drazin inverse is the unique matrix A^D such that

A^D A A^D = A^D, \quad A A ^D = A^D A, \quad        A^{k+1} A^D = A^k,

where k = \mathrm{index}(A). The first condition is the same as the second of the Moore–Penrose conditions, but the second and third have a different flavour. The index of a matrix of A is the smallest nonnegative integer k such that \mathrm{rank}(A^k) = \mathrm{rank}(A^{k+1}); it is characterized as the dimension of the largest Jordan block of A with eigenvalue zero.

If \mathrm{index}(A)=1 then A^D is also known as the group inverse of A and is denoted by A^\#. The Drazin inverse is an equation-solving inverse precisely when \mathrm{index}(A)\le 1, for then AA^DA=A, which is the first of the Moore–Penrose conditions.

The Drazin inverse can be represented explicitly as follows. If

A = P \begin{bmatrix}                B & 0 \\                0 & N               \end{bmatrix} P^{-1},

where P and B are nonsingular and N has only zero eigenvalues, then

A^D = P \begin{bmatrix}                 B^{-1} & 0 \\                 0      & 0                 \end{bmatrix} P^{-1}.

Here is the pseudoinverse and the Drazin inverse for a particular matrix with index 2:

A = \left[\begin{array}{rrr} 1 & -1 & -1\\[3pt] 0 & 0 & -1\\[3pt] 0 & 0 & 0 \end{array}\right], \quad A^+ = \left[\begin{array}{rrr} \frac{1}{2} & -\frac{1}{2} & 0\\[3pt] -\frac{1}{2} & \frac{1}{2} & 0\\[3pt] 0 & -1 & 0 \end{array}\right], \quad A^D = \left[\begin{array}{rrr} 1 & -1 & 0\\[3pt] 0 & 0 & 0\\[3pt] 0 & 0 & 0 \end{array}\right].


The Moore–Penrose pseudoinverse is intimately connected with orthogonality, whereas the Drazin inverse has spectral properties related to those of the original matrix. The pseudoinverse occurs in all kinds of least squares problems. Applications of the Drazin inverse include population modelling, Markov chains, and singular systems of linear differential equations. It is not usually necessary to compute generalized inverses, but they are valuable theoretical tools.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Posted in what-is | 3 Comments

What Is a Matrix?

A matrix is a rectangular array of numbers on which certain algebraic operations are defined. Matrices provide a convenient way of encapsulating many numbers in a single object and manipulating those numbers in useful ways.

An m \times n matrix has m rows and n columns and m and n are called the dimensions of the matrix. A matrix is square if it has the same number of rows and columns, otherwise it is rectangular.

An example of a square matrix is

\begin{bmatrix}            1  &  1  &  1  &  1\\            1  &  2  &  3  &  4\\            1  &  3  &  6  & 10\\            1  &  4  & 10  & 20        \end{bmatrix}.

This matrix is symmetric: a_{ij} = a_{ji} for all i and j, where a_{ij} denotes the entry at the intersection of row i and column j. Matrices are written either with square brackets, as in this example, or round brackets (parentheses).

Addition of matrices of the same dimensions is defined in the obvious way: by adding the corresponding entries.

Multiplication of matrices requires the inner dimensions to match. The product of an m \times p matrix A and an p\times n matrix B is an m\times n matrix C = AB defined by the formula

c_{ij} = \displaystyle\sum_{k=1}^p a_{ik}b_{kj},        \quad 1 \le i \le m, \quad 1 \le j \le n.

When m = n, both AB and BA are defined, but they are generally unequal: matrix multiplication is not commutative.

The inverse of a square matrix A is a matrix X such that AX = XA = I, where I is the identity matrix, which has ones on the main diagonal (that is, in the (i,i) position for all i) and zeros off the diagonal. For rectangular matrices various notions of generalized inverse exist.

The transpose of an m \times n matrix A, written A^T, is the n \times m matrix whose (i,j) entry is a_{ji}. For a complex matrix, the conjugate transpose, written A^* or A^H, has (i,j) entry \overline{a}_{ji}.

In linear algebra, a matrix represents a linear transformation between two vector spaces in terms of particular bases for each space.

Vectors and scalars are special cases of matrices: column vectors are n\times 1, row vectors are 1\times n, and scalars are 1\times1.

Many programming languages and problem solving environments support arrays. It is important to note that operations on arrays are typically defined componentwise, so that, for example, multiplying two arrays multiplies the corresponding pairs of entries, which is not the same as matrix multiplication. The quintessential programming environment for matrices is MATLAB, in which a matrix is the core data type.

It is possible to give meaning to a matrix with one or both dimensions zero. MATLAB supports such empty matrices. Matrix multiplication generalizes in a natural way to allow empty dimensions:

>> A = zeros(0,2)*zeros(2,3)
A =
0x3 empty double matrix

>> A = zeros(2,0)*zeros(0,3)
A =
0     0     0
0     0     0

In linear algebra and numerical analysis, matrices are usually written with a capital letter and vectors with a lower case letter. In some contexts matrices are distinguished by boldface.

The term matrix was coined by James Joseph Sylvester in 1850. Arthur Cayley was the first to define matrix algebra, in 1858.


  • Arthur Cayley, A Memoir on the Theory of Matrices, Philos. Trans. Roy. Soc. London 148, 17–37, 1858.
  • Nicholas J. Higham, Sylvester’s Influence on Applied Mathematics, Mathematics Today 50, 202–206, 2014. A version of the article with an extended bibliography containing additional historical references is available as a MIMS EPrint.
  • Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Posted in what-is | Leave a comment

Update of Catalogue of Software for Matrix Functions


Edvin Hopkins and I have updated to version 3.0 the catalogue of software for matrix functions that we originally produced in 2014 and updated in 2016. It covers what is available in various languages (C++, Fortran, Java, Julia, Python, Rust), problem solving environments (GNU Octave, Maple, Mathematica, MATLAB and associated toolboxes, R, Scilab), and libraries (Armadillo, GNU Scientific Library, NAG Library, SLEPc, SLICOT).

Here are some highlights of changes in the last four years that are reflected in the new version.

  • Several new MATLAB third-party functions have been written, by various authors, notably for f(A)b and for arbitrary precision evaluation of the exponential and logarithm.
  • Matrix function support in Julia has been expanded.
  • Armadillo, Rust, SLEPc, and Tensorflow are included in new entries.

In addition, all URLs and references have been updated.

Suggestions for inclusion in a future revision are welcome.

Posted in software | Tagged , , , | Leave a comment

What Is Backward Error?

Backward error is a measure of error associated with an approximate solution to a problem. Whereas the forward error is the distance between the approximate and true solutions, the backward error is how much the data must be perturbed to produce the approximate solution.

For a function f from \mathbb{R}^n to \mathbb{R}^n and an approximation y to f(x), the backward error in y is the smallest \Delta x such that y = f(x+\Delta x), for some appropriate measure of size. There can be many \Delta x satisfying this equation, so the backward error is the solution to a minimization problem. Using a vector norm and measuring perturbations in a relative sense, we can define the backward error in y as

\eta(y) = \min\{ \, \epsilon: y = f(x+\Delta x), \;                      \|\Delta x\| \le \epsilon \|x\| \,\}.

In the following figure the solid lines denote exact mappings and the dashed line shows the mapping that was actually computed.


Usually, but not always, the errors in question are rounding errors, but backward error can also be a useful way to characterize truncation errors (for example in deriving algorithms based on Padé approximation for computing matrix functions).

As an example, for the inner product u^Tv of two vectors the backward error of an approximation w can be defined as

\eta(w) = \min \{\, \epsilon: w = (u + \Delta u)^Tv,\;    \|\Delta u\|_2 \le \epsilon \|u\|_2 \,\},

where \|u\|_2 = (u^Tu)^{1/2}. It can be shown that

\eta(w) = \displaystyle\frac{ |w - u^Tv| }{ \|u\|_2 \|v\|_2 }.

The definition of \eta(w) is clearly unsymmetric in that u is perturbed but v is not. If v is perturbed instead of u then the same formula is obtained. If both u and v are perturbed then the constraint in the definition of \eta(w) becomes nonlinear in \Delta u and \Delta v and no explicit formula is available for \eta(w).

For some problems a backward error may not exist. An example is computation of the outer product A = uv^T of two n-vectors, which has rank 1. In floating-point arithmetic the computed matrix \widehat{A} is unlikely to be of rank 1, so \widehat{A} = (u + \Delta u)(v + \Delta v)^T is not in general possible for any \Delta u and \Delta v. In this situation one can consider a mixed backward–forward error that perturbs \widehat{A} as well as u and v.

Backward error analysis refers to rounding error analysis in which a bound is obtained for a suitably defined backward error. If the backward error can be shown to be small then y is the solution to a nearby problem. Indeed, if the backward error can be shown to be of the same size as any uncertainties in the data then y is as good a solution as can be expected.

Backward error analysis was developed and popularized by James Wilkinson in the 1950s and 1960s. He first used it in the context of computing zeros of polynomials, but the method’s biggest successes came when he applied it to linear algebra computations.

Backward error analysis has also been used in the context of the numerical solution of differential equations, where it is used in different forms known as defect control and shadowing.

The forward error of y\approx f(x) is bounded in terms of the backward error \eta(y) by

\displaystyle\frac{\|y - f(x)\|}{\|f\|} \le        \mathrm{cond}(f,x) \eta(y) + O(\eta(y))^2,

in view of the definition of condition number. Consequently, we have the rule of thumb that

\mbox{forward error} \lesssim    \mbox{condition number}\times    \mbox{backward error}.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Posted in what-is | 1 Comment

What Is a Condition Number?

A condition number of a problem measures the sensitivity of the solution to small perturbations in the input data. The condition number depends on the problem and the input data, on the norm used to measure size, and on whether perturbations are measured in an absolute or a relative sense. The problem is defined by a function, which may be known explicitly or may be only implicitly defined (as when the problem is to solve an equation).

The most well known example of a condition number is the condition number of a nonsingular square matrix A, which is \kappa(A) = \|A\| \|A^{-1}\|. More correctly, this is the condition number with respect to inversion, because a relative change to A of norm \epsilon can change A^{-1} by a relative amount as much as, but no more than, about \kappa(A)\epsilon for small \epsilon. The same quantity \kappa(A) is also the condition number for a linear system Ax = b (exactly if A is the data, but only approximately if both A and b are the data).

It is easy to see that \kappa(A) \ge 1 for any norm for which \|I\| = 1 (most common norms, but not the Frobenius norm, have this property) and that \kappa(A) tends to infinity as A tends to singularity.

A general definition of (relative) condition number, for a function f from \mathbb{R}^n to \mathbb{R}^n, is

\mathrm{cond}(f,x) = \lim_{\epsilon\to0}                       \displaystyle\sup_{\|\Delta x\| \le \epsilon \|x\|}                       \displaystyle\frac{\|f(x+\Delta x) - f(x)\|}{\epsilon\|f(x)\|}.

Taking a small, nonzero \epsilon, we have

\displaystyle\frac{\|f(x+\Delta x) - f(x)\|}{\|f(x)\|} \lesssim     \mathrm{cond}(f,x) \displaystyle\frac{\|\Delta x\|}{\|x\|}

for small \|\Delta x\|, with approximate equality for some \Delta x.

An explicit expression for \mathrm{cond}(f,x) can be given in terms of the Jacobian matrix, J(x) = (\partial f_i/\partial x_j):

\mathrm{cond}(f,x) = \displaystyle\frac{ \|x\| \| J(x) \| }{ \| f(x) \|}.

We give two examples.

  • If f is a scalar function then J(x) = f'(x), so \mathrm{cond}(f,x) =  |xf'(x)/f(x)|. Hence, for example, \mathrm{cond}(\log,x) =  1/|\log x|.
  • If z is a simple (non-repeated) root of the polynomial p(t) = a_n t^n + \cdots + a_1 t +   a_0 then the data is the vector of coefficients a = [a_n,\dots,a_0]^T. It can be shown that the condition number of the root z is, for the \infty-norm,

    \mathrm{cond}(z,a) =            \displaystyle\frac{ \max_i |a_i| \sum_{i=0}^n |z|^i  }                                  { |z p'(z)| }.

A general theory of condition numbers was developed by Rice (1966).

A problem is said to be well conditioned if the condition number is small and ill conditioned if the condition number is large. The meaning of “small” and “large” depends on the problem and the context. This diagram illustrates a well-conditioned function f: small changes in x produce small changes in f.


The next diagram depicts an ill-conditioned function f: small changes in x can produce large changes in f (but do not necessarily do so, as the closeness of f(x_2) and f(x_3) illustrates).


Here are a few key points about condition numbers.

  • Even though an explicit expression may be available for it, computing \mathrm{cond}(f,x) is usually as expensive as computing f(x), so a lot of research has focused on obtaining inexpensive estimates of the condition number or bounds for it.
  • While \kappa(A) \ge 1, it is not true for all functions that \mathrm{cond}(f,x) is bounded below by 1.
  • For a range of functions that includes the matrix inverse, matrix eigenvalues, and a root of a polynomial, it is known that the condition number is the reciprocal of the relative distance to the nearest singular problem (one with an infinite condition number).
  • As the condition number is itself a function, one can ask: What is the condition number of the condition number? For a variety of problems, including those mentioned in the previous point, the condition number of the condition number is (approximately) the condition number!


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Posted in what-is | 3 Comments

“What Is” Series

I am starting a series of “What Is” posts. These will give brief descriptions of important concepts in numerical analysis and related areas, with a focus on topics that arise in my research. I will try to adhere to the following criteria.

  • The posts are no more than two or three screenfuls (about 500 words).
  • The posts contain a minimum of mathematical symbols, equations, and citations.
  • The posts are accessible to upper level undergraduate students in any mathematically oriented subject.
  • The list of references given for further reading is very small and concentrates on those that are broad, readable, and contain further references to the literature.

I will make PDF versions (from \LaTeX) of the posts available on GitHub in the repository What-Is. These are much better for printing. They contain hyperlinks, which will be clickable in the PDF but not visible in a printout.

I will be happy to receive suggestions for topics to cover.

From 2002–2019 the Notices of the American Mathematical Society published a “What Is …” series of articles, latterly in the Graduate Student section. An index of the articles is available here. Those articles are almost exclusively on pure mathematics and I don’t anticipate much, if any, overlap with my series.

Posted in what-is | Leave a comment

Handbook of Writing for the Mathematical Sciences, Third Edition


The third edition of Handbook of Writing for the Mathematical Sciences was published by SIAM in January 2020. It is SIAM’s fourth best selling book of all time if sales for the first edition (1993) and second edition (1998) are combined, and it is in my top ten most cited outputs. As well as being used by individuals from students to faculty, it is a course text on transferable skills modules in many mathematics departments. A number of publishers cite the book as a reference for recommended style—see, for example, the AMS Author Handbook, the SIAM Style Manual and, outside mathematics and computing, the Chicago Manual of Style.

Parts of the second edition were becoming out of date, as they didn’t reflect recent developments in publishing (open access publishing, DOIs, ORCID, etc.) or workflow (including modern LaTeX packages, version control, and markup languages). I’ve also learned a lot more about writing over the last twenty years.

I made a variety of improvements for the third edition. I reorganized the material in a way that is more logical and makes the book easier to use for reference. I also improved the design and formatting and checked and updated all the references.

I removed content that was outdated or is now unnecessary. For example, nowadays there is no need to talk about submitting a hard copy manuscript, or one not written in LaTeX, to a publisher. I removed the 20-page appendix Winners of Prizes for Expository Writing, since the contents can now be found on the web, and likewise for the appendix listing addresses of mathematical organizations.

I also added a substantial amount of new material. Here are some of the more notable changes.

  • A new chapter Workflow discusses how to organize and automate the many tasks involved in writing. With the increased role of computers in writing and the volume of digital material we produce it is important that we make efficient use of text editors, markup languages, tools for manipulating plain text, spellcheckers, version control, and much more.
  • The chapter on \LaTeX has been greatly expanded, reflecting both the many new and useful packages and my improved knowledge of typesetting.
  • I used the enumitem \LaTeX package to format all numbered and bulleted lists. This results in more concise lists that make better use of the page, as explained in this blog post.
  • I wrote a new chapter on indexing at the same time as I was reading the literature on indexing and making an improved index for the book. Indexing is an interesting task, but most of us do it only occasionally so it is hard to become proficient. This is my best index yet, and the indexing chapter explains pretty much everything I’ve learned abut the topic.
  • Since the second edition I have changed my mind about how to typeset tables. I am now a convert to minimizing the use of rules and to using the booktabs \LaTeX package, as explained in this blog post.
  • The chapter Writing a Talk now illustrates the use of the Beamer \LaTeX package.
  • The book uses color for syntax highlighted \LaTeX listings and examples of slides.
  • Sidebars in gray boxes give brief diversions on topics related to the text, including several on “Publication Peculiarities”.
  • An expanded chapter English Usage includes new sections on Zombie Nouns; Double Negatives; Serial, or Oxford, Comma; and Split Infinitives.
  • There are new chapters on Writing a Blog Post; Refereeing and Reviewing; Writing a Book; and, as discussed above, Preparing an Index and Workflow.
  • The bibliography now uses the backref \LaTeX package to point back to the pages on which entries are cited, hence I removed the author index.
  • As well as updating the bibliography I have added DOIs and URL links, which can be found in the online version of the bibliography in bbl and PDF form, which is available from the book’s website.

At 353 pages, and allowing for the appendices removed and the more efficient formatting, the third edition is over 30 percent longer than the second edition.

As always, working with the SIAM staff on the book was a pleasure. A special thanks goes to Sam Clark of T&T Productions, who copy edited the book. Sam, with whom I have worked on two previous book projects, not only edited for SIAM style but found a large number of improvements to the text and showed me some things I did not know about \LaTeX.

SIAM News has published an interview with me about the book and mathematical writing and publishing.

Here is a word cloud for the book, generated in MATLAB using the wordcloud function, based on words of 6 or more characters. wordcloud2.jpg

Posted in books, LaTeX, writing | Leave a comment