What Is the Determinant of a Matrix?

The determinant of an n\times n matrix A is defined by

\notag   \det(A) = \displaystyle\sum_j (-1)^{\mathop{\mathrm{sgn}}j}                   a_{1,j_1}a_{2,j_2} \dots a_{n,j_n}, \qquad (1)

where the sum is over all n! permutations j = (j_1,j_2,\dots,j_n) of the sequence (1,2,\dots,n) and \mathop{\mathrm{sgn}}j is the number of inversions in j, that is, the number of pairs (j_k,j_\ell) with k  j_\ell. Each term in the sum is a signed product of n entries of A and the product contains one entry taken from each row and one from each column.

The determinant is sometimes written with vertical bars, as |A|.

Three fundamental properties are

\notag \begin{aligned} \det(\alpha A) &= \alpha^n \det(A)\; \mathrm{for~any~scalar~}\alpha,\qquad(2)\\ \det(A^T) &= \det(A), \qquad(3)\\ \det(AB) &= \det(A)\det(B) \mathrm{~for~} n\times n~ A \mathrm{~and~} B.\qquad(4) \end{aligned}

The first property is immediate, the second can be proved using properties of permutations, and the third is proved in texts on linear algebra and matrix theory.

An alternative, recursive expression for the determinant is the Laplace expansion

\notag   \det(A) = \displaystyle\sum_{j=1}^n (-1)^{i+j} a_{ij} \det (A_{ij}).  \qquad(5)

for any i\in\{1,2,\dots,n\}, where A_{ij} denotes the (n-1)\times (n-1) submatrix of A obtained by deleting row i and column j, and \det(a) = a for a scalar a. This formula is called the expansion by minors because \det (A_{ij}) is a minor of A.

For some types of matrices the determinant is easy to evaluate. If T is triangular then \det(T) = \prod_{i=1}^n t_{ii}. If Q is unitary then Q^*Q = I implies |\det(Q)| = 1 on using (3) and (4). An explicit formula exists for the determinant of a Vandermonde matrix.

The determinant of A is connected with the eigenvalues \lambda_i of A via the property \det(A) = \prod_{i=1}^n \lambda_i. Since the eigenvalues are the roots of the characteristic polynomial \det(tI - A), this relation follows by setting t = 0 in the expression

\notag   \det(tI - A) = t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0                = \displaystyle\prod_{i=1}^n (t - \lambda_i).

For n=2, the determinant is

\notag     \det\biggl( \begin{bmatrix} a & b \\ c & d \end{bmatrix} \biggr)      = ad - bc,

but already for n=3 the determinant is tedious to write down. If one must compute \det(A), the formulas (1) and (5) are too expensive unless n is very small: they have an exponential cost. The best approach is to use a factorization of A involving factors that are triangular or orthogonal, so that the determinants of the factors are easily computed. If PA = LU is an LU factorization, with P a permutation matrix, L unit lower triangular, and U upper triangular, then \det(A) = \det(P) \prod_{i=1}^n u_{ii} = \pm \prod_{i=1}^n u_{ii}. As this expression indicates, the determinant is prone to overflow and underflow in floating-point arithmetic, so it may be preferable to compute \log(|\det(A)|) = \sum_{i=1}^n \log|u_{ii}|.

The determinant features in the formula

\notag   A^{-1} = \displaystyle\frac{\mathrm{adj}(A)}{\det(A)}

for the inverse, where \mathrm{adj}(A) is the adjugate of A (recall that \mathrm{adj}(A) has (i,j) element (-1)^{i+j} \det(A_{ji})). More generally, Cramer’s rule says that the components of the solution to a linear system Ax = b are given by x_i = \det(A_i(b))/\det(A), where A_i(b) denotes A with its ith column replaced by b. While mathematically elegant, Cramer’s rule is of no practical use, as it is both expensive and numerically unstable in finite precision arithmetic.

Inequalities

A celebrated bound for the determinant of a Hermitian positive definite matrix H\in\mathbb{C}^{n\times n} is Hadamard’s inequality. Note that for such H, \det(H) is real and positive (being the product of the eigenvalues, which are real and positive) and the diagonal elements are also real and positive (since h_{ii} = e_i^*He_i > 0).

Theorem 1 (Hadamard’s inequality). For a Hermitian positive definite matrix H\in\mathbb{C}^{n\times n},

\notag     \det(H) \le \displaystyle\prod_{i=1}^n h_{ii},

with equality if and only if H is diagonal.

Theorem 1 is easy to prove using a Cholesky factorization.

The following corollary can be obtained by applying Theorem 1 to H = A^*A or by using a QR factorization of A.

Corollary 2. For A = [a_1,a_2,\dots,a_n] \in\mathbb{C}^{n\times n},

\notag     \det(A) \le \displaystyle\prod_{i=1}^n \|a_i\|_2,

with equality if and only if the columns of A are orthogonal.

Obviously, one can apply the corollary to A^* and obtain the analogous bound with column norms replaced by row norms.

The determinant of A\in\mathbb{R}^{n\times n} can be interpreted as the volume of the parallelepiped \{\, \sum_{i=1}^n t_ia_i : 0 \le t_i \le 1, ~ i = 1\colon n\,\}, whose sides are the columns of A. Corollary 2 says that for columns of given lengths the volume is maximized when the columns are orthogonal.

Nearness to Singularity and Conditioning

The determinant characterizes nonsingularity: A is singular if and only if \det(A) = 0. It might be tempting to use |\det(A)| as a measure of how close a nonsingular matrix A is to being singular, but this measure is flawed, not least because of the sensitivity of the determinant to scaling. Indeed if Q is unitary then \det(\alpha Q) = \alpha^n \det(Q) can be given any value by a suitable choice of \alpha, yet \alpha Q is perfectly conditioned: \kappa_2(\alpha Q) = 1, where \kappa(A) = \|A\| \|A^{-1}\| is the condition number.

To deal with the poor scaling one might normalize the determinant: in view of Corollary 2,

\notag   \psi(A) = \displaystyle\frac{\prod_{i=1}^n \|a_i\|_2} {\det(A)}

satisfies \psi(A) \ge 1 and \psi(A) = 1 if and only if the columns of A are orthogonal. Birkhoff (1975) calls \psi the Hadamard condition number. In general, \psi is not related to the condition number \kappa, but if A has columns of unit 2-norm then it can be shown that \kappa_2(A) < 2\psi(A) (Higham, 2002, Prob. 14.13). Dixon (1984) shows that for classes of n\times n random matrices A_n that include matrices with elements independently drawn from a normal distribution with mean 0, the probability that the inequality

n^{1/4 - \epsilon} \mathrm{e}^{n/2}     < \psi(A_n) < n^{1/4 + \epsilon} \mathrm{e}^{n/2}

holds tends to 1 as n\to\infty for any \epsilon > 0, so \psi(A_n) \approx n^{1/4}\mathrm{e}^{n/2} for large n. This exponential growth is much faster than the growth of \kappa, for which Edelman (1998) showed that for the standard normal distribution, \mathbb{E}(\log(\kappa_2(A_n))) \approx \log n + 1.537, where \mathbb{E} denotes the mean value. This MATLAB example illustrates these points.

>> rng(1); n = 50; A = randn(n); 
>> psi = prod(sqrt(sum(A.*A)))/abs(det(A)), kappa2 = cond(A)
psi =
   5.3632e+10
kappa2 =
   1.5285e+02
>> ratio = psi/(n^(0.25)*exp(n/2))
ratio =
   2.8011e-01

The relative distance from A to the set of singular matrices is equal to the reciprocal of the condition number.

Theorem 3 (Gastinel, Kahan). For A\in\mathbb{C}^{n\times n} and any subordinate matrix norm,

\notag   \min \left\{ \displaystyle\frac{\|\Delta A\|}{\|A\|} : A+\Delta A\mathrm{~singular} \right\}  = \displaystyle\frac{1}{\kappa(A)}.

Notes

Determinants came before matrices, historically. Most linear algebra textbooks make significant use of determinants, but a lot can be done without them. Axler (1995) shows how the theory of eigenvalues can be developed without using determinants.

Determinants have little application in practical computations, but they are a useful theoretical tool in numerical analysis, particularly for proving nonsingularity.

There is a large number of formulas and identities for determinants. Sir Thomas Muir collected many of them in his five-volume magnum opus The Theory of Determinants in the Historical Order of Development, published between 1890 and 1930. Brualdi and Schneider (1983) give concise derivations of many identities using Gaussian elimination, bringing out connections between the identities.

The quantity obtained by modifying the definition (1) of determinant to remove the (-1)^{\mathop{\mathrm{sgn}}j} term is the permanent. The permanent arises in combinatorics and quantum mechanics and is much harder to compute than the determinant: no algorithm is known for computing the permanent in p(n) operations for a polynomial p.

References

This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is a Vandermonde Matrix?

A Vandermonde matrix is defined in terms of scalars x_1, x_2, …, x_n\in\mathbb{C} by

\notag     V = V(x_1,x_2,\dots,x_n)       = \begin{bmatrix}                 1      &   1    & \dots & 1     \\                 x_1   & x_2   & \dots & x_n  \\                 \vdots &\vdots  &       & \vdots \\                 x_1^{n-1} & x_2^{n-1} & \dots & x_n^{n-1}    \end{bmatrix}       \in \mathbb{C}^{n\times n}.

The x_i are called points or nodes. Note that while we have indexed the nodes from 1, they are usually indexed from 0 in papers concerned with algorithms for solving Vandermonde systems.

Vandermonde matrices arise in polynomial interpolation. Suppose we wish to find a polynomial p_{n-1}(x) = a_nx^{n-1} + a_{n-1}x^{n-2} + \cdots + a_1 of degree at most n-1 that interpolates to the data (x_i,f_i)_{i=1}^n, that is, p_{n-1}(x_i) = f_i, i=1\colon n. These equations are equivalent to

\notag          V^Ta = f \quad \mathrm{(dual)},

where a = [a_1,a_2,\dots,a_n]^T is the vector of coefficients. This is known as the dual problem. We know from polynomial interpolation theory that there is a unique interpolant if the x_i are distinct, so this is the condition for V to be nonsingular.

The problem

\notag          Vy = b \quad \mathrm{(primal)}

is called the primal problem, and it arises when we determine the weights for a quadrature rule: given moments b_i find weights y_i such that \sum_{j=1}^n y_j^{} x_j^{\,i-1} = b_i, i=1\colon n.

Determinant

The determinant of V is a function of the n points x_i. If x_i = x_j for some i\ne j then V has identical ith and jth columns, so is singular. Hence the determinant must have a factor x_i - x_j. Consequently, we have

\notag  \det( V(x_1,x_2,\dots,x_n) ) = c \displaystyle\prod_{i,j = 1\atop i > j}^n (x_i - x_j),

where, since both sides have degree n(n-1)/2 in the x_i, c is a constant. But \det(V) contains a term x_2 x_3^2 \dots x_n^{n-1} (from the main diagonal), so c = 1. Hence

\notag   \det(V) = \displaystyle\prod_{i,j = 1\atop i > j}^n (x_i - x_j). \qquad (1)

This formula confirms that V is nonsingular precisely when the x_i are distinct.

Inverse

Now assume that V is nonsingular and let V^{-1} = W = (w_{ij})_{i,j=1}^n. Equating elements in the ith row of WV = I gives

\sum_{j=1}^n w_{ij} x_k^{\mskip1mu j-1} = \delta_{ik},    \quad k=1\colon n,

where \delta_{ij} is the Kronecker delta (equal to 1 if i=j and 0 otherwise). These equations say that the polynomial \sum_{j=1}^n w_{ij} x^{\mskip1mu j-1} takes the value 1 at x = x_i and 0 at x = x_k, k\ne i. It is not hard to see that this polynomial is the Lagrange basis polynomial:

\notag    \sum_{j=1}^n w_{ij} x^{j-1} = \displaystyle\prod_{k=1\atop k\ne i}^n    \left( \frac{x-x_k}{x_i-x_k} \right) =: \ell_i(x). \qquad (2)

We deduce that

\notag    w_{ij} = \displaystyle\frac{ (-1)^{n-j}                    \sigma_{n-j}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_n) }    { \displaystyle\prod_{k=1 \atop k\ne i}^n (x_i-x_k) }, \qquad (3)

where \sigma_k(y_1,\dots,y_n) denotes the sum of all distinct products of k of the arguments y_1,\dots,y_n (that is, \sigma_k is the kth elementary symmetric function).

From (1) and (3) we see that if the x_i are real and positive and arranged in increasing order 0 < x_1 < x_2 < \cdots  0 and V^{-1} has a checkerboard sign pattern: the (i,j) element has sign (-1)^{i+j}.

Note that summing (2) over i gives

\notag    \displaystyle\sum_{j=1}^n x^{j-1} \sum_{i=1}^n w_{ij} = \sum_{i=1}^n \ell_i(x) = 1,

where the second equality follows from the fact that \sum_{i=1}^n \ell_i(x) is a degree n-1 polynomial that takes the value 1 at the n distinct points x_i. Hence

\notag   \displaystyle\sum_{i=1}^n w_{ij} = \delta_{j1},

so the elements in the jth column of the inverse sum to 1 for j = 1 and 0 for j\ge 2.

Example

To illustrate the formulas above, here is an example, with x_i = (i-1)/(n-1) and n = 5:

\notag V = \left[\begin{array}{ccccc} 1 & 1 & 1 & 1 & 1\\ 0 & \frac{1}{4} & \frac{1}{2} & \frac{3}{4} & 1\\[\smallskipamount] 0 & \frac{1}{16} & \frac{1}{4} & \frac{9}{16} & 1\\[\smallskipamount] 0 & \frac{1}{64} & \frac{1}{8} & \frac{27}{64} & 1\\[\smallskipamount] 0 & \frac{1}{256} & \frac{1}{16} & \frac{81}{256} & 1 \end{array}\right], \quad V^{-1} = \left[\begin{array}{ccccc} 1 & -\frac{25}{3} & \frac{70}{3} & -\frac{80}{3} & \frac{32}{3}\\[\smallskipamount] 0 & 16 & -\frac{208}{3} & 96 & -\frac{128}{3}\\ 0 & -12 & 76 & -128 & 64\\[\smallskipamount] 0 & \frac{16}{3} & -\frac{112}{3} & \frac{224}{3} & -\frac{128}{3}\\[\smallskipamount] 0 & -1 & \frac{22}{3} & -16 & \frac{32}{3} \end{array}\right],

for which \det(V) = 9/32768.

Conditioning

Vandermonde matrices are notorious for being ill conditioned. The ill conditioning stems from the monomials being a poor basis for the polynomials on the real line. For arbitrary distinct points x_i, Gautschi showed that V_n = V(x_1, x_2, \dots, x_n) satisfies

\notag    \displaystyle\max_i \displaystyle\prod_{j\ne i} \frac{ \max(1,|x_j|) }{ |x_i-x_j| }    \le \|V_n^{-1}\|_{\infty}    \le  \displaystyle\max_i \prod_{j\ne i} \frac{ 1+|x_j| }{ |x_i-x_j| },

with equality on the right when x_j = |x_j| e^{\mathrm{i}\theta} for all j with a fixed \theta (in particular, when x_j\ge0 for all j). Note that the upper and lower bounds differ by at most a factor 2^{n-1}. It is also known that for any set of real points x_i,

\notag        \kappa_2(V_n) \ge         \Bigl(\displaystyle\frac{2}{n}\Bigr)^{1/2} \,         (1+\sqrt{2})^{n-2}

and that for x_i = 1/i we have \kappa_{\infty}(V_n) > n^{n+1}, where the lower bound is an extremely fast growing function of the dimension!

These exponential lower bounds are alarming, but they do not necessarily rule out the use of Vandermonde matrices in practice. One of the reasons is that there are specialized algorithms for solving Vandermonde systems whose accuracy is not dependent on the condition number \kappa, and which in some cases can be proved to be highly accurate. The first such algorithm is an O(n^2) operation algorithm for solving V_ny =b of Björck and Pereyra (1970). There is now a long list of generalizations of this algorithm in various directions, including for confluent Vandermonde-like matrices (Higham, 1990), as well as for more specialized problems (Demmel and Koev, 2005) and more general ones (Bella et al., 2009). Another important observation is that the exponential lower bounds are for real nodes. For complex nodes V_n can be much better conditioned. Indeed when the x_i are the roots of unity, V_n/\sqrt{n} is the unitary Fourier matrix and so V_n is perfectly conditioned.

Generalizations

Two ways in which Vandermonde matrices have been generalized are by allowing confluency of the points x_i and by replacing the monomials by other polynomials. Confluency arises when the x_i are not distinct. If we assume that equal x_i are contiguous then a confluent Vandermonde matrix is obtained by “differentiating” the previous column for each of the repeated points. For example, with points x_1, x_1, x_1, x_2, x_2 we obtain

\notag   \begin{bmatrix}   1      &  0     &   0     &   1          &   0    \\   x_1    &  1     &   0     &  x_2    &   1    \\   x_1^2  & 2x_1   &   2     &  x_2^2  & 2x_2  \\   x_1^3  & 3x_1^2 & 6x_1    & x_2^3   & 3x_2^2 \\   x_1^4  & 4x_1^3 & 12x_1^2 & x_2^4   & 4x_2^3   \end{bmatrix}. \qquad (4)

The transpose of a confluent Vandermonde matrix arises in Hermite interpolation; it is nonsingular if the points corresponding to the “nonconfluent columns” are distinct (that is, if x_1 \ne x_2 in the case of (4)).

A Vandermonde-like matrix is defined in terms of a set of polynomials \{p_i(x)\}_{i=0}^n with p_i having degree i:

\notag     \begin{bmatrix}     p_0(x_1) & p_0(x_2) & \dots & p_0(x_n)\\     p_1(x_1) & p_1(x_2) & \dots & p_1(x_n)\\    \vdots & \vdots & \dots & \vdots\\     p_{n-1}(x_1) & p_{n-1}(x_2) & \dots & p_{n-1}(x_n)\\    \end{bmatrix}.

Of most interest are polynomials that satisfy a three-term recurrence, in particular, orthogonal polynomials. Such matrices can be much better conditioned than general Vandermonde matrices.

Notes

Algorithms for solving confluent Vandermonde-like systems and their rounding error analysis are described in the chapter “Vandermonde systems” of Higham (2002).

Gautschi has written many papers on the conditioning of Vandermonde matrices, beginning in 1962. We mention just his most recent paper on this topic: Gautschi (2011).

References

This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Bounds for the Matrix Condition Number

We present a selection of bounds for the condition number \kappa(A) = \|A\| \|A^{-1}\| of a nonsingular matrix A\in\mathbb{C}^{n\times n} in terms of quantities that might be known or can be estimated.

General Matrices

From the inequality \|A\| \ge \rho(A), for any matrix norm, where \rho(A) is the spectral radius (the largest magnitude of any eigenvalue of A) we have

\notag       \kappa(A) \ge \rho(A) \rho(A^{-1}).  \qquad (1)

Fir the 2-norm, this bound is an equality for a normal matrix (one for which A^*A = AA^*), but it can be arbitrarily weak for nonnormal matrices.

Guggenheimer, Edelman, and Johnson (1995) obtain the bound

\notag       \kappa_2(A) < \displaystyle\frac{2}{|\det(A)|}                  \left( \frac{\|A\|_F}{n^{1/2}} \right)^n. \qquad (2)

The proof of the bound applies the arithmetic–geometric mean inequality to the n numbers \sigma_1^2/2, \sigma_1^2/2,  \sigma_2^2, \sigma_3^2, \dots, \sigma_{n-1}^2, where the \sigma_i are the singular values of A. This bound can be arbitrarily weak but it is an approximate equality when \sigma_1,\sigma_2, \dots \sigma_{n-1} are of similar order of magnitude.

Merikoski, Urpala, Virtanen, Tam, and Uhlig (1997) obtain the bound

\notag  \kappa_2(A) \le  \left(\displaystyle\frac{1+x}{1-x}\right)^{1/2}, \quad      x = \sqrt{1 - (n/\|A\|_F^2)^n |\det(A)|^2 }. \qquad (3)

Their proof uses a more refined application of the arithmetic–geometric mean inequality, and they show that this bound is the smallest that can be obtained based on \|A\|_F, \det(A), and n only. Hence (3) is no larger than (2), and they show that it can be smaller by no more than 1.5. Equality holds in (3) if and only if \sigma_2 = \sigma_3 = \cdots = \sigma_{n-1} = (\sigma_1 + \sigma_n)/2.

As an example, for three random 25\times 25 matrices with \kappa_2(A) = 10, generated by gallery('randsvd') with three different singular value dsitributions:

Mode (2) (3)
One large singular value 9.88e+07 9.88e+07
One small singular value 1.21e+01 1.20e+01
Geometrically distributed singular values 5.71e+04 5.71e+04

We note that for larger \kappa_2(A) the formula (3) is prone to overflow, which can be avoided by evaluating it in higher precision arithmetic.

Hermitian Positive Definite Matrices

Merikoski et al. (1997) also give a version of (3) for Hermitian positive definite A\in\mathbb{C}^{n\times n}:

\kappa_2(A) \le \displaystyle\frac{1+x}{1-x}, \quad      x = \sqrt{1 - (n/\mathrm{trace}(A))^n \det(A) }.     \qquad (4)

This is the smallest bound that can be obtained based on \mathrm{trace}(A), \det(A), and n only. Equality holds in (4) if and only if the eigenvalues \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n of A satisfy \lambda_2 = \lambda_3 = \cdots = \lambda_{n-1} = (\lambda_1 + \lambda_n)/2. We can rewrite this upper bound as

\displaystyle\frac{1+x}{1-x} = \frac{(1+x)^2}{1-x^2}                < \frac{4}{1-x^2},

which gives the weaker bound

\notag   \kappa_2(A) < \displaystyle\frac{4}{\det(A)} \Bigl(\displaystyle\frac{\mathrm{trace}(A)}{n}\Bigr)^n.     \qquad (5)

This bound is analogous to (2) and is up to a factor 4 larger than (4), this factor being attained for A = I.

If \mathrm{trace}(A) = n then (4) reduces to

\notag \begin{aligned}   \kappa_2(A) &< \displaystyle\frac{1 + \sqrt{1-\det(A)}}{1 - \sqrt{1-\det(A)}}                =\displaystyle\frac{\bigl(1 + \sqrt{1-\det(A)}\,\bigr)^2}{\det(A)}   \qquad(6)\\               &< \displaystyle\frac{4}{\det(A)}. \end{aligned}

These bounds hold for any positive definite matrix with unit diagonal, that is, any nonsingular correlation matrix.

We can sometimes get a sharper bound than (4) and (5) by writing A = DCD, where D = \mathrm{diag}(a_{ii}^{1/2}) and c_{ii} \equiv 1 (thus C is a correlation matrix), using

\notag \kappa_2(A) \le \kappa_2(D)^2 \kappa_2(C)           = \displaystyle\frac{\max_i a_{ii}}{\min_i a_{ii}} \kappa_2(C), \qquad (7)

and bounding \kappa_(C) using (6). For example, for the 5\times 5 Pascal matrix

\notag P_5 = \left[\begin{array}{ccccc} 1 & 1 & 1 & 1 & 1\\ 1 & 2 & 3 & 4 & 5\\ 1 & 3 & 6 & 10 & 15\\ 1 & 4 & 10 & 20 & 35\\ 1 & 5 & 15 & 35 & 70 \end{array}\right]

the condition number is \kappa_1(P_5) = 8.52 \times 10^3. The bounds from (4) and (5) are both 1.22 \times 10^7, whereas combining (4) and (7) gives a bound of 4.70 \times 10^6.

Notes

Many other condition number bounds are available in the literature. All have their pros and cons and any bound based on limited information such as traces of powers of A and the determinant will be potentially very weak.

A drawback of the bounds (3)–(6) is that they require \det(A). Sometimes the determinant is easily computable, as for a Vandermonde matrix, or can be bounded: for example, |\det(A)| \ge 1 for a matrix with integer entries. If a Cholesky, LU, or QR factorization of A is available then |\det(A)| is easily computable, but in this case a good order of magnitude estimate of the condition number can be cheaply computed using condition estimation techniques (Higham, 2002, Chapter 15).

The bounds (3) and (4) are used by Higham and Lettington (2021) in investigating the most ill conditioned 4\times 4 symmetric matrices with integer elements bounded by 10; see What Is the Wilson Matrix?

References

This is a minimal set of references, which contain further useful references within.

What Is the Wilson Matrix?

The 4\times 4 matrix

\notag   W = \begin{bmatrix}      5 & 7  & 6  & 5 \\      7 & 10 & 8  & 7 \\      6 & 8  & 10 & 9 \\      5 & 7  & 9  & 10   \end{bmatrix}

appears in a 1946 paper by Morris, in which it is described as having been “devised by Mr. T. S. Wilson.” The matrix is symmetric positive definite with determinant 1 and inverse

\notag    W^{-1} =   \begin{bmatrix}    68 & -41 & -17 & 10\\   -41 &  25 &  10 & -6\\   -17 &  10 &   5 & -3\\    10 &  -6 &  -3 &  2    \end{bmatrix},

so it is moderately ill conditioned with \kappa_2(W) = \|W\|_2 \|W^{-1}\|_2 \approx 2.98409\times 10^3. This little matrix has been used as an example and for test purposes in many research papers and books over the years, in particular by John Todd, who described it as “the notorious matrix W of T. S. Wilson”.

Rutishauser (1968) stated that “the famous Wilson matrix is not a very striking example of an ill-conditioned matrix”, on the basis that \kappa_2(A)\le 40{,}000 for a “positive definite symmetric 4\times 4 matrix with integer elements not exceeding 10” and he gave the positive definite matrix

\notag   A_0 = \begin{bmatrix}      10 & 1  & 4  &  0 \\      1  & 10 & 5  & -1 \\      4  & 5  & 10 &  7 \\      0  & -1 & 7  &  9   \end{bmatrix}, \quad    A_0^{-1} =\begin{bmatrix}       105& 167 & -304 & 255\\       167 & 266 & -484 & 406\\      -304 & -484 & 881 & -739\\       255 & 406 & -739 & 620       \end{bmatrix},

for which \kappa_2(A_0) = 3.57924\times 10^4. The matrix A_0 is therefore a factor 12 more ill conditioned than W. Rutishauser did not give a proof of the stated bound.

Moler (2018) asked how ill-conditioned W is relative to matrices in the set

\notag \begin{aligned}    \mathcal{S} &= \{\, A\in\mathbb{R}^{4\times 4}: A=A^T       \mathrm{~is~nonsingular~with~integer~entries}\nonumber\\       & \hspace{2.9cm} \mathrm{between~1~and~10} \,\}. \end{aligned}

He generated one million random matrices from \mathcal{S} and found that about 0.21 percent of them had a larger condition number than W. The matrix with the largest condition number was the indefinite matrix

\notag    A_1 = \begin{bmatrix}             1  &  3  & 10  & 10\\             3  &  4  &  8  &  9\\             10 &   8 &   3 &  9\\             10 &   9 &   9 &  3           \end{bmatrix},    \quad      A_1^{-1} =       \begin{bmatrix}       573 & -804 &  159 &  25\\     -804 & 1128 & -223 & -35\\       159 & -223 &   44 &   7\\        25 &  -35 &    7 &   1        \end{bmatrix},

for which \kappa_2(A_1) \approx 4.80867\times 10^4. How far is this matrix from being a worst case?

As the Wilson matrix is positive definite, we are also interested in how ill conditioned a matrix in the set

\notag \begin{aligned}    \mathcal{P} &= \{\, A\in\mathbb{R}^{4\times 4}: A=A^T    \mathrm{~is~symmetric~positive~definite ~with~integer~entries}\nonumber\\       & \hspace{2.9cm} \mathrm{between~1~and~10} \,\} \end{aligned}

can be.

Condition Number Bounds

We first consider bounds on \kappa_2(A) for A \in \mathcal{S}. It is possible to obtain a bound from first principles by using the relation A^{-1} = \mathrm{adj}(A)/\det(A), where \mathrm{adj}(A) is the adjugate matrix, along with the fact that |\det(A)| \ge 1 since A has integer entries. Higham and Lettington (2021) found that the smallest bound they could obtain came from a bound of Merikoski et al. (1997): for nonsingular B\in\mathbb{R}^{n\times n},

\notag  \kappa_2(B) \le  \left(\displaystyle\frac{1+x}{1-x}\right)^{1/2}, \quad      x = \sqrt{1 - (n/\|B\|_F^2)^n |\det(B)|^2 }.

Applying this bound to A\in\mathcal{S}, using the fact that (1+x)/(1-x) is monotonically increasing for x\in(0,1), gives

\notag     \kappa_2(A) \le 2.97606\dots \times 10^5 =: \beta_S, \quad A\in\mathcal{S}.      \qquad (1)

Another result from Merikoski et al. (1997) gives, for symmetric positive definite C\in\in\mathbb{R}^{n\times n},

\notag      \kappa_2(C) \le \displaystyle\frac{1+x}{1-x}, \quad      x = \sqrt{1 - (n/\mathrm{trace}(C))^n \det(C) }.

For A\in\mathcal{P}, since \det(A) \ge 1 we have x \le \sqrt{1 - (1/10)^4}, and hence

\notag   \kappa_2(A) \le 3.99980 \times 10^4 =: \beta_P, \quad A\in\mathcal{P}.      \qquad (2)

Recall that Rutishauser’s bound is 4\times 10^4. The bounds (1) and (2) remain valid if we modify the definitions of \mathcal{S} and \mathcal{P} to allow zero elements (note that Rutishauser’s matrix A_0 has a zero element).

Experiment

The sets \mathcal{S} and \mathcal{P} are large: \mathcal{S} has on the order of 10^{10} elements. Exhaustively searching over the sets in reasonable time is possible with a carefully optimized code. Higham and Lettington (2021) use a MATLAB code that loops over all symmetric matrices with integer elements between 1 and 10 and

  • evaluates \det(A) from an explicit expression (exactly computed for such matrices) and discards A if the matrix is singular;
  • computes the eigenvalues \lambda_i of A and obtains the condition number as \kappa_2(A) = \max_i |\lambda_i|/\min_i |\lambda_i| (since A is symmetric); and
  • for \mathcal{P}, checks whether A is positive definite by checking whether the smallest eigenvalue is positive.

The code is available at https://github.com/higham/wilson-opt.

The maximum over \mathcal{S} is attained for

\notag   A_2 = \begin{bmatrix}                2 & 7  & 10 & 10\\                7 & 10 & 10 & 9\\               10 & 10 & 10 & 1\\               10 & 9  & 1  & 9             \end{bmatrix},   \quad   A_2^{-1} =   \begin{bmatrix}   640 & -987 &  323 &  240\\  -987 & 1522 & -498 & -370\\   323 & -498 &  163 &  121\\   240 & -370 &  121 &   90   \end{bmatrix},

which has \kappa_2(A_2) \approx 7.6119 \times 10^4. and determinant -1. The maximum over \mathcal{P} is attained for

\notag   A_3 =     \begin{bmatrix}      9  &   1 &    1 &    5\\      1  &  10 &    1 &    9\\      1  &   1 &   10 &    1\\      5  &   9 &    1 &   10  \end{bmatrix}, \quad    A_3^{-1} =   \begin{bmatrix}   188 &  347 & -13 & -405\\   347 &  641 & -24 & -748\\   -13 &  -24 &   1 &   28\\  -405 & -748 &  28 &  873  \end{bmatrix}.

which has \kappa_2(A_3) \approx 3.5529 \times 10^4 and determinant 1. Obviously, symmetric permutations of these matrices are also optimal.

The following table summarizes the condition numbers of the matrices discussed and how close they are to the bounds.

Matrix A Comment \kappa_2(A) \beta_S/\kappa_2(A) \beta_P/\kappa_2(A)
W Wilson matrix 2.98409\times 10^3 99.73 13.40
A_9 Rutishauser’s matrix 3.57924\times 10^4 8.31 1.12
A_1 By random sampling 4.80867\times 10^4 6.19
A_2 Optimal matrices in \mathcal{S} 7.61190\times 10^4 3.91
A_3 Optimal matrices in \mathcal{P} 3.55286\times 10^4 8.38 1.13

Clearly, the bounds are reasonably sharp.

We do not know how Wilson constructed his matrix or to what extent he tried to maximize the condition number subject to the matrix entries being small integers. One possibility is that he constructed it via the factorization in the next section.

Integer Factorization

The Cholesky factor of the Wilson matrix is

\notag R = \begin{bmatrix} \sqrt{5} & \frac{7\,\sqrt{5}}{5} & \frac{6\,\sqrt{5}}{5} & \sqrt{5}\\[\smallskipamount] 0 & \frac{\sqrt{5}}{5} & -\frac{2\,\sqrt{5}}{5} & 0\\[\smallskipamount] 0 & 0 & \sqrt{2} & \frac{3\,\sqrt{2}}{2}\\[\smallskipamount] 0 & 0 & 0 & \frac{\sqrt{2}}{2} \end{bmatrix} \quad (W = R^TR).

Apart from the zero (2,4) element, it is unremarkable. If we factor out the diagonal then we obtain the LDL^T factorization, which has rational elements:

\notag L = \begin{bmatrix}1 & 0 & 0 & 0\\ \frac{7}{5} & 1 & 0 & 0\\ \frac{6}{5} & -2 & 1 & 0\\ 1 & 0 & \frac{3}{2} & 1 \end{bmatrix}, \quad D = \begin{bmatrix}5 & 0 & 0 & 0\\ 0 & \frac{1}{5} & 0 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & \frac{1}{2} \end{bmatrix}  \quad (W = LDL^T).

Suppose we drop the requirement of triangularity and ask whether the Wilson matrix has a factorization W = Z^T\!Z with a 4\times4 matrix Z of integers. It is known that every symmetric positive definite n\times n matrix A of integers with determinant 1 has a factorization A = Z^T\!Z with Z an n\times n matrix of integers as long as n \le 7, but examples are known for n = 8 for which the factorization does not exist. This result is mentioned by Taussky (1961) and goes back to Hermite, Minkowski, and Mordell. Higham and Lettington (2021) found the integer factor

\notag Z_0 = \begin{bmatrix}      2 &   3  &   2  &    2\\      1  &   1   &  2   &  1\\      0  &   0   &  1   &  2\\      0  &   0   &  1   &  1      \end{bmatrix}

of W, which is block upper triangular so can be thought of as a block Cholesky factor. Higham, Lettington, and Schmidt (2021) draw on recent research that links the existence of such factorizations to number-theoretic considerations of quadratic forms to show that for the existence of an integer solution Z to A = Z^TZ it is necessary that a certain quadratic equation in n variables has an integer solution. In the case of the Wilson matrix the equation is

2 w^2+x_1^2+x_1 x_2+x_1 x_3+x_2^2+x_2 x_3+x_3^2=952.

The authors solve this equation computationally and find Z_1 and two rational factors:

\notag Z_1=\left[ \begin{array}{cccc}  \frac{1}{2} & 1 & 0 & 1 \\  \frac{3}{2} & 2 & 3 & 3 \\  \frac{1}{2} & 1 & 0 & 0 \\  \frac{3}{2} & 2 & 1 & 0 \\ \end{array} \right], \quad Z_2=\left[ \begin{array}{@{\mskip2mu}rrrr}  \frac{3}{2} & 2 & 2 & 2 \\  \frac{3}{2} & 2 & 2 & 1 \\  \frac{1}{2} & 1 & 1 & 2 \\  -\frac{1}{2} & -1 & 1 & 1 \\ \end{array} \right].

They show that these matrices are the only factors Z\in\frac{1}{16}\mathbb{Z} of W up to left multiplication by integer orthogonal matrices.

Conclusions

The Wilson matrix has provided sterling service throughout the digital computer era as a convenient symmetric positive definite matrix for use in textbook examples and for testing algorithms. The recent discovery of its integer factorization has led to the development of new theory on when general n\times n integer matrices A can be factored as A = Z^TZ (when A is symmetric positive definite) or A = Z^2 (a problem also considered in Higham, Lettington, and Schmidt (2021)), with integer Z.

Olga Taussky Todd wrote in 1961 that “matrices with integral elements have been studied for a very long time and an enormous number of problems arise, both theoretical and practical.” We wonder what else can be learned from the Wilson matrix and other integer test matrices.

References

This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.