# Matrix Rank Relations

Matrix rank is an important concept in linear algebra. While rank deficiency can be a sign of an incompletely or improperly specified problem (a singular system of linear equations, for example), in some problems low rank of a matrix is a desired property or outcome. Here we present some fundamental rank relations in a concise form useful for reference. These are all immediate consequences of the singular value decomposition (SVD), but we give elementary (albeit not entirely self-contained) proofs of them.

The rank of a matrix $A\in\mathbb{R}^{m\times n}$ is the maximum number of linearly independent columns, which is the dimension of the range space of $A$, $\mathrm{range}(A) = \{\, Ax: x \in\mathbb{R}^n \,\}$. An important but non-obvious fact is that this is the same as the maximum number of linearly independent rows (see (5) below).

A rank-$1$ matrix has the form $xy^*$, where $x$ and $y$ are nonzero vectors. Every column is a multiple of $x$ and every row is a multiple of $y^*$. A sum of $k$ rank-$1$ matrices has the form

$\notag A = \displaystyle\sum_{i=1}^{k} x_iy_i^* = \begin{bmatrix} x_1 & x_2 & \dots & x_k \end{bmatrix} \begin{bmatrix} y_1^* \\ y_2^* \\ \vdots\\ y_k^* \end{bmatrix} \equiv XY^*. \qquad (0)$

Each column of $A$ is a linear combination of the vectors $x_1$, $x_2$, …, $x_k$, so $A$ has at most $k$ linearly independent columns, that is, $A$ has rank at most $k$. In fact, $\mathrm{rank}(A) = k$ if $X$ and $Y$ have rank $k$, as follows from (4) below. Any rank-$k$ matrix can be written in the form $(0)$ with $X$ and $Y$ of rank $k$; indeed this is the full-rank factorization below.
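
For example, here is a minimal MATLAB illustration (the particular random factors are arbitrary) that builds a rank-$3$ matrix as a sum of three rank-$1$ matrices and confirms its rank:

k = 3; m = 6; n = 5; rng(1)
X = randn(m,k); Y = randn(n,k);  % random X and Y have rank k with probability 1
A = X*Y';                        % the sum of k rank-1 matrices x_i*y_i'
rank(A)                          % returns 3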

Here are some fundamental rank equalities and inequalities.

## Rank-Nullity Theorem

The rank-nullity theorem says that

$\notag \boxed{ \mathrm{rank}(A) + \mathrm{dim}( \mathrm{null}(A) ) = n, \quad A\in\mathbb{R}^{m\times n},}$

where $\mathrm{null}(A) = \{\, x \in\mathbb{R}^n: Ax = 0 \,\}$ is the null space of $A$.

## Rank Bound

The rank cannot exceed the number of columns, or, by (5) below, the number of rows:

$\notag \boxed{ \mathrm{rank}(A) \le \min(m,n), \quad A\in\mathbb{C}^{m\times n}. }$

## Rank of a Sum

For any $A$ and $B$ of the same dimension,

$\notag \boxed{|\mathrm{rank}(A) - \mathrm{rank}(B)| \le \mathrm{rank}(A+B) \le \mathrm{rank}(A) + \mathrm{rank}(B).} \qquad (1)$

The upper bound follows from the fact that the dimension of the sum of two subspaces cannot exceed the sum of the dimensions of the subspaces. Interestingly, the upper bound is also a corollary of the bound (3) for the rank of a matrix product, because

$\notag \begin{aligned} \mathrm{rank}(A+B) &= \mathrm{rank}\biggl( \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} I \\ I \end{bmatrix} \biggr)\\ &\le \min\biggl(\mathrm{rank}\bigl(\begin{bmatrix} A & B \end{bmatrix}\bigr), \mathrm{rank}\biggl(\begin{bmatrix} I \\ I \end{bmatrix} \biggr)\biggr)\\ &\le \mathrm{rank}\bigl(\begin{bmatrix} A & B \end{bmatrix}\bigr)\\ &\le \mathrm{rank}(A) + \mathrm{rank}(B). \end{aligned}$

For the lower bound, writing $A = -B + A+B$ and applying the upper bound gives $\mathrm{rank}(A) \le \mathrm{rank}(-B) + \mathrm{rank}(A+B) = \mathrm{rank}(B) + \mathrm{rank}(A+B)$, and likewise with the roles of $A$ and $B$ interchanged.

## Rank of $A$ and $A^*A$

For any $A$,

$\notag \boxed{\mathrm{rank}(A^*A) = \mathrm{rank}(A).} \qquad (2)$

Indeed $Ax = 0$ implies $A^*Ax = 0$, and $A^*Ax = 0$ implies $0 = x^*A^*Ax = (Ax)^*(Ax)$, which implies $Ax = 0$. Hence the null spaces of $A$ and $A^*A$ are the same. The equality (2) follows from the rank-nullity theorem.

## Rank of a General Product

For any $A$ and $B$ for which the product $AB$ is defined,

$\notag \boxed{\mathrm{rank}(AB) \le \min\bigl( \mathrm{rank}(A), \mathrm{rank}(B) \bigr).} \qquad (3)$

If $B = [b_1,\dots,b_n]$ then $AB = [Ab_1,\dots,Ab_n]$, so the columns of $AB$ are linear combinations of those of $A$ and so $AB$ cannot have more linearly independent columns than $A$, that is, $\mathrm{rank}(AB) \le \mathrm{rank}(A)$. Using (5) below, we then have

$\notag \mathrm{rank}(AB) = \mathrm{rank}(B^*A^*) \le \mathrm{rank}(B^*) = \mathrm{rank}(B).$

The latter inequality can be proved without using (5) (our proof of which uses (3)), as follows. Suppose $\mathrm{rank}(B) < \mathrm{rank}(AB) = r$. Let the columns of $Y$ span $\mathrm{range}(AB)$, so that $Y$ has $r$ columns and $Y = ABZ$ for some matrix $Z$ with $r$ columns. Now $\mathrm{rank}(BZ) \le \mathrm{rank}(B) < r$ by the first part, so $BZg = 0$ for some nonzero $g$. But then $Yg = ABZg = 0$, which contradicts the linear independence of the columns of $Y$, so we must have $\mathrm{rank}(B) \ge \mathrm{rank}(AB)$.

## Rank of a Product of Full-Rank Matrices

We have

$\notag \boxed{ \mathrm{rank}(AB) = r, \quad A\in\mathbb{C}^{m\times r}, \; B\in\mathbb{C}^{r\times n}, \; \mathrm{rank}(A) = \mathrm{rank}(B) = r .} \qquad (4)$

We note that $A^*A$ and $BB^*$ are both nonsingular $r\times r$ matrices by (2), so their product has rank $r$. Using (3),

$\notag r = \mathrm{rank}(A^*A BB^*) \le \mathrm{rank}(A B) \le r,$

and hence $\mathrm{rank}(A B) = r$.

Another important relation is

$\notag \boxed{ \mathrm{rank}(XAY ) = \mathrm{rank}(A), \quad X\in\mathbb{C}^{m\times m} \;\mathrm{and}\; Y\in\mathbb{C}^{n\times n}\; \mathrm{nonsingular}. }$

This is a consequence of the equality $\mathrm{range}(XAY) = X\,\mathrm{range}(A)$ for nonsingular $X$ and $Y$: since $Y$ is nonsingular, $\mathrm{range}(AY) = \mathrm{range}(A)$, and multiplying a subspace by the nonsingular matrix $X$ does not change its dimension.

## Ranks of $A$ and $A^*$

By (2) and (3) we have $\mathrm{rank}(A) = \mathrm{rank}(A^*A) \le \mathrm{rank}(A^*)$. Interchanging the roles of $A$ and $A^*$ gives $\mathrm{rank}(A^*) \le \mathrm{rank}(A)$ and so

$\notag \boxed{ \mathrm{rank}(A^*) = \mathrm{rank}(A). } \qquad (5)$

In other words, the rank of $A$ is equal to the maximum number of linearly independent rows as well as the maximum number of linearly independent columns.

## Full-Rank Factorization

$A\in\mathbb{C}^{m \times n}$ has rank $r$ if and only if $A = GH$ for some $G\in\mathbb{C}^{m \times r}$ and $H\in\mathbb{C}^{r \times n}$, both of rank $r$, and this is called a full-rank factorization. The existence of such a factorization implies that $\mathrm{rank}(A) = r$ by (4). Conversely, suppose that $A$ has rank $r$. Let the columns of $X\in\mathbb{C}^{m \times r}$ form a basis for the range space of $A$. Then there are $r$-vectors $y_j$ such that $a_j = Xy_j$, $j = 1\colon n$, and with $Y = [y_1,y_2,\dots, y_n]$ we have $A = XY$. Finally, $r = \mathrm{rank}(A) = \mathrm{rank}(XY) \le \mathrm{rank}(Y)$ by (3), and since $\mathrm{rank}(Y) \le r$ we have $\mathrm{rank}(Y) = r$.

## Rank and Minors

A characterization of rank that is sometimes used as the definition is that it is the size of the largest nonsingular square submatrix. Equivalently, the rank is the size of the largest nonzero minor, where a minor of size $k$ is the determinant of a $k\times k$ submatrix.

## rank(AB) and rank(BA)

Although $AB$ and $BA$ have some properties in common when both products are defined (notably they have the same nonzero eigenvalues), $\mathrm{rank}(AB)$ is not always equal to $\mathrm{rank}(BA)$. A simple example is $A = x$ and $B = y^*$ with $x$ and $y$ orthogonal vectors: $AB = xy^*$ but $BA = y^*x = 0$. An example with square $A$ and $B$ is

$\notag \begin{gathered} A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \\ \mathrm{rank}(AB) = \mathrm{rank}\biggl( \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \biggr) = 0, \quad \mathrm{rank}(BA) = \mathrm{rank}\biggl( \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \biggr) = 1. \end{gathered}$

Note that $A = e_1e_2^T$ and $B = e_1e_1^T$, where $e_i$ has $1$ in the $i$th position and zeros everywhere else. Such matrices are easy to manipulate in this form (e.g., $AB = e_1 (e_2^Te_1)e_1^T = 0$) and are useful for constructing examples.

## How to Find Rank

If we have a full-rank factorization of $A$ then we can read off the rank from the dimensions of the factors. But finding a full-rank factorization is a nontrivial task. The ultimate full-rank factorization is the SVD

$\notag A = U\Sigma V^T,$

where $U\in\mathbb{R}^{m\times m}$ and $V\in\mathbb{R}^{n\times n}$ are orthogonal, $\Sigma = \mathrm{diag}(\sigma_1,\dots, \sigma_p)\in\mathbb{R}^{m\times n}$, where $p = \min(m,n)$, and $\sigma_1\ge \sigma_2\ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_p = 0$. The rank of $A$ is $r$, the number of nonzero singular values.

In floating-point arithmetic, the standard algorithms for computing the SVD are numerically stable, that is, the computed singular values are the exact singular values of a matrix $A + \Delta A$ with $\|\Delta A\|_2 \le c_{m,n}u\|A\|_2$, where $c_{m,n}$ is a constant and $u$ is the unit roundoff. Unfortunately, $A + \Delta A$ will typically be full rank when $A$ is rank deficient. For example, consider this computation.

>> n = 4; A = zeros(n); A(:) = 1:n^2, svd(A)
A =
1     5     9    13
2     6    10    14
3     7    11    15
4     8    12    16
ans =
3.8623e+01
2.0713e+00
1.5326e-15
1.3459e-16


The matrix has rank $2$ and the two zero singular values are approximated by computed singular values of order $10^{-15}$. In general, we have no way to know whether tiny computed singular values signify exactly zero singular values. In practice, one typically defines a numerical rank based on a threshold and regards computed singular values less than the threshold as zero. Indeed the MATLAB rank function computes the rank as the number of singular values exceeding $2u \max(m,n)\widehat{\sigma}_1$, where $\widehat{\sigma}_1$ is the largest computed singular value. If the data from which the matrix is constructed is uncertain then the definition of numerical rank should take into account the level of uncertainty in the data. Dealing with rank deficiency in the presence of data errors and in finite precision arithmetic is a tricky business.
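
The following MATLAB sketch applies a threshold of this form (with $u = \texttt{eps}/2$ in MATLAB) to the matrix in the example above:

A = zeros(4); A(:) = 1:16;
s = svd(A);
tol = 2*(eps/2)*max(size(A))*s(1);  % threshold 2*u*max(m,n)*sigma_1
numerical_rank = sum(s > tol)       % returns 2, in agreement with rank(A)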

## References

An excellent reference for further rank relations is Horn and Johnson. Stewart describes some of the issues associated with rank-deficient matrices in practical computation.

# Diagonally Perturbing a Symmetric Matrix to Make It Positive Definite

Suppose $A$ is a matrix that is symmetric but not positive definite. What is the best way to perturb the diagonal to make $A$ positive definite? We want to compute a vector $d$ such that

$\notag A(d) = A + D, \quad D = \mathrm{diag}(d),$

is positive definite. Since the positive definite matrices form an open set there is no minimal $d$, so we relax the requirement to be that $A(d)$ is positive semidefinite. The perturbation $D$ needs to make any negative eigenvalues of $A$ become nonnegative. We will require all the entries of $d$ to be nonnegative. Denote the eigenvalues of $A$ by $\lambda_n(A) \le \lambda_{n-1}(A) \le \cdots \le \lambda_1(A)$ and assume that $\lambda_n(A) < 0$.

A natural choice is to take $D$ to be a multiple of the identity matrix. For $d_i \equiv \delta$, $A(d)$ has eigenvalues $\lambda_i + \delta$, and so the smallest possible $\delta$ is $\delta = - \lambda_n(A)$. This choice of $D$ shifts all the diagonal elements by the same amount, which might be undesirable for a matrix with widely varying diagonal elements.

When the diagonal entries of $A$ are positive another possibility is to take $d_i = \alpha a_{ii}$, so that each diagonal entry undergoes a relative perturbation of size $\alpha$. Write $D_A = \mathrm{diag}(a_{ii})$ and note that $C = D_A^{-1/2}A D_A^{-1/2}$ is symmetric with unit diagonal. Then

$\notag A + \alpha D_A = D_A^{1/2}(C + \alpha I)D_A^{1/2}.$

Since $A + \alpha D_A$ is positive semidefinite if and only if $C + \alpha I$ is positive semidefinite, the smallest possible $\alpha$ is $\alpha = -\lambda_n(C)$.
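
Here is a minimal MATLAB sketch comparing these two choices on a small symmetric indefinite matrix with positive diagonal (the particular matrix is chosen purely for illustration):

A = [ 2 -4  1
     -4  3  5
      1  5  1 ];
delta = -min(eig(A));              % shift: A + delta*I is positive semidefinite
s = 1./sqrt(diag(A));
C = diag(s)*A*diag(s);             % scaled matrix with unit diagonal
alpha = -min(eig(C));              % scaling: A + alpha*diag(diag(A)) is positive semidefinite
min(eig(A + delta*eye(3)))         % approximately zero
min(eig(A + alpha*diag(diag(A))))  % approximately zero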

More generally, we can treat the $d_i$ as $n$ independent variables and ask for the solution of the optimization problem

$\notag \min \|d\| ~~\mathrm{subject~to}~~ A + \mathrm{diag}(d) ~\mathrm{positive~semidefinite}, ~d \ge 0. \qquad(\dagger)$

Of particular interest are the norms $\|d\|_1 = \sum_i d_i = \mathrm{trace}(D)$ (since $d\ge0$) and $\|d\|_\infty = \max_i d_i$.

If $A+D$ is positive semidefinite then from standard eigenvalue inequalities,

$\notag 0 \le \lambda_n(A+D) \le \lambda_n(A) + \lambda_1(D),$

so that

$\notag \max_i d_i \ge -\lambda_n(A).$

Since $d_i \equiv -\lambda_n(A)$ satisfies the constraints of $(\dagger)$, this means that this $d$ solves $(\dagger)$ for the $\infty$-norm, though the solution is obviously not unique in general.

For the $1$- and $2$-norms, $(\dagger)$ does not have an explicit solution, but it can be solved by semidefinite programming techniques.

Another approach to finding a suitable $D$ is to compute a modified Cholesky factorization. Given a symmetric $A$, such a method computes a perturbation $E$ such that $A + E = R^TR$ for an upper triangular $R$ with positive diagonal elements, so that $A + E$ is positive definite. The methods of Gill, Murray, and Wright (1981) and Schnabel and Eskow (1990) compute a diagonal $E$. The cost in flops is essentially the same as that of computing a Cholesky factorization ($n^3/3$ flops), so this approach is likely to require fewer flops than computing the minimum eigenvalue or solving an optimization problem, but the perturbations produced will not be optimal.

## Example

We take the $5\times 5$ Fiedler matrix

>> A = gallery('fiedler',5)
A =
0     1     2     3     4
1     0     1     2     3
2     1     0     1     2
3     2     1     0     1
4     3     2     1     0


The smallest eigenvalue is $-5.2361$, so $A+D$ is positive semidefinite for $D_1 = 5.2361I$. The Gill–Murray–Wright method gives $D_{\mathrm{GMW}}$ with diagonal elements

2.0000e+01   4.6159e+00   1.3194e+00   2.5327e+00   1.0600e+01


and has $\lambda_5(A+D_{\mathrm{GMW}}) = 0.5196$ while the Schnabel–Eskow method gives $D_{\mathrm{SE}}$ with diagonal elements

6     6     6     6     6


and has $\lambda_5(A+D_{\mathrm{SE}}) = 0.7639$. If we increase the diagonal elements of $D_1$ by 0.5 to give comparable smallest eigenvalues for the perturbed matrices then we have

|  | $\Vert d\Vert_{\infty}$ | $\Vert d \Vert_1$ |
|---|---|---|
| Shift | 5.2361 | 26.180 |
| Gill–Murray–Wright | 20.000 | 39.068 |
| Schnabel–Eskow | 6.000 | 30.000 |

# When Does Thresholding Preserve Positive Definiteness?

Does a symmetric positive definite matrix remain positive definite when we set one or more elements to zero? This question arises in thresholding, in which elements of absolute value less than some tolerance are set to zero. Thresholding is used in some applications to remove correlations thought to be spurious, so that only statistically significant ones are retained.

We will focus on the case where just one element is changed and consider an arbitrary target value rather than zero. Given an $n\times n$ symmetric positive definite matrix $A$ we define $A(t)$ to be the matrix resulting from adding $t$ to the $(i,j)$ and $(j,i)$ elements and we ask when is $A(t)$ positive definite. We can write

$\notag A(t) = A + t(e_i^{}e_j^T + e_j^{}e_i^T) \equiv A + tE_{ij},$

where $e_i$ is the $i$th column of the identity matrix. The perturbation $E_{ij}$ has rank $2$, with eigenvalues $-1$, $1$, and $0$ repeated $n-2$ times. Hence we can write $E_{ij}$ in the form $E_{ij} = pp^T - qq^T$, where $p^Tp = q^Tq = 1$ and $p^Tq = 0$. Adding $pp^T$ to $A$ causes each eigenvalue to increase or stay the same, while subtracting $qq^T$ decreases or leaves unchanged each eigenvalue. However, more is true: after each of these rank-$1$ perturbations the eigenvalues of the original and perturbed matrices interlace, by Weyl’s theorem. Hence, with the eigenvalues of $A$ ordered as $\lambda_n(A) \le \cdots \le \lambda_1(A)$, we have (Horn and Johnson, Cor. 4.3.7)

$\notag \begin{aligned} \lambda_n(A(t)) &\le \lambda_{n-1}(A), \\ \lambda_{i+1}(A) &\le \lambda_i(A(t)) \le \lambda_{i-1}(A), \quad i = 2\colon n-1, \\ \lambda_2(A) &\le \lambda_1(A(t)). \end{aligned}$

Because $A$ is positive definite these inequalities imply that $\lambda_{n-1}(A(t)) \ge \lambda_n(A) > 0$, so $A(t)$ has at most one negative eigenvalue. Since $\det(A(t))$ is the product of the eigenvalues of $A(t)$ this means that $A(t)$ is positive definite precisely when $\det(A(t)) > 0$.

There is a simple expression for $\det(A(t))$, which follows from a lemma of Chan (1984), as explained by Georgescu, Higham, and Peters (2018):

$\notag \det(A(t)) = \det(A)\big(1+ 2t b_{ij} + t^2(b_{ij}^2-b_{ii}b_{jj})\big),$

where $B = A^{-1}$. Hence the condition for $A(t)$ to be positive definite is

$\notag q_{ij}(t) = 1 + 2t b_{ij} + t^2(b_{ij}^2-b_{ii}b_{jj}) > 0.$

We can factorize

$\notag q_{ij}(t) = \Bigl( t\bigl(b_{ij} - \sqrt{b_{ii}b_{jj}}\bigr) + 1 \Bigr) \Bigl( t\bigl(b_{ij} + \sqrt{b_{ii}b_{jj}}\bigr) + 1 \Bigr),$

so $q_{ij}(t) > 0$ for

$\notag t\in \left( \displaystyle\frac{-1}{ \sqrt{b_{ii}b_{jj}} + b_{ij} }, \displaystyle\frac{1}{ \sqrt{b_{ii}b_{jj}} - b_{ij} } \right) =: I_{ij},$

where the endpoints are finite because $B$, like $A$, is positive definite and so $|b_{ij}| < \sqrt{b_{ii}b_{jj}}$.

The condition for $A$ to remain positive definite when $a_{ij}$ is set to zero is $q_{ij}(-a_{ij}) > 0$, or equivalently $-a_{ij} \in I_{ij}$. To check either of these conditions we need just $b_{ij}$, $b_{ii}$, and $b_{jj}$. These elements can be computed without computing the whole inverse by solving the equations $Ab_k = e_k$ for $k = i,j$, for the $k$th column $b_k$ of $B$, making use of a Cholesky factorization of $A$.
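
Here is a minimal MATLAB sketch of this test for the $4\times 4$ Lehmer matrix used in the example below, with $(i,j) = (2,4)$:

A = gallery('lehmer',4);
i = 2; j = 4;
R = chol(A);                        % A = R'*R
E = eye(4);
bi = R \ (R' \ E(:,i));             % column i of B = inv(A)
bj = R \ (R' \ E(:,j));             % column j of B = inv(A)
bii = bi(i); bjj = bj(j); bij = bi(j);
t = -A(i,j);                        % the perturbation that zeros a_ij
q = 1 + 2*t*bij + t^2*(bij^2 - bii*bjj)                % q <= 0, so definiteness is lost
I_ij = [-1/(sqrt(bii*bjj)+bij), 1/(sqrt(bii*bjj)-bij)] % the interval of safe t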

As an example, we consider the $4\times 4$ Lehmer matrix, which has $(i,j)$ element $i/j$ for $i \ge j$:

$\notag A = \begin{bmatrix} 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\[3pt] \frac{1}{2} & 1 & \frac{2}{3} & \frac{1}{2} \\[3pt] \frac{1}{3} & \frac{2}{3} & 1 & \frac{3}{4} \\[3pt] \frac{1}{4} & \frac{1}{2} & \frac{3}{4} & 1 \end{bmatrix}.$

The smallest eigenvalue of $A$ is $0.208$. Any off-diagonal element except the $(2,4)$ element can be zeroed without destroying positive definiteness, and if the $(2,4)$ element is zeroed then the new matrix has smallest eigenvalue $-0.0249$. For $i=2$ and $j=4$, the following plot shows in red $\lambda_{\min}(A(t))$ and in blue $q_{24}(t)$; the black dots are the endpoints of the closure of the interval $I_{24} = (-0.453,0.453)$ and the vertical black line is the value $-a_{24}$. Clearly, $-a_{24}$ lies outside $I_{24}$, which is why zeroing this element causes a loss of positive definiteness. Note that $I_{24}$ also tells us that we can increase $a_{24}$ to any number less than $0.953$ without losing definiteness.

Given a positive definite matrix and a set $S$ of elements to be modified we may wish to determine subsets (including a maximal subset) of $S$ for which the modifications preserve definiteness. Efficiently determining these subsets appears to be an open problem.

In practical applications thresholding may lead to an indefinite matrix. Definiteness must then be restored to obtain a valid correlation matrix. One way to do this is to find the nearest correlation matrix in the Frobenius norm such that the zeroed elements remain zero. This can be done by the alternating projections method with a projection to keep the zeroed elements fixed. Since the nearest correlation matrix is positive semidefinite, it is also desirable to incorporate a lower bound $\delta > 0$ on the smallest eigenvalue, which corresponds to another projection. Both these projections are supported in the algorithm of Higham and Strabić (2016), implemented in the code at https://github.com/higham/anderson-accel-ncm. For the Lehmer matrix, the nearest correlation matrix with zero $(2,4)$ element and eigenvalues at least $\delta = 0.01$ is (to four significant figures)

$\notag \begin{bmatrix} 1 & 0.4946 & 0.3403 & 0.2445 \\ 0.4946 & 1 & 0.6439 & 0 \\ 0.3403 & 0.6439 & 1 & 0.7266 \\ 0.2445 & 0 & 0.7266 & 1 \end{bmatrix}.$

A related question is for what patterns of elements that are set to zero is positive definiteness guaranteed to be preserved for all positive definite $A$? Clearly, setting all the off-diagonal elements to zero preserves definiteness, since the diagonal of a positive definite matrix is positive. Guillot and Rajaratnam (2012) show that the answer to the question is that the new matrix must be a symmetric permutation of a block diagonal matrix. However, for particular $A$ this restriction does not necessarily hold, as the Lehmer matrix example shows.

# Randsvd Matrices with Large Growth Factors

Sixty years ago James Wilkinson published his backward error analysis of Gaussian elimination for solving a linear system $Ax = b$, where $A$ is a nonsingular $n\times n$ matrix. He showed that in floating-point arithmetic the computed solution $\widehat{x}$ satisfies

$(A+\Delta A) \widehat{x} = b, \qquad \|\Delta A\|_{\infty} \le p(n) \rho_n u \|A\|_{\infty},$

where $u$ is the unit roundoff and $p$ is a low degree polynomial. The term $\rho_n$ is the growth factor, defined by

$\rho_n = \displaystyle\frac{\max_{i,j,k} |a_{ij}^{(k)}|} {\max_{i,j}|a_{ij}|} \ge 1,$

where the $a_{ij}^{(k)}$ are the elements at the $k$th stage of Gaussian elimination. The growth factor measures how much elements grow during the elimination. We would like the product $p(n)\rho_n$ to be of order 1, so that $\Delta A$ is a small relative perturbation of $A$. We therefore need $\rho_n$ not to be too large.

With partial pivoting, in which row interchanges are used to ensure that at each stage the pivot element is the largest in its column, Wilkinson showed that $\rho_n \le 2^{n-1}$ and that equality is possible. Such exponential growth implies a large $\Delta A$ (unless we are lucky), meaning a severe loss of numerical stability. However, seventy years of digital computing experience have shown that $\rho_n$ is usually of modest size in practice. Explaining why this is the case is one of the outstanding problems in numerical analysis.

It is easy to experiment with growth factors in MATLAB. I will use the function

function g = gf(A)
%GF     Approximate growth factor.
%   g = GF(A) is an approximation to the
%   growth factor for LU factorization
%   with partial pivoting.
[~,U] = lu(A);
g = max(abs(U),[],'all')/max(abs(A),[],'all');


It computes a lower bound on the growth factor (since it only considers $k=n$ in the numerator in the definition), but it is entirely adequate for our purposes here. Let’s compute the growth factor for a random matrix of order 10,000 with elements from the standard normal distribution (mean 0, variance 1):

>> rng(1); n = 10000; gf(randn(n))
ans =
6.1335e+01


Growth of 61 is unremarkable for a matrix of this size. Now we try a matrix of the same size generated by the gallery('randsvd') function:

>> A = gallery('randsvd',n,1e6,2,[],[],1);
>> gf(A)
ans =
9.7544e+02


This function generates an $n\times n$ matrix with known singular value distribution and with singular vector matrices that are random orthogonal matrices from the Haar distribution. The parameter 1e6 specifies the 2-norm condition number, while the 2 (the mode parameter) specifies that there is only one small singular value, so the singular values are 1 repeated $n-1$ times and 1e-6. Growth of 975 is exceptional! These matrices have been in MATLAB since the 1990s, but this large growth property has apparently not been noticed before.

It turns out that mode 2 randsvd matrices generate with high probability growth factors of size at least $n/(4 \log n)$ for any condition number and for any pivoting strategy, not just partial pivoting. One way to check this is to randomly permute the columns of $A$ before doing the LU factorization with partial pivoting:

>> gf(A(:,randperm(n)))
ans =
7.8395e+02


Here is a plot showing the maximum over 12 randsvd matrices for each $n$ of the growth factors for three different pivoting strategies, along with the maximum growth factors for partial pivoting for rand and randn matrices. The black curve is $n/(4 \log n)$. This plot emphasizes the unusually large growth for mode 2 randsvd matrices.

What is the explanation for this large growth? It stems from three facts.

• Haar distributed orthogonal matrices have the property that their elements are fairly small with high probability, as shown by Jiang in 2005.
• If the largest entries in magnitude of $A$ and $A^{-1}$ are both small, in the sense that their product is $\theta \ll 1$, then $A$ will produce a growth factor of at least $1/\theta$ for any pivoting strategy. This was proved by Des Higham and me in the paper Large Growth Factors in Gaussian Elimination with Pivoting.
• If $W$ is an orthogonal matrix generating large growth then a rank-1 perturbation of 2-norm at most 1 tends to preserve the large growth.

For full details see the new EPrint Random Matrices Generating Large Growth in LU Factorization with Pivoting by Des Higham, Srikara Pranesh and me.

Is growth of order $n$ a problem in practice? It can be for two reasons.

• The largest dense linear systems $Ax = b$ solved today are of dimension $n = 10^7$. If we work in single precision then $nu \approx 1$ and so LU factorization can potentially be completely unstable if there is growth of order $n$.
• For IEEE half precision arithmetic growth of order $n$ will cause overflow once $n$ exceeds $10^5 / \max_{i,j} |a_{ij}|$. It was overflow in half precision LU factorization on randsvd matrices that alerted us to the large growth.

# Singular Values of Rank-1 Perturbations of an Orthogonal Matrix

What effect does a rank-1 perturbation of norm 1 to an $n\times n$ orthogonal matrix have on the extremal singular values of the matrix? Here, and throughout this post, the norm is the 2-norm. The largest singular value of the perturbed matrix is bounded by $2$, as can be seen by taking norms, so let us concentrate on the smallest singular value.

Consider first a perturbation of the identity matrix: $B = I + xy^T$, for unit norm $x$ and $y$. The matrix $B$ has eigenvalues 1 (repeated $n-1$ times) and $1 + y^Tx$. The matrix is singular—and hence has a zero singular value—precisely when $y^Tx = -1$, which is the smallest value that the inner product $y^Tx$ can take.

Another example is $B = A + yy^T$, where $A = I - 2yy^T$ and $y$ has unit norm, so that $A$ is a Householder matrix. Here, $B = I - yy^T$ is singular with null vector $y$, so it has a zero singular value.
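
A quick MATLAB check of this example, with a random unit-norm $y$ (any unit vector would do):

n = 8; rng(1); y = randn(n,1); y = y/norm(y);
A = eye(n) - 2*(y*y');    % Householder matrix
B = A + y*y';             % equals I - y*y'
min(svd(B))               % of the order of rounding error: B is singular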

Let’s take a random orthogonal matrix and perturb it with a random rank-1 matrix of unit norm. We use the following MATLAB code.

n = 100; rng(1)
A = gallery('qmult',n); % Random Haar distrib. orthogonal matrix.
x = randn(n,1); y = randn(n,1);
x = x/norm(x); y = y/norm(y);
B = A + x*y';
svd_B = svd(B);
max_svd_B = max(svd_B), min_svd_B = min(svd_B)


The output is

max_svd_B =
1.6065e+00
min_svd_B =
6.0649e-01


We started with a matrix having singular values all equal to 1 and now have a matrix with largest singular value a little larger than 1 and smallest singular value a little smaller than 1. If we keep running this code the extremal singular values of $B$ do not change much; for example, the next run gives

max_svd_B =
1.5921e+00
min_svd_B =
5.9213e-01


A rank-1 perturbation of unit norm could make $A$ singular, as we saw above, but our random perturbations are producing a well conditioned matrix.

What is the explanation? First, note that a rank-1 perturbation to an orthogonal matrix $A$ can only change two of the singular values, because the singular values are the square roots of the eigenvalues of $A^T A$, which is the identity plus a rank-$2$ matrix. So $n-2$ singular values remain at 1.

A result of Benaych-Georges and Nadakuditi (2012) says that for large $n$ the largest and smallest singular values of $B$ tend to $(1+\sqrt{5})/2 = 1.618\dots$ and $(-1+\sqrt{5})/2 = 0.618\dots$ respectively! As our example shows, $n$ does not have to be large for these limits to be approximations correct to roughly the first digit.

The result in question requires the original orthogonal matrix to be from the Haar distribution, and such matrices can be generated by A = gallery('qmult',n) or by the construction

[Q,R] = qr(randn(n));
Q = Q*diag(sign(diag(R)));


(See What Is a Random Orthogonal Matrix?.) The result also requires $x$ and $y$ to be unit-norm random vectors with independent entries from the same distribution.

However, as the next example shows, the perturbed singular values can be close to the values that the Benaych-Georges and Nadakuditi result predicts even when the conditions of the result are violated:

n = 100; rng(1)
A = gallery('orthog',n);   % Random orthogonal matrix (not Haar).
x = rand(n,1); y = (1:n)'; % Non-random y.
x = x/norm(x); y = y/norm(y);
B = A + x*y';
svd_B = svd(B);
max_svd_B = max(svd_B), min_svd_B = min(svd_B)

max_svd_B =
1.6069e+00
min_svd_B =
6.0687e-01


The question of the conditioning of a rank-1 perturbation of an orthogonal matrix arises in the recent EPrint Random Matrices Generating Large Growth in LU Factorization with Pivoting.

# Accurately Computing the Softmax Function

The softmax function takes as input an $n$-vector $x$ and returns a vector $g(x)$ with elements

$g_j(x) = \displaystyle\frac{\mathrm{e}^{x_j}}{\sum_{i=1}^n \mathrm{e}^{x_i}}, \quad j=1\colon n.$

The elements of $g$ are all between $0$ and $1$ and they sum to 1, so $g$ can be regarded as a vector of probabilities. Softmax is a key function in machine learning algorithms.

Softmax is the gradient vector of the log-sum-exp function

$f(x) = \displaystyle\log \sum_{i=1}^n \mathrm{e}^{x_i}.$

This function is an approximation to the largest element, $x_{\max} = \max_i x_i$, of the vector $x$, as it lies between $x_{\max}$ and $x_{\max} + \log n$.

A problem with numerical evaluation of log-sum-exp and softmax is that overflow is likely even for quite modest values of $x_i$ because of the exponentials, even though $g(x)$ cannot overflow and $f(x)$ is very unlikely to do so.

A standard solution is to incorporate a shift, $a$, and use the formulas

$f(x) = a + \displaystyle\log \sum_{i=1}^n \mathrm{e}^{x_i-a}, \hspace*{4.5cm}(1)$

and

$g_j(x) = \displaystyle\frac{\mathrm{e}^{x_j-a}}{\sum_{i=1}^n \mathrm{e}^{x_i-a}}, \quad j=1\colon n, \hspace*{3.3cm}(2)$

where $a$ is usually set to $x_{\max}$.
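
Here is a minimal MATLAB sketch of the shifted formulas (1) and (2); the function name is illustrative and not taken from any particular library.

function [f,g] = logsumexp_softmax(x)
%LOGSUMEXP_SOFTMAX   Shifted evaluation of log-sum-exp and softmax.
a = max(x);          % the shift
e = exp(x - a);
s = sum(e);
f = a + log(s);      % log-sum-exp, formula (1)
g = e/s;             % softmax, formula (2)
end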

Another formula for softmax is obtained by moving the denominator into the numerator:

$g_j(x) = \exp\left(x_j - a - \log\displaystyle\sum_{i=1}^n\mathrm{e}^{x_i -a}\right). \hspace*{2cm}(3)$

This formula is used in various codes, including in the SciPy 1.4.1 function softmax.

How accurate are these formulas when evaluated in floating-point arithmetic? To my knowledge, this question has not been addressed in the literature, but it is particularly important given the growing use of low precision arithmetic in machine learning. Two questions arise. First, is there any difference between the accuracy of the formulas (2) and (3) for $g_j(x)$? Second, in (1) and (3), $a$ is added to a nonnegative log term, so when $a = x_{\max}$ is negative can there be damaging subtractive cancellation?

In a recent EPrint with Pierre Blanchard and Des Higham I have investigated these questions using rounding error analysis and analysis of the conditioning of the log-sum-exp and softmax problems. In a nutshell, our findings are that while cancellation can happen, it is not a problem: the shifted formulas (1) and (2) can be safely used.

However, the alternative softmax formula (3) is not recommended, as its rounding error bounds are larger than for (2) and we have found it to produce larger errors in practice.

Here is an example from training an artificial neural network using the MATLAB Deep Learning Toolbox. The network is trained to classify handwritten digits from the widely used MNIST data set. The following figure shows the sum of the computed elements of the softmax vector $g(x)$ for 2000 vectors extracted from the training data, where $g(x)$ was computed in IEEE half precision arithmetic. The sum should be 1. The red circles are for formula (2) and the blue crosses are for the division-free formula (3). Clearly, (2) gives a better approximation to a vector of probabilities (in the sense of respecting the constraint that probabilities sum to unity); the actual errors in each vector component are also smaller for (2).

# Half Precision Arithmetic: fp16 Versus bfloat16

The 2008 revision of the IEEE Standard for Floating-Point Arithmetic introduced a half precision 16-bit floating point format, known as fp16, as a storage format. Various manufacturers have adopted fp16 for computation, using the obvious extension of the rules for the fp32 (single precision) and fp64 (double precision) formats. For example, fp16 is supported by the NVIDIA P100 and V100 GPUs and the AMD Radeon Instinct MI25 GPU, as well as the A64FX Arm processor that will power the Fujitsu Post-K exascale computer.

## Bfloat16

Fp16 has the drawback for scientific computing of having a limited range, its largest positive number being $6.55 \times 10^4$. This has led to the development of an alternative 16-bit format that trades precision for range. The bfloat16 format is used by Google in its tensor processing units. Intel, which plans to support bfloat16 in its forthcoming Nervana Neural Network Processor, has recently (November 2018) published a white paper that gives a precise definition of the format.

The allocation of bits to the exponent and significand for bfloat16, fp16, and fp32 is shown in this table, where the implicit leading bit of a normalized number is counted in the significand.

| Format | Significand | Exponent |
|---|---|---|
| bfloat16 | 8 bits | 8 bits |
| fp16 | 11 bits | 5 bits |
| fp32 | 24 bits | 8 bits |

Bfloat16 has three fewer bits in the significand than fp16, but three more in the exponent. And it has the same exponent size as fp32. Consequently, converting from fp32 to bfloat16 is easy: the exponent is kept the same and the significand is rounded or truncated from 24 bits to 8; hence overflow and underflow are not possible in the conversion.
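
For example, here is a minimal MATLAB sketch (assuming no special toolboxes) of converting an fp32 number to bfloat16 by truncation, zeroing the 16 trailing bits of the bit pattern:

x = single(pi);
bits = typecast(x,'uint32');
bits_bf16 = bitand(bits, uint32(hex2dec('FFFF0000'))); % keep sign, 8 exponent bits, 7 fraction bits
x_bf16 = typecast(bits_bf16,'single')                  % 3.1406..., about 3 significant digits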

On the other hand, when we convert from fp32 to the much narrower fp16 format overflow and underflow can readily happen, necessitating the development of techniques for rescaling before conversion—see the recent EPrint Squeezing a Matrix Into Half Precision, with an Application to Solving Linear Systems by me and Sri Pranesh.

The drawback of bfloat16 is its lesser precision: essentially 3 significant decimal digits versus 4 for fp16. The next table shows the unit roundoff $u$, smallest positive (subnormal) number xmins, smallest normalized positive number xmin, and largest finite number xmax for the three formats.

| Format | $u$ | xmins | xmin | xmax |
|---|---|---|---|---|
| bfloat16 | 3.91e-03 | (*) | 1.18e-38 | 3.39e+38 |
| fp16 | 4.88e-04 | 5.96e-08 | 6.10e-05 | 6.55e+04 |
| fp32 | 5.96e-08 | 1.40e-45 | 1.18e-38 | 3.40e+38 |

(*) Unlike the fp16 format, Intel’s bfloat16 does not support subnormal numbers. If subnormal numbers were supported in the same way as in IEEE arithmetic, xmins would be 9.18e-41.

The values in this table (and those for fp64 and fp128) are generated by the MATLAB function float_params that I have made available on GitHub and at MathWorks File Exchange.

## Harmonic Series

An interesting way to compare these different precisions is in summation of the harmonic series $1 + 1/2 + 1/3 + \cdots$. The series diverges, but when summed in the natural order in floating-point arithmetic it converges, because the partial sums grow while the addends decrease and eventually the addend is small enough that it does not change the partial sum. Here is a table showing the computed sum of the harmonic series for different precisions, along with how many terms are added before the sum becomes constant.

| Arithmetic | Computed Sum | Number of terms |
|---|---|---|
| bfloat16 | $5.0625$ | $65$ |
| fp16 | $7.0859$ | $513$ |
| fp32 | $15.404$ | $2097152$ |
| fp64 | $34.122$ | $2.81\dots\times 10^{14}$ |

The differences are striking! I determined the first three values in MATLAB. The fp64 value is reported by Malone based on a computation that took 24 days, and he also gives analysis to estimate the limiting sum and corresponding number of terms for fp64.
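
For instance, the fp32 entry can be reproduced by a simple MATLAB loop that stops once adding the next term no longer changes the partial sum:

s = single(0); k = 0;
while true
    k = k + 1;
    snew = s + single(1)/k;
    if snew == s, break, end
    s = snew;
end
fprintf('sum = %.5g after %d terms\n', s, k-1)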

The NVIDIA V100 has tensor cores that can carry out the computation D = C + A*B in one clock cycle for 4-by-4 matrices A, B, and C; this is a 4-by-4 fused multiply-add (FMA) operation. Moreover, C and D can be in fp32. The benefits that the speed and accuracy of the tensor cores can bring over plain fp16 are demonstrated in Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers.

Intel’s bfloat16 format supports a scalar FMA d = c + a*b, where c and d are in fp32.

## Conclusion

A few years ago we had just single precision and double precision arithmetic. With the introduction of fp16 and fp128 in the IEEE standard in 2008, and now bfloat16 by Google and Intel, the floating-point landscape is becoming much more interesting.

# How to Program log z

While Fortran was the first high-level programming language used for scientific computing, Algol 60 was the vehicle for publishing mathematical software in the early 1960s. Algol 60 had real arithmetic, but complex arithmetic had to be programmed by working with real and imaginary parts. Functions of a complex variable were not built-in, and had to be written by the programmer.

I’ve written a number of papers on algorithms to compute the (principal) logarithm of a matrix. The problem of computing the logarithm of a complex scalar—given a library routine that handles real arguments—might appear trivial, by comparison. That it is not can be seen by looking at early attempts to provide such a function in Algol 60.

The paper

J. R. Herndon (1961). Algorithm 48: Logarithm of a complex number. Comm. ACM, 4(4), 179.

presents an Algol 60 code for computing $\log z$, for a complex number $z$. It uses the arctan function to obtain the argument of a complex number.

The paper

A. P. Relph (1962). Certification of Algorithm 48: Logarithm of a complex number. Comm. ACM, 5(6), 347.

notes three problems with Herndon’s code: it fails for $z$ with zero real part, the imaginary part of the logarithm is on the wrong range (it should be $(-\pi,\pi]$ for the principal value), and the code uses log (log to the base 10) instead of ln (log to the base $e$). The latter error suggests to me that the code had never actually been run, as for almost any argument it would produce an incorrect value. This is perhaps not surprising since Algol 60 compilers must have only just started to become available in 1961.

The paper

M. L. Johnson and W. Sangren (1962). Remark on Algorithm 48: Logarithm of a complex number. Comm. ACM, 5(7), 391.

contains more discussion about avoiding division by zero and getting signs correct. In

D. S. Collens (1964). Remark on remarks on Algorithm 48: Logarithm of a complex number. Comm. ACM, 7(8), 485.

Collens notes that Johnson and Sangren’s code wrongly gives $\log 0 = 0$ and has a missing minus sign in one statement. Finally, Collens gives in

D. S. Collens (1964). Algorithm 243: Logarithm of a complex number: Rewrite of Algorithm 48. Comm. ACM, 7(11), 660.

a rewritten algorithm that fixes all the earlier errors.

So it took five papers over a three year period to produce a correct Algol 60 code for the complex logarithm! Had those authors had the benefit of today’s interactive computing environments that period could no doubt have been shortened, but working with multivalued complex functions is necessarily a tricky business, as I have explained in earlier posts here and here.

# Numerical Linear Algebra Group 2017

The Manchester Numerical Linear Algebra Group (many of whom are in the October 2017 photo below) was involved in a variety of activities this year. This post summarizes what we got up to. Publications are not included here, but many of them can be found on MIMS EPrints under the category Numerical Analysis.

## Software

Together with Jack Dongarra’s team at the University of Tennessee the group is one of the two partners involved in the development of PLASMA: Parallel Linear Algebra Software for Multicore Architectures.

PhD students Weijian Zhang, Steven Elsworth and Jonathan Deakin released Etymo—a search engine for machine learning research and development.

We continue to make our research codes available, which is increasingly done on GitHub; see the repositories of Fasi, Higham, Relton, Sego, Tisseur, Zhang. We also put MATLAB software on MATLAB Central File Exchange and on our own web sites, e.g., the Rational Krylov Toolbox (RKToolbox).

## PhD Students

New PhD students Gian Maria Negri Porzio and Thomas McSweeney joined the group in September 2017.

## Postdoctoral Research Associates (PDRAs)

Sam Relton, who was working on the Parallel Numerical Linear Algebra for Extreme Scale Systems (NLAFET) project, left in June 2017 to take up a position at the University of Leeds. Negin Bagherpour joined NLAFET in March 2017, leaving in November 2017.

Srikara Pranesh joined the project in November 2017. Pierre Blanchard joined us in October 2017 to work jointly on the ICONIC project (which started in June 2017) and NLAFET.

## Presentations at Conferences and Workshops

UK and Republic of Ireland Section of SIAM Annual Meeting, University of Strathclyde, January 12, 2017: Fasi, Gwynne, Higham, Zemaityte, Zhang.

2017 Joint Mathematics Meetings, Atlanta, January 4-7, 2017: Higham.

Workshop on Matrix Equations and Tensor Techniques, Pisa, Italy, February 13-14, 2017: Fasi.

Due Giorni di Algebra Lineare Numerica, Como, Italy, February 16-17, 2017: Fasi.

International Conference on Domain Decomposition Methods DD24, Svalbard, Norway, February 6-10, 2017: Sistek.

Workshop on Batched, Reproducible, and Reduced Precision BLAS, Atlanta, February 23-25, 2017: Hammarling, Relton.

SIAM Conference on Computational Science and Engineering, Atlanta, February 27-March 3, 2017: Relton, Zounon. See the blog posts about the meeting by Nick Higham and Sam Relton.

High Performance Computing in Science and Engineering (HPCSE) 2017, Hotel Solan, Czech Republic, May 22-25, 2017: Sistek.

Advances in Data Science, Manchester, May 15-16, 2017: Zhang.

27th Biennial Conference on Numerical Analysis, Glasgow, June 27-30, 2017: Tisseur.

Householder Symposium XX on Numerical Linear Algebra, The Inn at Virginia Tech, June 18-23, 2017: Tisseur.

SIAM Annual Meeting, Pittsburgh, July 10-14, 2017: Zhang (see this SIAM News article about Weijian’s presentation). A Storify of the conference is available in PDF form.

ILAS 2017 Conference, Iowa State University, USA, July 24-28, 2017: Güttel.

24th IEEE Symposium on Computer Arithmetic, London, July 24-26, 2017: Higham (see this blog post by George Constantinides).

LMS-EPSRC Symposium on Model Order Reduction, Durham, UK, August 8-17, 2017: Güttel.

Euro-Par 2017, 23rd International European Conference on Parallel and Distributed Computing, August 28-September 1, 2017: Zounon.

INdAM Meeting Structured Matrices in Numerical Linear Algebra: Analysis, Algorithms and Applications, Cortona, Italy, September 4-8, 2017: Fasi, Tisseur.

2017 Woudschoten Conferences, Zeist, The Netherlands, 4-6 October 2017: Tisseur.

ICERM Workshop on Recent Advances in Seismic Modeling and Inversion, Brown University, USA, November 6-10, 2017: Güttel. A video recording of this talk is available.

## Conference and Workshop Organization

Güttel co-organized the SIAM UKIE Annual Meeting 2017 at the University of Strathclyde, January 12, 2017, and the GAMM ANLA Workshop on High-Performance Computing at the University of Cologne, September 7-8, 2017.

The Manchester SIAM Student Chapter organized a Manchester Chapter Auto Trader Industry Problem Solving Event on February 22, 2017 and the 7th Manchester SIAM Student Chapter Conference on May 5, 2017.

The group organized three minisymposia at the SIAM Conference on Computational Science and Engineering, Atlanta, February 27-March 3, 2017:

## Visitors

Franco Zivcovic (Università degli Studi di Trento) visited the group from September 2017-January 2018.

## Knowledge Transfer

The Sabisu KTP project, which ended in December 2016, has been awarded the highest grade of “Outstanding” by the KTP Grading Panel. A new KTP project with Process Integration Ltd. is under way, led by Stefan Güttel.

The MSc project of Thomas McSweeney was sponsored by NAG and produced a code for modified Cholesky factorization that will appear in the NAG Library.

## Recognition and Service

Stefan Güttel continued his terms as Secretary/Treasurer of the SIAM UKIE section and vice-chair of the GAMM Activity Group on Applied and Numerical Linear Algebra.

Nick Higham served the first year of his two-year term as President of SIAM.

Weijian Zhang was awarded a SIAM Student Travel Award to attend the SIAM Annual Meeting 2017 in Pittsburgh.

Massimiliano Fasi and Mante Zemaityte were selected to present posters at the SET for Britain 2017 competition, which took place at the House of Commons, London. Fasi’s poster was “Finding Communities in Large Signed Networks with the Weighted Geometric Mean of Laplacians” and Zemaityte’s was “A Shift-and-Invert Lanczos Algorithm for the Dynamic Analysis of Structures”.

Jakub Sistek served as treasurer of the eu-maths-in.cz Czech Network for Mathematics in Industry.

# The Strange Case of the Determinant of a Matrix of 1s and -1s

By Nick Higham and Alan Edelman (MIT)

In a 2005 talk the second author noted that the MATLAB det function returns an odd integer for a certain 27-by-27 matrix composed of $1$s and $-1$s:

>> A = edelman; % Set up the matrix.
>> format long g, format compact, det(A)
ans =
839466457497601


However, the determinant is, from its definition, a sum of an even number (27 factorial) of odd numbers, so is even. Indeed the correct determinant is 839466457497600.

At first sight, this example is rather troubling, since while MATLAB returns an integer, as expected, it is out by $1$. The determinant is computed as the product of the diagonal entries of the $U$ factor in the LU factorization with partial pivoting of $A$, and these entries are not all integers. Standard rounding error analysis shows that the relative error from forming that product is bounded by $nu/(1-nu)$, with $n=27$, where $u \approx 1.1 \times 10^{-16}$ is the unit roundoff, and this is comfortably larger than the actual relative error (which also includes the errors in computing $U$) of $6 \times 10^{-16}$. Therefore the computed determinant is well within the bounds of roundoff, and if the exact result had not been an integer the incorrect last decimal digit would hardly merit discussion.

However, this matrix has more up its sleeve. Let us compute the determinant using a different implementation of Gaussian elimination with partial pivoting, namely the function gep from the Matrix Computation Toolbox:

>> [Lp,Up,Pp] = gep(A,'p'); det(Pp)*det(Up)
ans =
839466457497600


Now we get the correct answer! To see what is happening, we can directly form the products of the diagonal elements of the $U$ factors:

>> [L,U,P] = lu(A);
>> d = diag(U); dp = diag(Up);
>> rel_diff_U_diags = norm((dp - d)./d,inf)
rel_diff_U_diags =
7.37206353875273e-16
>> [prod(d), prod(dp)]
ans =
-839466457497601          -839466457497600
>> [prod(d(end:-1:1)), prod(dp(end:-1:1))]
ans =
-839466457497600          -839466457497600


We see that even though the diagonals of the two $U$ factors differ by a small multiple of the unit roundoff, the computed products differ in the last decimal digit. If the product of the diagonal elements of $U$ is accumulated in the reverse order then the exact answer is obtained in both cases. Once again, while this behaviour might seem surprising, it is within the error bounds of a rounding error analysis.

The moral of this example is that we should not be misled by the integer nature of a result; in floating-point arithmetic it is relative error that should be judged.

Finally, we note that numerical evaluation of the determinant offers other types of interesting behaviour. Consider the Frank matrix: a matrix of integers that has determinant 1. What goes wrong here in the step from dimension 24 to 25?

>> A = gallery('frank',24); det(A)
ans =
0.999999999999996
>> A = gallery('frank',25); det(A)
ans =
143507521.082525


The Edelman matrix is generated by the MATLAB function in this gist, which is embedded below. A Julia notebook exploring the Edelman matrix is available here.

 function A = edelman %EDELMAN Alan Edelman's matrix for which det is computed as the wrong integer. % A = EDELMAN is a 27-by-27 matrix of 1s and -1s for which the % MATLAB det function returns an odd integer, though the exact % determinant is an even integer. A = [% 1 1 1 1 -1 -1 -1 1 1 -1 1 -1 -1 1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 1 1 1 -1 1 -1 -1 1 1 1 1 1 -1 1 -1 1 1 1 1 -1 -1 1 -1 -1 1 1 -1 1 1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 1 -1 -1 1 1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 1 -1 -1 -1 -1 -1 1 -1 -1 1 1 1 -1 -1 -1 -1 -1 1 1 1 -1 1 1 1 -1 1 1 1 1 1 1 -1 1 -1 -1 1 1 -1 1 -1 -1 1 1 -1 -1 1 1 1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 1 -1 1 -1 1 1 1 1 1 -1 1 -1 1 -1 -1 1 -1 1 1 1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 1 -1 1 1 1 1 1 -1 -1 1 1 -1 -1 1 -1 1 1 1 -1 -1 1 -1 1 1 1 1 -1 1 1 1 -1 1 1 -1 1 1 1 -1 -1 -1 1 -1 1 1 -1 -1 -1 -1 1 1 1 1 1 -1 1 -1 1 1 -1 -1 1 1 1 1 -1 -1 -1 -1 -1 -1 1 1 -1 -1 -1 1 1 -1 -1 1 -1 -1 1 1 1 1 -1 1 1 1 -1 -1 1 1 1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 1 1 -1 1 1 1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 1 1 -1 1 -1 -1 1 1 -1 -1 1 -1 -1 1 1 -1 1 1 -1 1 1 -1 -1 -1 1 1 1 1 -1 -1 1 -1 1 1 -1 1 -1 -1 1 -1 -1 1 -1 1 -1 1 1 1 -1 -1 1 1 -1 -1 -1 -1 -1 1 1 1 1 1 -1 1 1 -1 -1 -1 -1 1 -1 1 -1 1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 1 1 -1 1 -1 -1 1 1 1 -1 1 1 -1 1 -1 -1 1 1 -1 -1 1 -1 -1 1 1 -1 1 -1 1 1 -1 1 1 -1 1 1 -1 1 1 -1 1 1 1 -1 1 -1 1 -1 1 1 1 1 -1 -1 -1 1 1 1 -1 1 1 1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 1 1 -1 -1 1 1 -1 -1 -1 1 -1 1 1 1 1 -1 -1 -1 -1 1 1 -1 1 -1 1 -1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 1 1 1 -1 -1 -1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 1 -1 -1 1 -1 -1 1 -1 -1 1 1 -1 -1 -1 1 -1 -1 1 1 -1 -1 1 -1 -1 -1 1 -1 1 -1 -1 1 1 -1 1 -1 1 1 -1 1 -1 1 -1 -1 1 -1 1 1 1 1 1 -1 1 -1 1 -1 1 -1 1 1 1 1 1 1 1 1 -1 -1 -1 1 -1 -1 1 1 1 -1 -1 -1 1 -1 1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 1 1 1 1 1 -1 1 1 1 1 1 -1 1 -1 1 -1 -1 1 1 -1 -1 1 1 1 -1 1 -1 -1 1 1 -1 1 1 1 -1 1 -1 1 -1 1 1 -1 1 -1 1 1 -1 1 -1 -1 1 -1 -1 1 1 1 -1 1 -1 -1 1 -1 1 1 -1 -1 1 1 1 -1 1 1 -1 1 1 1 1 1 -1 1 -1 1 -1 1 1 -1 1];