# Seven Sins of Numerical Linear Algebra

In numerical linear algebra we are concerned with solving linear algebra problems accurately and efficiently and understanding the sensitivity of the problems to perturbations. We describe seven sins, whereby accuracy or efficiency is lost or misleading information about sensitivity is obtained.

## 1. Inverting a Matrix

In linear algebra courses we learn that the solution to a linear system $Ax = b$ of $n$ equations in $n$ unknowns can be written $x = A^{-1}b$, where $A^{-1}$ is the matrix inverse. What is not always emphasized is that there are very few circumstances in which one should compute $A^{-1}$. Indeed one would not solve the scalar ($n=1$) system $7x = 21$ by computing $x = 7^{-1} \times 21$, but rather would carry out a division $x = 21/7$. In the $n\times n$ case, it is faster and more accurate to solve a linear system by LU factorization (Gaussian elimination) with partial pivoting than by inverting $A$ (which has, in any case, to be done by LU factorization).
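As a concrete illustration, here is a minimal pure-Python sketch of solving $Ax = b$ by Gaussian elimination with partial pivoting, never forming $A^{-1}$. The matrix and right-hand side are made up for this example; in practice one calls a library routine (MATLAB's backslash, or LAPACK) rather than hand-coding the elimination.

```python
def lu_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on copies so the caller's data is not overwritten.
    U = [row[:] for row in A]
    x = b[:]
    for k in range(n - 1):
        # Partial pivoting: bring the entry of largest modulus into row k.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        U[k], U[p] = U[p], U[k]
        x[k], x[p] = x[p], x[k]
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]          # multiplier (an entry of L)
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
            x[i] -= m * x[k]               # apply the same operation to b
    # Back substitution with the triangular factor.
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / U[i][i]
    return x

A = [[2.0, 1.0, 1.0],
     [4.0, -6.0, 0.0],
     [-2.0, 7.0, 2.0]]
b = [5.0, -2.0, 9.0]
print(lu_solve(A, b))   # [1.0, 1.0, 2.0]
```

The point is that the solution comes from a factorization plus two triangular substitutions; the inverse is never needed.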

Rare cases where $A^{-1}$ is required are in statistics, where the diagonal elements of the inverse of the covariance matrix are relevant quantities, and in certain algorithms for computing matrix functions.

## 2. Forming the Cross-Product Matrix A^TA

The solution to the linear least squares problem $\min_x\| b - Ax \|_2$, where $A$ is a full-rank $m\times n$ matrix with $m\ge n$, satisfies the normal equations $A^T\!A x = A^Tb$. It is therefore natural to form the symmetric positive definite matrix $A^T\!A$ and solve the normal equations by Cholesky factorization. While fast, this method is numerically unstable when $A$ is ill conditioned. By contrast, solving the least squares problem via QR factorization is always numerically stable.

What is wrong with the cross-product matrix $A^T\!A$ (also known as the Gram matrix)? It squares the data, which can cause a loss of information in floating-point arithmetic. For example, if

$A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \end{bmatrix}, \quad 0 < \epsilon < \sqrt{u},$

where $u$ is the unit roundoff of the floating-point arithmetic, then

$A^T\!A = \begin{bmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 \end{bmatrix}$

is positive definite but, since $\epsilon^2 < u$, in floating-point arithmetic $1+\epsilon^2$ rounds to $1$ and so

$\mathrm{f\kern.2ptl}( A^T\!A) = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$

which is singular, and the information in $\epsilon$ has been lost.
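This rounding is easy to reproduce in IEEE double precision, for which $u = 2^{-53}$. The following snippet uses $\epsilon = 2^{-27}$, a value chosen for this illustration so that $0 < \epsilon < \sqrt{u}$:

```python
import math

u = 2.0 ** -53          # unit roundoff of IEEE double precision
eps = 2.0 ** -27        # satisfies 0 < eps < sqrt(u)
print(eps < math.sqrt(u))        # True

# The (1,1) entry of A^T A is 1 + eps^2, which rounds to 1:
print(1.0 + eps**2 == 1.0)       # True

# So the computed A^T A is the singular matrix of ones:
AtA = [[1.0 + eps**2, 1.0], [1.0, 1.0]]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
print(det)                       # 0.0: the information in eps is lost
```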

Another problem with the cross-product matrix is that the $2$-norm condition number of $A^T\!A$ is the square of that of $A$, and this leads to numerical instability in algorithms that work with $A^T\!A$ when the condition number is large.

## 3. Evaluating Matrix Products in an Inefficient Order

The cost of evaluating a matrix product depends on the order in which the product is evaluated (assuming the matrices are not all $n\times n$). More precisely, matrix multiplication is associative, so $A(BC) = (AB)C$, and in general the cost of the evaluation of a product depends on where one puts the parentheses. One order may be much superior to others, so one should not simply evaluate the product in a fixed left-right or right-left order. For example, if $x$, $y$, and $z$ are $n$-vectors then $xy^Tz$ can be evaluated as

• $(xy^T)z$: a vector outer product followed by a matrix–vector product, costing $O(n^2)$ operations, or
• $x (y^Tz)$: a vector scalar product followed by a vector scaling, costing just $O(n)$ operations.
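The difference in cost is easy to observe. This pure-Python sketch (vector length and data chosen arbitrarily) evaluates $xy^Tz$ both ways and times them:

```python
import time

n = 500
x = [1.0] * n
y = [2.0] * n
z = [3.0] * n

# O(n^2) route: form the outer product x y^T, then multiply by z.
t0 = time.perf_counter()
outer = [[xi * yj for yj in y] for xi in x]
v1 = [sum(row[k] * z[k] for k in range(n)) for row in outer]
t1 = time.perf_counter()

# O(n) route: form the scalar y^T z, then scale x.
s = sum(y[k] * z[k] for k in range(n))
v2 = [xi * s for xi in x]
t2 = time.perf_counter()

print(max(abs(a - b) for a, b in zip(v1, v2)))   # 0.0: same product
print(f"outer-product route: {t1 - t0:.4f}s, scalar route: {t2 - t1:.4f}s")
```

The outer-product route touches $n^2$ intermediate quantities; the scalar route touches $O(n)$.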

In general, finding where to put the parentheses in a matrix product $A_1A_2\dots A_k$ in order to minimize the operation count is a difficult problem, but for many cases that arise in practice it is easy to determine a good order.

## 4. Assuming that a Matrix is Positive Definite

Symmetric positive definite matrices (symmetric matrices with positive eigenvalues) are ubiquitous, not least because they arise in the solution of many minimization problems. However, a matrix that is supposed to be positive definite may fail to be so for a variety of reasons. Missing or inconsistent data in forming a covariance matrix or a correlation matrix can cause a loss of definiteness, and rounding errors can cause a tiny positive eigenvalue to go negative.

Definiteness implies that

• the diagonal entries are positive,
• $\det(A) > 0$,
• $|a_{ij}| < \sqrt{a_{ii}a_{jj}}$ for all $i \ne j$,

but none of these conditions, or even all taken together, guarantees that the matrix has positive eigenvalues.

The best way to check definiteness is to compute a Cholesky factorization, which is often needed anyway. The MATLAB function chol returns an error message if the factorization fails, and a second output argument can be requested, which is set to the number of the stage on which the factorization failed, or to zero if the factorization succeeded. In the case of failure, the partially computed $R$ factor is returned in the first argument, and it can be used to compute a direction of negative curvature (as needed in optimization), for example.
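In a language without chol the same check can be coded directly. Here is a minimal Python sketch (not the MATLAB function, just an illustration of the idea) that attempts the factorization and reports the stage at which it fails:

```python
import math

def chol_check(A):
    """Attempt a Cholesky factorization A = R^T R.

    Returns (R, p): p = 0 on success, otherwise p is the 1-based stage
    at which a nonpositive pivot appeared, mirroring [R,p] = chol(A).
    """
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(R[k][j] ** 2 for k in range(j))
        if s <= 0:
            return R, j + 1       # failure: A is not positive definite
        R[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            R[j][i] = (A[j][i]
                       - sum(R[k][j] * R[k][i] for k in range(j))) / R[j][j]
    return R, 0

# A symmetric matrix with positive diagonal need not be definite:
A = [[1.0, 2.0], [2.0, 1.0]]
R, p = chol_check(A)
print(p)   # 2: the factorization fails at stage 2, so A is indefinite
```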

This sin takes the top spot in Schmelzer and Hauser’s Seven Sins in Portfolio Optimization, because in portfolio optimization a negative eigenvalue in the covariance matrix can identify a portfolio with negative variance, promising an arbitrarily large investment with no risk!

## 5. Not Exploiting Structure in the Matrix

One of the fundamental tenets of numerical linear algebra is that one should try to exploit any matrix structure that might be present. Sparsity (a matrix having a large number of zeros) is particularly important to exploit, since algorithms intended for dense matrices may be impractical for sparse matrices because of extensive fill-in (zeros becoming nonzero). Here are two examples of structures that can be exploited.

Matrices from saddle point problems are symmetric indefinite and of the form

$\notag C = \begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix},$

with $A$ symmetric positive definite. Much work has been done on developing numerical methods for solving $Cx = b$ that exploit the block structure and possible sparsity in $A$ and $B$. A second example is a circulant matrix

$\notag C = \begin{bmatrix} c_1 & c_2 & \dots & c_n \\ c_n & c_1 & \dots & \vdots \\ \vdots & \ddots & \ddots & c_2 \\ c_2 & \dots & c_n & c_1 \\ \end{bmatrix}.$

Circulant matrices have the important property that they are diagonalized by a unitary matrix called the discrete Fourier transform matrix. Using this property one can solve $Cx = v$ in $O(n \log_2n)$ operations, rather than the $O(n^3)$ operations required if the circulant structure is ignored.

Ideally, linear algebra software would detect structure in a matrix and call an algorithm that exploits that structure. A notable example of such a meta-algorithm is the MATLAB backslash function x = A\b for solving $Ax = b$. Backslash checks whether the matrix is triangular (or a permutation of a triangular matrix), upper Hessenberg, symmetric, or symmetric positive definite, and applies an appropriate method. It also allows $A$ to be rectangular and solves the least squares problem if there are more rows than columns and the underdetermined system if there are more columns than rows.

## 6. Using the Determinant to Detect Near Singularity

An $n\times n$ matrix $A$ is nonsingular if and only if its determinant is nonzero. One might therefore expect that a small value for $\det(A)$ indicates a matrix that is nearly singular. However, the size of $\det(A)$ tells us nothing about near singularity. Indeed, since $\det(\alpha A) = \alpha^n \det(A)$ we can achieve any value for the determinant by multiplying by a scalar $\alpha$, yet $\alpha A$ is no more or less nearly singular than $A$ for $\alpha \ne 0$.
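The scaling argument takes two lines to demonstrate; the matrix $\alpha I$ below is chosen for illustration:

```python
# det(alpha*A) = alpha^n * det(A): the determinant can be made as small
# as we like by scaling, without changing nearness to singularity.

def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

A = [[1.0, 0.0], [0.0, 1.0]]       # identity: as far from singular as possible
alpha = 1e-8
B = [[alpha * a for a in row] for row in A]
print(det2(A))   # 1.0
print(det2(B))   # about 1e-16, yet B = alpha*I is perfectly conditioned
```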

Another limitation of the determinant is shown by the two matrices

$\notag T = \begin{bmatrix} 1 & -1 & -1 & \dots & -1\\ & 1 & -1 & \dots & -1\\ & & 1 & \dots & \vdots\\ & & & \ddots & -1 \\ & & & & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & 1 & 1 & \dots & 1\\ & 1 & 1 & \dots & 1\\ & & 1 & \dots & \vdots\\ & & & \ddots & 1 \\ & & & & 1 \end{bmatrix} \qquad (1)$

Both matrices have unit diagonal and off-diagonal elements bounded in modulus by $1$. So $\det(T) = \det(U) = 1$, yet

$\notag T^{-1} = \begin{bmatrix} 1 & 1 & 2 & \dots & 2^{n-2}\\ & 1 & 1 & \dots & \vdots\\ & & 1 & \ddots & 2\\ & & & \ddots & 1 \\ & & & & 1 \end{bmatrix}, \quad U^{-1} = \begin{bmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & 1 & \ddots & \\ & & & \ddots & -1 \\ & & & & 1 \end{bmatrix}.$

So $T$ is ill conditioned for large $n$. In fact, if we change the $(n,1)$ element of $T$ to $-2^{2-n}$ then the matrix becomes singular! By contrast, $U$ is always very well conditioned. The determinant cannot distinguish between the ill-conditioned $T$ and the well-conditioned $U$.
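These claims about $T$ can be checked in exact arithmetic. The following Python sketch (using Fraction to avoid any rounding) computes $\det(T)$ before and after perturbing the $(n,1)$ element by $-2^{2-n}$:

```python
from fractions import Fraction

def build_T(n):
    """Upper triangular T: unit diagonal, -1 everywhere above it."""
    return [[Fraction(1) if i == j else
             (Fraction(-1) if j > i else Fraction(0))
             for j in range(n)] for i in range(n)]

def det(A):
    """Exact determinant by fraction-based Gaussian elimination."""
    A = [row[:] for row in A]
    n = len(A)
    d = Fraction(1)
    for k in range(n):
        p = next((i for i in range(k, n) if A[i][k] != 0), None)
        if p is None:
            return Fraction(0)
        if p != k:
            A[k], A[p] = A[p], A[k]
            d = -d                      # row swap flips the sign
        d *= A[k][k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
    return d

n = 8
T = build_T(n)
print(det(T))                        # 1

# Changing the (n,1) element from 0 to -2^(2-n) makes T exactly singular:
T[n - 1][0] = -Fraction(1, 2 ** (n - 2))
print(det(T))                        # 0
```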

## 7. Using Eigenvalues to Estimate Conditioning

For any $n\times n$ matrix $A$ and any consistent matrix norm it is true that $\|A\| \ge |\lambda_i|$ for all $i$, where the $\lambda_i$ are the eigenvalues of $A$. Since the eigenvalues of $A^{-1}$ are $\lambda_i^{-1}$, it follows that the matrix condition number $\kappa(A) = \|A\| \, \|A^{-1}\|$ is bounded below by the ratio of largest to smallest eigenvalue in absolute value, that is,

$\notag \kappa(A) \ge \displaystyle\frac{ \max_i |\lambda_i| } { \min_i |\lambda_i| }.$

But as the matrix $T$ in (1) shows, this bound can be very weak.

It is singular values, not eigenvalues, that characterize the condition number for the 2-norm. Specifically,

$\notag \kappa_2(A) = \displaystyle\frac{\sigma_1}{\sigma_n},$

where $A = U\Sigma V^T$ is a singular value decomposition (SVD), with $U$ and $V$ orthogonal and $\Sigma = \mathrm{diag}(\sigma_i)$, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$. If $A$ is symmetric, for example, then the sets $\{ |\lambda_i| \}$ and $\{\sigma_i \}$ are the same, but in general the eigenvalues $\lambda_i$ and singular values $\sigma_i$ can be very different.

# Cleve Moler Wins ICIAM Industry Prize 2023

Congratulations to Cleve Moler, who has won the inaugural ICIAM Industry Prize 2023 for “outstanding contributions to the development of mathematical and computational tools and methods for the solution of science and engineering problems and his invention of MATLAB”.

I first saw Cleve demonstrate the original Fortran version of MATLAB on an IBM PC at the Gatlinburg meeting at the University of Waterloo in 1984. The commercial version of MATLAB was released soon after, and it has been my main programming environment ever since.

MATLAB succeeded for a number of reasons, some of which Dennis Sherwood and I describe in one of the creativity stories in our recent book How to Be Creative: A Practical Guide for the Mathematical Sciences. But there is one reason that is rarely mentioned.

From the beginning, MATLAB supported complex arithmetic—indeed, the basic data type has always been a complex matrix. The original 1980 MATLAB Users’ Guide says

MATLAB works with essentially only one kind of object, a rectangular matrix with complex elements. If the imaginary parts of the elements are all zero, they are not printed, but they still occupy storage.

By contrast, early competing packages usually supported only real arithmetic (see my 1989 SIAM News article Matrix Computations on a PC for a comparison of PC-MATLAB and GAUSS). Cleve understood the fundamental need to compute in the complex plane in real life problems, as opposed to textbook examples, and he appreciated how tedious it is to program with real and imaginary parts stored in separate arrays. The storing of zero imaginary parts of real numbers was a small price to pay for the convenience. Of course, the commercial version of MATLAB was optimized not to store the imaginary part of reals. Control engineers—a group who were early adopters of MATLAB—appreciated the MATLAB approach, because the stability of control systems depends on eigenvalues, which are in general complex.

Another wise choice was that MATLAB allows the imaginary unit to be written as i or j, thus keeping mathematicians and electrical engineers happy!

Here is Cleve demonstrating MATLAB in October 2000:

# What Is a Circulant Matrix?

An $n\times n$ circulant matrix is defined by $n$ parameters, the elements in the first row, and each subsequent row is a cyclic shift forward of the one above:

$\notag C = \begin{bmatrix} c_1 & c_2 & \dots & c_n \\ c_n & c_1 & \dots & \vdots \\ \vdots & \ddots & \ddots & c_2 \\ c_2 & \dots & c_n & c_1 \\ \end{bmatrix}.$

Circulant matrices have the important property that they are diagonalized by the discrete Fourier transform matrix

$\notag F_n = \Bigl(\exp( -2\pi \mathrm{i} (r-1)(s-1) / n )\Bigr)_{r,s=1}^n,$

which satisfies $F_n^*F_n = nI$, so that $n^{-1/2}F_n$ is a unitary matrix. ($F_n$ is a Vandermonde matrix with points the roots of unity.) Specifically,

$\notag F_n C F_n^{-1} = D = \mathrm{diag}(d_i). \qquad(1)$

Hence circulant matrices are normal ($C^*C = CC^*$). Moreover, the eigenvalues are given by $d = F_n Ce_1$, where $e_1$ is the first unit vector.

Note that one particular eigenpair is immediate, since $Ce = \bigl(\sum_{i=1}^n c_i\bigr) e$, where $e$ is the vector of ones.

The factorization (1) enables a circulant linear system to be solved in $O(n\log n)$ flops, since multiplication by $F_n$ can be done using the fast Fourier transform.
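As a sketch of this solution process, here is a pure-Python implementation (a radix-2 FFT, so $n$ must be a power of 2; the example matrix is chosen for illustration). A circulant matrix with first column $v$ performs cyclic convolution with $v$, so its eigenvalues are the DFT of $v$:

```python
import cmath

def fft(a):
    """Radix-2 FFT (length must be a power of 2)."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(a):
    n = len(a)
    conj = fft([z.conjugate() for z in a])
    return [z.conjugate() / n for z in conj]

def circulant_solve(v, b):
    """Solve Cx = b, where C is circulant with first column v.

    C is diagonalized by the DFT, so the solve costs O(n log n):
    transform b, divide by the eigenvalues fft(v), transform back.
    """
    d = fft(v)                      # eigenvalues of C
    y = fft(b)
    return ifft([yk / dk for yk, dk in zip(y, d)])

# Example: C = [[4,1,0,1],[1,4,1,0],[0,1,4,1],[1,0,1,4]] has first
# column [4,1,0,1]; with b equal to that column the solution is e_1.
x = circulant_solve([4, 1, 0, 1], [4, 1, 0, 1])
print([round(xk.real, 12) for xk in x])   # [1.0, 0.0, 0.0, 0.0]
```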

A particular circulant matrix is the (up) shift matrix $K_n$, the $4\times 4$ version of which is

$\notag K_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}.$

It is not hard to see that

$C = c_1 I + c_2 K_n + c_3K_n^2 + \cdots + c_n K_n^{n-1}.$

Since powers of $K_n$ commute, it follows that any two circulant matrices commute (this is also clear from (1)). Furthermore, the sum and product of two circulant matrices is a circulant matrix and the inverse of a nonsingular circulant matrix is a circulant matrix.

One important use of circulant matrices is to act as preconditioners for Toeplitz linear systems. Several methods have been proposed for constructing a circulant matrix from a Toeplitz matrix, including copying the central diagonals and wrapping them around and finding the nearest circulant matrix to the Toeplitz matrix. See Chan and Ng (1996) or Chan and Jin (2017) for a summary of work on circulant preconditioners for Toeplitz systems.

An interesting circulant matrix is anymatrix('core/circul_binom',n) in the Anymatrix collection, which is the $n\times n$ circulant matrix whose first row has $i$th element ${n \choose i-1}$. The eigenvalues are $(1 + w^i)^n - 1$, $i=1:n$, where $w = \exp(2\pi\mathrm{i}/n)$. The matrix is singular when $n$ is a multiple of 6, in which case the null space has dimension 2. Example:

>> n = 6; C = anymatrix('core/circul_binom',n), svd(C)
C =
     1     6    15    20    15     6
     6     1     6    15    20    15
    15     6     1     6    15    20
    20    15     6     1     6    15
    15    20    15     6     1     6
     6    15    20    15     6     1
ans =
   6.3000e+01
   2.8000e+01
   2.8000e+01
   1.0000e+00
   2.0244e-15
   7.6607e-16


## Notes

A classic reference on circulant matrices is Davis (1994).

# Photos and Videos from NJH60 Conference

The conference Advances in Numerical Linear Algebra: Celebrating the 60th Birthday of Nick Higham was held at the University of Manchester, July 6–8, 2022.

Most of the talks are available on the NLA Group YouTube channel and links to them are available on the conference web page.

Here is the conference photo.

Here is me with some of my current and former PhD students.

And with some of my current and former postdocs.

After-dinner speaker Charlie Van Loan:

Rob Corless kindly gave me a Bohemian matrix eigenvalue tie based on this image.

Many thanks to Stefan Güttel, Sven Hammarling, Stephanie Lai, Françoise Tisseur and Marcus Webb for organizing the conference and the Royal Society, MathWorks and the University of Manchester for financial support.

# What Is Fast Matrix Multiplication?

The definition of matrix multiplication says that for $n\times n$ matrices $A$ and $B$, the product $C = AB$ is given by $c_{ij} = \sum_{k=1}^n a_{ik}b_{kj}$. Each element of $C$ is an inner product of a row of $A$ and a column of $B$, so if this formula is used then the cost of forming $C$ is $n^2(2n-1)$ additions and multiplications, that is, $O(n^3)$ operations. For over a century after the development of matrix algebra in the 1850s by Cayley, Sylvester and others, all methods for matrix multiplication were based on this formula and required $O(n^3)$ operations.

In 1969 Volker Strassen showed that when $n=2$ the product can be computed from the formulas

\notag \begin{gathered} p_1 =(a_{11}+a_{22})(b_{11}+b_{22}), \\ p_2=(a_{21}+a_{22})b_{11}, \qquad p_3=a_{11}(b_{12}-b_{22}), \\ p_4=a_{22}(b_{21}-b_{11}), \qquad p_5=(a_{11}+a_{12})b_{22}, \\ p_6=(a_{21}-a_{11})(b_{11}+b_{12}), \qquad p_7=(a_{12}-a_{22})(b_{21}+b_{22}), \\ \noalign{\smallskip} C = \begin{bmatrix} p_1+p_4-p_5+p_7 & p_3+p_5 \\ p_2+p_4 & p_1+p_3-p_2+p_6 \end{bmatrix}. \end{gathered}

The evaluation requires $7$ multiplications and $18$ additions instead of $8$ multiplications and $4$ additions for the usual formulas.

At first sight, Strassen’s formulas may appear simply to be a curiosity. However, the formulas do not rely on commutativity so are valid when the $a_{ij}$ and $b_{ij}$ are matrices, in which case for large dimensions the saving of one multiplication greatly outweighs the extra $14$ additions. Assuming $n$ is a power of $2$, we can partition $A$ and $B$ into four blocks of size $n/2$, apply Strassen’s formulas for the multiplication, and then apply the same formulas recursively on the half-sized matrix products.

Let us examine the number of multiplications for the recursive Strassen algorithm. Denote by $M(k)$ the number of scalar multiplications required to multiply two $2^k \times 2^k$ matrices. We have $M(k) = 7M(k-1)$, so

$M(k) = 7M(k-1) = 7^2M(k-2) = \cdots = 7^k M(0) = 7^k.$

But $7^k = 2^{\log_2{7^k}} = (2^k)^{\log_2 7} = n^{\log_2 7} = n^{2.807\dots}$. The number of additions can be shown to be of the same order of magnitude, so the algorithm requires $O(n^{\log_27})=O(n^{2.807\dots})$ operations.
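For concreteness, here is a straightforward Python implementation of the recursive algorithm for dimensions that are a power of 2, with a threshold n0 below which conventional multiplication is used (a tuned, high-performance implementation would be organized quite differently):

```python
def strassen(A, B, n0=2):
    """Multiply square matrices (dimension a power of 2) by Strassen's
    method, switching to conventional multiplication at size n0."""
    n = len(A)
    if n <= n0:
        return [[sum(A[i][k] * B[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
    h = n // 2
    # Partition A and B into half-sized blocks.
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    add = lambda X, Y: [[x + y for x, y in zip(rx, ry)]
                        for rx, ry in zip(X, Y)]
    sub = lambda X, Y: [[x - y for x, y in zip(rx, ry)]
                        for rx, ry in zip(X, Y)]
    # The seven half-sized products.
    p1 = strassen(add(A11, A22), add(B11, B22), n0)
    p2 = strassen(add(A21, A22), B11, n0)
    p3 = strassen(A11, sub(B12, B22), n0)
    p4 = strassen(A22, sub(B21, B11), n0)
    p5 = strassen(add(A11, A12), B22, n0)
    p6 = strassen(sub(A21, A11), add(B11, B12), n0)
    p7 = strassen(sub(A12, A22), add(B21, B22), n0)
    C11 = add(sub(add(p1, p4), p5), p7)
    C12 = add(p3, p5)
    C21 = add(p2, p4)
    C22 = add(sub(add(p1, p3), p2), p6)
    return ([r1 + r2 for r1, r2 in zip(C11, C12)]
            + [r1 + r2 for r1, r2 in zip(C21, C22)])

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[16, 15, 14, 13], [12, 11, 10, 9], [8, 7, 6, 5], [4, 3, 2, 1]]
Cref = [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)]
print(strassen(A, B) == Cref)   # True: exact agreement for integer data
```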

Strassen’s work sparked interest in finding matrix multiplication algorithms of even lower complexity. Since there are $O(n^2)$ elements of data, which must each participate in at least one operation, the exponent of $n$ in the operation count must be at least $2$.

The current record upper bound on the exponent is $2.37286$, proved by Alman and Vassilevska Williams (2021), which improved on the previous record of $2.37287$, proved by Le Gall (2014). The following figure plots the best upper bound for the exponent for matrix multiplication over time.

In the methods that achieve exponents lower than 2.775, various intricate techniques are used, based on representing matrix multiplication in terms of bilinear or trilinear forms and their representation as tensors having low rank. Laderman, Pan, and Sha (1993) explain that for these methods “very large overhead constants are hidden in the ‘$O$’ notation”, and that the methods “improve on Strassen’s (and even the classical) algorithm only for immense numbers $N$.”

Strassen’s method, when carefully implemented, can be faster than conventional matrix multiplication for reasonable dimensions. In practice, one does not recur down to $1\times 1$ matrices, but rather uses conventional multiplication once $n_0\times n_0$ matrices are reached, where the parameter $n_0$ is tuned for the best performance.

Strassen’s method has the drawback that it satisfies a weaker form of rounding error bound than conventional multiplication. For conventional multiplication $C = AB$ of $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times p}$ we have the componentwise bound

$\notag |C - \widehat{C}| \le \gamma^{}_n |A|\, |B|, \qquad(1)$

where $\gamma^{}_n = nu/(1-nu)$ and $u$ is the unit roundoff. For Strassen’s method we have only a normwise error bound. The following result uses the norm $\|A\| = \max_{i,j} |a_{ij}|$, which is not a consistent norm.

Theorem 1 (Brent). Let $A,B \in \mathbb{R}^{n\times n}$, where $n=2^k$. Suppose that $C=AB$ is computed by Strassen’s method and that $n_0=2^r$ is the threshold at which conventional multiplication is used. The computed product $\widehat{C}$ satisfies

$\notag \|C - \widehat{C}\| \le \left[ \Bigl( \displaystyle\frac{n}{n_0} \Bigr)^{\log_2{12}} (n_0^2+5n_0) - 5n \right] u \|A\|\, \|B\| + O(u^2). \qquad(2)$

With full recursion ($n_0=1$) the constant in (2) is $6 n^{\log_2 12}-5n \approx 6 n^{3.585}-5n$, whereas with just one level of recursion ($n_0 = n/2$) it is $3n^2+25n$. These compare with $n^2u + O(u^2)$ for conventional multiplication (obtained by taking norms in (1)). So the constant for Strassen’s method grows at a faster rate than that for conventional multiplication no matter what the choice of $n_0$.

The fact that Strassen’s method does not satisfy a componentwise error bound is a significant weakness of the method. Indeed Strassen’s method cannot even accurately multiply by the identity matrix. The product

$\notag C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \epsilon \\ \epsilon & \epsilon^2 \end{bmatrix}, \quad 0 < \epsilon \ll 1$

is evaluated exactly in floating-point arithmetic by conventional multiplication, but Strassen’s method computes

$c_{22} = 2(1+\epsilon^2) + (\epsilon-\epsilon^2) - 1 - (1+\epsilon).$

Because $c_{22}$ involves subterms of order unity, the error $c_{22} - \widehat c_{22}$ will be of order $u$. Thus the relative error $|c_{22} - \widehat c_{22}| / |c_{22}| = O(u/\epsilon^2) \gg O(u)$.
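Evaluating the displayed expression for $c_{22}$ in double precision with $\epsilon = 2^{-27}$ (a value chosen for this illustration, small enough that $1+\epsilon^2$ rounds to $1$) confirms the total loss of accuracy:

```python
# Evaluate Strassen's c22 = 2(1+eps^2) + (eps - eps^2) - 1 - (1+eps)
# in IEEE double precision, where eps^2 = 2^-54 rounds away in 1 + eps^2.
eps = 2.0 ** -27

exact = eps ** 2     # the true c22 = eps^2
computed = 2 * (1 + eps**2) + (eps - eps**2) - 1 - (1 + eps)

print(exact)         # 5.551115123125783e-17
print(computed)      # 0.0: every correct digit of c22 has been lost
```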

Another weakness of Strassen’s method is that while the scaling $AB \to (AD) (D^{-1}B)$, where $D$ is diagonal, leaves (1) unchanged, it can alter (2) by an arbitrary amount. Dumitrescu (1998) suggests computing $D_1(D_1^{-1}A\cdot B D_2^{-1})D_2$, where the diagonal matrices $D_1$ and $D_2$ are chosen to equilibrate the rows of $A$ and the columns of $B$ in the $\infty$-norm; he shows that this scaling can improve the accuracy of the result. Further investigations along these lines are made by Ballard et al. (2016).

Should one use Strassen’s method in practice, assuming that an implementation is available that is faster than conventional multiplication? Not if one needs a componentwise error bound, which ensures accurate products of matrices with nonnegative entries and ensures that the column scaling of $A$ and row scaling of $B$ has no effect on the error. But if a normwise error bound with a faster growing constant than for conventional multiplication is acceptable then the method is worth considering.

## Notes

For recent work on high-performance implementation of Strassen’s method see Huang et al. (2016, 2020).

Theorem 1 is from an unpublished technical report of Brent (1970). A proof can be found in Higham (2002, §23.2.2).

For more on fast matrix multiplication see Bini (2014) and Higham (2002, Chapter 23).

## References

This is a minimal set of references, which contain further useful references within.

# Creativity Workshop on Numerical Linear Algebra

Sixteen members of the Numerical Linear Algebra group at the University of Manchester recently attended a two-day creativity workshop in order to generate ideas for our research and other activities. The workshop was facilitated by Dennis Sherwood, who is an expert in creativity and has run many such workshops. Dennis and I have previously collaborated on workshops for the Manchester Numerical Analysis group (2013), the EPSRC NA-HPC Network (2014), and the SIAM leadership (2018).

A creativity workshop brings together a group of people to tackle questions using a structured approach, in which people share what they know about the question, ask “how might this be different” about the aspects identified, and then discuss the resulting possibilities. A key feature of these workshops is that every idea is captured—on flip charts, coloured cards, and post-it notes—and ideas are not evaluated until they have all been generated. This approach contrasts with the all-too-common situation where excellent ideas generated in a discussion are instantly dismissed because of a “that will never work” reaction.

At our workshop a number of topics were addressed, covering strategic plans for the group and plans for future research projects and grant proposals, including “Mixed precision algorithms”, “Being a magnet for talent”, and “Conferences”. Many ideas were generated and assessed and the group is now planning the next steps with the help of the detailed 70-page written report produced by Dennis.

One idea has already been implemented: we have a new logo; see A New Logo for the Numerical Linear Algebra Group.

For more on the creativity process mentioned here, as well as details of creativity workshops, including sample briefs that can be used at them, see the new book by Dennis and me: How to Be Creative: A Practical Guide for the Mathematical Sciences (SIAM, 2022).

# How to Space Displayed Mathematical Equations

In a displayed mathematical equation with more than one component, how much space should be placed between the components?

Here are the guidelines I use, with examples in LaTeX. Recall that a \quad is approximately the width of a capital M and \qquad is twice the width of a \quad.

## Case 1. Equation with Qualifying Expression

An equation or other mathematical construct is separated from a qualifying expression by a \quad. Examples:

$\notag |a_{ii}| \ge \displaystyle\sum_{j\ne i} |a_{ij}|, \quad i=1\colon n.$

$\notag fl(x\mathbin{\mathrm{op}}y) = (x\mathbin{\mathrm{op}} y)(1+\delta), \quad |\delta|\le u, \quad \mathbin{\mathrm{op}} =+,-,*,/.$

$\notag y' = t^2+y^2, \quad 0\le t\le 1, \quad y(0)=0.$

When the qualifying expression is a prepositional phrase it is given standard sentence spacing. Examples:

$\notag \min_x c^Tx \quad \mathrm{subject~to~} Ax=b,~ x\ge 0.$

$\notag \|J(v)-J(w)\| \le \theta_L \|v-w\| \quad \mathrm{for~all~} v,w \in \mathbb{R}^n.$

The first example was typed as (using the equation* environment provided by the amsmath package)

\begin{equation*}
\min_x c^Tx \quad \text{subject to $Ax=b$, $x\ge 0$}.
\end{equation*}


Here, the qualifying phrase is placed inside a \text command, which jumps out of math mode and formats its argument as regular text, with the usual interword spacing in effect, and we re-enter math mode for the conditions. This is better than writing

\min_x c^Tx \quad \text{subject to} ~Ax=b, ~x\ge 0.


with hard spaces. Note that \text is a command from the amsmath package, and it is similar to the LaTeX command \mbox and the TeX command \hbox, both of which work equally well here.

## Case 2. Equation with Conjunction

When an equation contains a conjunction such as and or or, the conjunction has a \quad on each side. Examples:

$\notag x = 1 \quad \mathrm{or} \quad x = 2.$

$\notag a = \displaystyle\sum_{j=1}^n c_j v_j \quad \mathrm{where} \quad c_j = \langle a,~ u_j\rangle~\mathrm{for}~j=1,2,\dots,n.$

In the second example, one might argue for a \quad before the qualifying “for”, on the basis of case 1, but I prefer the word spacing. This example was typed as

\begin{equation*}
\text{$c_j = \langle a, u_j\rangle$ for $j=1,2,\dots,n$}.
\end{equation*}


## Case 3. Multiple Equations

Two or more equations are separated by a \qquad. Examples:

$\notag A = e_1^{}e_3^T, \qquad B = e_1^{}e_4^T, \qquad C = e_2^{}e_3^T, \qquad D = e_2^{}e_4^T$

\notag \begin{aligned} AXA &= A, \qquad & XAX &= X,\\ (AX)^* &= AX, \qquad & (XA)^* &= XA. \end{aligned}

## Limitations

It is important to emphasize that one might diverge from following these (or any other) guidelines, for a variety of reasons. With a complicated display, or if a narrow text width is in use (as with a two-column format), horizontal space may be at a premium so one may need to reduce the spacing. And the guidelines do not cover every possible situation.

## Notes

My guidelines are the same ones that were used in typesetting the Princeton Companion to Applied Mathematics, and I am grateful to Sam Clark (T&T Productions), copy editor and typesetter of the Companion, for discussions about them. Cases 1 and 3 are recommended in my Handbook of Writing for the Mathematical Sciences (2017).

The SIAM Style Guide prefers a \qquad in Case 1 and \quad in Case 3 with three or more equations. The AMS Style Guide has the same guidelines as SIAM. Both SIAM and the AMS allow an author to use just a \quad between an equation and a qualifying expression.

In the TeXbook (1986, p. 166), Knuth advocates using a \qquad between an equation and a qualifying expression.

# What Is the Pascal Matrix?

The Pascal matrix $P_n\in\mathbb{R}^{n\times n}$ is the symmetric matrix defined by

$p_{ij} = \displaystyle{i+j-2 \choose j-1} = \displaystyle\frac{ (i+j-2)! }{ (i-1)! (j-1)! }.$

It contains the rows of Pascal’s triangle along the anti-diagonals. For example:

$\notag P_5 = \left[\begin{array}{ccccc} 1 & 1 & 1 & 1 & 1\\ 1 & 2 & 3 & 4 & 5\\ 1 & 3 & 6 & 10 & 15\\ 1 & 4 & 10 & 20 & 35\\ 1 & 5 & 15 & 35 & 70 \end{array}\right].$

In MATLAB, the matrix is pascal(n).

The Pascal matrix is positive definite and has the Cholesky factorization

$\notag P_n = L_nL_n^T, \qquad(1)$

where the rows of $L_n$ are the rows of Pascal’s triangle. For example,

$\notag L_5 = \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0\\ 1 & 2 & 1 & 0 & 0\\ 1 & 3 & 3 & 1 & 0\\ 1 & 4 & 6 & 4 & 1 \end{array}\right].$
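The factorization (1) can be checked exactly in integer arithmetic. Here is a short Python sketch using math.comb (the helper names are mine):

```python
from math import comb

def pascal(n):
    """Symmetric Pascal matrix: p_ij = C(i+j-2, j-1) in 1-based indexing."""
    return [[comb(i + j, j) for j in range(n)] for i in range(n)]

def pascal_L(n):
    """Lower triangular Cholesky factor: rows of Pascal's triangle."""
    return [[comb(i, j) if j <= i else 0 for j in range(n)]
            for i in range(n)]

n = 5
P = pascal(n)
L = pascal_L(n)
LLt = [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
       for i in range(n)]
print(LLt == P)   # True: P_n = L_n L_n^T exactly, in integers
```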

From (1) we have $\det(P_n) = \det(L_n)^2 = 1$. From this equation, or by inverting (1), it follows that $P_n^{-1}$ has integer elements. Indeed the inverse is known to have $(i,j)$ element

$\notag (-1)^{i-j} \displaystyle\sum_{k=0}^{n-i} {i+k-1 \choose k} {i+k-1 \choose i+k-j}, \quad i \ge j. \qquad(2)$
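Formula (2) can be verified the same way. The following Python sketch (helper names are mine) builds the inverse from (2), using symmetry to fill in the $i < j$ entries, and checks $P_nP_n^{-1} = I$ in exact integer arithmetic:

```python
from math import comb

def pascal(n):
    """Symmetric Pascal matrix in 0-based indexing."""
    return [[comb(i + j, j) for j in range(n)] for i in range(n)]

def pascal_inv(n):
    """Inverse of the Pascal matrix from formula (2); it is symmetric."""
    Q = [[0] * n for _ in range(n)]
    for i in range(1, n + 1):          # 1-based, as in the formula
        for j in range(1, i + 1):
            s = sum(comb(i + k - 1, k) * comb(i + k - 1, i + k - j)
                    for k in range(n - i + 1))
            Q[i - 1][j - 1] = Q[j - 1][i - 1] = (-1) ** (i - j) * s
    return Q

n = 5
P, Q = pascal(n), pascal_inv(n)
PQ = [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
print(PQ == [[int(i == j) for j in range(n)] for i in range(n)])   # True
```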

The Cholesky factor $L_n$ can be factorized as

$\notag L_n = M_{n-1} M_{n-2} \dots M_1, \qquad(3)$

where $M_i$ is unit lower bidiagonal with the first $i-1$ entries along the subdiagonal of $M_i$ zero and the rest equal to $1$. For example,

\notag \begin{aligned} L_5 = \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right] \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right] \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right] \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right]. \end{aligned}

The factorization (3) shows that $P_n$ is totally positive, that is, every minor (a determinant of a square submatrix) is positive. Indeed each bidiagonal factor $M_i$ is totally nonnegative, that is, every minor is nonnegative, and the product of totally nonnegative matrices is totally nonnegative. Further results in the theory of totally positive matrices show that the product is actually totally positive.

The positive definiteness of $P_n$ implies that the eigenvalues are real and positive. The total positivity, together with the fact that $P_n$ is (trivially) irreducible, implies that the eigenvalues are distinct.

For a symmetric positive semidefinite matrix $A$ with nonnegative entries, we define $A^{\circ t} = (a_{ij}^t)$, which is the matrix with every entry raised to the power $t\in \mathbb{R}$. We say that $A$ is infinitely divisible if $A^{\circ t}$ is positive semidefinite for all nonnegative $t$. The Pascal matrix is infinitely divisible.

It is possible to show that

$\notag L_n^{-1} = DL_nD, \qquad (4)$

where $D = \mathrm{diag}((-1)^i)$. In other words, $Y_n = L_nD$ is involutory, that is, $Y_n^2 = I$. It follows from $P_n = Y_n Y_n^T$ that

\notag \begin{aligned} P_n^{-1} = Y_n^{-T} Y_n^{-1} = Y_n^T Y_n = Y_n^{-1} (Y_n Y_n^T) Y_n = Y_n^{-1} P_n Y_n, \end{aligned}

so $P_n$ and $P_n^{-1}$ are similar and hence have the same eigenvalues. This means that the eigenvalues of $P_n$ appear in reciprocal pairs and that the characteristic polynomial is palindromic. Here is an illustration in MATLAB:

>> P = pascal(5); evals = eig(P); [evals 1./evals], coeffs = charpoly(P)
ans =
1.0835e-02   9.2290e+01
1.8124e-01   5.5175e+00
1.0000e+00   1.0000e+00
5.5175e+00   1.8124e-01
9.2290e+01   1.0835e-02
coeffs =
1   -99   626  -626    99    -1


Now

$p_{nn} \le \|P\|_2 \le (\|P\|_1 \|P\|_{\infty})^{1/2} = \displaystyle\frac{2n-1}{n}\, p_{nn},$

where for the equality we used a binomial coefficient summation identity. The fact that $P_n$ is positive definite with reciprocal eigenvalues implies that $\kappa_2(P) = \|P\|_2^2$. Hence, using Stirling’s approximation ($n! \sim \sqrt{2\pi n} (n/\mathrm{e})^n$),

$\kappa_2(P_n) \sim p_{nn}^2 = \left( \displaystyle\frac{ (2n-2)! }{ ((n-1)!)^2 } \right)^2 \sim \displaystyle\frac{16^n}{16 n \pi}.$

Thus $P_n$ is exponentially ill conditioned as $n\to\infty$.
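These relations can be checked numerically. The following Python/NumPy sketch verifies that $\kappa_2(P_n) = \|P_n\|_2^2$ and that $p_{nn}^2 \le \kappa_2(P_n) \le \bigl(\frac{2n-1}{n}\bigr)^2 p_{nn}^2$, as implied by the bounds above:

```python
import numpy as np
from math import comb
from scipy.linalg import pascal

n = 8
P = pascal(n).astype(float)
kappa = np.linalg.cond(P)          # 2-norm condition number (the default)
norm2 = np.linalg.norm(P, 2)
pnn2 = comb(2*n - 2, n - 1)**2     # p_nn^2, the leading-order estimate
print(kappa, norm2**2, pnn2)
```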

The matrix $Y_n$ is obtained in MATLAB with pascal(n,1); this is a lower triangular square root of the identity matrix. Turnbull (1927) noted that by rotating $Y_n$ through 90 degrees one obtains a cube root of the identity matrix. This matrix is returned by pascal(n,2). For example, with $n = 5$:

$\notag \hspace*{-1cm} X = \left[\begin{array}{rrrrr} 1 & 1 & 1 & 1 & 1\\ -4 & -3 & -2 & -1 & 0\\ 6 & 3 & 1 & 0 & 0\\ -4 & -1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0 \end{array}\right], \quad X^2 = \left[\begin{array}{rrrrr} 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & -1 & -4\\ 0 & 0 & 1 & 3 & 6\\ 0 & -1 & -2 & -3 & -4\\ 1 & 1 & 1 & 1 & 1 \end{array}\right], \quad X^3 = I.$
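Both properties hold in exact integer arithmetic, so they are easy to check; a Python/NumPy sketch (np.rot90 with k=-1 rotates clockwise, which reproduces the matrix $X$ above):

```python
import numpy as np
from scipy.linalg import pascal

n = 5
L = pascal(n, kind="lower").astype(int)
D = np.diag((-1) ** np.arange(n))   # D = diag((-1)^i)
Y = L @ D                           # Y_n = L_n D, involutory by (4)
X = np.rot90(Y, k=-1)               # rotate 90 degrees clockwise

print(np.array_equal(Y @ Y, np.eye(n, dtype=int)))      # True: Y^2 = I
print(np.array_equal(X @ X @ X, np.eye(n, dtype=int)))  # True: X^3 = I
```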

The logarithm of $L_n$ is explicitly known: it is the lower bidiagonal matrix

$\notag \log L_n = \left[\begin{array}{ccccc} 0 & & & & \\ 1 & 0 & & & \\[-5pt] & 2 & 0 & & \\[-5pt] & & \ddots & \ddots & \\ & & & n-1 & 0 \end{array}\right]. \qquad (5)$
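A quick check of (5) with scipy.linalg.expm: since $L_n$ is lower triangular, its logarithm is strictly lower triangular, with the entries $1, 2, \dots, n-1$ on the subdiagonal.

```python
import numpy as np
from scipy.linalg import expm, pascal

n = 6
L = pascal(n, kind="lower").astype(float)
N = np.diag(np.arange(1.0, n), k=-1)   # subdiagonal entries 1, 2, ..., n-1

print(np.allclose(expm(N), L))         # True: N = log L_n
```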

## Notes

For proofs of (2) and (4) see Cohen (1975) and Call and Velleman (1993), respectively. For (5), see Edelman and Strang (2004). The infinite divisibility of the Pascal matrix is shown by Bhatia (2006). For the total positivity property see Fallat and Johnson (2011).

# The Big Six Matrix Factorizations

Six matrix factorizations dominate in numerical linear algebra and matrix analysis: for most purposes one of them is sufficient for the task at hand. We summarize them here.

For each factorization we give the cost in flops for the standard method of computation, stating only the highest order terms. We also state the main uses of each factorization.

For full generality we state factorizations for complex matrices. Everything translates to the real case with “Hermitian” and “unitary” replaced by “symmetric” and “orthogonal”, respectively.

The terms “factorization” and “decomposition” are synonymous and it is a matter of convention which is used. Our list comprises three factorizations and three decompositions.

Recall that an upper triangular matrix is a matrix of the form

$\notag R = \begin{bmatrix} r_{11} & r_{12} & \dots & r_{1n}\\ & r_{22} & \dots & r_{2n}\\ & & \ddots& \vdots\\ & & & r_{nn} \end{bmatrix},$

and a lower triangular matrix is the transpose of an upper triangular one.

## Cholesky Factorization

Every Hermitian positive definite matrix $A\in\mathbb{C}^{n\times n}$ has a unique Cholesky factorization $A = R^*R$, where $R\in\mathbb{C}^{n\times n}$ is upper triangular with positive diagonal elements.

Cost: $n^3/3$ flops.

Use: solving positive definite linear systems.
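A minimal example of a positive definite solve via Cholesky factorization, sketched in Python with SciPy (cho_factor and cho_solve wrap the LAPACK Cholesky routines):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)     # a symmetric positive definite matrix
b = rng.standard_normal(5)

c, low = cho_factor(A)          # Cholesky factorization: n^3/3 flops
x = cho_solve((c, low), b)      # two triangular solves
print(np.allclose(A @ x, b))    # True
```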

## LU Factorization

Any matrix $A\in\mathbb{C}^{n\times n}$ has an LU factorization $PA = LU$, where $P$ is a permutation matrix, $L$ is unit lower triangular (lower triangular with 1s on the diagonal), and $U$ is upper triangular. We can take $P = I$ if the leading principal submatrices $A(1\colon k, 1\colon k)$, $k = 1\colon n-1$, of $A$ are nonsingular, but to guarantee that the factorization is numerically stable we need $A$ to have particular properties, such as diagonal dominance. In practical computation we normally choose $P$ using the partial pivoting strategy, which almost always ensures numerical stability.

Cost: $2n^3/3$ flops.

Use: solving general linear systems.
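The corresponding general solve, again a Python/SciPy sketch (lu_factor performs LU factorization with partial pivoting):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

lu, piv = lu_factor(A)          # PA = LU with partial pivoting: 2n^3/3 flops
x = lu_solve((lu, piv), b)      # forward and back substitution
print(np.allclose(A @ x, b))    # True
```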

## QR Factorization

Any matrix $A\in\mathbb{C}^{m\times n}$ with $m\ge n$ has a QR factorization $A = QR$, where $Q\in \mathbb{C}^{m\times m}$ is unitary and $R$ is upper trapezoidal, that is, $R = \left[\begin{smallmatrix} R_1 \\ 0\end{smallmatrix}\right]$, where $R_1\in\mathbb{C}^{n\times n}$ is upper triangular.

Partitioning $Q = [Q_1~Q_2]$, where $Q_1\in\mathbb{C}^{m\times n}$ has orthonormal columns, gives $A = Q_1R_1$, which is the reduced, economy size, or thin QR factorization.

Cost: $2(n^2m-n^3/3)$ flops for Householder QR factorization. The explicit formation of $Q$ (which is not usually necessary) requires a further $4(m^2n-mn^2+n^3/3)$ flops.

Use: solving least squares problems, computing an orthonormal basis for the range space of $A$, orthogonalization.
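A sketch of least squares solution via the reduced QR factorization, in Python with NumPy (np.linalg.qr returns the thin factors by default); the result is compared against NumPy's own least squares solver:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))   # full rank, m >= n
b = rng.standard_normal(8)

Q1, R1 = np.linalg.qr(A)              # reduced QR: A = Q1 R1
x = np.linalg.solve(R1, Q1.T @ b)     # solve the triangular system R1 x = Q1^* b
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```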

## Schur Decomposition

Any matrix $A\in\mathbb{C}^{n\times n}$ has a Schur decomposition $A = QTQ^*$, where $Q$ is unitary and $T$ is upper triangular. The eigenvalues of $A$ appear on the diagonal of $T$. For each $k$, the leading $k$ columns of $Q$ span an invariant subspace of $A$.

For real matrices, a special form of this decomposition exists in which all the factors are real. An upper quasi-triangular matrix $R$ is a block upper triangular matrix whose diagonal blocks $R_{ii}$ are either $1\times1$ or $2\times2$. Any $A\in\mathbb{R}^{n\times n}$ has a real Schur decomposition $A = Q R Q^T$, where $Q$ is real orthogonal and $R$ is real upper quasi-triangular with any $2\times2$ diagonal blocks having complex conjugate eigenvalues.

Cost: $25n^3$ flops for $Q$ and $T$ (or $R$) by the QR algorithm; $10n^3$ flops for $T$ (or $R$) only.

Use: computing eigenvalues and eigenvectors, computing invariant subspaces, evaluating matrix functions.
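A Python/SciPy sketch computing both forms for a real matrix with a complex conjugate eigenvalue pair (scipy.linalg.rsf2csf converts the real Schur form to the complex one, like MATLAB's rsf2csf):

```python
import numpy as np
from scipy.linalg import schur, rsf2csf

A = np.array([[ 0.0, 1.0, 2.0],
              [-1.0, 0.0, 3.0],
              [ 0.0, 0.0, 4.0]])  # eigenvalues i, -i, 4

T, Q = schur(A)            # real Schur form: A = Q T Q^T, T quasi-triangular
Tc, Qc = rsf2csf(T, Q)     # complex Schur form: Tc upper triangular
print(np.diag(Tc))         # the eigenvalues on the diagonal of Tc
```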

## Spectral Decomposition

Every Hermitian matrix $A\in\mathbb{C}^{n\times n}$ has a spectral decomposition $A = Q\Lambda Q^*$, where $Q$ is unitary and $\Lambda = \mathrm{diag}(\lambda_i)$. The $\lambda_i$ are the eigenvalues of $A$, and they are real. The spectral decomposition is a special case of the Schur decomposition but is of interest in its own right.

Cost: $9n^3$ flops for $Q$ and $\Lambda$ by the QR algorithm, or $4n^3\!/3$ flops for $\Lambda$ only.

Use: any problem involving eigenvalues of Hermitian matrices.
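In Python this is np.linalg.eigh (the analogue of MATLAB's eig for Hermitian matrices), a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2              # a real symmetric (Hermitian) matrix

lam, Q = np.linalg.eigh(A)     # spectral decomposition: A = Q diag(lam) Q^T
print(lam)                     # the eigenvalues, real and in ascending order
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))  # True
```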

## Singular Value Decomposition

Any matrix $A\in\mathbb{C}^{m\times n}$ has a singular value decomposition (SVD)

$\notag A = U\Sigma V^*, \quad \Sigma = \mathrm{diag}(\sigma_1,\sigma_2,\dots,\sigma_p) \in \mathbb{R}^{m\times n}, \quad p = \min(m,n),$

where $U\in\mathbb{C}^{m\times m}$ and $V\in\mathbb{C}^{n\times n}$ are unitary and $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_p\ge0$. The $\sigma_i$ are the singular values of $A$, and they are the nonnegative square roots of the $p$ largest eigenvalues of $A^*A$. The columns of $U$ and $V$ are the left and right singular vectors of $A$, respectively. The rank of $A$ is equal to the number of nonzero singular values. If $A$ is real, $U$ and $V$ can be taken to be real. The essential SVD information is contained in the compact or economy size SVD $A = U\Sigma V^*$, where $U\in\mathbb{C}^{m\times r}$, $\Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_r)$, $V\in\mathbb{C}^{n\times r}$, and $r = \mathrm{rank}(A)$.

Cost: $14mn^2+8n^3$ flops for $U(:,1\colon n)$, $\Sigma$, and $V$ by the Golub–Reinsch algorithm, or $6mn^2+20n^3$ flops with a preliminary QR factorization.

Use: determining matrix rank, solving rank-deficient least squares problems, computing all kinds of subspace information.
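A sketch of rank determination via the SVD in Python/NumPy, for a matrix constructed to have rank 2 (the tolerance 1e-10 below is an illustrative choice, not a universal recipe):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # 6x4, rank 2

U, s, Vh = np.linalg.svd(A)            # A = U diag(s) V^*, s decreasing
rank = int(np.sum(s > 1e-10 * s[0]))   # singular values above a simple tolerance
print(rank)                            # 2
```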

## Discussion

Pivoting can be incorporated into both Cholesky factorization and QR factorization, giving $\Pi^T A \Pi = R^*R$ (complete pivoting) and $A\Pi = QR$ (column pivoting), respectively, where $\Pi$ is a permutation matrix. These pivoting strategies are useful for problems that are (nearly) rank deficient as they force $R$ to have a zero (or small) $(2,2)$ block.
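For QR with column pivoting, a Python/SciPy sketch (scipy.linalg.qr with pivoting=True returns indices piv such that A[:, piv] = QR); for a numerically rank-2 matrix the trailing diagonal of $R$ is negligible:

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # numerically rank 2

Q, R, piv = qr(A, pivoting=True)      # column-pivoted QR: A[:, piv] = Q R
print(np.allclose(A[:, piv], Q @ R))  # True
print(abs(R[2, 2]) / abs(R[0, 0]))    # tiny: pivoting exposes the rank
```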

The big six factorizations can all be computed by numerically stable algorithms. Another important factorization is that provided by the Jordan canonical form, but while it is a useful theoretical tool it cannot in general be computed in a numerically stable way.

For further details of these factorizations see the articles below.

These factorizations are precisely those discussed by Stewart (2000) in his article The Decompositional Approach to Matrix Computation, which explains the benefits of matrix factorizations in numerical linear algebra.

# What Is a Schur Decomposition?

A Schur decomposition of a matrix $A\in\mathbb{C}^{n\times n}$ is a factorization $A = QTQ^*$, where $Q$ is unitary and $T$ is upper triangular. The diagonal entries of $T$ are the eigenvalues of $A$, and they can be made to appear in any order by choosing $Q$ appropriately. The columns of $Q$ are called Schur vectors.

A subspace $\mathcal{X}$ of $\mathbb{C}^n$ is an invariant subspace of $A$ if $Ax\in\mathcal{X}$ for all $x\in\mathcal{X}$. If we partition $Q$ and $T$ conformably we can write

$\notag A [Q_1,~Q_2] = [Q_1,~Q_2] \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \\ \end{bmatrix},$

which gives $A Q_1 = Q_1 T_{11}$, showing that the columns of $Q_1$ span an invariant subspace of $A$. Furthermore, $Q_1^*AQ_1 = T_{11}$. The first column of $Q$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_1 = t_{11}$, but the other columns are not eigenvectors, in general. Eigenvectors can be computed by solving upper triangular systems involving $T - \lambda I$, where $\lambda$ is an eigenvalue.
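The eigenvector computation from the last sentence can be sketched in Python/SciPy: for the eigenvalue $\lambda = t_{kk}$, set $y_k = 1$, solve the leading triangular system for the components above, and map back with $Q$ (the index k below is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.linalg import schur, solve_triangular

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))
T, Q = schur(A, output="complex")   # A = Q T Q^*, T upper triangular

k = 2                               # pick the eigenvalue t_kk (0-based)
lam = T[k, k]
y = np.zeros(5, dtype=complex)
y[k] = 1.0
# Solve the leading k-by-k triangular system (T_11 - lam I) y_1 = -T[:k, k].
y[:k] = solve_triangular(T[:k, :k] - lam * np.eye(k), -T[:k, k])
x = Q @ y                           # eigenvector of A for eigenvalue lam
print(np.allclose(A @ x, lam * x))  # True
```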

Write $T = D+N$, where $D = \mathrm{diag}(\lambda_i)$ and $N$ is strictly upper triangular. Taking Frobenius norms gives $\|A\|_F^2 = \|D\|_F^2 + \|N\|_F^2$, or

$\notag \|N\|_F^2 = \|A\|_F^2 - \displaystyle\sum_{i=1}^n |\lambda_i|^2.$

Hence $\|N\|_F$ is independent of the particular Schur decomposition and it provides a measure of the departure from normality. The matrix $A$ is normal (that is, $A^*A = AA^*$) if and only if $N = 0$. So a normal matrix is unitarily diagonalizable: $A = QDQ^*$.
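The invariance of $\|N\|_F$ is easy to verify numerically; a Python/SciPy sketch for a highly nonnormal $2\times2$ matrix:

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[1.0, 100.0],
              [0.0,   2.0]])             # highly nonnormal
T, Q = schur(A, output="complex")
N = T - np.diag(np.diag(T))              # strictly upper triangular part of T
dep = np.linalg.norm(N, "fro")           # departure from normality

lam = np.diag(T)
check = np.sqrt(np.linalg.norm(A, "fro")**2 - np.sum(np.abs(lam)**2))
print(dep, check)                        # equal, by the formula above
```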

An important application of the Schur decomposition is to compute matrix functions. The relation $f(A) = Qf(T)Q^*$ shows that computing $f(A)$ reduces to computing a function of a triangular matrix. Matrix functions illustrate what Van Loan (1975) describes as “one of the most basic tenets of numerical algebra”, namely “anything that the Jordan decomposition can do, the Schur decomposition can do better!”. Indeed the Jordan canonical form is built on a possibly ill conditioned similarity transformation while the Schur decomposition employs a perfectly conditioned unitary similarity, and the full upper triangular factor of the Schur form can do most of what the Jordan form’s bidiagonal factor can do.

An upper quasi-triangular matrix is a block upper triangular matrix

$\notag R = \begin{bmatrix} R_{11} & R_{12} & \dots & R_{1m}\\ & R_{22} & \dots & R_{2m}\\ & & \ddots& \vdots\\ & & & R_{mm} \end{bmatrix}$

whose diagonal blocks $R_{ii}$ are either $1\times1$ or $2\times2$. A real matrix $A\in\mathbb{R}^{n \times n}$ has a real Schur decomposition $A = QRQ^T$ in which all the factors are real, $Q$ is orthogonal, and $R$ is upper quasi-triangular with any $2\times2$ diagonal blocks having complex conjugate eigenvalues. If $A$ is normal then the $2\times 2$ blocks $R_{ii}$ have the form

$R_{ii} = \left[\begin{array}{@{}rr@{\mskip2mu}} a & b \\ -b & a \end{array}\right], \quad b \ne 0,$

which has eigenvalues $a \pm \mathrm{i}b$.

The Schur decomposition can be computed by the QR algorithm at a cost of about $25n^3$ flops for $Q$ and $T$, or $10n^3$ flops for $T$ only.

In MATLAB, the Schur decomposition is computed with the schur function: the command [Q,T] = schur(A) returns the real Schur form if $A$ is real and otherwise the complex Schur form. The complex Schur form for a real matrix can be obtained with [Q,T] = schur(A,'complex'). The rsf2csf function converts the real Schur form to the complex Schur form. The ordschur function takes a Schur decomposition and modifies it so that the eigenvalues appear in a specified order along the diagonal of $T$.