Photos and Videos from NJH60 Conference

The conference Advances in Numerical Linear Algebra: Celebrating the 60th Birthday of Nick Higham was held at the University of Manchester, July 6–8, 2022.

Most of the talks are available on the NLA Group YouTube channel and links to them are available on the conference web page.

Here is the conference photo.

220707-1226-51_3010-edit-2_1024px-1.jpg

(Hires version)

Here is me with some of my current and former PhD students.

220707-1230-52_3021_1024px-1.jpg (Hires version)

And with some of my current and former postdocs.

220707-1232-19_3029_1024px-1.jpg

(Hires version)

After-dinner speaker Charlie Van Loan: 220707_20-24-34_1024px.jpg

Rob Corless kindly gave me a Bohemian matrix eigenvalue tie based on this image. 220717_14-46-08_1024px.jpg

Many thanks to Stefan Güttel, Sven Hammarling, Stephanie Lai, Françoise Tisseur and Marcus Webb for organizing the conference and the Royal Society, MathWorks and the University of Manchester for financial support.

What Is Fast Matrix Multiplication?

The definition of matrix multiplication says that for n\times n matrices A and B, the product C = AB is given by c_{ij} = \sum_{k=1}^n a_{ik}b_{kj}. Each element of C is an inner product of a row of A and a column of B, so if this formula is used then the cost of forming C is n^2(2n-1) additions and multiplications, that is, O(n^3) operations. For over a century after the development of matrix algebra in the 1850s by Cayley, Sylvester and others, all methods for matrix multiplication were based on this formula and required O(n^3) operations.

In 1969 Volker Strassen showed that when n=2 the product can be computed from the formulas

\notag \begin{gathered} p_1 =(a_{11}+a_{22})(b_{11}+b_{22}), \\ p_2=(a_{21}+a_{22})b_{11}, \qquad p_3=a_{11}(b_{12}-b_{22}), \\ p_4=a_{22}(b_{21}-b_{11}), \qquad p_5=(a_{11}+a_{12})b_{22}, \\ p_6=(a_{21}-a_{11})(b_{11}+b_{12}), \qquad p_7=(a_{12}-a_{22})(b_{21}+b_{22}), \\ \noalign{\smallskip} C = \begin{bmatrix} p_1+p_4-p_5+p_7 & p_3+p_5 \\ p_2+p_4 & p_1+p_3-p_2+p_6 \end{bmatrix}. \end{gathered}

The evaluation requires 7 multiplications and 18 additions instead of 8 multiplications and 4 additions for the usual formulas.

At first sight, Strassen’s formulas may appear simply to be a curiosity. However, the formulas do not rely on commutativity so are valid when the a_{ij} and b_{ij} are matrices, in which case for large dimensions the saving of one multiplication greatly outweighs the extra 14 additions. Assuming n is a power of 2, we can partition A and B into four blocks of size n/2, apply Strassen’s formulas for the multiplication, and then apply the same formulas recursively on the half-sized matrix products.

Let us examine the number of multiplications for the recursive Strassen algorithm. Denote by M(k) the number of scalar multiplications required to multiply two 2^k \times 2^k matrices. We have M(k) = 7M(k-1), so

M(k) = 7M(k-1) = 7^2M(k-2) = \cdots = 7^k M(0) = 7^k.

But 7^k = 2^{\log_2{7^k}} = (2^k)^{\log_2 7} = n^{\log_2 7} = n^{2.807\dots}. The number of additions can be shown to be of the same order of magnitude, so the algorithm requires O(n^{\log_27})=O(n^{2.807\dots}) operations.

Strassen’s work sparked interest in finding matrix multiplication algorithms of even lower complexity. Since there are O(n^2) elements of data, which must each participate in at least one operation, the exponent of n in the operation count must be at least 2.

The current record upper bound on the exponent is 2.37286, proved by Alman and Vassilevska Williams (2021) which improved on the previous record of 2.37287, proved by Le Gall (2014) The following figure plots the best upper bound for the exponent for matrix multiplication over time.

mmhist.jpg

In the methods that achieve exponents lower than 2.775, various intricate techniques are used, based on representing matrix multiplication in terms of bilinear or trilinear forms and their representation as tensors having low rank. Laderman, Pan, and Sha (1993) explain that for these methods “very large overhead constants are hidden in the `O‘ notation”, and that the methods “improve on Strassen’s (and even the classical) algorithm only for immense numbers N.”

Strassen’s method, when carefully implemented, can be faster than conventional matrix multiplication for reasonable dimensions. In practice, one does not recur down to 1\times 1 matrices, but rather uses conventional multiplication once n_0\times n_0 matrices are reached, where the parameter n_0 is tuned for the best performance.

Strassen’s method has the drawback that it satisfies a weaker form of rounding error bound that conventional multiplication. For conventional multiplication C = AB of A\in\mathbb{R}^{m\times n} and B\in\mathbb{R}^{n\times p} we have the componentwise bound

\notag        |C - \widehat{C}| \le \gamma^{}_n |A| @ |B|, \qquad(1)

where \gamma^{}_n = nu/(1-nu) and u is the unit roundoff. For Strassen’s method we have only a normwise error bound. The following result uses the norm \|A\| = \max_{i,j} |a_{ij}|, which is not a consistent norm.

Theorem 1 (Brent). Let A,B \in \mathbb{R}^{n\times n}, where n=2^k. Suppose that C=AB is computed by Strassen’s method and that n_0=2^r is the threshold at which conventional multiplication is used. The computed product \widehat{C} satisfies

\notag    \|C - \widehat{C}\| \le \left[ \Bigl( \displaystyle\frac{n}{n_0} \Bigr)^{\log_2{12}}                       (n_0^2+5n_0) - 5n \right] u \|A\|\, \|B\|                       + O(u^2). \qquad(2)

With full recursion (n_0=1) the constant in (2) is 6 n^{\log_2 12}-5n \approx 6 n^{3.585}-5n, whereas with just one level of recursion (n_0 = n/2) it is 3n^2+25n. These compare with n^2u + O(u^2) for conventional multiplication (obtained by taking norms in (1)). So the constant for Strassen’s method grows at a faster rate than that for conventional multiplication no matter what the choice of n_0.

The fact that Strassen’s method does not satisfy a componentwise error bound is a significant weakness of the method. Indeed Strassen’s method cannot even accurately multiply by the identity matrix. The product

\notag       C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}           \begin{bmatrix} 1 & \epsilon \\ \epsilon & \epsilon^2 \end{bmatrix},             \quad 0 < \epsilon \ll 1

is evaluated exactly in floating-point arithmetic by conventional multiplication, but Strassen’s method computes

c_{22} = 2(1+\epsilon^2) + (\epsilon-\epsilon^2) - 1 - (1+\epsilon).

Because c_{22} involves subterms of order unity, the error c_{22} - \widehat c_{22} will be of order u. Thus the relative error |c_{22} - \widehat c_{22}| / |c_{22}| = O(u/\epsilon^2) \gg O(u),

Another weakness of Strassen’s method is that while the scaling AB \to (AD) (D^{-1}B), where D is diagonal, leaves (1) unchanged, it can alter (2) by an arbitrary amount. Dumitrescu (1998) suggests computing D_1(D_1^{-1}A\cdot B D_2^{-1})D_2, where the diagonal matrices D_1 and D_2 are chosen to equilibrate the rows of A and the columns of B in the \infty-norm; he shows that this scaling can improve the accuracy of the result. Further investigations along these lines are made by Ballard et al. (2016).

Should one use Strassen’s method in practice, assuming that an implementation is available that is faster than conventional multiplication? Not if one needs a componentwise error bound, which ensures accurate products of matrices with nonnegative entries and ensures that the column scaling of A and row scaling of B has no effect on the error. But if a normwise error bound with a faster growing constant than for conventional multiplication is acceptable then the method is worth considering.

Notes

For recent work on high-performance implementation of Strassen’s method see Huang et al. (2016, 2020).

Theorem 1 is from an unpublished technical report of Brent (1970). A proof can be found in Higham (2002, §23.2.2).

For more on fast matrix multiplication see Bini (2014) and Higham (2002, Chapter 23).

References

This is a minimal set of references, which contain further useful references within.

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Creativity Workshop on Numerical Linear Algebra

220412-1322-35_2415.jpg

Sixteen members of the Numerical Linear Algebra group at the University of Manchester recently attended a two-day creativity workshop in order to generate ideas for our research and other activities. The workshop was facilitated by Dennis Sherwood, who is an expert in creativity and has run many such workshops. Dennis and I have previously collaborated on workshops for the Manchester Numerical Analysis group (2013), the EPSRC NA-HPC Network (2014), and the SIAM leadership (2018).

220412-1323-08_2420.jpg

A creativity workshop brings together a group of people to tackle questions using a structured approach, in which people share what they know about the question, ask “how might this be different” about the aspects identified, and then discuss the resulting possibilities. A key feature of these workshops is that every idea is captured—on flip charts, coloured cards, and post-it notes—and ideas are not evaluated until they have all been generated. This approach contrasts will the all-too-common situation where excellent ideas generated in a discussion are instantly dismissed because of a “that will never work” reaction.

At our workshop a number of topics were addressed, covering strategic plans for the group and plans for future research projects and grant proposals, including “Mixed precision algorithms”, “Being a magnet for talent”, and “Conferences”. Many ideas were generated and assessed and the group is now planning the next steps with the help of the detailed 70-page written report produced by Dennis.

220412-0801-16_2407.jpg

One idea has already been implemented: we have a new logo; see A New Logo for the Numerical Linear Algebra Group.

For more on the creativity process mentioned here, as well as details of creativity workshops, including sample briefs that can be used at them, see the new book by Dennis and me: How to Be Creative: A Practical Guide for the Mathematical Sciences (SIAM, 2022).

220411-1026-13_4259.jpg

How to Space Displayed Mathematical Equations

In a displayed mathematical equation with more than one component, how much space should be placed between the components?

Here are the guidelines I use, with examples in LaTeX. Recall that a \quad is approximately the width of a capital M and \qquad is twice the width of a \quad.

Case 1. Equation with Qualifying Expression

An equation or other mathematical construct is separated from a qualifying expression by a \quad. Examples:

\notag     |a_{ii}| \ge \displaystyle\sum_{j\ne i} |a_{ij}|, \quad i=1\colon n.

\notag       fl(x\mathbin{\mathrm{op}}y) = (x\mathbin{\mathrm{op}} y)(1+\delta), \quad |\delta|\le u,       \quad \mathbin{\mathrm{op}} =+,-,*,/.

\notag    y' = t^2+y^2, \quad 0\le t\le 1, \quad y(0)=0.

When the qualifying expression is a prepositional phrase it is given standard sentence spacing. Examples:

\notag \min_x c^Tx \quad \mathrm{subject~to~} Ax=b,~ x\ge 0.

\notag   \|J(v)-J(w)\| \le \theta_L \|v-w\| \quad   \mathrm{for~all~} v,w \in \mathbb{R}^n.

The first example was typed as (using the equation* environment provided by the amsmath package)

\begin{equation*}
\min_x c^Tx \quad \text{subject to $Ax=b$, $x\ge 0$}.
\end{equation*}

Here, the qualifying phrase is placed inside a \text command, which jumps out of math mode and formats its argument as regular text, with the usual interword spacing in effect, and we re-enter math mode for the conditions. This is better than writing

\min_x c^Tx \quad \text{subject to} ~Ax=b, ~x\ge 0.

with hard spaces. Note that \text is a command from the amsmath package, and it is similar to the LaTeX command \mbox and the TeX command \hbox, both of which work equally well here.

Case 2. Equation with Conjunction

When an equation contains a conjunction such as and or or, the conjunction has a \quad on each side. Examples:

\notag      x = 1 \quad \mathrm{or} \quad x = 2.

\notag      a = \displaystyle\sum_{j=1}^n c_j v_j \quad \mathrm{where} \quad      c_j = \langle a,~ u_j\rangle~\mathrm{for}~j=1,2,\dots,n.

In the second example, one might argue for a \quad before the qualifying “for”, on the basis of case 1, but it I prefer the word spacing. This example was typed as

\begin{equation*}
     a = \sum_{j=1}^n c_j v_j \quad \text{where} \quad
     \text{$c_j = \langle a, u_j\rangle$ for $j=1,2,\dots,n$}.
\end{equation*}

Case 3. Multiple Equations

Two or more equations are separated by a \qquad. Examples:

\notag  A = e_1^{}e_3^T, \qquad  B = e_1^{}e_4^T, \qquad  C = e_2^{}e_3^T, \qquad  D = e_2^{}e_4^T

\notag \begin{aligned}     AXA &= A,  \qquad  & XAX    &= X,\\  (AX)^* &= AX, \qquad  & (XA)^* &= XA. \end{aligned}

Limitations

It is important to emphasize that one might diverge from following these (or any other) guidelines, for a variety of reasons. With a complicated display, or if a narrow text width is in use (as with a two-column format), horizontal space may be at a premium so one may need to reduce the spacing. And the guidelines do not cover every possible situation.

Notes

My guidelines are the same ones that were used in typesetting the Princeton Companion to Applied Mathematics, and I am grateful to Sam Clark (T&T Productions), copy editor and typesetter of the Companion, for discussions about them. Cases 1 and 3 are recommended in my Handbook of Writing for the Mathematical Sciences (2017).

The SIAM Style Guide (link to PDF) prefers a \qquad in Case 1 and \quad in Case 3 with three or more equations. The AMS Style Guide (link to PDF) has the same guidelines as SIAM. Both SIAM and the AMS allow an author to use just a \quad between an equation an a qualifying expression.

In the TeXbook (1986, p. 166), Knuth advocates using a \qquad between an equation and a qualifying expression.

What Is the Pascal Matrix?

The Pascal matrix P_n\in\mathbb{R}^{n\times n} is the symmetric matrix defined by

p_{ij} = \displaystyle{i+j-2 \choose j-1} = \displaystyle\frac{ (i+j-2)! }{ (i-1)! (j-1)! }.

It contains the rows of Pascal’s triangle along the anti-diagonals. For example:

\notag  P_5 = \left[\begin{array}{ccccc}     1 & 1 & 1 & 1 & 1\\     1 & 2 & 3 & 4 & 5\\     1 & 3 & 6 & 10 & 15\\     1 & 4 & 10 & 20 & 35\\     1 & 5 & 15 & 35 & 70 \end{array}\right].

In MATLAB, the matrix is pascal(n).

The Pascal matrix is positive definite and has the Cholesky factorization

\notag    P_n = L_nL_n^T,   \qquad(1)

where the rows of L_n are the rows of Pascal’s triangle. For example,

\notag    L_5 = \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0\\ 1 & 2 & 1 & 0 & 0\\ 1 & 3 & 3 & 1 & 0\\ 1 & 4 & 6 & 4 & 1 \end{array}\right]\\

From (1) we have \det(P_n) = \det(L_n)^2 = 1. Form this equation, or by inverting (1), it follows that P_n^{-1} has integer elements. Indeed the inverse is known to have (i,j) element

\notag      (-1)^{i-j} \displaystyle\sum_{k=0}^{n-i} {i+k-1 \choose k} {i+k-1 \choose i+k-j},      \quad i \ge j. \qquad(2)

The Cholesky factor L_n can be factorized as

\notag    L_n = M_{n-1} M_{n-2} \dots M_1, \qquad(3)

where M_i is unit lower bidiagonal with the first i-1 entries along the subdiagonal of M_i zero and the rest equal to 1. For example,

\notag \begin{aligned}    L_5 =     \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right]     \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right]     \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right]     \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right]. \end{aligned}

The factorization (3) shows that P_n is totally positive, that is, every minor (a determinant of a square submatrix) is positive. Indeed each bidiagonal factor M_i is totally nonnegative, that is, every minor is nonnegative, and the product of totally nonnegative matrices is totally nonnegative. Further results in the theory of totally positive matrices show that the product is actually totally positive.

The positive definiteness of P_n implies that the eigenvalues are real and positive. The total positivity, together with the fact that P_n is (trivially) irreducible, implies that the eigenvalues are distinct.

For a symmetric positive semidefinite matrix A with nonnegative entries, we define A^{\circ t} = (a_{ij}^t), which is the matrix with every entry raised to the power t\in \mathbb{R}. We say that A is infinitely divisible if A^{\circ t} is positive semidefinite for all nonnegative t. The Pascal matrix is infinitely divisible.

It is possible to show that

\notag    L_n^{-1}  = DL_nD, \qquad (4)

where D = \mathrm{diag}((-1)^i). In other words, Y_n = L_nD is involutory, that is, Y_n^2 = I. It follows from P_n = Y_n Y_n^T that

\notag \begin{aligned} P_n^{-1} = Y_n^{-T} Y_n^{-1} = Y_n^T Y_n = Y_n^{-1} (Y_n Y_n^T) Y_n          = Y_n^{-1} P_n Y_n, \end{aligned}

so P_n and P_n^{-1} are similar and hence have the same eigenvalues. This means that the eigenvalues of P_n appear in reciprocal pairs and that the characteristic polynomial is palindromic. Here is an illustration in MATLAB:

>> P = pascal(5); evals = eig(P); [evals 1./evals], coeffs = charpoly(P)
ans =
   1.0835e-02   9.2290e+01
   1.8124e-01   5.5175e+00
   1.0000e+00   1.0000e+00
   5.5175e+00   1.8124e-01
   9.2290e+01   1.0835e-02
coeffs =
     1   -99   626  -626    99    -1

Now

p_{nn} \le \|P\|_2 \le (\|P\|_1 \|P\|_{\infty})^{1/2}      = \biggl(\displaystyle\frac{2n-1}{n}\biggr) p_{nn},

where for the equality we used a binomial coefficient summation identity. The fact that P_n is positive definite with reciprocal eigenvalues implies that \kappa_2(P) = \|P\|_2^2. Hence, using Stirling’s approximation (n! \sim \sqrt{2\pi n} (n/\mathrm{e})^n),

\kappa_2(P_n) \sim p_{nn}^2                  \sim\left( \displaystyle\frac{ (2n)! }{ (n!)^2 } \right)^2                  \sim \displaystyle\frac{16^n}{n \pi}.

Thus P_n is exponentially ill conditioned as n\to\infty.

The matrix Y_n is obtained in MATLAB with pascal(n,1); this is a lower triangular square root of the identity matrix. Turnbull (1927) noted that by rotating Y_n through 90 degrees one obtains a cube root of the identity matrix. This matrix is returned by pascal(n,2). For example, with n = 5:

\notag \hspace*{-1cm} X = \left[\begin{array}{rrrrr} 1 & 1 & 1 & 1 & 1\\ -4 & -3 & -2 & -1 & 0\\ 6 & 3 & 1 & 0 & 0\\ -4 & -1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0 \end{array}\right], \quad X^2 = \left[\begin{array}{rrrrr} 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & -1 & -4\\ 0 & 0 & 1 & 3 & 6\\ 0 & -1 & -2 & -3 & -4\\ 1 & 1 & 1 & 1 & 1 \end{array}\right], \quad X^3 = I.

The logarithm of L_n is explicitly known: it is the upper bidiagonal matrix

\notag   \log L_n = \left[\begin{array}{ccccc}     0 & 1 &   &        &  \\       & 0 & 2 &        &   \\[-5pt]       &   & 0 & \ddots &   \\[-5pt]       &   &   & \ddots & n-1\\       &   &   &        & 0 \end{array}\right].  \qquad (5)

Notes

For proofs of (2) and (4) see Cohen (1975) and Call and Velleman (1993). respectively. For (5), see Edelman and Strang (2004). The infinite divisibility of the Pascal matrix is infinitely is shown by Bhatia (2006). For the total positivity property see Fallat and Johnson (2011).

References

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/index-of-what-is-articles/ and in PDF form from the GitHub repository https://github.com/higham/what-is.

The Big Six Matrix Factorizations

Six matrix factorizations dominate in numerical linear algebra and matrix analysis: for most purposes one of them is sufficient for the task at hand. We summarize them here.

For each factorization we give the cost in flops for the standard method of computation, stating only the highest order terms. We also state the main uses of each factorization.

For full generality we state factorizations for complex matrices. Everything translates to the real case with “Hermitian” and “unitary” replaced by “symmetric” and “orthogonal”, respectively.

The terms “factorization” and “decomposition” are synonymous and it is a matter of convention which is used. Our list comprises three factorization and three decompositions.

Recall that an upper triangular matrix is a matrix of the form

\notag   R =   \begin{bmatrix}   r_{11} & r_{12} & \dots & r_{1n}\\          & r_{22} & \dots & r_{2n}\\          &        & \ddots& \vdots\\          &        &       & r_{nn}   \end{bmatrix},

and a lower triangular matrix is the transpose of an upper triangular one.

Cholesky Factorization

Every Hermitian positive definite matrix A\in\mathbb{C}^{n\times n} has a unique Cholesky factorization A = R^*R, where R\in\mathbb{C}^{n\times n} is upper triangular with positive diagonal elements.

Cost: n^3/3 flops.

Use: solving positive definite linear systems.

LU Factorization

Any matrix A\in\mathbb{C}^{n\times n} has an LU factorization PA = LU, where P is a permutation matrix, L is unit lower triangular (lower triangular with 1s on the diagonal), and U is upper triangular. We can take P = I if the leading principal submatrices A(1\colon k, 1\colon k), k = 1\colon n-1, of A are nonsingular, but to guarantee that the factorization is numerically stable we need A to have particular properties, such as diagonal dominance. In practical computation we normally choose P using the partial pivoting strategy, which almost always ensures numerically stable.

Cost: 2n^3/3 flops.

Use: solving general linear systems.

QR Factorization

Any matrix A\in\mathbb{C}^{m\times n} with m\ge n has a QR factorization A = QR, where Q\in \mathbb{C}^{m\times m} is unitary and R is upper trapezoidal, that is, R = \left[\begin{smallmatrix} R_1 \\ 0\end{smallmatrix}\right], where R_1\in\mathbb{C}^{n\times n} is upper triangular.

Partitioning Q = [Q_1~Q_2], where Q_1\in\mathbb{C}^{m\times n} has orthonormal columns, gives A = Q_1R_1, which is the reduced, economy size, or thin QR factorization.

Cost: 2(n^2m-n^3/3) flops for Householder QR factorization. The explicit formation of Q (which is not usually necessary) requires a further 4(m^2n-mn^2+n^3/3) flops.

Use: solving least squares problems, computing an orthonormal basis for the range space of A, orthogonalization.

Schur Decomposition

Any matrix A\in\mathbb{C}^{n\times n} has a Schur decomposition A = QTQ^*, where Q is unitary and T is upper triangular. The eigenvalues of A appear on the diagonal of T. For each k, the leading k columns of Q span an invariant subspace of A.

For real matrices, a special form of this decomposition exists in which all the factors are real. An upper quasi-triangular matrix R is a block upper triangular with whose diagonal blocks R_{ii} are either 1\times1 or 2\times2. Any A\in\mathbb{R}^{n\times n} has a real Schur decomposition A = Q R Q^T, where Q is real orthogonal and R is real upper quasi-triangular with any 2\times2 diagonal blocks having complex conjugate eigenvalues.

Cost: 25n^3 flops for Q and T (or R) by the QR algorithm; 10n^3 flops for T (or R) only.

Use: computing eigenvalues and eigenvectors, computing invariant subspaces, evaluating matrix functions.

Spectral Decomposition

Every Hermitian matrix A\in\mathbb{C}^{n\times n} has a spectral decomposition A = Q\Lambda Q^*, where Q is unitary and \Lambda = \mathrm{diag}(\lambda_i). The \lambda_i are the eigenvalues of A, and they are real. The spectral decomposition is a special case of the Schur decomposition but is of interest in its own right.

Cost: 9n^3 for Q and D by the QR algorithm, or 4n^3\!/3 flops for D only.

Use: any problem involving eigenvalues of Hermitian matrices.

Singular Value Decomposition

Any matrix A\in\mathbb{C}^{m\times n} has a singular value decomposition (SVD)

\notag   A = U\Sigma V^*, \quad   \Sigma = \mathrm{diag}(\sigma_1,\sigma_2,\dots,\sigma_p)             \in \mathbb{R}^{m\times n},   \quad   p = \min(m,n),

where U\in\mathbb{C}^{m\times m} and V\in\mathbb{C}^{n\times n} are unitary and \sigma_1\ge\sigma_2\ge\cdots\ge\sigma_p\ge0. The \sigma_i are the singular values of A, and they are the nonnegative square roots of the p largest eigenvalues of A^*A. The columns of U and V are the left and right singular vectors of A, respectively. The rank of A is equal to the number of nonzero singular values. If A is real, U and V can be taken to be real. The essential SVD information is contained in the compact or economy size SVD A = U\Sigma V^*, where U\in\mathbb{C}^{m\times r}, \Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_r), V\in\mathbb{C}^{n\times r}, and r = \mathrm{rank}(A).

Cost: 14mn^2+8n^3 for P(:,1\colon n), \Sigma, and Q by the Golub–Reinsch algorithm, or 6mn^2+20n^3 with a preliminary QR factorization.

Use: determining matrix rank, solving rank-deficient least squares problems, computing all kinds of subspace information.

Discussion

Pivoting can be incorporated into both Cholesky factorization and QR factorization, giving \Pi^T A \Pi = R^*R (complete pivoting) and A\Pi = QR (column pivoting), respectively, where \Pi is a permutation matrix. These pivoting strategies are useful for problems that are (nearly) rank deficient as they force R to have a zero (or small) (2,2) block.

The big six factorizations can all be computed by numerically stable algorithms. Another important factorization is that provided by the Jordan canonical form, but while it is a useful theoretical tool it cannot in general be computed in a numerically stable way.

For further details of these factorizations see the articles below.

These factorizations are precisely those discussed by Stewart (2000) in his article The Decompositional Approach to Matrix Computation, which explains the benefits of matrix factorizations in numerical linear algebra.

What Is a Schur Decomposition?

A Schur decomposition of a matrix A\in\mathbb{C}^{n\times n} is a factorization A = QTQ^*, where Q is unitary and T is upper triangular. The diagonal entries of T are the eigenvalues of A, and they can be made to appear in any order by choosing Q appropriately. The columns of Q are called Schur vectors.

A subspace \mathcal{X} of \mathbb{C}^{n\times n} is an invariant subspace of A if Ax\in\mathcal{X} for all x\in\mathcal{X}. If we partition Q and T conformably we can write

\notag   A [Q_1,~Q_2] = [Q_1,~Q_2]   \begin{bmatrix}   T_{11} & T_{12} \\       0  & T_{22} \\   \end{bmatrix},

which gives A Q_1 = Q_1 T_{11}, showing that the columns of Q_1 span an invariant subspace of A. Furthermore, Q_1^*AQ_1 = T_{11}. The first column of Q is an eigenvector of A corresponding to the eigenvalue \lambda_1 = t_{11}, but the other columns are not eigenvectors, in general. Eigenvectors can be computed by solving upper triangular systems involving T - \lambda I, where \lambda is an eigenvalue.

Write T = D+N, where D = \mathrm{diag}(\lambda_i) and N is strictly upper triangular. Taking Frobenius norms gives \|A\|_F^2 = \|D\|_F^2 + \|N\|_F^2, or

\notag        \|N\|_F^2 = \|A\|_F^2 - \displaystyle\sum_{i=1}^n |\lambda_i|^2.

Hence \|N\|_F is independent of the particular Schur decomposition and it provides a measure of the departure from normality. The matrix A is normal (that is, A^*A = AA^*) if and only if N = 0. So a normal matrix is unitarily diagonalizable: A = QDQ^*.

An important application of the Schur decomposition is to compute matrix functions. The relation f(A) = Qf(T)Q^* shows that computing f(A) reduces to computing a function of a triangular matrix. Matrix functions illustrate what Van Loan (1975) describes as “one of the most basic tenets of numerical algebra”, namely “anything that the Jordan decomposition can do, the Schur decomposition can do better!”. Indeed the Jordan canonical form is built on a possibly ill conditioned similarity transformation while the Schur decomposition employs a perfectly conditioned unitary similarity, and the full upper triangular factor of the Schur form can do most of what the Jordan form’s bidiagonal factor can do.

An upper quasi-triangular matrix is a block upper triangular matrix

\notag   R =   \begin{bmatrix}   R_{11} & R_{12} & \dots & R_{1m}\\          & R_{22} & \dots & R_{2m}\\          &        & \ddots& \vdots\\          &        &       &  R_{mm}   \end{bmatrix}

whose diagonal blocks R_{ii} are either 1\times1 or 2\times2. A real matrix A\in\mathbb{R}^{n \times n} has a real Schur decomposition A = QRQ^T in which in which all the factors are real, Q is orthogonal, and R is upper quasi-triangular with any 2\times2 diagonal blocks having complex conjugate eigenvalues. If A is normal then the 2\times 2 blocks R_{ii} have the form

R_{ii} = \left[\begin{array}{@{}rr@{\mskip2mu}} a & b \\                -b & a \end{array}\right], \quad b \ne 0,

which has eigenvalues a \pm \mathrm{i}b.

The Schur decomposition can be computed by the QR algorithm at a cost of about 25n^3 flops for Q and T, or 10n^3 flops for T only.

In MATLAB, the Schur decomposition is computed with the schur function: the command [Q,T] = schur(A) returns the real Schur form if A is real and otherwise the complex Schur form. The complex Schur form for a real matrix can be obtained with [Q,T] = schur(A,'complex'). The rsf2csf function converts the real Schur form to the complex Schur form. The= ordschur function takes a Schur decomposition and modifies it so that the eigenvalues appear in a specified order along the diagonal of T.

References

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/index-of-what-is-articles/ and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is a Permutation Matrix?

A permutation matrix is a square matrix in which every row and every column contains a single 1 and all the other elements are zero. Such a matrix, P say, is orthogonal, that is, P^TP = PP^T = I_n, so it is nonsingular and has determinant \pm 1. The total number of n\times n permutation matrices is n!.

Premultiplying a matrix by P reorders the rows and postmultiplying by P reorders the columns. A permutation matrix P that has the desired reordering effect is constructed by doing the same operations on the identity matrix.

Examples of permutation matrices are the identity matrix I_n, the reverse identity matrix J_n, and the shift matrix K_n (also called the cyclic permutation matrix), illustrated for n = 4 by

\notag   I_4 = \begin{bmatrix}   1 & 0 & 0 & 0  \\   0 & 1 & 0 & 0  \\   0 & 0 & 1 & 0  \\   0 & 0 & 0 & 1 \end{bmatrix}, \qquad   J_4 = \begin{bmatrix}   0 & 0 & 0 & 1  \\   0 & 0 & 1 & 0  \\   0 & 1 & 0 & 0  \\   1 & 0 & 0 & 0 \end{bmatrix}, \qquad   K_4 = \begin{bmatrix}   0 & 1 & 0 & 0  \\   0 & 0 & 1 & 0  \\   0 & 0 & 0 & 1  \\   1 & 0 & 0 & 0 \end{bmatrix}.

Pre- or postmultiplying a matrix by J_n reverses the order of the rows and columns, respectively. Pre- or postmultiplying a matrix by K_n shifts the rows or columns, respectively, one place forward and moves the first one to the last position—that is, it cyclically permutes the rows or columns. Note that J_n is a symmetric Hankel matrix and K_n is a circulant matrix.

An elementary permutation matrix P differs from I_n in just two rows and columns, i and j, say. It can be written P = I_n - (e_i-e_j)(e_i-e_j)^T, where e_i is the ith column of I_n. Such a matrix is symmetric and so satisfies P^2 = I_n, and it has determinant -1. A general permutation matrix can be written as a product of elementary permutation matrices P = P_1P_2\dots P_k, where k is such that \det(P) = (-1)^k.

It is easy to show that \det(\lambda I - K_n) = \lambda^n - 1, which means that the eigenvalues of K_n are 1, w, w^2, \dots, w^{n-1}, where w = \exp(2\pi\mathrm{i}/n) is the nth root of unity. The matrix K_n has two diagonals of 1s, which move up through the matrix as it is powered: K_n^i \ne I for i< n and K_n^n = I. The following animated gif superposes MATLAB spy plots of K_{64}, K_{64}^2, …, K_{64}^{64} = I_{64}.

shift_powers.gif

The shift matrix K_n plays a fundamental role in characterizing irreducible permutation matrices. Recall that a matrix A\in\mathbb{C}^{n\times n} is irreducible if there does not exist a permutation matrix P such that

\notag         P^TAP = \begin{bmatrix} A_{11} & A_{12} \\                                    0   & A_{22}                  \end{bmatrix},

where A_{11} and A_{22} are square, nonempty submatrices.

Theorem 1. For a permutation matrix P \in \mathbb{R}^{n \times n} the following conditions are equivalent.

  • P is irreducible.
  • There exists a permutation matrix Q such that Q^{-1}PQ = K_n
  • The eigenvalues of P are 1, w, w^2, \dots, w^{n-1}.

One consequence of Theorem 1 is that for any irreducible permutation matrix P, \mathrm{rank}(P - I) = \mathrm{rank}(K_n - I) = n-1.

The next result shows that a reducible permutation matrix can be expressed in terms of irreducible permutation matrices.

Theorem 2. Every reducible permutation matrix is permutation similar to a direct sum of irreducible permutation matrices.

Another notable permutation matrix is the vec-permutation matrix, which relates A\otimes B to B\otimes A, where \otimes is the Kronecker product.

A permutation matrix is an example of a doubly stochastic matrix: a nonnegative matrix whose row and column sums are all equal to 1. A classic result characterizes doubly stochastic matrices in terms of permutation matrices.

Theorem 3 (Birkhoff). A matrix is doubly stochastic if and only if it is a convex combination of permutation matrices.

In coding, memory can be saved by representing a permutation matrix P as an integer vector p, where p_i is the column index of the 1 within the ith row of P. MATLAB functions that return permutation matrices can also return the permutation in vector form. Here is an example with the MATLAB lu function that illustrates how permuting a matrix can be done using the vector permutation representation.

>> A = gallery('frank',4), [L,U,P] = lu(A); P
A =
     4     3     2     1
     3     3     2     1
     0     2     2     1
     0     0     1     1
P =
     1     0     0     0
     0     0     1     0
     0     0     0     1
     0     1     0     0
>> P*A
ans =
     4     3     2     1
     0     2     2     1
     0     0     1     1
     3     3     2     1
>> [L,U,p] = lu(A,'vector'); p
p =
     1     3     4     2
>> A(p,:)
ans =
     4     3     2     1
     0     2     2     1
     0     0     1     1
     3     3     2     1

For more on handling permutations in MATLAB see section 24.3 of MATLAB Guide.

Notes

For proofs of Theorems 1–3 see Zhang (2011, Sec. 5.6). Theorem 3 is also proved in Horn and Johnson (2013, Thm. 8.7.2).

Permutations play a key role in the fast Fourier transform and its efficient implementation; see Van Loan (1992).

References

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/index-of-what-is-articles/ and in PDF form from the GitHub repository https://github.com/higham/what-is.

What’s New in MATLAB R2022a?

In this post I discuss some of the new features in MATLAB R2022a, focusing on ones that relate to my particular interests. See the release notes for a detailed list of the many changes in MATLAB and its toolboxes. For my articles about new features in earlier releases, see here.

Themes

MATLAB Online now has themes, including a dark theme (which is my preference). We will have to wait for a future release for themes to be supported on desktop MATLAB.

Economy Factorizations

One can now write qr(A,'econ') instead of qr(A,0) and gsvd(A,B,'econ') instead of gsvd(A,B) for the “economy size” decompositions. This is useful as the 'econ' form is more descriptive. The svd function already supported the 'econ' argument. The economy-size QR factorization is sometimes called the thin QR factorization.

Tie Breaking in the round Function

The round function, which rounds to the nearest integer, now breaks ties by rounding away from zero by default and has several other tie-breaking options (albeit not stochastic rounding). See a sequence of four blog posts on this topic by Cleve Moler starting with this one from February 2021.

Tolerances for null and orth

The null (nullspace) and orth (orthonormal basis for the range) functions now accept a tolerance as a second argument, and any singular values less than that tolerance are treated as zero. The default tolerance is max(size(A)) * eps(norm(A)). This change brings the two functions into line with rank, which already accepted the tolerance. If you are working in double precision (the MATLAB default) and your matrix has inherent errors of order 10^{-8} (for example), you might set the tolerance to 10^{-8}, since singular values smaller than this are indistinguishable from zero.

Unit Testing Reports

The unit testing framework can now generate docx, html, and pdf reports after test execution, by using the function generatePDFReport in the latter case. This is useful for keeping a record of test results and for printing them. We use unit testing in Anymatrix and have now added an option to return the results in a variable so that the user can call one of these new functions.

Checking Arrays for Special Values

Previously, if you wanted to check whether a matrix had all finite values you would need to use a construction such as all(all(isfinite(A))) or all(isfinite(A),'all'). The new allfinite function does this in one go: allfinite(A) returns true or false according as all the elements of A are finite or not, and it works for arrays of any dimension.

Similarly, anynan and anymissing check for NaNs or missing values. A missing value is a NaN for numerical arrays, but is indicated in other ways for other data types.

Linear Algebra on Multidimensional Arrays

The new pagemldivide, pagemrdivide, and pageinv functions solve linear equations and calculate matrix inverses using pages of d-dimensional arrays, while tensorprod calculates tensor products (inner products, outer products, or a combination of the two) between two d-dimensional arrays.

Animated GIFs

The append option of the exportgraphics function now supports the GIF format, enabling one to create animated GIFs (previously only multipage PDF files were supported). The key command is exportgraphics(gca,file_name,"Append",true). There are other ways of creating animated GIFs in MATLAB, but this one is particularly easy. Here is an example M-file (based on cheb3plot in MATLAB Guide) with its output below.

%CHEB_GIF  Animated GIF of Chebyshev polynomials.
%   Based on cheb3plot in MATLAB Guide.
x = linspace(-1,1,1500)';
p = 49
Y = ones(length(x),p);

Y(:,2) = x;
for k = 3:p
  Y(:,k) = 2*x.*Y(:,k-1) - Y(:,k-2);
end

delete cheby_animated.gif
a = get(groot,'defaultAxesColorOrder'); m = length(a);

for j = 1:p-1 % length(k)
    plot(x,Y(:,j),'LineWidth',1.5,'color',a(1+mod(j-1,m),:));
    xlim([-1 1]), ylim([-1 1])  % Must freeze axes.
    title(sprintf('%2.0f', j),'FontWeight','normal')
    exportgraphics(gca,"cheby_animated.gif","Append",true)
end

cheby_animated.gif

What Is the Matrix Inverse?

The inverse of a matrix A\in\mathbb{C}^{n\times n} is a matrix X\in\mathbb{C}^{n\times n} such that AX = I, where I is the identity matrix (which has ones on the diagonal and zeros everywhere else). The inverse is written as A^{-1}. If the inverse exists then A is said to be nonsingular or invertible, and otherwise it is singular.

The inverse X of A also satisfies XA = I, as we now show. The equation AX = I says that Ax_j = e_j for j=1\colon n, where x_j is the jth column of A and e_j is the jth unit vector. Hence the n columns of A span \mathbb{C}^n, which means that the columns are linearly independent. Now A(I-XA) = A - AXA = A -A = 0, so every column of I - XA is in the null space of A. But this contradicts the linear independence of the columns of A unless I - XA = 0, that is, XA = I.

The inverse of a nonsingular matrix is unique. If AX = AW = I then premultiplying by X gives XAX = XAW, or, since XA = I, X = W.

The inverse of the inverse is the inverse: (A^{-1})^{-1} = A, which is just another way of interpreting the equations AX = XA = I.

Connections with the Determinant

Since the determinant of a product of matrices is the product of the determinants, the equation AX = I implies \det(A) \det(X) = 1, so the inverse can only exist when \det(A) \ne 0. In fact, the inverse always exists when \det(A) \ne 0.

An explicit formula for the inverse is

\notag   A^{-1} = \displaystyle\frac{\mathrm{adj}(A)}{\det(A)},   \qquad (1)

where the adjugate \mathrm{adj} is defined by

\bigl(\mathrm{adj}(A)\bigr)_{ij} = (-1)^{i+j} \det(A_{ji})

and where A_{pq} denotes the submatrix of A obtained by deleting row p and column q. A special case is the formula

\notag      \begin{bmatrix}        a & b \\ c& d      \end{bmatrix}^{-1}    = \displaystyle\frac{1}{ad-bc}      \begin{bmatrix}        d & -b \\ -c & a      \end{bmatrix}.

The equation AA^{-1} = I implies \det(A^{-1}) = 1/\det(A).

Conditions for Nonsingularity

The following result collects some equivalent conditions for a matrix to be nonsingular. We denote by \mathrm{null}(A) the null space of A (also called the kernel).

Theorem 1. For A \in \mathbb{C}^{n \times n} the following conditions are equivalent to A being nonsingular:

  • \mathrm{null}(A) = \{0\},
  • \mathrm{rank}(A) = n,
  • Ax=b has a unique solution x, for any b,
  • none of the eigenvalues of A is zero,
  • \det(A) \ne 0.

A useful formula is

\notag     (AB)^{-1} = B^{-1}A^{-1}.

Here are some facts about the inverses of n\times n matrices of special types.

  • A diagonal matrix D = \mathrm{diag}(d_i) is nonsingular if d_i\ne0 for all i, and D^{-1} = \mathrm{diag}(d_i^{-1}).
  • An upper (lower) triangular matrix T is nonsingular if its diagonal elements are nonzero, and the inverse is upper (lower) triangular with (i,i) element t_{ii}^{-1}.
  • If x,y\in\mathbb{C}^n and y^*A^{-1}x \ne -1, then A + xy^* is nonsingular and

    \notag     \bigl(A + xy^*\bigr)^{-1}       = A^{-1} - \displaystyle\frac{A^{-1}x y^* A^{-1}}{1 +  y^*A^{-1}x}.

    This is the Sherman–Morrison formula.

The Inverse as a Matrix Polynomial

The Cayley-–Hamilton theorem says that a matrix satisfies its own characteristic equation, that is, if p(t) = \det(tI - A) = t^n + c_{n-1} t^{n-1} + \cdots + c_0, then p(A) = 0. In other words, A^n + c_{n-1} A^{n-1} + \cdots + c_0I = 0, and if A is nonsingular then multiplying through by A^{-1} gives (since c_0 = p(0) = (-1)^n\det(A) \ne 0)

A^{-1} = -\displaystyle\frac{1}{c_0} (A^{n-1} + c_{n-1} A^{n-2} + \cdots + c_1 I).     \qquad (2)

This means that A^{-1} is expressible as a polynomial of degree at most n-1 in A (with coefficients that depend on A).

To Compute or Not to Compute the Inverse

The inverse is an important theoretical tool, but it is rarely necessary to compute it explicitly. If we wish to solve a linear system of equations Ax = b then computing A^{-1} and then forming x = A^{-1}b is both slower and less accurate in floating-point arithmetic than using LU factorization (Gaussian elimination) to solve the system directly. Indeed, for n = 1 one would not solve 3x = 1 by computing 3^{-1} \times 1.

For sparse matrices, computing the inverse may not even be practical, as the inverse is usually dense.

If one needs to compute the inverse, how should one do it? We will consider the cost of different methods, measured by the number of elementary arithmetic operations (addition, subtraction, division, multiplication) required. Using (1), the cost is that of computing one determinant of order n and n^2 determinants of order n-1. Since computing a k\times k determinant costs at least O(k^3) operations by standard methods, this approach costs at least O(n^5) operations, which is prohibitively expensive unless n is very small. Instead one can compute an LU factorization with pivoting and then solve the systems Ax_i = e_i for the columns x_i of A^{-1}, at a total cost of 2n^3 + O(n^2) operations.

Equation (2) does not give a good method for computing A^{-1}, because computing the coefficients c_i and evaluating a matrix polynomial are both expensive.

It is possible to exploit fast matrix multiplication methods, which compute the product of two n\times n matrices in O(n^\alpha) operations for some \alpha < 3. By using a block LU factorization recursively, one can reduce matrix inversion to matrix multiplication. If we use Strassen’s fast matrix multiplication method, which has \alpha = \log_2 7 \approx 2.807, then we can compute A^{-1} in O(n^{2.807}) operations.

Slash Notation

MATLAB uses the backslash and forward slash for “matrix division”, with the meanings A \backslash B = A^{-1}B and A / B = AB^{-1}. Note that because matrix multiplication is not commutative, A \backslash B \ne A / B, in general. We have A\backslash I = I/A = A^{-1} and I\backslash A = A/I = A. In MATLAB, the inverse can be compute with inv(A), which uses LU factorization with pivoting.

Rectangular Matrices

If A is m\times n then the equation AX = I_m requires X to be n\times m, as does XA = I_n. Rank considerations show that at most one of these equations can hold if m\ne n. For example, if A = a^* is a nonzero row vector, then AX = 1 for X = a/a^*a, but XA = aa^*/a^*a\ne I. This is an example of a generalized inverse.

An Interesting Inverse

Here is a triangular matrix with an interesting inverse. This example is adapted from the LINPACK Users’ Guide, which has the matrix, with “LINPACK” replacing “INVERSE” on the front cover and the inverse on the back cover.

\notag \left[\begin{array}{ccccccc} I & N & V & E & R & S & E\\ 0 & N & V & E & R & S & E\\ 0 & 0 & V & E & R & S & E\\ 0 & 0 & 0 & E & R & S & E\\ 0 & 0 & 0 & 0 & R & S & E\\ 0 & 0 & 0 & 0 & 0 & S & E\\ 0 & 0 & 0 & 0 & 0 & 0 & E \end{array}\right]^{-1} = \left[\begin{array}{*{7}{r@{\hspace{4pt}}}} 1/I & -1/I & 0 & 0 & 0 & 0 & 0\\ 0 & 1/N & -1/N & 0 & 0 & 0 & 0\\ 0 & 0 & 1/V & -1/V & 0 & 0 & 0\\ 0 & 0 & 0 & 1/E & -1/E & 0 & 0\\ 0 & 0 & 0 & 0 & 1/R & -1/R & 0\\ 0 & 0 & 0 & 0 & 0 & 1/S & -1/S\\ 0 & 0 & 0 & 0 & 0 & 0 & 1/E \end{array}\right].