Theorem 1 (Gershgorin’s theorem). The eigenvalues of $A \in \mathbb{C}^{n\times n}$ lie in the union of the $n$ discs in the complex plane

$$D_i = \Bigl\{ z \in \mathbb{C} : |z - a_{ii}| \le \sum_{j \ne i} |a_{ij}| \Bigr\}, \quad i = 1\colon n.$$

Proof. Let $\lambda$ be an eigenvalue of $A$ and $x$ a corresponding eigenvector, and let $|x_k| = \max_i |x_i|$. From the $k$th equation in $Ax = \lambda x$ we have

$$\sum_{j \ne k} a_{kj} x_j = (\lambda - a_{kk}) x_k.$$

Hence

$$|\lambda - a_{kk}| \le \sum_{j \ne k} |a_{kj}| \frac{|x_j|}{|x_k|},$$

and since $|x_j|/|x_k| \le 1$ for all $j$ it follows that $\lambda$ belongs to the $k$th disc, $D_k$. ∎

The Gershgorin discs are defined in terms of a summation over the rows of $A$, but since the eigenvalues of $A$ are the same as those of $A^T$ the same result holds with summation over the columns.

A consequence of the theorem is that if zero does not belong to any of the Gershgorin discs then $A$ is nonsingular. This is equivalent to the well-known result that a strictly diagonally dominant matrix is nonsingular.

Another consequence of the theorem is that if $A$ is symmetric and all the discs lie in the open right half-plane then the eigenvalues are positive and so $A$ is positive definite. This condition is equivalent to $A$ having positive diagonal elements and being strictly diagonally dominant.
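The theorem is easy to check numerically. The following sketch (the matrix is an illustrative example, not one from the article) computes the disc centers and radii and verifies that every eigenvalue lies in the union of the discs:

```python
import numpy as np

# Example matrix (assumption: chosen only for illustration).
A = np.array([[ 5.0,  1.0,  0.5],
              [ 0.2, -3.0,  0.1],
              [ 0.3,  0.4,  2.0]])
centers = np.diag(A)                               # disc centers a_ii
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)  # off-diagonal row sums
# Every eigenvalue must lie in at least one disc.
ok = all(np.any(np.abs(lam - centers) <= radii)
         for lam in np.linalg.eigvals(A))
assert ok
```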

The Gershgorin discs for the matrix

are shown here:

The eigenvalues—three real and one complex conjugate pair—are the black dots. It happens that each disc contains an eigenvalue, but this is not always the case. For the matrix

the discs are

and the blue disc does not contain an eigenvalue. The next result, which is proved by a continuity argument, provides additional information that increases the utility of Gershgorin’s theorem. In particular it says that if a disc is disjoint from the other discs then it contains an eigenvalue.

Theorem 2. If $k$ of the Gershgorin discs of $A$ are disjoint from the other discs then their union contains $k$ eigenvalues of $A$.

Theorem 2 tells us that the rightmost disc in our example contains one eigenvalue, $\lambda$ say, since it is disjoint from the other discs, and the union of the other four discs contains four eigenvalues. Furthermore, $\lambda$ must be real, because if not it occurs in a complex conjugate pair $\lambda, \bar{\lambda}$, since the matrix is real, and as the disc is symmetric about the real axis $\bar{\lambda}$ would also lie in the disc, contradicting Theorem 2.

Gershgorin’s theorem is most useful for matrices that are close to being diagonal. A technique that can produce improved eigenvalue estimates is to apply the theorem to $D^{-1}AD$ for some nonsingular diagonal matrix $D$. This similarity transformation does not change the eigenvalues but it does change the discs, and the aim is to choose $D$ to reduce the radii of the discs. Consider our example. We know that there is one eigenvalue in the rightmost disc and that it is real. For a suitable choice of $D$ the rightmost disc shrinks while remaining disjoint from the others, and we obtain sharper bounds on this eigenvalue. The discs for such a $D$ are shown here:
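The effect of such a diagonal similarity can be seen in a small sketch (the matrix and the scaling are assumptions chosen for illustration): the eigenvalues are unchanged but the disc around the large diagonal entry shrinks.

```python
import numpy as np

# Example matrix and scaling (assumptions, not from the article).
A = np.array([[10.0, 1.0, 1.0],
              [ 0.5, 2.0, 0.5],
              [ 0.5, 0.5, 1.0]])

def gershgorin_radii(M):
    # Radii are the off-diagonal absolute row sums.
    return np.sum(np.abs(M), axis=1) - np.abs(np.diag(M))

D = np.diag([4.0, 1.0, 1.0])            # enlarge the weight of the first row
B = np.linalg.inv(D) @ A @ D            # similarity: same eigenvalues
radii_before = gershgorin_radii(A)
radii_after = gershgorin_radii(B)
assert np.allclose(np.sort(np.linalg.eigvals(A)),
                   np.sort(np.linalg.eigvals(B)))
assert radii_after[0] < radii_before[0]  # first disc has shrunk (2 -> 0.5)
```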

Most books on matrix analysis or numerical linear algebra include Gershgorin’s theorem.

Eigenvalue inclusion regions have been developed with discs replaced by more complicated shapes, such as Brauer’s ovals of Cassini.

Varga’s 2004 book is devoted to Gershgorin’s theorem and related results. It reproduces Gershgorin’s 1931 paper in an appendix.

- S. Gerschgorin. Über die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk SSSR, 1:749–754, 1931.
- Richard S. Varga. Geršgorin and His Circles. Springer-Verlag, Berlin, Germany, 2004.

- What Is a Diagonally Dominant Matrix? (2021)
- What Is an Eigenvalue? (2022)

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Here are some examples of nilpotent matrices.

Matrix is the instance of the upper bidiagonal matrix

for which

and . The superdiagonal of ones moves up to the right with each increase in the index of the power until it disappears off the top right corner of the matrix.

The second matrix has rank 1 and was constructed using a general formula: if $A = xy^*$ with $y^*x = 0$ then $A^2 = x(y^*x)y^* = 0$. We simply took orthogonal vectors $x$ and $y$.

If $A$ is nilpotent then every eigenvalue of $A$ is zero, since $Ax = \lambda x$ with $A^k = 0$ implies $0 = A^k x = \lambda^k x$, and hence $\lambda = 0$ as $x \ne 0$. Consequently, the trace and determinant of a nilpotent matrix are both zero.
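These properties can be checked for a small Jordan block (a $3\times3$ example, assumed for illustration), which is nilpotent with index 3:

```python
import numpy as np

# A 3x3 Jordan block with eigenvalue zero: superdiagonal of ones.
N = np.diag([1.0, 1.0], k=1)
assert np.any(N @ N)               # N^2 != 0
assert not np.any(N @ N @ N)       # N^3 = 0: index of nilpotency is 3
evals = np.linalg.eigvals(N)
assert np.allclose(evals, 0)       # every eigenvalue is zero
assert np.trace(N) == 0 and np.linalg.det(N) == 0
```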

If $A$ is nilpotent and Hermitian or symmetric, or more generally normal ($A^*A = AA^*$), then $A = 0$, since such a matrix has a spectral decomposition $A = Q\,\mathrm{diag}(\lambda_i)\,Q^*$ and the matrix $\mathrm{diag}(\lambda_i)$ is zero. It is only for nonnormal matrices that nilpotency is a nontrivial property, and the best way to understand it is with the Jordan canonical form (JCF). The JCF of a matrix with only zero eigenvalues has the form $A = XJX^{-1}$, where $J = \mathrm{diag}(J_1, J_2, \dots, J_p)$, where each $J_i$ is a Jordan block of the form (1) with zero eigenvalue and hence is nilpotent. It follows that the index of nilpotency is the dimension of the largest Jordan block.

What is the rank of an $n\times n$ nilpotent matrix $A$? The minimum possible rank is $0$, attained for the zero matrix. The maximum possible rank is $n-1$, attained when the JCF of $A$ has just one Jordan block of size $n$. Any rank between $0$ and $n-1$ is possible: rank $r$ is attained when there is a Jordan block of size $r+1$ and all other blocks are $1\times1$.

Finally, while a nilpotent matrix is obviously not invertible, like every matrix it has a Moore–Penrose pseudoinverse. The pseudoinverse of a Jordan block with eigenvalue zero is just the transpose of the block: $N^+ = N^T$ for $N$ in (1).

- Matrix Rank Relations (2021)
- What Is a Generalized Inverse? (2020)
- What Is the Jordan Canonical Form? (2022)


An $n\times n$ matrix $A$ has $n$ eigenvalues. This can be seen by noting that $Ax = \lambda x$ with $x \ne 0$ is equivalent to $(A - \lambda I)x = 0$, which means that $A - \lambda I$ is singular, since $x \ne 0$. Hence $\det(A - \lambda I) = 0$. But

$$p(\lambda) = \det(\lambda I - A)$$

is a scalar polynomial of degree $n$ (the characteristic polynomial of $A$) with nonzero leading coefficient and so has $n$ roots, which are the eigenvalues of $A$. Since $\det(A^T - \lambda I) = \det(A - \lambda I)$, the eigenvalues of $A^T$ are the same as those of $A$.

A real matrix may have complex eigenvalues, but they appear in complex conjugate pairs. Indeed $Ax = \lambda x$ implies $\bar{A}\bar{x} = \bar{\lambda}\bar{x}$, so if $A$ is real then $\bar{\lambda}$ is an eigenvalue of $A$ with eigenvector $\bar{x}$.

Here are some matrices and their eigenvalues.

Note that two of these matrices are upper triangular, that is, $a_{ij} = 0$ for $i > j$. For such a matrix the eigenvalues are the diagonal elements.

A *symmetric matrix* ($A = A^T$) or *Hermitian matrix* ($A = A^*$, where $A^* = \bar{A}^T$) has real eigenvalues. A proof is as follows: taking the conjugate transpose of $Ax = \lambda x$ gives $x^*A = \bar{\lambda}x^*$, so premultiplying the first equation by $x^*$ and postmultiplying the second by $x$ gives $x^*Ax = \lambda x^*x$ and $x^*Ax = \bar{\lambda}x^*x$, which means that $\lambda x^*x = \bar{\lambda}x^*x$, or $\lambda = \bar{\lambda}$ since $x^*x \ne 0$. The matrix above is symmetric.

A *skew-symmetric matrix* ($A = -A^T$) or *skew-Hermitian complex matrix* ($A = -A^*$) has pure imaginary eigenvalues. A proof is similar to the Hermitian case: $(x^*Ax)^* = x^*A^*x = -x^*Ax$, and so $x^*Ax$ is equal to both $\lambda x^*x$ and $-\bar{\lambda}x^*x$, so $\bar{\lambda} = -\lambda$. The matrix above is skew-symmetric.

In general, the eigenvalues of a matrix can lie anywhere in the complex plane, subject to restrictions based on matrix structure such as symmetry or skew-symmetry, but they are restricted to the disc centered at the origin with radius $\|A\|$, because for any consistent matrix norm it can be shown that every eigenvalue satisfies $|\lambda| \le \|A\|$.
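The bound $|\lambda| \le \|A\|$ is easy to test numerically; a minimal sketch, using a random matrix (an assumption for illustration) and three consistent norms:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
rho = max(abs(np.linalg.eigvals(A)))       # spectral radius
# The 1-, infinity-, and Frobenius norms are all consistent, so each
# bounds the spectral radius.
for norm in (np.linalg.norm(A, 1),
             np.linalg.norm(A, np.inf),
             np.linalg.norm(A, 'fro')):
    assert rho <= norm + 1e-12
```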

Here are some example eigenvalue distributions, computed in MATLAB. (The eigenvalues are computed at high precision using the Advanpix Multiprecision Computing Toolbox in order to ensure that rounding errors do not affect the plots.) The second and third matrices are real, so the eigenvalues are symmetrically distributed about the real axis. (The first matrix is complex.)

Although this article is about eigenvalues we need to say a little more about eigenvectors. An $n\times n$ matrix with distinct eigenvalues has $n$ linearly independent eigenvectors. Indeed it is diagonalizable: $A = XDX^{-1}$ for some nonsingular matrix $X$ with $D = \mathrm{diag}(\lambda_i)$ the matrix of eigenvalues. If we write $X$ in terms of its columns as $X = [x_1, x_2, \dots, x_n]$ then $AX = XD$ is equivalent to $Ax_i = \lambda_i x_i$, $i = 1\colon n$, so the $x_i$ are eigenvectors of $A$. Two of the matrices above have two linearly independent eigenvectors.

If there are repeated eigenvalues there can be fewer than $n$ linearly independent eigenvectors. One of the matrices above has only one eigenvector (up to nonzero scalar multiples): it is a Jordan block. Another of the matrices shows that a matrix with repeated eigenvalues can have $n$ linearly independent eigenvectors.

Here are some questions about eigenvalues.

- What matrix decompositions reveal eigenvalues? The answer is the Jordan canonical form and the Schur decomposition. The Jordan canonical form shows how many linearly independent eigenvectors are associated with each eigenvalue.
- Can we obtain better bounds on where eigenvalues lie in the complex plane? Many results are available, of which the most well-known is Gershgorin’s theorem.
- How can we compute eigenvalues? Various methods are available. The QR algorithm is widely used and is applicable to all types of eigenvalue problems.

Finally, we note that the concept of eigenvalue is more general than just for matrices: it extends to nonlinear operators on finite or infinite dimensional spaces.

Many books include treatments of eigenvalues of matrices. We give just three examples.

- Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013.
- Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.
- Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.

A neat way to express the indefiniteness is that there exist vectors $x$ and $y$ for which $x^TAx > 0 > y^TAy$.

A symmetric indefinite matrix has both positive and negative eigenvalues and in some sense is a typical symmetric matrix. For example, a random symmetric matrix is usually indefinite:

```
>> rng(3); B = rand(4); A = B + B'; eig(A)'
ans =
  -8.9486e-01  -6.8664e-02   1.1795e+00   3.9197e+00
```

In general it is difficult to tell if a symmetric matrix is indefinite or definite, but there is one easy-to-spot sufficient condition for indefiniteness: if the matrix has a zero diagonal element $a_{kk}$ that has a nonzero element in its row then it is indefinite. Indeed if $a_{kk} = 0$ then $e_k^TAe_k = 0$, where $e_k$ is the $k$th unit vector, so $A$ cannot be positive definite or negative definite. The existence of a nonzero element in the row of the zero diagonal element rules out the matrix being positive semidefinite ($x^TAx \ge 0$ for all $x$) or negative semidefinite ($x^TAx \le 0$ for all $x$).
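A NumPy sketch of both observations (the matrices are assumptions chosen for illustration, with the random example analogous to the MATLAB one above):

```python
import numpy as np

# A random symmetric matrix is usually indefinite.
rng = np.random.default_rng(3)
B = rng.random((4, 4))
A = B + B.T
print(np.linalg.eigvalsh(A))       # typically both signs appear

# The sufficient condition: a zero diagonal entry with a nonzero
# element in its row forces indefiniteness.
C = np.array([[0.0, 1.0],
              [1.0, 2.0]])
ec = np.linalg.eigvalsh(C)
assert ec[0] < 0 < ec[-1]          # eigenvalues 1 - sqrt(2), 1 + sqrt(2)
```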

An example of a symmetric indefinite matrix is a saddle point matrix, which has the block form

where the $(1,1)$ block is symmetric positive definite. When that block is the identity matrix, the matrix is the augmented system matrix associated with a least squares problem. Another example is the reverse identity matrix, illustrated by

which has eigenvalues $\pm 1$ (exercise: how many $1$s and how many $-1$s?). A third example is a Toeplitz tridiagonal matrix with zero diagonal:

```
>> A = full(gallery('tridiag',5,1,0,1)), eig(sym(A))'
A =
     0     1     0     0     0
     1     0     1     0     0
     0     1     0     1     0
     0     0     1     0     1
     0     0     0     1     0
ans =
[-1, 0, 1, 3^(1/2), -3^(1/2)]
```

How can we exploit symmetry in solving a linear system $Ax = b$ with a symmetric indefinite matrix $A$? A Cholesky factorization does not exist, but we could try to compute a factorization $A = LDL^T$, where $L$ is unit lower triangular and $D$ is diagonal with both positive and negative diagonal entries. However, this factorization does not always exist and if it does, its computation in floating-point arithmetic can be numerically unstable. The simplest example of nonexistence is the matrix

$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

The way round this is to allow $D$ to have $2\times2$ blocks. We can compute a block $LDL^T$ factorization $PAP^T = LDL^T$, where $P$ is a permutation matrix, $L$ is unit lower triangular, and $D$ is block diagonal with diagonal blocks of size $1\times1$ or $2\times2$. Various pivoting strategies, which determine $P$, are possible, but the recommended one is the symmetric rook pivoting strategy of Ashcraft, Grimes, and Lewis (1998), which has the key property of producing a bounded $L$ factor. Solving $Ax = b$ now reduces to substitutions with $L$ and a solve with $D$, which involves solving $2\times2$ linear systems for the $2\times2$ blocks and doing divisions for the $1\times1$ blocks (scalars).
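SciPy provides an analogous block factorization in `scipy.linalg.ldl`; a minimal sketch using the $2\times2$ matrix with zero diagonal and unit off-diagonal entries, which has no $LDL^T$ factorization with diagonal $D$:

```python
import numpy as np
from scipy.linalg import ldl   # SciPy's analogue of MATLAB's ldl

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
L, D, perm = ldl(A)
# The permutation is folded into L, so L @ D @ L.T reconstructs A.
assert np.allclose(L @ D @ L.T, A)
# D is not diagonal: a 2x2 block was needed.
assert abs(D[0, 1]) > 0
```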

MATLAB implements the block $LDL^T$ factorization in its `ldl` function. Here is an example using Anymatrix:

```
>> A = anymatrix('core/blockhouse',4), [L,D,P] = ldl(A), eigA = eig(A)'
A =
  -4.0000e-01  -8.0000e-01  -2.0000e-01   4.0000e-01
  -8.0000e-01   4.0000e-01  -4.0000e-01  -2.0000e-01
  -2.0000e-01  -4.0000e-01   4.0000e-01  -8.0000e-01
   4.0000e-01  -2.0000e-01  -8.0000e-01  -4.0000e-01
L =
   1.0000e+00            0            0            0
            0   1.0000e+00            0            0
   5.0000e-01  -8.3267e-17   1.0000e+00            0
  -2.2204e-16  -5.0000e-01            0   1.0000e+00
D =
  -4.0000e-01  -8.0000e-01            0            0
  -8.0000e-01   4.0000e-01            0            0
            0            0   5.0000e-01  -1.0000e+00
            0            0  -1.0000e+00  -5.0000e-01
P =
     1     0     0     0
     0     1     0     0
     0     0     1     0
     0     0     0     1
eigA =
  -1.0000e+00  -1.0000e+00   1.0000e+00   1.0000e+00
```

Notice the $2\times2$ blocks on the diagonal of $D$, each of which contains one negative eigenvalue and one positive eigenvalue. The eigenvalues of $D$ are not the same as those of $A$, but since $A$ and $D$ are congruent they have the same number of positive, zero, and negative eigenvalues.

- Cleve Ashcraft, Roger Grimes, and John Lewis, Accurate Symmetric Indefinite Linear Equation Solvers, SIAM J. Matrix Anal. Appl. 20, 513–561, 1998.
- Nicholas J. Higham and Mantas Mikaitis, Anymatrix: An Extendable MATLAB Matrix Collection, Numer. Algorithms, 90:3, 1175–1196, 2021.

- What Is a Modified Cholesky Factorization? (2020)
- What Is a Symmetric Positive Definite Matrix? (2020)

Toeplitz matrices arise in various problems, including analysis of time series, discretization of constant coefficient differential equations, and discretization of convolution equations.

Since an $n\times n$ Toeplitz matrix depends on just $2n-1$ parameters it is reasonable to expect that a linear system $Tx = b$ can be solved in less than the $O(n^3)$ flops that would be required by LU factorization. Indeed methods are available that require only $O(n^2)$ flops; see Golub and Van Loan (2013) for details.

Upper triangular Toeplitz matrices can be written in the form

where $N$ is upper bidiagonal with a superdiagonal of ones and $N^n = 0$; that is, $T$ is a polynomial $\sum_{k=0}^{n-1} t_k N^k$ in $N$. It follows that the product of two upper triangular Toeplitz matrices is again upper triangular Toeplitz, upper triangular Toeplitz matrices commute, and $T^{-1}$ is also an upper triangular Toeplitz matrix (assuming the diagonal element $t_0$ is nonzero, so that $T$ is nonsingular).
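This polynomial structure is easy to check numerically; a sketch (coefficients chosen arbitrarily for illustration) builds two upper triangular Toeplitz matrices as polynomials in the shift matrix $N$ and verifies that they commute and that their product is again upper triangular Toeplitz:

```python
import numpy as np

n = 4
N = np.eye(n, k=1)   # superdiagonal of ones

def ut_toeplitz(coeffs):
    # sum_k coeffs[k] * N^k: an upper triangular Toeplitz matrix.
    return sum(c * np.linalg.matrix_power(N, k)
               for k, c in enumerate(coeffs))

T1 = ut_toeplitz([1.0, 2.0, 3.0, 4.0])
T2 = ut_toeplitz([5.0, -1.0, 0.5, 2.0])
assert np.allclose(T1 @ T2, T2 @ T1)                 # they commute
P = T1 @ T2
# The product is constant along each diagonal.
assert all(np.allclose(np.diag(P, k), np.diag(P, k)[0]) for k in range(n))
```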

Tridiagonal Toeplitz matrices arise frequently:

The eigenvalues of the $n\times n$ tridiagonal Toeplitz matrix $\mathrm{tridiag}(c,a,b)$ are $a + 2\sqrt{bc}\,\cos\bigl(k\pi/(n+1)\bigr)$, $k = 1\colon n$.
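The classical eigenvalue formula for a tridiagonal Toeplitz matrix $\mathrm{tridiag}(c,a,b)$, namely $a + 2\sqrt{bc}\cos(k\pi/(n+1))$ for $k = 1\colon n$, can be checked numerically (the values of $n$, $a$, $b$, $c$ below are assumptions chosen for illustration, with $b = c$ so the matrix is symmetric):

```python
import numpy as np

n, a, b, c = 5, 2.0, 3.0, 3.0
T = a * np.eye(n) + b * np.eye(n, k=1) + c * np.eye(n, k=-1)
k = np.arange(1, n + 1)
formula = a + 2 * np.sqrt(b * c) * np.cos(k * np.pi / (n + 1))
assert np.allclose(np.sort(np.linalg.eigvalsh(T)), np.sort(formula))
```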

The Kac–Murdock–Szegö matrix is the symmetric Toeplitz matrix

It has a number of interesting properties.

In MATLAB, a Toeplitz matrix can be constructed using `toeplitz(c,r)`, which produces the matrix with first column `c` and first row `r`. Example:

```
>> n = 5; A = toeplitz(1:n,[1 -2:-1:-n])
A =
     1    -2    -3    -4    -5
     2     1    -2    -3    -4
     3     2     1    -2    -3
     4     3     2     1    -2
     5     4     3     2     1
```

- Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013. Section 4.7.

- What Is a Circulant Matrix? (2022)
- What Is a Tridiagonal Matrix? (2022)
- What Is the Kac–Murdock–Szegö Matrix? (2020)

In linear algebra courses we learn that the solution to a linear system $Ax = b$ of $n$ equations in $n$ unknowns can be written $x = A^{-1}b$, where $A^{-1}$ is the matrix inverse. What is not always emphasized is that there are very few circumstances in which one should compute $A^{-1}$. Indeed one would not solve the scalar ($n = 1$) system $ax = b$ by computing $a^{-1}$ and then forming $a^{-1}b$, but rather would carry out a division $x = b/a$. In the $n\times n$ case, it is faster and more accurate to solve a linear system by LU factorization (Gaussian elimination) with partial pivoting than by inverting $A$ (which has, in any case, to be done by LU factorization).

Rare cases where is required are in statistics, where the diagonal elements of the inverse of the covariance matrix are relevant quantities, and in certain algorithms for computing matrix functions.

The solution to the linear least squares problem $\min_x \|b - Ax\|_2$, where $A$ is a full-rank $m\times n$ matrix with $m > n$, satisfies the normal equations $A^TAx = A^Tb$. It is therefore natural to form the symmetric positive definite matrix $A^TA$ and solve the normal equations by Cholesky factorization. While fast, this method is numerically unstable when $A$ is ill conditioned. By contrast, solving the least squares problem via QR factorization is always numerically stable.

What is wrong with the cross-product matrix $A^TA$ (also known as the Gram matrix)? It squares the data, which can cause a loss of information in floating-point arithmetic. For example, if

$$A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix},$$

where $0 < \epsilon \le \sqrt{u}$ and $u$ is the unit roundoff of the floating point arithmetic, then

$$A^TA = \begin{bmatrix} 1+\epsilon^2 & 1 \\ 1 & 1+\epsilon^2 \end{bmatrix}$$

is positive definite but, since $\epsilon^2 \le u$, in floating-point arithmetic $1 + \epsilon^2$ rounds to $1$ and so

$$fl(A^TA) = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$$

which is singular, and the information in $\epsilon$ has been lost.
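This loss of information is visible in double precision; a sketch with $\epsilon = 2^{-27}$ (an assumed value, chosen so that $\epsilon^2$ is below the unit roundoff):

```python
import numpy as np

eps = 2.0 ** -27                    # eps^2 = 2^-54 < u = 2^-53
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])
G = A.T @ A                         # fl(1 + eps^2) rounds to 1
assert np.linalg.matrix_rank(A) == 2   # A has full rank...
assert np.linalg.matrix_rank(G) == 1   # ...but the computed A^T A is singular
```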

Another problem with the cross-product matrix is that the 2-norm condition number of $A^TA$ is the square of that of $A$, and this leads to numerical instability in algorithms that work with $A^TA$ when the condition number is large.

The cost of evaluating a matrix product depends on the order in which the product is evaluated (assuming the matrices are not all $n\times n$). More precisely, matrix multiplication is associative, so $A(BC) = (AB)C$, and in general the cost of the evaluation of a product depends on where one puts the parentheses. One order may be much superior to others, so one should not simply evaluate the product in a fixed left-right or right-left order. For example, if $x$, $y$, and $z$ are $n$-vectors then $xy^Tz$ can be evaluated as

- $(xy^T)z$: a vector outer product followed by a matrix–vector product, costing $O(n^2)$ operations, or
- $x(y^Tz)$: a vector scalar product followed by a vector scaling, costing just $O(n)$ operations.

In general, finding where to put the parentheses in a matrix product in order to minimize the operation count is a difficult problem, but for many cases that arise in practice it is easy to determine a good order.
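The two evaluation orders above give the same result at very different cost; a sketch (vector length assumed for illustration):

```python
import numpy as np

n = 500
rng = np.random.default_rng(1)
x, y, z = rng.standard_normal((3, n))
outer_first = np.outer(x, y) @ z      # O(n^2): forms an n-by-n matrix
inner_first = x * (y @ z)             # O(n): scalar product, then a scaling
assert np.allclose(outer_first, inner_first)
```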

Symmetric positive definite matrices (symmetric matrices with positive eigenvalues) are ubiquitous, not least because they arise in the solution of many minimization problems. However, a matrix that is supposed to be positive definite may fail to be so for a variety of reasons. Missing or inconsistent data in forming a covariance matrix or a correlation matrix can cause a loss of definiteness, and rounding errors can cause a tiny positive eigenvalue to go negative.

Definiteness implies that

- the diagonal entries are positive,
- $\det(A) > 0$,
- $a_{ij}^2 < a_{ii}a_{jj}$ for all $i \ne j$,

but none of these conditions, or even all taken together, guarantees that the matrix has positive eigenvalues.

The best way to check definiteness is to compute a Cholesky factorization, which is often needed anyway. The MATLAB function `chol` returns an error message if the factorization fails, and a second output argument can be requested, which is set to the number of the stage on which the factorization failed, or to zero if the factorization succeeded. In the case of failure, the partially computed factor is returned in the first argument, and it can be used to compute a direction of negative curvature (as needed in optimization), for example.
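The same Cholesky-based test is available in NumPy, where failure is signalled by an exception rather than an extra output argument; a minimal sketch (the helper name and test matrices are assumptions):

```python
import numpy as np

def is_positive_definite(A):
    # np.linalg.cholesky raises LinAlgError if A is not positive definite.
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

assert is_positive_definite(np.array([[2.0, 1.0], [1.0, 2.0]]))
assert not is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]]))  # eigs 3, -1
```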

This sin takes the top spot in Schmelzer and Hauser’s Seven Sins in Portfolio Optimization, because in portfolio optimization a negative eigenvalue in the covariance matrix can identify a portfolio with negative variance, promising an arbitrarily large investment with no risk!

One of the fundamental tenets of numerical linear algebra is that one should try to exploit any matrix structure that might be present. Sparsity (a matrix having a large number of zeros) is particularly important to exploit, since algorithms intended for dense matrices may be impractical for sparse matrices because of extensive fill-in (zeros becoming nonzero). Here are two examples of structures that can be exploited.

Matrices from saddle point problems are symmetric indefinite and of the form

with symmetric positive definite. Much work has been done on developing numerical methods for solving that exploit the block structure and possible sparsity in and . A second example is a circulant matrix

Circulant matrices have the important property that they are diagonalized by a unitary matrix called the discrete Fourier transform matrix. Using this property one can solve a circulant linear system in $O(n\log n)$ operations, rather than the $O(n^3)$ operations required if the circulant structure is ignored.

Ideally, linear algebra software would detect structure in a matrix and call an algorithm that exploits that structure. A notable example of such a meta-algorithm is the MATLAB backslash function `x = A\b` for solving $Ax = b$. Backslash checks whether the matrix is triangular (or a permutation of a triangular matrix), upper Hessenberg, symmetric, or symmetric positive definite, and applies an appropriate method. It also allows $A$ to be rectangular and solves the least squares problem if there are more rows than columns and the underdetermined system if there are more columns than rows.

An $n\times n$ matrix $A$ is nonsingular if and only if its determinant is nonzero. One might therefore expect that a small value for $\det(A)$ indicates a matrix that is nearly singular. However, the size of $\det(A)$ tells us nothing about near singularity. Indeed, since $\det(\alpha A) = \alpha^n \det(A)$ we can achieve any value for the determinant by multiplying by a scalar $\alpha$, yet $\alpha A$ is no more or less nearly singular than $A$ for $\alpha \ne 0$.
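The scaling argument is easy to verify: multiplying a matrix by $\alpha$ multiplies the determinant by $\alpha^n$ but leaves the condition number unchanged. A sketch (the matrix and $\alpha$ are assumptions for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
alpha = 1e-3
# det scales by alpha^n (n = 2 here)...
assert np.isclose(np.linalg.det(alpha * A), alpha**2 * np.linalg.det(A))
# ...but the 2-norm condition number is invariant under scalar scaling.
assert np.isclose(np.linalg.cond(alpha * A), np.linalg.cond(A))
```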

Another limitation of the determinant is shown by the two matrices

Both matrices have unit diagonal and off-diagonal elements bounded in modulus by . So , yet

So is ill conditioned for large . In fact, if we change the element of to then the matrix becomes singular! By contrast, is always very well conditioned. The determinant cannot distinguish between the ill-conditioned and the well-conditioned .

For any matrix $A$ and any consistent matrix norm it is true that $\|A\| \ge |\lambda_i|$ for all $i$, where the $\lambda_i$ are the eigenvalues of $A$. Since the eigenvalues of $A^{-1}$ are $\lambda_i^{-1}$, it follows that the matrix condition number $\kappa(A) = \|A\|\,\|A^{-1}\|$ is bounded below by the ratio of largest to smallest eigenvalue in absolute value, that is,

$$\kappa(A) \ge \frac{\max_i |\lambda_i|}{\min_i |\lambda_i|}.$$

But as the matrix in (1) shows, this bound can be very weak.

It is singular values *not* eigenvalues that characterize the condition number for the 2-norm. Specifically,

$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n},$$

where $A = U\Sigma V^T$ is a singular value decomposition (SVD), with $U$ and $V$ orthogonal and $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_n)$, $\sigma_1 \ge \cdots \ge \sigma_n > 0$. If $A$ is symmetric, for example, then the sets $\{\sigma_i\}$ and $\{|\lambda_i|\}$ are the same, but in general the eigenvalues and singular values can be very different.

I first saw Cleve demonstrate the original Fortran version of MATLAB on an IBM PC at the Gatlinburg meeting at the University of Waterloo in 1984. The commercial version of MATLAB was released soon after, and it has been my main programming environment ever since.

MATLAB succeeded for a number of reasons, some of which Dennis Sherwood and I describe in one of the creativity stories in our recent book How to Be Creative: A Practical Guide for the Mathematical Sciences. But there is one reason that is rarely mentioned.

From the beginning, MATLAB supported complex arithmetic—indeed, the basic data type has always been a complex matrix. The original 1980 *MATLAB Users’ Guide* says

MATLAB works with essentially only one kind of object, a rectangular matrix with complex elements. If the imaginary parts of the elements are all zero, they are not printed, but they still occupy storage.

By contrast, early competing packages usually supported only real arithmetic (see my 1989 SIAM News article Matrix Computations on a PC for a comparison of PC-MATLAB and GAUSS). Cleve understood the fundamental need to compute in the complex plane in real life problems, as opposed to textbook examples, and he appreciated how tedious it is to program with real and imaginary parts stored in separate arrays. The storing of zero imaginary parts of real numbers was a small price to pay for the convenience. Of course, the commercial version of MATLAB was optimized not to store the imaginary part of reals. Control engineers—a group who were early adopters of MATLAB—appreciated the MATLAB approach, because the stability of control systems depends on eigenvalues, which are in general complex.

Another wise choice was that MATLAB allows the imaginary unit to be written as `i` or `j`, thus keeping mathematicians and electrical engineers happy!

Here is Cleve demonstrating MATLAB in October 2000:


Circulant matrices have the important property that they are diagonalized by the discrete Fourier transform matrix

which satisfies $F_n^*F_n = nI$, so that $n^{-1/2}F_n$ is a unitary matrix. ($F_n$ is a Vandermonde matrix whose points are the roots of unity.) Specifically,

Hence circulant matrices are normal ($C^*C = CC^*$). Moreover, the eigenvalues are given by $\lambda = F_nCe_1$, where $e_1$ is the first unit vector.

Note that one particular eigenpair is immediate, since $Ce = (c_1 + c_2 + \cdots + c_n)e$, where $e$ is the vector of ones.

The factorization (1) enables a circulant linear system $Cx = b$ to be solved in $O(n\log n)$ flops, since multiplication by $F_n$ can be done in $O(n\log n)$ flops using the fast Fourier transform.
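The FFT-based solve can be sketched in a few lines of NumPy (the matrix size and right-hand side are assumptions for illustration), using the fact that the eigenvalues of a circulant matrix are the FFT of its first column:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
c = rng.standard_normal(n)                            # first column of C
# Build the circulant matrix explicitly for checking purposes.
C = np.column_stack([np.roll(c, k) for k in range(n)])
b = rng.standard_normal(n)
# Solve C x = b in O(n log n): divide in the Fourier domain.
x = np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)).real
assert np.allclose(C @ x, b)
```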

A particular circulant matrix is the (up) shift matrix , the version of which is

It is not hard to see that

Since any circulant matrix is a linear combination of powers of $S$, and powers of $S$ commute, it follows that any two circulant matrices commute (this is also clear from (1)). Furthermore, the sum and product of two circulant matrices is a circulant matrix and the inverse of a nonsingular circulant matrix is a circulant matrix.

One important use of circulant matrices is to act as preconditioners for Toeplitz linear systems. Several methods have been proposed for constructing a circulant matrix from a Toeplitz matrix, including copying the central diagonals and wrapping them around, and finding the nearest circulant matrix to the Toeplitz matrix. See Chan and Ng (1996) or Chan and Jin (2007) for a summary of work on circulant preconditioners for Toeplitz systems.

An interesting circulant matrix is `anymatrix('core/circul_binom',n)` in the Anymatrix collection, which is the $n\times n$ circulant matrix whose first row has $j$th element $\binom{n}{j-1}$. The eigenvalues are $(1 + \omega^j)^n - 1$, $j = 1\colon n$, where $\omega = e^{2\pi i/n}$. The matrix is singular when $n$ is a multiple of 6, in which case the null space has dimension 2. Example:

```
>> n = 6; C = anymatrix('core/circul_binom',n), svd(C)
C =
     1     6    15    20    15     6
     6     1     6    15    20    15
    15     6     1     6    15    20
    20    15     6     1     6    15
    15    20    15     6     1     6
     6    15    20    15     6     1
ans =
   6.3000e+01
   2.8000e+01
   2.8000e+01
   1.0000e+00
   2.0244e-15
   7.6607e-16
```

A classic reference on circulant matrices is Davis (1994).

- Raymond Chan and Michael Ng, Conjugate Gradient Methods for Toeplitz Systems, SIAM Rev. 38(3), 427–482, 1996.
- Raymond Chan and Xiao-Qing Jin, An Introduction to Iterative Toeplitz Solvers, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2007.
- Philip Davis, Circulant Matrices, Second edition, Chelsea, New York, 1994.
- Nicholas J. Higham and Mantas Mikaitis, Anymatrix: An Extendable MATLAB Matrix Collection, Numer. Algorithms, 90:3, 1175-1196, 2021.

Most of the talks are available on the NLA Group YouTube channel and links to them are available on the conference web page.

Here is the conference photo.

Here is me with some of my current and former PhD students.

And with some of my current and former postdocs.

After-dinner speaker Charlie Van Loan:

Rob Corless kindly gave me a Bohemian matrix eigenvalue tie based on this image.

Many thanks to Stefan Güttel, Sven Hammarling, Stephanie Lai, Françoise Tisseur and Marcus Webb for organizing the conference and the Royal Society, MathWorks and the University of Manchester for financial support.

In 1969 Volker Strassen showed that the product $C = AB$ of two $2\times2$ matrices can be computed from the formulas

$$\begin{aligned}
p_1 &= (a_{11}+a_{22})(b_{11}+b_{22}), & p_2 &= (a_{21}+a_{22})b_{11},\\
p_3 &= a_{11}(b_{12}-b_{22}), & p_4 &= a_{22}(b_{21}-b_{11}),\\
p_5 &= (a_{11}+a_{12})b_{22}, & p_6 &= (a_{21}-a_{11})(b_{11}+b_{12}),\\
p_7 &= (a_{12}-a_{22})(b_{21}+b_{22}),
\end{aligned}$$

$$C = \begin{bmatrix} p_1+p_4-p_5+p_7 & p_3+p_5\\ p_2+p_4 & p_1-p_2+p_3+p_6 \end{bmatrix}.$$

The evaluation requires $7$ multiplications and $18$ additions instead of $8$ multiplications and $4$ additions for the usual formulas.

At first sight, Strassen’s formulas may appear simply to be a curiosity. However, the formulas do not rely on commutativity so are valid when the $a_{ij}$ and $b_{ij}$ are matrices, in which case for large dimensions the saving of one multiplication greatly outweighs the extra additions. Assuming $n$ is a power of $2$, we can partition $A$ and $B$ into four blocks of size $n/2$, apply Strassen’s formulas for the multiplication, and then apply the same formulas recursively on the half-sized matrix products.

Let us examine the number of multiplications for the recursive Strassen algorithm. Denote by $M(n)$ the number of scalar multiplications required to multiply two $n\times n$ matrices. We have $M(n) = 7M(n/2)$ and $M(1) = 1$, so

$$M(n) = 7M(n/2) = 7^2M(n/4) = \cdots = 7^{\log_2 n}M(1) = 7^{\log_2 n}.$$

But $7^{\log_2 n} = n^{\log_2 7} \approx n^{2.807}$. The number of additions can be shown to be of the same order of magnitude, so the algorithm requires $O(n^{\log_2 7})$ operations.
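The recursion described above can be sketched directly (a minimal implementation for power-of-2 dimensions, with an assumed crossover size `n0` below which conventional multiplication is used):

```python
import numpy as np

def strassen(A, B, n0=8):
    """Recursive Strassen multiply for n-by-n matrices with n a power of 2."""
    n = A.shape[0]
    if n <= n0:
        return A @ B                      # conventional multiply at the base
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    p1 = strassen(A11 + A22, B11 + B22, n0)
    p2 = strassen(A21 + A22, B11, n0)
    p3 = strassen(A11, B12 - B22, n0)
    p4 = strassen(A22, B21 - B11, n0)
    p5 = strassen(A11 + A12, B22, n0)
    p6 = strassen(A21 - A11, B11 + B12, n0)
    p7 = strassen(A12 - A22, B21 + B22, n0)
    C = np.empty_like(A)
    C[:m, :m] = p1 + p4 - p5 + p7
    C[:m, m:] = p3 + p5
    C[m:, :m] = p2 + p4
    C[m:, m:] = p1 - p2 + p3 + p6
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((32, 32)), rng.standard_normal((32, 32))
assert np.allclose(strassen(A, B), A @ B)
```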

Strassen’s work sparked interest in finding matrix multiplication algorithms of even lower complexity. Since there are $O(n^2)$ elements of data, which must each participate in at least one operation, the exponent of $n$ in the operation count must be at least $2$.

The current record upper bound on the exponent is $2.37286$, proved by Alman and Vassilevska Williams (2021), which improved on the previous record of $2.37287$, proved by Le Gall (2014). The following figure plots the best upper bound for the exponent for matrix multiplication over time.

In the methods that achieve exponents lower than 2.775, various intricate techniques are used, based on representing matrix multiplication in terms of bilinear or trilinear forms and their representation as tensors having low rank. Laderman, Pan, and Sha (1992) explain that for these methods “very large overhead constants are hidden in the $O$ notation”, and that the methods “improve on Strassen’s (and even the classical) algorithm only for immense numbers $n$.”

Strassen’s method, when carefully implemented, can be faster than conventional matrix multiplication for reasonable dimensions. In practice, one does not recur down to $1\times1$ matrices, but rather uses conventional multiplication once matrices of size $n_0$ or smaller are reached, where the parameter $n_0$ is tuned for the best performance.

Strassen’s method has the drawback that it satisfies a weaker form of rounding error bound than conventional multiplication. For conventional multiplication of $n\times n$ matrices $A$ and $B$ we have the componentwise bound

$$|C - \widehat{C}| \le \gamma_n |A|\,|B|, \qquad (1)$$

where $\gamma_n = nu/(1-nu)$ and $u$ is the unit roundoff. For Strassen’s method we have only a normwise error bound. The following result uses the norm $\|A\| = \max_{i,j}|a_{ij}|$, which is not a consistent norm.

Theorem 1 (Brent). Let $A, B \in \mathbb{R}^{n\times n}$, where $n = 2^k$. Suppose that $C = AB$ is computed by Strassen’s method and that $n_0 = 2^r \le n$ is the threshold at which conventional multiplication is used. The computed product $\widehat{C}$ satisfies

$$\|C - \widehat{C}\| \le \left[ \left(\frac{n}{n_0}\right)^{\log_2 12}(n_0^2 + 5n_0) - 5n \right] u\,\|A\|\,\|B\| + O(u^2). \qquad (2)$$

With full recursion ($n_0 = 1$) the constant in (2) is $6n^{\log_2 12} - 5n$, whereas with just one level of recursion ($n_0 = n/2$) it is $3n^2 + 25n$. These compare with a constant of order $n^2$ for conventional multiplication (obtained by taking norms in (1)). So the constant for Strassen’s method grows at a faster rate than that for conventional multiplication no matter what the choice of $n_0$.

The fact that Strassen’s method does not satisfy a componentwise error bound is a significant weakness of the method. Indeed Strassen’s method cannot even accurately multiply by the identity matrix. The product

$$\begin{bmatrix} 1 & \epsilon \\ \epsilon & \epsilon^2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad 0 < \epsilon \ll 1,$$

is evaluated exactly in floating-point arithmetic by conventional multiplication, but Strassen’s method computes the $(2,2)$ element $\epsilon^2$ from an expression mixing all the elements of the first matrix. Because the computation of this element involves subterms of order unity, the error will be of order $u$. Thus the relative error in the $(2,2)$ element is of order $u/\epsilon^2 \gg u$.

Another weakness of Strassen’s method is that while the scaling $AB \to (AD)(D^{-1}B)$, where $D$ is diagonal, leaves (1) unchanged, it can alter (2) by an arbitrary amount. Dumitrescu (1998) suggests computing $D_1^{-1}\bigl((D_1A)(BD_2)\bigr)D_2^{-1}$, where the diagonal matrices $D_1$ and $D_2$ are chosen to equilibrate the rows of $A$ and the columns of $B$ in the $\infty$-norm; he shows that this scaling can improve the accuracy of the result. Further investigations along these lines are made by Ballard et al. (2016).

Should one use Strassen’s method in practice, assuming that an implementation is available that is faster than conventional multiplication? Not if one needs a componentwise error bound, which ensures accurate products of matrices with nonnegative entries and ensures that the column scaling of $A$ and row scaling of $B$ have no effect on the error. But if a normwise error bound with a faster growing constant than for conventional multiplication is acceptable then the method is worth considering.

For recent work on high-performance implementation of Strassen’s method see Huang et al. (2016, 2020).

Theorem 1 is from an unpublished technical report of Brent (1970). A proof can be found in Higham (2002, §23.2.2).

For more on fast matrix multiplication see Bini (2014) and Higham (2002, Chapter 23).

This is a minimal set of references, which contain further useful references within.

- Josh Alman and Virginia Vassilevska Williams. A refined laser method and faster matrix multiplication. In *Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA)*, Society for Industrial and Applied Mathematics, January 2021, pages 522–539.
- Grey Ballard, Austin R. Benson, Alex Druinsky, Benjamin Lipshitz, and Oded Schwartz. Improving the numerical stability of fast matrix multiplication. *SIAM J. Matrix Anal. Appl.*, 37(4):1382–1418, 2016.
- Dario A. Bini. Fast matrix multiplication. In *Handbook of Linear Algebra*, Leslie Hogben, editor, second edition, Chapman and Hall/CRC, Boca Raton, FL, USA, 2014, pages 61.1–61.17.
- Bogdan Dumitrescu. Improving and estimating the accuracy of Strassen’s algorithm. *Numer. Math.*, 79:485–499, 1998.
- Nicholas J. Higham. Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002.
- Jianyu Huang, Tyler M. Smith, Greg M. Henry, and Robert A. van de Geijn. Strassen’s algorithm reloaded. In *SC16: International Conference for High Performance Computing, Networking, Storage and Analysis*, IEEE, November 2016.
- Jianyu Huang, Chenhan D. Yu, and Robert A. van de Geijn. Strassen’s algorithm reloaded on GPUs. *ACM Trans. Math. Software*, 46(1):1:1–1:22, 2020.
- Julian Laderman, Victor Pan, and Xuan-He Sha. On practical algorithms for accelerated matrix multiplication. *Linear Algebra Appl.*, 162–164:557–588, 1992.
- François Le Gall. Powers of tensors and fast matrix multiplication. In *Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation*, 2014, pages 296–303.