Let be orthogonal and suppose that is even and is partitioned into four equally sized blocks:

Then there exist orthogonal matrices such that

where and with , , and for all . This CS decomposition comprises four SVDs:

(Strictly speaking, for we need to move the minus sign from to or to obtain an SVD.) The orthogonality ensures that there are only four different singular vector matrices instead of eight, and it makes the singular values of the blocks closely linked. We also obtain SVDs of four cross products of the blocks: , etc.

Note that for , the CS decomposition reduces to the fact that any orthogonal matrix is of the form (a rotation ) up to multiplication of a row or column by .

A consequence of the decomposition is that and have the same 2-norms and Frobenius norms, as do their inverses if they are nonsingular. The same is true for and .
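As a quick numerical check of this norm property (a sketch in Python with NumPy, not part of the original post; the random orthogonal matrix is generated via QR of a Gaussian matrix purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                   # half-size; Q is 2n-by-2n
Q, _ = np.linalg.qr(rng.standard_normal((2 * n, 2 * n)))  # random orthogonal

A11, A22 = Q[:n, :n], Q[n:, n:]         # diagonal blocks
A12, A21 = Q[:n, n:], Q[n:, :n]         # off-diagonal blocks

# The CS decomposition implies the diagonal blocks share singular values,
# and likewise the off-diagonal blocks, so the norms agree.
print(np.linalg.norm(A11, 2) - np.linalg.norm(A22, 2))            # ~0
print(np.linalg.norm(A12, 'fro') - np.linalg.norm(A21, 'fro'))    # ~0
```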

Now we drop the requirement that is even and consider diagonal blocks of different sizes:

The CS decomposition now has the form

with , , , and , and and (both now ), having the same properties as before. The new feature for is the identity matrix in the bottom right-hand corner on the right-hand side. Here is an example with and , with elements shown to two decimal places:

We mention two interesting consequences of the CS decomposition.

- With : if then is singular.
- For unequally sized diagonal blocks it is no longer always true that and have the same norms, but their inverses do: . When , this relation becomes .

The CS decomposition also exists for a rectangular matrix with orthonormal columns,

Now the decomposition takes the form

where , , and are orthogonal and and have the same form as before except that they are rectangular.

The most general form of the CS decomposition is for an orthogonal matrix with diagonal blocks that are not square. Now the matrix on the right-hand side has a more complicated block structure (see the references for details).

The CS decomposition arises in measuring angles and distances between subspaces. These are defined in terms of the orthogonal projectors onto the subspaces, so singular values of orthonormal matrices naturally arise.

Software for computing the CS decomposition is available in LAPACK, based on an algorithm of Sutton (2009). We used a MATLAB interface to it, available on MathWorks File Exchange, for the numerical example. Note that the output of this code is not quite in the form in which we have presented the decomposition, so some post-processing is required to achieve it.
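Outside MATLAB, SciPy also exposes the LAPACK routines through `scipy.linalg.cossin`. Here is a minimal sketch (unequal block sizes chosen arbitrarily for illustration):

```python
import numpy as np
from scipy.linalg import cossin

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # random 5x5 orthogonal matrix

# CS decomposition with the (1,1) block of size 2-by-3.
U, CS, Vh = cossin(X, p=2, q=3)

print(np.allclose(U @ CS @ Vh, X))
```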

This is a minimal set of references, which contain further useful references within.

- Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013.
- C. C. Paige and M. Wei, History and Generality of the CS Decomposition, Linear Algebra Appl. 208/209, 303–326, 1994.
- Brian Sutton, Computing the Complete CS Decomposition, Numer. Algorithms 50(1), 33–65, 2009.

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

The `exportgraphics` function is very useful for saving to a file a tightly cropped version of a figure with the border white instead of gray. Simple usages are

```matlab
exportgraphics(gca,'image.pdf')
exportgraphics(gca,'image.jpg','Resolution',200)
```

I have previously used the `export_fig` function, which is not built into MATLAB but is available from File Exchange; I think I will be using `exportgraphics` instead from now on.

The new `svdsketch` function computes the singular value decomposition (SVD) of a low rank approximation to a matrix ( and orthogonal, diagonal with nonnegative diagonal entries). It is mainly intended for use with matrices that are close to having low rank, as is the case in various applications.

This function uses a randomized algorithm that computes a sketch of the given -by- matrix , which is essentially a product , where is an orthonormal basis for the product , where is a random -by- matrix. The value of is chosen automatically to achieve , where is a tolerance that defaults to and must not be less than , where is the machine epsilon ( for double precision). The algorithm includes a power method iteration that refines the sketch before computing the SVD.

The output of the function is an SVD in which and are numerically orthogonal and the singular values in of size or larger are good approximations to singular values of , but smaller singular values in may not be good approximations to singular values of .
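The range-finder idea behind such sketching algorithms can be illustrated in a few lines of Python with NumPy. This is a textbook-style sketch (following the Halko–Martinsson–Tropp framework), not MathWorks' `svdsketch` algorithm itself; the rank `k`, oversampling, and power-iteration counts are illustrative choices:

```python
import numpy as np

def sketch_svd(A, k, oversample=5, power_iters=1, seed=None):
    """Randomized SVD of a near-low-rank A via a sketch of its range."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    G = rng.standard_normal((n, k + oversample))   # random test matrix
    Y = A @ G                                      # sketch of range(A)
    for _ in range(power_iters):                   # power iteration refines the sketch
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                         # orthonormal basis for the sketch
    Ub, s, Vh = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vh[:k]

rng = np.random.default_rng(2)
B = rng.standard_normal((100, 8)) @ rng.standard_normal((8, 60))  # rank-8 matrix
U, s, Vh = sketch_svd(B, k=8, seed=3)
print(np.linalg.norm(B - U @ np.diag(s) @ Vh) / np.linalg.norm(B))  # tiny
```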

Here is an example. The code

```matlab
n = 8; rng(1);
A = gallery('randsvd',n,1e8,3);
[U,S,V] = svdsketch(A,1e-3);
rel_res = norm(A-U*S*V')/norm(A)
singular_values = [svd(A) [diag(S); zeros(n-length(S),1)]]
```

produces the following output, with the exact singular values in the first column and the approximate ones in the second column:

```
rel_res =
   1.9308e-06
singular_values =
   1.0000e+00   1.0000e+00
   7.1969e-02   7.1969e-02
   5.1795e-03   5.1795e-03
   3.7276e-04   3.7276e-04
   2.6827e-05   2.6827e-05
   1.9307e-06            0
   1.3895e-07            0
   1.0000e-08            0
```

The approximate singular values are correct down to around , which is more than the requested. This is a difficult matrix for `svdsketch` because there is no clear gap in the singular values of .

The padding property of an axis puts some padding between the axis limits and the surrounding box. The code

```matlab
x = linspace(0,2*pi,50);
plot(x,tan(x),'linewidth',1.4)
title('Original axis')
axis padded, title('Padded axis')
```

produces the output

The default colormap changed from jet (the rainbow color map) to parula in R2014b (with a tweak in R2017a), because parula is more perceptually uniform and maintains information when printed in monochrome. The new turbo colormap is a more perceptually uniform version of jet, as these examples show. Notice that turbo has a longer transition through the greens and yellows. If you can’t give up on jet, use turbo instead.

Turbo:

Jet:

Parula:

The new `pagemtimes` function performs matrix multiplication on pages of -dimensional arrays, while `pagetranspose` and `pagectranspose` carry out the transpose and conjugate transpose, respectively, on pages of -dimensional arrays.

Both releases report significantly improved speed of certain functions, including some of the ODE solvers.

where and are orthogonal, , where , and .

Partition and . The are called the *singular values* of and the and are the left and right *singular vectors*. We have , . The matrix is unique but and are not. The form of is

Here is an example, in which the entries of have been specially chosen to give simple forms for the elements of the factors:

The power of the SVD is that it reveals a great deal of useful information about norms, rank, and subspaces of a matrix and it enables many problems to be reduced to a trivial form.

Since and are nonsingular, , where is the number of nonzero singular values. Since the -norm and Frobenius norm are invariant under orthogonal transformations, for both norms, giving

and hence . The range space and null space of are given in terms of the columns of and by

We can write the SVD as

which expresses as a sum of rank- matrices, the th of which has -norm . The famous Eckart–Young theorem (1936) says that

and that the minimum is attained at

In other words, truncating the sum after terms gives the best rank- approximation to in both the -norm and the Frobenius norm. In particular, this result implies that when has full rank the distance from to the nearest rank-deficient matrix is .
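The Eckart–Young result is easy to verify numerically. In this Python/NumPy sketch (an illustration, with an arbitrary random matrix and truncation level), the 2-norm error of the truncated SVD equals the first neglected singular value and the Frobenius norm error is the root sum of squares of the neglected ones:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 3
Ak = U[:, :k] @ np.diag(s[:k]) @ Vh[:k]   # best rank-k approximation

err2 = np.linalg.norm(A - Ak, 2)
errF = np.linalg.norm(A - Ak, 'fro')
print(err2, s[k])                          # equal
print(errF, np.sqrt(np.sum(s[k:]**2)))    # equal
```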

The SVD is not directly related to the eigenvalues and eigenvectors of . However, for , implies

so the singular values of are the square roots of the eigenvalues of the symmetric positive semidefinite matrices and (modulo zeros in the latter case), and the singular vectors are eigenvectors. Moreover, the eigenvalues of the matrix

are plus and minus the singular values of , together with additional zeros if , and the eigenvectors of and the singular vectors of are also related.

Consequently, by applying results or algorithms for the eigensystem of a symmetric matrix to , , or one obtains results or algorithms for the singular value decomposition of .
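The eigenvalue relation for the block matrix can be checked directly. A minimal Python/NumPy illustration with a square matrix (so no extra zero eigenvalues arise):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
s = np.linalg.svd(A, compute_uv=False)

# The symmetric matrix [0 A; A^T 0] has eigenvalues +/- the singular values of A.
Z = np.zeros((4, 4))
J = np.block([[Z, A], [A.T, Z]])
eig = np.sort(np.linalg.eigvalsh(J))
print(eig)   # the negated singular values followed by the singular values
```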

The pseudoinverse of a matrix can be expressed in terms of the SVD as

The least squares problem , where with is solved by , and when is rank-deficient this is the solution of minimum -norm. For this is an underdetermined system and gives the minimum 2-norm solution.
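As a quick check (Python/NumPy sketch with an arbitrary full-rank overdetermined system), the pseudoinverse solution agrees with a standard least squares solver:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))   # full-rank overdetermined system
b = rng.standard_normal(6)

# Least squares solution via the pseudoinverse (computed internally from the SVD)
x = np.linalg.pinv(A) @ b
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ref))
```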

We can write , where is orthogonal and is symmetric positive semidefinite. This decomposition is the polar decomposition and is unique. This connection between the SVD and the polar decomposition is useful both theoretically and computationally.
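The construction of the polar decomposition from the SVD takes just two lines. A Python/NumPy sketch (random square matrix chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
U, s, Vh = np.linalg.svd(A)

Q = U @ Vh                       # orthogonal polar factor
H = Vh.T @ np.diag(s) @ Vh       # symmetric positive semidefinite factor

print(np.allclose(Q @ H, A))               # A = QH
print(np.allclose(Q.T @ Q, np.eye(5)))     # Q orthogonal
print(np.allclose(H, H.T))                 # H symmetric
```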

The SVD is used in a very wide variety of applications—too many and varied to attempt to summarize here. We just mention two.

The SVD can be used to help identify to which letters vowels and consonants have been mapped in a substitution cipher (Moler and Morrison, 1983).

An inverse use of the SVD is to construct test matrices by forming a diagonal matrix of singular values from some distribution then pre- and post-multiplying by random orthogonal matrices. The result is matrices with known singular values and 2-norm condition number that are nevertheless random. Such “randsvd” matrices are widely used to test algorithms in numerical linear algebra.
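A minimal Python/NumPy sketch of this construction (the singular value distribution and sizes are arbitrary choices; the sign fix after the QR factorization makes the orthogonal factor Haar distributed):

```python
import numpy as np

def rand_orth(n, rng):
    # Random orthogonal matrix from QR of a Gaussian matrix,
    # with column signs fixed so the distribution is Haar.
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(4)
n = 6
sigma = np.logspace(0, -5, n)            # prescribed singular values
A = rand_orth(n, rng) @ np.diag(sigma) @ rand_orth(n, rng).T

print(np.linalg.svd(A, compute_uv=False))  # recovers sigma
```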

The SVD was introduced independently by Beltrami in 1873 and Jordan in 1874. Golub popularized the SVD as an essential computational tool and developed the first reliable algorithms for computing it. The Golub–Reinsch algorithm, dating from the late 1960s and based on bidiagonalization and the QR algorithm, is the standard way to compute the SVD. Various alternatives are available; see the references.

This is a minimal set of references, which contain further useful references within.

- Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, and Ichitaro Yamazaki, The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale, SIAM Rev. 60(4), 808–865, 2018.
- Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013.
- Roger Horn and Charles Johnson, Topics in Matrix Analysis, Cambridge University Press, 1991. Chapter 3.
- Cleve B. Moler and Donald Morrison, Singular Value Analysis of Cryptograms, Amer. Math. Monthly 90, 78–87, 1983.
- Yuji Nakatsukasa and Nicholas J. Higham, Stable and Efficient Spectral Divide and Conquer Algorithms for the Symmetric Eigenvalue Decomposition and the SVD, SIAM J. Sci. Comput. 35(3), A1325–A1349, 2013.

- Faster SVD via Polar Decomposition (2015)
- Professor SVD by Cleve Moler (2006)
- What Is a Random Orthogonal Matrix? (2020)
- What Is the Polar Decomposition? (2020)

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

For an analytic function we have the Taylor expansion

where is the imaginary unit. Assume that maps the real line to the real line and that and are real. Then equating real and imaginary parts in gives and . This means that for small , the approximations

both have error . So a single evaluation of at a complex argument gives, for small , a good approximation to , as well as a good approximation to if we need it.
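Here is the idea in a few lines of Python with NumPy (an illustration, using the sine function, whose derivative we know). The complex step approximation attains essentially full accuracy, while the forward difference, shown for comparison, is limited by cancellation:

```python
import numpy as np

x0 = 0.7
h = 1e-100                                      # absurdly small step: no problem
cs = np.sin(x0 + 1j * h).imag / h               # complex step approximation to cos(x0)
fd = (np.sin(x0 + 1e-8) - np.sin(x0)) / 1e-8    # forward difference for comparison

print(abs(cs - np.cos(x0)))   # full double precision accuracy
print(abs(fd - np.cos(x0)))   # limited by subtractive cancellation
```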

The usual way to approximate derivatives is with finite differences, for example by the forward difference approximation

This approximation has error so it is less accurate than the complex step approximation for a given , but more importantly it is prone to numerical cancellation. For small , and agree to many significant digits and so in floating-point arithmetic the difference approximation suffers a loss of significant digits. Consequently, as decreases the error in the computed approximation eventually starts to increase. As numerical analysis textbooks explain, the optimal choice of that balances truncation error and rounding errors is approximately

where is the unit roundoff. The optimal error is therefore of order .

A simple example illustrates these ideas. For the function with , we plot in the figure below the relative error for the finite difference, in blue, and the relative error for the complex step approximation, in orange, for ranging from about to . The dotted lines show and . The computations are in double precision (). The finite difference error decreases with until it reaches about ; thereafter the error grows, giving the characteristic V-shaped error curve. The complex step error decreases steadily until it is of order for , and for each it is about the square of the finite difference error, as expected from the theory.

Remarkably, one can take extremely small in the complex step approximation (e.g., ) without any ill effects from roundoff.

The complex step approximation carries out a form of approximate automatic differentiation, with the variable functioning like a symbolic variable that propagates through the computations in the imaginary parts.

The complex step approximation applies to gradient vectors and it can be extended to matrix functions. If is analytic and maps real matrices to real matrices and and are real then (Al-Mohy and Higham, 2010)

where is the Fréchet derivative of at in the direction . It is important to note that the method used to evaluate must not itself use complex arithmetic (as methods based on the Schur decomposition do); if it does, then the interaction of those complex terms with the much smaller term can lead to damaging subtractive cancellation.

The complex step approximation has also been extended to higher derivatives by using “different imaginary units” in different components (Lantoine et al., 2012).

Here are some applications where the complex step approximation has been used.

- Sensitivity analysis in engineering applications (Giles et al., 2003).
- Approximating gradients in deep learning (Goodfellow et al., 2016).
- Approximating the exponential of an operator in option pricing (Ackerer and Filipović, 2019).

Software has been developed for automatically carrying out the complex step method—for example, by Shampine (2007).

The complex step approximation has been rediscovered many times. The earliest published appearance that we are aware of is in a paper by Squire and Trapp (1998), who acknowledge earlier work of Lyness and Moler on the use of complex variables to approximate derivatives.

This is a minimal set of references, which contain further useful references within.

- Awad H. Al-Mohy and Nicholas J. Higham, The Complex Step Approximation to the Fréchet Derivative of a Matrix Function, Numer. Algorithms 53, 133–148, 2010.
- Damien Ackerer and Damir Filipović, Option Pricing with Orthogonal Polynomial Expansions, Mathematical Finance 30, 47–84, 2019.
- Michael B. Giles, Mihai C. Duta, Jens-Dominik Müller, and Niles A. Pierce, Algorithm Developments for Discrete Adjoint Methods, AIAA Journal 41(2), 198–205, 2003.
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016. Page 434.
- Gregory Lantoine, Ryan P. Russell, and Thierry Dargent, Using Multicomplex Variables for Automatic Computation of High-Order Derivatives, ACM Trans. Math. Software 38, 16:1–16:21, 2012.
- L. F. Shampine, Accurate Numerical Derivatives in MATLAB, ACM Trans. Math. Software 33, 26:1–26:17, 2007.
- W. Squire and G. E. Trapp, Using Complex Variables to Estimate Derivatives of Real Functions, SIAM Rev. 40(1), 110–112, 1998.

- Complex Step Differentiation by Cleve Moler (2013)
- Differentiation With(out) a Difference (2018)
- What Is a Fréchet Derivative? (2020)

The Sherman–Morrison–Woodbury formula provides an explicit formula for the inverse of the perturbed matrix .

We will begin with the simpler case of a rank- perturbation: , where and are -vectors, and we consider first the case where . We might expect that for some (consider a binomial expansion of the inverse). Multiplying out, we obtain

so the product equals the identity matrix when . The condition that be nonsingular is (as can also be seen from , derived in What Is a Block Matrix?). So

For the general case write . Inverting this equation and applying the previous result gives

subject to the nonsingularity condition . This is known as the Sherman–Morrison formula. It explicitly identifies the rank- change to the inverse.
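The Sherman–Morrison formula is easy to verify numerically. A Python/NumPy sketch (the shifted random matrix is an arbitrary choice that keeps everything well conditioned):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned A
u, v = rng.standard_normal(n), rng.standard_normal(n)

Ainv = np.linalg.inv(A)
# Sherman-Morrison:
# inv(A + u v^T) = inv(A) - inv(A) u v^T inv(A) / (1 + v^T inv(A) u)
sm = Ainv - np.outer(Ainv @ u, v @ Ainv) / (1 + v @ Ainv @ u)

print(np.allclose(sm, np.linalg.inv(A + np.outer(u, v))))
```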

As an example, if we take and (where is the th column of the identity matrix) then, writing , we have

The Frobenius norm of the change to is

If is sufficiently small then this quantity is approximately maximized for and such that the product of the norms of th column and th row of is maximized. For an upper triangular matrix and are likely to give the maximum, which means that the inverse of an upper triangular matrix is likely to be most sensitive to perturbations in the element of the matrix. To illustrate, we consider the matrix

The element of the following matrix is :

As our analysis suggests, the entry is the most sensitive to perturbation.

Now consider a perturbation , where and are . This perturbation has rank at most , and its rank is if and are both of rank . If is nonsingular then is nonsingular and

which is the Sherman–Morrison–Woodbury formula. The significance of this formula is that is , so if and is known then it is much cheaper to evaluate the right-hand side than to invert directly. In practice, of course, we rarely invert matrices, but rather exploit factorizations of them. If we have an LU factorization of then we can use it in conjunction with the Sherman–Morrison–Woodbury formula to solve in flops, as opposed to the flops required to factorize from scratch.
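The LU-based solution strategy can be sketched as follows in Python with SciPy (an illustration with arbitrary dimensions; after the one-time factorization of A, updating for the low rank perturbation needs only triangular solves and a small linear system):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(6)
n, k = 8, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned A
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
b = rng.standard_normal(n)

lu = lu_factor(A)              # expensive factorization, done once
AiU = lu_solve(lu, U)          # k cheap triangular solves
Aib = lu_solve(lu, b)
# x = inv(A + U V^T) b via Sherman-Morrison-Woodbury: only a k-by-k solve extra
x = Aib - AiU @ np.linalg.solve(np.eye(k) + V.T @ AiU, V.T @ Aib)

print(np.allclose((A + U @ V.T) @ x, b))
```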

The Sherman–Morrison–Woodbury formula is straightforward to verify, by showing that the product of the two sides is the identity matrix. How can the formula be derived in the first place? Consider any two matrices and such that and are both defined. The associative law for matrix multiplication gives , or , which can be written as . Postmultiplying by gives

Setting and gives the special case of the Sherman–Morrison–Woodbury formula with , and the general formula follows from .

We will give a different derivation of an even more general formula using block matrices. Consider the block matrix

where is , and are , and is . We will obtain a formula for by looking at .

It is straightforward to verify that

Hence

In the block we see the right-hand side of a Sherman–Morrison–Woodbury-like formula, but it is not immediately clear how this relates to . Let , and note that . Then

and applying the above formula (appropriately renaming the blocks) gives, with denoting a block whose value does not matter,

Hence . Equating our two formulas for gives

provided that is nonsingular.

To see one reason why this formula is useful, suppose that the matrix and its perturbation are symmetric and we wish to preserve symmetry in our formulas. The Sherman–Morrison–Woodbury requires us to write the perturbation as , so the perturbation must be positive semidefinite. In , however, we can write an arbitrary symmetric perturbation as , with symmetric but possibly indefinite, and obtain a symmetric formula.

The matrix is the Schur complement of in . Consequently the inversion formula is intimately connected with the theory of Schur complements. By manipulating the block matrices in different ways it is possible to derive variations of . We mention just the simple rewriting

which is valid if is singular, as long as is nonsingular. Note that the formula is not symmetric when and . This variant can also be obtained by replacing by in the Sherman–Morrison–Woodbury formula.

Formulas for the change in a matrix inverse under low rank perturbations have a long history. They have been rediscovered on multiple occasions, sometimes appearing without comment within other formulas. Equation is given by Duncan (1944), which is the earliest appearance in print that I am aware of. For discussions of the history of these formulas see Henderson and Searle (1981) or Puntanen and Styan (2005).

This is a minimal set of references, which contain further useful references within.

- W. J. Duncan, LXXVIII. Some Devices for the Solution of Large Sets of Simultaneous Linear Equations, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 35, 660–670, 1944.
- H. V. Henderson and S. R. Searle, On Deriving the Inverse of a Sum of Matrices, SIAM Rev. 23(1), 53–60, 1981.
- Simo Puntanen and George Styan, Historical Introduction: Issai Schur and the Early Development of the Schur Complement, pages 1–16 in Fuzhen Zhang, ed., The Schur Complement and Its Applications, Springer-Verlag, New York, 2005.

- What Is a Block Matrix? (2020)

A block matrix is defined in terms of a partitioning, which breaks a matrix into contiguous pieces. The most common and important case is for an matrix to be partitioned as a block matrix (two block rows and two block columns). For , partitioning into blocks gives

where

and similarly for the other blocks. The diagonal blocks in a partitioning of a square matrix are usually square (but not necessarily so), and they do not have to be of the same dimensions. This same matrix could be partitioned as

where is a scalar, is a column vector, and is a row vector.

The sum of two block matrices and of the same dimension is obtained by adding blockwise as long as and have the same dimensions for all and , and the result has the same block structure:

The product of an matrix and an matrix can be computed as as long as the products are all defined. In this case the matrices and are said to be conformably partitioned for multiplication. Here, has as many block rows as and as many block columns as . For example,

as long as all the eight products are defined.
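A Python/NumPy sketch of conformal block multiplication (the partition points are arbitrary; the key is that the column split of the first matrix matches the row split of the second):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 4))
B = rng.standard_normal((4, 3))

# Conformal partition: split A's columns and B's rows at the same point.
A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]
B1, B2 = B[:2], B[2:]

C = np.block([[A11 @ B1 + A12 @ B2],
              [A21 @ B1 + A22 @ B2]])
print(np.allclose(C, A @ B))
```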

Block matrix notation is an essential tool in numerical linear algebra. Here are some examples of its usage.

For an matrix with nonzero element we can write

The first row and column of have the correct form for a unit lower triangular matrix and likewise the first row and column of have the correct form for an upper triangular matrix. If we can find an LU factorization of the Schur complement then is an LU factorization of . This construction is the basis of an inductive proof of the existence of an LU factorization (provided all the pivots are nonzero) and it also yields an algorithm for computing it.

The same type of construction applies to other factorizations, such as Cholesky factorization, QR factorization, and the Schur decomposition.

A useful formula for the inverse of a nonsingular block triangular matrix

is

which has the special case

If is upper triangular then so are and . By taking of dimension the nearest integer to this formula can be used to construct a divide and conquer algorithm for computing .
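The divide and conquer idea can be sketched in Python with NumPy (an illustration, using the block formula for the inverse of an upper triangular matrix; the diagonal shift just keeps the test matrix well conditioned):

```python
import numpy as np

def tri_inv(T):
    """Invert an upper triangular T via the block formula
    inv([[T11, T12], [0, T22]]) = [[inv(T11), -inv(T11) @ T12 @ inv(T22)],
                                   [0,        inv(T22)]]."""
    n = T.shape[0]
    if n == 1:
        return 1.0 / T
    k = n // 2                         # split near the middle
    X11 = tri_inv(T[:k, :k])
    X22 = tri_inv(T[k:, k:])
    X12 = -X11 @ T[:k, k:] @ X22
    return np.block([[X11, X12],
                     [np.zeros((n - k, k)), X22]])

rng = np.random.default_rng(8)
T = np.triu(rng.standard_normal((6, 6))) + 6 * np.eye(6)
print(np.allclose(tri_inv(T) @ T, np.eye(6)))
```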

We note that , a fact that will be used in the next section.

Block matrices provides elegant proofs of many results involving determinants. For example, consider the equations

which hold for any and such that and are defined. Taking determinants gives the formula . In particular we can take , , for -vectors and , giving .
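Both determinant identities are easy to check numerically. A Python/NumPy sketch (dimensions arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 2))
B = rng.standard_normal((2, 5))

# det(I + AB) = det(I + BA), even though AB is 5x5 and BA is 2x2
d1 = np.linalg.det(np.eye(5) + A @ B)
d2 = np.linalg.det(np.eye(2) + B @ A)
print(np.isclose(d1, d2))

# Rank-1 special case: det(I + x y^T) = 1 + y^T x
x, y = rng.standard_normal(5), rng.standard_normal(5)
print(np.isclose(np.linalg.det(np.eye(5) + np.outer(x, y)), 1 + y @ x))
```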

We can sometimes build a matrix with certain desired properties by a block construction. For example, if is an involutory matrix () then

is a (block triangular) involutory matrix. And if and are any two matrices then

is involutory.

For matrices and consider the anti block diagonal matrix

Note that

Using these properties one can show a relation between the matrix sign function and the principal matrix square root:

This allows one to derive iterations for computing the matrix square root and its inverse from iterations for computing the matrix sign function.

It is easy to derive explicit formulas for all the powers of , and hence for any power series evaluated at . In particular, we have the formula

where denotes any square root of . With , this formula arises in the solution of the ordinary differential equation initial value problem , , ,

The most well known instance of the trick is when . The eigenvalues of

are plus and minus the singular values of , together with additional zeros if is with , and the eigenvectors of and the singular vectors of are also related. Consequently, by applying results or algorithms for symmetric matrices to one obtains results or algorithms for the singular value decomposition of .

This is a minimal set of references, which contain further useful references within.

- Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013.
- Nicholas J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008. (Sections 1.5 and 1.6 for the theory of matrix square roots.)
- Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.

It is easily verified that is

- orthogonal (),
- symmetric (),
- involutory ( that is, is a square root of the identity matrix),

where the last property follows from the first two.

A Householder matrix is a rank- perturbation of the identity matrix and so all but one of its eigenvalues are . The eigensystem can be fully described as follows.

- has an eigenvalue with eigenvector , since .
- has eigenvalues with eigenvectors any set of linearly independent vectors orthogonal to , which can be taken to be mutually orthogonal: for every such .

has trace and determinant , as can be derived directly or deduced from the facts that the trace is the sum of the eigenvalues and the determinant is the product of the eigenvalues.

For , a Householder matrix can be written as

Simple examples of Householder matrices are obtained by choosing , for which . For we obtain the matrices

Note that the matrix is times a Hadamard matrix.

Applying to a vector gives

This equation shows that reflects about the hyperplane , as illustrated in the following diagram, which explains why is sometimes called a Householder reflector. Another way of expressing this property is to write , where is orthogonal to . Then , so the component of in the direction has been reversed. If we take , the th unit vector, then , which has in the position. In this case premultiplying a vector by flips the sign of the th component.
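The defining properties are easy to confirm numerically. A Python/NumPy sketch (random vector chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 5
v = rng.standard_normal(n)
H = np.eye(n) - 2 * np.outer(v, v) / (v @ v)   # Householder matrix

print(np.allclose(H, H.T))               # symmetric
print(np.allclose(H @ H, np.eye(n)))     # involutory, hence orthogonal
print(np.allclose(H @ v, -v))            # reverses the component along v
```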

Householder matrices are powerful tools for introducing zeros into vectors. Suppose we are given vectors and and wish to find a Householder matrix such that . Since is orthogonal, we require that , and since can never equal the identity matrix we also require . Now

and this last equation has the form for some . But is independent of the scaling of , so we can set . Now with we have

and, since ,

Therefore

as required. Most often we choose to be zero in all but its first component.
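Here is the standard construction in Python with NumPy (a sketch of the usual recipe, with the sign of the shift chosen to avoid cancellation; this is not taken verbatim from the post):

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.standard_normal(6)

# v = x + sign(x_1) ||x||_2 e_1, so that H x = -sign(x_1) ||x||_2 e_1:
# all components of x below the first are zeroed out.
sigma = np.copysign(np.linalg.norm(x), x[0])
v = x.copy()
v[0] += sigma
H = np.eye(6) - 2 * np.outer(v, v) / (v @ v)

y = H @ x
print(y)   # a multiple of e_1, up to roundoff
```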

What can we say about square roots of a Householder matrix, that is, matrices such that ?

We note first that the eigenvalues of are the square roots of those of and so of them will be and one will be . This means that cannot be real, as the nonreal eigenvalues of a real matrix must appear in complex conjugate pairs.

Write , where is normalized so that . It is natural to look for a square root of the form . Setting leads to the quadratic equation , and hence . As expected, these two square roots are complex even though is real. As an example, gives the following square root of the matrix above corresponding to with :

A good way to understand all the square roots is to diagonalize , which can be done by a similarity transformation with a Householder matrix! Normalizing again, let and . Then from the construction above we know that . Hence

Then and so gives square roots on taking all possible combinations of signs on the diagonal for . Because has repeated eigenvalues these are not the only square roots. The infinitely many others are obtained by taking non-diagonal square roots of , which are of the form , where is any non-diagonal square root of the identity matrix, which in particular could be a Householder matrix!
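The rank-1-update square root can be verified directly. In this Python/NumPy sketch (with v normalized to unit norm), writing the candidate square root as the identity plus theta times the outer product of v with itself leads to a quadratic with the complex roots minus one plus or minus i:

```python
import numpy as np

rng = np.random.default_rng(12)
v = rng.standard_normal(4)
v /= np.linalg.norm(v)                   # normalize so v^T v = 1
H = np.eye(4) - 2 * np.outer(v, v)       # Householder matrix

# X = I + theta v v^T satisfies X^2 = H when theta^2 + 2 theta + 2 = 0,
# i.e. theta = -1 +/- i, so X is necessarily complex.
theta = -1 + 1j
X = np.eye(4) + theta * np.outer(v, v)
print(np.allclose(X @ X, H))
```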

It is possible to define an block Householder matrix in terms of a given , where , as

Here, “” denotes the Moore–Penrose pseudoinverse. For , clearly reduces to a standard Householder matrix. It can be shown that (this is most easily proved using the SVD), and so

where is the orthogonal projector onto the range of (that is, , , and ). Hence, like a standard Householder matrix, is symmetric, orthogonal, and involutory. Furthermore, premultiplication of a matrix by has the effect of reversing the component in the range of .
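These block Householder properties can be checked numerically. A Python/NumPy sketch (for a full column rank U, the product of U with its pseudoinverse is the orthogonal projector onto the range of U):

```python
import numpy as np

rng = np.random.default_rng(13)
n, p = 6, 2
U = rng.standard_normal((n, p))

P = U @ np.linalg.pinv(U)            # orthogonal projector onto range(U)
H = np.eye(n) - 2 * P                # block Householder matrix

print(np.allclose(H, H.T))           # symmetric
print(np.allclose(H @ H, np.eye(n))) # involutory, hence orthogonal
print(np.allclose(H @ U, -U))        # reverses the component in range(U)
```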

As an example, here is the block Householder matrix corresponding to :

One can show (using the SVD again) that the eigenvalues of are repeated times and repeated times, where . Hence and .

Schreiber and Parlett (1988) note the representation for ,

where and are orthogonal and is symmetric positive definite. This formula neatly generalizes the formula for a standard Householder matrix for given above, and a similar formula holds for odd .

Schreiber and Parlett also show how given () one can construct a block Householder matrix such that

The polar decomposition plays a key role in the theory and algorithms for such .

We can define a rectangular Householder matrix as follows. Let , , , and

Then , that is, has orthonormal columns, if

Of course, is just the first columns of the Householder matrix built from the vector .

The earliest appearance of Householder matrices is in the book by Turnbull and Aitken (1932). These authors show that if () then a unitary matrix of the form (in their notation) can be constructed so that . They use this result to prove the existence of the Schur decomposition. The first systematic use of Householder matrices for computational purposes was by Householder (1958) who used them to construct the QR factorization.

This is a minimal set of references, which contain further useful references within.

- Massimiliano Fasi and Nicholas J. Higham, Generating Extreme-Scale Matrices with Specified Singular Values or Condition Numbers, MIMS EPrint 2020.8, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, March 2020. (For the use of rectangular Householder matrices.)
- Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002. (Chapter 19.)
- Nicholas J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008. (Sections 1.5 and 1.6 for the theory of matrix square roots.)
- Robert S. Schreiber and Beresford N. Parlett, Block Reflectors: Theory and Computation, SIAM J. Numer. Anal. 25(1), 189–205, 1988.

- What Is a Generalized Inverse? (2020)
- What Is a Hadamard Matrix? (2020)
- What Is an Orthogonal Matrix? (2020)
- What Is the Polar Decomposition? (2020)

Sparsity is not to be confused with data sparsity, which refers to the situation where, because of redundancy, the data can be efficiently compressed while controlling the loss of information. Data sparsity typically manifests itself in low rank structure, whereas sparsity is solely a property of the pattern of nonzeros.

Important sources of sparse matrices include discretization of partial differential equations, image processing, optimization problems, and networks and graphs. In designing algorithms for sparse matrices we have several aims.

- Store the nonzeros only, in some suitable data structure.
- Avoid operations involving only zeros.
- Preserve sparsity, that is, minimize *fill-in* (a zero element becoming nonzero).

We wish to achieve these aims without sacrificing speed, stability, or reliability.

An important class of sparse matrices is *banded matrices*. A matrix has bandwidth if the elements outside the main diagonal and the first superdiagonals and subdiagonals are zero, that is, if for and .

The most common type of banded matrix is a tridiagonal matrix ($p = 1$), of which an archetypal example is the second-difference matrix, illustrated for $n = 4$ by

$$A = \begin{bmatrix} -2 & 1 & 0 & 0\\ 1 & -2 & 1 & 0\\ 0 & 1 & -2 & 1\\ 0 & 0 & 1 & -2 \end{bmatrix}.$$

This matrix (or more precisely its negative) corresponds to a centered finite difference approximation to a second derivative: $f''(x) \approx \bigl(f(x+h) - 2f(x) + f(x-h)\bigr)/h^2$.
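As a quick numerical check, here is a Python/NumPy sketch (the post's own examples are in MATLAB; the helper name `second_difference` is ours) that builds this matrix and verifies the centered-difference approximation for $f(x) = \sin x$:

```python
import numpy as np

def second_difference(n):
    """n x n tridiagonal second-difference matrix: -2 on the diagonal,
    1 on the sub- and superdiagonals (bandwidth p = 1)."""
    return (-2.0 * np.eye(n)
            + np.diag(np.ones(n - 1), 1)
            + np.diag(np.ones(n - 1), -1))

A = second_difference(4)

# Centered difference (f(x+h) - 2 f(x) + f(x-h)) / h^2 approximates f''(x);
# for f = sin we have f''(x) = -sin(x).
h, x = 1e-4, 0.7
fd = (np.sin(x + h) - 2 * np.sin(x) + np.sin(x - h)) / h**2
err = abs(fd + np.sin(x))
```

The truncation error of the centered difference is $O(h^2)$, so with $h = 10^{-4}$ the error `err` is well below $10^{-6}$.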

The following plots show the sparsity patterns for two symmetric positive definite matrices. Here, the nonzero elements are indicated by dots.

The matrices are both from power network problems and they are taken from the SuiteSparse Matrix Collection (`https://sparse.tamu.edu/`). The matrix names are shown in the titles and the `nz` values below the $x$-axes are the numbers of nonzeros. The plots were produced using MATLAB code of the form

W = ssget('HB/494_bus'); A = W.A; spy(A)

where the `ssget` function is provided with the collection. The matrix on the left shows no particular pattern for the nonzero entries, while that on the right has a structure comprising four diagonal blocks with a relatively small number of elements connecting the blocks.

It is important to realize that while the sparsity pattern often reflects the structure of the underlying problem, it is arbitrary in that it will change under row and column reorderings. If we are interested in solving $Ax = b$, for example, then for any permutation matrices $P$ and $Q$ we can form the transformed system $PAQ(Q^Tx) = Pb$, which has a coefficient matrix $PAQ$ with permuted rows and columns, a permuted right-hand side $Pb$, and a permuted solution $Q^Tx$. We usually wish to choose the permutations to minimize the fill-in or (almost equivalently) the number of nonzeros in the LU factors of $PAQ$. Various methods have been derived for this task; they are necessarily heuristic because finding the minimum is in general an NP-complete problem. When $A$ is symmetric we take $Q = P^T$ in order to preserve symmetry.

For the `HB/494_bus` matrix the symmetric reverse Cuthill-McKee permutation gives a reordered matrix with the following sparsity pattern, plotted with the MATLAB commands

r = symrcm(A); spy(A(r,r))

The reordered matrix has a variable band structure that is characteristic of the symmetric reverse Cuthill-McKee permutation. The number of nonzeros is, of course, unchanged by reordering, so what has been gained? The next plots show the Cholesky factors of the `HB/494_bus` matrix and the reordered matrix. The Cholesky factor for the reordered matrix has a much narrower bandwidth than that for the original matrix and has fewer nonzeros by a factor 3. Reordering has greatly reduced the amount of fill-in that occurs; it leads to a Cholesky factor that is cheaper to compute and requires less storage.

Because Cholesky factorization is numerically stable, the matrix can be permuted without affecting the numerical stability of the computation. For a nonsymmetric problem the choice of row and column interchanges also needs to take into account the need for numerical stability, which complicates matters.
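The gains from reordering can be reproduced in miniature. The following Python sketch uses SciPy's `reverse_cuthill_mckee` (an analogue of MATLAB's `symrcm`; the small pentadiagonal test matrix is our own construction): a random symmetric permutation hides the band structure, and RCM recovers a narrow band and reduces Cholesky fill-in.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(M):
    """Largest |i - j| over the nonzero entries of a dense matrix M."""
    i, j = np.nonzero(M)
    return int(np.abs(i - j).max())

n = 50
# Symmetric positive definite pentadiagonal matrix (bandwidth 2)...
B = sp.diags([np.ones(n-2), np.ones(n-1), 5.0*np.ones(n),
              np.ones(n-1), np.ones(n-2)], [-2, -1, 0, 1, 2]).toarray()

# ...scrambled by a random symmetric permutation, which hides the band.
rng = np.random.default_rng(0)
p = rng.permutation(n)
A = B[np.ix_(p, p)]

# Reverse Cuthill-McKee recovers a narrow (variable) band.
r = reverse_cuthill_mckee(sp.csr_matrix(A), symmetric_mode=True)
A_rcm = A[np.ix_(r, r)]

# Fill-in in the Cholesky factor drops accordingly.
nnz = lambda M: int(np.sum(np.abs(M) > 1e-12))
fill_orig = nnz(np.linalg.cholesky(A))
fill_rcm = nnz(np.linalg.cholesky(A_rcm))
```

Since the matrix is symmetric positive definite, the same permutation is applied to rows and columns and stability is not an issue.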

The world of sparse matrix computations is very different from that for dense matrices. In the first place, sparse matrices are not stored as arrays, but rather just the nonzeros are stored, in some suitable data structure. Programming sparse matrix computations is, consequently, more difficult than for dense matrix computations. A second difference from the dense case is that certain operations are, for practical purposes, forbidden. Most notably, we never invert sparse matrices because of the possibly severe fill-in. Indeed the inverse of a sparse matrix is usually dense. For example, the inverse of the tridiagonal matrix given at the start of this article is

$$A^{-1} = -\frac{1}{5}\begin{bmatrix} 4 & 3 & 2 & 1\\ 3 & 6 & 4 & 2\\ 2 & 4 & 6 & 3\\ 1 & 2 & 3 & 4 \end{bmatrix}.$$

While it is always true that one should not solve $Ax = b$ by forming $A^{-1}$, for reasons of cost and numerical stability (unless $A$ is orthogonal!), it is even more true when $A$ is sparse.

Finally, we mention an interesting property of $A^{-1}$. Its upper triangle agrees with the upper triangle of the rank-$1$ matrix

$$-\frac{1}{5}\begin{bmatrix} 1\\ 2\\ 3\\ 4 \end{bmatrix}\begin{bmatrix} 4 & 3 & 2 & 1 \end{bmatrix}.$$

This property generalizes to other tridiagonal matrices. So while a tridiagonal matrix is sparse, its inverse is data sparse, as it has to be because in general $A$ depends on $3n-2$ parameters and hence so does $A^{-1}$. One implication of this property is that it is possible to compute the condition number $\kappa(A) = \|A\|\,\|A^{-1}\|$ of a tridiagonal matrix in $O(n)$ flops.
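Both claims are easy to verify numerically for the $4 \times 4$ second-difference matrix. This Python sketch checks the computed inverse against the standard closed-form expression for the inverse of this matrix (stated in the comments) and checks the rank-$1$ structure of the upper triangle:

```python
import numpy as np

n = 4
A = (-2.0 * np.eye(n) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1))
Ainv = np.linalg.inv(A)

# Known closed form for this matrix:
# (A^{-1})_{ij} = -min(i, j) * (n + 1 - max(i, j)) / (n + 1).
i, j = np.indices((n, n)) + 1
closed = -np.minimum(i, j) * (n + 1 - np.maximum(i, j)) / (n + 1)

# The upper triangle of A^{-1} agrees with that of the rank-1 matrix
# -(1/(n+1)) * [1, 2, ..., n]^T [n, ..., 2, 1].
rank1 = -np.outer(np.arange(1, n + 1), np.arange(n, 0, -1)) / (n + 1)
```

The inverse is full, even though $A$ itself has only $3n - 2$ nonzeros.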

This is a minimal set of references, which contain further useful references within.

- Timothy A. Davis, Direct Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2006.
- Timothy A. Davis, Sivasankaran Rajamanickam, and Wissam M. Sid-Lakhdar, A Survey of Direct Methods for Sparse Linear Systems, Acta Numerica 25, 383–566, 2016.
- Timothy A. Davis and Yifan Hu, The University of Florida Sparse Matrix Collection, ACM Trans. Math. Software 38 (1), 1:1–1:25, 2011.
  *Note*: this collection is now called the SuiteSparse Matrix Collection.
- Gareth I. Hargreaves, Computing the Condition Number of Tridiagonal and Diagonal-Plus-Semiseparable Matrices in Linear Time, SIAM J. Matrix Anal. Appl. 27, 801–820, 2006.
- Gérard Meurant, A Review on the Inverse of Symmetric Tridiagonal and Block Tridiagonal Matrices, SIAM J. Matrix Anal. Appl. 13, 707–728, 1992.
- Yousef Saad, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2003.

The Sylvester equation is the linear matrix equation

$$AX + XB = C,$$

where $A \in \mathbb{C}^{m \times m}$, $B \in \mathbb{C}^{n \times n}$, and $X, C \in \mathbb{C}^{m \times n}$. It is named after James Joseph Sylvester (1814–1897), who considered the homogeneous version of the equation, $AX + XB = 0$, in 1884. Special cases of the equation are $Ax = b$ (a standard linear system), $AX = XA$ (matrix commutativity), $Ax = \lambda x$ (an eigenvalue–eigenvector equation), and $AX = I$ (matrix inversion).

In the case where $B = -A$, taking the trace of both sides of the equation gives

$$\operatorname{trace}(C) = \operatorname{trace}(AX) - \operatorname{trace}(XA) = 0,$$

so a solution can exist only when $C$ has zero trace. Hence $AX - XA = I$, for example, has no solution.

To determine when the Sylvester equation has a solution we will transform it into a simpler form. Let $A = URU^*$ and $B = VSV^*$ be Schur decompositions, where $U$ and $V$ are unitary and $R$ and $S$ are upper triangular. Premultiplying the Sylvester equation by $U^*$, postmultiplying by $V$, and setting $Z = U^*XV$ and $D = U^*CV$, we obtain

$$RZ + ZS = D,$$

which is a Sylvester equation with upper triangular coefficient matrices. Equating the $j$th columns on both sides leads to

$$(R + s_{jj}I)z_j = d_j - \sum_{k=1}^{j-1} s_{kj}z_k, \quad j = 1\colon n.$$

As long as the triangular matrices $R + s_{jj}I$ are nonsingular for all $j$ we can uniquely solve for $z_1$, $z_2$, …, $z_n$ in turn. Hence the Sylvester equation has a unique solution if $\lambda_i(A) + \lambda_j(B) \ne 0$ for all $i$ and $j$, that is, if $A$ and $-B$ have no eigenvalue in common.
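This column recurrence is the core of the Bartels-Stewart method discussed later. As a minimal illustration (not a production implementation), the following Python sketch Schur-reduces $A$ and $B$ with SciPy, solves the triangular equation column by column, and transforms back:

```python
import numpy as np
from scipy.linalg import schur

def sylvester_bs(A, B, C):
    """Solve A X + X B = C by the complex-Schur variant of Bartels-Stewart:
    triangularize A and B, then solve for the columns of Z in turn."""
    R, U = schur(A, output='complex')   # A = U R U^*, R upper triangular
    S, V = schur(B, output='complex')   # B = V S V^*, S upper triangular
    D = U.conj().T @ C @ V
    m, n = D.shape
    Z = np.zeros((m, n), dtype=complex)
    for j in range(n):
        # (R + s_jj I) z_j = d_j - sum_{k<j} s_kj z_k
        rhs = D[:, j] - Z[:, :j] @ S[:j, j]
        Z[:, j] = np.linalg.solve(R + S[j, j] * np.eye(m), rhs)
    return U @ Z @ V.conj().T

# Manufacture a problem with a known solution X0 and solve it.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((3, 3))
X0 = rng.standard_normal((4, 3))
C = A @ X0 + X0 @ B
X = sylvester_bs(A, B, C)
```

For random $A$ and $B$ the spectra of $A$ and $-B$ are disjoint with probability 1, so the computed $X$ should match $X_0$ to roundoff.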

Since the Sylvester equation is linear in $X$ it must be possible to express it in the standard form “$Ax = b$”. This can be done by applying the vec operator, which yields

$$(I_n \otimes A + B^T \otimes I_m)\operatorname{vec}(X) = \operatorname{vec}(C),$$

where $\otimes$ is the Kronecker product. Using the Schur transformations above it is easy to show that the eigenvalues of the coefficient matrix are given in terms of those of $A$ and $B$ by

$$\lambda_i(A) + \lambda_j(B), \quad i = 1\colon m, \quad j = 1\colon n,$$

so the coefficient matrix is nonsingular when $\lambda_i(A) + \lambda_j(B) \ne 0$ for all $i$ and $j$.

By considering the derivative of $Z(t) = \mathrm{e}^{At}C\mathrm{e}^{Bt}$, it can be shown that if the eigenvalues of $A$ and $B$ have negative real parts (that is, $A$ and $B$ are stable matrices) then

$$X = -\int_0^{\infty} \mathrm{e}^{At}\, C\, \mathrm{e}^{Bt}\, \mathrm{d}t$$

is the unique solution of $AX + XB = C$.

An important application of the Sylvester equation is in block diagonalization. Consider the block upper triangular matrix

$$T = \begin{bmatrix} A & C\\ 0 & B \end{bmatrix}.$$

If we can find a nonsingular matrix $Z$ such that $Z^{-1}TZ = \operatorname{diag}(A, B)$ then certain computations with $T$ become much easier. For example, for any function $f$,

$$f(T) = Z\operatorname{diag}(f(A), f(B))Z^{-1},$$

so computing $f(T)$ reduces to computing $f(A)$ and $f(B)$. Setting

$$Z = \begin{bmatrix} I & -X\\ 0 & I \end{bmatrix},$$

and noting that $Z^{-1}$ is just $Z$ with the sign of the (1,2) block reversed, we find that

$$Z^{-1}TZ = \begin{bmatrix} A & C - (AX - XB)\\ 0 & B \end{bmatrix}.$$

Hence $Z$ block diagonalizes $T$ if $X$ satisfies the Sylvester equation $AX - XB = C$, which we know is possible if the eigenvalues of $A$ and $B$ are distinct. This restriction is unsurprising, as without it we could use this construction to diagonalize a Jordan block, which of course is impossible.
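A small numerical check of this construction, using SciPy's `solve_sylvester` (which solves $AX + XB = C$, so we pass $-B$ to solve $AX - XB = C$); the shift of $B$ by $5I$ is just our way of keeping the eigenvalues of $A$ and $B$ well separated:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2)) + 5 * np.eye(2)   # spectra of A and B disjoint
C = rng.standard_normal((3, 2))
T = np.block([[A, C], [np.zeros((2, 3)), B]])

# AX - XB = C  is  solve_sylvester(A, -B, C).
X = solve_sylvester(A, -B, C)
Z = np.block([[np.eye(3), -X], [np.zeros((2, 3)), np.eye(2)]])
Zinv = np.block([[np.eye(3), X], [np.zeros((2, 3)), np.eye(2)]])

D = Zinv @ T @ Z    # should be diag(A, B): the (1,2) block vanishes
```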

For another way in which Sylvester equations arise consider the expansion $(X + E)^2 = X^2 + XE + EX + E^2$ for square matrices $X$ and $E$, from which it follows that $XE + EX$ is the Fréchet derivative of the function $x^2$ at $X$ in the direction $E$, written $L_{x^2}(X, E)$. Consequently, Newton’s method for the matrix square root requires the solution of Sylvester equations, though in practice certain simplifications can be made to avoid their appearance. We can find the Fréchet derivative of $x^{1/2}$ by applying the chain rule to $\bigl(X^{1/2}\bigr)^2 = X$, which gives $L_{x^2}\bigl(X^{1/2}, L_{x^{1/2}}(X, E)\bigr) = E$. Therefore $Z = L_{x^{1/2}}(X, E)$ is the solution to the Sylvester equation $X^{1/2}Z + ZX^{1/2} = E$. Consequently, the Sylvester equation plays a role in the perturbation theory for matrix square roots.

Sylvester equations also arise in the Schur–Parlett algorithm for computing matrix functions, which reduces a matrix $A$ to triangular Schur form $T$ and then solves for $F = f(T)$, blockwise, by a recurrence.

How can we solve the Sylvester equation? One possibility is to solve $(I_n \otimes A + B^T \otimes I_m)\operatorname{vec}(X) = \operatorname{vec}(C)$ by LU factorization with partial pivoting. However, the coefficient matrix is $mn \times mn$ and LU factorization cannot exploit the Kronecker product structure, so this approach is prohibitively expensive unless $m$ and $n$ are small. It is more efficient to compute Schur decompositions of $A$ and $B$, transform the problem, and solve a sequence of triangular systems, as described above in our derivation of the conditions for the existence of a unique solution. This method was developed by Bartels and Stewart in 1972 and it is implemented in the MATLAB function `sylvester`.

In recent years research has focused particularly on solving Sylvester equations in which $A$ and $B$ are large and sparse and $C$ has low rank, which arise in applications in control theory and model reduction, for example. In this case it is usually possible to find good low rank approximations to $X$, and iterative methods based on Krylov subspaces have been very successful.

Define the separation of $A$ and $B$ by

$$\operatorname{sep}(A, B) = \min_{X \ne 0} \frac{\|AX - XB\|_F}{\|X\|_F}.$$

The separation is positive if $A$ and $B$ have no eigenvalue in common, which we now assume. If $X$ is the solution to $AX + XB = C$ then

$$\operatorname{sep}(A, -B) \le \frac{\|C\|_F}{\|X\|_F},$$

so $\|X\|_F$ is bounded by

$$\|X\|_F \le \frac{\|C\|_F}{\operatorname{sep}(A, -B)}.$$

It is not hard to show that $\operatorname{sep}(A, -B) = \sigma_{\min}(M)$, where $M = I_n \otimes A + B^T \otimes I_m$ is the coefficient matrix in the vec form of the Sylvester equation. This bound on $\|X\|_F$ is a generalization of $\|x\|_2 \le \|A^{-1}\|_2\|b\|_2$ for $Ax = b$.

The separation features in a perturbation bound for the Sylvester equation. If

$$(A + \Delta A)(X + \Delta X) + (X + \Delta X)(B + \Delta B) = C + \Delta C,$$

with $\|\Delta A\|_F \le \epsilon\|A\|_F$, $\|\Delta B\|_F \le \epsilon\|B\|_F$, and $\|\Delta C\|_F \le \epsilon\|C\|_F$, then, to first order in $\epsilon$,

$$\frac{\|\Delta X\|_F}{\|X\|_F} \le c\,\epsilon\,\Psi,$$

where

$$\Psi = \frac{(\|A\|_F + \|B\|_F)\|X\|_F + \|C\|_F}{\operatorname{sep}(A, -B)\,\|X\|_F}$$

and $c$ is a modest constant of order $1$.

While we have the upper bound $\operatorname{sep}(A, -B) \le \min_{i,j}|\lambda_i(A) + \lambda_j(B)|$, this inequality can be extremely weak for nonnormal matrices, so two matrices can have a small separation even if their eigenvalues are well separated. To illustrate, let $T(n)$ denote the $n \times n$ upper triangular matrix with $1$ on the diagonal and $-1$ in all entries above the diagonal. The following table shows the values of $\operatorname{sep}(T(n), -T(n))$ for several values of $n$.

Even though the eigenvalues of $T(n)$ and $-T(n)$ are $2$ apart, the separation is at the level of the unit roundoff for quite modest values of $n$.
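For small $n$ the separation can be computed directly from the characterization $\operatorname{sep} = \sigma_{\min}$ of the Kronecker coefficient matrix. This Python sketch uses the triangular family $T(n)$ described above (diagonal $1$, entries $-1$ above the diagonal, as we have assumed):

```python
import numpy as np

def T(n):
    """Upper triangular: 1 on the diagonal, -1 everywhere above it."""
    return np.eye(n) - np.triu(np.ones((n, n)), 1)

def sep(A, B):
    """sep(A, B) = min_X ||AX - XB||_F / ||X||_F
                 = sigma_min(I kron A - B^T kron I)."""
    m, n = A.shape[0], B.shape[0]
    M = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))
    return np.linalg.svd(M, compute_uv=False)[-1]

# Eigenvalues of T(n) and -T(n) are 1 and -1 (a gap of 2), yet the
# separation collapses as n grows, reflecting the growing nonnormality.
vals = [sep(T(n), -T(n)) for n in (2, 4, 8, 16)]
```

The separation shrinks rapidly with $n$ even though the eigenvalue gap stays fixed at $2$.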

The sep function was originally introduced by Stewart in the 1970s as a tool for studying the sensitivity of invariant subspaces.

The Sylvester equation has many variations and special cases, including the Lyapunov equation $AX + XA^* = C$, the discrete Sylvester equation $X + AXB = C$, and versions of all these for operators. It has also been generalized to multiple terms and to have coefficient matrices on both sides of $X$, yielding

$$\sum_{i=1}^{k} A_iXB_i = C.$$

For $k \le 2$ and $m = n$ this equation can be solved in $O(n^3)$ flops. For $k > 2$, no $O(n^3)$ flops algorithm is known and deriving efficient numerical methods remains an open problem. The equation arises in stochastic finite element discretizations of partial differential equations with random inputs, where the matrices $A_i$ and $B_i$ are large and sparse and, depending on the statistical properties of the random inputs, $k$ can be arbitrarily large.
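For small dense problems the multi-term equation can always be solved through its vec form, which also makes the $O(n^6)$ cost of the naive approach plain. A Python sketch (with randomly generated coefficients for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 4, 3
As = [rng.standard_normal((n, n)) for _ in range(k)]
Bs = [rng.standard_normal((n, n)) for _ in range(k)]
X0 = rng.standard_normal((n, n))
C = sum(A @ X0 @ B for A, B in zip(As, Bs))

# vec(sum_i A_i X B_i) = (sum_i B_i^T kron A_i) vec(X).  Solving this dense
# n^2 x n^2 system costs O(n^6) flops: fine here, prohibitive for large n.
M = sum(np.kron(B.T, A) for A, B in zip(As, Bs))
x = np.linalg.solve(M, C.ravel(order='F'))   # vec = column-major ravel
X = x.reshape((n, n), order='F')
```

The recovered $X$ matches the $X_0$ used to manufacture the right-hand side.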

This is a minimal set of references, which contain further useful references within.

- Peter Lancaster, Explicit Solutions of Linear Matrix Equations, SIAM Review, 12(4), 544–566, 1970.
- Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002. Chapter 16.
- Nicholas J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.
- J. M. Varah, On the Separation of Two Matrices, SIAM J. Numer. Anal. 16 (2), 216–222, 1979.

In other words, $A \otimes B$ is the block $m \times n$ matrix whose $(i,j)$ block is $a_{ij}B$. For example,

$$\begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix} \otimes \begin{bmatrix} a & b\\ c & d \end{bmatrix} = \begin{bmatrix} a & b & 2a & 2b\\ c & d & 2c & 2d\\ 3a & 3b & 4a & 4b\\ 3c & 3d & 4c & 4d \end{bmatrix}.$$

Notice that the entries of $A \otimes B$ comprise every possible product $a_{ij}b_{k\ell}$, which is not the case for the usual matrix product $AB$ when it is defined. Indeed if $A$ and $B$ are $n \times n$ then

- $AB$ is $n \times n$ and contains sums of $n$ of the products $a_{ij}b_{k\ell}$,
- $A \otimes B$ is $n^2 \times n^2$ and contains all $n^4$ products $a_{ij}b_{k\ell}$.

Two key properties of the Kronecker product are

$$(A \otimes B)^T = A^T \otimes B^T, \qquad (A \otimes B)(C \otimes D) = AC \otimes BD.$$

The second equality implies that when $x$ is an eigenvector of $A$ with eigenvalue $\lambda$ and $y$ is an eigenvector of $B$ with eigenvalue $\mu$ then

$$(A \otimes B)(x \otimes y) = Ax \otimes By = \lambda\mu\,(x \otimes y),$$

so that $\lambda\mu$ is an eigenvalue of $A \otimes B$ with eigenvector $x \otimes y$. In fact, for $m \times m$ $A$ and $n \times n$ $B$ the eigenvalues of $A \otimes B$ are precisely the products $\lambda_i(A)\mu_j(B)$ for $i = 1\colon m$ and $j = 1\colon n$.

Kronecker product structure arises in image deblurring models in which the blur is separable, that is, the blur in the horizontal direction can be separated from the blur in the vertical direction. Kronecker products also arise in the construction of Hadamard matrices. Recall that a Hadamard matrix is a matrix of $\pm 1$s whose rows and columns are mutually orthogonal. If $H_n$ is an $n \times n$ Hadamard matrix then

$$\begin{bmatrix} H_n & H_n\\ H_n & -H_n \end{bmatrix} = \begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix} \otimes H_n$$

is a $2n \times 2n$ Hadamard matrix.
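This doubling construction is easy to carry out with `numpy.kron`:

```python
import numpy as np

H = np.array([[1.0]])
for _ in range(3):                    # H_1 -> H_2 -> H_4 -> H_8
    H = np.kron(np.array([[1, 1], [1, -1]]), H)
n = H.shape[0]                        # n = 8

# Hadamard property: entries are +/-1 and the rows are mutually orthogonal,
# so H H^T = n I.
```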

The practical significance of Kronecker product structure is that it allows computations on a large matrix to be reduced to computations on smaller matrices. For example, suppose $A$ and $B$ are Hermitian positive definite matrices and $C = A \otimes B$, which can be shown to be Hermitian positive definite from the properties mentioned above. If $A = R^*R$ and $B = S^*S$ are Cholesky factorizations then

$$A \otimes B = (R^*R) \otimes (S^*S) = (R \otimes S)^*(R \otimes S),$$

so $R \otimes S$, which is easily seen to be upper triangular with positive diagonal elements, is the Cholesky factor of $A \otimes B$. If $A$ and $B$ are $n \times n$ then forming $A \otimes B$ and computing its Cholesky factorization costs $O(n^6)$ flops, whereas $R$ and $S$ can be computed in $O(n^3)$ flops. If we want to solve a linear system $(A \otimes B)x = b$ this can be done using $R$ and $S$ without explicitly forming $A \otimes B$.
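A quick check of this factorization property in Python (with random symmetric positive definite matrices; in the real case $R^*$ is just $R^T$):

```python
import numpy as np

rng = np.random.default_rng(5)

def rand_spd(n):
    """Random symmetric positive definite matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A, B = rand_spd(3), rand_spd(3)
R = np.linalg.cholesky(A).T     # A = R^T R, R upper triangular
S = np.linalg.cholesky(B).T     # B = S^T S

# kron(R, S) is upper triangular with positive diagonal and satisfies
# (R kron S)^T (R kron S) = A kron B: it is the Cholesky factor of A kron B.
RS = np.kron(R, S)
```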

The vec operator stacks the columns of a matrix into one long vector: if $X = [x_1, x_2, \dots, x_n]$ then $\operatorname{vec}(X) = [x_1^T, x_2^T, \dots, x_n^T]^T$. The vec operator and the Kronecker product interact nicely: for any $A$, $X$, and $B$ for which the product $AXB$ is defined,

$$\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X).$$

This relation allows us to express a linear matrix equation such as $AXB = C$ in the usual form “$Ax = b$”.
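In NumPy, vec corresponds to raveling in column-major (Fortran) order, so the identity can be checked in a few lines:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

vec = lambda M: M.ravel(order='F')    # stack the columns

# vec(A X B) = (B^T kron A) vec(X)
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
```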

The Kronecker sum of $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$ is defined by $A \oplus B = A \otimes I_n + I_m \otimes B$. The eigenvalues of $A \oplus B$ are $\lambda_i + \mu_j$, $i = 1\colon m$, $j = 1\colon n$, where the $\lambda_i$ are the eigenvalues of $A$ and the $\mu_j$ are those of $B$.

The Kronecker sum arises when we apply the vec operator to the matrix $AX + XB$:

$$\operatorname{vec}(AX + XB) = (B^T \oplus A)\operatorname{vec}(X).$$

Kronecker sum structure also arises in finite difference discretizations of partial differential equations, such as when Poisson’s equation is discretized on a square by the usual five-point operator.
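Both facts can be illustrated together: the Kronecker sum of the 1D second-difference matrix with itself gives the five-point 2D Laplacian on a square grid, and its eigenvalues are the pairwise sums of the 1D eigenvalues. A Python sketch (the helper `kronsum` is ours):

```python
import numpy as np

def kronsum(A, B):
    """Kronecker sum A kron I_n + I_m kron B of A (m x m) and B (n x n)."""
    m, n = A.shape[0], B.shape[0]
    return np.kron(A, np.eye(n)) + np.kron(np.eye(m), B)

# 1D second-difference matrix; its Kronecker sum with itself is the standard
# five-point discretization of the 2D Laplacian.
n = 5
K = 2*np.eye(n) - np.diag(np.ones(n-1), 1) - np.diag(np.ones(n-1), -1)
L2 = kronsum(K, K)

# Eigenvalues of the Kronecker sum are all sums lambda_i + lambda_j.
lam = np.linalg.eigvalsh(K)
expected = np.sort(np.add.outer(lam, lam).ravel())
actual = np.sort(np.linalg.eigvalsh(L2))
```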

Since for $A \in \mathbb{C}^{m \times n}$ the vectors $\operatorname{vec}(A)$ and $\operatorname{vec}(A^T)$ contain the same elements in different orders, we must have

$$\operatorname{vec}(A^T) = \Pi_{mn}\operatorname{vec}(A)$$

for some permutation matrix $\Pi_{mn}$. This matrix is called the vec-permutation matrix, and is also known as the commutation matrix.

Kronecker multiplication is not commutative, that is, $A \otimes B \ne B \otimes A$ in general, but $A \otimes B$ and $B \otimes A$ do contain the same elements in different orders. In fact, the two matrices are related by row and column permutations: if $A \in \mathbb{C}^{m \times n}$ and $B \in \mathbb{C}^{p \times q}$ then

$$B \otimes A = \Pi_{pm}(A \otimes B)\,\Pi_{nq}.$$

This relation can be obtained as follows: for $X \in \mathbb{C}^{n \times q}$,

$$\Pi_{mp}\operatorname{vec}(AXB^T) = \operatorname{vec}\bigl((AXB^T)^T\bigr) = \operatorname{vec}(BX^TA^T) = (A \otimes B)\operatorname{vec}(X^T) = (A \otimes B)\,\Pi_{nq}\operatorname{vec}(X),$$

while also $\operatorname{vec}(AXB^T) = (B \otimes A)\operatorname{vec}(X)$. Since these equalities hold for all $X$, we have $\Pi_{mp}(B \otimes A) = (A \otimes B)\Pi_{nq}$, from which the relation follows on using $\Pi_{mp}^{-1} = \Pi_{pm}$, which can be obtained by replacing $A$ by $A^T$ in the definition of the vec-permutation matrix.

An explicit expression for the vec-permutation matrix is

$$\Pi_{mn} = \sum_{i=1}^{m}\sum_{j=1}^{n} (e_ie_j^T) \otimes (e_je_i^T),$$

where $e_i$ is the $i$th unit vector ($e_i \in \mathbb{R}^m$ and $e_j \in \mathbb{R}^n$).
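The explicit formula translates directly into code, and it also lets us check the permutation relation between $A \otimes B$ and $B \otimes A$ stated above (the helper `vec_perm` is ours):

```python
import numpy as np

def vec_perm(m, n):
    """Vec-permutation matrix: vec(A^T) = vec_perm(m, n) @ vec(A) for m x n A,
    built from the explicit formula sum_ij (e_i e_j^T) kron (e_j e_i^T)."""
    P = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            Eij = np.zeros((m, n))
            Eij[i, j] = 1.0
            P += np.kron(Eij, Eij.T)   # (e_i e_j^T) kron (e_j e_i^T)
    return P

vec = lambda M: M.ravel(order='F')
rng = np.random.default_rng(7)
m, n, p, q = 2, 3, 4, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, q))
P_mn = vec_perm(m, n)
```

Note that `vec_perm(m, p).T` plays the role of $\Pi_{pm}$, since the transpose of a vec-permutation matrix is its inverse.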

The following plot shows the sparsity patterns of several vec-permutation matrices, where the title of each subplot gives the values of $m$ and $n$.

In MATLAB the Kronecker product $A \otimes B$ can be computed as `kron(A,B)` and $\operatorname{vec}(A)$ is obtained by indexing with a colon: `A(:)`. Be careful using `kron` as it can generate very large matrices!

The Kronecker product is named after Leopold Kronecker (1823–1891). Henderson et al. (1983) suggest that it should be called the Zehfuss product, after Johann Georg Zehfuss (1832–1891), who obtained the determinant result $\det(A \otimes B) = \det(A)^n\det(B)^m$ for $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$ in 1858.

This is a minimal set of references, which contain further useful references within.

- Per Christian Hansen, James G. Nagy, and Dianne P. O’Leary, Deblurring Images: Matrices, Spectra, and Filtering, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2006.
- Harold V. Henderson, Friedrich Pukelsheim, and Shayle R. Searle, On the History of the Kronecker Product, Linear and Multilinear Algebra 14(2), 113–120, 1983.
- Harold V. Henderson and Shayle R. Searle, The Vec-Permutation Matrix, the Vec Operator and Kronecker Products: A Review, Linear and Multilinear Algebra 9, 271–288, 1981.
  *Note*: the definition of the vec-permutation matrix in this paper gives the transpose of $\Pi_{mn}$ as we have defined it.
- Jan R. Magnus and Heinz Neudecker, The Commutation Matrix: Some Properties and Applications, Ann. Statist. 7, 381–394, 1979.
- Roger Horn and Charles Johnson, Topics in Matrix Analysis, Cambridge University Press, 1991. Chapter 4.
- Charles F. Van Loan, The Ubiquitous Kronecker Product, J. Comput. Appl. Math. 123(1–2), 85–100, 2000.

The symbol $\otimes$ is typed in LaTeX as `\otimes`.