The Sylvester equation is the linear matrix equation

where , , and . It is named after James Joseph Sylvester (1814–1897), who considered the homogeneous version of the equation, in 1884. Special cases of the equation are (a standard linear system), (matrix commutativity), (an eigenvalue–eigenvector equation), and (matrix inversion).

In the case where , taking the trace of both sides of the equation gives

so a solution can exist only when has zero trace. Hence , for example, has no solution.

To determine when the Sylvester equation has a solution we will transform it into a simpler form. Let and be Schur decompositions, where and are unitary and and are upper triangular. Premultiplying the Sylvester equation by , postmultiplying by , and setting and , we obtain

which is a Sylvester equation with upper triangular coefficient matrices. Equating the th columns on both sides leads to

As long as the triangular matrices are nonsingular for all we can uniquely solve for , , …, in turn. Hence the Sylvester equation has a unique solution if for all and , that is, if and have no eigenvalue in common.

Since the Sylvester is a linear equation it must be possible to express it in the standard form “”. This can be done by applying the vec operator, which yields

where is the Kronecker product. Using the Schur transformations above it is easy to show that the eigenvalues of the coefficient matrix are given in terms of those of and by

so the coefficient matrix is nonsingular when for all and .

By considering the derivative of , it can be shown that if the eigenvalues of and have negative real parts (that is, and are stable matrices) then

is the unique solution of .

## Applications

An important application of the Sylvester equation is in block diagonalization. Consider the block upper triangular matrix

If we can find a nonsingular matrix such that then certain computations with become much easier. For example, for any function ,

so computing reduces to computing and . Setting

and noting that is just with the sign of the (1,2) block reversed, we find that

Hence block diagonalizes if satisfies the Sylvester equation , which we know is possible if the eigenvalues of and are distinct. This restriction is unsurprising, as without it we could use this construction to diagonalize a Jordan block, which of course is impossible.

For another way in which Sylvester equations arises consider the expansion for square matrices and , from which it follows that is the Fréchet derivative of the function at in the direction , written . Consequently, Newton’s method for the square root requires the solution of Sylvester equations, though in practice certain simplifications can be made to avoid their appearance. We can find the Fréchet derivative of by applying the chain rule to , which gives . Therefore is the solution to the Sylvester equation . Consequently, the Sylvester equation plays a role in the perturbation theory for matrix square roots.

Sylvester equations also arise in the Schur–Parlett algorithm for computing matrix functions, which reduces a matrix to triangular Schur form and then solves for , blockwise, by a recurrence.

## Solution Methods

How can we solve the Sylvester equation? One possibility is to solve by LU factorization with partial pivoting. However, the coefficient matrix is and LU factorization cannot exploit the Kronecker product structure, so this approach is prohibitively expensive unless and are small. It is more efficient to compute Schur decompositions of and , transform the problem, and solve a sequence of triangular systems, as described above in our derivation of the conditions for the existence of a unique solution. This method was developed by Bartels and Stewart in 1972 and it is implemented in the MATLAB function `sylvester`

.

In recent years research has focused particularly on solving Sylvester equations in which and are large and sparse and has low rank, which arise in applications in control theory and model reduction, for example. In this case it is usually possible to find good low rank approximations to and iterative methods based on Krylov subspaces have been very successful.

## Sensitivity and the Separation

Define the separation of and by

The separation is positive if and have no eigenvalue in common, which we now assume. If is the solution to then

so is bounded by

It is not hard to show that , where is the matrix in . This bound on is a generalization of for .

The separation features in a perturbation bound for the Sylvester equation. If

then

where

While we have the upper bound , this inequality can be extremely weak for nonnormal matrices, so two matrices can have a small separation even if their eigenvalues are well separated. To illustrate, let denote the upper triangular matrix with on the diagonal and in all entries above the diagonal. The following table shows the values of for several values of .

Even though the eigenvalues of and are apart, the separation is at the level of the unit roundoff for as small as .

The sep function was originally introduced by Stewart in the 1970s as a tool for studying the sensitivity of invariant subspaces.

## Variations and Generalizations

The Sylvester equation has many variations and special cases, including the Lyapunov equation , the discrete Sylvester equation , and versions of all these for operators. It has also been generalized to multiple terms and to have coefficient matrices on both sides of , yielding

For and this equation can be solved in flops. For , no flops algorithm is known and deriving efficient numerical methods remains an open problem. The equation arises in stochastic finite element discretizations of partial differential equations with random inputs, where the matrices and are large and sparse and, depending on the statistical properties of the random inputs, can be arbitrarily large.

## References

This is a minimal set of references, which contain further useful references within.

- Peter Lancaster, Explicit Solutions of Linear Matrix Equations, SIAM Review, 12(4), 544–566, 1970.
- Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002. Chapter 16.
- Nicholas J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.
- J. M. Varah, On the Separation of Two Matrices, SIAM J. Numer. Anal. 16 (2), 216–222, 1979.

## Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.