The Sylvester equation is the linear matrix equation
where , , and . It is named after James Joseph Sylvester (1814–1897), who considered the homogeneous version of the equation, in 1884. Special cases of the equation are (a standard linear system), (matrix commutativity), (an eigenvalue–eigenvector equation), and (matrix inversion).
In the case where , taking the trace of both sides of the equation gives
so a solution can exist only when has zero trace. Hence , for example, has no solution.
To determine when the Sylvester equation has a solution we will transform it into a simpler form. Let and be Schur decompositions, where and are unitary and and are upper triangular. Premultiplying the Sylvester equation by , postmultiplying by , and setting and , we obtain
which is a Sylvester equation with upper triangular coefficient matrices. Equating the th columns on both sides leads to
As long as the triangular matrices are nonsingular for all we can uniquely solve for , , …, in turn. Hence the Sylvester equation has a unique solution if for all and , that is, if and have no eigenvalue in common.
Since the Sylvester is a linear equation it must be possible to express it in the standard form “”. This can be done by applying the vec operator, which yields
where is the Kronecker product. Using the Schur transformations above it is easy to show that the eigenvalues of the coefficient matrix are given in terms of those of and by
so the coefficient matrix is nonsingular when for all and .
By considering the derivative of , it can be shown that if the eigenvalues of and have negative real parts (that is, and are stable matrices) then
is the unique solution of .
An important application of the Sylvester equation is in block diagonalization. Consider the block upper triangular matrix
If we can find a nonsingular matrix such that then certain computations with become much easier. For example, for any function ,
so computing reduces to computing and . Setting
and noting that is just with the sign of the (1,2) block reversed, we find that
Hence block diagonalizes if satisfies the Sylvester equation , which we know is possible if the eigenvalues of and are distinct. This restriction is unsurprising, as without it we could use this construction to diagonalize a Jordan block, which of course is impossible.
For another way in which Sylvester equations arises consider the expansion for square matrices and , from which it follows that is the Fréchet derivative of the function at in the direction , written . Consequently, Newton’s method for the square root requires the solution of Sylvester equations, though in practice certain simplifications can be made to avoid their appearance. We can find the Fréchet derivative of by applying the chain rule to , which gives . Therefore is the solution to the Sylvester equation . Consequently, the Sylvester equation plays a role in the perturbation theory for matrix square roots.
Sylvester equations also arise in the Schur–Parlett algorithm for computing matrix functions, which reduces a matrix to triangular Schur form and then solves for , blockwise, by a recurrence.
How can we solve the Sylvester equation? One possibility is to solve by LU factorization with partial pivoting. However, the coefficient matrix is and LU factorization cannot exploit the Kronecker product structure, so this approach is prohibitively expensive unless and are small. It is more efficient to compute Schur decompositions of and , transform the problem, and solve a sequence of triangular systems, as described above in our derivation of the conditions for the existence of a unique solution. This method was developed by Bartels and Stewart in 1972 and it is implemented in the MATLAB function
In recent years research has focused particularly on solving Sylvester equations in which and are large and sparse and has low rank, which arise in applications in control theory and model reduction, for example. In this case it is usually possible to find good low rank approximations to and iterative methods based on Krylov subspaces have been very successful.
Sensitivity and the Separation
Define the separation of and by
The separation is positive if and have no eigenvalue in common, which we now assume. If is the solution to then
so is bounded by
It is not hard to show that , where is the matrix in . This bound on is a generalization of for .
The separation features in a perturbation bound for the Sylvester equation. If
While we have the upper bound , this inequality can be extremely weak for nonnormal matrices, so two matrices can have a small separation even if their eigenvalues are well separated. To illustrate, let denote the upper triangular matrix with on the diagonal and in all entries above the diagonal. The following table shows the values of for several values of .
Even though the eigenvalues of and are apart, the separation is at the level of the unit roundoff for as small as .
The sep function was originally introduced by Stewart in the 1970s as a tool for studying the sensitivity of invariant subspaces.
Variations and Generalizations
The Sylvester equation has many variations and special cases, including the Lyapunov equation , the discrete Sylvester equation , and versions of all these for operators. It has also been generalized to multiple terms and to have coefficient matrices on both sides of , yielding
For and this equation can be solved in flops. For , no flops algorithm is known and deriving efficient numerical methods remains an open problem. The equation arises in stochastic finite element discretizations of partial differential equations with random inputs, where the matrices and are large and sparse and, depending on the statistical properties of the random inputs, can be arbitrarily large.
This is a minimal set of references, which contain further useful references within.
- Peter Lancaster, Explicit Solutions of Linear Matrix Equations, SIAM Review, 12(4), 544–566, 1970.
- Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002. Chapter 16.
- Nicholas J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.
- J. M. Varah, On the Separation of Two Matrices, SIAM J. Numer. Anal. 16 (2), 216–222, 1979.
Related Blog Posts
This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.
One thought on “What Is the Sylvester Equation?”
Thank you for this post, I bumped into Sylvester equation while modelling biological oscillators as linear time invariant systems. If you have a set of aligned time series for different entities, each one periodical, you can transform in frequency domain and convert the ODE system into a matrix equation, which was the Sylvester equation. After finding this I could find solutions and characterise the entities of the equation. I am sure the latter is trivial for people doing control, for me it was the key to get my method to work.