The Sylvester equation is the linear matrix equation
where ,
, and
. It is named after James Joseph Sylvester (1814–1897), who considered the homogeneous version of the equation,
in 1884. Special cases of the equation are
(a standard linear system),
(matrix commutativity),
(an eigenvalue–eigenvector equation), and
(matrix inversion).
In the case where , taking the trace of both sides of the equation gives
so a solution can exist only when has zero trace. Hence
, for example, has no solution.
To determine when the Sylvester equation has a solution we will transform it into a simpler form. Let and
be Schur decompositions, where
and
are unitary and
and
are upper triangular. Premultiplying the Sylvester equation by
, postmultiplying by
, and setting
and
, we obtain
which is a Sylvester equation with upper triangular coefficient matrices. Equating the th columns on both sides leads to
As long as the triangular matrices are nonsingular for all
we can uniquely solve for
,
, …,
in turn. Hence the Sylvester equation has a unique solution if
for all
and
, that is, if
and
have no eigenvalue in common.
Since the Sylvester is a linear equation it must be possible to express it in the standard form “”. This can be done by applying the vec operator, which yields
where is the Kronecker product. Using the Schur transformations above it is easy to show that the eigenvalues of the coefficient matrix are given in terms of those of
and
by
so the coefficient matrix is nonsingular when for all
and
.
By considering the derivative of , it can be shown that if the eigenvalues of
and
have negative real parts (that is,
and
are stable matrices) then
is the unique solution of .
Applications
An important application of the Sylvester equation is in block diagonalization. Consider the block upper triangular matrix
If we can find a nonsingular matrix such that
then certain computations with
become much easier. For example, for any function
,
so computing reduces to computing
and
. Setting
and noting that is just
with the sign of the (1,2) block reversed, we find that
Hence block diagonalizes
if
satisfies the Sylvester equation
, which we know is possible if the eigenvalues of
and
are distinct. This restriction is unsurprising, as without it we could use this construction to diagonalize a
Jordan block, which of course is impossible.
For another way in which Sylvester equations arises consider the expansion for square matrices
and
, from which it follows that
is the Fréchet derivative of the function
at
in the direction
, written
. Consequently, Newton’s method for the square root requires the solution of Sylvester equations, though in practice certain simplifications can be made to avoid their appearance. We can find the Fréchet derivative of
by applying the chain rule to
, which gives
. Therefore
is the solution to the Sylvester equation
. Consequently, the Sylvester equation plays a role in the perturbation theory for matrix square roots.
Sylvester equations also arise in the Schur–Parlett algorithm for computing matrix functions, which reduces a matrix to triangular Schur form and then solves
for
, blockwise, by a recurrence.
Solution Methods
How can we solve the Sylvester equation? One possibility is to solve by LU factorization with partial pivoting. However, the coefficient matrix is
and LU factorization cannot exploit the Kronecker product structure, so this approach is prohibitively expensive unless
and
are small. It is more efficient to compute Schur decompositions of
and
, transform the problem, and solve a sequence of triangular systems, as described above in our derivation of the conditions for the existence of a unique solution. This method was developed by Bartels and Stewart in 1972 and it is implemented in the MATLAB function
sylvester
.
In recent years research has focused particularly on solving Sylvester equations in which and
are large and sparse and
has low rank, which arise in applications in control theory and model reduction, for example. In this case it is usually possible to find good low rank approximations to
and iterative methods based on Krylov subspaces have been very successful.
Sensitivity and the Separation
Define the separation of and
by
The separation is positive if and
have no eigenvalue in common, which we now assume. If
is the solution to
then
so is bounded by
It is not hard to show that , where
is the matrix in
. This bound on
is a generalization of
for
.
The separation features in a perturbation bound for the Sylvester equation. If
then
where
While we have the upper bound , this inequality can be extremely weak for nonnormal matrices, so two matrices can have a small separation even if their eigenvalues are well separated. To illustrate, let
denote the
upper triangular matrix with
on the diagonal and
in all entries above the diagonal. The following table shows the values of
for several values of
.
Even though the eigenvalues of and
are
apart, the separation is at the level of the unit roundoff for
as small as
.
The sep function was originally introduced by Stewart in the 1970s as a tool for studying the sensitivity of invariant subspaces.
Variations and Generalizations
The Sylvester equation has many variations and special cases, including the Lyapunov equation , the discrete Sylvester equation
, and versions of all these for operators. It has also been generalized to multiple terms and to have coefficient matrices on both sides of
, yielding
For and
this equation can be solved in
flops. For
, no
flops algorithm is known and deriving efficient numerical methods remains an open problem. The equation arises in stochastic finite element discretizations of partial differential equations with random inputs, where the matrices
and
are large and sparse and, depending on the statistical properties of the random inputs,
can be arbitrarily large.
References
This is a minimal set of references, which contain further useful references within.
- Peter Lancaster, Explicit Solutions of Linear Matrix Equations, SIAM Review, 12(4), 544–566, 1970.
- Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002. Chapter 16.
- Nicholas J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.
- J. M. Varah, On the Separation of Two Matrices, SIAM J. Numer. Anal. 16 (2), 216–222, 1979.
Related Blog Posts
This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.
Thank you for this post, I bumped into Sylvester equation while modelling biological oscillators as linear time invariant systems. If you have a set of aligned time series for different entities, each one periodical, you can transform in frequency domain and convert the ODE system into a matrix equation, which was the Sylvester equation. After finding this I could find solutions and characterise the entities of the equation. I am sure the latter is trivial for people doing control, for me it was the key to get my method to work.