What Is the Sylvester Equation?

The Sylvester equation is the linear matrix equation

AX - XB = C,

where A\in\mathbb{C}^{m\times m}, B\in\mathbb{C}^{n\times n}, and X,C\in\mathbb{C}^{m\times n}. It is named after James Joseph Sylvester (1814–1897), who considered the homogeneous version of the equation, AX - XB = 0 in 1884. Special cases of the equation are Ax = b (a standard linear system), AX  = XA (matrix commutativity), Ax = \lambda x (an eigenvalue–eigenvector equation), and AX = I (matrix inversion).

In the case where B = A, taking the trace of both sides of the equation gives

\mathrm{trace}(C) = \mathrm{trace}(AX - XA) = \mathrm{trace}(AX) -  \mathrm{trace} (XA) = 0,

so a solution can exist only when C has zero trace. Hence AX - XA = I, for example, has no solution.

To determine when the Sylvester equation has a solution we will transform it into a simpler form. Let A = URU^* and B = VSV^* be Schur decompositions, where U and V are unitary and R and S are upper triangular. Premultiplying the Sylvester equation by U^*, postmultiplying by V, and setting Y = U^*XV and D = U^*CV, we obtain

RY - YS = D,

which is a Sylvester equation with upper triangular coefficient matrices. Equating the jth columns on both sides leads to

(R - s_{jj}I) y_j = d_j - \displaystyle\sum_{k=1}^{j-1} s_{kj} y_k, \quad j = 1\colon n.

As long as the triangular matrices R - s_{jj}I are nonsingular for all j we can uniquely solve for y_1, y_2, …, y_n in turn. Hence the Sylvester equation has a unique solution if r_{ii} \ne s_{jj} for all i and j, that is, if A and B have no eigenvalue in common.

Since the Sylvester is a linear equation it must be possible to express it in the standard form “Ax = b”. This can be done by applying the vec operator, which yields

\qquad\qquad\qquad\qquad\qquad    (I_n \otimes A - B^T \otimes I_m) \mathrm{vec}(X) = \mathrm{vec}(C),    \qquad\qquad\qquad\qquad\qquad(*)

where \otimes is the Kronecker product. Using the Schur transformations above it is easy to show that the eigenvalues of the coefficient matrix are given in terms of those of A and B by

\lambda_{ij} (I_n\otimes A - B^T\otimes I_m) = \lambda_i(A) - \lambda_j(B),   \quad i=1\colon m,  \quad j=1\colon n,

so the coefficient matrix is nonsingular when \lambda_i(A) \ne \lambda_j(B) for all i and j.

By considering the derivative of Z(t) = \mathrm{e}^{At}C\mathrm{e}^{-Bt}, it can be shown that if the eigenvalues of A and -B have negative real parts (that is, A and -B are stable matrices) then

X = -\displaystyle\int_0^{\infty} \mathrm{e}^{At} C \mathrm{e}^{-Bt} \, \mathrm{d}t

is the unique solution of AX - XB = C.


An important application of the Sylvester equation is in block diagonalization. Consider the block upper triangular matrix

T = \begin{bmatrix}       A & C\\       0 & B    \end{bmatrix}.

If we can find a nonsingular matrix Z such that Z^{-1}TZ = \mathrm{diag}(A,B) then certain computations with T become much easier. For example, for any function f,

f(T) = f(Z \mathrm{diag}(A,B) Z^{-1})          = Zf(\mathrm{diag}(A,B)) Z^{-1}          = Z\mathrm{diag}(f(A),f(B)) Z^{-1},

so computing f(T) reduces to computing f(A) and f(B). Setting

Z = \begin{bmatrix}       I & -X\\       0 & I    \end{bmatrix}.

and noting that Z^{-1} is just Z with the sign of the (1,2) block reversed, we find that

Z^{-1} TZ = \begin{bmatrix}       A & -AX + XB + C\\       0 & B    \end{bmatrix}.

Hence Z block diagonalizes T if X satisfies the Sylvester equation AX - XB = C, which we know is possible if the eigenvalues of A and B are distinct. This restriction is unsurprising, as without it we could use this construction to diagonalize a 2\times 2 Jordan block, which of course is impossible.

For another way in which Sylvester equations arises consider the expansion (X+E)^2 = X^2 + XE + EX + E^2 for square matrices X and E, from which it follows that XE + EX is the Fréchet derivative of the function x^2 at X in the direction E, written L_{x^2}(X,E). Consequently, Newton’s method for the square root requires the solution of Sylvester equations, though in practice certain simplifications can be made to avoid their appearance. We can find the Fréchet derivative of x^{1/2} by applying the chain rule to \bigl(x^{1/2}\bigr)^2 = x, which gives L_{x^2}\left(X^{1/2}, L_{x^{1/2}}(X,E)\right) = E. Therefore Z = L_{x^{1/2}}(X,E) is the solution to the Sylvester equation X^{1/2} Z + Z X^{1/2}  = E. Consequently, the Sylvester equation plays a role in the perturbation theory for matrix square roots.

Sylvester equations also arise in the Schur–Parlett algorithm for computing matrix functions, which reduces a matrix to triangular Schur form T and then solves TF-FT = 0 for F = f(T), blockwise, by a recurrence.

Solution Methods

How can we solve the Sylvester equation? One possibility is to solve (*) by LU factorization with partial pivoting. However, the coefficient matrix is mn\times mn and LU factorization cannot exploit the Kronecker product structure, so this approach is prohibitively expensive unless m and n are small. It is more efficient to compute Schur decompositions of A and B, transform the problem, and solve a sequence of triangular systems, as described above in our derivation of the conditions for the existence of a unique solution. This method was developed by Bartels and Stewart in 1972 and it is implemented in the MATLAB function sylvester.

In recent years research has focused particularly on solving Sylvester equations in which A and B are large and sparse and C has low rank, which arise in applications in control theory and model reduction, for example. In this case it is usually possible to find good low rank approximations to X and iterative methods based on Krylov subspaces have been very successful.

Sensitivity and the Separation

Define the separation of A and B by

\mathrm{sep}(A,B) =       \displaystyle\min_{Z\ne0} \displaystyle\frac{ \|AZ-ZB\|_F }{ \|Z\|_F }.

The separation is positive if A and B have no eigenvalue in common, which we now assume. If X is the solution to AX - XB = C then

\notag   \mathrm{sep}(A,B) \le \displaystyle\frac{ \|AX-XB\|_F }{ \|X\|_F }                      = \frac{\|C\|_F}{\|X\|_F},

so X is bounded by

\notag   \|X\|_F \le \displaystyle\frac{\|C\|_F}{\mathrm{sep}(A,B)}.

It is not hard to show that \mathrm{sep}(A,B)^{-1} = \|P^{-1}\|_2, where P is the matrix in (*). This bound on \|X\|_F is a generalization of \|x\|_2 \le \|A^{-1}\|_2 \|b\|_2 for Ax = b.

The separation features in a perturbation bound for the Sylvester equation. If

\notag     (A+\Delta A)(X+\Delta X) - (X+\Delta X)(B+\Delta B) = C+\Delta C,


\notag      \displaystyle\frac{ \|\Delta X\|_F }{ \|X\|_F }      \le 2\sqrt{3}\, \mathrm{sep}(A,B)^{-1} (\|A\|_F + \|B\|_F) \epsilon           + O(\epsilon^2),


\notag    \epsilon = \max \left\{ \displaystyle\frac{\|\Delta A\|_F}{\|A\|_F},                            \frac{\|\Delta B\|_F}{\|B\|_F},                            \frac{\|\Delta C\|_F}{\|C\|_F} \right\}.

While we have the upper bound \mathrm{sep}(A,B) \le \min_{i,j} |\lambda_i(A) - \lambda_j(B)|, this inequality can be extremely weak for nonnormal matrices, so two matrices can have a small separation even if their eigenvalues are well separated. To illustrate, let T(\alpha) denote the n\times n upper triangular matrix with \alpha on the diagonal and -1 in all entries above the diagonal. The following table shows the values of \mathrm{sep}(T(1),T(1.1)) for several values of n.


Even though the eigenvalues of A and B are 0.1 apart, the separation is at the level of the unit roundoff for n as small as 8.

The sep function was originally introduced by Stewart in the 1970s as a tool for studying the sensitivity of invariant subspaces.

Variations and Generalizations

The Sylvester equation has many variations and special cases, including the Lyapunov equation AX + XA^* = C, the discrete Sylvester equation X + AXB = C, and versions of all these for operators. It has also been generalized to multiple terms and to have coefficient matrices on both sides of X, yielding

\displaystyle\sum_{i=1}^k A_i X B_i = C.

For k\le 2 and m=n this equation can be solved in O(n^3) flops. For k > 2, no O(n^3) flops algorithm is known and deriving efficient numerical methods remains an open problem. The equation arises in stochastic finite element discretizations of partial differential equations with random inputs, where the matrices A_i and B_i are large and sparse and, depending on the statistical properties of the random inputs, k can be arbitrarily large.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

This entry was posted in what-is. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s