## What Is the Sylvester Equation?

The Sylvester equation is the linear matrix equation

$AX - XB = C,$

where $A\in\mathbb{C}^{m\times m}$, $B\in\mathbb{C}^{n\times n}$, and $X,C\in\mathbb{C}^{m\times n}$. It is named after James Joseph Sylvester (1814–1897), who considered the homogeneous version of the equation, $AX - XB = 0$, in 1884. Special cases of the equation are $Ax = b$ (a standard linear system), $AX = XA$ (matrix commutativity), $Ax = \lambda x$ (an eigenvalue–eigenvector equation), and $AX = I$ (matrix inversion).

In the case where $B = A$, taking the trace of both sides of the equation gives

$\mathrm{trace}(C) = \mathrm{trace}(AX - XA) = \mathrm{trace}(AX) - \mathrm{trace} (XA) = 0,$

so a solution can exist only when $C$ has zero trace. Hence $AX - XA = I$, for example, has no solution.
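The zero-trace constraint is easy to confirm numerically; here is a minimal NumPy sketch with an arbitrary random example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 4))

# trace(AX) = trace(XA), so the commutator AX - XA always has zero trace.
assert abs(np.trace(A @ X - X @ A)) < 1e-12
```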

To determine when the Sylvester equation has a solution we will transform it into a simpler form. Let $A = URU^*$ and $B = VSV^*$ be Schur decompositions, where $U$ and $V$ are unitary and $R$ and $S$ are upper triangular. Premultiplying the Sylvester equation by $U^*$, postmultiplying by $V$, and setting $Y = U^*XV$ and $D = U^*CV$, we obtain

$RY - YS = D,$

which is a Sylvester equation with upper triangular coefficient matrices. Equating the $j$th columns on both sides leads to

$(R - s_{jj}I) y_j = d_j - \displaystyle\sum_{k=1}^{j-1} s_{kj} y_k, \quad j = 1\colon n.$

As long as the triangular matrices $R - s_{jj}I$ are nonsingular for all $j$ we can uniquely solve for $y_1$, $y_2$, …, $y_n$ in turn. Hence the Sylvester equation has a unique solution if $r_{ii} \ne s_{jj}$ for all $i$ and $j$, that is, if $A$ and $B$ have no eigenvalue in common.

Since the Sylvester equation is linear, it must be possible to express it in the standard form “$Ax = b$”. This can be done by applying the vec operator, which yields

$\qquad\qquad\qquad\qquad\qquad (I_n \otimes A - B^T \otimes I_m) \mathrm{vec}(X) = \mathrm{vec}(C), \qquad\qquad\qquad\qquad\qquad(*)$

where $\otimes$ is the Kronecker product. Using the Schur transformations above it is easy to show that the eigenvalues of the coefficient matrix are given in terms of those of $A$ and $B$ by

$\lambda_{ij} (I_n\otimes A - B^T\otimes I_m) = \lambda_i(A) - \lambda_j(B), \quad i=1\colon m, \quad j=1\colon n,$

so the coefficient matrix is nonsingular when $\lambda_i(A) \ne \lambda_j(B)$ for all $i$ and $j$.
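Both facts are easy to check numerically for a small example (again a NumPy sketch; note that vec stacks columns, which is Fortran ordering in NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n)) + 5 * np.eye(n)   # keep the spectra disjoint
C = rng.standard_normal((m, n))

# Coefficient matrix of the vec'd system (*).
P = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))

# Solve for vec(X) and reshape; "F" order matches column stacking.
X = np.linalg.solve(P, C.flatten(order="F")).reshape(m, n, order="F")
assert np.allclose(A @ X - X @ B, C)

# Every difference lambda_i(A) - lambda_j(B) appears among the eigenvalues of P.
diffs = np.subtract.outer(np.linalg.eigvals(A), np.linalg.eigvals(B)).ravel()
eigP = np.linalg.eigvals(P)
assert all(np.abs(eigP - d).min() < 1e-8 for d in diffs)
```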

By considering the derivative of $Z(t) = \mathrm{e}^{At}C\mathrm{e}^{-Bt}$, it can be shown that if the eigenvalues of $A$ and $-B$ have negative real parts (that is, $A$ and $-B$ are stable matrices) then

$X = -\displaystyle\int_0^{\infty} \mathrm{e}^{At} C \mathrm{e}^{-Bt} \, \mathrm{d}t$

is the unique solution of $AX - XB = C$.
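The integral formula can be verified numerically by truncating and discretizing the integral. In this sketch the spectra are shifted so that $A$ and $-B$ are stable, and a simple trapezoidal rule with one-step propagators $\mathrm{e}^{Ah}$ and $\mathrm{e}^{-Bh}$ is used (the step size and truncation point are ad hoc choices):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
m, n = 4, 3
A = rng.standard_normal((m, m))
A -= (np.max(np.linalg.eigvals(A).real) + 1) * np.eye(m)  # shift: A is stable
B = rng.standard_normal((n, n))
B += (1 - np.min(np.linalg.eigvals(B).real)) * np.eye(n)  # shift: -B is stable
C = rng.standard_normal((m, n))

# Trapezoidal approximation of X = -integral_0^inf e^{At} C e^{-Bt} dt.
h, N = 1e-3, 30_000                    # truncate at t = 30; the integrand decays like e^{-2t}
EA, EB = expm(A * h), expm(-B * h)     # one-step propagators
Z = C.copy()
acc = 0.5 * Z
for _ in range(N):
    Z = EA @ Z @ EB
    acc += Z
X = -h * acc

assert np.allclose(A @ X - X @ B, C, atol=1e-2)  # loose tolerance: quadrature error
```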

## Applications

An important application of the Sylvester equation is in block diagonalization. Consider the block upper triangular matrix

$T = \begin{bmatrix} A & C\\ 0 & B \end{bmatrix}.$

If we can find a nonsingular matrix $Z$ such that $Z^{-1}TZ = \mathrm{diag}(A,B)$ then certain computations with $T$ become much easier. For example, for any function $f$,

$f(T) = f(Z \mathrm{diag}(A,B) Z^{-1}) = Zf(\mathrm{diag}(A,B)) Z^{-1} = Z\mathrm{diag}(f(A),f(B)) Z^{-1},$

so computing $f(T)$ reduces to computing $f(A)$ and $f(B)$. Setting

$Z = \begin{bmatrix} I & -X\\ 0 & I \end{bmatrix}$

and noting that $Z^{-1}$ is just $Z$ with the sign of the (1,2) block reversed, we find that

$Z^{-1} TZ = \begin{bmatrix} A & -AX + XB + C\\ 0 & B \end{bmatrix}.$

Hence $Z$ block diagonalizes $T$ if $X$ satisfies the Sylvester equation $AX - XB = C$, which we know is possible if the eigenvalues of $A$ and $B$ are distinct. This restriction is unsurprising, as without it we could use this construction to diagonalize a $2\times 2$ Jordan block, which of course is impossible.
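The construction is easy to demonstrate, here using SciPy's solve_sylvester, which solves $AX + XB = Q$, so we pass $-B$ to match our sign convention (the diagonal shift of $B$ is just a crude way to keep the spectra apart):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(3)
m, n = 3, 2
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n)) + 5 * np.eye(n)  # keep the spectra of A and B apart
C = rng.standard_normal((m, n))

X = solve_sylvester(A, -B, C)  # solves AX + X(-B) = C, i.e. AX - XB = C
T = np.block([[A, C], [np.zeros((n, m)), B]])
Z = np.block([[np.eye(m), -X], [np.zeros((n, m)), np.eye(n)]])
Zinv = np.block([[np.eye(m), X], [np.zeros((n, m)), np.eye(n)]])

D = Zinv @ T @ Z
assert np.allclose(D[:m, m:], 0)                                # (1,2) block annihilated
assert np.allclose(D[:m, :m], A) and np.allclose(D[m:, m:], B)  # diagonal blocks preserved
```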

For another way in which Sylvester equations arise, consider the expansion $(X+E)^2 = X^2 + XE + EX + E^2$ for square matrices $X$ and $E$, from which it follows that $XE + EX$ is the Fréchet derivative of the function $x^2$ at $X$ in the direction $E$, written $L_{x^2}(X,E)$. Consequently, Newton’s method for the square root requires the solution of Sylvester equations, though in practice certain simplifications can be made to avoid their appearance. We can find the Fréchet derivative of $x^{1/2}$ by applying the chain rule to $\bigl(x^{1/2}\bigr)^2 = x$, which gives $L_{x^2}\left(X^{1/2}, L_{x^{1/2}}(X,E)\right) = E$. Therefore $Z = L_{x^{1/2}}(X,E)$ is the solution to the Sylvester equation $X^{1/2} Z + Z X^{1/2} = E$. Hence the Sylvester equation plays a role in the perturbation theory for matrix square roots.
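Here is a small check of this relation (a sketch assuming SciPy's sqrtm and solve_sylvester; the symmetric positive definite $X$ guarantees a well-defined principal square root), comparing the Sylvester-equation solution with a forward finite difference:

```python
import numpy as np
from scipy.linalg import sqrtm, solve_sylvester

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)        # symmetric positive definite
E = rng.standard_normal((n, n))

S = sqrtm(X)
Z = solve_sylvester(S, S, E)       # S Z + Z S = E, so Z = L_{x^{1/2}}(X, E)

t = 1e-6
fd = (sqrtm(X + t * E) - S) / t    # forward-difference approximation of the derivative
assert np.allclose(Z, fd, atol=1e-4)
```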

Sylvester equations also arise in the Schur–Parlett algorithm for computing matrix functions, which reduces a matrix to triangular Schur form $T$ and then solves $TF-FT = 0$ for $F = f(T)$, blockwise, by a recurrence.

## Solution Methods

How can we solve the Sylvester equation? One possibility is to solve $(*)$ by LU factorization with partial pivoting. However, the coefficient matrix is $mn\times mn$ and LU factorization cannot exploit the Kronecker product structure, so this approach is prohibitively expensive unless $m$ and $n$ are small. It is more efficient to compute Schur decompositions of $A$ and $B$, transform the problem, and solve a sequence of triangular systems, as described above in our derivation of the conditions for the existence of a unique solution. This method was developed by Bartels and Stewart in 1972 and is implemented in the MATLAB function `sylvester`.
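In Python, SciPy provides the analogous routine scipy.linalg.solve_sylvester, which solves $AX + XB = Q$; flipping the sign of $B$ gives the form used here (random example for illustration):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((3, 3)) + 5 * np.eye(3)  # shift keeps the spectra disjoint
C = rng.standard_normal((5, 3))

X = solve_sylvester(A, -B, C)   # AX + X(-B) = C  <=>  AX - XB = C
assert np.allclose(A @ X - X @ B, C)
```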

In recent years research has focused particularly on solving Sylvester equations in which $A$ and $B$ are large and sparse and $C$ has low rank, which arise in applications in control theory and model reduction, for example. In this case it is usually possible to find good low-rank approximations to $X$, and iterative methods based on Krylov subspaces have been very successful.

## Sensitivity and the Separation

Define the separation of $A$ and $B$ by

$\mathrm{sep}(A,B) = \displaystyle\min_{Z\ne0} \displaystyle\frac{ \|AZ-ZB\|_F }{ \|Z\|_F }.$

The separation is positive if $A$ and $B$ have no eigenvalue in common, which we now assume. If $X$ is the solution to $AX - XB = C$ then

$\notag \mathrm{sep}(A,B) \le \displaystyle\frac{ \|AX-XB\|_F }{ \|X\|_F } = \frac{\|C\|_F}{\|X\|_F},$

so $X$ is bounded by

$\notag \|X\|_F \le \displaystyle\frac{\|C\|_F}{\mathrm{sep}(A,B)}.$

It is not hard to show that $\mathrm{sep}(A,B)^{-1} = \|P^{-1}\|_2$, where $P$ is the matrix in $(*)$. This bound on $\|X\|_F$ is a generalization of $\|x\|_2 \le \|A^{-1}\|_2 \|b\|_2$ for $Ax = b$.
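Since $\mathrm{sep}(A,B)^{-1} = \|P^{-1}\|_2$, the separation is the smallest singular value of the Kronecker matrix $P$, which gives a direct (if expensive) way to compute it and to check the bound (NumPy/SciPy sketch):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(6)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n)) + 4 * np.eye(n)   # keep the spectra disjoint
C = rng.standard_normal((m, n))

# sep(A, B) is the smallest singular value of P = I kron A - B^T kron I.
P = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))
sep = np.linalg.svd(P, compute_uv=False)[-1]

X = solve_sylvester(A, -B, C)   # AX - XB = C
bound = np.linalg.norm(C, "fro") / sep
assert np.linalg.norm(X, "fro") <= bound * (1 + 1e-8)
```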

The separation features in a perturbation bound for the Sylvester equation. If

$\notag (A+\Delta A)(X+\Delta X) - (X+\Delta X)(B+\Delta B) = C+\Delta C,$

then

$\notag \displaystyle\frac{ \|\Delta X\|_F }{ \|X\|_F } \le 2\sqrt{3}\, \mathrm{sep}(A,B)^{-1} (\|A\|_F + \|B\|_F) \epsilon + O(\epsilon^2),$

where

$\notag \epsilon = \max \left\{ \displaystyle\frac{\|\Delta A\|_F}{\|A\|_F}, \frac{\|\Delta B\|_F}{\|B\|_F}, \frac{\|\Delta C\|_F}{\|C\|_F} \right\}.$

While we have the upper bound $\mathrm{sep}(A,B) \le \min_{i,j} |\lambda_i(A) - \lambda_j(B)|$, this inequality can be extremely weak for nonnormal matrices, so two matrices can have a small separation even if their eigenvalues are well separated. To illustrate, let $T(\alpha)$ denote the $n\times n$ upper triangular matrix with $\alpha$ on the diagonal and $-1$ in all entries above the diagonal. Even though the eigenvalues of $T(1)$ and $T(1.1)$ are $0.1$ apart, $\mathrm{sep}(T(1),T(1.1))$ is at the level of the unit roundoff for $n$ as small as $8$.
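These separations can be computed directly via the smallest singular value of the Kronecker coefficient matrix (feasible only for modest $n$); a minimal sketch:

```python
import numpy as np

def T(alpha, n):
    # n x n upper triangular: alpha on the diagonal, -1 everywhere above it.
    return alpha * np.eye(n) + np.triu(-np.ones((n, n)), k=1)

def sep(A, B):
    # sep(A, B) = smallest singular value of I kron A - B^T kron I.
    m, n = A.shape[0], B.shape[0]
    P = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))
    return np.linalg.svd(P, compute_uv=False)[-1]

seps = {n: sep(T(1.0, n), T(1.1, n)) for n in (2, 4, 8)}
for n, s in seps.items():
    print(n, s)   # sep decays rapidly with n despite the eigenvalue gap of 0.1
```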

The sep function was originally introduced by Stewart in the 1970s as a tool for studying the sensitivity of invariant subspaces.

## Variations and Generalizations

The Sylvester equation has many variations and special cases, including the Lyapunov equation $AX + XA^* = C$, the discrete Sylvester equation $X + AXB = C$, and versions of all these for operators. It has also been generalized to multiple terms and to have coefficient matrices on both sides of $X$, yielding

$\displaystyle\sum_{i=1}^k A_i X B_i = C.$

For $k\le 2$ and $m=n$ this equation can be solved in $O(n^3)$ flops. For $k > 2$, no $O(n^3)$ flops algorithm is known and deriving efficient numerical methods remains an open problem. The equation arises in stochastic finite element discretizations of partial differential equations with random inputs, where the matrices $A_i$ and $B_i$ are large and sparse and, depending on the statistical properties of the random inputs, $k$ can be arbitrarily large.
