What is the Cayley–Hamilton Theorem?

The Cayley–Hamilton Theorem says that a square matrix A satisfies its characteristic equation, that is p(A) = 0 where p(t) = \det(tI-A) is the characteristic polynomial. This statement is not simply the substitution “p(A) = \det(A - A) = 0”, which is not valid since t must remain a scalar inside the \det term. Rather, for an n\times n A, the characteristic polynomial has the form

\notag    p(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0

and the Cayley–Hamilton theorem says that

\notag    p(A) = A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I = 0.

Various proofs of the theorem are available, of which we give two. The first is the most natural for anyone familiar with the Jordan canonical form. The second is more elementary but less obvious.

First proof.

Consider a 4\times 4 Jordan block with eigenvalue \lambda:

\notag     J =      \begin{bmatrix}    \lambda & 1       & 0        & 0       \\            & \lambda & 1        & 0\\            &         & \lambda  & 1\\            &         &          & \lambda         \end{bmatrix}.

We have

\notag     J - \lambda I =      \begin{bmatrix}          0 & 1       & 0        & 0       \\            &  0      & 1        & 0\\            &         & 0        & 1\\            &         &          & 0         \end{bmatrix}, \quad     (J - \lambda I)^2 =      \begin{bmatrix}            0  & 0  & 1 & 0 \\               & 0  & 0 & 1 \\               &    & 0 & 0 \\               &    &   & 0         \end{bmatrix}, \quad     (J - \lambda I)^3 =      \begin{bmatrix}            0  & 0  & 0 & 1 \\               & 0  & 0 & 0 \\               &    & 0 & 0 \\               &    &   & 0         \end{bmatrix},

and then (J - \lambda I)^4 = 0. In general, for an n \times n Jordan block J with eigenvalue \lambda, (J - \lambda I)^k is zero apart from a kth superdiagonal of ones for k\le n-1, and (J - \lambda I)^n = 0.

Let A have the Jordan canonical form A = Z JZ^{-1}, where J = \mathrm{diag}(J_1, \dots, J_k) and each J_i is an m_i\times m_i Jordan block with eigenvalue \lambda_i. The characteristic polynomial of A can be factorized as p(t) = (t-\lambda_1)^{m_1}(t-\lambda_2)^{m_2}\dots(t-\lambda_k)^{m_k}. Note that A^i = Z J^i Z^{-1} for all i, and hence q(A) = Z q(J) Z^{-1} for any polynomial q. Then

\notag  Z^{-1}p(A)Z = p(J)        = \mathrm{diag}\bigl( p(J_1), p(J_2), \dots, p(J_k) \bigr),

and p(J_i) is zero because it contains a factor (J_i - \lambda_i I\bigr)^{m_i} and this factor is zero, as noted above. Hence Z^{-1}p(A)Z = 0 and therefore p(A) = 0.

Second Proof

Recall that the adjugate \mathrm{adj}(B) of an n\times n matrix B is the transposed matrix of cofactors, where a cofactor is a signed sum of products of n-1 entries of B, and that B\mathrm{adj}(B) = \det(B)I. With B = tI - A, each entry of \mathrm{adj}(B) is a polynomial of degree n-1 in t, so B^{-1} = \mathrm{adj}(B)/\det(B) can be written

\notag    (tI - A)^{-1} = \displaystyle\frac{\mathrm{adj}(tI - A)}{\det(tI - A)}                  = \displaystyle\frac{t^{n-1}C_{n-1} + \cdots + tC_1 + C_0}{p(t)},

for some matrices C_{n-1}, \dots, C_0 not depending on t. Rearranging, we obtain

\notag   (tI - A) \bigl(t^{n-1}C_{n-1} + \cdots + tC_1 + C_0 \bigr) =  p(t)I,

and equating coefficients of t^n, …, t^0 gives

\notag \begin{aligned}   C_{n-1} &= I,\\   C_{n-2} - A C_{n-1} &= a_{n-1}I,\\          & \vdots\\   C_0 - A C_1 &= a_1 I,\\   - A C_0 &= a_0I. \end{aligned}

Premultiplying the first equation by A^n, the second by A^{n-1}, and so on, and adding, gives

\notag     0 = A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I = p(A),

as required. This proof is by Buchheim (1884).

Applications and Generalizations

A common use of the Cayley–Hamilton theorem is to show that A^{-1} is expressible as a linear combination of I, A, …, A^{n-1}. Indeed for a nonsingular A, p(A) = 0 implies that

\notag    A^{-1} = -\displaystyle\frac{1}{a_0}             \bigl( A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1 I \bigr),

since a_0 = \det(A) \ne 0.

Similarly, A^k for any k \ge n can be expressed as a linear combination of I, A, …, A^{n-1}. An interesting implication is that any matrix power series is actually a polynomial in the matrix. Thus the matrix exponential \mathrm{e}^A = I + A + A^2/2! + \cdots can be written \mathrm{e}^A = c_{n-1} A^{n-1} + \cdots + C_1A + c_0I for some scalars c_{n-1}, …, c_0. However, the c_i depend on A, which reduces the usefulness of the polynomial representation. A rare example of an explicit expression of this form is Rodrigues’s formula for the exponential of a skew-symmetric matrix A \in \mathbb{R}^{3\times 3}:

\notag      \mathrm{e}^A = I + \displaystyle\frac{\sin\theta}{\theta} A +                \displaystyle\frac{1-\cos\theta}{\theta^2} A^2,

where \theta = \sqrt{\|A\|_F^2/2}.

Cayley used the Cayley–Hamilton theorem to find square roots of a 2\times 2 matrix. If X^2 = A then applying the theorem to X gives X^2 - \mathrm{trace}(X)X + \det(X)I = 0, or

\notag \qquad\qquad\qquad\qquad   A - \mathrm{trace}(X)X + \det(X)I = 0, \qquad\qquad\qquad\qquad (*)

which gives

X = \displaystyle\frac{A + \det(X)I}{\mathrm{trace}(X)}.

Now \det(X) = \sqrt{\det(A)} and taking the trace in (*) gives an equation for \mathrm{trace}(X), leading to

\notag         X = \displaystyle\frac{ A + \sqrt{\det(A)} \,I}                  {\sqrt{\mathrm{trace}(A) + 2 \sqrt{\det(A)}}}.

With appropriate choices of signs for the square roots this formula gives all four square roots of A when A has distinct eigenvalues, but otherwise the formula can break down.

Expressions obtained from the Cayley–Hamilton theorem are of little practical use for general matrices, because algorithms that compute the coefficients a_i of the characteristic polynomial are typically numerically unstable.

The Cayley–Hamilton theorem has been generalized in various directions. The theorem can be interpreted as saying that the powers A^i for all nonnegative i generate a vector space of dimension at most n. Gerstenhaber (1961) proved that if A and B are two commuting n\times n matrices then the matrices A^iB^j, for all nonnegative i and j, generate a vector space of dimension at most n. It is conjectured that this result extends to three matrices.

Historical Note

The Cayley–Hamilton theorem appears in the 1858 memoir in which Cayley introduced matrix algebra. Cayley gave a proof for n = 2 and stated that he had verified the result for n = 3, adding “I have not thought it necessary to undertake the labour of a formal proof of the theorem in the general case of a matrix of any degree.” Hamilton had proved the result for quaternions in 1853. Cayley actually discovered a more general version of the Cayley–Hamilton theorem, which appears in an 1857 letter to Sylvester but not in any of his published work: if the square matrices A and B commute and f(x,y) = \det(x A - y B) then f(B,A) = 0.


Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

This entry was posted in what-is. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s