What Is the Gerstenhaber Problem?

When Cayley introduced matrix algebra in 1858, he did much more than merely arrange numbers in a rectangular array. His definitions of addition, multiplication, and inversion produced an algebraic structure that has proved to be immensely useful, and which still holds many mysteries today.

An n\times n matrix has n^2 parameters, so the vector space \mathbb{R}^{n\times n} of real matrices has dimension n^2, with a basis given by the matrices E_{ij} = e_i^{}e_j^T, where e_i is the vector that is zero except for a 1 in the ith entry. In his original paper, Cayley noticed the important property that the powers of a particular matrix A can never span \mathbb{R}^{n\times n}. The Cayley–Hamilton theorem says that A satisfies its own characteristic equation, that is, p(A) = 0 where p(t) = \det(t I - A) is the characteristic polynomial of A. This means that A^n can be expressed as a linear combination of I, A, …, A^{n-1}, so the powers of A span a vector space of dimension at most n.

Gerstenhaber proved a generalization of this property in 1961: if A and B are two commuting n\times n matrices then the matrices A^iB^j, for all nonnegative i and j, generate a vector space of dimension at most n. This result is much more difficult to prove than the Cayley–Hamilton theorem. Gerstenhaber’s proof was based on algebraic geometry, but purely matrix-theoretic proofs have been given.

A natural question, called the Gerstenhaber problem, is: does this result extend to three matrices, that is, does the vector space

S_n = \{\, A^iB^jC^k: 0\le i,j,k \le n-1 \,\}

have dimension at most n for any n\times n matrices A, B, and C that commute with each other? (We can limit the powers to n-1 by the Cayley–Hamilton theorem.) The problem is defined over any field, but here I focus on the reals.

Before considering the three matrix case let us note that the answer is known to be “no” for four commuting 4\times 4 matrices A, B, C, and D. Indeed let

A = e_1^{}e_3^T, \quad  B = e_1^{}e_4^T, \quad  C = e_2^{}e_3^T, \quad  D = e_2^{}e_4^T

and note that all possible products of two of these matrices are zero, so the matrices commute pairwise. Yet I = A^0, A, B, C, D are clearly five linearly independent matrices. Hence Gerstenhaber’s result does not extend to four matrices. It also does not extend to five or more matrices because it is known that the failure of the result for one value of n implies failure for all larger n. The question, then, is whether the three matrix case is like the two matrix case or the case for four or more matrices.

A great deal of effort has been put into proving or disproving the Gerstenhaber problem, so far without success. Here are two known facts.

  • The result holds for all n\le 11.
  • By a 1905 result of Schur, the dimension of S_n is at most 1 + \lfloor   n^2/4\rfloor, which is less than n^2 but still much bigger than n. (Here, \lfloor \cdot \rfloor is the floor function.)

One approach to investigating this problem is to look for a counterexample computationally. For some n\ge 12, choose three commuting n\times n matrices A, B, and C, select m\ge n monomials

X_i =  A^{i_p} B^{j_p} C^{k_p}, \quad 1\le p \le m,

form the matrix

Y = [\mathrm{vec}(X_1),             \mathrm{vec}(X_2), \dots,             \mathrm{vec}(X_m)],

where \mathrm{vec} converts a matrix into a vector by stacking the columns on top of each other, then compute \mathrm{rank}(Y), which is a lower bound on \mathrm{dim}(S_n), and check whether it is greater than n.

This simple-minded approach has some obvious difficulties. How do we choose A, B, and C? How do we choose the powers? How do we avoid overflow and underflow and compute a reliable rank, given that we might be dealing with large powers of large matrices?

Holbrook and O’Meara (2020), who have written several papers on the Gerstenhaber problem, which they call the GP problem, state that they “firmly believe the GP will turn out to have a negative answer” and they have developed a sophisticated approach to searching for a case with \mathrm{dim}(S_n) > n, which they call a “Eureka”. They first note that A, B and C can be assumed to be nilpotent. This means that all three matrices must be defective, because a nondefective nilpotent matrix is zero. Next they note that since commuting matrices are simultaneously unitarily triangularizable, A, B, and C can be assumed to be strictly upper triangular. Then they note that A can be assumed to be in Weyr canonical form.

The Weyr canonical form is a dual of the Jordan canonical form in which the Jordan matrix is replaced by a Weyr matrix, which is a direct sum of basic Weyr matrices. A basic Weyr matrix has one distinct eigenvalue and is upper block bidiagonal with diagonal blocks that are multiples of the identity matrix and superdiagonal blocks that are rectangular identity matrices. The difference between Jordan and Weyr matrices is illustrated by the example

J(\lambda) =     \left[\begin{array}{ccc|cc}     \lambda &  1      & 0 &&\\     0       & \lambda & 1 &&\\     0       & 0       & \lambda &&\\\hline             &         &   &\lambda & 1\\             &         &   & 0 &\lambda      \end{array}\right], \quad     W(\lambda) =     \left[\begin{array}{cc|cc|c}     \lambda &  0      & 1 & 0&\\     0       & \lambda & 0 & 1&\\\hline             &         & \lambda & 0 &1\\             &         & 0  &\lambda & 0\\\hline             &         &   & 0 &\lambda      \end{array}\right],

where the Jordan matrix J(\lambda) and Weyr matrix W(\lambda) are related by J(\lambda) = P^TW(\lambda)P for some permutation matrix P. There is an elegant way of relating the block sizes in the Jordan and Weyr matrices via a Young diagram. Form an array whose kth column contains j_k dots, where j_k is the size of the jth diagonal block of J:

\begin{array}{c|ccc}         &  j_1 & j_2\\\hline   w_1 &  \bullet & \bullet\\   w_2 &  \bullet & \bullet\\   w_3 &  \bullet & \end{array}

Then the number of dots in the kth row is the size of the kth diagonal block in the Weyr form.

The reason for using the Weyr form is that whereas any matrix that commutes with a Jordan matrix has Toeplitz blocks, any matrix that commutes with a Weyr matrix is block upper triangular and is uniquely determined by the first block row. By choosing a Weyr form for A, commuting matrices B and C can be built up in a systematic way.

Thanks to a result of O’Meara (2020) it suffices to compute modulo a prime p, so the computations can be done in exact arithmetic, removing the need to worry about rounding errors or overflow.

Holbrook and O’Meara have mostly tried matrices up to dimension 50, but they feel that “the Loch Ness monster probably lives in deeper water, closer to 100\times 100”. Their MATLAB codes (40 nicely commented but not optimized M-files) are available on request from the address given in their preprint. If you have the time to spare you might want to experiment with the codes and try to find a Eureka.


This is a minimal set of references, which contain further useful references within.

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

This entry was posted in what-is. Bookmark the permalink.

4 Responses to What Is the Gerstenhaber Problem?

  1. Rob Craigen says:

    Hello Nick. I confess I do not like your counterexample of 4 matrices. It seems to me the only thing going on in it is that we have a convention which breaks some of the patterns of algebra in which we assign the value of X^0 to be I, the identity matrix, when X is a nonzero square matrix. I should think this ought to be explicitly excluded from consideration here because this convention does not respect the nature of the properties under consideration. One does not derive that X^0 = I from matrix algebra — it is stipulated by fiat because it is consistent with other algebraic facts.

    The same would be true for sets of 3 matrices except for the combinatorial constraints on matrices of order 0, which is a “gotcha” on sloppily oversimplified statements of the result. I don’t know the original statement but perhaps it is the original statement which is oversimplified and doesn’t anticipate such degenerate constructions.

    • Nick Higham says:

      The fact that $A^0 = I$ is mentioned in Cayley’s original 1858 paper on matrix algebra. It must hold, as a consequence of the definition of inverse and index laws: $I = A^{-1} A = A^{-1 +1} = A^0$.

  2. J. M. says:

    Shouldn’t the $(3,3)$ entry in $J(\lambda)$ be $\lambda$ and not $1$?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s