What Is the Trace of a Matrix?

The trace of an n\times n matrix is the sum of its diagonal elements: \mathrm{trace}(A) = \sum_{i=1}^n a_{ii}. The trace is linear, that is, \mathrm{trace}(A+B) = \mathrm{trace}(A) + \mathrm{trace}(B), and \mathrm{trace}(A) = \mathrm{trace}(A^T).
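
These properties are easy to verify numerically. Here is a minimal NumPy sketch (the random matrices and the seed are chosen only for illustration) checking linearity and invariance under transposition:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Linearity: trace(A + B) = trace(A) + trace(B)
print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))  # True

# Invariance under transposition: trace(A) = trace(A^T)
print(np.isclose(np.trace(A), np.trace(A.T)))  # True
```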

A key fact is that the trace is also the sum of the eigenvalues. To prove this, consider the characteristic polynomial p(t) = \det(t I - A) = t^n + c_{n-1}t^{n-1} + \dots + c_1 t + c_0. The roots of p are the eigenvalues \lambda_1, \lambda_2, \dots, \lambda_n of A, so p can be factorized as

\notag  p(t) = (t - \lambda_1) (t - \lambda_2) \dots (t - \lambda_n),

and so c_{n-1} = -(\lambda_1 + \lambda_2 + \cdots + \lambda_n). The Laplace expansion of \det(t I - A) shows that the coefficient of t^{n-1} is -(a_{11} + a_{22} + \cdots + a_{nn}). Equating these two expressions for c_{n-1} gives

\notag     \mathrm{trace}(A) = \displaystyle\sum_{i=1}^n \lambda_i.  \qquad\qquad(1)

A consequence of (1) is that any transformation that preserves the eigenvalues preserves the trace. Therefore the trace is unchanged under similarity transformations: \mathrm{trace}(X^{-1}AX) = \mathrm{trace}(A) for any nonsingular X.
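
Both (1) and the similarity invariance can be checked numerically; here is a brief sketch (the matrices are random and, for the purpose of the illustration, assumed nonsingular, which holds with probability 1):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
X = rng.standard_normal((5, 5))  # almost surely nonsingular

# (1): trace(A) equals the sum of the eigenvalues, up to rounding error
print(np.isclose(np.trace(A), np.linalg.eigvals(A).sum().real))  # True

# Similarity invariance: trace(X^{-1} A X) = trace(A)
print(np.isclose(np.trace(np.linalg.solve(X, A) @ X), np.trace(A)))  # True
```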

As an example of how the trace can be useful, suppose A is a symmetric and orthogonal n\times n matrix, so that its eigenvalues are \pm 1. If there are p eigenvalues 1 and q eigenvalues -1 then \mathrm{trace}(A) = p - q and n = p + q. Therefore p = (n + \mathrm{trace}(A))/2 and q = (n - \mathrm{trace}(A))/2.
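
To illustrate, one readily available symmetric orthogonal matrix is a Householder reflector I - 2vv^T/(v^Tv); this choice of matrix is an assumption made purely for the sketch below, which recovers p and q from the trace:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
v = rng.standard_normal((n, 1))
A = np.eye(n) - 2 * (v @ v.T) / (v.T @ v)  # Householder reflector: symmetric and orthogonal

t = np.trace(A)
p = int(round((n + t) / 2))  # number of eigenvalues equal to +1
q = int(round((n - t) / 2))  # number of eigenvalues equal to -1
print(p, q)  # a reflector has n-1 eigenvalues +1 and one eigenvalue -1, so this prints 5 1
```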

Another important property is that for an m\times n matrix A and an n\times m matrix B,

\notag  \mathrm{trace}(AB) = \mathrm{trace}(BA) \qquad\qquad(2)

(despite the fact that AB \ne BA in general). The proof is simple:

\notag \begin{aligned}   \mathrm{trace}(AB) &= \displaystyle\sum_{i=1}^m (AB)_{ii}                      = \sum_{i=1}^m \sum_{k=1}^n a_{ik} b_{ki}                      = \sum_{k=1}^n \sum_{i=1}^m b_{ki} a_{ik} \\                      & = \sum_{k=1}^n (BA)_{kk}                      = \mathrm{trace}(BA). \end{aligned}

This simple fact can have non-obvious consequences. For example, consider the equation AX - XA = I in n\times n matrices. Taking the trace gives 0 = \mathrm{trace}(AX) - \mathrm{trace}(XA)    = \mathrm{trace}(AX - XA)    = \mathrm{trace}(I) = n, which is a contradiction. Therefore the equation has no solution.
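
A quick numerical check of (2), with rectangular matrices so that AB and BA have different dimensions (random data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 7))  # m x n
B = rng.standard_normal((7, 3))  # n x m

# trace(AB) = trace(BA), even though AB is 3 x 3 and BA is 7 x 7
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True
```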

The relation (2) gives \mathrm{trace}(ABC) = \mathrm{trace}((AB)C) = \mathrm{trace}(C(AB)) = \mathrm{trace}(CAB) for n\times n matrices A, B, and C, that is,

\notag   \mathrm{trace}(ABC) = \mathrm{trace}(CAB). \qquad\qquad(3)

So we can cyclically permute terms in a matrix product without changing the trace.

As an example of the use of (2) and (3), if x and y are n-vectors then \mathrm{trace}(xy^T) = \mathrm{trace}(y^Tx) = y^Tx. If A is an n\times n matrix then \mathrm{trace}(xy^TA) can be evaluated without forming the matrix xy^TA since, by (3), \mathrm{trace}(xy^TA) =                 \mathrm{trace}(y^TAx) = y^TAx.
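
In code this means that \mathrm{trace}(xy^TA) costs only a matrix–vector product and an inner product. A minimal sketch (random data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x = rng.standard_normal(n)
y = rng.standard_normal(n)
A = rng.standard_normal((n, n))

# Cheap evaluation via (3): trace(x y^T A) = y^T A x, needing only O(n^2) work
val_cheap = y @ (A @ x)

# Reference value obtained by explicitly forming the n x n matrix x y^T A
val_full = np.trace(np.outer(x, y) @ A)

print(np.isclose(val_cheap, val_full))  # True
```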

The trace is useful in calculations with the Frobenius norm of an m\times n matrix:

\notag     \|A\|_F = \left(\displaystyle\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\right)^{1/2}             = \bigl(\mathrm{trace}(A^*A)\bigr)^{1/2},

where * denotes the conjugate transpose. For example, we can generalize the formula |x+\mathrm{i}y|^2 = x^2 + y^2 for a complex number to a square matrix A\in\mathbb{C}^{n\times n} by splitting A into its Hermitian and skew-Hermitian parts:

\notag  A = \frac{1}{2}(A+A^*) + \frac{1}{2}(A-A^*) \equiv B + C,

where B = B^* and C = -C^*. Then

\notag \begin{aligned}     \|A\|_F^2 =     \|B + C\|_F^2 &= \mathrm{trace}((B+C)^*(B+C))\\                 &= \mathrm{trace}(B^*B + C^*C) + \mathrm{trace}(B^*C + C^*B)\\                 &= \mathrm{trace}(B^*B + C^*C) + \mathrm{trace}(BC - CB)\\                 &= \mathrm{trace}(B^*B + C^*C)\\                 &= \|B\|_F^2 + \|C\|_F^2. \end{aligned}
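
A short numerical confirmation of both the trace formula for the Frobenius norm and the splitting identity above (a random complex matrix, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

B = (A + A.conj().T) / 2  # Hermitian part
C = (A - A.conj().T) / 2  # skew-Hermitian part

# Frobenius norm via the trace: ||A||_F^2 = trace(A^* A)
print(np.isclose(np.linalg.norm(A, 'fro')**2, np.trace(A.conj().T @ A).real))  # True

# Splitting identity: ||A||_F^2 = ||B||_F^2 + ||C||_F^2
print(np.isclose(np.linalg.norm(A, 'fro')**2,
                 np.linalg.norm(B, 'fro')**2 + np.linalg.norm(C, 'fro')**2))  # True
```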

If a matrix is not explicitly known but we can compute matrix–vector products with it then the trace can be estimated by

\notag      \mathrm{trace}(A) \approx x^TAx,

where the vector x has elements independently drawn from the standard normal distribution with mean 0 and variance 1. The expectation of this estimate is

\notag \begin{aligned}     E\bigl( x^TAx \bigr)  &=     E\bigl( \mathrm{trace}(x^TAx) \bigr)  =     E\bigl( \mathrm{trace}(Axx^T) \bigr)  =     \mathrm{trace} \bigl( E(Axx^T) \bigr) \\      &=  \mathrm{trace}\bigl(A E(xx^T) \bigr)       = \mathrm{trace}(A), \end{aligned}

since E(x_ix_j) = 0 for i\ne j and E(x_i^2) = 1 for all i, so E(xx^T) = I. This stochastic estimate, which is due to Hutchinson, is therefore unbiased.
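
Here is a minimal sketch of the estimator (the matrix, the diagonal shift added so that the trace dominates the sampling noise, and the sample size are all arbitrary choices made for the demonstration; only matrix–vector products with A are used):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
A = rng.standard_normal((n, n)) + 5 * np.eye(n)  # shift so trace(A) dominates the noise

# Average x^T A x over independent vectors x with standard normal entries.
num_samples = 10_000
samples = []
for _ in range(num_samples):
    x = rng.standard_normal(n)
    samples.append(x @ (A @ x))

print(np.trace(A))       # exact trace
print(np.mean(samples))  # unbiased estimate; close to the exact value up to sampling error
```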


What Is a Diagonalizable Matrix?

A matrix A \in\mathbb{C}^{n\times n} is diagonalizable if there exists a nonsingular matrix X\in\mathbb{C}^{n\times n} such that X^{-1}AX is diagonal. In other words, a diagonalizable matrix is one that is similar to a diagonal matrix.

The condition X^{-1}AX = D = \mathrm{diag}(\lambda_i) is equivalent to AX = XD with X nonsingular, that is, Ax_i = \lambda_ix_i, i=1\colon n, where X = [x_1,x_2,\dots, x_n]. Hence A is diagonalizable if and only if it has a complete set of linearly independent eigenvectors.
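
In floating point arithmetic this criterion can only be tested heuristically; the sketch below (an illustration, not a robust algorithm, and the rank tolerance is an arbitrary choice) reports A as diagonalizable when the computed eigenvector matrix has full numerical rank:

```python
import numpy as np

def is_diagonalizable(A, tol=1e-8):
    """Heuristic test: A is reported diagonalizable if the matrix of computed
    eigenvectors has full numerical rank. Illustrative only, not a proof."""
    _, X = np.linalg.eig(A)
    return np.linalg.matrix_rank(X, tol=tol) == A.shape[0]

print(is_diagonalizable(np.array([[2.0, 0.0], [0.0, 3.0]])))  # True: distinct eigenvalues
print(is_diagonalizable(np.array([[0.0, 1.0], [0.0, 0.0]])))  # False: repeated eigenvalue 0 with only one eigenvector
```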

A Hermitian matrix is diagonalizable because the eigenvectors can be taken to be mutually orthogonal. The same is true for a normal matrix (one for which A^*A = AA^*). A matrix with distinct eigenvalues is also diagonalizable.

Theorem 1. If A\in\mathbb{C}^{n\times n} has distinct eigenvalues then it is diagonalizable.

Proof. Let A have eigenvalues \lambda_1,\lambda_2,\dots,\lambda_n with corresponding eigenvectors x_1,x_2,\dots,x_n. Suppose that y = \sum_{i=1}^n \alpha_i x_i = 0 for some \alpha_1,\alpha_2,\dots,\alpha_n. Then

\notag \begin{aligned}  0 &= (A-\lambda_2 I)\cdots (A-\lambda_n I)y    = \sum_{i=1}^n \alpha_i (A-\lambda_2 I)\cdots (A-\lambda_n I) x_i\\    &= \sum_{i=1}^n \alpha_i (\lambda_i-\lambda_2)\cdots(\lambda_i-\lambda_n)x_i    =  \alpha_1 (\lambda_1-\lambda_2)\cdots(\lambda_1-\lambda_n)x_1, \end{aligned}

which implies \alpha_1 = 0 since \lambda_1\ne \lambda_j for j\ge 2 and x_1\ne0. Premultiplying y = \sum_{i=2}^n \alpha_i x_i = 0 by \prod_{j=3}^n(A-\lambda_j I) shows, in the same way, that \alpha_2 = 0. Continuing in this way we find that \alpha_1 = \alpha_2 = \cdots = \alpha_n = 0. Therefore the x_i are linearly independent and hence A is diagonalizable.

A matrix can have repeated eigenvalues and be diagonalizable, as diagonal matrices with repeated diagonal entries show. What is needed for diagonalizability is that every k-times repeated eigenvalue has k linearly independent eigenvectors associated with it. Equivalently, the algebraic and geometric multiplicities of every eigenvalue must be equal, that is, the eigenvalues must all be semisimple. Another equivalent condition is that the degree of the minimal polynomial is equal to the number of distinct eigenvalues.
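
The multiplicity condition can be illustrated numerically by computing the dimension of the null space of A - \lambda I (a rough sketch; the rank tolerance is an arbitrary choice):

```python
import numpy as np

def geometric_multiplicity(A, lam, tol=1e-10):
    """Dimension of the null space of A - lam*I, estimated via the numerical rank."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)

# The eigenvalue 2 has algebraic multiplicity 2 in both matrices.
D = np.array([[2.0, 0.0], [0.0, 2.0]])  # diagonal, hence diagonalizable
J = np.array([[2.0, 1.0], [0.0, 2.0]])  # 2 x 2 Jordan block, not diagonalizable

print(geometric_multiplicity(D, 2.0))  # 2: equals the algebraic multiplicity
print(geometric_multiplicity(J, 2.0))  # 1: less than the algebraic multiplicity
```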

The simplest example of a matrix that is not diagonalizable is \bigl[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr]. This matrix is a 2\times 2 Jordan block with the eigenvalue 0. Diagonalizability is easily understood in terms of the Jordan canonical form: A is diagonalizable if and only if all the Jordan blocks in its Jordan form are 1\times 1.

Most matrices are diagonalizable, in the sense that the diagonalizable matrices are dense in \mathbb{C}^{n\times n}, that is, any matrix in \mathbb{C}^{n\times n} is arbitrarily close to a diagonalizable matrix. This property is useful because it can be convenient to prove a result by first proving it for diagonalizable matrices and then arguing that by continuity the result holds for a general matrix.
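
For instance (a hedged sketch of the density statement; the size and form of the perturbation are arbitrary), perturbing the diagonal of the 2\times 2 Jordan block by distinct tiny amounts produces a nearby diagonalizable matrix:

```python
import numpy as np

eps = 1e-8
J = np.array([[0.0, 1.0], [0.0, 0.0]])  # not diagonalizable
J_pert = J + np.diag([eps, 2 * eps])    # eigenvalues eps and 2*eps are distinct

# The perturbed matrix is diagonalizable (distinct eigenvalues) and lies at a
# distance of order eps from J in the Frobenius norm.
print(np.linalg.eigvals(J_pert))          # approximately [1.e-08, 2.e-08]
print(np.linalg.norm(J_pert - J, 'fro'))  # about 2.2e-08
```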

Is a rank-1 matrix A = xy^* diagonalizable, where x,y\in\mathbb{C}^n are nonzero? There are n-1 zero eigenvalues, whose eigenvectors can be taken to be any n-1 linearly independent vectors orthogonal to y. If y^*x \ne 0 then y^*x is the remaining eigenvalue, with eigenvector x, which is linearly independent of the eigenvectors for 0, and A is diagonalizable. If y^*x = 0 then all the eigenvalues of A are zero and so A cannot be diagonalizable, as the only diagonalizable matrix whose eigenvalues are all zero is the zero matrix. For the matrix \bigl[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr] mentioned above, x = \bigl[{1 \atop 0} \bigr] and y = \bigl[{0 \atop 1} \bigr], so y^*x = 0, confirming that this matrix is not diagonalizable.
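
The following sketch (NumPy, with arbitrarily chosen real vectors) checks the eigenvalue facts used in this argument:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 1.0])
A = np.outer(x, y)                       # rank-1 matrix x y^T with y^T x = 6

# y^T x is an eigenvalue with eigenvector x ...
print(np.allclose(A @ x, (y @ x) * x))   # True

# ... and any vector orthogonal to y is an eigenvector for the eigenvalue 0.
v = np.array([1.0, -1.0, 0.0])           # orthogonal to y
print(np.allclose(A @ v, np.zeros(3)))   # True

# When y^T x = 0 the matrix is nilpotent but nonzero, so it cannot be similar
# to a diagonal matrix (which would have to be the zero matrix).
N = np.outer([1.0, 0.0], [0.0, 1.0])     # the 2 x 2 Jordan block [[0, 1], [0, 0]]
print(np.allclose(N @ N, 0), np.allclose(N, 0))  # True False
```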


This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.