# What Is the Trace of a Matrix?

The trace of an $n\times n$ matrix is the sum of its diagonal elements: $\mathrm{trace}(A) = \sum_{i=1}^n a_{ii}$. The trace is linear, that is, $\mathrm{trace}(A+B) = \mathrm{trace}(A) + \mathrm{trace}(B)$, and $\mathrm{trace}(A) = \mathrm{trace}(A^T)$.
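These properties can be checked directly with a short sketch in Python, using plain nested lists for matrices; the example matrices are arbitrary choices, not from the text.

```python
# A minimal sketch of the trace, its linearity, and trace(A) = trace(A^T).

def trace(A):
    """Sum of the diagonal elements of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

S = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]  # A + B
T = [[A[j][i] for j in range(2)] for i in range(2)]            # A^T

print(trace(A))                         # 5
print(trace(S) == trace(A) + trace(B))  # True: linearity
print(trace(T) == trace(A))             # True: trace(A^T) = trace(A)
```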

A key fact is that the trace is also the sum of the eigenvalues. To see this, consider the characteristic polynomial $p(t) = \det(t I - A) = t^n + a_{n-1}t^{n-1} + \dots + a_1 t + a_0$. The roots of $p$ are the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ of $A$, so $p$ can be factorized as

$\notag p(t) = (t - \lambda_1) (t - \lambda_2) \dots (t - \lambda_n),$

and so $a_{n-1} = -(\lambda_1 + \lambda_2 + \cdots + \lambda_n)$. The Laplace expansion of $\det(t I - A)$ shows that the coefficient of $t^{n-1}$ is $-(a_{11} + a_{22} + \cdots + a_{nn})$. Equating these two expressions for $a_{n-1}$ gives

$\notag \mathrm{trace}(A) = \displaystyle\sum_{i=1}^n \lambda_i. \qquad\qquad(1)$

A consequence of (1) is that any transformation that preserves the eigenvalues preserves the trace. Therefore the trace is unchanged under similarity transformations: $\mathrm{trace}(X^{-1}AX) = \mathrm{trace}(A)$ for any nonsingular $X$.

As an example of how the trace can be useful, suppose $A$ is a symmetric and orthogonal $n\times n$ matrix, so that its eigenvalues are $\pm 1$. If there are $p$ eigenvalues $1$ and $q$ eigenvalues $-1$ then $\mathrm{trace}(A) = p - q$ and $n = p + q$. Therefore $p = (n + \mathrm{trace}(A))/2$ and $q = (n - \mathrm{trace}(A))/2$.
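As a numerical illustration (the example is my own, not from the text), a Householder reflector $I - 2vv^T/(v^Tv)$ is symmetric and orthogonal with a single eigenvalue $-1$, so the formulas should give $p = n-1$ and $q = 1$:

```python
# Check the p, q formulas on a Householder reflector H = I - 2*v*v^T/(v^T*v),
# which is symmetric and orthogonal; the vector v is an arbitrary choice.

n, v = 3, [1.0, 2.0, 2.0]
vtv = sum(x * x for x in v)
H = [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vtv
      for j in range(n)] for i in range(n)]

tr = sum(H[i][i] for i in range(n))  # trace(H) = n - 2 for a reflector
p = round((n + tr) / 2)              # number of eigenvalues +1
q = round((n - tr) / 2)              # number of eigenvalues -1
print(p, q)                          # 2 1
```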

Another important property is that for an $m\times n$ matrix $A$ and an $n\times m$ matrix $B$,

$\notag \mathrm{trace}(AB) = \mathrm{trace}(BA) \qquad\qquad(2)$

(despite the fact that $AB \ne BA$ in general). The proof is simple:

$\notag \begin{aligned} \mathrm{trace}(AB) &= \displaystyle\sum_{i=1}^m (AB)_{ii} = \sum_{i=1}^m \sum_{k=1}^n a_{ik} b_{ki} = \sum_{k=1}^n \sum_{i=1}^m b_{ki} a_{ik} \\ & = \sum_{k=1}^n (BA)_{kk} = \mathrm{trace}(BA). \end{aligned}$

This simple fact can have non-obvious consequences. For example, consider the equation $AX - XA = I$ in $n\times n$ matrices. Taking the trace gives $0 = \mathrm{trace}(AX) - \mathrm{trace}(XA) = \mathrm{trace}(AX - XA) = \mathrm{trace}(I) = n$, which is a contradiction. Therefore the equation has no solution.
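The vanishing trace of the commutator can be checked numerically; the random example matrices below are an arbitrary illustration.

```python
# A quick check that the commutator AX - XA always has zero trace,
# which is why AX - XA = I is impossible.
import random

n = 3
random.seed(0)
A = [[random.random() for _ in range(n)] for _ in range(n)]
X = [[random.random() for _ in range(n)] for _ in range(n)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

AX, XA = matmul(A, X), matmul(X, A)
tr = sum(AX[i][i] - XA[i][i] for i in range(n))
print(abs(tr) < 1e-12)  # True: trace(AX - XA) = 0, whereas trace(I) = n
```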

The relation (2) gives $\mathrm{trace}(ABC) = \mathrm{trace}((AB)C) = \mathrm{trace}(C(AB)) = \mathrm{trace}(CAB)$ for $n\times n$ matrices $A$, $B$, and $C$, that is,

$\notag \mathrm{trace}(ABC) = \mathrm{trace}(CAB). \qquad\qquad(3)$

So we can cyclically permute terms in a matrix product without changing the trace.

As an example of the use of (2) and (3), if $x$ and $y$ are $n$-vectors then $\mathrm{trace}(xy^T) = \mathrm{trace}(y^Tx) = y^Tx$. If $A$ is an $n\times n$ matrix then $\mathrm{trace}(xy^TA)$ can be evaluated without forming the matrix $xy^TA$ since, by (3), $\mathrm{trace}(xy^TA) = \mathrm{trace}(y^TAx) = y^TAx$.
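A sketch of this cheap evaluation, with arbitrary example data; the direct computation forms $xy^TA$ explicitly for comparison.

```python
# Evaluate trace(x*y^T*A) as y^T*A*x without forming the n-by-n matrix.

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
A = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0],
     [4.0, 0.0, 5.0]]
n = 3

# Direct: form M = x*y^T*A and take its trace (O(n^2) storage).
M = [[x[i] * sum(y[k] * A[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]
direct = sum(M[i][i] for i in range(n))

# Cheap: y^T*(A*x) via one matrix-vector product and one inner product.
Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
cheap = sum(y[i] * Ax[i] for i in range(n))

print(abs(direct - cheap) < 1e-12)  # True
```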

The trace is useful in calculations with the Frobenius norm of an $m\times n$ matrix:

$\notag \|A\|_F = \left(\displaystyle\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\right)^{1/2} = \bigl(\mathrm{trace}(A^*A)\bigr)^{1/2},$

where $*$ denotes the conjugate transpose. For example, we can generalize the formula $|x+\mathrm{i}y|^2 = x^2 + y^2$ for a complex number to an $m\times n$ matrix $A$ by splitting $A$ into its Hermitian and skew-Hermitian parts:

$\notag A = \frac{1}{2}(A+A^*) + \frac{1}{2}(A-A^*) \equiv B + C,$

where $B = B^*$ and $C = -C^*$. Then

$\notag \begin{aligned} \|A\|_F^2 = \|B + C\|_F^2 &= \mathrm{trace}((B+C)^*(B+C))\\ &= \mathrm{trace}(B^*B + C^*C) + \mathrm{trace}(B^*C + C^*B)\\ &= \mathrm{trace}(B^*B + C^*C) + \mathrm{trace}(BC - CB)\\ &= \mathrm{trace}(B^*B + C^*C)\\ &= \|B\|_F^2 + \|C\|_F^2. \end{aligned}$
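For a real matrix the splitting is into symmetric and skew-symmetric parts, and the identity can be checked numerically; the example matrix is arbitrary.

```python
# Check ||A||_F^2 = ||B||_F^2 + ||C||_F^2 for the symmetric part B and
# skew-symmetric part C of a real matrix A.

A = [[1.0, 2.0], [3.0, 4.0]]
n = 2
B = [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]
C = [[(A[i][j] - A[j][i]) / 2 for j in range(n)] for i in range(n)]

def frob2(M):
    """Squared Frobenius norm."""
    return sum(M[i][j] ** 2 for i in range(n) for j in range(n))

print(abs(frob2(A) - (frob2(B) + frob2(C))) < 1e-12)  # True
```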

If a matrix is not explicitly known but we can compute matrix–vector products with it then the trace can be estimated by

$\notag \mathrm{trace}(A) \approx x^TAx,$

where the vector $x$ has elements independently drawn from the standard normal distribution with mean $0$ and variance $1$. The expectation of this estimate is

$\notag \begin{aligned} E\bigl( x^TAx \bigr) &= E\bigl( \mathrm{trace}(x^TAx) \bigr) = E\bigl( \mathrm{trace}(Axx^T) \bigr) = \mathrm{trace} \bigl( E(Axx^T) \bigr) \\ &= \mathrm{trace}\bigl(A E(xx^T) \bigr) = \mathrm{trace}(A), \end{aligned}$

since $E(x_ix_j) = 0$ for $i\ne j$ and $E(x_i^2) = 1$ for all $i$, so $E(xx^T) = I$. This stochastic estimate, which is due to Hutchinson, is therefore unbiased.
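A minimal sketch of the estimator, averaging over many sample vectors to reduce the variance; the matrix and sample count are arbitrary choices for illustration.

```python
# Hutchinson's stochastic trace estimator: average x^T*A*x over vectors
# with independent standard normal entries.
import random

random.seed(42)
n = 4
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 5.0]]   # trace(A) = 14

def quad_form(A, x):
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Ax[i] for i in range(n))

samples = 20000
est = sum(quad_form(A, [random.gauss(0.0, 1.0) for _ in range(n)])
          for _ in range(samples)) / samples
print(round(est, 1))  # close to 14 for a large sample
```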

## Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

# What Is a Diagonalizable Matrix?

A matrix $A \in\mathbb{C}^{n\times n}$ is diagonalizable if there exists a nonsingular matrix $X\in\mathbb{C}^{n\times n}$ such that $X^{-1}AX$ is diagonal. In other words, a diagonalizable matrix is one that is similar to a diagonal matrix.

The condition $X^{-1}AX = D = \mathrm{diag}(\lambda_i)$ is equivalent to $AX = XD$ with $X$ nonsingular, that is, $Ax_i = \lambda_ix_i$, $i=1\colon n$, where $X = [x_1,x_2,\dots, x_n]$. Hence $A$ is diagonalizable if and only if it has a complete set of linearly independent eigenvectors.

A Hermitian matrix is diagonalizable because the eigenvectors can be taken to be mutually orthogonal. The same is true for a normal matrix (one for which $A^*A = AA^*$). A matrix with distinct eigenvalues is also diagonalizable.

Theorem 1.

If $A\in\mathbb{C}^{n\times n}$ has distinct eigenvalues then it is diagonalizable.

Proof. Let $A$ have eigenvalues $\lambda_1,\lambda_2,\dots,\lambda_n$ with corresponding eigenvectors $x_1,x_2,\dots,x_n$. Suppose that $y = \sum_{i=1}^n \alpha_i x_i = 0$ for some $\alpha_1,\alpha_2,\dots,\alpha_n$. Then

$\notag \begin{aligned} 0 &= (A-\lambda_2 I)\cdots (A-\lambda_n I)y = \sum_{i=1}^n \alpha_i (A-\lambda_2 I)\cdots (A-\lambda_n I) x_i\\ &= \sum_{i=1}^n \alpha_i (\lambda_i-\lambda_2)\cdots(\lambda_i-\lambda_n)x_i = \alpha_1 (\lambda_1-\lambda_2)\cdots(\lambda_1-\lambda_n)x_1, \end{aligned}$

which implies $\alpha_1 = 0$ since $\lambda_1\ne \lambda_j$ for $j\ge 2$ and $x_1\ne0$. Premultiplying $y = \sum_{i=2}^n \alpha_i x_i = 0$ by $\prod_{j=3}^n(A-\lambda_j I)$ shows, in the same way, that $\alpha_2 = 0$. Continuing in this way we find that $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$. Therefore the $x_i$ are linearly independent and hence $A$ is diagonalizable.

A matrix can have repeated eigenvalues and be diagonalizable, as diagonal matrices with repeated diagonal entries show. What is needed for diagonalizability is that every $k$-times repeated eigenvalue has $k$ linearly independent eigenvectors associated with it. Equivalently, the algebraic and geometric multiplicities of every eigenvalue must be equal, that is, the eigenvalues must all be semisimple. Another equivalent condition is that the degree of the minimal polynomial is equal to the number of distinct eigenvalues.

The simplest example of a matrix that is not diagonalizable is $\bigl[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr]$. This matrix is a $2\times 2$ Jordan block with the eigenvalue $0$. Diagonalizability is easily understood in terms of the Jordan canonical form: $A$ is diagonalizable if and only if all the Jordan blocks in its Jordan form are $1\times 1$.

Most matrices are diagonalizable, in the sense that the diagonalizable matrices are dense in $\mathbb{C}^{n\times n}$, that is, any matrix in $\mathbb{C}^{n\times n}$ is arbitrarily close to a diagonalizable matrix. This property is useful because it can be convenient to prove a result by first proving it for diagonalizable matrices and then arguing that by continuity the result holds for a general matrix.

Is a rank-$1$ matrix $A = xy^*$ diagonalizable, where $x,y\in\mathbb{C}^n$ are nonzero? There are $n-1$ zero eigenvalues with eigenvectors any set of linearly independent vectors orthogonal to $y$. If $y^*x \ne 0$ then $y^*x$ is the remaining eigenvalue, with eigenvector $x$, which is linearly independent of the eigenvectors for $0$, and $A$ is diagonalizable. If $y^*x = 0$ then all the eigenvalues of $A$ are zero and so $A$ cannot be diagonalizable, as the only diagonalizable matrix whose eigenvalues are all zero is the zero matrix. For the matrix $\bigl[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr]$ mentioned above, $x = \bigl[{1 \atop 0} \bigr]$ and $y = \bigl[{0 \atop 1} \bigr]$, so $y^*x = 0$, confirming that this matrix is not diagonalizable.
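The key identity behind this discussion, $A^2 = (y^Tx)A$ for $A = xy^T$ (so that $y^Tx$ is the only possibly nonzero eigenvalue), can be verified directly; the example vectors are arbitrary.

```python
# Check A^2 = (y^T*x)*A for a rank-1 matrix A = x*y^T with y^T*x nonzero.

x = [1.0, 2.0]
y = [3.0, -1.0]
n = 2
A = [[x[i] * y[j] for j in range(n)] for i in range(n)]
ytx = sum(y[i] * x[i] for i in range(n))  # the nonzero eigenvalue, here 1

A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
ok = all(abs(A2[i][j] - ytx * A[i][j]) < 1e-12
         for i in range(n) for j in range(n))
print(ok, ytx)  # True 1.0
```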


# What Is a Stochastic Matrix?

A stochastic matrix is an $n\times n$ matrix with nonnegative entries and unit row sums. If $A\in\mathbb{R}^{n\times n}$ is stochastic then $Ae = e$, where $e = [1,1,\dots,1]^T$ is the vector of ones. This means that $e$ is an eigenvector of $A$ corresponding to the eigenvalue $1$.

The identity matrix is stochastic, as is any permutation matrix. Here are some other examples of stochastic matrices:

$\notag \begin{aligned} A_n &= n^{-1}ee^T, \quad \textrm{in particular}~~ A_3 = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt] \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt] \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}, \qquad (1)\\ B_n &= \frac{1}{n-1}(ee^T -I), \quad \textrm{in particular}~~ B_3 = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2}\\[2pt] \frac{1}{2} & 0 & \frac{1}{2}\\[2pt] \frac{1}{2} & \frac{1}{2} & 0 \end{bmatrix}, \qquad (2)\\ C_n &= \begin{bmatrix} 1 & & & \\ \frac{1}{2} & \frac{1}{2} & & \\ \vdots & \vdots &\ddots & \\ \frac{1}{n} & \frac{1}{n} &\cdots & \frac{1}{n} \end{bmatrix}. \qquad (3)\\ \end{aligned}$

For any matrix $A$, the spectral radius $\rho(A) = \max\{ |\lambda| : \lambda \mathrm{~is~an~eigenvalue~of~} A \}$ satisfies $\rho(A) \le \|A\|$ for any consistent matrix norm. For a stochastic matrix, taking the $\infty$-norm (the maximum row sum of absolute values) gives $\rho(A) \le 1$, and since we know that $1$ is an eigenvalue, $\rho(A) = 1$. It can be shown that $1$ is a semisimple eigenvalue, that is, if there are $k$ eigenvalues equal to $1$ then there are $k$ linearly independent eigenvectors corresponding to $1$ (Meyer, 2000, p. 696).

A lower bound on the spectral radius can be obtained from Gershgorin’s theorem. The $i$th Gershgorin disc is defined by $|\lambda - a_{ii}| \le \sum_{j\ne i} |a_{ij}| = 1 - a_{ii}$, which implies $|\lambda| \ge 2a_{ii}-1$. Every eigenvalue $\lambda$ lies in the union of the $n$ discs and so must satisfy

$\notag 2\min_i a_{ii}-1 \le |\lambda| \le 1.$

The lower bound is positive if $A$ is strictly diagonally dominant by rows.

If $A$ and $B$ are stochastic then $AB$ is nonnegative and $ABe = Ae = e$, so $AB$ is stochastic. In particular, any positive integer power of $A$ is stochastic. Does $A^k$ converge as $k\to\infty$? The answer is that it does, and the limit is stochastic, as long as $1$ is the only eigenvalue of modulus $1$, and this will be the case if all the elements of $A$ are positive (by Perron’s theorem). The simplest example of non-convergence is the stochastic matrix

$\notag A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$

which has eigenvalues $1$ and $-1$. Since $A^2 = I$, all even powers are equal to $I$ and all odd powers are equal to $A$. For the matrix (1), $A_n^k = A_n$ for all $k$, while for (2), $B_n^k \to A_n$ as $k\to\infty$. For (3), $C_n^k$ converges to the matrix with $1$ in every entry of the first column and zeros everywhere else.
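The convergence $B_n^k \to A_n$ claimed above for the matrix (2) can be observed numerically for $n = 3$; this is a small self-contained check.

```python
# Repeated multiplication by B_3 = (ee^T - I)/2 to illustrate B_3^k -> A_3,
# the matrix with every entry equal to 1/3.

n = 3
B = [[0.0 if i == j else 0.5 for j in range(n)] for i in range(n)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = B
for _ in range(30):
    P = matmul(P, B)   # P = B^31 after the loop

ok = all(abs(P[i][j] - 1/3) < 1e-8 for i in range(n) for j in range(n))
print(ok)  # True: every entry is essentially 1/3
```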

Stochastic matrices arise in Markov chains. A transition matrix for a finite homogeneous Markov chain is a matrix whose $(i,j)$ element is the probability of moving from state $i$ to state $j$ over a time step. It has nonnegative entries and the rows sum to $1$, so it is a stochastic matrix. In applications including finance and healthcare, a transition matrix may be estimated for a certain time period, say one year, but a transition matrix for a shorter period, say one month, may be needed. If $A$ is a transition matrix for a time period $p$ then a stochastic $p$th root of $A$, which is a stochastic solution $X$ of the equation $X^p = A$, is a transition matrix for a time period a factor $p$ smaller. Therefore $p = 12$ (years to months) and $p = 7$ (weeks to days) are among the values of interest. Unfortunately, a stochastic $p$th root may not exist. For example, the matrix

$\notag \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$

has no $p$th roots at all, let alone stochastic ones. Yet many stochastic matrices do have stochastic roots. The matrix (3) has a stochastic $p$th root for all $p$, as shown by Higham and Lin (2011), who give a detailed analysis of $p$th roots of stochastic matrices. For example, to three decimal places,

$\notag C_6^{1/7} = \begin{bmatrix} 1.000 & & & & & \\ 0.094 & 0.906 & & & & \\ 0.043 & 0.102 & 0.855 & & & \\ 0.027 & 0.050 & 0.103 & 0.820 & & \\ 0.019 & 0.032 & 0.052 & 0.103 & 0.795 & \\ 0.014 & 0.023 & 0.034 & 0.053 & 0.102 & 0.774 \\ \end{bmatrix}.$

A stochastic matrix is sometimes called a row-stochastic matrix to distinguish it from a column-stochastic matrix, which is a nonnegative matrix for which $e^TA = e^T$ (so that $A^T$ is row-stochastic). A matrix that is both row-stochastic and column-stochastic is called doubly stochastic. A permutation matrix is an example of a doubly stochastic matrix. If $U$ is a unitary matrix then the matrix with $a_{ij} = |u_{ij}|^2$ is doubly stochastic. A magic square scaled by the magic sum is also doubly stochastic. For example,

>> M = magic(4), A = M/sum(M(1,:)) % A is doubly stochastic.
M =
16     2     3    13
5    11    10     8
9     7     6    12
4    14    15     1
A =
4.7059e-01   5.8824e-02   8.8235e-02   3.8235e-01
1.4706e-01   3.2353e-01   2.9412e-01   2.3529e-01
2.6471e-01   2.0588e-01   1.7647e-01   3.5294e-01
1.1765e-01   4.1176e-01   4.4118e-01   2.9412e-02
>> [sum(A) sum(A')]
ans =
1     1     1     1     1     1     1     1
>> eig(A)'
ans =
1.0000e+00   2.6307e-01  -2.6307e-01   8.5146e-18



# What Is the Inertia of a Matrix?

The inertia of a real symmetric $n\times n$ matrix $A$ is a triple, written $\mathrm{In}(A) = (i_+(A),i_-(A),i_0(A))$, where $i_+(A)$ is the number of positive eigenvalues of $A$, $i_-(A)$ is the number of negative eigenvalues of $A$, and $i_0(A)$ is the number of zero eigenvalues of $A$.

The rank of $A$ is $i_+(A) + i_-(A)$. The difference $i_+(A) - i_-(A)$ is called the signature.

In general it is not possible to determine the inertia by inspection, but some deductions can be made. If $A$ has both positive and negative diagonal elements then $i_+(A) \ge 1$ and $i_-(A) \ge 1$, since $e_i^TAe_i = a_{ii}$. But in general the diagonal elements do not tell us much about the inertia. For example, here is a matrix that has positive diagonal elements but only one positive eigenvalue (and this example works for any $n$):

>> n = 4; A = -eye(n) + 2*ones(n), eigA = eig(sym(A))'
A =
1     2     2     2
2     1     2     2
2     2     1     2
2     2     2     1
eigA =
[-1, -1, -1, 7]


A congruence transformation of a symmetric matrix $A$ is a transformation $A \to X^T\!AX$ for a nonsingular matrix $X$. The result is clearly symmetric. Sylvester’s law of inertia (1852) says that the inertia is preserved under congruence transformations.

Theorem 1 (Sylvester’s law of inertia).

If $A\in\mathbb{R}^{n\times n}$ is symmetric and $X\in\mathbb{R}^{n\times n}$ is nonsingular then $\mathrm{In}(A) = \mathrm{In}(X^T\!AX)$.

Sylvester’s law gives a way to determine the inertia without computing eigenvalues: find a congruence transformation that transforms $A$ to a matrix whose inertia can be easily determined. A factorization $PAP^T = LDL^T$ does the job, where $P$ is a permutation matrix, $L$ is unit lower triangular, and $D$ is diagonal. Then $\mathrm{In}(A) = \mathrm{In}(D)$, and $\mathrm{In}(D)$ can be read off the diagonal of $D$. This factorization does not always exist, and if it does exist its computation can be numerically unstable. A block $LDL^T$ factorization, in which $D$ is block diagonal with diagonal blocks of size $1$ or $2$, always exists, and its computation is numerically stable with a suitable pivoting strategy such as symmetric rook pivoting.
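As an illustration only, here is a bare-bones $LDL^T$ factorization without pivoting (so it assumes all pivots are nonzero and is not the numerically stable block algorithm just described), from which the inertia is read off the diagonal of $D$; the example matrix is arbitrary.

```python
# Inertia of a symmetric matrix via an unpivoted LDL^T factorization.

def ldl_inertia(A):
    """Return (n_+, n_-, n_0) for symmetric A, assuming the unpivoted
    LDL^T factorization exists (all pivots d[k] nonzero)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for k in range(n):
        d[k] = A[k][k] - sum(L[k][j] ** 2 * d[j] for j in range(k))
        L[k][k] = 1.0
        for i in range(k + 1, n):
            L[i][k] = (A[i][k]
                       - sum(L[i][j] * d[j] * L[k][j] for j in range(k))) / d[k]
    pos = sum(1 for di in d if di > 0)
    neg = sum(1 for di in d if di < 0)
    return pos, neg, n - pos - neg

A = [[4.0, 2.0, 0.0],
     [2.0, -3.0, 1.0],
     [0.0, 1.0, 5.0]]
print(ldl_inertia(A))  # (2, 1, 0): two positive, one negative eigenvalue
```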

For the matrix above we can compute a block $LDL^T$ factorization using the MATLAB ldl function:

>> [L,D,P] = ldl(A); D
D =
1.0000e+00   2.0000e+00            0            0
2.0000e+00   1.0000e+00            0            0
0            0  -1.6667e+00            0
0            0            0  -1.4000e+00


Since the leading 2-by-2 block of $D$ has negative determinant and hence one positive eigenvalue and one negative eigenvalue, it follows that $A$ has one positive eigenvalue and three negative eigenvalues.

A congruence transformation preserves the signs of the eigenvalues but not their magnitudes. A result of Ostrowski (1959) bounds the ratios of the eigenvalues of the original and transformed matrices. Let the eigenvalues of a symmetric matrix be ordered $\lambda_n \le \cdots \le \lambda_1$.

Theorem (Ostrowski).

For a symmetric $A\in \mathbb{R}^{n\times n}$ and $X\in\mathbb{R}^{n\times n}$,

$\notag \lambda_k(X^TAX) = \theta_k \lambda_k(A), \quad k=1\colon n,$

where $\lambda_n(X^TX) \le \theta_k \le \lambda_1(X^TX)$.

The theorem shows that the further $X$ is from being orthogonal the greater the potential change in the eigenvalues.

Finally, we note that everything here generalizes to complex Hermitian matrices by replacing transpose by conjugate transpose.


# What Is Gershgorin’s Theorem?

For a given $n\times n$ matrix, Gershgorin’s theorem defines $n$ discs in the complex plane whose union contains the eigenvalues of the matrix. The theorem can provide approximations to eigenvalues. It can also provide qualitative information, such as that all the eigenvalues lie in a particular half-plane.

Theorem 1 (Gershgorin’s theorem).

The eigenvalues of $A\in\mathbb{C}^{n\times n}$ lie in the union of the $n$ discs in the complex plane

$\notag D_i = \Big\{ z\in\mathbb{C}: |z-a_{ii}| \le \displaystyle\sum_{j\ne i} |a_{ij}|\Big\}, \quad i=1\colon n.$

Proof. Let $\lambda$ be an eigenvalue of $A$ with corresponding eigenvector $x$, and let $|x_k| = \|x\|_\infty$. From the $k$th equation in $Ax=\lambda x$ we have

$\notag \sum_{j=1 \atop j\ne k}^n a_{kj}x_j = (\lambda-a_{kk}) x_k.$

Hence

$\notag |\lambda-a_{kk}| \le \sum_{j=1 \atop j\ne k}^n |a_{kj}||x_j|/|x_k|,$

and since $|x_j|/|x_k|\le 1$ it follows that $\lambda$ belongs to the $k$th disc, $D_k$.

The Gershgorin discs $D_i$ are defined in terms of a summation over the rows of $A$, but since the eigenvalues of $A$ are the same as those of $A^T$ the same result holds with summation over the columns.

A consequence of the theorem is that if $0$ does not belong to any of the Gershgorin discs then $A$ is nonsingular. This is equivalent to the well-known result that a strictly diagonally dominant matrix is nonsingular.
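A sketch of the disc computation and the nonsingularity test; the example matrix is an arbitrary strictly diagonally dominant one.

```python
# Gershgorin discs as (center, radius) pairs, plus the nonsingularity test:
# if no disc contains 0 then the matrix is nonsingular.

def gershgorin_discs(A):
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]

A = [[4.0, 1.0, 1.0],
     [0.5, -5.0, 1.0],
     [1.0, 1.0, 6.0]]
discs = gershgorin_discs(A)
print(discs)  # [(4.0, 2.0), (-5.0, 1.5), (6.0, 2.0)]

nonsingular = all(abs(c) > r for c, r in discs)  # 0 lies outside every disc
print(nonsingular)  # True
```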

Another consequence of the theorem is that if $A$ is symmetric and all the discs lie in the open right half-plane then the eigenvalues are positive and so $A$ is positive definite. This condition is equivalent to $A$ having positive diagonal elements and being strictly diagonally dominant.

The Gershgorin discs for the $5\times 5$ matrix

$\notag \left[\begin{array}{ccccc} 5/4 & 1 & 3/4 & 1/2 & 1/4 \\ 1 & 0 & 0 & 0 & 0\\ -1 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 3 & 0\\ 0 & 0 & 0 & 1/2 & 5 \end{array}\right]$

are shown here:

The eigenvalues—three real and one complex conjugate pair—are the black dots. It happens that each disc contains an eigenvalue, but this is not always the case. For the matrix

$\notag \left[\begin{array}{cc} 2 & -1\\ 2 & 0 \end{array}\right]$

the discs are

and the blue disc does not contain an eigenvalue. The next result, which is proved by a continuity argument, provides additional information that increases the utility of Gershgorin’s theorem. In particular it says that if a disc is disjoint from the other discs then it contains an eigenvalue.

Theorem 2.

If $k$ of the Gershgorin discs of $A\in\mathbb{C}^{n\times n}$ are disjoint from the other discs then their union contains $k$ eigenvalues of $A$.

Theorem 2 tells us that the rightmost disc in our $5\times 5$ example contains one eigenvalue, $\lambda$, since it is disjoint from the other discs, and the union of the other four discs contains four eigenvalues. Furthermore, $\lambda$ must be real, because if not it occurs in a complex conjugate pair since the matrix is real, and as the disc is symmetric about the real axis $\overline{\lambda}$ would also lie in the disc, contradicting Theorem 2.

Gershgorin’s theorem is most useful for matrices that are close to being diagonal. A technique that can produce improved eigenvalue estimates is to apply the theorem to $D^{-1}AD$ for some nonsingular diagonal matrix $D$. This similarity transformation does not change the eigenvalues but it does change the discs, and the aim is to choose $D$ to reduce the radii of the discs. Consider our $5\times 5$ example. We know that there is one eigenvalue $\lambda$ in the rightmost disc and that it is real, so $4.5 \le \lambda \le 5.5$. For $D = \mathrm{diag}(1,1,1,1,4)$ the rightmost disc shrinks and remains disjoint from the others and we obtain the sharper bounds $4.875 \le \lambda \le 5.125$. The discs for $D^{-1}AD$ are shown here:
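A sketch of the scaled-disc computation in exact rational arithmetic, using the relation $(D^{-1}AD)_{ij} = a_{ij}d_j/d_i$ with the scaling $d = (1,1,1,1,4)$, which shrinks the last disc of the $5\times 5$ example to radius $1/8$.

```python
# Compute the last Gershgorin disc of D^{-1}*A*D for the 5x5 example.
from fractions import Fraction as F

A = [[F(5, 4), F(1), F(3, 4), F(1, 2), F(1, 4)],
     [F(1), F(0), F(0), F(0), F(0)],
     [F(-1), F(1), F(0), F(0), F(0)],
     [F(0), F(0), F(1), F(3), F(0)],
     [F(0), F(0), F(0), F(1, 2), F(5)]]
d = [F(1), F(1), F(1), F(1), F(4)]
n = 5

B = [[A[i][j] * d[j] / d[i] for j in range(n)] for i in range(n)]
center = B[4][4]
radius = sum(abs(B[4][j]) for j in range(n) if j != 4)
print(center - radius, center + radius)  # 39/8 41/8, i.e., 4.875 and 5.125
```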

## Notes

Most books on matrix analysis or numerical linear algebra include Gershgorin’s theorem.

Eigenvalue inclusion regions have been developed with discs replaced by more complicated shapes, such as Brauer’s ovals of Cassini.

Varga’s 2004 book is devoted to Gershgorin’s theorem and related results. It reproduces Gershgorin’s 1931 paper in an appendix.

## References

• S. Gerschgorin. Über die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk SSSR, 1:749–754, 1931.
• Richard S. Varga. Geršgorin and His Circles. Springer-Verlag, Berlin, Germany, 2004.


# What Is a Nilpotent Matrix?

An $n\times n$ matrix $A$ is nilpotent if $A^k =0$ for some positive integer $k$. A nonzero nilpotent matrix cannot have all its entries positive, for then every power would have all positive entries; zero or negative entries are needed for cancellation to take place in the matrix powers. The smallest $k$ for which $A^k =0$ is called the index of nilpotency. The index does not exceed $n$, as we will see below.

Here are some examples of nilpotent matrices.

$\notag \begin{aligned} A_1 &= \begin{bmatrix}0 & 1 \\ 0 & 0 \end{bmatrix}, \quad A_1^2=0,\\ A_2 &= \begin{bmatrix}0 & 1 & 1\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{bmatrix}, \quad A_2^3 = 0,\\ A_3 &= \begin{bmatrix}1 & -1 \\ 1 & -1 \end{bmatrix}, \quad A_3^2 = 0,\\ A_4 &= \begin{bmatrix}2 & 2 & 4\\ -4 & -4 & -8\\ 1 & 1 & 2 \end{bmatrix}, \quad A_4^2 = 0. \end{aligned}$

Matrix $A_1$ is the $2\times 2$ instance of the upper bidiagonal $p\times p$ matrix

$\notag N = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix}, \qquad (1)$

for which

$\notag N^2 = \begin{bmatrix} 0 & 0 & 1 & & \\ & 0 & \ddots & \ddots & \\ & & & \ddots & 1 \\ & & & \ddots & 0 \\ & & & & 0 \end{bmatrix}, \quad \dots, \quad N^{p-1} = \begin{bmatrix} 0 & 0 & \dots & 0 & 1 \\ & 0 & \ddots & & 0 \\ & & \ddots & \ddots & \vdots \\ & & & 0 & 0 \\ & & & & 0 \end{bmatrix}$

and $N^p = 0$. The superdiagonal of ones moves up to the right with each increase in the index of the power until it disappears off the top right corner of the matrix.
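This behavior of the powers of $N$ can be verified directly; here $p = 4$ is an arbitrary choice.

```python
# Powers of the p x p shift matrix N of (1): N^(p-1) is nonzero, N^p = 0.

p = 4
N = [[1 if j == i + 1 else 0 for j in range(p)] for i in range(p)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(p)) for j in range(p)]
            for i in range(p)]

Nk = N
for _ in range(p - 2):
    Nk = matmul(Nk, N)   # now Nk = N^(p-1)
print(any(v != 0 for row in Nk for v in row))  # True: N^(p-1) is nonzero
Np = matmul(Nk, N)
print(all(v == 0 for row in Np for v in row))  # True: N^p = 0
```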

Matrix $A_4$ has rank $1$ and was constructed using a general formula: if $A = xy^T$ with $y^Tx = 0$ then $A^2 = xy^T xy^T = (y^Tx) xy^T = 0$. We simply took orthogonal vectors $x =[2, -4, 1]^T$ and $y = [1, 1, 2]^T$.

If $A$ is nilpotent then every eigenvalue is zero, since $Ax = \lambda x$ with $x\ne 0$ implies $0 = A^kx = \lambda^k x$, or $\lambda = 0$. Consequently, the trace and determinant of a nilpotent matrix are both zero.

If $A$ is nilpotent and Hermitian or symmetric, or more generally normal ($A^*A = AA^*$), then $A = 0$, since such a matrix has a spectral decomposition $A = Q \mathrm{diag}(\lambda_i)Q^*$ and the matrix $\mathrm{diag}(\lambda_i)$ is zero. It is only for nonnormal matrices that nilpotency is a nontrivial property, and the best way to understand it is with the Jordan canonical form (JCF). The JCF of a matrix with only zero eigenvalues has the form $A = XJX^{-1}$, where $J = \mathrm{diag}(J_{m_1},J_{m_2}, \dots, J_{m_p})$ and $J_{m_i}$ is of the form (1), so that $J_{m_i}^{m_i} = 0$. It follows that the index of nilpotency is $k = \max\{\,m_i : i=1\colon p\,\} \le n$.

What is the rank of an $n\times n$ nilpotent matrix $A$? The minimum possible rank is $0$, attained for the zero matrix. The maximum possible rank is $n-1$, attained when the JCF of $A$ has just one Jordan block of size $n$. Any rank between $0$ and $n-1$ is possible: rank $j$ is attained when there is a Jordan block of size $j+1$ and all other blocks are $1\times 1$.

Finally, while a nilpotent matrix is obviously not invertible, like every matrix it has a Moore–Penrose pseudoinverse. The pseudoinverse of a Jordan block with eigenvalue zero is just the transpose of the block: $N^+ = N^T$ for $N$ in (1).


# What Is an Eigenvalue?

An eigenvalue of a square matrix $A$ is a scalar $\lambda$ such that $Ax = \lambda x$ for some nonzero vector $x$. The vector $x$ is an eigenvector of $A$ and it has the distinction of being a direction that is not changed on multiplication by $A$.

An $n\times n$ matrix has $n$ eigenvalues, counted with multiplicity. This can be seen by noting that $Ax = \lambda x$ is equivalent to $(\lambda I - A) x = 0$, which means that $\lambda I - A$ is singular, since $x\ne 0$. Hence $\det(\lambda I - A) = 0$. But

$\notag \det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1 \lambda + a_0$

is a scalar polynomial of degree $n$ (the characteristic polynomial of $A$) with nonzero leading coefficient and so has $n$ roots, which are the eigenvalues of $A$. Since $\det(\lambda I - A) = \det( (\lambda I - A)^T) = \det(\lambda I - A^T)$, the eigenvalues of $A^T$ are the same as those of $A$.

A real matrix may have complex eigenvalues, but they appear in complex conjugate pairs. Indeed $Ax = \lambda x$ implies $\overline{A}\overline{x} = \overline{\lambda} \overline{x}$, so if $A$ is real then $\overline{\lambda}$ is an eigenvalue of $A$ with eigenvector $\overline{x}$.

Here are some $2\times 2$ matrices and their eigenvalues.

$\notag \begin{aligned} A_1 &= \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \lambda = 1,1,\\ A_2 &= \begin{bmatrix}0 & 1 \\ 0 & 0 \end{bmatrix}, \quad \lambda = 0,0,\\ A_3 &= \begin{bmatrix}0 & 1 \\ -1 & 0 \end{bmatrix}, \quad \lambda = \mathrm{i}, -\mathrm{i}. \end{aligned}$

Note that $A_1$ and $A_2$ are upper triangular, that is, $a_{ij} = 0$ for $i>j$. For such a matrix the eigenvalues are the diagonal elements.
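For the $2\times 2$ case the eigenvalues can be computed from the characteristic polynomial $t^2 - \mathrm{trace}(A)\,t + \det(A) = 0$; here is a small sketch applied to the three matrices above.

```python
# Eigenvalues of a 2x2 matrix via the quadratic formula applied to the
# characteristic polynomial; cmath handles the complex case.
import cmath

def eig2(A):
    t = A[0][0] + A[1][1]                       # trace
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # determinant
    s = cmath.sqrt(t * t - 4 * d)
    return (t + s) / 2, (t - s) / 2

print(eig2([[1, 0], [0, 1]]))   # eigenvalues 1, 1
print(eig2([[0, 1], [0, 0]]))   # eigenvalues 0, 0
print(eig2([[0, 1], [-1, 0]]))  # eigenvalues i, -i
```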

A symmetric matrix ($A^T = A$) or Hermitian matrix ($A^* = A$, where $A^* = \overline{A}^T$) has real eigenvalues. A proof is $Ax = \lambda x \Rightarrow x^*A^* = \overline{\lambda} x^*$ so premultiplying the first equation by $x^*$ and postmultiplying the second by $x$ gives $x^*Ax = \lambda x^*x$ and $x^*Ax = \overline{\lambda} x^*x$, which means that $(\lambda-\overline{\lambda})x^*x = 0$, or $\lambda=\overline{\lambda}$ since $x^*x \ne 0$. The matrix $A_1$ above is symmetric.

A skew-symmetric matrix ($A^T = -A$) or skew-Hermitian complex matrix ($A^* = -A$) has pure imaginary eigenvalues. A proof is similar to the Hermitian case: $Ax = \lambda x \Rightarrow -x^*A = x^*A^* = \overline{\lambda} x^*$ and so $x^*Ax$ is equal to both $\lambda x^*x$ and $-\overline{\lambda} x^*x$, so $\lambda = -\overline{\lambda}$. The matrix $A_3$ above is skew-symmetric.

In general, the eigenvalues of a matrix $A$ can lie anywhere in the complex plane, subject to restrictions based on matrix structure such as symmetry or skew-symmetry, but they are restricted to the disc centered at the origin with radius $\|A\|$, because every eigenvalue satisfies $|\lambda| \le \|A\|$ for any consistent matrix norm $\|\cdot\|$.

Here are some example eigenvalue distributions, computed in MATLAB. (The eigenvalues are computed at high precision using the Advanpix Multiprecision Computing Toolbox in order to ensure that rounding errors do not affect the plots.) The second and third matrices are real, so the eigenvalues are symmetrically distributed about the real axis. (The first matrix is complex.)

Although this article is about eigenvalues we need to say a little more about eigenvectors. An $n\times n$ matrix $A$ with distinct eigenvalues has $n$ linearly independent eigenvectors. Indeed it is diagonalizable: $A = XDX^{-1}$ for some nonsingular matrix $X$ with $D = \mathrm{diag}(\lambda_i)$ the matrix of eigenvalues. If we write $X$ in terms of its columns as $X = [x_1,x_2,\dots,x_n]$ then $AX = XD$ is equivalent to $Ax_i = \lambda _i x_i$, $i=1\colon n$, so the $x_i$ are eigenvectors of $A$. The matrices $A_1$ and $A_3$ above both have two linearly independent eigenvectors.

If there are repeated eigenvalues there can be fewer than $n$ linearly independent eigenvectors. The matrix $A_2$ above has only one eigenvector: the vector $\left[\begin{smallmatrix}1 \\ 0 \end{smallmatrix}\right]$ (or any nonzero scalar multiple of it). This matrix is a Jordan block. The matrix $A_1$ shows that a matrix with repeated eigenvalues can nevertheless have a full set of linearly independent eigenvectors.

Here are some questions about eigenvalues.

• What matrix decompositions reveal eigenvalues? The answer is the Jordan canonical form and the Schur decomposition. The Jordan canonical form shows how many linearly independent eigenvectors are associated with each eigenvalue.
• Can we obtain better bounds on where eigenvalues lie in the complex plane? Many results are available, of which the most well-known is Gershgorin’s theorem.
• How can we compute eigenvalues? Various methods are available. The QR algorithm is widely used and is applicable to all types of eigenvalue problems.

Finally, we note that the concept of eigenvalue is more general than just for matrices: it extends to nonlinear operators on finite or infinite dimensional spaces.

## References

Many books include treatments of eigenvalues of matrices. We give just three examples.

• Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013.
• Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013.
• Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.


# What Is a Symmetric Indefinite Matrix?

A symmetric indefinite matrix $A$ is a symmetric matrix for which the quadratic form $x^TAx$ takes both positive and negative values. By contrast, for a positive definite matrix $x^TAx > 0$ for all nonzero $x$ and for a negative definite matrix $x^TAx < 0$ for all nonzero $x$.

A neat way to express the indefiniteness of $A$ is to say that there exist vectors $x$ and $y$ for which $(x^TAx)(y^TAy) < 0$.

A symmetric indefinite matrix has both positive and negative eigenvalues and in some sense is a typical symmetric matrix. For example, a random symmetric matrix is usually indefinite:

>> rng(3); B = rand(4); A = B + B'; eig(A)'
ans =
-8.9486e-01  -6.8664e-02   1.1795e+00   3.9197e+00


In general it is difficult to tell whether a symmetric matrix is indefinite or definite, but there is one easy-to-spot sufficient condition for indefiniteness: if the matrix has a zero diagonal element whose row contains a nonzero element then it is indefinite. Indeed if $a_{kk} = 0$ then $e_k^TAe_k = a_{kk} = 0$, where $e_k$ is the $k$th unit vector, so $A$ cannot be positive definite or negative definite. The existence of a nonzero element in the row of the zero diagonal element rules out the matrix being positive semidefinite ($x^TAx \ge 0$ for all $x$) or negative semidefinite ($x^TAx \le 0$ for all $x$).
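As a quick numerical illustration (a made-up example in Python, not from the original post), a symmetric matrix with a zero diagonal entry and a nonzero entry in that row has eigenvalues of both signs:

```python
import numpy as np

# A made-up symmetric matrix with a_11 = 0 and a nonzero entry in row 1:
# by the argument above it must be indefinite.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 2.0, 3.0],
              [0.0, 3.0, 4.0]])
evals = np.linalg.eigvalsh(A)
assert evals.min() < 0 < evals.max()   # eigenvalues of both signs
```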

An example of a symmetric indefinite matrix is a saddle point matrix, which has the block $2\times 2$ form

$\notag C = \begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix},$

where $A$ is symmetric positive definite and $B\ne0$. When $A$ is the identity matrix, $C$ is the augmented system matrix associated with the least squares problem $\min_y \|B^Ty - d\|_2$. Another example is the $n\times n$ reverse identity matrix $J_n$, illustrated by

$\notag J_4 = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix},$

which has eigenvalues $\pm1$ (exercise: how many $1$s and how many $-1$s?). A third example is a Toeplitz tridiagonal matrix with zero diagonal:

>> A = full(gallery('tridiag',5,1,0,1)), eig(sym(A))'
A =
     0     1     0     0     0
     1     0     1     0     0
     0     1     0     1     0
     0     0     1     0     1
     0     0     0     1     0
ans =
[-1, 0, 1, 3^(1/2), -3^(1/2)]
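The $\pm1$ eigenvalue counts of the reverse identity $J_n$ above can be read off from the trace, since $J_n$ is symmetric and orthogonal: there are $p = (n + \mathrm{trace}(J_n))/2$ eigenvalues $1$ and $q = (n - \mathrm{trace}(J_n))/2$ eigenvalues $-1$. A Python sketch:

```python
import numpy as np

# J_n is symmetric and orthogonal, so its eigenvalues are +/-1 and the
# trace gives the counts: p = (n + trace)/2 ones and q = (n - trace)/2
# minus-ones.  trace(J_n) is 0 for even n and 1 for odd n.
for n in (4, 5):
    J = np.fliplr(np.eye(n))                 # reverse identity J_n
    t = np.trace(J)
    p, q = (n + t) / 2, (n - t) / 2
    evals = np.linalg.eigvalsh(J)
    assert p == np.sum(np.isclose(evals, 1))
    assert q == np.sum(np.isclose(evals, -1))
```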


How can we exploit symmetry in solving a linear system $Ax = b$ with a symmetric indefinite matrix $A$? A Cholesky factorization does not exist, but we could try to compute a factorization $A = LDL^T$, where $L$ is unit lower triangular and $D$ is diagonal with both positive and negative diagonal entries. However, this factorization does not always exist and if it does, its computation in floating-point arithmetic can be numerically unstable. The simplest example of nonexistence is the matrix

$\notag \begin{bmatrix} 0 & 1\\ 1 & 1 \end{bmatrix} \ne \begin{bmatrix} 1 & 0 \\ \ell_{21} & 1 \end{bmatrix} \begin{bmatrix} d_{11} & 0 \\ 0 & d_{22}\end{bmatrix} \begin{bmatrix} 1 & \ell_{21}\\ 0 & 1\end{bmatrix}.$

The way round this is to allow $D$ to have $2 \times 2$ blocks. We can compute a block $\mathrm{LDL^T}$ factorization $PAP^T = LDL^T$, where $P$ is a permutation matrix, $L$ is unit lower triangular, and $D$ is block diagonal with diagonal blocks of size $1$ or $2$. Various pivoting strategies, which determine $P$, are possible, but the recommended one is the symmetric rook pivoting strategy of Ashcraft, Grimes, and Lewis (1998), which has the key property of producing a bounded $L$ factor. Solving $Ax = b$ now reduces to substitutions with $L$ and a solve with $D$, which involves solving $2\times 2$ linear systems for the $2\times 2$ blocks and doing divisions for the $1\times 1$ blocks (scalars).

MATLAB implements $\mathrm{LDL^T}$ factorization in its ldl function. Here is an example using Anymatrix:

>> A = anymatrix('core/blockhouse',4), [L,D,P] = ldl(A), eigA = eig(A)'
A =
  -4.0000e-01  -8.0000e-01  -2.0000e-01   4.0000e-01
  -8.0000e-01   4.0000e-01  -4.0000e-01  -2.0000e-01
  -2.0000e-01  -4.0000e-01   4.0000e-01  -8.0000e-01
   4.0000e-01  -2.0000e-01  -8.0000e-01  -4.0000e-01
L =
   1.0000e+00            0            0            0
            0   1.0000e+00            0            0
   5.0000e-01  -8.3267e-17   1.0000e+00            0
  -2.2204e-16  -5.0000e-01            0   1.0000e+00
D =
  -4.0000e-01  -8.0000e-01            0            0
  -8.0000e-01   4.0000e-01            0            0
            0            0   5.0000e-01  -1.0000e+00
            0            0  -1.0000e+00  -5.0000e-01
P =
     1     0     0     0
     0     1     0     0
     0     0     1     0
     0     0     0     1
eigA =
  -1.0000e+00  -1.0000e+00   1.0000e+00   1.0000e+00


Notice the $2\times 2$ blocks on the diagonal of $D$, each of which contains one negative eigenvalue and one positive eigenvalue. The eigenvalues of $D$ are not the same as those of $A$, but since $A$ and $D$ are congruent they have the same number of positive, zero, and negative eigenvalues.
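The factorization and the congruence argument can also be reproduced in Python, as a sketch, with SciPy's scipy.linalg.ldl, which computes a block $\mathrm{LDL^T}$ factorization via LAPACK (using Bunch–Kaufman partial pivoting rather than rook pivoting); the matrix here is a made-up symmetric indefinite example:

```python
import numpy as np
from scipy.linalg import ldl

# Block LDL^T factorization of a made-up symmetric indefinite matrix.
A = np.array([[0.0, 1.0, 2.0],
              [1.0, 1.0, 3.0],
              [2.0, 3.0, 1.0]])
L, D, perm = ldl(A)                  # L[perm] is unit lower triangular
assert np.allclose(L @ D @ L.T, A)   # the factorization reproduces A
# A and D are congruent, so by Sylvester's law of inertia they have the
# same numbers of positive, zero, and negative eigenvalues:
sA = np.sort(np.sign(np.linalg.eigvalsh(A)))
sD = np.sort(np.sign(np.linalg.eigvalsh(D)))
assert np.array_equal(sA, sD)
```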


# What Is a Toeplitz Matrix?

A matrix $T\in\mathbb{C}^{n\times n}$ is a Toeplitz matrix if $t_{ij} = t_{i-j}$, so that $T$ is defined by the $2n-1$ parameters $t_{1-n},\dots,t_{n-1}$. A Toeplitz matrix has constant diagonals. For $n = 4$:

$\notag T = \begin{bmatrix} t_0 & t_{-1} & t_{-2} & t_{-3}\\ t_1 & t_0 & t_{-1} & t_{-2}\\ t_2 & t_1 & t_0 & t_{-1}\\ t_3 & t_2 & t_1 & t_0 \end{bmatrix}.$

Toeplitz matrices arise in various problems, including analysis of time series, discretization of constant coefficient differential equations, and discretization of convolution equations $\int a(t-s)x(s)\,\mathrm{d} s = b(t)$.

Since a Toeplitz matrix depends on just $2n-1$ parameters, it is reasonable to expect that a linear system $Tx = b$ can be solved in fewer than the $O(n^3)$ flops that LU factorization would require. Indeed, methods are available that require only $O(n^2)$ flops; see Golub and Van Loan (2013) for details.
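A sketch of such an $O(n^2)$ solve, using SciPy's solve_toeplitz (based on Levinson recursion) on a symmetric positive definite Toeplitz matrix with entries $\rho^{|i-j|}$, $\rho = 1/2$:

```python
import numpy as np
from scipy.linalg import toeplitz, solve_toeplitz

# Solve Tx = b in O(n^2) flops by Levinson recursion.  T is the symmetric
# positive definite Toeplitz matrix with entries (1/2)^|i-j|.
n = 8
c = 0.5 ** np.arange(n)        # first column (and, by symmetry, first row)
b = np.ones(n)
x = solve_toeplitz(c, b)
assert np.allclose(toeplitz(c) @ x, b)
```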

Upper triangular Toeplitz matrices can be written in the form

$\notag T = \sum_{j=1}^n t_{1-j} N^{j-1}, \quad N = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix},$

where $N$ is upper bidiagonal with a superdiagonal of ones and $N^n = 0$. It follows that the product of two upper triangular Toeplitz matrices is again upper triangular Toeplitz, upper triangular Toeplitz matrices commute, and $T^{-1}$ is also an upper triangular Toeplitz matrix (assuming $t_0$ is nonzero, so that $T$ is nonsingular).
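These closure properties are easy to check numerically; the following Python sketch builds two random upper triangular Toeplitz matrices as polynomials in $N$ and verifies that they commute and that their product has constant diagonals:

```python
import numpy as np

# Two upper triangular Toeplitz matrices built as polynomials in the
# shift matrix N; their product is Toeplitz and they commute.
n = 5
N = np.diag(np.ones(n - 1), 1)      # ones on the superdiagonal, N^n = 0
rng = np.random.default_rng(1)
a = rng.standard_normal(n)
b = rng.standard_normal(n)
T1 = sum(a[j] * np.linalg.matrix_power(N, j) for j in range(n))
T2 = sum(b[j] * np.linalg.matrix_power(N, j) for j in range(n))
assert np.allclose(T1 @ T2, T2 @ T1)                 # commutativity
for k in range(1 - n, n):                            # constant diagonals
    d = np.diag(T1 @ T2, k)
    assert np.allclose(d, d[0])
```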

Tridiagonal Toeplitz matrices arise frequently:

$\notag T(c,d,e) = \begin{bmatrix} d & e & & \\ c & d & \ddots & \\ & \ddots & \ddots & e \\ & & c & d \end{bmatrix} \in\mathbb{C}^{n\times n}.$

The eigenvalues of $T(c,d,e)$ are

$\notag d + 2 (ce)^{1/2} \cos\biggl( \displaystyle\frac{k \pi}{n+1} \biggr), \quad k = 1:n.$
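A quick Python check of this formula against a numerical eigensolver, for the case $c = e = 1$, $d = 0$ (with $n = 5$, matching the earlier tridiagonal example):

```python
import numpy as np

# Tridiagonal Toeplitz matrix T(c,d,e) with c = e = 1, d = 0, and its
# eigenvalues d + 2*sqrt(c*e)*cos(k*pi/(n+1)), k = 1:n.
n, c, d, e = 5, 1.0, 0.0, 1.0
T = (c * np.diag(np.ones(n - 1), -1) + d * np.eye(n)
     + e * np.diag(np.ones(n - 1), 1))
k = np.arange(1, n + 1)
formula = d + 2 * np.sqrt(c * e) * np.cos(k * np.pi / (n + 1))
assert np.allclose(np.sort(np.linalg.eigvalsh(T)), np.sort(formula))
```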

The Kac–Murdock–Szegö matrix is the symmetric Toeplitz matrix

$\notag A(\rho) = \begin{bmatrix} 1 & \rho & \rho^2 & \dots & \rho^{n-1} \\ \rho & 1 & \rho & \dots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \rho \\ \rho^{n-1} & \rho^{n-2} & \dots & \rho & 1 \end{bmatrix}.$

In MATLAB, a Toeplitz matrix can be constructed using toeplitz(c,r), which produces the matrix with first column c and first row r. Example:

>> n = 5; A = toeplitz(1:n,[1 -2:-1:-n])
A =
     1    -2    -3    -4    -5
     2     1    -2    -3    -4
     3     2     1    -2    -3
     4     3     2     1    -2
     5     4     3     2     1


## References

• Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013. Section 4.7.


# What Is a Circulant Matrix?

An $n\times n$ circulant matrix is defined by $n$ parameters, the elements in the first row, and each subsequent row is a cyclic shift forward of the one above:

$\notag C = \begin{bmatrix} c_1 & c_2 & \dots & c_n \\ c_n & c_1 & \dots & \vdots \\ \vdots & \ddots & \ddots & c_2 \\ c_2 & \dots & c_n & c_1 \\ \end{bmatrix}.$

Circulant matrices have the important property that they are diagonalized by the discrete Fourier transform matrix

$\notag F_n = \Bigl(\exp( -2\pi \mathrm{i} (r-1)(s-1) / n )\Bigr)_{r,s=1}^n,$

which satisfies $F_n^*F_n = nI$, so that $n^{-1/2}F_n$ is a unitary matrix. ($F_n$ is a Vandermonde matrix with points the roots of unity.) Specifically,

$\notag F_n C F_n^{-1} = D = \mathrm{diag}(d_i). \qquad(1)$

Hence circulant matrices are normal ($C^*C = CC^*$). Moreover, the eigenvalues are given by $d = F_n Ce_1$, where $e_1$ is the first unit vector.

Note that one particular eigenpair is immediate, since $Ce = \bigl(\sum_{i=1}^n c_i\bigr) e$, where $e$ is the vector of ones.

The factorization (1) enables a circulant linear system to be solved in $O(n\log n)$ flops, since multiplication by $F_n$ can be done using the fast Fourier transform.
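A minimal Python sketch of this $O(n\log n)$ solve: the eigenvalues of a circulant matrix are the FFT of its first column, so $Cx = b$ becomes a pointwise division in Fourier space. (Note that SciPy's circulant builds the matrix from its first column; the matrix defined above from its first row is its transpose.)

```python
import numpy as np
from scipy.linalg import circulant

# FFT-based circulant solve: the eigenvalues of C are the FFT of its
# first column, so Cx = b is a pointwise division in Fourier space.
n = 8
col = np.r_[4.0, 1.0, np.zeros(n - 3), 1.0]  # diagonally dominant, nonsingular
C = circulant(col)                           # built from the first column
b = np.arange(1.0, n + 1)
x = np.fft.ifft(np.fft.fft(b) / np.fft.fft(col)).real
assert np.allclose(C @ x, b)
```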

A particular circulant matrix is the (up) shift matrix $K_n$, the $4\times 4$ version of which is

$\notag K_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}.$

It is not hard to see that

$C = c_1 I + c_2 K_n + c_3K_n^2 + \cdots + c_n K_n^{n-1}.$

Since powers of $K_n$ commute, it follows that any two circulant matrices commute (this is also clear from (1)). Furthermore, the sum and product of two circulant matrices is a circulant matrix and the inverse of a nonsingular circulant matrix is a circulant matrix.

One important use of circulant matrices is to act as preconditioners for Toeplitz linear systems. Several methods have been proposed for constructing a circulant matrix from a Toeplitz matrix, including copying the central diagonals and wrapping them around and finding the nearest circulant matrix to the Toeplitz matrix. See Chan and Ng (1996) or Chan and Jin (2017) for a summary of work on circulant preconditioners for Toeplitz systems.

An interesting circulant matrix is anymatrix('core/circul_binom',n) in the Anymatrix collection, which is the $n\times n$ circulant matrix whose first row has $i$th element $\binom{n}{i-1}$. The eigenvalues are $(1 + w^i)^n - 1$, $i=1:n$, where $w = \exp(2\pi\mathrm{i}/n)$. The matrix is singular when $n$ is a multiple of 6, in which case the null space has dimension 2. Example:

>> n = 6; C = anymatrix('core/circul_binom',n), svd(C)
C =
     1     6    15    20    15     6
     6     1     6    15    20    15
    15     6     1     6    15    20
    20    15     6     1     6    15
    15    20    15     6     1     6
     6    15    20    15     6     1
ans =
   6.3000e+01
   2.8000e+01
   2.8000e+01
   1.0000e+00
   2.0244e-15
   7.6607e-16
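A Python check of the stated eigenvalue formula, and of the dimension-2 null space for $n = 6$ (two of the values $(1+w^i)^n - 1$ vanish, consistent with the two tiny singular values in the output above):

```python
import numpy as np
from math import comb
from scipy.linalg import circulant

# Circulant matrix whose first row holds the binomial coefficients C(n, i-1).
n = 6
row = np.array([comb(n, i) for i in range(n)], dtype=float)
C = circulant(row).T                 # transpose: SciPy builds from a column
w = np.exp(2j * np.pi / n)
formula = np.array([(1 + w**i)**n - 1 for i in range(1, n + 1)])
evals = np.linalg.eigvals(C)
assert np.allclose(np.sort_complex(formula), np.sort_complex(evals))
assert np.sum(np.abs(formula) < 1e-8) == 2    # n = 6: null space dim is 2
```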


## Notes

A classic reference on circulant matrices is Davis (1994).
