What Is the Trace of a Matrix?

The trace of an n\times n matrix is the sum of its diagonal elements: \mathrm{trace}(A) = \sum_{i=1}^n a_{ii}. The trace is linear, that is, \mathrm{trace}(A+B) = \mathrm{trace}(A) + \mathrm{trace}(B), and \mathrm{trace}(A) = \mathrm{trace}(A^T).

A key fact is that the trace is also the sum of the eigenvalues. The proof is by considering the characteristic polynomial p(t) = \det(t I - A) = t^n + a_{n-1}t^{n-1} + \dots + a_1 t + a_0. The roots of p are the eigenvalues \lambda_1, \lambda_2, \dots, \lambda_n of A, so p can be factorized

\notag  p(t) = (t - \lambda_1) (t - \lambda_2) \dots (t - \lambda_n),

and so a_{n-1} = -(\lambda_{11} + \lambda_{22} + \cdots + \lambda_{nn}). The Laplace expansion of \det(t I - A) shows that the coefficient of t^{n-1} is -(a_{11} + a_{22} + \cdots + a_{nn}). Equating these two expressions for a_{n-1} gives

\notag     \mathrm{trace}(A) = \displaystyle\sum_{i=1}^n \lambda_i.  \qquad\qquad(1)

A consequence of (1) is that any transformation that preserves the eigenvalues preserves the trace. Therefore the trace is unchanged under similarity transformations: \mathrm{trace}(X^{-1}AX) = \mathrm{trace}(A) for any nonsingular X.

An an example of how the trace can be useful, suppose A is a symmetric and orthogonal n\times n matrix, so that its eigenvalues are \pm 1. If there are p eigenvalues 1 and q eigenvalues -1 then \mathrm{trace}(A) = p - q and n = p + q. Therefore p = (n + \mathrm{trace}(A))/2 and q = (n - \mathrm{trace}(A))/2.

Another important property is that for an m\times n matrix A and an n\times m matrix B,

\notag  \mathrm{trace}(AB) = \mathrm{trace}(BA) \qquad\qquad(2)

(despite the fact that AB \ne BA in general). The proof is simple:

\notag \begin{aligned}   \mathrm{trace}(AB) &= \displaystyle\sum_{i=1}^m (AB)_{ii}                      = \sum_{i=1}^m \sum_{k=1}^n a_{ik} b_{ki}                      = \sum_{k=1}^n \sum_{i=1}^m b_{ki} a_{ik} \\                      & = \sum_{k=1}^n (BA)_{kk}                      = \mathrm{trace}(BA). \end{aligned}

This simple fact can have non-obvious consequences. For example, consider the equation AX - XA = I in n\times n matrices. Taking the trace gives 0 = \mathrm{trace}(AX) - \mathrm{trace}(XA)    = \mathrm{trace}(AX - XA)    = \mathrm{trace}(I) = n, which is a contradiction. Therefore the equation has no solution.

The relation (2) gives \mathrm{trace}(ABC) = \mathrm{trace}((AB)C) = \mathrm{trace}(C(AB)) = \mathrm{trace}(CAB) for n\times n matrices A, B, and C, that is,

\notag   \mathrm{trace}(ABC) = \mathrm{trace}(CAB). \qquad\qquad(3)

So we can cyclically permute terms in a matrix product without changing the trace.

As an example of the use of (2) and (3), if x and y are n-vectors then \mathrm{trace}(xy^T) = \mathrm{trace}(y^Tx) = y^Tx. If A is an n\times n matrix then \mathrm{trace}(xy^TA) can be evaluated without forming the matrix xy^TA since, by (3), \mathrm{trace}(xy^TA) =                 \mathrm{trace}(y^TAx) = y^TAx.

The trace is useful in calculations with the Frobenius norm of an m\times n matrix:

\notag     \|A\|_F = \left(\displaystyle\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\right)^{1/2}             = \bigr(\mathrm{trace}(A^*A)\bigr)^{1/2},

where * denotes the conjugate transpose. For example, we can generalize the formula |x+\mathrm{i}y|^2 = x^2 + y^2 for a complex number to an m\times n matrix A by splitting A into its Hermitian and skew-Hermitian parts:

\notag  A = \frac{1}{2}(A+A^*) + \frac{1}{2}(A-A^*) \equiv B + C,

where B = B^* and C = -C^*. Then

\notag \begin{aligned}     \|A\|_F^2 =     \|B + C\|_F^2 &= \mathrm{trace}((B+C)^*(B+C))\\                 &= \mathrm{trace}(B^*B + C^*C) + \mathrm{trace}(B^*C + C^*B)\\                 &= \mathrm{trace}(B^*B + C^*C) + \mathrm{trace}(BC - CB)\\                 &= \mathrm{trace}(B^*B + C^*C)\\                 &= \|B\|_F^2 + \|C\|_F^2. \end{aligned}

If a matrix is not explicitly known but we can compute matrix–vector products with it then the trace can be estimated by

\notag      \mathrm{trace}(A) \approx x^TAx,

where the vector x has elements independently drawn from the standard normal distribution with mean 0 and variance 1. The expectation of this estimate is

\notag \begin{aligned}     E\bigl( x^TAx \bigr)  &=     E\bigl( \mathrm{trace}(x^TAx) \bigr)  =     E\bigl( \mathrm{trace}(Axx^T) \bigr)  =     \mathrm{trace} \bigl( E(Axx^T) \bigr) \\      &=  \mathrm{trace}\bigl(A E(xx^T) \bigr)       = \mathrm{trace}(A), \end{aligned}

since E(x_ix_j) = 0 for i\ne j and E(x_i^2) = 1 for all i, so E(xx^T) = I. This stochastic estimate, which is due to Hutchinson, is therefore unbiased.

References

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is a Diagonalizable Matrix?

A matrix A \in\mathbb{C}^{n\times n} is diagonalizable if there exists a nonsingular matrix X\in\mathbb{C}^{n\times n} such that X^{-1}AX is diagonal. In other words, a diagonalizable matrix is one that is similar to a diagonal matrix.

The condition X^{-1}AX = D = \mathrm{diag}(\lambda_i) is equivalent to AX = XD with X nonsingular, that is, Ax_i = \lambda_ix_i, i=1\colon n, where X = [x_1,x_2,\dots, x_n]. Hence A is diagonalizable if and only if it has a complete set of linearly independent eigenvectors.

A Hermitian matrix is diagonalizable because the eigenvectors can be taken to be mutually orthogonal. The same is true for a normal matrix (one for which A^*A = AA^*). A matrix with distinct eigenvalues is also diagonalizable.

Theorem 1.

If A\in\mathbb{C}^{n\times n} has distinct eigenvalues then it is diagonalizable.

Proof. Let A have eigenvalues \lambda_1,\lambda_2,\dots,\lambda_n with corresponding eigenvectors x_1,x_2,\dots,x_n. Suppose that y = \sum_{i=1}^n \alpha_i x_i = 0 for some \alpha_1,\alpha_2,\dots,\alpha_n. Then

\notag \begin{aligned}  0 &= (A-\lambda_2 I)\cdots (A-\lambda_n I)y    = \sum_{i=1}^n \alpha_i (A-\lambda_2 I)\cdots (A-\lambda_n I) x_i\\    &= \sum_{i=1}^n \alpha_i (\lambda_i-\lambda_2)\cdots(\lambda_i-\lambda_n)x_i    =  \alpha_1 (\lambda_1-\lambda_2)\cdots(\lambda_1-\lambda_n)x_1, \end{aligned}

which implies \alpha_1 = 0 since \lambda_1\ne \lambda_j for j\ge 2 and x_1\ne0. Premultiplying y = \sum_{i=2}^n \alpha_i x_i = 0 by \prod_{j=3}^n(A-\lambda_j I) shows, in the same way, that \alpha_2 = 0. Continuing in this way we find that \alpha_1 = \alpha_2 = \cdots = \alpha_n = 0. Therefore the x_i are linearly independent and hence A is diagonalizable.

A matrix can have repeated eigenvalues and be diagonalizable, as diagonal matrices with repeated diagonal entries show. What is needed for diagonalizability is that every k-times repeated eigenvalue has k linearly independent eigenvectors associated with it. Equivalently, the algebraic and geometric multiplicities of every eigenvalue must be equal, that is, the eigenvalues must all be semisimple. Another equivalent condition is that the degree of the minimal polynomial is equal to the number of distinct eigenvalues.

The simplest example of a matrix that is not diagonalizable is \bigl[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr]. This matrix is a 2\times 2 Jordan block with the eigenvalue 0. Diagonalizability is easily understood in terms of the Jordan canonical form: A is diagonalizable if and only if all the Jordan blocks in its Jordan form are 1\times 1.

Most matrices are diagonalizable, in the sense that the diagonalizable matrices are dense in \mathbb{C}^{n\times n}, that is, any matrix in \mathbb{C}^{n\times n} is arbitrarily close to a diagonalizable matrix. This property is useful because it can be convenient to prove a result by first proving it for diagonalizable matrices and then arguing that by continuity the result holds for a general matrix.

Is a rank-1 matrix A = xy^* diagonalizable, where x,y\in\mathbb{C}^n are nonzero? There are n-1 zero eigenvalues with eigenvectors any set of linearly independent vectors orthogonal to y. If y^*x \ne 0 then y^*x is the remaining eigenvalue, with eigenvector x, which is linearly independent of the eigenvectors for 0, and A is diagonalizable. If y^*x = 0 then all the eigenvalues of A are zero and so A cannot be diagonalizable, as the only diagonalizable matrix whose eigenvalues are all zero is the zero matrix. For the matrix \bigl[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr] mentioned above, x = \bigl[{1 \atop 0} \bigr] and y = \bigl[{0 \atop 1} \bigr], so y^*x = 0, confirming that this matrix is not diagonalizable.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

The Ten Most-Viewed Posts of 2022

top10.jpg

According to the WordPress statistics, this blog received over 192,000 visitors and 281,000 views in 2022. These are the ten most-viewed posts published during the year.

  1. Seven Sins of Numerical Linear Algebra
  2. What Is an Eigenvalue?
  3. The Big Six Matrix Factorizations
  4. What Is a Schur Decomposition?
  5. What Is a Permutation Matrix?
  6. What Is the Logarithmic Norm?
  7. What Is the Jordan Canonical Form?
  8. What Is the Frank Matrix?
  9. What Is a Circulant matrix?
  10. What Is a Toeplitz Matrix?

Eight of the posts are from the What Is series, which now contains 83 articles; more will follow in 2023.

What Is a Stochastic Matrix?

A stochastic matrix is an n\times n matrix with nonnegative entries and unit row sums. If A\in\mathbb{R}^{n\times n} is stochastic then Ae = e, where e = [1,1,\dots,1]^T is the vector of ones. This means that e is an eigenvector of A corresponding to the eigenvalue 1.

The identity matrix is stochastic, as is any permutation matrix. Here are some other examples of stochastic matrices:

\notag \begin{aligned}   A_n &= n^{-1}ee^T, \quad \textrm{in particular}~~     A_3 = \begin{bmatrix}    \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt]    \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt]    \frac{1}{3} & \frac{1}{3} & \frac{1}{3}    \end{bmatrix}, \qquad (1)\\   B_n &= \frac{1}{n-1}(ee^T -I), \quad \textrm{in particular}~~     B_3 = \begin{bmatrix}              0 & \frac{1}{2} & \frac{1}{2}\\[2pt]    \frac{1}{2} & 0           & \frac{1}{2}\\[2pt]    \frac{1}{2} & \frac{1}{2} & 0    \end{bmatrix},    \qquad (2)\\   C_n &= \begin{bmatrix}             1         &                   &           &    \\      \frac{1}{2}      &  \frac{1}{2}      &           &    \\      \vdots           &  \vdots           &\ddots     &    \\      \frac{1}{n}      &  \frac{1}{n}      &\cdots     &  \frac{1}{n}      \end{bmatrix}.   \qquad (3)\\ \end{aligned}

For any matrix A, the spectral radius \rho(A) = \max\{ |\lambda| : \lambda \mathrm{~is~an~eigenvalue~of~} A \} is bounded by \rho(A) \le \|A\| for any norm. For a stochastic matrix, taking the \infty-norm (the maximum row sum of absolute values) gives \rho(A) \le 1, so since we know that 1 is an eigenvalue, \rho(A) = 1. It can be shown that 1 is a semisimple eigenvalue, that is, if there are k eigenvalues equal to 1 then there are k linearly independent eigenvectors corresponding to 1 (Meyer, 2000, p. 696).

A lower bound on the spectral radius can be obtained from Gershgorin’s theorem. The ith Gershgorin disc is defined by |\lambda - a_{ii}| \le \sum_{j\ne i} |a_{ij}| = 1 - a_{ii}, which implies |\lambda| \ge 2a_{ii}-1. Every eigenvalue \lambda lies in the union of the n discs and so must satisfy

\notag         2\min_i a_{ii}-1 \le |\lambda| \le 1.

The lower bound is positive if A is strictly diagonally dominant by rows.

If A and B are stochastic then AB is nonnegative and ABe = Ae = e, so AB is stochastic. In particular, any positive integer power of A is stochastic. Does A^k converge as k\to\infty? The answer is that it does, and the limit is stochastic, as long as 1 is the only eigenvalue of modulus 1, and this will be the case if all the elements of A are positive (by Perron’s theorem). The simplest example of non-convergence is the stochastic matrix

\notag    A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},

which has eigenvalues 1 and -1. Since A^2 = I, all even powers are equal to I and all odd powers are equal to A. For the matrix (1), A_n^k = A_n for all k, while for (2), B_n^k \to A_n as k\to\infty. For (3), C_n^k converges to the matrix with 1 in every entry of the first column and zeros everywhere else.

Stochastic matrices arise in Markov chains. A transition matrix for a finite homogeneous Markov chain is a matrix whose (i,j) element is the probability of moving from state i to state j over a time step. It has nonnegative entries and the rows sum to 1, so it is a stochastic matrix. In applications including finance and healthcare, a transition matrix may be estimated for a certain time period, say one year, but a transition matrix for a shorter period, say one month, may be needed. If A is a transition matrix for a time period p then a stochastic pth root of A, which is a stochastic solution X of the equation X^p = A, is a transition matrix for a time period a factor p smaller. Therefore p = 12 (years to months) and p = 7 (weeks to days) are among the values of interest. Unfortunately, a stochastic pth root may not exist. For example, the matrix

\notag \begin{bmatrix}    0 & 1 & 0 \\    0 & 0 & 1 \\    0 & 0 & 1 \end{bmatrix}

has no pth roots at all, let alone stochastic ones. Yet many stochastic matrices do have stochastic roots. The matrix (3) has a stochastic pth root for all p, as shown by Higham and Lin (2011), who give a detailed analysis of pth roots of stochastic matrices. For example, to four decimal places,

\notag  C_6^{1/7} = \begin{bmatrix} 1.000  &        &        &        &        &        \\ 0.094  & 0.906  &        &        &        &        \\ 0.043  & 0.102  & 0.855  &        &        &        \\ 0.027  & 0.050  & 0.103  & 0.820  &        &        \\ 0.019  & 0.032  & 0.052  & 0.103  & 0.795  &        \\ 0.014  & 0.023  & 0.034  & 0.053  & 0.102  & 0.774  \\ \end{bmatrix}.

A stochastic matrix is sometime called a row-stochastic matrix to distinguish it from a column-stochastic matrix, which is a nonnegative matrix for which e^TA = e^T (so that A^T is row-stochastic). A matrix that is both row-stochastic and column-stochastic is called doubly stochastic. A permutation matrix is an example of a doubly stochastic matrix. If U is a unitary matrix then the matrix with a_{ij} = |u_{ij}|^2 is doubly stochastic. A magic square scaled by the magic sum is also doubly stochastic. For example,

>> M = magic(4), A = M/sum(M(1,:)) % A is doubly stochastic.
M =
    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1
A =
   4.7059e-01   5.8824e-02   8.8235e-02   3.8235e-01
   1.4706e-01   3.2353e-01   2.9412e-01   2.3529e-01
   2.6471e-01   2.0588e-01   1.7647e-01   3.5294e-01
   1.1765e-01   4.1176e-01   4.4118e-01   2.9412e-02
>> [sum(A) sum(A')]
ans =
     1     1     1     1     1     1     1     1
>> eig(A)'
ans =
   1.0000e+00   2.6307e-01  -2.6307e-01   8.5146e-18

References

  • Nicholas J. Higham and Lijing Lin, On pth Roots of Stochastic Matrices, Linear Algebra Appl. 435, 448–463, 2011.
  • Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.

Related Blog Posts

What Is the Inertia of a Matrix?

The inertia of a real symmetric n\times n matrix A is a triple, written \mathrm{In}(A) = (i_+(A),i_-(A),i_0(A)), where i_+(A) is the number of positive eigenvalues of A, i_-(A) is the number of negative eigenvalues of A, and i_0(A) is the number of zero eigenvalues of A.

The rank of A is i_+(A) + i_-(A). The difference i_+(A) - i_-(A) is called the signature.

In general it is not possible to determine the inertia by inspection, but some deductions can be made. If A has both positive and negative diagonal elements then i_+(A) > 1 and i_-(A) > 1. But in general the diagonal elements do not tell us much about the inertia. For example, here is a matrix that has positive diagonal elements but only one positive eigenvalue (and this example works for any n):

>> n = 4; A = -eye(n) + 2*ones(n), eigA = eig(sym(A))'
A =
     1     2     2     2
     2     1     2     2
     2     2     1     2
     2     2     2     1
eigA =
[-1, -1, -1, 7]

A congruence transformation of a symmetric matrix A is a transformation A \to X^T\!AX for a nonsingular matrix X. The result is clearly symmetric. Sylvester’s law of inertia (1852) says that the inertia is preserved under congruence transformations.

Theorem 1 (Sylvester’s law of inertia).

If A\in\mathbb{R}^{n\times n} is symmetric and X\in\mathbb{R}^{n\times n} is nonsingular then \mathrm{In}(A) = \mathrm{In}(X^T\!AX).

Sylvester’s law gives a way to determine the inertia without computing eigenvalues: find a congruence transformation that transforms A to a matrix whose inertia can be easily determined. A factorization PAP^T = LDL^T does the job, where P is a permutation matrix, L is unit lower triangular, and D is diagonal Then \mathrm{In}(A) = \mathrm{In}(D), and \mathrm{In}(D) can be read off the diagonal of D. This factorization does not always exist, and if it does exist is can be numerically unstable. A block LDL^T factorization, in which D is block diagonal with diagonal blocks of size 1 or 2, always exists, and its computation is numerically stable with a suitable pivoting strategy such as symmetric rook pivoting.

For the matrix above we can compute a block LDL^T factorization using the MATLAB ldl function:

>> [L,D,P] = ldl(A); D
D =
   1.0000e+00   2.0000e+00            0            0
   2.0000e+00   1.0000e+00            0            0
            0            0  -1.6667e+00            0
            0            0            0  -1.4000e+00

Since the leading 2-by-2 block of D has negative determinant and hence one positive eigenvalue and one negative eigenvalue, it follows that A has one positive eigenvalue and three negative eigenvalues.

A congruence transformation preserves the signs of the eigenvalues but not their magnitude. A result of Ostrowski (1959) bounds the ratios of the eigenvalues of the original and transformed matrices. Let the eigenvalues of a symmetric matrix be ordered \lambda_n \le \cdots \le \lambda_1.

Theorem (Ostrowski).

For a symmetric A\in \mathbb{R}^{n\times n} and X\in\mathbb{R}^{n\times n},

\lambda_k(X^*AX) = \theta_k \lambda_k(A), \quad k=1\colon n,

where \lambda_n(X^*X) \le \theta_k \le \lambda_1(X^*X).

The theorem shows that the further X is from being orthogonal the greater the potential change in the eigenvalues.

Finally, we note that everything here generalizes to complex Hermitian matrices by replacing transpose by conjugate transpose.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is Gershgorin’s Theorem?

For a given n\times n matrix, Gershgorin’s theorem defines n discs in the complex plane whose union contains the eigenvalues of the matrix. The theorem can provide approximations to eigenvalues. It can also provide qualitative information, such as that all the eigenvalues lie in a particular half-plane.

Theorem 1 (Gershgorin’s theorem).

The eigenvalues of A\in\mathbb{C}^{n\times n} lie in the union of the n discs in the complex plane

\notag      D_i = \Big\{ z\in\mathbb{C}: |z-a_{ii}| \le \displaystyle\sum_{j\ne i}      |a_{ij}|\Big\}, \quad i=1\colon n.

Proof. Let \lambda be an eigenvalue of A and x a corresponding eigenvector and let |x_k| = \|x\|_\infty. From the kth equation in Ax=\lambda x we have

\notag        \sum_{j=1 \atop j\ne k}^n a_{kj}x_j = (\lambda-a_{kk}) x_k.

Hence

\notag    |\lambda-a_{kk}| \le  \sum_{j=1 \atop j\ne k}^n |a_{kj}||x_j|/|x_k|,

and since |x_j|/|x_k|\le 1 it follows that \lambda belongs to the kth disc, D_k.

The Gershgorin discs D_i are defined in terms of a summation over the rows of A, but since the eigenvalues of A are the same as those of A^T the same result holds with summation over the columns.

A consequence of the theorem is that if 0 does not belong to any of the Gershgorin discs then A is nonsingular. This is equivalent to the well-known result that a strictly diagonally dominant matrix is nonsingular.

Another consequence of the theorem is that if A is symmetric and all the discs lie in the open right half-plane then the eigenvalues are positive and so A is positive definite. This condition is equivalent to A having positive diagonal elements and being strictly diagonally dominant.

The Gershgorin discs for the 5\times 5 matrix

\notag \left[\begin{array}{ccccc} 5/4         & 1 & 3/4         & 1/2         & 1/4        \\ 1 & 0 & 0 & 0 & 0\\ -1 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 3 & 0\\ 0 & 0 & 0 & 1/2         & 5 \end{array}\right]

are shown here:

gersh1.jpg The eigenvalues—three real and one complex conjugate pair—are the black dots. It happens that each disc contains an eigenvalue, but this is not always the case. For the matrix

\notag \left[\begin{array}{cc} 2 & -1\\ 2 & 0 \end{array}\right]

the discs are

gersh2.jpg and the blue disc does not contain an eigenvalue. The next result, which is proved by a continuity argument, provides additional information that increases the utility of Gershgorin’s theorem. In particular it says that if a disc is disjoint from the other discs then it contains an eigenvalue.

Theorem 2.

If k of the Gershgorin discs of A\in\mathbb{C}^{n\times n} are disjoint from the other discs then their union contains k eigenvalues of A.

Theorem 2 tells us that the rightmost disc in our 5\times 5 example contains one eigenvalue, \lambda, since it is disjoint from the other discs, and the union of the other four discs contains four eigenvalues. Furthermore, \lambda must be real, because if not it occurs in a complex conjugate pair since the matrix is real, and as the disc is symmetric about the real axis \overline{\lambda} would also lie in the disc, contradicting Theorem 2.

Gershgorin’s theorem is most useful for matrices that are close to being diagonal. A technique that can produce improved eigenvalue estimates is to apply the theorem to D^{-1}AD for some nonsingular diagonal matrix D. This similarity transformation does not change the eigenvalues but it does change the discs, and the aim is to choose D to reduce the radiuses of the discs. Consider our 5\times 5 example. We know that there is one eigenvalue \lambda in the rightmost disc and that it is real, so 4.5 \le \lambda \le 5.5. For D = \mathrm{diag}(1,1,1,1,1/4) the rightmost disc shrinks and remains distinct from the others and we obtain the sharper bounds 4.875 \le \lambda \le 5.126. The discs for D^{-1}AD are shown here:

gersh1a.jpg

Notes

Most books on matrix analysis or numerical linear algebra include Gershgorin’s theorem.

Eigenvalue inclusion regions have been developed with discs replaced by more complicated shapes, such as Brauer’s ovals of Cassini.

Varga’s 2004 book is devoted to Gershgorin’s theorem and related results. It reproduces Gershgorin’s 1931 paper in an appendix.

References

  • S. Gerschgorin. Uber die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk. SSSR, 1:749–754, 1931.
  • Richard S. Varga. Geršgorin and His Circles. Springer-Verlag, Berlin, Germany, 2004.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is a Nilpotent Matrix?

An n\times n matrix A is nilpotent if A^k =0 for some positive integer k. A nonzero nilpotent matrix must have both positive and negative entries in order for cancellation to take place in the matrix powers. The smallest k for which A^k =0 is called the index of nilpotency. The index does not exceed n, as we will see below.

Here are some examples of nilpotent matrices.

\notag \begin{aligned}  A_1 &= \begin{bmatrix}0 & 1 \\ 0 & 0 \end{bmatrix}, \quad         A_1^2=0,\\  A_2 &= \begin{bmatrix}0 & 1 & 1\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{bmatrix}, \quad         A_2^3 = 0,\\  A_3 &= \begin{bmatrix}1 & -1 \\ 1 & -1 \end{bmatrix}, \quad         A_3^2 = 0,\\  A_4 &= \begin{bmatrix}2 & 2 & 4\\ -4 & -4 & -8\\ 1 & 1 & 2  \end{bmatrix}, \quad         A_4^2 = 0. \end{aligned}

Matrix A_1 is the 2\times 2 instance of the upper bidiagonal p\times p matrix

\notag N =  \begin{bmatrix}  0   & 1         &          &           \\                           & 0         & \ddots   &           \\                           &           & \ddots   &    1      \\                           &           &          & 0         \end{bmatrix},   \qquad (1)

for which

\notag N^2     = \begin{bmatrix}                0 & 0 & 1      &        &   \\                  & 0 & \ddots & \ddots &   \\                  &   &        & \ddots & 1 \\                  &   &        & \ddots & 0 \\                  &   &        &        & 0            \end{bmatrix}, \quad \dots, \quad N^{p-1} = \begin{bmatrix}                0 & 0 & \dots  & 0      & 1      \\                  & 0 & \ddots &        & 0      \\                  &   & \ddots & \ddots & \vdots \\                  &   &        & 0      & 0      \\                  &   &        &        & 0            \end{bmatrix}

and N^p = 0. The superdiagonal of ones moves up to the right with each increase in the index of the power until it disappears off the top right corner of the matrix.

Matrix A_4 has rank 1 and was constructed using a general formula: if A = xy^T with y^Tx = 0 then A^2 = xy^T xy^T = (y^Tx) xy^T = 0. We simply took orthogonal vectors x =[2, -4, 1]^T and y = [1, 1, 2]^T.

If A is nilpotent then every eigenvalue is zero, since Ax = \lambda x with x\ne 0 implies 0 = A^nx = \lambda^n x or \lambda = 0. Consequently, the trace and determinant of a nilpotent matrix are both zero.

If A is nilpotent and Hermitian or symmetric, or more generally normal (A^*A = AA^*), then A = 0, since such a matrix has a spectral decomposition A = Q \mathrm{diag}(\lambda_i)Q^* and the matrix \mathrm{diag}(\lambda_i) is zero. It is only for nonnormal matrices that nilpotency is a nontrivial property, and the best way to understand it is with the Jordan canonical form (JCF). The JCF of a matrix with only zero eigenvalues has the form A = XJX^{-1}, where J = \mathrm{diag}(J_{m_1},J_{m_2}, \dots, J_{m_p}), where J_{m_i} is of the form (1) and hence J_{m_i}^{m_i} = 0. It follows that the index of nilpotency is k = \max\{\,m_i : i=1\colon p\,\} \le n.

What is the rank of an n\times n nilpotent matrix A? The minimum possible rank is 0, attained for the zero matrix. The maximum possible rank is n-1, attained when the JCF of A has just one Jordan block of size n. Any rank between 0 and n-1 is possible: rank j is attained when there is a Jordan block of size j+1 and all other blocks are 1\times 1.

Finally, while a nilpotent matrix is obviously not invertible, like every matrix it has a Moore–Penrose pseudoinverse. The pseudoinverse of a Jordan block with eigenvalue zero is just the transpose of the block: N^+ = N^T for N in (1).

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is an Eigenvalue?

An eigenvalue of a square matrix A is a scalar \lambda such that Ax = \lambda x for some nonzero vector x. The vector x is an eigenvector of A and it has the distinction of being a direction that is not changed on multiplication by A.

An n\times n matrix has n eigenvalues. This can be seen by noting that Ax = \lambda x is equivalent to (\lambda I - A) x = 0, which means that \lambda I - A is singular, since x\ne 0. Hence \det(\lambda I - A) = 0. But

\notag  \det(\lambda I - A) = \lambda^n +                        a_{n-1}\lambda^{n-1} + \cdots + a_1 \lambda + a_0

is a scalar polynomial of degree n (the characteristic polynomial of A) with nonzero leading coefficient and so has n roots, which are the eigenvalues of A. Since \det(\lambda I - A) = \det( (\lambda I - A)^T)       = \det(\lambda I - A^T), the eigenvalues of A^T are the same as those of A.

A real matrix may have complex eigenvalues, but they appear in complex conjugate pairs. Indeed Ax = \lambda x implies \overline{A}\overline{x} = \overline{\lambda} \overline{x}, so if A is real then \overline{\lambda} is an eigenvalue of A with eigenvector \overline{x}.

Here are some 2\times 2 matrices and their eigenvalues.

\notag \begin{aligned}  A_1 &= \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix}, \quad         \lambda = 1,1,\\  A_2 &= \begin{bmatrix}0 & 1 \\ 0 & 0 \end{bmatrix}, \quad         \lambda = 0,0,\\  A_3 &= \begin{bmatrix}0 & 1 \\ -1 & 0 \end{bmatrix}, \quad         \lambda = \mathrm{i},\mathrm{-i}. \end{aligned}

Note that A_1 and A_2 are upper triangular, that is, a_{ij} = 0 for i>j. For such a matrix the eigenvalues are the diagonal elements.

A symmetric matrix (A^T = A) or Hermitian matrix (A^* = A, where A^* = \overline{A}^T) has real eigenvalues. A proof is Ax = \lambda x \Rightarrow x^*A^* = \overline{\lambda} x^* so premultiplying the first equation by x^* and postmultiplying the second by x gives x^*Ax = \lambda x^*x and x^*Ax = \overline{\lambda} x^*x, which means that (\lambda-\overline{\lambda})x^*x = 0, or \lambda=\overline{\lambda} since x^*x \ne 0. The matrix A_1 above is symmetric.

A skew-symmetric matrix (A^T = -A) or skew-Hermitian complex matrix (A^* = -A) has pure imaginary eigenvalues. A proof is similar to the Hermitian case: Ax = \lambda x \Rightarrow -x^*A = x^*A^* = \overline{\lambda} x^* and so x^*Ax is equal to both \lambda x^*x and -\overline{\lambda} x^*x, so \lambda = -\overline{\lambda}. The matrix A_3 above is skew-symmetric.

In general, the eigenvalues of a matrix A can lie anywhere in the complex plane, subject to restrictions based on matrix structure such as symmetry or skew-symmetry, but they are restricted to the disc centered at the origin with radius \|A\|, because for any matrix norm \|\cdot\| it can be shown that every eigenvalue satisfies |\lambda| \le \|A\|.

Here are some example eigenvalue distributions, computed in MATLAB. (The eigenvalues are computed at high precision using the Advanpix Multiprecision Computing Toolbox in order to ensure that rounding errors do not affect the plots.) The second and third matrices are real, so the eigenvalues are symmetrically distributed about the real axis. (The first matrix is complex.)

eig_smoke.jpg eig_dramadah.jpg eig_toeppen_inv.jpg

Although this article is about eigenvalues we need to say a little more about eigenvectors. An n\times n matrix A with distinct eigenvalues has n linearly independent eigenvectors. Indeed it is diagonalizable: A = XDX^{-1} for some nonsingular matrix X with D = \mathrm{diag}(\lambda_i) the matrix of eigenvalues. If we write X in terms of its columns as X = [x_1,x_2,\dots,x_n] then AX = XD is equivalent to Ax_i = \lambda _i x_i, i=1\colon n, so the x_i are eigenvectors of A. The matrices A_1 and A_3 above both have two linearly independent eigenvectors.

If there are repeated eigenvalues there can be less than n linearly independent eigenvectors. The matrix A_2 above has only one eigenvector: the vector \left[\begin{smallmatrix}1 \\ 0 \end{smallmatrix}\right] (or any nonzero scalar multiple of it). This matrix is a Jordan block. The matrix A_1 shows that a matrix with repeated eigenvalues can have linearly independent eigenvectors.

Here are some questions about eigenvalues.

  • What matrix decompositions reveal eigenvalues? The answer is the Jordan canonical form and the Schur decomposition. The Jordan canonical form shows how many linearly independent eigenvectors are associated with each eigenvalue.
  • Can we obtain better bounds on where eigenvalues lie in the complex plane? Many results are available, of which the most well-known is Gershgorin’s theorem.
  • How can we compute eigenvalues? Various methods are available. The QR algorithm is widely used and is applicable to all types of eigenvalue problems.

Finally, we note that the concept of eigenvalue is more general than just for matrices: it extends to nonlinear operators on finite or infinite dimensional spaces.

References

Many books include treatments of eigenvalues of matrices. We give just three examples.

  • Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013.
  • Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.
  • Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is a Symmetric Indefinite Matrix?

A symmetric indefinite matrix A is a symmetric matrix for which the quadratic form x^TAx takes both positive and negative values. By contrast, for a positive definite matrix x^TAx > 0 for all nonzero x and for a negative definite matrix x^TAx < 0 for all nonzero x.

A neat way to express the indefinitess is that there exist vectors x and y for which (x^TAx)(y^TAy) < 0.

A symmetric indefinite matrix has both positive and negative eigenvalues and in some sense is a typical symmetric matrix. For example, a random symmetric matrix is usually indefinite:

>> rng(3); B = rand(4); A = B + B'; eig(A)'
ans =
  -8.9486e-01  -6.8664e-02   1.1795e+00   3.9197e+00

In general it is difficult to tell if a symmetric matrix is indefinite or definite, but there is one easy-to-spot sufficient condition for indefinitess: if the matrix has a zero diagonal element that has a nonzero element in its row then it is indefinite. Indeed if a_{kk} = 0 then e_k^TAe_k = a_{kk} = 0, where e_k is the kth unit vector, so A cannot be positive definite or negative definite. The existence of a nonzero element in the row of the zero rules out the matrix being positive semidefinite (x^TAx \ge 0 for all x) or negative semidefinite (x^TAx \le 0 for all x).

An example of a symmetric indefinite matrix is a saddle point matrix, which has the block 2\times 2 form

\notag  C =  \begin{bmatrix} A & B^T \\ B & 0       \end{bmatrix},

where A is symmetric positive definite and B\ne0. When A is the identity matrix, C is the augmented system matrix associated with a least squares problem \min_x \|Bx - d\|_2. Another example is the n\times n reverse identity matrix J_n, illustrated by

\notag   J_4 = \begin{bmatrix}   0 & 0 & 0 & 1  \\   0 & 0 & 1 & 0  \\   0 & 1 & 0 & 0  \\   1 & 0 & 0 & 0 \end{bmatrix},

which has eigenvalues \pm1 (exercise: how many 1s and how many -1s?). A third example is a Toeplitz tridiagonal matrix with zero diagonal:

>> A = full(gallery('tridiag',5,1,0,1)), eig(sym(A))'
A =
     0     1     0     0     0
     1     0     1     0     0
     0     1     0     1     0
     0     0     1     0     1
     0     0     0     1     0
ans =
[-1, 0, 1, 3^(1/2), -3^(1/2)]

How can we exploit symmetry in solving a linear system Ax = b with a symmetric indefinite matrix A? A Cholesky factorization does not exist, but we could try to compute a factorization A = LDL^T, where L is unit lower triangular and D is diagonal with both positive and negative diagonal entries. However, this factorization does not always exist and if it does, its computation in floating-point arithmetic can be numerically unstable. The simplest example of nonexistence is the matrix

\notag   \begin{bmatrix} 0 & 1\\ 1 & 1 \end{bmatrix} \ne   \begin{bmatrix} 1 & 0 \\ \ell_{21} & 0 \end{bmatrix}   \begin{bmatrix} d_{11} & 0 \\ 0 & d_{22}\end{bmatrix}   \begin{bmatrix} 1 & \ell_{21}\\  0 & 1\end{bmatrix}.

The way round this is to allow D to have 2 \times 2 blocks. We can compute a block \mathrm{LDL^T} factorization PAP^T = LDL^T, were P is a permutation matrix, L is unit lower triangular, and D is block diagonal with diagonal blocks of size 1 or 2. Various pivoting strategies, which determine P, are possible, but the recommend one is the symmetric rook pivoting strategy of Ashcraft, Grimes, and Lewis (1998), which has the key property of producing a bounded L factor. Solving Ax = b now reduces to substitutions with L and a solve with D, which involves solving 2\times 2 linear systems for the 2\times 2 blocks and doing divisions for the 1\times 1 blocks (scalars).

MATLAB implements \mathrm{LDL^T} factorization in its ldl function. Here is an example using Anymatrix:

>> A = anymatrix('core/blockhouse',4), [L,D,P] = ldl(A), eigA = eig(A)'
A =
  -4.0000e-01  -8.0000e-01  -2.0000e-01   4.0000e-01
  -8.0000e-01   4.0000e-01  -4.0000e-01  -2.0000e-01
  -2.0000e-01  -4.0000e-01   4.0000e-01  -8.0000e-01
   4.0000e-01  -2.0000e-01  -8.0000e-01  -4.0000e-01
L =
   1.0000e+00            0            0            0
            0   1.0000e+00            0            0
   5.0000e-01  -8.3267e-17   1.0000e+00            0
  -2.2204e-16  -5.0000e-01            0   1.0000e+00
D =
  -4.0000e-01  -8.0000e-01            0            0
  -8.0000e-01   4.0000e-01            0            0
            0            0   5.0000e-01  -1.0000e+00
            0            0  -1.0000e+00  -5.0000e-01
P =
     1     0     0     0
     0     1     0     0
     0     0     1     0
     0     0     0     1
eigA =
  -1.0000e+00  -1.0000e+00   1.0000e+00   1.0000e+00

Notice the 2\times 2 blocks on the diagonal of D, each of which contains one negative eigenvalue and one positive eigenvalue. The eigenvalues of D are not the same as those of A, but since A and D are congruent they have the same number of positive, zero, and negative eigenvalues.

References

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is a Toeplitz Matrix?

T\in\mathbb{C}^{n\times n} is a Toeplitz matrix if t_{ij} = t_{i-j} for 2n-1 parameters t_{1-n},\dots,t_{n-1}. A Toeplitz matrix has constant diagonals. For n = 4:

\notag   T = \begin{bmatrix} t_0 & t_{-1} & t_{-2} & t_{-3}\\                t_1 & t_0    & t_{-1} & t_{-2}\\                t_2 & t_1    & t_0    & t_{-1}\\                t_3 & t_2    & t_1    & t_0 \end{bmatrix}.

Toeplitz matrices arise in various problems, including analysis of time series, discretization of constant coefficient differential equations, and discretization of convolution equations \int a(t-s)x(s)\,\mathrm{d} s = b(t).

Since a Toeplitz matrix depends on just 2n-1 parameters it is reasonable to expect that a linear system Tx = b can be solved in less than the O(n^3) flops that would be required by LU factorization. Indeed methods are available that require only O(n^2) flops; see Golub and Van Loan (2013) for details.

Upper triangular Toeplitz matrices can be written in the form

\notag T = \sum_{j=1}^n t_{1-j} N^{j-1}, \quad      N = \begin{bmatrix} 0 & 1      &        &   \\                     & 0      & \ddots &   \\                     &        & \ddots &   1  \\                     &        &        &   0 \end{bmatrix},

where N is upper bidiagonal with a superdiagonal of ones and N^n = 0. It follows that the product of two upper triangular Toeplitz matrices is again upper triangular Toeplitz, upper triangular Toeplitz matrices commute, and T^{-1} is also an upper triangular Toeplitz matrix (assuming t_0 is nonzero, so that T is nonsingular).

Tridiagonal Toeplitz matrices arise frequently:

\notag  T(c,d,e) = \begin{bmatrix}                      d   & e      &        &   \\                        c & d      & \ddots &   \\                          & \ddots & \ddots & e \\                          &        & c      & d                \end{bmatrix} \in\mathbb{C}^{n\times n}.

The eigenvalues of T(c,d,e) are

\notag         d + 2 (ce)^{1/2} \cos\biggl( \displaystyle\frac{k \pi}{n+1} \biggr),         \quad k = 1:n.

The Kac–Murdock–Szegö matrix is the symmetric Toeplitz matrix

\notag A(\rho) = \begin{bmatrix}    1          & \rho       & \rho^2 & \dots  & \rho^{n-1} \\    \rho       & 1          & \rho   & \dots  & \rho^{n-2} \\    \rho^2     & \rho       & 1      & \ddots & \vdots     \\    \vdots     & \vdots     & \ddots & \ddots & \rho       \\    \rho^{n-1} & \rho^{n-2} & \dots  & \rho   & 1 \end{bmatrix}.

It has a number of interesting properties.

In MATLAB, a Toeplitz matrix can be constructed using toeplitz(c,r), which produces the matrix with first column c and first row r. Example:

>> n = 5; A = toeplitz(1:n,[1 -2:-1:-n])
A =
     1    -2    -3    -4    -5
     2     1    -2    -3    -4
     3     2     1    -2    -3
     4     3     2     1    -2
     5     4     3     2     1

References

  • Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013. Section 4.7.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.