What Is the Schur Complement of a Matrix?

For an n\times n matrix

\notag   A =   \begin{bmatrix} A_{11} & A_{12} \\                   A_{21} & A_{22}   \end{bmatrix} \qquad (1)

with nonsingular (1,1) block A_{11} the Schur complement is A_{22} - A_{21}A_{11}^{-1}A_{12}. It is denoted by A/A_{11}. The block with respect to which the Schur complement is taken need not be the (1,1) block. Assuming that A_{22} is nonsingular, the Schur complement of A_{22} in A is A_{11} - A_{12}A_{22}^{-1}A_{21}.

More generally, for an index vector \alpha = [i_1,i_2,\dots,i_k], where 1\le i_1 < i_2 < \cdots < i_p \le n, the Schur complement of A(\alpha,\alpha) in A is A(\alpha^c,\alpha^c) - A(\alpha^c,\alpha) A(\alpha,\alpha)^{-1} A(\alpha,\alpha^c), where \alpha^c is the complement of \alpha (the vector of indices not in \alpha). Most often, it is the Schur complement of A_{11} in A that is of interest, and the general case can be reduced to it by row and column permutations that move A(\alpha,\alpha) to the (1,1) block of A.

Schur Complements in Gaussian Elimination

The reduced submatrices in Gaussian elimination are Schur complements. Write an n\times n matrix A with nonzero (1,1) element \alpha as

\notag   A =   \begin{bmatrix} \alpha & a^*   \\                    b     & C   \end{bmatrix}    =   \begin{bmatrix}  1            & 0      \\                    b/\alpha     & I_{n-1}   \end{bmatrix}   \begin{bmatrix}  \alpha       & a^*    \\                     0           & C - ba^*/\alpha   \end{bmatrix}.

This factorization represents the first step of Gaussian elimination.

After k stages of Gaussian elimination we have computed the following factorization, in which the (1,1) blocks are k\times k:

\notag   \begin{bmatrix} L_{11} & 0      \\                   L_{21} & I   \end{bmatrix}   \begin{bmatrix} A_{11} & A_{12} \\                   A_{21} & A_{22}   \end{bmatrix}  =   \begin{bmatrix} U_{11} & U_{12}      \\                      0   & S   \end{bmatrix},

where the first matrix on the left is the product of the elementary transformations that reduce the first k columns to upper triangular form. Equating (2,1) and (2,2) blocks in this equation gives L_{21}A_{11} + A_{21} = 0 and L_{21}A_{12} + A_{22} = S. Hence S= A_{22} + L_{21}A_{12}                       = A_{22} + (-A_{21}A_{11}^{-1})A_{12}, which is the Schur complement of A_{11} in A.

The next step of the elimination, which zeros out the first column of S below the diagonal, succeeds as long as the (1,1) element of S is nonzero (or if the whole first column of S is zero, in which case there is nothing to do, and A is singular in this case).

For a number of matrix structures, such as

  • Hermitian (or real symmetric) positive definite matrices,
  • totally positive matrices,
  • matrices diagonally dominant by rows or columns,
  • M-matrices,

one can show that the Schur complement inherits the structure. For these four structures the (1,1) element of the matrix is nonzero, so the success of Gaussian elimination (or equivalently, the existence of an LU factorization) is guaranteed. In the first three cases the preservation of structure is also the basis for a proof of the numerical stability of Gaussian elimination.

Inverse of a Block Matrix

The Schur complement arises in formulas for the inverse of a block 2\times 2 matrix. If A_{11} is nonsingular, we can write

\notag   A = \begin{bmatrix}A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}     = \begin{bmatrix}I & 0 \\ A_{21}A_{11}^{-1} & I \end{bmatrix}       \begin{bmatrix}A_{11} & A_{12} \\ 0 & S \end{bmatrix}, \quad                       S = A_{22} - A_{21}A_{11}^{-1}A_{12}.  \qquad (2)

If S is nonsingular then inverting gives

\notag \begin{aligned}   A^{-1} &=       \begin{bmatrix}A_{11}^{-1} & -A_{11}^{-1}A_{12}S^{-1} \\                     0 & S^{-1}                     \end{bmatrix}                         \begin{bmatrix}I & 0 \\ -A_{21}A_{11}^{-1} & I                           \end{bmatrix} \notag\\         &=       \begin{bmatrix}A_{11}^{-1} + A_{11}^{-1}A_{12}S^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}S^{-1} \\                -S^{-1}A_{21}A_{11}^{-1} & S^{-1}       \end{bmatrix}. \end{aligned}

So S^{-1} is the (2,2) block of A^{-1}.

One can obtain an analogous formula for A^{-1} in terms of the Schur complement of A_{22} in A, in which the inverse of the Schur complement is the (1,1) block of A^{-1}. More generally, we have (A/A(\alpha,\alpha))^{-1} = A^{-1}(\alpha^c,\alpha^c).

Test for Positive Definiteness

For Hermitian matrices the Schur complement provides a test for positive definiteness. Suppose

\notag   A =   \begin{bmatrix} A_{11} & A_{12} \\                   A_{12}^* & A_{22}   \end{bmatrix}

with A_{11} positive definite. The factorization

\notag  A =  \begin{bmatrix} I & 0 \cr A_{12}^*A_{11}^{-1} & I \end{bmatrix}       \begin{bmatrix} A_{11} & 0 \cr 0 & S \end{bmatrix}        \begin{bmatrix} I & A_{11}^{-1}A_{12} \cr 0 & I \end{bmatrix}, \quad    S = A_{22} - A_{12}^*A_{11}^{-1}A_{12},

shows that A is congruent to the block diagonal matrix \mathrm{diag}(A_{11},S), and since congruences preserve inertia (the number of positive, zero, and negative eigenvalues), A is positive definite if and only if \mathrm{diag}(A_{11},S) is positive definite, that is, if and only if S is positive definite. The same equivalence holds with “definite” replace by “semidefinite” (with A_{11} still required to be positive definite).

Generalized Schur Complement

For A in (1) with a possibly singular (1,1) block A_{11} we can define the generalized Schur complement A_{22} - A_{21}A_{11}^+A_{12}, where “+” denotes the Moore-Penrose pseudo-inverse. The generalized Schur complement is useful in the context of Hermitian positive semidefinite matrices, as the following result shows.

Theorem 1.

If

\notag        A = \begin{bmatrix} A_{11} & A_{12} \\                            A_{12}^* & A_{22}            \end{bmatrix} \in\mathbb{C}^{n\times n}

is Hermitian then A is positive semidefinite if and only if \mathrm{range}(A_{12}) \subseteq \mathrm{range}(A_{11}) and A_{11} and the generalized Schur complement S = A_{22} - A_{12}^*A_{11}^+ A_{12} are both positive semidefinite.

If A_{11} is positive definite then the range condition is trivially satisfied.

Notes and References

The term “Schur complement” was coined by Haynsworth in 1968, thereby focusing attention on this important form of matrix and spurring many subsequent papers that explore its properties and applications. The name “Schur” was chosen because a 1917 determinant lemma of Schur says that if A_{11} is nonsingular then

\notag    \det\left(        \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \right)      = \det(A_{11}) \det(A_{22} - A_{21}A_{11}^{-1} A_{12})      = \det(A_{11}) \det(A/A_{11}),

which is obtained by taking determinants in (2).

For much more about the Schur complement see Zhang (2005) or Horn and Johnson (2013).

What Is the Minimal Polynomial of a Matrix?

For a polynomial

\notag  \phi(t) = a_kt^k + \cdots + a_1t + a_0,

where a_k\in\mathbb{C} for all k, the matrix polynomial obtained by evaluating \phi at A\in\mathbb{C}^{n \times n} is

\notag  \phi(A) = a_kA^k + \cdots + a_1A + a_0 I.

(Note that the constant term is a_0 A^0 = a_0 I). The polynomial \phi is monic if a_k = 1.

The characteristic polynomial of a matrix A\in\mathbb{C}^{n \times n} is p(t) = \det(t I - A), a degree n monic polynomial whose roots are the eigenvalues of A. The Cayley–Hamilton theorem tells us that p(A) = 0, but p may not be the polynomial of lowest degree that annihilates A. The monic polynomial \psi of lowest degree such that \psi(A) = 0 is the minimal polynomial of A. Clearly, \psi has degree at most n.

The minimal polynomial divides any polynomial \phi such that \phi(A) = 0, and in particular it divides the characteristic polynomial. Indeed by polynomial long division we can write \phi(t) = q(t)\psi(t) + r(t), where the degree of r is less than the degree of \psi. Then

\notag      0 = \phi(A) = q(A) \psi(A) + r(A) = r(A).

If r\ne 0 then we have a contradiction to the minimality of the degree of \psi. Hence r = 0 and so \psi divides \phi.

The minimal polynomial is unique. For if \psi_1 and \psi_2 are two different monic polynomials of minimum degree m such that \psi_i(A) = 0, i = 1,2, then \psi_3 = \psi_1 - \psi_2 is a polynomial of degree less than m and \psi_3(A) = \psi_1(A) - \psi_2(A) = 0, and we can scale \psi_3 to be monic, so by the minimality of m, \psi_3 = 0, or \psi_1 = \psi_2.

If A has distinct eigenvalues then the characteristic polynomial and the minimal polynomial are equal. When A has repeated eigenvalues the minimal polynomial can have degree less than n. An extreme case is the identity matrix, for which \psi(t) = t - 1, since \psi(I) = I - I = 0. On the other hand, for the Jordan block

\notag  J = \begin{bmatrix}      \lambda & 1 & 0 \\      0 & \lambda & 1 \\      0 & 0 & \lambda   \end{bmatrix}

the characteristic polynomial and the minimal polynomial are both equal to (\lambda - 1)^3.

The minimal polynomial has degree less than n when in the Jordan canonical form of A an eigenvalue appears in more than one Jordan block. Indeed it is not hard to show that the minimal polynomial can be written

\notag     \psi(t) = \displaystyle\prod_{i=1}^s (t-\lambda_i)^{n_i},

where \lambda_1,\lambda_2,\dots,\lambda_s are the distinct eigenvalues of A and n_i is the dimension of the largest Jordan block in which \lambda_i appears. This expression is composed of linear factors (that is, n_i = 1 for all i) if and only if A is diagonalizable.

To illustrate, for the matrix

\notag  A = \left[\begin{array}{cc|c|c|c}      \lambda &  1      &         &     &   \\              & \lambda &         &     &   \\\hline              &         & \lambda &     &   \\\hline              &         &         & \mu &    \\\hline              &         &         &     & \mu   \end{array}\right]

in Jordan form (where blank elements are zero), the minimal polynomial is \psi(t) = (t-\lambda)^2(t-\mu), while the characteristic polynomial is p(t) = (t-\lambda)^3(t-\mu)^2.

What is the minimal polynomial of a rank-1 matrix, A = xy^* \ne 0? Since A^2 = (y^*x) xy^*, we have q(A) = 0 for q(t) = t^2 - (y^*x) t = t^2 - \mathrm{trace}(A) t. For any linear polynomial p(t) = t - a_0, p(A) = xy^* - a_0 I, which is nonzero since xy^* has rank 1 and I has rank n. Hence the minimal polynomial is q.

The minimal polynomial is important in the theory of matrix functions and in the theory of Krylov subspace methods. One does not normally need to compute the minimal polynomial in practice.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is Iterative Refinement?

Iterative refinement is a method for improving the quality of an approximate solution \widehat{x} to a linear system of equations Ax = b, where A is an n\times n nonsingular matrix. The basic iterative refinement algorithm is very simple.

  1. Compute the residual r = b - A\widehat{x}.
  2. Solve Ad = r.
  3. Update \widehat{x} \gets \widehat{x} + d.
  4. Repeat from step 1 if necessary.

At first sight, this algorithm seems as expensive as solving for the original \widehat{x}. However, usually the solver is LU factorization with pivoting, A = LU (where we include the permutations in L). Most of the work is in the LU factorization, which costs O(n^3) flops, and each iteration requires a multiplication with A for the residual and two substitutions to compute d, which total only O(n^2) flops. If the refinement converges quickly then it is inexpensive.

Turning to the error, with a stable LU factorization the initial \widehat{x} computed in floating-point arithmetic of precision u satisfies (omitting constants)

\notag     \displaystyle\frac{\|x- \widehat{x}\|}{\|x\|}\lesssim \kappa_{\infty}(A) u, \qquad (1)

where \kappa_{\infty}(A) = \|A\|_{\infty} \|A^{-1}\|_{\infty} \ge 1 is the matrix condition number in the \infty-norm. We would like refinement to produce a solution accurate to precision u:

\notag     \displaystyle\frac{\|x- \widehat{x}\|}{\|x\|}\approx u. \qquad (2)

But if the solver cannot compute the initial \widehat{x} accurately when A is ill-conditioned why should it be able to produce an update d that improves \widehat{x}?

The simplest answer is that when iterative refinement was first used on digital computers the residual r was computed at twice the working precision, which could be done at no extra cost in the hardware. If \widehat{x} is a reasonable approximation then we expect cancellation in forming r = b - A\widehat{x}, so using extra precision in forming r ensures that r has enough correct digits to yield a correction d that improves \widehat{x}. This form of iterative refinement produces a solution satisfying (2) as long as \kappa_{\infty}(A) < u^{-1}.

Here is a MATLAB example, where the working precision is single and residuals are computed in double precision.

n = 8; A = single(gallery('frank',n)); xact = ones(n,1);
b = A*xact; % b is formed exactly for small n.
x = A\b;
fprintf('Initial_error = %4.1e\n', norm(x - xact,inf))
r = single( double(b) - double(A)*double(x) );
d = A\r;
x = x + d;
fprintf('Second error  = %4.1e\n', norm(x - xact,inf))

The output is

Initial_error = 9.1e-04
Second error  = 6.0e-08

which shows that after just one step the error has been brought down from 10^{-4} to the level of u_s\approx 5\times 10^{-8}, the unit roundoff for IEEE single precision arithmetic.

Fixed Precision Iterative Refinement

By the 1970s, computers had started to lose the ability to cheaply accumulate inner products in extra precision, and extra precision could not be programmed portably in software. It was discovered, though, that even if iterative refinement is run entirely in one precision it can bring benefits when \kappa_{\infty}(A) < u^{-1}. Specifically,

  • if the solver is somewhat numerically unstable the instability is cured by the refinement, in that a relative residual satisfying

    \notag     \displaystyle\frac{\|b-A\widehat{x}\|_{\infty}}{\|A\|_{\infty} \|\widehat{x}\|_{\infty} + \|b\|_{\infty}} \approx u \qquad(3)

    is produced, and

  • a relative error satisfying

    \notag     \displaystyle\frac{\|x- \widehat{x}\|_{\infty}}{\|x\|_{\infty}}      \lesssim \mathrm{cond}(A,x) u   \qquad (4)

    is produced, where

\notag    \mathrm{cond}(A,x) = \displaystyle\frac{ \|\, |A^{-1}| |A| |x| \,\|_{\infty} } {\|x\|_{\infty}}.

The bound (4) is stronger than (1) because \mathrm{cond}(A,x) is no larger than \kappa_{\infty}(A) and can be much smaller, especially if A has badly scaled rows.

Low Precision Factorization

In the 2000s processors became available in which 32-bit single precision arithmetic ran at twice the speed of 64-bit double precision arithmetic. A new usage of iterative refinement was developed in which the working precision is double precision and a double precision matrix A is factorized in single precision.

  1. Factorize A = LU in single precision.
  2. Solve LUx = b by substitution in single precision, obtaining \widehat{x}.
  3. Compute the residual r = b - A\widehat{x} in double precision.
  4. Solve LUd = r in single precision.
  5. Update \widehat{x} \gets \widehat{x} + d in double precision.
  6. Repeat from step 3 if necessary.

Since most of the work is in the single precision factorization, this algorithm is potentially twice as fast as solving Ax=b entirely in double precision arithmetic. The algorithm achieves the same limiting accuracy (4) and limiting residual (3) provided that \kappa_{\infty}(A) < u_s^{-1} \approx 2\times 10^7.

GRMES-IR

A way to weaken this restriction on \kappa_{\infty}(A) is to use a different solver on step 4: solve

\notag      \widetilde{A}d := U^{-1}L^{-1}Ad = U^{-1}L^{-1}r

by GMRES (a Krylov subspace iterative method) in double precision. Within GMRES, \widetilde{A}d is not formed but is applied to a vector as a multiplication with A followed by substitutions with L and U. As long as GMRES converges quickly for this preconditioned system the speed again from the fast single precision factorization will not be lost. Moreover, this different step 4 results in convergence for \kappa_{\infty}(A) a couple of orders magnitude larger than before, and if residuals are computed in quadruple precision then a limiting accuracy (2) is achieved.

This GMRES-based iterative refinement becomes particularly advantageous when the fast half precision arithmetic now available in hardware is used within the LU factorization, and one can use three or more precisions in the algorithm in order to balance speed, accuracy, and the range of problems that can be solved.

Finally, we note that iterative refinement can also be applied to least squares problems, eigenvalue problems, and the singular value decomposition. See Higham and Mary (2022) for details and references.

References

We give five references, which contain links to the earlier literature.

Related Blog Posts

What Is the Trace of a Matrix?

The trace of an n\times n matrix is the sum of its diagonal elements: \mathrm{trace}(A) = \sum_{i=1}^n a_{ii}. The trace is linear, that is, \mathrm{trace}(A+B) = \mathrm{trace}(A) + \mathrm{trace}(B), and \mathrm{trace}(A) = \mathrm{trace}(A^T).

A key fact is that the trace is also the sum of the eigenvalues. The proof is by considering the characteristic polynomial p(t) = \det(t I - A) = t^n + a_{n-1}t^{n-1} + \dots + a_1 t + a_0. The roots of p are the eigenvalues \lambda_1, \lambda_2, \dots, \lambda_n of A, so p can be factorized

\notag  p(t) = (t - \lambda_1) (t - \lambda_2) \dots (t - \lambda_n),

and so a_{n-1} = -(\lambda_1 + \lambda_2 + \cdots + \lambda_n). The Laplace expansion of \det(t I - A) shows that the coefficient of t^{n-1} is -(a_{11} + a_{22} + \cdots + a_{nn}). Equating these two expressions for a_{n-1} gives

\notag     \mathrm{trace}(A) = \displaystyle\sum_{i=1}^n \lambda_i.  \qquad\qquad(1)

A consequence of (1) is that any transformation that preserves the eigenvalues preserves the trace. Therefore the trace is unchanged under similarity transformations: \mathrm{trace}(X^{-1}AX) = \mathrm{trace}(A) for any nonsingular X.

An an example of how the trace can be useful, suppose A is a symmetric and orthogonal n\times n matrix, so that its eigenvalues are \pm 1. If there are p eigenvalues 1 and q eigenvalues -1 then \mathrm{trace}(A) = p - q and n = p + q. Therefore p = (n + \mathrm{trace}(A))/2 and q = (n - \mathrm{trace}(A))/2.

Another important property is that for an m\times n matrix A and an n\times m matrix B,

\notag  \mathrm{trace}(AB) = \mathrm{trace}(BA) \qquad\qquad(2)

(despite the fact that AB \ne BA in general). The proof is simple:

\notag \begin{aligned}   \mathrm{trace}(AB) &= \displaystyle\sum_{i=1}^m (AB)_{ii}                      = \sum_{i=1}^m \sum_{k=1}^n a_{ik} b_{ki}                      = \sum_{k=1}^n \sum_{i=1}^m b_{ki} a_{ik} \\                      & = \sum_{k=1}^n (BA)_{kk}                      = \mathrm{trace}(BA). \end{aligned}

This simple fact can have non-obvious consequences. For example, consider the equation AX - XA = I in n\times n matrices. Taking the trace gives 0 = \mathrm{trace}(AX) - \mathrm{trace}(XA)    = \mathrm{trace}(AX - XA)    = \mathrm{trace}(I) = n, which is a contradiction. Therefore the equation has no solution.

The relation (2) gives \mathrm{trace}(ABC) = \mathrm{trace}((AB)C) = \mathrm{trace}(C(AB)) = \mathrm{trace}(CAB) for n\times n matrices A, B, and C, that is,

\notag   \mathrm{trace}(ABC) = \mathrm{trace}(CAB). \qquad\qquad(3)

So we can cyclically permute terms in a matrix product without changing the trace.

As an example of the use of (2) and (3), if x and y are n-vectors then \mathrm{trace}(xy^T) = \mathrm{trace}(y^Tx) = y^Tx. If A is an n\times n matrix then \mathrm{trace}(xy^TA) can be evaluated without forming the matrix xy^TA since, by (3), \mathrm{trace}(xy^TA) =                 \mathrm{trace}(y^TAx) = y^TAx.

The trace is useful in calculations with the Frobenius norm of an m\times n matrix:

\notag     \|A\|_F = \left(\displaystyle\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\right)^{1/2}             = \bigr(\mathrm{trace}(A^*A)\bigr)^{1/2},

where * denotes the conjugate transpose. For example, we can generalize the formula |x+\mathrm{i}y|^2 = x^2 + y^2 for a complex number to an m\times n matrix A by splitting A into its Hermitian and skew-Hermitian parts:

\notag  A = \frac{1}{2}(A+A^*) + \frac{1}{2}(A-A^*) \equiv B + C,

where B = B^* and C = -C^*. Then

\notag \begin{aligned}     \|A\|_F^2 =     \|B + C\|_F^2 &= \mathrm{trace}((B+C)^*(B+C))\\                 &= \mathrm{trace}(B^*B + C^*C) + \mathrm{trace}(B^*C + C^*B)\\                 &= \mathrm{trace}(B^*B + C^*C) + \mathrm{trace}(BC - CB)\\                 &= \mathrm{trace}(B^*B + C^*C)\\                 &= \|B\|_F^2 + \|C\|_F^2. \end{aligned}

If a matrix is not explicitly known but we can compute matrix–vector products with it then the trace can be estimated by

\notag      \mathrm{trace}(A) \approx x^TAx,

where the vector x has elements independently drawn from the standard normal distribution with mean 0 and variance 1. The expectation of this estimate is

\notag \begin{aligned}     E\bigl( x^TAx \bigr)  &=     E\bigl( \mathrm{trace}(x^TAx) \bigr)  =     E\bigl( \mathrm{trace}(Axx^T) \bigr)  =     \mathrm{trace} \bigl( E(Axx^T) \bigr) \\      &=  \mathrm{trace}\bigl(A E(xx^T) \bigr)       = \mathrm{trace}(A), \end{aligned}

since E(x_ix_j) = 0 for i\ne j and E(x_i^2) = 1 for all i, so E(xx^T) = I. This stochastic estimate, which is due to Hutchinson, is therefore unbiased.

References

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is a Diagonalizable Matrix?

A matrix A \in\mathbb{C}^{n\times n} is diagonalizable if there exists a nonsingular matrix X\in\mathbb{C}^{n\times n} such that X^{-1}AX is diagonal. In other words, a diagonalizable matrix is one that is similar to a diagonal matrix.

The condition X^{-1}AX = D = \mathrm{diag}(\lambda_i) is equivalent to AX = XD with X nonsingular, that is, Ax_i = \lambda_ix_i, i=1\colon n, where X = [x_1,x_2,\dots, x_n]. Hence A is diagonalizable if and only if it has a complete set of linearly independent eigenvectors.

A Hermitian matrix is diagonalizable because the eigenvectors can be taken to be mutually orthogonal. The same is true for a normal matrix (one for which A^*A = AA^*). A matrix with distinct eigenvalues is also diagonalizable.

Theorem 1.

If A\in\mathbb{C}^{n\times n} has distinct eigenvalues then it is diagonalizable.

Proof. Let A have eigenvalues \lambda_1,\lambda_2,\dots,\lambda_n with corresponding eigenvectors x_1,x_2,\dots,x_n. Suppose that y = \sum_{i=1}^n \alpha_i x_i = 0 for some \alpha_1,\alpha_2,\dots,\alpha_n. Then

\notag \begin{aligned}  0 &= (A-\lambda_2 I)\cdots (A-\lambda_n I)y    = \sum_{i=1}^n \alpha_i (A-\lambda_2 I)\cdots (A-\lambda_n I) x_i\\    &= \sum_{i=1}^n \alpha_i (\lambda_i-\lambda_2)\cdots(\lambda_i-\lambda_n)x_i    =  \alpha_1 (\lambda_1-\lambda_2)\cdots(\lambda_1-\lambda_n)x_1, \end{aligned}

which implies \alpha_1 = 0 since \lambda_1\ne \lambda_j for j\ge 2 and x_1\ne0. Premultiplying y = \sum_{i=2}^n \alpha_i x_i = 0 by \prod_{j=3}^n(A-\lambda_j I) shows, in the same way, that \alpha_2 = 0. Continuing in this way we find that \alpha_1 = \alpha_2 = \cdots = \alpha_n = 0. Therefore the x_i are linearly independent and hence A is diagonalizable.

A matrix can have repeated eigenvalues and be diagonalizable, as diagonal matrices with repeated diagonal entries show. What is needed for diagonalizability is that every k-times repeated eigenvalue has k linearly independent eigenvectors associated with it. Equivalently, the algebraic and geometric multiplicities of every eigenvalue must be equal, that is, the eigenvalues must all be semisimple. Another equivalent condition is that the degree of the minimal polynomial is equal to the number of distinct eigenvalues.

The simplest example of a matrix that is not diagonalizable is \bigl[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr]. This matrix is a 2\times 2 Jordan block with the eigenvalue 0. Diagonalizability is easily understood in terms of the Jordan canonical form: A is diagonalizable if and only if all the Jordan blocks in its Jordan form are 1\times 1.

Most matrices are diagonalizable, in the sense that the diagonalizable matrices are dense in \mathbb{C}^{n\times n}, that is, any matrix in \mathbb{C}^{n\times n} is arbitrarily close to a diagonalizable matrix. This property is useful because it can be convenient to prove a result by first proving it for diagonalizable matrices and then arguing that by continuity the result holds for a general matrix.

Is a rank-1 matrix A = xy^* diagonalizable, where x,y\in\mathbb{C}^n are nonzero? There are n-1 zero eigenvalues with eigenvectors any set of linearly independent vectors orthogonal to y. If y^*x \ne 0 then y^*x is the remaining eigenvalue, with eigenvector x, which is linearly independent of the eigenvectors for 0, and A is diagonalizable. If y^*x = 0 then all the eigenvalues of A are zero and so A cannot be diagonalizable, as the only diagonalizable matrix whose eigenvalues are all zero is the zero matrix. For the matrix \bigl[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr] mentioned above, x = \bigl[{1 \atop 0} \bigr] and y = \bigl[{0 \atop 1} \bigr], so y^*x = 0, confirming that this matrix is not diagonalizable.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is a Stochastic Matrix?

A stochastic matrix is an n\times n matrix with nonnegative entries and unit row sums. If A\in\mathbb{R}^{n\times n} is stochastic then Ae = e, where e = [1,1,\dots,1]^T is the vector of ones. This means that e is an eigenvector of A corresponding to the eigenvalue 1.

The identity matrix is stochastic, as is any permutation matrix. Here are some other examples of stochastic matrices:

\notag \begin{aligned}   A_n &= n^{-1}ee^T, \quad \textrm{in particular}~~     A_3 = \begin{bmatrix}    \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt]    \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt]    \frac{1}{3} & \frac{1}{3} & \frac{1}{3}    \end{bmatrix}, \qquad (1)\\   B_n &= \frac{1}{n-1}(ee^T -I), \quad \textrm{in particular}~~     B_3 = \begin{bmatrix}              0 & \frac{1}{2} & \frac{1}{2}\\[2pt]    \frac{1}{2} & 0           & \frac{1}{2}\\[2pt]    \frac{1}{2} & \frac{1}{2} & 0    \end{bmatrix},    \qquad (2)\\   C_n &= \begin{bmatrix}             1         &                   &           &    \\      \frac{1}{2}      &  \frac{1}{2}      &           &    \\      \vdots           &  \vdots           &\ddots     &    \\      \frac{1}{n}      &  \frac{1}{n}      &\cdots     &  \frac{1}{n}      \end{bmatrix}.   \qquad (3)\\ \end{aligned}

For any matrix A, the spectral radius \rho(A) = \max\{ |\lambda| : \lambda \mathrm{~is~an~eigenvalue~of~} A \} is bounded by \rho(A) \le \|A\| for any norm. For a stochastic matrix, taking the \infty-norm (the maximum row sum of absolute values) gives \rho(A) \le 1, so since we know that 1 is an eigenvalue, \rho(A) = 1. It can be shown that 1 is a semisimple eigenvalue, that is, if there are k eigenvalues equal to 1 then there are k linearly independent eigenvectors corresponding to 1 (Meyer, 2000, p. 696).

A lower bound on the spectral radius can be obtained from Gershgorin’s theorem. The ith Gershgorin disc is defined by |\lambda - a_{ii}| \le \sum_{j\ne i} |a_{ij}| = 1 - a_{ii}, which implies |\lambda| \ge 2a_{ii}-1. Every eigenvalue \lambda lies in the union of the n discs and so must satisfy

\notag         2\min_i a_{ii}-1 \le |\lambda| \le 1.

The lower bound is positive if A is strictly diagonally dominant by rows.

If A and B are stochastic then AB is nonnegative and ABe = Ae = e, so AB is stochastic. In particular, any positive integer power of A is stochastic. Does A^k converge as k\to\infty? The answer is that it does, and the limit is stochastic, as long as 1 is the only eigenvalue of modulus 1, and this will be the case if all the elements of A are positive (by Perron’s theorem). The simplest example of non-convergence is the stochastic matrix

\notag    A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},

which has eigenvalues 1 and -1. Since A^2 = I, all even powers are equal to I and all odd powers are equal to A. For the matrix (1), A_n^k = A_n for all k, while for (2), B_n^k \to A_n as k\to\infty. For (3), C_n^k converges to the matrix with 1 in every entry of the first column and zeros everywhere else.

Stochastic matrices arise in Markov chains. A transition matrix for a finite homogeneous Markov chain is a matrix whose (i,j) element is the probability of moving from state i to state j over a time step. It has nonnegative entries and the rows sum to 1, so it is a stochastic matrix. In applications including finance and healthcare, a transition matrix may be estimated for a certain time period, say one year, but a transition matrix for a shorter period, say one month, may be needed. If A is a transition matrix for a time period p then a stochastic pth root of A, which is a stochastic solution X of the equation X^p = A, is a transition matrix for a time period a factor p smaller. Therefore p = 12 (years to months) and p = 7 (weeks to days) are among the values of interest. Unfortunately, a stochastic pth root may not exist. For example, the matrix

\notag \begin{bmatrix}    0 & 1 & 0 \\    0 & 0 & 1 \\    0 & 0 & 1 \end{bmatrix}

has no pth roots at all, let alone stochastic ones. Yet many stochastic matrices do have stochastic roots. The matrix (3) has a stochastic pth root for all p, as shown by Higham and Lin (2011), who give a detailed analysis of pth roots of stochastic matrices. For example, to four decimal places,

\notag  C_6^{1/7} = \begin{bmatrix} 1.000  &        &        &        &        &        \\ 0.094  & 0.906  &        &        &        &        \\ 0.043  & 0.102  & 0.855  &        &        &        \\ 0.027  & 0.050  & 0.103  & 0.820  &        &        \\ 0.019  & 0.032  & 0.052  & 0.103  & 0.795  &        \\ 0.014  & 0.023  & 0.034  & 0.053  & 0.102  & 0.774  \\ \end{bmatrix}.

A stochastic matrix is sometime called a row-stochastic matrix to distinguish it from a column-stochastic matrix, which is a nonnegative matrix for which e^TA = e^T (so that A^T is row-stochastic). A matrix that is both row-stochastic and column-stochastic is called doubly stochastic. A permutation matrix is an example of a doubly stochastic matrix. If U is a unitary matrix then the matrix with a_{ij} = |u_{ij}|^2 is doubly stochastic. A magic square scaled by the magic sum is also doubly stochastic. For example,

>> M = magic(4), A = M/sum(M(1,:)) % A is doubly stochastic.
M =
    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1
A =
   4.7059e-01   5.8824e-02   8.8235e-02   3.8235e-01
   1.4706e-01   3.2353e-01   2.9412e-01   2.3529e-01
   2.6471e-01   2.0588e-01   1.7647e-01   3.5294e-01
   1.1765e-01   4.1176e-01   4.4118e-01   2.9412e-02
>> [sum(A) sum(A')]
ans =
     1     1     1     1     1     1     1     1
>> eig(A)'
ans =
   1.0000e+00   2.6307e-01  -2.6307e-01   8.5146e-18

References

  • Nicholas J. Higham and Lijing Lin, On pth Roots of Stochastic Matrices, Linear Algebra Appl. 435, 448–463, 2011.
  • Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.

Related Blog Posts

What Is the Inertia of a Matrix?

The inertia of a real symmetric n\times n matrix A is a triple, written \mathrm{In}(A) = (i_+(A),i_-(A),i_0(A)), where i_+(A) is the number of positive eigenvalues of A, i_-(A) is the number of negative eigenvalues of A, and i_0(A) is the number of zero eigenvalues of A.

The rank of A is i_+(A) + i_-(A). The difference i_+(A) - i_-(A) is called the signature.

In general it is not possible to determine the inertia by inspection, but some deductions can be made. If A has both positive and negative diagonal elements then i_+(A) > 1 and i_-(A) > 1. But in general the diagonal elements do not tell us much about the inertia. For example, here is a matrix that has positive diagonal elements but only one positive eigenvalue (and this example works for any n):

>> n = 4; A = -eye(n) + 2*ones(n), eigA = eig(sym(A))'
A =
     1     2     2     2
     2     1     2     2
     2     2     1     2
     2     2     2     1
eigA =
[-1, -1, -1, 7]

A congruence transformation of a symmetric matrix A is a transformation A \to X^T\!AX for a nonsingular matrix X. The result is clearly symmetric. Sylvester’s law of inertia (1852) says that the inertia is preserved under congruence transformations.

Theorem 1 (Sylvester’s law of inertia).

If A\in\mathbb{R}^{n\times n} is symmetric and X\in\mathbb{R}^{n\times n} is nonsingular then \mathrm{In}(A) = \mathrm{In}(X^T\!AX).

Sylvester’s law gives a way to determine the inertia without computing eigenvalues: find a congruence transformation that transforms A to a matrix whose inertia can be easily determined. A factorization PAP^T = LDL^T does the job, where P is a permutation matrix, L is unit lower triangular, and D is diagonal Then \mathrm{In}(A) = \mathrm{In}(D), and \mathrm{In}(D) can be read off the diagonal of D. This factorization does not always exist, and if it does exist is can be numerically unstable. A block LDL^T factorization, in which D is block diagonal with diagonal blocks of size 1 or 2, always exists, and its computation is numerically stable with a suitable pivoting strategy such as symmetric rook pivoting.

For the matrix above we can compute a block LDL^T factorization using the MATLAB ldl function:

>> [L,D,P] = ldl(A); D
D =
   1.0000e+00   2.0000e+00            0            0
   2.0000e+00   1.0000e+00            0            0
            0            0  -1.6667e+00            0
            0            0            0  -1.4000e+00

Since the leading 2-by-2 block of D has negative determinant and hence one positive eigenvalue and one negative eigenvalue, it follows that A has one positive eigenvalue and three negative eigenvalues.

A congruence transformation preserves the signs of the eigenvalues but not their magnitude. A result of Ostrowski (1959) bounds the ratios of the eigenvalues of the original and transformed matrices. Let the eigenvalues of a symmetric matrix be ordered \lambda_n \le \cdots \le \lambda_1.

Theorem (Ostrowski).

For a symmetric A\in \mathbb{R}^{n\times n} and X\in\mathbb{R}^{n\times n},

\lambda_k(X^*AX) = \theta_k \lambda_k(A), \quad k=1\colon n,

where \lambda_n(X^*X) \le \theta_k \le \lambda_1(X^*X).

The theorem shows that the further X is from being orthogonal the greater the potential change in the eigenvalues.

Finally, we note that everything here generalizes to complex Hermitian matrices by replacing transpose by conjugate transpose.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is Gershgorin’s Theorem?

For a given n\times n matrix, Gershgorin’s theorem defines n discs in the complex plane whose union contains the eigenvalues of the matrix. The theorem can provide approximations to eigenvalues. It can also provide qualitative information, such as that all the eigenvalues lie in a particular half-plane.

Theorem 1 (Gershgorin’s theorem).

The eigenvalues of A\in\mathbb{C}^{n\times n} lie in the union of the n discs in the complex plane

\notag      D_i = \Big\{ z\in\mathbb{C}: |z-a_{ii}| \le \displaystyle\sum_{j\ne i}      |a_{ij}|\Big\}, \quad i=1\colon n.

Proof. Let \lambda be an eigenvalue of A and x a corresponding eigenvector and let |x_k| = \|x\|_\infty. From the kth equation in Ax=\lambda x we have

\notag        \sum_{j=1 \atop j\ne k}^n a_{kj}x_j = (\lambda-a_{kk}) x_k.

Hence

\notag    |\lambda-a_{kk}| \le  \sum_{j=1 \atop j\ne k}^n |a_{kj}||x_j|/|x_k|,

and since |x_j|/|x_k|\le 1 it follows that \lambda belongs to the kth disc, D_k.

The Gershgorin discs D_i are defined in terms of a summation over the rows of A, but since the eigenvalues of A are the same as those of A^T the same result holds with summation over the columns.

A consequence of the theorem is that if 0 does not belong to any of the Gershgorin discs then A is nonsingular. This is equivalent to the well-known result that a strictly diagonally dominant matrix is nonsingular.

Another consequence of the theorem is that if A is symmetric and all the discs lie in the open right half-plane then the eigenvalues are positive and so A is positive definite. This condition is equivalent to A having positive diagonal elements and being strictly diagonally dominant.

The Gershgorin discs for the 5\times 5 matrix

\notag \left[\begin{array}{ccccc} 5/4         & 1 & 3/4         & 1/2         & 1/4        \\ 1 & 0 & 0 & 0 & 0\\ -1 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 3 & 0\\ 0 & 0 & 0 & 1/2         & 5 \end{array}\right]

are shown here:

gersh1.jpg The eigenvalues—three real and one complex conjugate pair—are the black dots. It happens that each disc contains an eigenvalue, but this is not always the case. For the matrix

\notag \left[\begin{array}{cc} 2 & -1\\ 2 & 0 \end{array}\right]

the discs are

gersh2.jpg and the blue disc does not contain an eigenvalue. The next result, which is proved by a continuity argument, provides additional information that increases the utility of Gershgorin’s theorem. In particular it says that if a disc is disjoint from the other discs then it contains an eigenvalue.

Theorem 2.

If k of the Gershgorin discs of A\in\mathbb{C}^{n\times n} are disjoint from the other discs then their union contains k eigenvalues of A.

Theorem 2 tells us that the rightmost disc in our 5\times 5 example contains one eigenvalue, \lambda, since it is disjoint from the other discs, and the union of the other four discs contains four eigenvalues. Furthermore, \lambda must be real, because if not it occurs in a complex conjugate pair since the matrix is real, and as the disc is symmetric about the real axis \overline{\lambda} would also lie in the disc, contradicting Theorem 2.

Gershgorin’s theorem is most useful for matrices that are close to being diagonal. A technique that can produce improved eigenvalue estimates is to apply the theorem to D^{-1}AD for some nonsingular diagonal matrix D. This similarity transformation does not change the eigenvalues but it does change the discs, and the aim is to choose D to reduce the radiuses of the discs. Consider our 5\times 5 example. We know that there is one eigenvalue \lambda in the rightmost disc and that it is real, so 4.5 \le \lambda \le 5.5. For D = \mathrm{diag}(1,1,1,1,1/4) the rightmost disc shrinks and remains distinct from the others and we obtain the sharper bounds 4.875 \le \lambda \le 5.126. The discs for D^{-1}AD are shown here:

gersh1a.jpg

Notes

Most books on matrix analysis or numerical linear algebra include Gershgorin’s theorem.

Eigenvalue inclusion regions have been developed with discs replaced by more complicated shapes, such as Brauer’s ovals of Cassini.

Varga’s 2004 book is devoted to Gershgorin’s theorem and related results. It reproduces Gershgorin’s 1931 paper in an appendix.

References

  • S. Gerschgorin. Uber die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk. SSSR, 1:749–754, 1931.
  • Richard S. Varga. Geršgorin and His Circles. Springer-Verlag, Berlin, Germany, 2004.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is a Nilpotent Matrix?

An n\times n matrix A is nilpotent if A^k =0 for some positive integer k. A nonzero nilpotent matrix must have both positive and negative entries in order for cancellation to take place in the matrix powers. The smallest k for which A^k =0 is called the index of nilpotency. The index does not exceed n, as we will see below.

Here are some examples of nilpotent matrices.

\notag \begin{aligned}  A_1 &= \begin{bmatrix}0 & 1 \\ 0 & 0 \end{bmatrix}, \quad         A_1^2=0,\\  A_2 &= \begin{bmatrix}0 & 1 & 1\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{bmatrix}, \quad         A_2^3 = 0,\\  A_3 &= \begin{bmatrix}1 & -1 \\ 1 & -1 \end{bmatrix}, \quad         A_3^2 = 0,\\  A_4 &= \begin{bmatrix}2 & 2 & 4\\ -4 & -4 & -8\\ 1 & 1 & 2  \end{bmatrix}, \quad         A_4^2 = 0. \end{aligned}

Matrix A_1 is the 2\times 2 instance of the upper bidiagonal p\times p matrix

\notag N =  \begin{bmatrix}  0   & 1         &          &           \\                           & 0         & \ddots   &           \\                           &           & \ddots   &    1      \\                           &           &          & 0         \end{bmatrix},   \qquad (1)

for which

\notag N^2     = \begin{bmatrix}                0 & 0 & 1      &        &   \\                  & 0 & \ddots & \ddots &   \\                  &   &        & \ddots & 1 \\                  &   &        & \ddots & 0 \\                  &   &        &        & 0            \end{bmatrix}, \quad \dots, \quad N^{p-1} = \begin{bmatrix}                0 & 0 & \dots  & 0      & 1      \\                  & 0 & \ddots &        & 0      \\                  &   & \ddots & \ddots & \vdots \\                  &   &        & 0      & 0      \\                  &   &        &        & 0            \end{bmatrix}

and N^p = 0. The superdiagonal of ones moves up to the right with each increase in the index of the power until it disappears off the top right corner of the matrix.

Matrix A_4 has rank 1 and was constructed using a general formula: if A = xy^T with y^Tx = 0 then A^2 = xy^T xy^T = (y^Tx) xy^T = 0. We simply took orthogonal vectors x =[2, -4, 1]^T and y = [1, 1, 2]^T.

If A is nilpotent then every eigenvalue is zero, since Ax = \lambda x with x\ne 0 implies 0 = A^nx = \lambda^n x or \lambda = 0. Consequently, the trace and determinant of a nilpotent matrix are both zero.

If A is nilpotent and Hermitian or symmetric, or more generally normal (A^*A = AA^*), then A = 0, since such a matrix has a spectral decomposition A = Q \mathrm{diag}(\lambda_i)Q^* and the matrix \mathrm{diag}(\lambda_i) is zero. It is only for nonnormal matrices that nilpotency is a nontrivial property, and the best way to understand it is with the Jordan canonical form (JCF). The JCF of a matrix with only zero eigenvalues has the form A = XJX^{-1}, where J = \mathrm{diag}(J_{m_1},J_{m_2}, \dots, J_{m_p}), where J_{m_i} is of the form (1) and hence J_{m_i}^{m_i} = 0. It follows that the index of nilpotency is k = \max\{\,m_i : i=1\colon p\,\} \le n.

What is the rank of an n\times n nilpotent matrix A? The minimum possible rank is 0, attained for the zero matrix. The maximum possible rank is n-1, attained when the JCF of A has just one Jordan block of size n. Any rank between 0 and n-1 is possible: rank j is attained when there is a Jordan block of size j+1 and all other blocks are 1\times 1.

Finally, while a nilpotent matrix is obviously not invertible, like every matrix it has a Moore–Penrose pseudoinverse. The pseudoinverse of a Jordan block with eigenvalue zero is just the transpose of the block: N^+ = N^T for N in (1).

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is an Eigenvalue?

An eigenvalue of a square matrix A is a scalar \lambda such that Ax = \lambda x for some nonzero vector x. The vector x is an eigenvector of A and it has the distinction of being a direction that is not changed on multiplication by A.

An n\times n matrix has n eigenvalues. This can be seen by noting that Ax = \lambda x is equivalent to (\lambda I - A) x = 0, which means that \lambda I - A is singular, since x\ne 0. Hence \det(\lambda I - A) = 0. But

\notag  \det(\lambda I - A) = \lambda^n +                        a_{n-1}\lambda^{n-1} + \cdots + a_1 \lambda + a_0

is a scalar polynomial of degree n (the characteristic polynomial of A) with nonzero leading coefficient and so has n roots, which are the eigenvalues of A. Since \det(\lambda I - A) = \det( (\lambda I - A)^T)       = \det(\lambda I - A^T), the eigenvalues of A^T are the same as those of A.

A real matrix may have complex eigenvalues, but they appear in complex conjugate pairs. Indeed Ax = \lambda x implies \overline{A}\overline{x} = \overline{\lambda} \overline{x}, so if A is real then \overline{\lambda} is an eigenvalue of A with eigenvector \overline{x}.

Here are some 2\times 2 matrices and their eigenvalues.

\notag \begin{aligned}  A_1 &= \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix}, \quad         \lambda = 1,1,\\  A_2 &= \begin{bmatrix}0 & 1 \\ 0 & 0 \end{bmatrix}, \quad         \lambda = 0,0,\\  A_3 &= \begin{bmatrix}0 & 1 \\ -1 & 0 \end{bmatrix}, \quad         \lambda = \mathrm{i},\mathrm{-i}. \end{aligned}

Note that A_1 and A_2 are upper triangular, that is, a_{ij} = 0 for i>j. For such a matrix the eigenvalues are the diagonal elements.

A symmetric matrix (A^T = A) or Hermitian matrix (A^* = A, where A^* = \overline{A}^T) has real eigenvalues. A proof is Ax = \lambda x \Rightarrow x^*A^* = \overline{\lambda} x^* so premultiplying the first equation by x^* and postmultiplying the second by x gives x^*Ax = \lambda x^*x and x^*Ax = \overline{\lambda} x^*x, which means that (\lambda-\overline{\lambda})x^*x = 0, or \lambda=\overline{\lambda} since x^*x \ne 0. The matrix A_1 above is symmetric.

A skew-symmetric matrix (A^T = -A) or skew-Hermitian complex matrix (A^* = -A) has pure imaginary eigenvalues. A proof is similar to the Hermitian case: Ax = \lambda x \Rightarrow -x^*A = x^*A^* = \overline{\lambda} x^* and so x^*Ax is equal to both \lambda x^*x and -\overline{\lambda} x^*x, so \lambda = -\overline{\lambda}. The matrix A_3 above is skew-symmetric.

In general, the eigenvalues of a matrix A can lie anywhere in the complex plane, subject to restrictions based on matrix structure such as symmetry or skew-symmetry, but they are restricted to the disc centered at the origin with radius \|A\|, because for any matrix norm \|\cdot\| it can be shown that every eigenvalue satisfies |\lambda| \le \|A\|.

Here are some example eigenvalue distributions, computed in MATLAB. (The eigenvalues are computed at high precision using the Advanpix Multiprecision Computing Toolbox in order to ensure that rounding errors do not affect the plots.) The second and third matrices are real, so the eigenvalues are symmetrically distributed about the real axis. (The first matrix is complex.)

eig_smoke.jpg eig_dramadah.jpg eig_toeppen_inv.jpg

Although this article is about eigenvalues we need to say a little more about eigenvectors. An n\times n matrix A with distinct eigenvalues has n linearly independent eigenvectors. Indeed it is diagonalizable: A = XDX^{-1} for some nonsingular matrix X with D = \mathrm{diag}(\lambda_i) the matrix of eigenvalues. If we write X in terms of its columns as X = [x_1,x_2,\dots,x_n] then AX = XD is equivalent to Ax_i = \lambda _i x_i, i=1\colon n, so the x_i are eigenvectors of A. The matrices A_1 and A_3 above both have two linearly independent eigenvectors.

If there are repeated eigenvalues there can be less than n linearly independent eigenvectors. The matrix A_2 above has only one eigenvector: the vector \left[\begin{smallmatrix}1 \\ 0 \end{smallmatrix}\right] (or any nonzero scalar multiple of it). This matrix is a Jordan block. The matrix A_1 shows that a matrix with repeated eigenvalues can have linearly independent eigenvectors.

Here are some questions about eigenvalues.

  • What matrix decompositions reveal eigenvalues? The answer is the Jordan canonical form and the Schur decomposition. The Jordan canonical form shows how many linearly independent eigenvectors are associated with each eigenvalue.
  • Can we obtain better bounds on where eigenvalues lie in the complex plane? Many results are available, of which the most well-known is Gershgorin’s theorem.
  • How can we compute eigenvalues? Various methods are available. The QR algorithm is widely used and is applicable to all types of eigenvalue problems.

Finally, we note that the concept of eigenvalue is more general than just for matrices: it extends to nonlinear operators on finite or infinite dimensional spaces.

References

Many books include treatments of eigenvalues of matrices. We give just three examples.

  • Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013.
  • Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.
  • Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.