- Compute the residual $r = b - Ax$.
- Solve $Ad = r$.
- Update $x \leftarrow x + d$.
- Repeat from step 1 if necessary.

At first sight, this algorithm seems as expensive as solving the original system $Ax = b$. However, usually the solver is LU factorization with pivoting, $A = LU$ (where we include the permutations in $L$). Most of the work is in the LU factorization, which costs $O(n^3)$ flops, and each iteration requires a multiplication with $A$ for the residual and two substitutions to compute $d$, which total only $O(n^2)$ flops. If the refinement converges quickly then it is inexpensive.

Turning to the error, with a stable LU factorization the initial computed $\widehat{x}$ in floating-point arithmetic of precision $u$ satisfies (omitting constants)

$$\frac{\|x - \widehat{x}\|}{\|x\|} \lesssim \kappa(A)\, u, \qquad (1)$$

where $\kappa(A) = \|A\|\, \|A^{-1}\|$ is the matrix condition number in the $\infty$-norm. We would like refinement to produce a solution accurate to precision $u$:

$$\frac{\|x - \widehat{x}\|}{\|x\|} \lesssim u. \qquad (2)$$

But if the solver cannot compute the initial $\widehat{x}$ accurately when $A$ is ill conditioned, why should it be able to produce an update that improves $\widehat{x}$?

The simplest answer is that when iterative refinement was first used on digital computers the residual was computed at twice the working precision, which could be done at no extra cost in the hardware. If $\widehat{x}$ is a reasonable approximation to $x$ then we expect cancellation in forming $r = b - A\widehat{x}$, so using extra precision in forming $r$ ensures that $r$ has enough correct digits to yield a correction $d$ that improves $\widehat{x}$. This form of iterative refinement produces a solution satisfying (2) as long as $\kappa(A)u$ is safely less than $1$.

Here is a MATLAB example, where the working precision is single and residuals are computed in double precision.

```
n = 8; A = single(gallery('frank',n)); xact = ones(n,1);
b = A*xact; % b is formed exactly for small n.
x = A\b;
fprintf('Initial error = %4.1e\n', norm(x - xact,inf))
r = single( double(b) - double(A)*double(x) );
d = A\r; x = x + d;
fprintf('Second error = %4.1e\n', norm(x - xact,inf))
```

The output is

```
Initial error = 9.1e-04
Second error = 6.0e-08
```

which shows that after just one step the error has been brought down from about $10^{-3}$ to the level of $6\times 10^{-8}$, the unit roundoff for IEEE single precision arithmetic.

By the 1970s, computers had started to lose the ability to cheaply accumulate inner products in extra precision, and extra precision could not be programmed portably in software. It was discovered, though, that even if iterative refinement is run entirely in one precision it can bring benefits when $A$ is not too ill conditioned. Specifically,

- if the solver is somewhat numerically unstable the instability is cured by the refinement, in that a relative residual satisfying
  $$\frac{\|b - A\widehat{x}\|}{\|A\|\, \|\widehat{x}\| + \|b\|} \lesssim u \qquad (3)$$
  is produced, and

- a relative error satisfying
  $$\frac{\|x - \widehat{x}\|}{\|x\|} \lesssim \mathrm{cond}(A,x)\, u \qquad (4)$$
  is produced, where
  $$\mathrm{cond}(A,x) = \frac{\bigl\|\, |A^{-1}|\, |A|\, |x| \,\bigr\|_\infty}{\|x\|_\infty}.$$

The bound (4) is stronger than (1) because $\mathrm{cond}(A,x)$ is no larger than $\kappa(A)$ and can be much smaller, especially if $A$ has badly scaled rows.

In the 2000s processors became available in which 32-bit single precision arithmetic ran at twice the speed of 64-bit double precision arithmetic. A new usage of iterative refinement was developed in which the working precision is double precision and a double precision matrix is factorized in single precision.

- Factorize $A = LU$ in single precision.
- Solve $LUx = b$ by substitution in single precision, obtaining $x$.
- Compute the residual $r = b - Ax$ in double precision.
- Solve $LUd = r$ by substitution in single precision.
- Update $x \leftarrow x + d$ in double precision.
- Repeat from step 3 if necessary.

Since most of the work is in the single precision factorization, this algorithm is potentially twice as fast as solving entirely in double precision arithmetic. The algorithm achieves the same limiting accuracy (4) and limiting residual (3) provided that $\kappa(A)u_s$ is safely less than $1$, where $u_s$ is the unit roundoff for single precision.
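The steps above can be sketched in NumPy/SciPy as follows (an illustration with an assumed random test matrix; the data is double precision and only the factorization and substitutions are done in single):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n))          # double precision data
x_exact = rng.standard_normal(n)
b = A @ x_exact

# Steps 1-2: factorize and solve in single precision.
lu, piv = lu_factor(A.astype(np.float32))
x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)

# Steps 3-5: residual and update in double, correction solve in single.
for _ in range(6):
    r = b - A @ x                                    # double precision residual
    d = lu_solve((lu, piv), r.astype(np.float32))    # single precision solve
    x = x + d.astype(np.float64)                     # double precision update

err = np.linalg.norm(x - x_exact, np.inf) / np.linalg.norm(x_exact, np.inf)
print(f"relative error after refinement = {err:.1e}")
```

Despite the single precision factors, the refined solution reaches an accuracy at the double precision level for this well-conditioned matrix.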

A way to weaken this restriction on $\kappa(A)$ is to use a different solver on step 4: solve

$$U^{-1}L^{-1}A\, d = U^{-1}L^{-1}r$$

by GMRES (a Krylov subspace iterative method) in double precision. Within GMRES, $\widetilde{A} = U^{-1}L^{-1}A$ is not formed but is applied to a vector as a multiplication with $A$ followed by substitutions with $L$ and $U$. As long as GMRES converges quickly for this preconditioned system the speed gain from the fast single precision factorization will not be lost. Moreover, this different step 4 results in convergence for $\kappa(A)$ a couple of orders of magnitude larger than before, and if residuals are computed in quadruple precision then the limiting accuracy (2) is achieved.

This GMRES-based iterative refinement becomes particularly advantageous when the fast half precision arithmetic now available in hardware is used within the LU factorization, and one can use three or more precisions in the algorithm in order to balance speed, accuracy, and the range of problems that can be solved.

Finally, we note that iterative refinement can also be applied to least squares problems, eigenvalue problems, and the singular value decomposition. See Higham and Mary (2022) for details and references.

We give five references, which contain links to the earlier literature.

- Erin Carson and Nicholas J. Higham. Accelerating the Solution of Linear Systems by Iterative Refinement in Three Precisions. SIAM J. Sci. Comput., 40(2):A817–A847, 2018.
- Azzam Haidar, Harun Bayraktar, Stanimire Tomov, Jack Dongarra, and Nicholas J. Higham. Mixed-Precision Iterative Refinement Using Tensor Cores on GPUs to Accelerate Solution of Linear Systems. Proc. Roy. Soc. London A, 476(2243):20200110, 2020.
- Nicholas J. Higham and Theo Mary. Mixed Precision Algorithms in Numerical Linear Algebra. Acta Numerica, 31:347–414, 2022.
- Nicholas J. Higham and Dennis Sherwood, How to Boost Your Creativity, SIAM News, 55(5):1, 3, 2022. (Explains how developments in iterative refinement 1948–2022 correspond to asking “how might this be different” about each aspect of the algorithm.)
- Julie Langou, Julien Langou, Piotr Luszczek, Jakub Kurzak, Alfredo Buttari, and Jack Dongarra. Exploiting the Performance of 32 Bit Floating Point Arithmetic in Obtaining 64 Bit Accuracy (Revisiting Iterative Refinement for Linear Systems). In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, IEEE, November 2006.

- A Multiprecision World (2017)
- Accelerating the Solution of Linear Systems by Iterative Refinement in Three Precisions (2017)
- What Is an LU Factorization? (2021)
- What Is IEEE Standard Arithmetic? (2020)

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

I plotted the Gershgorin discs of a stochastic matrix: a matrix with nonnegative elements and row sums all equal to $1$. The Gershgorin discs for an $n\times n$ matrix $A$ are the $n$ discs in the complex plane defined by

$$D_i = \Bigl\{\, z \in \mathbb{C} : |z - a_{ii}| \le \sum_{j \ne i} |a_{ij}| \,\Bigr\}, \quad i = 1\colon n.$$

Gershgorin’s theorem says that the eigenvalues of $A$ lie in the union of the discs.

Why do the discs form this interesting pattern? For a stochastic matrix the $i$th Gershgorin disc is

$$D_i = \bigl\{\, z \in \mathbb{C} : |z - a_{ii}| \le 1 - a_{ii} \,\bigr\}.$$

This disc goes through $1$, and the closer $a_{ii}$ is to $1$ the smaller the radius of the disc, so the discs are nested, with the disc corresponding to $\min_i a_{ii}$ containing all the others.

The matrix used for the plot is `A = anymatrix('core/symmstoch',64)` from the Anymatrix collection. It has diagonal elements approximately uniformly distributed on $[0,1]$, so the centers of the discs are roughly equally spaced and the discs shrink as the centers move to the right.

The image above is for the matrix of dimension $64$. The black dots are the eigenvalues. Here is the plot for another dimension. The function used to produce these plots is `gersh` from the Matrix Computation Toolbox.

Here are two other matrices whose Gershgorin discs make a graphically interesting plot.

If you know of any other interesting examples please put them in the comments below.

A key fact is that the trace is also the sum of the eigenvalues. The proof is by considering the characteristic polynomial $p(t) = \det(tI - A)$. The roots $\lambda_1, \dots, \lambda_n$ of $p$ are the eigenvalues of $A$, so $p$ can be factorized

$$p(t) = (t - \lambda_1)(t - \lambda_2) \cdots (t - \lambda_n),$$

and so the coefficient of $t^{n-1}$ is $-(\lambda_1 + \cdots + \lambda_n)$. The Laplace expansion of $\det(tI - A)$ shows that the coefficient of $t^{n-1}$ is $-(a_{11} + \cdots + a_{nn})$. Equating these two expressions for the coefficient gives

$$\mathrm{trace}(A) = \sum_{i=1}^n a_{ii} = \sum_{i=1}^n \lambda_i. \qquad (1)$$

A consequence of (1) is that any transformation that preserves the eigenvalues preserves the trace. Therefore the trace is unchanged under similarity transformations: $\mathrm{trace}(X^{-1}AX) = \mathrm{trace}(A)$ for any nonsingular $X$.

As an example of how the trace can be useful, suppose $A$ is an $n\times n$ symmetric and orthogonal matrix, so that its eigenvalues are $\pm 1$. If there are $p$ eigenvalues $1$ and $q$ eigenvalues $-1$ then $\mathrm{trace}(A) = p - q$ and $p + q = n$. Therefore $p = (n + \mathrm{trace}(A))/2$ and $q = (n - \mathrm{trace}(A))/2$.

Another important property is that for an $m\times n$ matrix $A$ and an $n\times m$ matrix $B$,

$$\mathrm{trace}(AB) = \mathrm{trace}(BA) \qquad (2)$$

(despite the fact that $AB \ne BA$ in general). The proof is simple:

$$\mathrm{trace}(AB) = \sum_{i=1}^m \sum_{j=1}^n a_{ij} b_{ji} = \sum_{j=1}^n \sum_{i=1}^m b_{ji} a_{ij} = \mathrm{trace}(BA).$$

This simple fact can have non-obvious consequences. For example, consider the equation $AB - BA = I$ in $n\times n$ matrices $A$ and $B$. Taking the trace gives $0 = \mathrm{trace}(AB) - \mathrm{trace}(BA) = \mathrm{trace}(I) = n$, which is a contradiction. Therefore the equation has no solution.

The relation (2) gives $\mathrm{trace}(ABC) = \mathrm{trace}\bigl((AB)C\bigr) = \mathrm{trace}\bigl(C(AB)\bigr)$ for matrices $A$, $B$, and $C$ of conformable dimensions, that is,

$$\mathrm{trace}(ABC) = \mathrm{trace}(CAB) = \mathrm{trace}(BCA). \qquad (3)$$

So we can cyclically permute terms in a matrix product without changing the trace.

As an example of the use of (2) and (3), if $x$ and $y$ are $n$-vectors then $\mathrm{trace}(xy^T) = y^Tx$. If $A$ is an $n\times n$ matrix then $\mathrm{trace}(Axy^T)$ can be evaluated without forming the matrix $Axy^T$, since, by (3), $\mathrm{trace}(Axy^T) = \mathrm{trace}(y^TAx) = y^TAx$.
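These identities are easy to check numerically; here is a NumPy sketch (the test matrices are assumed random data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((3, 5))

# trace(AB) = trace(BA), even though AB (5x5) and BA (3x3) have different sizes.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# trace(A x y^T) = y^T A x: no need to form the n-by-n matrix A x y^T.
n = 4
M = rng.standard_normal((n, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)
assert np.isclose(np.trace(M @ np.outer(x, y)), y @ M @ x)
print("trace identities verified")
```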

The trace is useful in calculations with the Frobenius norm of an $m\times n$ matrix:

$$\|A\|_F = \Bigl(\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\Bigr)^{1/2} = \bigl(\mathrm{trace}(A^*A)\bigr)^{1/2},$$

where $A^*$ denotes the conjugate transpose. For example, we can generalize the formula $|z|^2 = (\mathrm{Re}\, z)^2 + (\mathrm{Im}\, z)^2$ for a complex number $z$ to an $n\times n$ matrix $A$ by splitting $A$ into its Hermitian and skew-Hermitian parts:

$$A = H + S,$$

where $H = (A + A^*)/2$ and $S = (A - A^*)/2$. Then

$$\|A\|_F^2 = \|H\|_F^2 + \|S\|_F^2.$$

If a matrix $A$ is not explicitly known but we can compute matrix–vector products with it then the trace can be estimated by

$$\mathrm{trace}(A) \approx z^TAz,$$

where the vector $z$ has elements independently drawn from the standard normal distribution with mean $0$ and variance $1$. The expectation of this estimate is

$$E(z^TAz) = E\Bigl(\sum_{i,j} a_{ij} z_i z_j\Bigr) = \sum_{i,j} a_{ij} E(z_i z_j) = \sum_i a_{ii} = \mathrm{trace}(A),$$

since $E(z_iz_j) = 0$ for $i \ne j$ and $E(z_i^2) = 1$ for all $i$. This stochastic estimate, which is due to Hutchinson, is therefore unbiased.
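Here is a NumPy sketch of Hutchinson's estimator (the test matrix is an assumed random symmetric positive semidefinite matrix; in practice $A$ would be available only through matrix–vector products):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
B = rng.standard_normal((n, n))
A = B @ B.T / n              # trace(A) is around n; pretend A is known only via matvecs

# Average z^T A z over independent standard normal vectors z.
num_samples = 2000
total = 0.0
for _ in range(num_samples):
    z = rng.standard_normal(n)
    total += z @ (A @ z)     # uses only one matrix-vector product with A
est = total / num_samples

print(f"true trace = {np.trace(A):.1f}, estimate = {est:.1f}")
```

Averaging over many samples reduces the variance of the single-sample estimate $z^TAz$.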

- Haim Avron and Sivan Toledo, Randomized Algorithms for Estimating the Trace of an Implicit Symmetric Positive Semi-definite Matrix, J. ACM, 58(2):8:1–8:34, 2011.

- What Is a Matrix Norm? (2021)
- What Is an Eigenvalue? (2022)


The condition $A = XDX^{-1}$, with $D$ diagonal, is equivalent to $AX = XD$ with $X$ nonsingular, that is, $Ax_i = d_{ii}x_i$, $i = 1\colon n$, where $X = [x_1, x_2, \dots, x_n]$. Hence $A$ is diagonalizable if and only if it has a complete set of $n$ linearly independent eigenvectors.

A Hermitian matrix is diagonalizable because the eigenvectors can be taken to be mutually orthogonal. The same is true for a normal matrix (one for which $A^*A = AA^*$). A matrix with distinct eigenvalues is also diagonalizable.

Theorem 1. If $A$ has distinct eigenvalues then it is diagonalizable.

Proof. Let $A$ have eigenvalues $\lambda_1, \dots, \lambda_n$ with corresponding eigenvectors $x_1, \dots, x_n$. Suppose that $\sum_{i=1}^n \alpha_i x_i = 0$ for some scalars $\alpha_i$. Premultiplying by $\prod_{j=2}^n (A - \lambda_j I)$ gives

$$0 = \prod_{j=2}^n (A - \lambda_j I) \sum_{i=1}^n \alpha_i x_i = \alpha_1 \prod_{j=2}^n (\lambda_1 - \lambda_j)\, x_1,$$

which implies $\alpha_1 = 0$ since $\lambda_1 \ne \lambda_j$ for $j \ne 1$ and $x_1 \ne 0$. Premultiplying instead by $\prod_{j \ne 2} (A - \lambda_j I)$ shows, in the same way, that $\alpha_2 = 0$. Continuing in this way we find that $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$. Therefore the $x_i$ are linearly independent and hence $A$ is diagonalizable.

A matrix can have repeated eigenvalues and be diagonalizable, as diagonal matrices with repeated diagonal entries show. What is needed for diagonalizability is that every $k$-times repeated eigenvalue has $k$ linearly independent eigenvectors associated with it. Equivalently, the algebraic and geometric multiplicities of every eigenvalue must be equal, that is, the eigenvalues must all be semisimple. Another equivalent condition is that the degree of the minimal polynomial is equal to the number of distinct eigenvalues.

The simplest example of a matrix that is not diagonalizable is

$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$

This matrix is a Jordan block with the eigenvalue $0$. Diagonalizability is easily understood in terms of the Jordan canonical form: $A$ is diagonalizable if and only if all the Jordan blocks in its Jordan form are $1\times 1$.

Most matrices are diagonalizable, in the sense that the diagonalizable matrices are dense in $\mathbb{C}^{n\times n}$, that is, any matrix in $\mathbb{C}^{n\times n}$ is arbitrarily close to a diagonalizable matrix. This property is useful because it can be convenient to prove a result by first proving it for diagonalizable matrices and then arguing that by continuity the result holds for a general matrix.

Is a rank-$1$ matrix $A = xy^*$ diagonalizable, where $x$ and $y$ are nonzero $n$-vectors? There are $n-1$ zero eigenvalues with eigenvectors any set of $n-1$ linearly independent vectors orthogonal to $y$. If $y^*x \ne 0$ then $y^*x$ is the remaining eigenvalue, with eigenvector $x$, which is linearly independent of the eigenvectors for $0$, and $A$ is diagonalizable. If $y^*x = 0$ then all the eigenvalues of $A$ are zero and so $A$ cannot be diagonalizable, as the only diagonalizable matrix whose eigenvalues are all zero is the zero matrix. For the $2\times 2$ matrix mentioned above, $x = e_1$ and $y = e_2$ (columns of the identity matrix), so $y^*x = 0$, confirming that this matrix is not diagonalizable.
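The two cases can be checked numerically; here is a NumPy sketch with assumed random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.standard_normal(n)
y = rng.standard_normal(n)

# Case y^T x != 0: A = x y^T has eigenvalues {y^T x, 0, ..., 0}.
A = np.outer(x, y)
evals = np.linalg.eigvals(A)
assert np.isclose(max(evals, key=abs).real, y @ x)

# Case y^T x = 0: A = x y^T is not diagonalizable; in fact A^2 = 0.
y0 = y - (y @ x) / (x @ x) * x        # project y to be orthogonal to x
A0 = np.outer(x, y0)
assert np.allclose(A0 @ A0, 0)
print("rank-1 eigenvalue checks passed")
```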

- What Is an Eigenvalue? (2022)
- What Is the Jordan Canonical Form? (2022)


According to the WordPress statistics, this blog received over 192,000 visitors and 281,000 views in 2022. These are the ten most-viewed posts published during the year.

- Seven Sins of Numerical Linear Algebra
- What Is an Eigenvalue?
- The Big Six Matrix Factorizations
- What Is a Schur Decomposition?
- What Is a Permutation Matrix?
- What Is the Logarithmic Norm?
- What Is the Jordan Canonical Form?
- What Is the Frank Matrix?
- What Is a Circulant Matrix?
- What Is a Toeplitz Matrix?

Eight of the posts are from the What Is series, which now contains 83 articles; more will follow in 2023.


The identity matrix is stochastic, as is any permutation matrix. Here are some other examples of stochastic matrices:

For any matrix $A$, the spectral radius $\rho(A)$ is bounded by $\rho(A) \le \|A\|$ for any norm. For a stochastic matrix, taking the $\infty$-norm (the maximum row sum of absolute values) gives $\rho(A) \le \|A\|_\infty = 1$, so since we know that $1$ is an eigenvalue ($Ae = e$ for the vector $e$ of ones), $\rho(A) = 1$. It can be shown that $1$ is a semisimple eigenvalue, that is, if there are $k$ eigenvalues equal to $1$ then there are $k$ linearly independent eigenvectors corresponding to $1$ (Meyer, 2000, p. 696).

A lower bound on the spectral radius can be obtained from Gershgorin’s theorem. The $i$th Gershgorin disc is defined by $|z - a_{ii}| \le \sum_{j \ne i} a_{ij} = 1 - a_{ii}$, which implies $|z| \ge a_{ii} - (1 - a_{ii}) = 2a_{ii} - 1$. Every eigenvalue lies in the union of the discs and so must satisfy

$$|\lambda| \ge 2\min_i a_{ii} - 1.$$

The lower bound is positive if $A$ is strictly diagonally dominant by rows, that is, if $a_{ii} > 1/2$ for all $i$.

If $A$ and $B$ are stochastic then $AB$ is nonnegative and $ABe = Ae = e$, so $AB$ is stochastic. In particular, any positive integer power of $A$ is stochastic. Does $A^k$ converge as $k \to \infty$? The answer is that it does, and the limit is stochastic, as long as $1$ is the only eigenvalue of modulus $1$, and this will be the case if all the elements of $A$ are positive (by Perron’s theorem). The simplest example of non-convergence is the stochastic matrix

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$$

which has eigenvalues $1$ and $-1$. Since $A^2 = I$, all even powers are equal to $I$ and all odd powers are equal to $A$. For the matrix (1), for all , while for (2), as . For (3), $A^k$ converges to the matrix with $1$ in every entry of the first column and zeros everywhere else.
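Both behaviours can be seen in a NumPy sketch (the positive stochastic matrix is an assumed example):

```python
import numpy as np

# A positive stochastic matrix: powers converge (Perron's theorem).
A = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
P = np.linalg.matrix_power(A, 100)
# The limit is stochastic and its rows are all equal.
assert np.allclose(P.sum(axis=1), 1)
assert np.allclose(P[0], P[1]) and np.allclose(P[1], P[2])

# The 2x2 swap matrix has eigenvalues 1 and -1: its powers oscillate.
S = np.array([[0., 1.], [1., 0.]])
assert np.allclose(np.linalg.matrix_power(S, 2), np.eye(2))
assert np.allclose(np.linalg.matrix_power(S, 3), S)
print("stochastic power checks passed")
```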

Stochastic matrices arise in Markov chains. A transition matrix for a finite homogeneous Markov chain is a matrix whose $(i,j)$ element is the probability of moving from state $i$ to state $j$ over a time step. It has nonnegative entries and the rows sum to $1$, so it is a stochastic matrix. In applications including finance and healthcare, a transition matrix may be estimated for a certain time period, say one year, but a transition matrix for a shorter period, say one month, may be needed. If $X$ is a transition matrix for a time period then a stochastic $p$th root of $X$, which is a stochastic solution $A$ of the equation $A^p = X$, is a transition matrix for a time period a factor $p$ smaller. Therefore $p = 12$ (years to months) and $p = 7$ (weeks to days) are among the values of interest. Unfortunately, a stochastic $p$th root may not exist. For example, the matrix

has no $p$th roots at all, let alone stochastic ones. Yet many stochastic matrices do have stochastic roots. The matrix (3) has a stochastic $p$th root for all $p$, as shown by Higham and Lin (2011), who give a detailed analysis of $p$th roots of stochastic matrices. For example, to four decimal places,

A stochastic matrix is sometimes called a *row-stochastic matrix* to distinguish it from a *column-stochastic matrix*, which is a nonnegative matrix for which $e^TA = e^T$ (so that $A^T$ is row-stochastic). A matrix that is both row-stochastic and column-stochastic is called *doubly stochastic*. A permutation matrix is an example of a doubly stochastic matrix. If $U$ is a unitary matrix then the matrix with $a_{ij} = |u_{ij}|^2$ is doubly stochastic. A magic square scaled by the magic sum is also doubly stochastic. For example,

```
>> M = magic(4), A = M/sum(M(1,:)) % A is doubly stochastic.
M =
    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1
A =
   4.7059e-01   5.8824e-02   8.8235e-02   3.8235e-01
   1.4706e-01   3.2353e-01   2.9412e-01   2.3529e-01
   2.6471e-01   2.0588e-01   1.7647e-01   3.5294e-01
   1.1765e-01   4.1176e-01   4.4118e-01   2.9412e-02
>> [sum(A) sum(A')]
ans =
     1     1     1     1     1     1     1     1
>> eig(A)'
ans =
   1.0000e+00   2.6307e-01  -2.6307e-01   8.5146e-18
```

- Nicholas J. Higham and Lijing Lin, On $p$th Roots of Stochastic Matrices, Linear Algebra Appl., 435:448–463, 2011.
- Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.

- What Is a Fractional Matrix Power? (2020)
- What Is a Matrix Norm? (2021)
- What Is a Permutation Matrix? (2022)
- What Is an Eigenvalue? (2022)
- What Is Gershgorin’s Theorem? (2022)
- What Is the Perron–Frobenius Theorem? (2021)

The rank of $A$ is $n_+ + n_-$, where $n_+$, $n_-$, and $n_0$ denote the number of positive, negative, and zero eigenvalues, respectively. The difference $n_+ - n_-$ is called the *signature*.

In general it is not possible to determine the inertia by inspection, but some deductions can be made. If $A$ has both positive and negative diagonal elements then $n_+ \ge 1$ and $n_- \ge 1$. But in general the diagonal elements do not tell us much about the inertia. For example, here is a matrix that has positive diagonal elements but only one positive eigenvalue (and this example works for any $n$):

```
>> n = 4; A = -eye(n) + 2*ones(n), eigA = eig(sym(A))'
A =
     1     2     2     2
     2     1     2     2
     2     2     1     2
     2     2     2     1
eigA =
[-1, -1, -1, 7]
```

A *congruence transformation* of a symmetric matrix $A$ is a transformation $A \to X^TAX$ for a nonsingular matrix $X$. The result is clearly symmetric. *Sylvester’s law of inertia* (1852) says that the inertia is preserved under congruence transformations.

Theorem 1 (Sylvester’s law of inertia). If $A$ is symmetric and $X$ is nonsingular then $A$ and $X^TAX$ have the same inertia.

Sylvester’s law gives a way to determine the inertia without computing eigenvalues: find a congruence transformation that transforms $A$ to a matrix whose inertia can be easily determined. A factorization $PAP^T = LDL^T$ does the job, where $P$ is a permutation matrix, $L$ is unit lower triangular, and $D$ is diagonal. Then $A$ is congruent to $D$, and the inertia can be read off the diagonal of $D$. This factorization does not always exist, and if it does exist it can be numerically unstable. A block $LDL^T$ factorization, in which $D$ is block diagonal with diagonal blocks of size $1\times 1$ or $2\times 2$, always exists, and its computation is numerically stable with a suitable pivoting strategy such as symmetric rook pivoting.

For the matrix above we can compute a block $LDL^T$ factorization using the MATLAB `ldl` function:

```
>> [L,D,P] = ldl(A); D
D =
   1.0000e+00   2.0000e+00            0            0
   2.0000e+00   1.0000e+00            0            0
            0            0  -1.6667e+00            0
            0            0            0  -1.4000e+00
```

Since the leading 2-by-2 block of $D$ has negative determinant, and hence one positive eigenvalue and one negative eigenvalue, and the remaining two diagonal entries are negative, it follows that $A$ has one positive eigenvalue and three negative eigenvalues.
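The same inertia computation can be sketched in Python with SciPy’s `ldl` (the block-by-block sign count below is my own illustration, not a library routine):

```python
import numpy as np
from scipy.linalg import ldl

n = 4
A = -np.eye(n) + 2 * np.ones((n, n))      # the matrix from the example above

L, D, perm = ldl(A)                        # block LDL^T factorization
# Count eigenvalue signs block by block (blocks of D are 1x1 or 2x2).
n_pos = n_neg = n_zero = 0
i = 0
while i < n:
    if i + 1 < n and D[i, i + 1] != 0:     # 2x2 block
        det = D[i, i] * D[i + 1, i + 1] - D[i, i + 1] ** 2
        if det < 0:                        # one positive, one negative eigenvalue
            n_pos += 1; n_neg += 1
        elif D[i, i] > 0:                  # same-sign pair, both positive
            n_pos += 2
        else:                              # same-sign pair, both negative
            n_neg += 2
        i += 2
    else:                                  # 1x1 block
        if D[i, i] > 0: n_pos += 1
        elif D[i, i] < 0: n_neg += 1
        else: n_zero += 1
        i += 1

print((n_pos, n_neg, n_zero))   # the inertia of A
```

By Sylvester’s law, the signs counted from $D$ are the signs of the eigenvalues of $A$.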

A congruence transformation preserves the signs of the eigenvalues but not their magnitudes. A result of Ostrowski (1959) bounds the ratios of the eigenvalues of the original and transformed matrices. Let the eigenvalues of a symmetric matrix be ordered $\lambda_n \le \cdots \le \lambda_1$.

Theorem (Ostrowski). For a symmetric $A$ and a nonsingular $X$,

$$\lambda_k(X^TAX) = \theta_k \lambda_k(A), \quad k = 1\colon n,$$

where $\lambda_n(X^TX) \le \theta_k \le \lambda_1(X^TX)$.

The theorem shows that the further $X$ is from being orthogonal (for which $X^TX = I$) the greater the potential change in the eigenvalues.

Finally, we note that everything here generalizes to complex Hermitian matrices by replacing transpose by conjugate transpose.

- What Is a Symmetric Indefinite Matrix? (2022)
- What Is an Eigenvalue? (2022)

Theorem 1 (Gershgorin’s theorem). The eigenvalues of $A \in \mathbb{C}^{n\times n}$ lie in the union of the $n$ discs in the complex plane

$$D_i = \Bigl\{\, z \in \mathbb{C} : |z - a_{ii}| \le \sum_{j \ne i} |a_{ij}| \,\Bigr\}, \quad i = 1\colon n.$$

Proof. Let $\lambda$ be an eigenvalue of $A$ and $x$ a corresponding eigenvector, and let $|x_k| = \max_i |x_i|$. From the $k$th equation in $Ax = \lambda x$ we have

$$(\lambda - a_{kk}) x_k = \sum_{j \ne k} a_{kj} x_j.$$

Hence

$$|\lambda - a_{kk}| \le \sum_{j \ne k} |a_{kj}| \frac{|x_j|}{|x_k|},$$

and since $|x_j|/|x_k| \le 1$ it follows that $\lambda$ belongs to the $k$th disc, $D_k$.

The Gershgorin discs are defined in terms of a summation over the rows of $A$, but since the eigenvalues of $A^T$ are the same as those of $A$ the same result holds with summation over the columns.

A consequence of the theorem is that if zero does not belong to any of the Gershgorin discs then $A$ is nonsingular. This is equivalent to the well-known result that a strictly diagonally dominant matrix is nonsingular.

Another consequence of the theorem is that if $A$ is symmetric and all the discs lie in the open right half-plane then the eigenvalues are positive and so $A$ is positive definite. This condition is equivalent to $A$ having positive diagonal elements and being strictly diagonally dominant.

The Gershgorin discs for the matrix

are shown here:

The eigenvalues—three real and one complex conjugate pair—are the black dots. It happens that each disc contains an eigenvalue, but this is not always the case. For the matrix

the discs are

and the blue disc does not contain an eigenvalue. The next result, which is proved by a continuity argument, provides additional information that increases the utility of Gershgorin’s theorem. In particular it says that if a disc is disjoint from the other discs then it contains an eigenvalue.

Theorem 2. If $k$ of the Gershgorin discs of $A$ are disjoint from the other discs then their union contains $k$ eigenvalues of $A$.

Theorem 2 tells us that the rightmost disc in our example contains one eigenvalue, $\lambda$ say, since it is disjoint from the other discs, and the union of the other four discs contains four eigenvalues. Furthermore, $\lambda$ must be real, because if not it occurs in a complex conjugate pair, since the matrix is real, and as the disc is symmetric about the real axis $\bar{\lambda}$ would also lie in the disc, contradicting Theorem 2.

Gershgorin’s theorem is most useful for matrices that are close to being diagonal. A technique that can produce improved eigenvalue estimates is to apply the theorem to $D^{-1}AD$ for some nonsingular diagonal matrix $D$. This similarity transformation does not change the eigenvalues but it does change the discs, and the aim is to choose $D$ to reduce the radii of the discs. Consider our example. We know that there is one eigenvalue in the rightmost disc and that it is real. For a suitable choice of $D$ the rightmost disc shrinks and remains distinct from the others, and we obtain sharper bounds on that eigenvalue. The discs for the scaled matrix are shown here:
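A small NumPy sketch of computing Gershgorin discs and of shrinking one disc by a diagonal similarity (the matrix and the scaling factor are assumed for illustration):

```python
import numpy as np

def gershgorin(A):
    """Return (centers, radii) of the Gershgorin discs of A (row version)."""
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return centers, radii

rng = np.random.default_rng(0)
n = 5
A = np.diag([1., 3., 5., 7., 20.]) + 0.4 * rng.standard_normal((n, n))

centers, radii = gershgorin(A)
evals = np.linalg.eigvals(A)

# Every eigenvalue lies in the union of the discs.
for lam in evals:
    assert np.any(np.abs(lam - centers) <= radii + 1e-8)

# Scaling A -> D^{-1} A D with D = diag(1,...,1,d) shrinks the last disc
# (at the cost of enlarging the others).
d = np.ones(n); d[-1] = 4.0
As = np.diag(1 / d) @ A @ np.diag(d)
_, radii_scaled = gershgorin(As)
print(f"last disc radius: {radii[-1]:.3f} -> {radii_scaled[-1]:.3f}")
```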

Most books on matrix analysis or numerical linear algebra include Gershgorin’s theorem.

Eigenvalue inclusion regions have been developed with discs replaced by more complicated shapes, such as Brauer’s ovals of Cassini.

Varga’s 2004 book is devoted to Gershgorin’s theorem and related results. It reproduces Gershgorin’s 1931 paper in an appendix.

- S. Gerschgorin. Über die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk SSSR, 1:749–754, 1931.
- Richard S. Varga. Geršgorin and His Circles. Springer-Verlag, Berlin, Germany, 2004.

- What Is a Diagonally Dominant Matrix? (2021)
- What Is an Eigenvalue? (2022)

Here are some examples of nilpotent matrices.

The first matrix is an instance of the $n\times n$ upper bidiagonal matrix

$$N = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix}, \qquad (1)$$

for which $N^k$ has ones on the $k$th superdiagonal and zeros everywhere else, and $N^n = 0$. The superdiagonal of ones moves up to the right with each increase in the index of the power until it disappears off the top right corner of the matrix.

The second matrix has rank $1$ and was constructed using a general formula: if $A = xy^*$ with $y^*x = 0$ then $A^2 = x(y^*x)y^* = 0$. We simply took orthogonal vectors $x$ and $y$.

If $A$ is nilpotent then every eigenvalue is zero, since $Ax = \lambda x$ with $A^k = 0$ implies $0 = A^kx = \lambda^k x$, and $x \ne 0$ forces $\lambda = 0$. Consequently, the trace and determinant of a nilpotent matrix are both zero.

If $A$ is nilpotent and Hermitian or symmetric, or more generally normal ($A^*A = AA^*$), then $A = 0$, since such a matrix has a spectral decomposition $A = Q\,\mathrm{diag}(\lambda_i)\,Q^*$ and the matrix $\mathrm{diag}(\lambda_i)$ is zero. It is only for nonnormal matrices that nilpotency is a nontrivial property, and the best way to understand it is with the Jordan canonical form (JCF). The JCF of a matrix with only zero eigenvalues has the form $A = XJX^{-1}$, where $J = \mathrm{diag}(J_{m_1}, J_{m_2}, \dots, J_{m_p})$, where $J_{m_i}$ is an $m_i \times m_i$ matrix of the form (1) and hence $J_{m_i}^{m_i} = 0$. It follows that the index of nilpotency is $\max_i m_i$.

What is the rank of an $n\times n$ nilpotent matrix $A$? The minimum possible rank is $0$, attained for the zero matrix. The maximum possible rank is $n-1$, attained when the JCF of $A$ has just one Jordan block of size $n$. Any rank $r$ between $0$ and $n-1$ is possible: rank $r$ is attained when there is a Jordan block of size $r+1$ and all other blocks are $1\times 1$.

Finally, while a nilpotent matrix is obviously not invertible, like every matrix it has a Moore–Penrose pseudoinverse. The pseudoinverse of a Jordan block with eigenvalue zero is just the transpose of the block: $N^+ = N^T$ for the matrix $N$ in (1).
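These facts are easy to verify numerically for a Jordan block; here is a NumPy sketch:

```python
import numpy as np

n = 4
N = np.diag(np.ones(n - 1), k=1)     # Jordan block with eigenvalue 0

# Powers shift the superdiagonal up to the right; N^n = 0.
assert np.allclose(np.linalg.matrix_power(N, 2), np.diag(np.ones(n - 2), k=2))
assert np.allclose(np.linalg.matrix_power(N, n), 0)

# Every eigenvalue is zero, the rank is n-1, and the pseudoinverse is N^T.
assert np.allclose(np.linalg.eigvals(N), 0)
assert np.linalg.matrix_rank(N) == n - 1
assert np.allclose(np.linalg.pinv(N), N.T)
print("nilpotent checks passed")
```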

- Matrix Rank Relations (2021)
- What Is a Generalized Inverse? (2020)
- What Is the Jordan Canonical Form? (2022)

An $n\times n$ matrix has $n$ eigenvalues. This can be seen by noting that $Ax = \lambda x$ for nonzero $x$ is equivalent to $(\lambda I - A)x = 0$, which means that $\lambda I - A$ is singular, since $x \ne 0$. Hence $\det(\lambda I - A) = 0$. But

$$\det(\lambda I - A) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0$$

is a scalar polynomial of degree $n$ (the characteristic polynomial of $A$) with nonzero leading coefficient and so has $n$ roots, which are the eigenvalues of $A$. Since $\det(A^T) = \det(A)$, the eigenvalues of $A^T$ are the same as those of $A$.

A real matrix may have complex eigenvalues, but they appear in complex conjugate pairs. Indeed $Ax = \lambda x$ implies $\bar{A}\bar{x} = \bar{\lambda}\bar{x}$, so if $A$ is real then $\bar{\lambda}$ is an eigenvalue of $A$ with eigenvector $\bar{x}$.

Here are some matrices and their eigenvalues.

Note that two of these matrices are upper triangular, that is, $a_{ij} = 0$ for $i > j$. For such a matrix the eigenvalues are the diagonal elements.

A *symmetric matrix* ($A^T = A$) or *Hermitian matrix* ($A^* = A$, where $A^* = \bar{A}^T$) has real eigenvalues. A proof is as follows: $Ax = \lambda x$ implies $x^*A^* = \bar{\lambda}x^*$, so premultiplying the first equation by $x^*$ and postmultiplying the second by $x$ gives $x^*Ax = \lambda x^*x$ and $x^*A^*x = x^*Ax = \bar{\lambda} x^*x$, which means that $\lambda x^*x = \bar{\lambda}x^*x$, or $\lambda = \bar{\lambda}$ since $x^*x \ne 0$. The matrix above is symmetric.

A *skew-symmetric matrix* ($A^T = -A$) or *skew-Hermitian complex matrix* ($A^* = -A$) has pure imaginary eigenvalues. A proof is similar to the Hermitian case: $x^*A^*x$ is equal to both $\bar{\lambda}x^*x$ and $-\lambda x^*x$, so $\bar{\lambda} = -\lambda$, which means that $\lambda$ is pure imaginary. The matrix above is skew-symmetric.
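Both facts are easy to check numerically; here is a NumPy sketch with an assumed random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))

S = B + B.T                  # symmetric: eigenvalues are real
K = B - B.T                  # skew-symmetric: eigenvalues are pure imaginary

assert np.allclose(np.linalg.eigvals(S).imag, 0)
assert np.allclose(np.linalg.eigvals(K).real, 0)
print("symmetry eigenvalue checks passed")
```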

In general, the eigenvalues of a matrix can lie anywhere in the complex plane, subject to restrictions based on matrix structure such as symmetry or skew-symmetry, but they are restricted to the disc centered at the origin with radius $\|A\|$, because for any matrix norm it can be shown that every eigenvalue satisfies $|\lambda| \le \|A\|$.

Here are some example eigenvalue distributions, computed in MATLAB. (The eigenvalues are computed at high precision using the Advanpix Multiprecision Computing Toolbox in order to ensure that rounding errors do not affect the plots.) The second and third matrices are real, so the eigenvalues are symmetrically distributed about the real axis. (The first matrix is complex.)

Although this article is about eigenvalues we need to say a little more about eigenvectors. An $n\times n$ matrix with distinct eigenvalues has $n$ linearly independent eigenvectors. Indeed it is diagonalizable: $A = XDX^{-1}$ for some nonsingular matrix $X$, with $D = \mathrm{diag}(\lambda_i)$ the matrix of eigenvalues. If we write $X$ in terms of its columns as $X = [x_1, x_2, \dots, x_n]$ then $AX = XD$ is equivalent to $Ax_i = \lambda_i x_i$, $i = 1\colon n$, so the $x_i$ are eigenvectors of $A$. Two of the matrices above have two linearly independent eigenvectors.

If there are repeated eigenvalues there can be fewer than $n$ linearly independent eigenvectors. The matrix above has only one eigenvector: the vector $e_1$ (or any nonzero scalar multiple of it). This matrix is a Jordan block. A diagonal matrix with repeated diagonal entries shows that a matrix with repeated eigenvalues can nevertheless have $n$ linearly independent eigenvectors.

Here are some questions about eigenvalues.

- What matrix decompositions reveal eigenvalues? The answer is the Jordan canonical form and the Schur decomposition. The Jordan canonical form shows how many linearly independent eigenvectors are associated with each eigenvalue.
- Can we obtain better bounds on where eigenvalues lie in the complex plane? Many results are available, of which the most well-known is Gershgorin’s theorem.
- How can we compute eigenvalues? Various methods are available. The QR algorithm is widely used and is applicable to all types of eigenvalue problems.

Finally, we note that the concept of eigenvalue is more general than just for matrices: it extends to nonlinear operators on finite or infinite dimensional spaces.

Many books include treatments of eigenvalues of matrices. We give just three examples.

- Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013.
- Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.
- Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.