Eigenvalue Inequalities for Hermitian Matrices

The eigenvalues of Hermitian matrices satisfy a wide variety of inequalities. We present some of the most useful and explain their implications. Proofs are omitted, but as Parlett (1998) notes, the proofs of the Courant–Fischer, Weyl, and Cauchy results are all consequences of the elementary fact that if the sum of the dimensions of two subspaces of \mathbb{C}^n exceeds n then the subspaces have a nontrivial intersection.

The eigenvalues of a Hermitian matrix A\in\mathbb{C}^{n\times n} are real and we order them \lambda_n\le \lambda_{n-1} \le \cdots \le \lambda_1. Note that in some references, such as Horn and Johnson (2013), the reverse ordering is used, with \lambda_n the largest eigenvalue. When it is necessary to specify what matrix \lambda_k is an eigenvalue of we write \lambda_k(A): the kth largest eigenvalue of A. All the following results also hold for symmetric matrices over \mathbb{R}^{n\times n}.

Quadratic Form

The function f(x) = x^*Ax/x^*x is the quadratic form x^*Ax for A evaluated on the unit sphere, since f(x) = f(x/\|x\|_2). As A is Hermitian it has a spectral decomposition A = Q\Lambda Q^*, where Q is unitary and \Lambda = \mathrm{diag}(\lambda_i). Then

f(x) = \displaystyle\frac{x^*Q\Lambda Q^*x}{x^*x}             = \displaystyle\frac{y^*\Lambda y}{y^*y}             = \displaystyle\frac{\sum_{i=1}^{n}\lambda_i y_i^2}                                 {\sum_{i=1}^{n}y_i^2} \quad (y = Q^*x),

from which is it clear that

\notag  \lambda_n = \displaystyle\min_{x\ne0} \displaystyle\frac{x^*Ax}{x^*x}, \quad  \lambda_1 = \displaystyle\max_{x\ne0} \displaystyle\frac{x^*Ax}{x^*x}, \qquad(*)

with equality when x is an eigenvector corresponding to \lambda_n and \lambda_1, respectively, This characterization of the extremal eigenvalues of A as the extrema of f is due to Lord Rayleigh (John William Strutt), and f(x) is called a Rayleigh quotient. The intermediate eigenvalues correspond to saddle points of f.

Courant–Fischer Theorem

The Courant–Fischer theorem (1905) states that every eigenvalue of a Hermitian matrix A\in\mathbb{C}^{n\times n} is the solution of both a min-max problem and a max-min problem over suitable subspaces of \mathbb{C}^n.

Theorem (Courant–Fischer).

For a Hermitian A\in\mathbb{C}^{n\times n},

\notag \begin{aligned}    \lambda_k &= \min_{\dim(S)=n-k+1} \, \max_{0\ne x\in S} \frac{x^*Ax}{x^*x}\\              &= \max_{\dim(S)= k} \, \min_{0\ne x\in S} \frac{x^*Ax}{x^*x},                  \quad k=1\colon n. \end{aligned}

Note that the equalities (*) are special cases of these characterizations.

In general there is no useful formula for the eigenvalues of a sum A+B of Hermitian matrices. However, the Courant–Fischer theorem yields the upper and lower bounds

\notag  \lambda_k(A) + \lambda_n(B) \le \lambda_k(A+B) \le \lambda_k(A) + \lambda_1(B),   \qquad (1)

from which it follows that

\notag   \max_k|\lambda_k(A+B)-\lambda_k(A)| \le \max(|\lambda_n(B)|,|\lambda_1(B)|)     = \|B\|_2.

This inequality shows that the eigenvalues of a Hermitian matrix are well conditioned under perturbation. We can rewrite the inequality in the symmetric form

\notag   \max_k |\lambda_k(A)-\lambda_k(B)| \le \|A-B\|_2.

If B is positive semidefinite then (1) gives

\notag    \lambda_k(A) \le \lambda_k(A + B),    \quad k = 1\colon n, \qquad (2)

while if B is positive definite then strict inequality holds for all i. These bounds are known as the Weyl monotonicity theorem.

Weyl’s Inequalities

Weyl’s inequalities (1912) bound the eigenvalues of A+B in terms of those of A and B.

Theorem (Weyl).

For Hermitian A,B\in\mathbb{C}^{n\times n} and i,j = 1\colon n,

\notag \begin{aligned}     \lambda_{i+j-1}(A+B) &\le \lambda_i(A) + \lambda_j(B),     \quad i+j \le n+1, \qquad (3)\\     \lambda_i(A) + \lambda_j(B) &\le \lambda_{i+j-n}(A+B).     \quad i+j \ge n+1, \qquad (4) \end{aligned}

The Weyl inequalities yield much information about the effect of low rank perturbations. Consider a positive semidefinite rank-1 perturbation B = zz^*. Inequality (3) with j = 1 gives

\notag     \lambda_i(A+B) \le \lambda_i(A) + z^*z,       \quad i = 1\colon n

(which also follows from (1)). Inequality (3) with j = 2, combined with (2), gives

\notag     \lambda_{i+1}(A) \le \lambda_{i+1}(A + zz^*) \le \lambda_i(A),       \quad i = 1\colon n-1. \qquad (5)

These inequalities confine each eigenvalue of A + zz^* to the interval between two adjacent eigenvalues of A; the eigenvalues of A + zz^* are said to interlace those of A. The following figure illustrates the case n = 4, showing a possible configuration of the eigenvalues \lambda_i of A and \mu_i of A + zz^*.

weyl_fig.jpg A specific example, in MATLAB, is

>> n = 4; eig_orig = 5:5+n-1
>> D = diag(eig_orig); eig_pert = eig(D + ones(n))'
eig_orig =
     5     6     7     8
eig_pert =
   5.2961e+00   6.3923e+00   7.5077e+00   1.0804e+01

Since \mathrm{trace}(A + zz^*) = \mathrm{trace}(A) + z^*z and the trace is the sum of the eigenvalues, we can write

\notag       \lambda_i(A + zz^*) = \lambda_i(A) + \theta_i z^*z,

where the \theta_i are nonnegative and sum to 1. If we greatly increase z^*z, the norm of the perturbation, then most of the increase in the eigenvalues is concentrated in the largest, since (5) bounds how much the smaller eigenvalues can change:

>> eig_pert = eig(D + 100*ones(n))'
eig_pert =
   5.3810e+00   6.4989e+00   7.6170e+00   4.0650e+02

More generally, if B has p positive eigenvalues and q negative eigenvalues then (3) with j = p+1 gives

\notag     \lambda_{i+p}(A+B) \le \lambda_i(A),      \quad i = 1\colon n-p,

while (4) with j = n-q gives

\notag     \lambda_i(A) \le \lambda_{i-q}(A + B),     \quad i = q+1\colon n.

So the inertia of B (the number of negative, zero, and positive eigenvalues) determines how far the eigenvalues can move as measured relative to the indexes of the eigenvalues of A.

An important implication of the last two inequalities is for the case A = I, for which we have

\notag \begin{aligned}  \lambda_{i+p}(I+B) &\le 1, \quad i = 1 \colon n-p, \\  \lambda_{i-q}(I+B) &\ge 1, \quad i = q+1 \colon n. \end{aligned}

Exactly p+q eigenvalues appear in one of these inequalities and n-(p+q) appear in both. Therefore n - (p+q) of the eigenvalues are equal to 1 and so only \mathrm{rank}(B) = p+q eigenvalues can differ from 1. So perturbing the identity matrix by a Hermitian matrix of rank r changes at most r of the eigenvalues. (In fact, it changes exactly r eigenvalues, as can be seen from a spectral decomposition.)

Finally, if B has rank r then \lambda_{r+1}(B) \le 0 and \lambda_{n-r}(B) \ge 0 and so taking j = r+1 in (3) and j = n-r in (4) gives

\notag   \begin{aligned}     \lambda_{i+r}(A+B) &\le \lambda_i(A),      ~~\qquad\qquad i = 1\colon n-r, \\         \lambda_i(A) &\le \lambda_{i-r}(A + B), ~~\quad i = r+1\colon n.   \end{aligned}

Cauchy Interlace Theorem

The Cauchy interlace theorem relates the eigenvalues of successive leading principal submatrices of a Hermitian matrix. We denote the leading principal submatrix of A of order k by A_k = A(1\colon k, 1\colon k).

Theorem (Cauchy).

For a Hermitian A\in\mathbb{C}^{n\times n},

\notag  \lambda_{i+1}(A_{k+1}) \le \lambda_i(A_k) \le \lambda_i(A_{k+1}),    \quad i = 1\colon k, \quad k=1\colon n-1.

The theorem says that the eigenvalues of A_k interlace those of A_{k+1} for all k. Two immediate implications are that (a) if A is Hermitian positive definite then so are all its leading principal submatrices and (b) appending a row and a column to a Hermitian matrix does not decrease the largest eigenvalue or increase the smallest eigenvalue.

Since eigenvalues are unchanged under symmetric permutations of the matrix, the theorem can be reformulated to say that the eigenvalues of any principal submatrix of order n-1 interlace those of A. A generalization to principal submatrices of order n-\ell is given in the next result.

Theorem.

If B is a principal submatrix of order n-\ell of a Hermitian A\in\mathbb{C}^{n\times n} then

\notag  \lambda_{i+\ell}(A) \le \lambda_i(B) \le \lambda_i(A),    \quad i=1\colon n-\ell.

Majorization Results

It follows by taking x to be a unit vector e_i in the formula \lambda_1 = \max_{x\ne0} x^*Ax/(x^*x) that \lambda_1 \ge a_{ii} for all i. And of course the trace of A is the sum of the eigenvalues: \sum_{i=1}^n a_{ii} = \sum_{i=1}^n \lambda_i. These relations are the first and last in a sequence of inequalities relating sums of eigenvalues to sums of diagonal elements obtained by Schur in 1923.

Theorem (Schur).

For a Hermitian A\in\mathbb{C}^{n\times n},

\notag     \displaystyle\sum_{i=1}^k \lambda_i \ge \displaystyle\sum_{i=1}^k \widetilde{a}_{ii},     \quad k=1\colon n,

where \{\widetilde{a}_{ii}\} is the set of diagonal elements of A arranged in decreasing order: \widetilde{a}_{11} \ge \cdots \ge \widetilde{a}_{nn}.

These inequalities say that the vector [\lambda_1,\dots,\lambda_n] of eigenvalues majorizes the ordered vector [\widetilde{a}_{11},\dots,\widetilde{a}_{nn}] of diagonal elements.

An interesting special case is a correlation matrix, a symmetric positive semidefinite matrix with unit diagonal, for which the inequalities are

\notag     \lambda_1 \ge 1, \quad     \lambda_1+ \lambda_2\ge 2, \quad \dots, \quad     \lambda_1+ \lambda_2 + \cdots + \lambda_{n-1} \ge n-1,

and \lambda_1+ \lambda_2 + \cdots + \lambda_n = n. Here is an illustration in MATLAB.

>> n = 5; rng(1); A = gallery('randcorr',n);
>> e = sort(eig(A)','descend'), partial_sums = cumsum(e)
e =
  2.2701e+00   1.3142e+00   9.5280e-01   4.6250e-01   3.6045e-04
partial_sums =
  2.2701e+00   3.5843e+00   4.5371e+00   4.9996e+00   5.0000e+00

Ky Fan (1949) proved a majorization relation between the eigenvalues of A, B, and A+B:

\notag   \displaystyle\sum_{i=1}^k \lambda_i(A+B) \le   \displaystyle\sum_{i=1}^k \lambda_i(A) +   \displaystyle\sum_{i=1}^k \lambda_i(B), \quad k = 1\colon n.

For k = 1, the inequality is the same as the upper bound of (1), and for k = n it is an equality: \mathrm{trace}(A+B) = \mathrm{trace}(A) + \mathrm{trace}(B).

Ostrowski’s Theorem

For a Hermitian A and a nonsingular X, the transformation A\to X^*AX is a congruence transformation. Sylvester’s law of inertia says that congruence transformations preserve the inertia. A result of Ostrowski (1959) goes further by providing bounds on the ratios of the eigenvalues of the original and transformed matrices.

Theorem (Ostrowski).

For a Hermitian A\in \mathbb{C}^{n\times n} and X\in\mathbb{C}^{n\times n},

\lambda_k(X^*AX) = \theta_k \lambda_k(A), \quad k=1\colon n,

where \lambda_n(X^*X) \le \theta_k \le \lambda_1(X^*X).

If X is unitary then X^*X = I and so Ostrowski’s theorem reduces to the fact that a congruence with a unitary matrix is a similarity transformation and so preserves eigenvalues. The theorem shows that the further X is from being unitary the greater the potential change in the eigenvalues.

Ostrowski’s theorem can be generalized to the situation where X is rectangular (Higham and Cheng, 1998).

Interrelations

The results we have described are strongly interrelated. For example, the Courant–Fischer theorem and the Cauchy interlacing theorem can be derived from each other, and Ostrowski’s theorem can be proved using the Courant–Fischer Theorem.

References

2 thoughts on “Eigenvalue Inequalities for Hermitian Matrices

  1. This is a very nice overview; thanks! Tangential to what Parlett is stating, most if not even all of these results can be concluded from the Eckart–Young–Mirsky theorem in the spectral norm.

Leave a comment