What Is a (Non)normal Matrix?

An n\times n matrix is normal if A^*A = AA^*, that is, if A commutes with its conjugate transpose. Although the definition is simple to state, its significance is not immediately obvious.

The definition says that the inner product of the ith and jth columns equals the inner product of the ith and jth rows for all i and j. For i = j this means that the ith row and the ith column have the same 2-norm for all i. This fact can easily be used to show that a normal triangular matrix must be diagonal. It then follows from the Schur decomposition that A is normal if it is unitarily diagonalizable: A = QDQ^* for some unitary Q and diagonal D, where D contains the eigenvalues of A on the diagonal. Thus the normal matrices are those with a complete set of orthonormal eigenvectors.

For a general diagonalizable matrix, A = XDX^{-1}, the condition number \kappa(X) = \|X| \|X^{-1}\| can be arbitrarily large, but for a normal matrix X can be taken to have 2-norm condition number 1. This property makes normal matrices well-behaved for numerical computation.

Many equivalent conditions to A being normal are known: seventy are given by Grone et al. (1987) and a further nineteen are given by Elsner and Ikramov (1998).

The normal matrices include the classes of matrix given in this table:

Real Complex
Diagonal Diagonal
Symmetric Hermitian
Skew-symmetric Skew-Hermitian
Orthogonal Unitary
Circulant Circulant

Circulant matrices are n\times n Toeplitz matrices in which the diagonals wrap around:

\notag      \begin{bmatrix} c_1     & c_n    & \dots   & c_2     \\                      c_2     & c_1    & \dots   & \vdots  \\                      \vdots  & \ddots & \ddots  & c_n     \\                      c_n     & \dots  & c_2     & c_1     \\      \end{bmatrix}.

They are diagonalized by a unitary matrix known as the discrete Fourier transform matrix, which has (r,s) element \exp( -2\pi \mathrm{i} (r-1)(s-1) / n ).

A normal matrix is not necessarily of the form given in the table, even for n = 2. Indeed, a 2\times 2 normal matrix must have one of the forms

\notag    \left[\begin{array}{@{\mskip2mu}rr@{\mskip2mu}}      a & b\\      b & c    \end{array}\right], \quad    \left[\begin{array}{@{}rr@{\mskip2mu}}      a & b\\     -b & a    \end{array}\right].

The first matrix is symmetric. The second matrix is of the form aI + bJ, where J = \bigl[\begin{smallmatrix}\!\phantom{-}0 & 1\\\!-1 & 0 \end{smallmatrix}\bigr] is skew-symmetric and satisfies J^2 = -I, and it has eigenvalues a \pm \mathrm{i}b.

It is natural to ask what the commutator C = AA^*- A^*A can look like when A is not normal. One immediate observation is that C has zero trace, so its eigenvalues sum to zero, implying that C is an indefinite Hermitian matrix if it is not zero. Since an indefinite matrix has at least two different nonzero eigenvalues, C cannot be of rank 1.

In the polar decomposition A = UH, where U is unitary and H is Hermitian positive semidefinite, the polar factors commute if and only if A is normal.

The field of values, also known as the numerical range, is defined for A\in\mathbb{C}^{n\times n} by

F(A) = \biggl\{\, \displaystyle\frac{z^*Az}{z^*z} : 0 \ne z \in \mathbb{C}^n \, \biggr\}.

The set F(A) is compact and convex (a nontrivial property proved by Toeplitz and Hausdorff), and it contains all the eigenvalues of A. Normal matrices have the property that the field of values is the convex hull of the eigenvalues. The next figure illustrates two fields of values, with the eigenvalues plotted as dots. The one on the left is for the nonnormal matrix gallery('smoke',16) and that on the right is for the circulant matrix gallery('circul',x) with x constructed as x = randn(16,1); x = x/norm(x).

fv_ex.jpg

Measures of Nonnormality

How can we measure the degree of nonnormality of a matrix? Let A have the Schur decomposition A = QTQ^*, where Q is unitary and T is upper triangular, and write T = D+M, where D = \mathrm{diag}(\lambda_i) is diagonal with the eigenvalues of A on its diagonal and M is strictly upper triangular. If A is normal then M is zero, so \|M\|_F is a natural measure of how far A is from being normal. While M depends on Q (which is not unique), its Frobenius norm does not, since \|A\|_F^2 = \|T\|_F^2 = \|D\|_F^2 + \|M\|_F^2. Accordingly, Henrici defined the departure from normality by

\notag   \nu(A) = \biggl( \|A\|_F^2 - \displaystyle\sum_{j=1}^n |\lambda_j|^2 \biggr)^{1/2}.

Henrici (1962) derived an upper bound for \nu(A) and Elsner and Paardekooper (1987) derived a lower bound, both based on the commutator:

\notag   \displaystyle\frac{\|A^*A-AA^*\|_F}{4\|A\|_2}   \le \nu(A) \le \Bigl( \displaystyle\frac{n^3-n}{12} \Bigr)^{1/4} \|A^*A-AA^*\|_F^{1/2}.

The distance to normality is

\notag   d(A) = \min \bigl\{\, \|E\|_F: ~A+E \in \mathbb{C}^{n\times n}~~\mathrm{is~                                   normal} \,\bigr\}.

This quantity can be computed by an algorithm of Ruhe (1987). It is trivially bounded above by \nu(A) and is also bounded below by a multiple of it (László, 1994):

\notag     \displaystyle\frac{\nu(A)}{n^{1/2}} \le d(A) \le \nu(A).

Normal matrices are a particular class of diagonalizable matrices. For diagonalizable matrices various bounds are available that depend on the condition number of a diagonalizing transformation. Since such a transformation is not unique, we take a diagonalization A = XDX^{-1}, D = \mathrm{diag}(\lambda_i), with X having minimal 2-norm condition number:

\kappa_2(X) = \min\{\, \kappa_2(Y): A = YDY^{-1}, ~D~\mathrm{diagonal} \,\}.

Here are some examples of such bounds. We denote by \rho(A) the spectral radius of A, the largest absolute value of any eigenvalue of A.

  • By taking norms in the eigenvalue-eigenvector equation Ax = \lambda x we obtain \rho(A) \le \|A\|_2. Taking norms in A = XDX^{-1} gives \|A\|_2 \le \kappa_2(X) \|D\|_2 = \kappa_2(X)\rho(A). Hence

\notag    \displaystyle\frac{\|A\|_2}{\kappa_2(X)} \le \rho(A) \le \|A\|_2.

  • If A has singular values \sigma_1 \ge \sigma_2 \ge \cdots \ge   \sigma_n and its eigenvalues are ordered |\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|, then (Ruhe, 1975)

    \notag      \displaystyle\frac{\sigma_i(A)}{\kappa_2(X)}      \le |\lambda_i(A)|      \le \kappa_2(X) \sigma_i(A), \quad i = 1\colon n.

    Note that for i=1 the previous upper bound is sharper.

  • For any real p > 0,

    \notag    \displaystyle\frac{\rho(A)^p}{\kappa_2(X)} \le     \|A^p\|_2 \le \kappa_2(X) \rho(A)^p.

  • For any function f defined on the spectrum of A,

    \notag    \displaystyle\frac{\max_i|f(\lambda_i)|}{\kappa_2(X)} \le     \|f(A)\|_2 \le \max_i|f(\lambda_i)|.

For normal A we can take X unitary and so all these bounds are equalities. The condition number \kappa_2(X) can therefore be regarded as another measure of non-normality, as quantified by these bounds.

References

This is a minimal set of references, which contain further useful references within.

Leave a comment