What is a Diagonally Dominant Matrix?

Matrices arising in applications often have diagonal elements that are large relative to the off-diagonal elements. In the context of a linear system this corresponds to relatively weak interactions between the different unknowns. We might expect a matrix with a large diagonal to be assured of certain properties, such as nonsingularity. However, to ensure nonsingularity it is not enough for each diagonal element to be the largest in its row. For example, the matrix

\notag  \left[\begin{array}{rrr}     3 & -1 & -2\\    -2 &  3 & -1\\    -2 & -1 &  3   \end{array}\right] \qquad (1)

is singular because [1~1~1]^T is a null vector. A useful definition of a matrix with large diagonal requires a stronger property.

A matrix A\in\mathbb{C}^{n\times n} is diagonally dominant by rows if

\notag        |a_{ii}| \ge \displaystyle\sum_{j\ne i} |a_{ij}|, \quad i=1\colon n. \qquad (2)

It is strictly diagonally dominant by rows if strict inequality holds in (2) for all i. A is (strictly) diagonally dominant by columns if A^T is (strictly) diagonally dominant by rows.

Diagonal dominance on its own is not enough to ensure nonsingularity, as the matrix (1) shows. Strict diagonal dominance does imply nonsingularity, however.

Theorem 1.

If A\in\mathbb{C}^{n\times n} is strictly diagonally dominant by rows or columns then it is nonsingular.

Proof. Since A is nonsingular if and only if A^T is nonsingular, it suffices to consider diagonal dominance by rows. For any nonzero x let y = Ax and choose k so that |x_k| = \|x\|_{\infty}. Then the kth equation of y = Ax can be written

\notag    a_{kk}x_k = y_k - \displaystyle\sum_{j\ne k} a_{kj}x_j,

which gives

\notag    |a_{kk}|\|x\|_{\infty} = |a_{kk}||x_k|     \le |y_k| + \displaystyle\sum_{j\ne k} |a_{kj}||x_j|     \le |y_k| + \|x\|_\infty \displaystyle\sum_{j\ne k} |a_{kj}|.

Using (2), we have

\notag    |y_k| \ge \|x\|_{\infty} \Bigl(|a_{kk}| - \displaystyle\sum_{j\ne k} |a_{kj}|\Bigr) > 0.    \qquad (3)

Therefore y\ne0 and so A is nonsingular. ~\square

Diagonal dominance plus two further conditions is enough to ensure nonsingularity. We need the notion of irreducibility. A matrix A\in\mathbb{R}^{n\times n} is irreducible if there does not exist a permutation matrix P such that

\notag       P^TAP = \begin{bmatrix} A_{11} & A_{12} \\                                  0   & A_{22} \end{bmatrix}

with A_{11} and A_{22} square matrices. Irreducibility is equivalent to the directed graph of A being strongly connected.

Theorem 2.

If A\in\mathbb{C}^{n\times n} is irreducible and diagonally dominant by rows with strict inequality in (2) for some i then it is nonsingular.

Proof. The proof is by contradiction. Suppose there exists x\ne 0 such that Ax = 0. Define

\notag   G = \{\, j: |x_j| = \|x\|_{\infty} \,\},   \quad   H = \{\, j: |x_j| < \|x\|_{\infty} \,\}.

The ith equation of Ax = 0 can be written

\notag       a_{ii}x_i = - \displaystyle\sum_{j\ne i} a_{ij}x_j                 = - \displaystyle\sum_{j\in G \atop j\ne i } a_{ij}x_j                   - \displaystyle\sum_{j\in H \atop j\ne i } a_{ij}x_j. \qquad (4)

Hence for i = r\in G,

\notag |a_{rr}| \le \displaystyle\sum_{j\in G \atop j\ne r } |a_{rj}|       + \displaystyle\sum_{j\in H \atop j\ne r } |a_{rj}|\frac{|x_j|}{\|x\|_\infty}.

The set H is nonempty, because if it were empty then we would have |x_j| = \|x\|_\infty for all j and if there is strict inequality in (2) for i = m, then putting i = m in (4) would give |a_{mm}| \le \sum_{j\ne m} |a_{mj}| |x_j|/|x_m|             =  \sum_{j\ne m} |a_{mj}|, which is a contradiction. Hence as long as a_{rj}\ne0 for some j\in H, we obtain |a_{rr}| <  \sum_{j\ne r } |a_{rj}|, which contradicts the diagonal dominance. Therefore we must have a_{rj}= 0 for all j\in H and all r\in G. This means that all the rows indexed by G have zeros in the columns indexed by H, which means that A is reducible. This is a contradiction, so A must be nonsingular. ~\square

The obvious analogue of Theorem 2 holds for column diagonal dominance.

As an example, the n\times n symmetric tridiagonal matrix (minus the second difference matrix)

\notag  T_n = \left[\begin{array}{@{\mskip 5mu}c*{4}{@{\mskip 15mu} r}@{\mskip 5mu}}      2 &   -1  &          &         & \\     -1 &    2  &  -1      &         & \\        &    -1 &   2      &  \ddots & \\        &       &  \ddots  &  \ddots & -1\\        &       &          &  -1     & 2            \end{array}\right], \qquad (5)

is row diagonally dominant with strict inequality in the first and last diagonal dominance relations. It can also be shown to be irreducible and so it is nonsingular by Theorem 2. If we replace t_{11} or t_{nn} by 1, then T remains nonsingular by the same argument. What if we replace both t_{11} and t_{nn} by 1? We can answer this question by using an observation of Strang. If we define the rectangular matrix

\notag  L_n = \begin{bmatrix}                1  &      &        &  \\                -1 &  1   &        &  \\                   & -1   & \ddots &  \\                   &      & \ddots & 1 \\                   &      &        &  -1     \end{bmatrix} \in\mathbb{R}^{(n+1)\times n}

then T_n = L_n^T L_n and

\notag \widetilde{T}_{n+1}  = \begin{bmatrix}                        1 &-1      &        &      & \\                       -1 & 2      & \ddots &      & \\                          & \ddots & \ddots &  -1  & \\                          &        &   -1    &  2  & -1\\                          &        &         &  -1 & 1                    \end{bmatrix}                          = L_n L_n^T \in \mathbb{R}^{(n+1) \times (n+1)}.

Since in general AB and BA have the same nonzero eigenvalues, we conclude that \Lambda(\widetilde{T}_{n+1})  = \Lambda(T_n) \cup \{0\}, where \Lambda(\cdot) denotes the spectrum. Hence T_n is symmetric positive definite and \widetilde{T}_n is singular and symmetric positive semidefinite.

Relation to Gershgorin’s Theorem

Theorem 1 can be used to obtain information about the location of the eigenvalues of a matrix. Indeed if \lambda is an eigenvalue of A then A - \lambda I is singular and hence cannot be strictly diagonally dominant, by Theorem 1. So |a_{ii}-\lambda| > \sum_{j\ne i} |a_{ij}| cannot be true for all i. Gershgorin’s theorem is simply a restatement of this fact.

Theorem 3 (Gershgorin’s theorem).

The eigenvalues of A\in\mathbb{C}^{n\times n} lie in the union of the n discs in the complex plane

\notag      D_i = \Big\{ z\in\mathbb{C}: |z-a_{ii}| \le \displaystyle\sum_{j\ne i}      |a_{ij}|\Big\}, \quad i=1\colon n.

If A is symmetric with positive diagonal elements and satisfies the conditions of Theorem 1 or Theorem 2 then it is positive definite. Indeed the eigenvalues are real and so in Gershgorin’s theorem the discs are intervals and a_{ii} - z \le |z-a_{ii}| \le \sum_{j\ne i}^n |a_{ij}|, so z \ge |a_{ii}| - \sum_{j\ne i}^n |a_{ij}| \ge 0, so the eigenvalues are nonnegative, and hence positive since nonzero. This provides another proof that the matrix T_n in (5) is positive definite.

Generalized Diagonal Dominance

In some situations A is not diagonally dominant but a row or column scaling of it is. For example, the matrix

\notag   A = \begin{bmatrix}         1   & 1   & 0 \\         2/3 & 2   & 1/4 \\         2/3 & 1/2 & 1       \end{bmatrix}

is not diagonally dominant by rows or columns but

\notag   A \, \mathrm{diag}(3,2,4)    = \begin{bmatrix}         3   & 2   & 0 \\         2   & 4   & 1   \\         2   & 1   & 4       \end{bmatrix}

is strictly diagonally dominant by rows.

A matrix A\in\mathbb{C}^{n\times n} is generalized diagonally dominant by rows if AD is diagonally dominant by rows for some diagonal matrix D = \mathrm{diag}(d_i) with d_i > 0 for all i, that is, if

\notag      |a_{ii}|d_i \ge \displaystyle\sum_{j\ne i} |a_{ij}|d_j, \quad i=1\colon n. \qquad (6)

It is easy to see that if A is irreducible and there is strictly inequality in (6) for some i then A is nonsingular by Theorem 2.

It can be shown that A is generalized diagonally dominant by rows if and only if it is an H-matrix, where an H-matrix is a matrix for which the comparison matrix M(A), defined by

\notag   M(A) = (m_{ij}), \quad m_{ij} =    \begin{cases} |a_{ii}|, & i=j, \\                 -|a_{ij}|, & i\ne j,     \end{cases}

is an M-matrix (see What Is an M-Matrix?).

Block Diagonal Dominance

A matrix A\in\mathbb{C}^{n\times n} is block diagonally dominant by rows if, for a given norm and block m\times m partitioning A = (A_{ij}), the diagonal blocks A_{jj} are all nonsingular and

\notag     \displaystyle\sum_{j\ne i} \|A_{ij}\| \le  \|A_{ii}^{-1}\|^{-1}, \quad i = 1\colon m.   \label{bdd}

A is block diagonally dominant by columns if A^T is block diagonally dominant by rows. If the blocks are all 1\times 1 then block diagonal dominance reduces to the usual notion of diagonal dominance. Block diagonal dominance holds for certain block tridiagonal matrices arising in the discretization of PDEs.

Analogues of Theorems 1 and 2 giving conditions under which block diagonal dominance implies nonsingularity are given by Feingold and Varga (1962).

Bounding the Inverse

If a matrix is strictly diagonally dominant then we can bound its inverse in terms of the minimum amount of diagonal dominance. For full generality, we state the bound in terms of generalized diagonal dominance.

Theorem 4.

If A\in\mathbb{C}^{n\times n} and AD is strictly diagonally dominant by rows for a diagonal matrix D = \mathrm{diag}(d_i) with d_i > 0 for all i, then

\notag   \|A^{-1}\|_\infty \le \displaystyle\frac{\|D\|_{\infty}}{\alpha},

where \alpha = \min_i (|a_{ii}|d_i - \sum_{j\ne i} |a_{ij}|d_j).

Proof. Assume first that D = I. Let y satisfy \|A^{-1}\|_{\infty} = \|A^{-1}y\|_{\infty} / \|y\|_{\infty} and let x = A^{-1}y. Applying (3) gives \|A^{-1}\|_{\infty} = \|x\|_{\infty} / \|y\|_{\infty} \le \alpha^{-1}. The result is obtained on applying this bound to AD and using \|A^{-1}\|_{\infty} \le \|D\|_{\infty} \|(AD)^{-1}\|_{\infty}. ~\square.

Another bound for A^{-1} when A is strictly diagonally dominant by rows can be obtained by writing A = D(I - E), where D = \mathrm{diag}(a_{ii}), e_{ii} = 0, and e_{ij} = -a_{ij}/a_{ii} for i\ne j. It is easy to see that \|E\|_\infty < 1, which gives another proof that A is nonsingular. Then

\notag  \begin{aligned}   |A^{-1}| &= |(I-E)^{-1}D^{-1}|            = |I + E + E^2 + \cdots | |D^{-1}|\\            &\le (I + |E| + |E|^2 + \cdots ) |D|^{-1}\\             &= (I - |E|)^{-1} |D|^{-1}\\             &= M(A)^{-1}.  \end{aligned}

This bound implies that M(A)^{-1} \ge 0, so in view of its sign pattern M(A) is an M-matrix, which essentially proves one direction of the H-matrix equivalence in the previous section. The same bound holds if A is diagonally dominant by columns, by writing A = (I-E)D.

An upper bound also holds for block diagonal dominance.

Theorem 5.

If A\in\mathbb{C}^{n\times n} is block diagonally dominant by rows then

\notag   \|A^{-1}\|_\infty \le \displaystyle\frac{1}{\alpha}.

where \alpha = \min_i ( \|A_{ii}^{-1}\|^{-1} - \sum_{j\ne i} \|A_{ij}\| ).

It is interesting to note that the inverse of a strictly row diagonally dominant matrix enjoys a form of diagonal dominance, namely that the largest element in each column is on the diagonal.

Theorem 6.

If A\in\mathbb{C}^{n\times n} is strictly diagonally dominant by rows then B = A^{-1} satisfies |b_{ij}| < |b_{jj}| for all i\ne j.

Proof. For i\ne j we have \sum_{k=1}^n a_{ik}b_{kj} = 0. Let \beta_j = \max_k |b_{kj}|. Taking absolute values in a_{ii}b_{ij} = -\sum_{k\ne i}a_{ik}b_{kj} gives

\notag  |a_{ii}||b_{ij}| \le \beta_j \sum_{k\ne i} |a_{ik}| < \beta_j |a_{ii}|,

or |b_{ij}| < \beta_j, since a_{ii} \ne 0. This inequality holds for all i\ne j, so we must have \beta_j = |b_{jj}|, which gives the result.

Historical Remarks

Theorems 1 and 2 have a long history and have been rediscovered many times. Theorem 1 was first stated by Lévy (1881) with additional assumptions. In a short but influential paper, Taussky (1949) pointed out the recurring nature of the theorems and gave simple proofs (our proof of Theorem 2 is Taussky’s). Schneider (1977) attributes the surge in interest in matrix theory in the 1950s and 1960s to Taussky’s paper and a few others by her, Brauer, Ostrowski, and Wielandt. The history of Gershgorin’s theorem (published in 1931) is intertwined with that of Theorems 1 and 2; see Varga’s 2004 book for details.

Theorems 4 and 5 are from Varah (1975) and Theorem 6 is from Ostrowski (1952).


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Bounds for the Norm of the Inverse of a Triangular Matrix

In many situations we need to estimate or bound the norm of the inverse of a matrix, for example to compute an error bound or to check whether an iterative process is guaranteed to converge. This is the same problem as bounding the condition number \kappa(A) = \|A\| \|A^{-1}\|, assuming \|A\| is easy to compute or estimate. Here, we focus on triangular matrices. The bounds we derive can be applied to a general matrix if an LU or QR factorization is available.

We denote by \|\cdot\| any matrix norm, and we take the consistency condition \|AB\| \le \|A\| \|B\| as one of the defining properties of a matrix norm.

It will be useful to note that

\notag       \left[\begin{array}{crrrr}       1 & -\theta & -\theta & -\theta & -\theta\\         & 1 & -\theta & -\theta & -\theta\\         &   & 1 & -\theta & -\theta\\         &   &   & 1 & -\theta\\         &   &   &   & 1 \end{array}\right]^{-1} =      \left[\begin{array}{ccccc}      1 & \theta & \theta(1+\theta) & \theta(1+\theta)^2 & \theta(1+\theta)^3\\        & 1 & \theta & \theta(1+\theta) & \theta(1+\theta)^2\\        &   & 1 & \theta & \theta(1+\theta)\\        &   &   & 1 & \theta\\        &   &   &   & 1      \end{array}\right]

and that more generally the inverse of the n\times n upper triangular matrix T(\theta) with

\notag   (T(\theta))_{ij} = \begin{cases} 1, & i=j, \\                     -\theta, & i<j, \end{cases} \qquad (1)

is given by

\notag   \bigl(T(\theta)^{-1}\bigr)_{ij} =     \begin{cases} 1, & i=j, \\                 \theta(1+\theta)^{j-i-1}, & j > i. \end{cases} \qquad (2)

Lower Bound

First, we consider a general matrix A\in\mathbb{C}^{n\times n} and let \lambda be an eigenvalue with |\lambda| = \rho(A) (the spectral radius) and x a corresponding eigenvector. With X = xe^T \in\mathbb{C}^{n\times n}, where e is the vector of ones, AX = \lambda X, so

\notag     |\lambda| \|X\| = \|\lambda X\| = \| AX \| \le \|A\| \|X\|,

which implies |\lambda| \le \|A\| since X\ne 0. Hence

\notag           \|A\| \ge \rho(A).

Let T be a triangular matrix. Applying the latter bound to T^{-1}, whose eigenvalues are its diagonal entries t_{ii}^{-1}, gives

\notag       \|T^{-1}\| \ge \displaystyle\frac{1}{\min_i |t_{ii}|}.  \qquad (3)

Combining this bound with the analogous bound for \|T\| gives

\notag       \kappa(T) \ge \displaystyle\frac{\max_i |t_{ii}|}{\min_i |t_{ii}|}. \qquad (4)

We note that commonly used norms satisfy \|A\| \ge \max_{i,j}|a_{ij}|, which yields another proof of (3) and (4).

For any x and y such that y = Tx we have the lower bound \|x\| / \|y\| \le \| T^{-1} \|. We can choose y and then solve the triangular system Tx = y for x to obtain the lower bound. Condition number estimation techniques, which we will describe in another article, provide ways to choose y that usually yield estimates of \| T^{-1} \| correct to within an order of magnitude.

For the 2-norm, we can choose y and then compute x = (T^TT)^{-k}y by repeated triangular solves, obtaining the lower bound (\|x\|_2 / \|y\|_2)^{\frac{1}{2k}} \le \| T^{-1} \|_2. This bound is simply the power method applied to (T^TT)^{-1}.

Upper Bounds

Let T\in\mathbb{C}^{n\times n} be an upper triangular matrix. The upper bounds for \|T^{-1}\| that we will discuss depend only on the absolute values of the elements of T. This limits the ability of the bounds to distinguish between well-conditioned and ill-conditioned matrices. For example, consider

\notag \begin{gathered}      T_1 =       \left[\begin{array}{crrrr} 1 & -2 & -2 & -2 & -2\\         & 1 & -2 & -2 & -2\\         &   & 1 & -2 & -2\\         &   &   & 1 & -2\\         &   &   &   & 1 \end{array}\right], \quad      T_1^{-1} =      \left[\begin{array}{ccccc}      1 & 2 & 6 & 18 & 54\\        & 1 & 2 & 6 & 18\\        &   & 1 & 2 & 6\\        &   &   & 1 & 2\\        &   &   &   & 1      \end{array}\right], \\     T_2 =     \left[\begin{array}{ccccc}     1 & 2 & 2 & 2 & 2\\       & 1 & 2 & 2 & 2\\       &   & 1 & 2 & 2\\       &   &   & 1 & 2\\       &   &   &   & 1     \end{array}\right], \quad   T_2^{-1} =     \left[\begin{array}{crrrr}      1 & -2 & 2 & -2 & 2\\        & 1 & -2 & 2 & -2\\        &   & 1 & -2 & 2\\        &   &   & 1 & -2\\        &   &   &   & 1     \end{array}\right]. \end{gathered}

The bounds for T_1^{-1} and T_2^{-1} will be the same, yet the inverses are of different sizes (the more so as the dimension increases).

Let D = \mathrm{diag}(T) and write

\notag    T = D(I - N),

where N is strictly upper triangular and hence nilpotent with N^n = 0. Then

\notag    T^{-1} = (I + N + N^2 + \cdots + N^{n-1}) D^{-1}.

Taking absolute values and using the triangle inequality gives

\notag   |T^{-1}| \le (I + |N| + |N|^2 + \cdots + |N|^{n-1}) |D|^{-1}, \qquad(5)

where the inequalities hold elementwise.

The comparison matrix M(A) associated with a general A\in\mathbb{C}^{n \times n} is the matrix with

\notag   (M(A))_{ij} =    \begin{cases} |a_{ii}|, & i=j, \\                 -|a_{ij}|, & i\ne j.     \end{cases}

It is not hard to see that M(T) is upper triangular with M(T) = |D| (I - |N|) and so the bound (5) is

\notag   |T^{-1}| \le M(T)^{-1}.

If we replace every element above the diagonal of M(T) by the most negative off-diagonal element in its row we obtain the upper triangular matrix W(T) with

\notag     (W(T))_{ij} = \begin{cases}                     |t_{ii}|, & i=j, \\                             -\max_{i+1\le k\le n}|t_{ik}|, & i<j. \\                 \end{cases}

Then W(T) = |D| (I - |N_1|), where |N| \le |N_1|, so

\notag \begin{aligned}   M(T)^{-1} &= (I + |N| + |N|^2 + \cdots + |N|^{n-1}) |D|^{-1}\\   & \le (I + |N_1| + |N_1|^2 + \cdots + |N_1|^{n-1}) |D|^{-1} = W(T)^{-1}. \end{aligned}

Finally, let Z(T) = \min_i|t_{ii}|(I - |N_2|), where N_2 is strictly upper triangular with every element above the diagonal equal to the maximum element of |N_1|, that is,

\notag      (Z(T))_{ij} = \begin{cases}       \alpha, & i=j, \\                               -\alpha\beta, & i<j, \\                 \end{cases} \qquad     \alpha = \min_i|t_{ii}|, \quad      \beta = \max_{i < j}|t_{ij}|/|t_{ii}|.


\notag \begin{aligned}   W(T)^{-1} &= (I + |N_1| + |N_1|^2 + \cdots + |N_1|^{n-1}) |D|^{-1} \\             &\le \alpha^{-1} (I + |N_2| + |N_2|^2 + \cdots + |N_2|^{n-1}) = Z(T)^{-1}. \end{aligned}

We note that M(T), W(T), and Z(T) are all nonsingular M-matrices. We summarize the bounds.

Theorem 1.

If T\in\mathbb{C}^{n\times n} is a nonsingular upper triangular matrix then

\notag      |T^{-1}|  \le M(T)^{-1}                \le W(T)^{-1}                \le Z(T)^{-1}. \qquad (6)

We make two remarks.

  • The bounds (6) are equally valid for lower triangular matrices as long as the maxima in the definitions of W(T) and Z(T) are taken over columns instead of rows.
  • We could equally well have written A = (I-N)D. The comparison matrix M(T) = (I - |N|)|D| is unchanged, and (6) continues to hold as long as the maxima in the definitions of W(T) and Z(T) are taken over columns rather than rows.

It follows from the theorem that

\notag    \|T^{-1}\|  \le \|M(T)^{-1}\|                \le \|W(T)^{-1}\|                \le \|Z(T)^{-1}\|

for the 1-, 2-, and \infty-norms and the Frobenius norm. Now M(T), W(T), and Z(T) all have nonnegative inverses, and for a matrix A with nonnegative inverse we have \|A^{-1}\|_{\infty} = \|A^{-1}e\|_{\infty}. Hence

\notag   \begin{aligned}    \|T^{-1}\|_{\infty}                &\le \|M(T)^{-1}e\|_{\infty}                \le \|W(T)^{-1}e\|_{\infty}                \le \|Z(T)^{-1}e\|_{\infty}\\         O(n^3) \hskip10pt & \hskip35pt  O(n^2)                  \hskip65pt  O(n)                  \hskip65pt O(1)   \end{aligned}

where the big-Oh expressions show the asymptotic cost in flops of evaluating each term by solving the relevant triangular system. As the bounds become less expensive to compute they become weaker. The quantity \|Z(T)^{-1}\|_p can be explicitly evaluated for p = \infty, using (2). It has the same value for p = 1, and since \|A\|_2 \le (\|A\|_1\|A\|_{\infty})^{1/2} we have

\notag    \|T^{-1}\|_p \le \displaystyle\frac{ (\beta + 1)^{n-1}}{\alpha}, \quad p = 1,2,\infty.    \qquad(7)

This bound is an equality for p = 1,\infty for the matrix T(\theta) in (1).

For the Frobenius norm, evaluating \|Z(T)^{-1}\|_F, and using \|A\|_2 \le \|A\|_F, gives

\notag  \|T^{-1}\|_{2,F} \le        \displaystyle\frac{ \bigl( (\beta + 1)^{2n} + 2n(\beta + 2) - 1 \bigr)^{1/2}}             {\alpha(\beta + 2)}.       \qquad(8)

For the 2-norm, either of (7) and (8) can be the smaller bound depending on \beta.

For the special case of a bidiagonal matrix B it is easy to show that |B^{-1}| = M(B)^{-1}, and so \|B^{-1}\|_{\infty} = \|M(B)^{-1}\|_{\infty} = \|M(B)^{-1}e\|_{\infty} can be computed exactly in O(n) flops.

These upper bounds can be arbitrarily weak, even for a fixed n, as shown by the example

\notag   T(\theta) = \begin{bmatrix} \theta^{-1} &   1      & 1       \\                       0     &  \theta^{-1} & \theta^{-1} \\                       0     &   0      & \theta^{-2} \end{bmatrix},     \quad \theta > 0,

for which

\notag   T(\theta)^{-1} =           \begin{bmatrix} \theta      &  -\theta^2   & 0       \\                       0     &  \theta      & -\theta^2   \\                       0     &   0      & \theta^2    \end{bmatrix},           \quad   M(T(\theta))^{-1} =           \begin{bmatrix} \theta      &  \theta^2    & 2\theta^3   \\                       0     &  \theta      & \theta^2    \\                       0     &   0      & \theta^2    \end{bmatrix}.

As \theta\to\infty, \|M(T(\theta))^{-1}\|_{\infty} /\|T(\theta)^{-1}\|_{\infty} \approx 2\theta. On the other hand, the overestimation is bounded as a function of n for triangular matrices resulting from certain pivoting strategies.

Theorem 1.

Suppose the upper triangular matrix T\in\mathbb{C}^{n\times n} satisfies

\notag       |t_{ii}| \ge |t_{ij}|, \quad j>i. \qquad (9)

Then, for the 1-, 2-, and \infty-norms,

\notag     \displaystyle\frac{1}{\min_i|t_{ii}|} \le \|T^{-1}\| \le \|M(T)^{-1}\|                               \le \|W(T)^{-1}\|                               \le \|Z(T)^{-1}\|                               \le \displaystyle\frac{2^{n-1}}{{\min_i|t_{ii}|}}.

Proof. The first four inequalities are a combination of (3) and (6). The fifth inequality is obtained from the expression (7) for \|Z(T)^{-1}\| with \beta = 1.

Condition (9) is satisfied for the triangular factors from QR factorization with column pivoting and for the transpose of the unit lower triangular factors from LU factorization with any form of pivoting.

The upper bounds we have described have been derived independently by several authors, as explained by Higham (2002).


What Is a Companion Matrix?

A companion matrix C\in\mathbb{C}^{n\times n} is an upper Hessenberg matrix of the form

\notag   C = \begin{bmatrix} a_{n-1} & a_{n-2} & \dots  &\dots & a_0 \\                 1       & 0       & \dots  &\dots &  0 \\                 0       & 1       & \ddots &      &  \vdots \\                 \vdots  &         & \ddots & 0    &  0 \\                 0       &  \dots  & \dots  & 1    &  0           \end{bmatrix}.

Alternatively, C can be transposed and permuted so that the coefficients a_i appear in the first or last column or the last row. By expanding the determinant about the first row it can be seen that

\notag    \det(\lambda I - C) = \lambda^n - a_{n-1}\lambda^{n-1} - \cdots                          - a_1\lambda - a_0, \qquad (*)

so the coefficients in the first row of C are the coefficients of its characteristic polynomial. (Alternatively, in \lambda I - C add \lambda^{n-j} times the jth column to the last column for j = 1:n-1, to obtain p(\lambda)e_1 as the new last column, and expand the determinant about the last column.) MacDuffee (1946) introduced the term “companion matrix” as a translation from the German “Begleitmatrix”.

Setting \lambda = 0 in (*) gives \det(C) = (-1)^{n+1} a_0, so C is nonsingular if and only if a_0 \ne 0. The inverse is

\notag   C^{-1} = \begin{bmatrix}                 0 & 1 & 0 &\dots& 0 \cr                 0 & 0 & 1 &\ddots& 0 \cr               \vdots & & \ddots & \ddots & 0\cr               \vdots & &       & \ddots & 1\cr     \displaystyle\frac{1}{a_0} & -\displaystyle\frac{a_{n-1}}{a_0}                   & -\displaystyle\frac{a_{n-2}}{a_0} & \dots                   & -\displaystyle\frac{a_1}{a_0}           \end{bmatrix}.

Note that P^{-1}C^{-1}P is in companion form, where P = I_n(n:-1:1,:) is the reverse identity matrix, and the coefficients are those of the polynomial -\lambda^n p(1/\lambda), whose roots are the reciprocals of those of p.

A companion matrix has some low rank structure. It can be expressed as a unitary matrix plus a rank-1 matrix:

\notag   C = \begin{bmatrix} 0 & 0 & \dots  &\dots & 1   \\                 1       & 0       & \dots  &\dots &  0 \\                 0       & 1       & \ddots &      &  0 \\                 \vdots  &         & \ddots & 0    &  0 \\                 0       &  \dots  & \dots  & 1    &  0           \end{bmatrix} + e_1 \begin{bmatrix} a_{n-1} & a_{n-2} & \dots & a_0-1           \end{bmatrix}. \qquad (1)

Also, C^{-T} differs from C in just the first and last columns, so C^{-T} = C + E, where E is a rank-2 matrix.

If \lambda is an eigenvalue of C then [\lambda^{n-1}, \lambda^{n-2}, \dots, \lambda, 1]^T is a corresponding eigenvector. The last n-1 rows of \lambda I - C are clearly linearly independent for any \lambda, which implies that C is nonderogatory, that is, no two Jordan blocks in the Jordan canonical form contain the same eigenvalue. In other words, the characteristic polynomial is the same as the minimal polynomial.

The MATLAB function compan takes as input a vector [p_1,p_2, \dots, p_{n+1}] of the coefficients of a polynomial, p_1x^n + p_2 x^{n-1} + \cdots + p_n x + p_{n+1}, and returns the companion matrix with a_{n-1} = -p_2/p_1, …, a_0 = -p_{n+1}/p_1.

Perhaps surprisingly, the singular values of C have simple representations, found by Kenney and Laub (1988):

\notag \begin{aligned}    \sigma_1^2 &= \displaystyle\frac{1}{2} \left( \alpha + \sqrt{\alpha^2 - 4 a_0^2} \right), \\    \sigma_i^2 &= 1, \qquad i=2\colon n-1, \\    \sigma_n^2 &= \displaystyle\frac{1}{2} \left( \alpha - \sqrt{\alpha^2 - 4 a_0^2} \right), \end{aligned}

where \alpha = 1 + a_0^2 + \cdots + a_{n-1}^2. These formulae generalize to block companion matrices, as shown by Higham and Tisseur (2003).


Companion matrices arise naturally when we convert a high order difference equation or differential equation to first order. For example, consider the Fibonacci numbers 1, 1, 2, 3, 5, \dots, which satisfy the recurrence f_n = f_{n-1} + f_{n-2} for n \ge 2, with f_0 = f_1 = 1. We can write

\notag     \begin{bmatrix} f_n \\ f_{n-1} \end{bmatrix}      = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}         \begin{bmatrix} f_{n-1} \\ f_{n-2} \end{bmatrix}       = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^2         \begin{bmatrix} f_{n-2} \\ f_{n-3} \end{bmatrix}      = \cdots       = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^{n-1}         \begin{bmatrix} f_{1} \\ f_{0} \end{bmatrix},

where \left[\begin{smallmatrix}1 & 1 \\ 1 & 0 \end{smallmatrix}\right] is a companion matrix. This expression can be used to compute f_n in O(\log_2n) operations by computing the matrix power using binary powering.

As another example, consider the differential equation

\notag        y''' = b_2 y'' + b_1 y' + b_0 y.

Define new variables

z_1 = y'', \quad z_2 = y', \quad z_3 = y.


\notag \begin{aligned}      z_1' &= b_2 z_1 + b_1 z_2 + b_0 z_3,\\      z_2' &= z_1\\      z_3' &=  z_2, \end{aligned}


\notag      \begin{bmatrix} z_1 \\ z_2\\ z_3 \end{bmatrix}'  =   \begin{bmatrix} b_2 & b_1 & b_0 \\                 1       & 0    &   0 \\                 0       & 1    &   0 \\           \end{bmatrix}      \begin{bmatrix} z_1 \\ z_2\\ z_3 \end{bmatrix},

so the third order scalar equation has been converted into a first order system with a companion matrix as coefficient matrix.

Computing Polynomial Roots

The MATLAB function roots takes as input a vector of the coefficients of a polynomial and returns the roots of the polynomial. It computes the eigenvalues of the companion matrix associated with the polynomial using the eig function. As Moler (1991) explained, MATLAB used this approach starting from the first version of MATLAB, but it does not take advantage of the structure of the companion matrix, requiring O(n^3) flops and O(n^2) storage instead of the O(n^2) flops and O(n) storage that should be possible given the structure of C. Since the early 2000s much research has aimed at deriving methods that achieve this objective, but numerically stable methods proved elusive. Finally, a backward stable algorithm requiring O(n^2) flops and O(n) storage was developed by Aurentz, Mach, Vandebril, and Watkins (2015). It uses the QR algorithm and exploits the unitary plus low rank structure shown in (1). Here, backward stability means that the computed roots are the eigenvalues of C + \Delta C for some \Delta C with \|\Delta C\| \le c_n u \|C\|. It is not necessarily the case that the computed roots are the exact roots of a polynomial with coefficients a_i + \Delta a_i with |\Delta a_i| \le c_n u \max_i |a_i| for all i.

Rational Canonical Form

It is an interesting observation that

\notag      \begin{bmatrix}                  0  &  0   & 0    & 1   \\                  0  &  0   & 1    & -a_3\\                  0  &  1   & -a_3 & -a_2 \\                  1  & -a_3 & -a_2 & -a_1      \end{bmatrix}      \begin{bmatrix}                a_3 & a_2  & a_1  & a_0 \\                 1    & 0  &   0  & 0 \\                 0    &  1 &   0  & 0 \\                 0    &  0 &   1  & 0      \end{bmatrix}                 =      \begin{bmatrix}                0 & 0    & 1     & 0 \\                0 & 1    & -a_3  & 0 \\                1 & -a_3 & -a_2  & 0  \\                0 & 0    & 0     & a_0 \\      \end{bmatrix}.

Multiplying by the inverse of the matrix on the left we express the 4\times 4 companion matrix as the product of two symmetric matrices. The obvious generalization of this factorization to n\times n matrices shows that we can write

\notag   C = S_1S_2,  \quad S_1 = S_1^T, \quad S_2= S_2^T. \qquad (2)

We need the rational canonical form of a matrix, described in the next theorem, which Halmos (1991) calls “the deepest theorem of linear algebra”. Let \mathbb{F} denote the field \mathbb{R} or \mathbb{C}.

Theorem 1 (rational canonical form).

If A\in\mathbb{F}^{n\times n} then A = X^{-1} C X where X\in\mathbb{F}^{n\times n} is nonsingular and C = \mathrm{diag}(C_i)\in\mathbb{F}^{n\times n}, with each C_i a companion matrix.

The theorem says that every matrix is similar over the underlying field to a block diagonal matrix composed of companion matrices. Since we do not need it, we have omitted from the statement of the theorem the description of the C_i in terms of the irreducible factors of the characteristic polynomial. Combining the factorization (2) and Theorem 1 we obtain

\notag \begin{aligned}   A &= X^{-1}CX = X^{-1} S_1S_2 X \\     & = X^{-1}S_1X^{-T} \cdot X^T S_2 X \\     & \equiv \widetilde{S}_1 \widetilde{S}_2,\quad    \widetilde{S}_1 = \widetilde{S}_1^T, \quad    \widetilde{S}_2 = \widetilde{S}_2^T. \end{aligned}

Since S_1 is nonsingular, and since S_2 can alternatively be taken nonsingular by considering the factorization of A^T, this proves a theorem of Frobenius.

Theorem 2 (Frobenius, 1910).

For any A\in\mathbb{F}^{n\times n} there exist symmetric S_1,S_2\in\mathbb{F}^{n\times n}, either one of which can be taken nonsingular, such that A = S_1 S_2.

Note that if A = S_1S_2 with the S_i symmetric then AS_1 = S_1S_2S_1 = S_1A^T = (AS_1)^T, so AS_1 is symmetric. Likewise, S_2A is symmetric.


Fiedler (2003) noted that a companion matrix can be factorized into the product of n simpler factors, n-1 of them being the identity matrix with a 2\times 2 block placed on the diagonal, and he used this factorization to determine a matrix \widetilde{C} similar to C. For n = 5 it is

\notag \widetilde{C} = \begin{bmatrix}    a_4 & a_3 & 1  & 0   & 0  \\      1 & 0   & 0  & 0   & 0  \\      0 & a_2 & 0  & a_1 & 1  \\      0 & 1   & 0  & 0   & 0 \\      0 & 0   & 0  & a_0 & 0 \end{bmatrix} = \left[\begin{array}{cc|cc|c}    a_4 & 1 &     &   &   \\      1 & 0 &     &   &   \\\hline        &   & a_2 & 1 &   \\        &   &  1  & 0 &   \\\hline        &   &     &   &  a_0 \end{array}\right] \left[\begin{array}{c|cc|cc}      1  &     &   &      & \\\hline         & a_3 & 1 &      & \\         &  1  & 0 &      & \\\hline         &     &   &  a_1 & 1 \\         &     &   &   1  & 0 \end{array}\right].

In general, Fielder’s construction yields an n\times n pentadiagonal matrix \widetilde{C} that is not simply a permutation similarity of C. The fact that \widetilde{C} has block diagonal factors opens the possibility of obtaining new methods for finding the eigenvalues of C. This line of research has been extensively pursued in the context of polynomial eigenvalue problems (see Mackey, 2013).


The companion matrix is associated with the monomial basis representation of the characteristic polynomial. Other polynomial bases can be used, notably orthogonal polynomials, and this leads to generalizations of the companion matrix having coefficients on the main diagonal and the subdiagonal and superdiagonal. Good (1961) calls the matrix resulting from the Chebyshev basis a colleague matrix. Barnett (1981) calls the matrices corresponding to orthogonal polynomials comrade matrices, and for a general polynomial basis he uses the term confederate matrices. Generalizations of the properties of companion matrices can be derived for these classes of matrices.

Bounds for Polynomial Roots

Since the roots of a polynomial are the eigenvalues of the associated companion matrix, or a Fiedler matrix similar to it, or indeed the associated comrade matrix or confederate matrix, one can obtain bounds on the roots by applying any available bounds for matrix eigenvalues. For example, since any eigenvalue \lambda of matrix A satisfies |\lambda| \le \|A\|, by taking the 1-norm and the \infty-norm of the companion matrix C we find that any root \lambda of the polynomial (*) satisfies

\notag \begin{aligned} |\lambda| &\le \max\bigl(|a_0|, 1 + \max_{j = 1:n-1} |a_j| \bigr), \\ |\lambda| &\le \max(1, |a_{n-1}| + |a_{n-2}| + \cdots + |a_0|), \end{aligned}

either of which can be the smaller. A rich variety of such bounds is available, and these techniques extend to matrix polynomials and the corresponding block companion matrices.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is an M-Matrix?

An M-matrix is a matrix A\in\mathbb{R}^{n \times n} of the form

\notag      A = sI - B, \quad \mathrm{where}~B \ge 0~\mathrm{and}~s > \rho(B).      \qquad (*)

Here, \rho(B) is the spectral radius of B, that is, the largest modulus of any eigenvalue of B, and B \ge 0 denotes that B has nonnegative entries. An M-matrix clearly has nonpositive off-diagonal elements. It also has positive diagonal elements, which can be shown using the result that

\notag  \rho(A) = \lim_{k\to\infty} \|A^k\|^{1/k}   \qquad (\dagger)

for any consistent matrix norm:

\notag  s > \rho(B) = \lim_{k\to\infty} \|B^k\|_{\infty}^{1/k}          \ge  \lim_{k\to\infty} \|\mathrm{diag}(b_{ii})^k\|_{\infty}^{1/k}          = \max_i b_{ii}.

Although the definition of an M-matrix does not specify s, we can set it to \max_i a_{ii}. Indeed let s satisfy (*) and set \widetilde{s} = \max_i a_{ii} and \widetilde{B} = \widetilde{s}I - A. Then \widetilde{B} \ge 0, since \widetilde{b}_{ii} = \widetilde{s} - a_{ii} \ge 0 and \widetilde{b}_{ij} = -a_{ij} = b_{ij} \ge 0 for i \ne j. Furthermore, for a nonnegative matrix the spectral radius is an eigenvalue, by the Perron–Frobenius theorem, so \rho(B) is an eigenvalue of B and \rho(\widetilde{B)} is an eigenvalue of \widetilde{B}. Hence \rho(\widetilde{B}) = \rho( (\widetilde{s}-s)I + B) = \widetilde{s}  -s + \rho(B) < \widetilde{s}.

The concept of M-matrix was introduced by Ostrowski in 1937. M-matrices arise in a variety of scientific settings, including in finite difference methods for PDEs, input-output analysis in economics, and Markov chains in stochastic processes.

An immediate consequence of the definition is that the eigenvalues of an M-matrix lie in the open right-half plane, which means that M-matrices are special cases of positive stable matrices. Hence an M-matrix is nonsingular and the determinant, being the product of the eigenvalues, is positive. Moreover, since C = s^{-1}B satisfies \rho(C) < 1,

\notag     A^{-1} = s^{-1}(I - C)^{-1}            = s^{-1}(I + C + C^2 + \cdots) \ge 0.

In fact, nonnegativity of the inverse characterizes M-matrices. Define

\notag     Z_n = \{ \, A \in \mathbb{R}^{n\times n}: a_{ij} \le 0, \; i\ne j \,\}.

Theorem 1.

A nonsingular matrix A\in Z_n is an M-matrix if and only if A^{-1} \ge 0.

Sometimes an M-matrix is defined to be a matrix with nonpositive off-diagonal elements and a nonnegative inverse. In fact, this condition is just one of a large number of conditions equivalent to a matrix with nonpositive off-diagonal elements being an M-matrix, fifty of which are given in Berman and Plemmons (1994, Chap. 6).

It is easy to check from the definitions, or using Theorem 1, that a triangular matrix T with positive diagonal and nonpositive off-diagonal is an M-matrix. An example is

\notag     T_4 = \left[\begin{array}{@{\mskip 5mu}c*{3}{@{\mskip 15mu} r}@{\mskip 5mu}}      1 &   -1  &  -1  & -1 \\        &    1  &  -1  & -1 \\        &       &   1  & -1 \\        &       &      &  1            \end{array}\right], \quad   T_4^{-1} =    \begin{bmatrix}    1 & 1 & 2 & 4\\   & 1 & 1 & 2\\   &   & 1 & 1\\   &   &   & 1    \end{bmatrix}.

Bounding the Norm of the Inverse

An M-matrix can be constructed from any nonsingular triangular matrix by taking the comparison matrix. The comparison matrix associated with a general B\in\mathbb{R}^{n \times n} is the matrix

\notag   M(B) = (m_{ij}), \quad m_{ij} =    \begin{cases} |b_{ii}|, & i=j, \\                 -|b_{ij}|, & i\ne j.     \end{cases}

For a nonsingular triangular T, M(T) is an M-matrix, and it easy to show that

\notag       |T^{-1}| \le |M(T)^{-1}|,

where the absolute value is taken componentwise. This bound, and weaker related bounds, can be useful for cheaply bounding the norm of the inverse of a triangular matrix. For example, with e denoting the vector of ones, since M(T)^{-1} is nonnegative we have

\notag    \|T^{-1}\|_{\infty} \le \|M(T)^{-1}\|_{\infty}                         = \|M(T)^{-1}e\|_{\infty},

and \|M(T)^{-1}e\|_{\infty} can be computed in O(n^2) flops by solving a triangular system, whereas computing T^{-1} costs O(n^3) flops.

More generally, if we have an LU factorization PA = LU of an M-matrix A\in\mathbb{R}^{n \times n} then, since A^{-1} \ge 0,

\notag   \|A^{-1}\|_{\infty} = \|A^{-1}e\|_{\infty}                       = \|U^{-1}L^{-1}Pe\|_{\infty}                       = \|U^{-1}L^{-1}e\|_{\infty}.

Therefore the norm of the inverse can be computed in O(n^2) flops with two triangular solves, instead of the O(n^3) flops that would be required if A^{-1} were to be formed explicitly.

Connections with Symmetric Positive Definite Matrices

There are many analogies between M-matrices and symmetric positive definite matrices. For example, every principal submatrix of a symmetric positive definite matrix is symmetric positive definite and every principal submatrix of an M-matrix is an M-matrix. Indeed if \widetilde{B} is a principal submatrix of a nonnegative B then \rho(\widetilde{B}) \le \rho(B), which follows from (\dagger) for the \infty-norm (for example). Hence on taking principal submatrices in (*) we have s > \rho(\widetilde{B}) with the same s.

A symmetric M-matrix is known as a Stieltjes matrix, and such a matrix is positive definite. An example of a Stieltjes matrix is minus the second difference matrix (the tridiagonal matrix arising from a central difference discretization of a second derivative), illustrated for n = 4 by

\notag     A_4 = \left[\begin{array}{@{\mskip 5mu}c*{3}{@{\mskip 15mu} r}@{\mskip 5mu}}      2 &   -1  &      &    \\     -1 &    2  &  -1  &    \\        &    -1 &   2  &  -1 \\        &       &  -1  &  2            \end{array}\right], \quad   A_4^{-1} =    \begin{bmatrix}   \frac{4}{5} & \frac{3}{5} & \frac{2}{5} & \frac{1}{5}\\[\smallskipamount]   \frac{3}{5} & \frac{6}{5} & \frac{4}{5} & \frac{2}{5}\\[\smallskipamount]   \frac{2}{5} & \frac{4}{5} & \frac{6}{5} & \frac{3}{5}\\[\smallskipamount]   \frac{1}{5} & \frac{2}{5} & \frac{3}{5} & \frac{4}{5} \end{bmatrix}.

LU Factorization

Since the leading principal submatrices of an M-matrix A have positive determinant it follows that A has an LU factorization with U having positive diagonal elements. However, more is true, as the next result shows.

Theorem 2.

An M-matrix A has an LU factorization A = LU in which L and U are M-matrices.

Proof. We can write

\notag    A =    \begin{array}[b]{@{\mskip27mu}c@{\mskip-20mu}c@{\mskip-10mu}c@{}}    \scriptstyle 1 &    \scriptstyle n-1 &    \\    \multicolumn{2}{c}{        \left[\begin{array}{c@{~}c@{~}}                  \alpha & b^T \\                  c^T    & E  \\              \end{array}\right]}    & \mskip-14mu\          \begin{array}{c}              \scriptstyle 1 \\              \scriptstyle n-1              \end{array}    \end{array},   \quad \alpha > 0, \quad b\le 0, \quad c \le 0.

The first stage of LU factorization is

\notag   A = \begin{bmatrix}           \alpha & b^T \\              c   & E       \end{bmatrix}    =       \begin{bmatrix}               1          & 0   \\              \alpha^{-1}c & I       \end{bmatrix}       \begin{bmatrix}               \alpha  & b^T   \\                 0     & S       \end{bmatrix} = L_1U_1, \quad S = E - \alpha^{-1} c\mskip1mu b^T,

where S is the Schur complement of \alpha in A. The first column of L_1 and the first row of U_1 are of the form required for a triangular M-matrix. We have

\notag   A^{-1} = U_1^{-1}L_1^{-1} =       \begin{bmatrix}               \alpha^{-1} & -\alpha^{-1}b^TS^{-1}  \\                     0     & S^{-1}       \end{bmatrix}       \begin{bmatrix}               1          & 0   \\              -\alpha^{-1}c & I       \end{bmatrix}     =       \begin{bmatrix}               \times          & \times   \\              \times & S^{-1}       \end{bmatrix}.

Since A^{-1} \ge 0 it follows that S^{-1} \ge 0. It is easy to see that S\in Z_n, and hence Theorem 1 shows that S is an M-matrix. The result follows by induction.

The question now arises of what can be said about the numerical stability of LU factorization of an M-matrix. To answer it we use another characterization of M-matrices, that DA is strictly diagonally dominant by columns for some diagonal matrix D = \mathrm{diag}(d_i) with d_i>0 for all i, that is,

\notag        d_j|a_{jj}| > \sum_{i\ne j} d_i |a_{ij}|, \quad j=1\colon n.

(This condition can also be written as d^TA > 0 because of the sign pattern of A.) Matrices that are diagonally dominant by columns have the properties that an LU factorization without pivoting exists, the growth factor \rho_n \le 2, and partial pivoting does not require row interchanges. The effect of row scaling on LU factorization is easy to see:

\notag   A = LU \;\Rightarrow\; DA = DLD^{-1} \cdot DU       \equiv \widetilde{L} \widetilde{U},

where \widetilde{L} is unit lower triangular, so that \widetilde{L} and \widetilde{U} are the LU factors of DA. It is easy to see that the growth factor bound of 2 for a matrix diagonally dominant by columns translates into the bound

\notag       \rho_n \le 2\mskip1mu\displaystyle\frac{\max_i d_i}{\min_i d_i} \qquad(\ddagger)

for an M-matrix, which was obtained by Funderlic, Neumann, and Plemmons (1982). Unfortunately, this bound can be large. Consider the matrix

\notag  A = \begin{bmatrix}             \epsilon& 0& -1\\            -1& 1& -1\\             0& 0&  1    \end{bmatrix} \in Z_3, \quad 0 < \epsilon < 1.

We have

\notag  A^{-1} =    \begin{bmatrix}        \displaystyle\frac{1}{\epsilon}& 0& \displaystyle\frac{1}{\epsilon}\\[\bigskipamount]        \displaystyle\frac{1}{\epsilon}& 1 & \displaystyle\frac{1 + \epsilon}{\epsilon}\\[\bigskipamount]                  0&           0&     1    \end{bmatrix} \ge 0,

so A is an M-matrix. The (2,3) element of the LU factor U of A is -1 - 1/\epsilon, which means that

\notag     \rho_3 \ge \displaystyle\frac{1}{\epsilon} + 1,

and this lower bound can be arbitrarily large. One can verify experimentally that numerical instability is possible when \rho_3 is large, in that the computed LU factors have a large relative residual. We conclude that pivoting is necessary for numerical stability in LU factorization of M-matrices.

Stationery Iterative Methods

A stationery iterative method for solving a linear system Ax = b is based on a splitting A = M - N with M nonsingular, and has the form Mx_{k+1} = Nx_k + b. This iteration converges for all starting vectors x_0 if \rho(M^{-1}N) < 1. Much interest has focused on regular splittings, which are defined as ones for which M^{-1}\ge 0 and N \ge 0. An M-matrix has the important property that \rho(M^{-1}N) < 1 for every regular splitting, and it follows that the Jacobi iteration, the Gauss-Seidel iteration, and the successive overrelaxation (SOR) iteration (with 0 < \omega \le 1) are all convergent for M-matrices.

Matrix Square Root

The principal square root A^{1/2} of an M-matrix A is an M-matrix, and it is the unique such square root. An expression for A^{1/2} follows from (*):

\notag \begin{aligned}      A^{1/2} &= s^{1/2}(I - C)^{1/2} \quad (C = s^{-1}B, ~\rho(C) < 1),\\              &= s^{1/2} \sum_{j=0}^{\infty} {\frac{1}{2} \choose j} (-C)^j. \end{aligned}

This expression does not necessarily provide the best way to compute A^{1/2}.

Singular M-Matrices

The theory of M-matrices extends to the case where the condition on s is relaxed to s \ge \rho(B) in (*), though the theory is more complicated and extra conditions such as irreducibility are needed for some results. Singular M-matrices occur in Markov chains (Berman and Plemmons, 1994, Chapter 8), for example. Let P be the transition matrix of a Markov chain. Then P is stochastic, that is, nonnegative with unit row sums, so Pe = e. A nonnegative vector y with y^Te = 1 such that y^T P = y^T is called a stationary distribution vector and is of interest for describing the properties of the Markov chain. To compute y we can solve the singular system Ay = (I - P^T)y = 0. Clearly, A\in Z_n and \rho(P) = 1, so A is a singular M-matrix.


A more general concept is that of H-matrix: A\in\mathbb{R}^{n \times n} is an H-matrix if the comparison matrix M(A) is an M-matrix. Many results for M-matrices extend to H-matrices. For example, for an H-matrix with positive diagonal elements the principal square root exists and is the unique square root that is an H-matrix with positive diagonal elements. Also, the growth factor bound (\ddagger) holds for any H-matrix for which DA is diagonally dominant by columns.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Eigenvalue Inequalities for Hermitian Matrices

The eigenvalues of Hermitian matrices satisfy a wide variety of inequalities. We present some of the most useful and explain their implications. Proofs are omitted, but as Parlett (1998) notes, the proofs of the Courant–Fischer, Weyl, and Cauchy results are all consequences of the elementary fact that if the sum of the dimensions of two subspaces of \mathbb{C}^n exceeds n then the subspaces have a nontrivial intersection.

The eigenvalues of a Hermitian matrix A\in\mathbb{C}^{n\times n} are real and we order them \lambda_n\le \lambda_{n-1} \le \cdots \le \lambda_1. Note that in some references, such as Horn and Johnson (2013), the reverse ordering is used, with \lambda_n the largest eigenvalue. When it is necessary to specify what matrix \lambda_k is an eigenvalue of we write \lambda_k(A): the kth largest eigenvalue of A. All the following results also hold for symmetric matrices over \mathbb{R}^{n\times n}.

Quadratic Form

The function f(x) = x^*Ax/x^*x is the quadratic form x^*Ax for A evaluated on the unit sphere, since f(x) = f(x/\|x\|_2). As A is Hermitian it has a spectral decomposition A = Q\Lambda Q^*, where Q is unitary and \Lambda = \mathrm{diag}(\lambda_i). Then

f(x) = \displaystyle\frac{x^*Q\Lambda Q^*x}{x^*x}             = \displaystyle\frac{y^*\Lambda y}{y^*y}             = \displaystyle\frac{\sum_{i=1}^{n}\lambda_i y_i^2}                                 {\sum_{i=1}^{n}y_i^2} \quad (y = Q^*x),

from which is it clear that

\notag  \lambda_n = \displaystyle\min_{x\ne0} \displaystyle\frac{x^*Ax}{x^*x}, \quad  \lambda_1 = \displaystyle\max_{x\ne0} \displaystyle\frac{x^*Ax}{x^*x}, \qquad(*)

with equality when x is an eigenvector corresponding to \lambda_n and \lambda_1, respectively, This characterization of the extremal eigenvalues of A as the extrema of f is due to Lord Rayleigh (John William Strutt), and f(x) is called a Rayleigh quotient. The intermediate eigenvalues correspond to saddle points of f.

Courant–Fischer Theorem

The Courant–Fischer theorem (1905) states that every eigenvalue of a Hermitian matrix A\in\mathbb{C}^{n\times n} is the solution of both a min-max problem and a max-min problem over suitable subspaces of \mathbb{C}^n.

Theorem (Courant–Fischer).

For a Hermitian A\in\mathbb{C}^{n\times n},

\notag \begin{aligned}    \lambda_k &= \min_{\dim(S)=n-k+1} \, \max_{0\ne x\in S} \frac{x^*Ax}{x^*x}\\              &= \max_{\dim(S)= k} \, \min_{0\ne x\in S} \frac{x^*Ax}{x^*x},                  \quad k=1\colon n. \end{aligned}

Note that the equalities (*) are special cases of these characterizations.

In general there is no useful formula for the eigenvalues of a sum A+B of Hermitian matrices. However, the Courant–Fischer theorem yields the upper and lower bounds

\notag  \lambda_k(A) + \lambda_n(B) \le \lambda_k(A+B) \le \lambda_k(A) + \lambda_1(B),   \qquad (1)

from which it follows that

\notag   \max_k|\lambda_k(A+B)-\lambda_k(A)| \le \max(|\lambda_n(B)|,|\lambda_1(B)|)     = \|B\|_2.

This inequality shows that the eigenvalues of a Hermitian matrix are well conditioned under perturbation. We can rewrite the inequality in the symmetric form

\notag   \max_k |\lambda_k(A)-\lambda_k(B)| \le \|A-B\|_2.

If B is positive semidefinite then (1) gives

\notag    \lambda_k(A) \le \lambda_k(A + B),    \quad k = 1\colon n, \qquad (2)

while if B is positive definite then strict inequality holds for all i. These bounds are known as the Weyl monotonicity theorem.

Weyl’s Inequalities

Weyl’s inequalities (1912) bound the eigenvalues of A+B in terms of those of A and B.

Theorem (Weyl).

For Hermitian A,B\in\mathbb{C}^{n\times n} and i,j = 1\colon n,

\notag \begin{aligned}     \lambda_{i+j-1}(A+B) &\le \lambda_i(A) + \lambda_j(B),     \quad i+j \le n+1, \qquad (3)\\     \lambda_i(A) + \lambda_j(B) &\le \lambda_{i+j-n}(A+B).     \quad i+j \ge n+1, \qquad (4) \end{aligned}

The Weyl inequalities yield much information about the effect of low rank perturbations. Consider a positive semidefinite rank-1 perturbation B = zz^*. Inequality (3) with j = 1 gives

\notag     \lambda_i(A+B) \le \lambda_i(A) + z^*z,       \quad i = 1\colon n

(which also follows from (1)). Inequality (3) with j = 2, combined with (2), gives

\notag     \lambda_{i+1}(A) \le \lambda_{i+1}(A + zz^*) \le \lambda_i(A),       \quad i = 1\colon n-1. \qquad (5)

These inequalities confine each eigenvalue of A + zz^* to the interval between two adjacent eigenvalues of A; the eigenvalues of A + zz^* are said to interlace those of A. The following figure illustrates the case n = 4, showing a possible configuration of the eigenvalues \lambda_i of A and \mu_i of A + zz^*.

weyl_fig.jpg A specific example, in MATLAB, is

>> n = 4; eig_orig = 5:5+n-1
>> D = diag(eig_orig); eig_pert = eig(D + ones(n))'
eig_orig =
     5     6     7     8
eig_pert =
   5.2961e+00   6.3923e+00   7.5077e+00   1.0804e+01

Since \mathrm{trace}(A + zz^*) = \mathrm{trace}(A) + z^*z and the trace is the sum of the eigenvalues, we can write

\notag       \lambda_i(A + zz^*) = \lambda_i(A) + \theta_i z^*z,

where the \theta_i are nonnegative and sum to 1. If we greatly increase z^*z, the norm of the perturbation, then most of the increase in the eigenvalues is concentrated in the largest, since (5) bounds how much the smaller eigenvalues can change:

>> eig_pert = eig(D + 100*ones(n))'
eig_pert =
   5.3810e+00   6.4989e+00   7.6170e+00   4.0650e+02

More generally, if B has p positive eigenvalues and q negative eigenvalues then (3) with j = p+1 gives

\notag     \lambda_{i+p}(A+B) \le \lambda_i(A),      \quad i = 1\colon n-p,

while (4) with j = n-q gives

\notag     \lambda_i(A) \le \lambda_{i-q}(A + B),     \quad i = q+1\colon n.

So the inertia of B (the number of negative, zero, and positive eigenvalues) determines how far the eigenvalues can move as measured relative to the indexes of the eigenvalues of A.

An important implication of the last two inequalities is for the case A = I, for which we have

\notag \begin{aligned}  \lambda_{i+p}(I+B) &\le 1, \quad i = 1 \colon n-p, \\  \lambda_{i-q}(I+B) &\ge 1, \quad i = q+1 \colon n. \end{aligned}

Exactly p+q eigenvalues appear in one of these inequalities and n-(p+q) appear in both. Therefore n - (p+q) of the eigenvalues are equal to 1 and so only \mathrm{rank}(B) = p+q eigenvalues can differ from 1. So perturbing the identity matrix by a Hermitian matrix of rank r changes at most r of the eigenvalues. (In fact, it changes exactly r eigenvalues, as can be seen from a spectral decomposition.)

Finally, if B has rank r then \lambda_{r+1}(B) \le 0 and \lambda_{n-r}(B) \ge 0 and so taking j = r+1 in (3) and j = n-r in (4) gives

\notag   \begin{aligned}     \lambda_{i+r}(A+B) &\le \lambda_i(A),      ~~\qquad\qquad i = 1\colon n-r, \\         \lambda_i(A) &\le \lambda_{i-r}(A + B), ~~\quad i = r+1\colon n.   \end{aligned}

Cauchy Interlace Theorem

The Cauchy interlace theorem relates the eigenvalues of successive leading principal submatrices of a Hermitian matrix. We denote the leading principal submatrix of A of order k by A_k = A(1\colon k, 1\colon k).

Theorem (Cauchy).

For a Hermitian A\in\mathbb{C}^{n\times n},

\notag  \lambda_{i+1}(A_{k+1}) \le \lambda_i(A_k) \le \lambda_i(A_{k+1}),    \quad i = 1\colon k, \quad k=1\colon n-1.

The theorem says that the eigenvalues of A_k interlace those of A_{k+1} for all k. Two immediate implications are that (a) if A is Hermitian positive definite then so are all its leading principal submatrices and (b) appending a row and a column to a Hermitian matrix does not decrease the largest eigenvalue or increase the smallest eigenvalue.

Since eigenvalues are unchanged under symmetric permutations of the matrix, the theorem can be reformulated to say that the eigenvalues of any principal submatrix of order n-1 interlace those of A. A generalization to principal submatrices of order n-\ell is given in the next result.


If B is a principal submatrix of order n-\ell of a Hermitian A\in\mathbb{C}^{n\times n} then

\notag  \lambda_{i+\ell}(A) \le \lambda_i(B) \le \lambda_i(A),    \quad i=1\colon n-\ell.

Majorization Results

It follows by taking x to be a unit vector e_i in the formula \lambda_1 = \max_{x\ne0} x^*Ax/(x^*x) that \lambda_1 \ge a_{ii} for all i. And of course the trace of A is the sum of the eigenvalues: \sum_{i=1}^n a_{ii} = \sum_{i=1}^n \lambda_i. These relations are the first and last in a sequence of inequalities relating sums of eigenvalues to sums of diagonal elements obtained by Schur in 1923.

Theorem (Schur).

For a Hermitian A\in\mathbb{C}^{n\times n},

\notag     \displaystyle\sum_{i=1}^k \lambda_i \ge \displaystyle\sum_{i=1}^k \widetilde{a}_{ii},     \quad k=1\colon n,

where \{\widetilde{a}_{ii}\} is the set of diagonal elements of A arranged in decreasing order: \widetilde{a}_{11} \ge \cdots \ge \widetilde{a}_{nn}.

These inequalities say that the vector [\lambda_1,\dots,\lambda_n] of eigenvalues majorizes the ordered vector [\widetilde{a}_{11},\dots,\widetilde{a}_{nn}] of diagonal elements.

An interesting special case is a correlation matrix, a symmetric positive semidefinite matrix with unit diagonal, for which the inequalities are

\notag     \lambda_1 \ge 1, \quad     \lambda_1+ \lambda_2\ge 2, \quad \dots, \quad     \lambda_1+ \lambda_2 + \cdots + \lambda_{n-1} \ge n-1,

and \lambda_1+ \lambda_2 + \cdots + \lambda_n = n. Here is an illustration in MATLAB.

>> n = 5; rng(1); A = gallery('randcorr',n);
>> e = sort(eig(A)','descend'), partial_sums = cumsum(e)
e =
  2.2701e+00   1.3142e+00   9.5280e-01   4.6250e-01   3.6045e-04
partial_sums =
  2.2701e+00   3.5843e+00   4.5371e+00   4.9996e+00   5.0000e+00

Ky Fan (1949) proved a majorization relation between the eigenvalues of A, B, and A+B:

\notag   \displaystyle\sum_{i=1}^k \lambda_i(A+B) \le   \displaystyle\sum_{i=1}^k \lambda_i(A) +   \displaystyle\sum_{i=1}^k \lambda_i(B), \quad k = 1\colon n.

For k = 1, the inequality is the same as the upper bound of (1), and for k = n it is an equality: \mathrm{trace}(A+B) = \mathrm{trace}(A) + \mathrm{trace}(B).

Ostrowski’s Theorem

For a Hermitian A and a nonsingular X, the transformation A\to X^*AX is a congruence transformation. Sylvester’s law of inertia says that congruence transformations preserve the inertia. A result of Ostrowski (1959) goes further by providing bounds on the ratios of the eigenvalues of the original and transformed matrices.

Theorem (Ostrowski).

For a Hermitian A\in \mathbb{C}^{n\times n} and X\in\mathbb{C}^{n\times n},

\lambda_k(X^*AX) = \theta_k \lambda_k(A), \quad k=1\colon n,

where \lambda_n(X^*X) \le \theta_k \le \lambda_1(X^*X).

If X is unitary then X^*X = I and so Ostrowski’s theorem reduces to the fact that a congruence with a unitary matrix is a similarity transformation and so preserves eigenvalues. The theorem shows that the further X is from being unitary the greater the potential change in the eigenvalues.

Ostrowski’s theorem can be generalized to the situation where X is rectangular (Higham and Cheng, 1998).


The results we have described are strongly interrelated. For example, the Courant–Fischer theorem and the Cauchy interlacing theorem can be derived from each other, and Ostrowski’s theorem can be proved using the Courant–Fischer Theorem.


SIAM CSE21 Minisymposium on Reduced Precision Arithmetic and Stochastic Rounding

This minisymposium took place at the SIAM Conference on Computational Science and Engineering, March 2, 2021. This page makes available slides from some of the talks.

Minisymposium description: Reduced precision floating-point arithmetic, such as IEEE half precision and bfloat16, is increasingly available in hardware. Low precision computations promise major increases in speed and reductions in data communication costs, but they also bring an increased risk of overflow, underflow, and loss of accuracy. One way to improve the results of low precision computations is to use stochastic rounding instead of round to nearest, and this is proving popular in machine learning. This minisymposium will discuss recent advances in exploitation and analysis of reduced precision arithmetic and stochastic rounding.

Algorithms for Stochastically Rounded Elementary Arithmetic Operations in IEEE 754 Floating-Point Arithmetic Massimiliano Fasi, Örebro University, Sweden; Mantas Mikaitis, University of Manchester, United Kingdom. Abstract. Slides.

Reduced Precision Elementary Functions Jean-Michel Muller, ENS Lyon, France. Abstract. Slides.

Effect of Reduced Precision and Stochastic Rounding in the Numerical Solution of Parabolic Equations Matteo Croci and Michael B. Giles, University of Oxford, United Kingdom. Abstract. Slides.

Stochastic Rounding and its Probabilistic Backward Error Analysis Michael P. Connolly and Nicholas J. Higham, University of Manchester, United Kingdom; Theo Mary, Sorbonne Universités and CNRS, France. Abstract. Slides.

Stochastic Rounding in Weather and Climate Models Milan Kloewer, Edmund Paxton, and Matthew Chantry, University of Oxford, United Kingdom Abstract. Slides.

Matrix Rank Relations

Matrix rank is an important concept in linear algebra. While rank deficiency can be a sign of an incompletely or improperly specified problem (a singular system of linear equations, for example), in some problems low rank of a matrix is a desired property or outcome. Here we present some fundamental rank relations in a concise form useful for reference. These are all immediate consequences of the singular value decomposition (SVD), but we give elementary (albeit not entirely self-contained) proofs of them.

The rank of a matrix A\in\mathbb{R}^{m\times n} is the maximum number of linearly independent columns, which is the dimension of the range space of A, \mathrm{range}(A) = \{\, Ax: x \in\mathbb{R}^n \,\}. An important but non-obvious fact is that this is the same as the maximum number of linearly independent rows (see (5) below).

A rank-1 matrix has the form xy^*, where x and y are nonzero vectors. Every column is a multiple of x and every row is a multiple of y^*. A sum of k rank-1 matrices has the form

\notag    A = \displaystyle\sum_{i=1}^{k} x_iy_i^*      =  \begin{bmatrix} x_1 & x_2 & \dots & x_k \end{bmatrix}         \begin{bmatrix} y_1^* \\ y_2^* \\ \vdots\\ y_k^* \end{bmatrix}      \equiv XY^*.         \qquad (0)

Each column of A is a linear combination of the vectors x_1, x_2, …, x_k, so A has at most k linearly independent columns, that is, A has rank at most k. In fact, \mathrm{rank}(A) = k if X and Y have rank k, as follows from (4) below. Any rank-k matrix can be written in the form (0) with X and Y of rank k; indeed this is the full-rank factorization below.

Here are some fundamental rank equalities and inequalities.

Rank-Nullity Theorem

The rank-nullity theorem says that

\notag    \boxed{ \mathrm{rank}(A) +  \mathrm{dim}( \mathrm{null}(A) ) = n,     \quad A\in\mathbb{R}^{m\times n},}

where \mathrm{null}(A) = \{\, x \in\mathbb{R}^n: Ax = 0 \,\} is the null space of A.

Rank Bound

The rank cannot exceed the number of columns, or, by (5) below, the number of rows:

\notag    \boxed{ \mathrm{rank}(A) \le \min(m,n), \quad A\in\mathbb{C}^{m\times n}. }

Rank of a Sum

For any A and B of the same dimension,

\notag     \boxed{|\mathrm{rank}(A) - \mathrm{rank}(B)| \le    \mathrm{rank}(A+B) \le \mathrm{rank}(A) + \mathrm{rank}(B).} \qquad (1)

The upper bound follows from the fact that the dimension of the sum of two subspaces cannot exceed the sum of the dimensions of the subspaces. Interestingly, the upper bound is also a corollary of the bound (3) for the rank of a matrix product, because

\notag   \begin{aligned}    \mathrm{rank}(A+B) &= \mathrm{rank}\biggl(  \begin{bmatrix} A & B  \end{bmatrix}  \begin{bmatrix} I \\ I  \end{bmatrix} \biggr)\\  &\le  \min\biggl(\mathrm{rank}\bigl(\begin{bmatrix} A & B  \end{bmatrix}\bigr),  \mathrm{rank}\biggl(\begin{bmatrix} I \\ I  \end{bmatrix} \biggr)\biggr)\\  &\le \mathrm{rank}\bigl(\begin{bmatrix} A & B  \end{bmatrix}\bigr)\\   &\le \mathrm{rank}(A) + \mathrm{rank}(B).   \end{aligned}

For the lower bound, writing A = -B + A+B and applying the upper bound gives \mathrm{rank}(A) \le \mathrm{rank}(-B) + \mathrm{rank}(A+B) = \mathrm{rank}(B) + \mathrm{rank}(A+B), and likewise with the roles of A and B interchanged.

Rank of A and A^*A

For any A,

\notag    \boxed{\mathrm{rank}(A^*A) = \mathrm{rank}(A).}  \qquad (2)

Indeed Ax = 0 implies A^*Ax = 0, and A^*Ax = 0 implies 0 = x^*A^*Ax = (Ax)^*(Ax), which implies Ax = 0. Hence the null spaces of A and A^*A are the same. The equality (2) follows from the rank-nullity theorem.

Rank of a General Product

For any A and B for which the product AB is defined,

\notag   \boxed{\mathrm{rank}(AB) \le \min\bigl( \mathrm{rank}(A), \mathrm{rank}(B) \bigr).}     \qquad (3)

If B = [b_1,\dots,b_n] then AB = [Ab_1,\dots,Ab_n], so the columns of AB are linear combinations of those of A and so AB cannot have more linearly independent columns than A, that is, \mathrm{rank}(AB) \le \mathrm{rank}(A). Using (5) below, we then have

\notag  \mathrm{rank}(AB) = \mathrm{rank}(B^*A^*)   \le \mathrm{rank}(B^*) = \mathrm{rank}(B).

The latter inequality can be proved without using (5) (our proof of which uses (3)), as follows. Suppose \mathrm{rank}(B) < \mathrm{rank}(AB) = r. Let the columns of Y span \mathrm{range}(AB), so that Y has r columns and Y = ABZ for some matrix Z with r columns. Now \mathrm{rank}(BZ) \le \mathrm{rank}(B) < r by the first part, so BZg = 0 for some nonzero g. But then Yg = ABZg = 0, which contradicts the linear independence of the columns of Y, so we must have \mathrm{rank}(B) \ge \mathrm{rank}(AB).

Rank of a Product of Full-Rank Matrices

We have

\notag    \boxed{  \mathrm{rank}(AB) = r, \quad     A\in\mathbb{C}^{m\times r}, \;     B\in\mathbb{C}^{r\times n}, \;     \mathrm{rank}(A) = \mathrm{rank}(B) = r .}     \qquad (4)

We note that A^*A and BB^* are both nonsingular r\times r matrices by (2), so their product has rank r. Using (3),

\notag   r = \mathrm{rank}(A^*A BB^*)     \le \mathrm{rank}(A B) \le r,

and hence \mathrm{rank}(A B) = r.

Another important relation is

\notag   \boxed{ \mathrm{rank}(XAY ) = \mathrm{rank}(A), \quad    X\in\mathbb{C}^{m\times m} \;\mathrm{and}\;    Y\in\mathbb{C}^{n\times n}\; \mathrm{nonsingular}. }

This is a consequence of the equality \mathrm{range}(XAY) = X\mathrm{range}(A)Y for nonsingular X and Y.

Ranks of A and A^*

By (2) and (3) we have \mathrm{rank}(A) = \mathrm{rank}(A^*A) \le \mathrm{rank}(A^*). Interchanging the roles of A and A^* gives \mathrm{rank}(A^*) \le \mathrm{rank}(A) and so

\notag   \boxed{ \mathrm{rank}(A^*) = \mathrm{rank}(A). } \qquad (5)

In other words, the rank of A is equal to the maximum number of linearly independent rows as well as the maximum number of linearly independent columns.

Full-Rank Factorization

A\in\mathbb{C}^{m \times n} has rank r if and only if A = GH for some G\in\mathbb{C}^{m \times r} and H\in\mathbb{C}^{r \times n}, both of rank r, and this is called a full-rank factorization. The existence of such a factorization implies that \mathrm{rank}(A) = r by (4). Conversely, suppose that A has rank r. Let the columns of X\in\mathbb{C}^{m \times r} form a basis for the range space of A. Then there are r-vectors y_j such that a_j = Xy_j, j = 1\colon n, and with Y = [y_1,y_2,\dots, y_n] we have A = XY. Finally, r = \mathrm{rank}(A)     = \mathrm{rank}(XY)     \le \mathrm{rank}(Y) by (3), and since \mathrm{rank}(Y) \le r we have \mathrm{rank}(Y) = r.

Rank and Minors

A characterization of rank that is sometimes used as the definition is that it is the size of the largest nonsingular square submatrix. Equivalently, the rank is the size of the largest nonzero minor, where a minor of size k is the determinant of a k\times k submatrix.

rank(AB) and rank(BA)

Although AB and BA have some properties in common when both products are defined (notably they have the same nonzero eigenvalues), \mathrm{rank}(AB) is not always equal to \mathrm{rank}(BA). A simple example is A = x and B = y^* with x and y orthogonal vectors: AB = xy^* but BA = y^*x = 0. An example with square A and B is

\notag   \begin{gathered}   A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad   B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \\    \mathrm{rank}(AB) =    \mathrm{rank}\biggl( \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}    \biggr) = 0, \quad    \mathrm{rank}(BA) =    \mathrm{rank}\biggl( \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}    \biggr) = 1.   \end{gathered}

Note that A = e_1e_2^T and B = e_1e_1^T, where e_i has 1 in the ith position and zeros everywhere else. Such matrices are easy to manipulate in this form (e.g., AB = e_1 (e_2^Te_1)e_1^T = 0) and are useful for constructing examples.

How to Find Rank

If we have a full-rank factorization of A then we can read off the rank from the dimensions of the factors. But finding a full-rank factorization is a nontrivial task. The ultimate full-rank factorization is the SVD

\notag     A = U\Sigma V^T,

where U\in\mathbb{R}^{m\times m} and V\in\mathbb{R}^{n\times n} are orthogonal, \Sigma = \mathrm{diag}(\sigma_1,\dots, \sigma_p)\in\mathbb{R}^{m\times n}, where p = \min(m,n), and \sigma_1\ge \sigma_2\ge \cdots \ge \sigma_r > 0 = \sigma_{r+1} = \cdots = \sigma_p = 0. The rank of A is r, the number of nonzero singular values.

In floating-point arithmetic, the standard algorithms for computing the SVD are numerically stable, that is, the computed singular values are the exact singular values of a matrix A + \Delta A with \|\Delta A\|_2 \le c_{m,n}u\|A\|, where c_{m,n} is a constant and u is the unit roundoff. Unfortunately, A + \Delta A will typically be full rank when A is rank deficient. For example, consider this computation.

>> n = 4; A = zeros(n); A(:) = 1:n^2, svd(A)
A =
     1     5     9    13
     2     6    10    14
     3     7    11    15
     4     8    12    16
ans =

The matrix has rank 2 and the two zero singular values are approximated by computed singular values of order 10^{-15}. In general, we have no way to know whether tiny computed singular values signify exactly zero singular values. In practice, one typically defines a numerical rank based on a threshold and regards computed singular values less than the threshold as zero. Indeed the MATLAB rank function computes the rank as the number of singular values exceeding 2u \max(m,n)\widehat{\sigma}_1, where \widehat{\sigma}_1 is the largest computed singular value. If the data from which the matrix is constructed is uncertain then the definition of numerical rank should take into account the level of uncertainty in the data. Dealing with rank deficiency in the presence of data errors and in finite precision arithmetic is a tricky business.


An excellent reference for further rank relations is Horn and Johnson. Stewart describes some of the issues associated with rank-deficient matrices in practical computation.

Diagonally Perturbing a Symmetric Matrix to Make It Positive Definite

Suppose A is a matrix that is symmetric but not positive definite. What is the best way to perturb the diagonal to make A positive definite? We want to compute a vector d such that

\notag     A(d) = A + D, \quad D = \mathrm{diag}(d),

is positive definite. Since the positive definite matrices form an open set there is no minimal d, so we relax the requirement to be that A is positive semidefinite. The perturbation D needs to make any negative eigenvalues of A become nonnegative. We will require all the entries of d to be nonnegative. Denote the eigenvalues of A by \lambda_n(A) \le \lambda_{n-1}(A) \le \cdots \le \lambda_1(A) and assume that \lambda_n(A) < 0.

A natural choice is to take D to be a multiple of the identity matrix. For d_i \equiv \delta, A(d) has eigenvalues \lambda_i + \delta, and so the smallest possible \delta is \delta = - \lambda_n(A). This choice of D shifts all the diagonal elements by the same amount, which might be undesirable for a matrix with widely varying diagonal elements.

When the diagonal entries of A are positive another possibility is to take d_i = \alpha a_{ii}, so that each diagonal entry undergoes a relative perturbation of size \alpha. Write D_A = \mathrm{diag}(a_{ii}) and note that C = D_A^{-1/2}A D_A^{-1/2} is symmetric with unit diagonal. Then

\notag     A + \alpha D_A = D_A^{1/2}(C + \alpha I)D_A^{1/2}.

Since A + \alpha D_A is positive semidefinite if and only if C + \alpha I is positive semidefinite, the smallest possible \alpha is \alpha = -\lambda_n(C).

More generally, we can treat the d_i as n independent variables and ask for the solution of the optimization problem

\notag    \min \|d\| ~~\mathrm{subject~to}~~ A + \mathrm{diag}(d)               ~\mathrm{positive~semidefinite}, ~d \ge 0.   \qquad(\dagger)

Of particular interest are the norms \|d\|_1 = \sum_i d_i = \mathrm{trace}(D) (since d\ge0) and \|d\|_\infty = \max_i d_i.

If A+D is positive semidefinite then from standard eigenvalue inequalities,

\notag       0 \le \lambda_n(A+D) \le  \lambda_n(A) +  \lambda_1(D),

so that

\notag       \max_i d_i \ge -\lambda_n(A).

Since d_i \equiv -\lambda_n(A) satisfies the constraints of (\dagger), this means that this d solves (\dagger) for the \infty-norm, though the solution is obviously not unique in general.

For the 1– and 2-norms, (\dagger) does not have an explicit solution, but it can be solved by semidefinite programming techniques.

Another approach to finding a suitable D is to compute a modified Cholesky factorization. Given a symmetric A, such a method computes a perturbation E such that A + E = R^TR for an upper triangular R with positive diagonal elements, so that A + E is positive definite. The methods of Gill, Murray, and Wright (1981) and Schnabel and Eskow (1990) compute a diagonal E. The cost in flops is essentially the same as that of computing a Cholesky factorization (n^3/3) flops), so this approach is likely to require fewer flops than computing the minimum eigenvalue or solving an optimization problem, but the perturbations produces will not be optimal.


We take the 5\times 5 Fiedler matrix

>> A = gallery('fiedler',5)
A =
     0     1     2     3     4
     1     0     1     2     3
     2     1     0     1     2
     3     2     1     0     1
     4     3     2     1     0

The smallest eigenvalue is -5.2361, so A+D is positive semidefinite for D_1 = 5.2361I. The Gill–Murray–Wright method gives D_{\mathrm{GMW}} with diagonal elements

2.0000e+01   4.6159e+00   1.3194e+00   2.5327e+00   1.0600e+01

and has \lambda_5(A+D_{\mathrm{GMW}}) = 0.5196 while the Schnabel–Eskow method gives D_{\mathrm{SE}} with diagonal elements

6     6     6     6     6

and has \lambda_5(A+D_{\mathrm{SE}}) = 0.7639. If we increase the diagonal elements of D_1 by 0.5 to give comparable smallest eigenvalues for the perturbed matrices then we have

  \Vert d\Vert_{\infty} \Vert d \Vert_1
Shift 5.2361 26.180
Gill-Murray-Wright 20.000 39.068
Schnabel–Eskow 6.000 30.000


When Does Thresholding Preserve Positive Definiteness?

Does a symmetric positive definite matrix remain positive definite when we set one or more elements to zero? This question arises in thresholding, in which elements of absolute value less than some tolerance are set to zero. Thresholding is used in some applications to remove correlations thought to be spurious, so that only statistically significant ones are retained.

We will focus on the case where just one element is changed and consider an arbitrary target value rather than zero. Given an n\times n symmetric positive definite matrix A we define A(t) to be the matrix resulting from adding t to the (i,j) and (j,i) elements and we ask when is A(t) positive definite. We can write

\notag   A(t) = A + t(e_i^{}e_j^T + e_j^{}e_i^T) \equiv A + tE_{ij},

where e_i is the ith column of the identity matrix. The perturbation E_{ij} has rank 2, with eigenvalues -1, 1, and 0 repeated n-2 times. Hence we can write E_{ij} in the form E_{ij} = pp^T - qq^T, where p^Tp = q^Tq = 1 and p^Tq = 0. Adding pp^T to A causes each eigenvalue to increase or stay the same, while subtracting qq^T decreases or leaves unchanged each eigenvalue. However, more is true: after each of these rank-1 perturbations the eigenvalues of the original and perturbed matrices interlace, by Weyl’s theorem. Hence, with the eigenvalues of A ordered as \lambda_n(A) \le \cdots \le \lambda_1(A), we have (Horn and Johnson, Cor. 4.3.7)

\notag \begin{aligned}   \lambda_n(A(t)) &\le \lambda_{n-1}(A), \\   \lambda_{i+1}(A) &\le \lambda_i(A(t)) \le \lambda_{i-1}(A),    \quad i = 2\colon n-1, \\   \lambda_2(A) &\le \lambda_1(A(t)). \end{aligned}

Because A is positive definite these inequalities imply that \lambda_{n-1}(A(t)) \ge \lambda_n(A) > 0, so A(t) has at most one negative eigenvalue. Since \det(A(t)) is the product of the eigenvalues of A(t) this means that A(t) is positive definite precisely when \det(A(t)) > 0.

There is a simple expression for \det(A(t)), which follows from a lemma of Chan (1984), as explained by Georgescu, Higham, and Peters (2018):

\notag  \det(A(t)) = \det(A)\big(1+ 2t b_{ij} + t^2(b_{ij}^2-b_{ii}b_{jj})\big),

where B = A^{-1}. Hence the condition for A(t) to be positive definite is

\notag  q_{ij}(t) = 1 + 2t b_{ij} + t^2(b_{ij}^2-b_{ii}b_{jj}) > 0.

We can factorize

\notag     q_{ij}(t) = \Bigl( t\bigl(b_{ij}  - \sqrt{b_{ii}b_{jj}}\bigr) + 1 \Bigr)                 \Bigl( t\bigl(b_{ij}  + \sqrt{b_{ii}b_{jj}}\bigr) + 1 \Bigr),

so q_{ij}(t) > 0 for

\notag    t\in \left( \displaystyle\frac{-1}{ \sqrt{b_{ii}b_{jj}} + b_{ij} },                 \displaystyle\frac{1}{ \sqrt{b_{ii}b_{jj}} - b_{ij} } \right) =: I_{ij},

where the endpoints are finite because B, like A, is positive definite and so |b_{ij}| < \sqrt{b_{ii}b_{jj}}.

The condition for A to remain positive definite when a_{ij} is set to zero is q_{ij}(-a_{ij}) > 0, or equivalently -a_{ij} \in I_{ij}. To check either of these conditions we need just b_{ij}, b_{ii}, and b_{jj}. These elements can be computed without computing the whole inverse by solving the equations Ab_k = e_k for k = i,j, for the kth column b_k of B, making use of a Cholesky factorization of A.

As an example, we consider the 4\times 4 Lehmer matrix, which has (i,j) element i/j for i \ge j:

\notag   A = \begin{bmatrix}         1           & \frac{1}{2}  & \frac{1}{3} & \frac{1}{4} \\[3pt]         \frac{1}{2} &           1  & \frac{2}{3} & \frac{1}{2} \\[3pt]         \frac{1}{3} &  \frac{2}{3} & 1           & \frac{3}{4} \\[3pt]         \frac{1}{4} &  \frac{1}{2} & \frac{3}{4} &  1         \end{bmatrix}.

The smallest eigenvalue of A is 0.208. Any off-diagonal element except the (2,4) element can be zeroed without destroying positive definiteness, and if the (2,4) element is zeroed then the new matrix has smallest eigenvalue -0.0249. For i=2 and j=4, the following plot shows in red \lambda_{\min}(A(t)) and in blue q_{24}(t); the black dots are the endpoints of the closure of the interval I_{24} = (-0.453,0.453) and the vertical black line is the value -a_{24}. Clearly, -a_{24} lies outside I_{24}, which is why zeroing this element causes a loss of positive definiteness. Note that I_{24} also tells us that we can increase a_{24} to any number less than 0.953 without losing definiteness.


Given a positive definite matrix and a set S of elements to be modified we may wish to determine subsets (including a maximal subset) of S for which the modifications preserve definiteness. Efficiently determining these subsets appears to be an open problem.

In practical applications thresholding may lead to an indefinite matrix. Definiteness must then be restored to obtain a valid correlation matrix. One way to do this is to find the nearest correlation matrix in the Frobenius norm such that the zeroed elements remain zero. This can be done by the alternating projections method with a projection to keep the zeroed elements fixed. Since the nearest correlation matrix is positive semidefinite, it is also desirable to to incorporate a lower bound \delta > 0 on the smallest eigenvalue, which corresponds to another projection. Both these projections are supported in the algorithm of Higham and Strabić (2016), implemented in the code at https://github.com/higham/anderson-accel-ncm. For the Lehmer matrix, the nearest correlation matrix with zero (2,4) element and eigenvalues at least \delta = 0.01 is (to four significant figures)

\notag   \begin{bmatrix}    1       &    0.4946  &    0.3403  &    0.2445  \\    0.4946  &    1       &    0.6439  &    0       \\    0.3403  &    0.6439  &    1       &    0.7266  \\    0.2445  &    0       &    0.7266  &    1    \end{bmatrix}.

A related question is for what patterns of elements that are set to zero is positive definiteness guaranteed to be preserved for all positive definite A? Clearly, setting all the off-diagonal elements to zero preserves definiteness, since the diagonal of a positive definite matrix is positive. Guillot and Rajaratnam (2012) show that the answer to the question is that the new matrix must be a symmetric permutation of a block diagonal matrix. However, for particular A this restriction does not necessarily hold, as the Lehmer matrix example shows.


What Is a Unitarily Invariant Norm?

A norm on \mathbb{C}^{m \times n} is unitarily invariant if \|UAV\| = \|A\| for all unitary U\in\mathbb{C}^{m \times m} and V\in\mathbb{C}^{n\times n} and for all A\in\mathbb{C}^{m \times n}. One can restrict the definition to real matrices, though the term unitarily invariant is still typically used.

Two widely used matrix norms are unitarily invariant: the 2-norm and the Frobenius norm. The unitary invariance follows from the definitions. For the 2-norm, for any unitary U and V, using the fact that \|Uz\|_2 = \|z\|_2, we obtain

\notag \begin{aligned}    \|UAV\|_2 &= \max_{x\ne0} \frac{\|UAVx\|_2}{\|x\|_2}              = \max_{x\ne0} \frac{\|AVx\|_2}{\|x\|_2}\\              &= \max_{x\ne0} \frac{\|Ay\|_2}{\|V^*y\|_2} \quad(y = Vx)\\              &= \max_{y\ne0} \frac{\|Ay\|_2}{\|y\|_2}  = \|A\|_2. \end{aligned}

For the Frobenius norm, using \|A\|_F^2 = \mathrm{trace}(A^*A),

\notag \begin{aligned}    \|UAV\|_F^2 &= \mathrm{trace}(V^*A^*U^* \cdot UAV)\\                &= \mathrm{trace}(V^*AAV)\\                &= \mathrm{trace}(A^*A) = \|A\|_F^2, \end{aligned}

since the trace is invariant under similarity transformations.

More insight into unitarily invariant norms comes from recognizing a connection with the singular value decomposition

\notag   A = P\Sigma Q^*, \quad P^*P = I_m, \quad Q^*Q = I_n, \quad                 \Sigma = \mathrm{diag}(\sigma_i), \quad     \sigma_1 \ge \cdots \ge \sigma_q \ge 0.

Clearly, \|A\| = \|\Sigma\|, so \|A\| depends only on the singular values. Indeed, for the 2-norm and the Frobenius norm we have \|A\|_2 = \sigma_1 and \|A\|_F = (\sum_{i=1}^q \sigma_i^2)^{1/2}. Here, and throughout this article, q = \min(m,n). Another implication of the singular value dependence is that \|A\| = \|A^*\| for all A for any unitarily invariant norm.

There is a beautiful characterization of unitarily invariant norms in terms of symmetric gauge functions, which are functions f:\mathbb{R}^q\to\mathbb{R}^q such that f is an absolute norm on \mathbb{R}^q and f(Px) = f(x) for any permutation matrix P and all x. An absolute norm is one with the property that \|x\| = \||x|\| for all x, and this condition is equivalent to the monotonicity condition that |x| \le |y| implies \|x\| \le \|y\| for all x and y.


A norm on \mathbb{C}^{m \times n} is unitarily invariant if and only if \|A\| = f(\sigma_1, \dots, \sigma_q) for all A for some symmetric gauge function f, where \sigma_1, \dots, \sigma_q are the singular values of A.

The matrix 2-norm and the Frobenius norm correspond to f being the vector \infty-norm and the 2-norm, respectively. More generally, we can take for f any vector p-norm, obtaining the class of Schatten p-norms:

\notag   \|A\|_p^S  = \biggl(\displaystyle\sum_{i=1}^q \sigma_i^p\biggr)^{1/p}, \quad 1\le    p\le\infty.

The Schatten 1-norm is the sum of the singular values, \|A\|_1^S = \sum_{i=1}^q \sigma_i, which is called the trace norm or nuclear norm. It can act as a proxy for the rank of a matrix. The trace norm can be expressed as \|A\|_1^S = \mathrm{trace}(H)            = \mathrm{trace}\bigl((A^*A)^{1/2}\bigr), where A = UH is a polar decomposition.

Another class of unitarily invariant norms is the Ky Fan k-norms

\notag   \|A\|_{(k)} = \displaystyle\sum_{i=1}^k \sigma_i, \quad 1\le k \le n.

We have \|A\|_{(n)} = \|A\|_1^S and \|A\|_{(1)} = \|A\|_2.

Among unitarily invariant norms, the 2-norm and the Frobenius norm are widely usd in numerical analysis and matrix analysis. The nuclear norm is used in problems involving matrix rank minimization, such as matrix completion problems.

The benefit of the concept of unitarily invariant norm is that one can prove certain results for this whole class of norms, obtaining results for the particular norms of interest as special cases. Here are three important examples.

  • For A\in\mathbb{C}^{n\times n}, the matrix (A+A^*)/2 is the nearest Hermitian matrix to A in any unitarily invariant norm.
  • For A\in\mathbb{C}^{m\times n} (m\ge n), the unitary polar factor is the nearest matrix with orthonormal columns to A in any unitarily invariant norm.
  • The best rank-k approximation to A\in\mathbb{C}^{m\times n} in any unitarily invariant norm is obtained by setting all the singular values beyond the kth to zero in the SVD of A. This result, which generalizes the Eckart-Young theorem, which covers the 2- and Frobenius norm instances, is an easy consequence of the following result of Mirsky (1960).


    Let A,B\in\mathbb{C}^{m\times n} have SVDs with diagonal matrices \Sigma_A, \Sigma_B \in\mathbb{R}^{m\times n}, where the diagonal elements are arranged in nonincreasing order. Then \|A-B\| \ge \|\Sigma_A-\Sigma_B\| for every unitarily invariant norm.

We also give a useful matrix norm inequality. For any matrices A, B, and C for which the product ABC is defined,

\notag     \|ABC\| \le \|A\|_2 \mskip2mu \|B\| \mskip2mu \|C\|_2

holds for any unitarily invariant norm, and in fact, any two of the norms on the right-hand side can be 2-norms.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.