What Is an LU Factorization?

An LU factorization of an n\times n matrix A is a factorization A = LU, where L is unit lower triangular and U is upper triangular. “Unit” means that L has ones on the diagonal. Example:

\notag    \left[\begin{array}{rrrr}      3 & -1 & 1 & 1\\     -1 & 3 & 1 & -1\\     -1 & -1 & 3 & 1\\      1 & 1 & 1 & 3    \end{array}\right]   =    \left[\begin{array}{rrrr}     1 & 0 & 0 & 0\\    -\frac{1}{3} & 1 & 0 & 0\\    -\frac{1}{3} & -\frac{1}{2} & 1 & 0\\     \frac{1}{3} & \frac{1}{2} & 0 & 1    \end{array}\right]    \left[\begin{array}{rrrr}    3 & -1 & 1 & 1\\    0 & \frac{8}{3} & \frac{4}{3} & -\frac{2}{3}\\    0 & 0 & 4 & 1\\    0 & 0 & 0 & 3    \end{array}\right]. \qquad (1)

An LU factorization simplifies the solution of many problems associated with linear systems. In particular, solving a linear system Ax = b reduces to solving the triangular systems Ly = b and Ux = y, since then b = L(Ux).

For a given A, an LU factorization may or may not exist, and if it does exist it may not be unique. Conditions for existence and uniqueness are given in the following result (see Higham, 2002, Thm. 9.1 for a proof). Denote by A_k = A(1\colon k,1\colon k) the leading principal submatrix of A of order k.

Theorem 1. The matrix A\in\mathbb{R}^{n\times n} has a unique LU factorization if and only if A_k is nonsingular for k=1\colon n-1. If A_k is singular for some 1\le k \le n-1 then the factorization may exist, but if so it is not unique.

Note that the (non)singularity of A plays no role in Theorem 1. However, if A is nonsingular and has an LU factorization then the factorization is unique. Indeed if A has LU factorizations A = L_1U_1 = L_2U_2 then the U_i are necessarily nonsingular and so L_2^{-1}L_1 = U_2U_1^{-1}. The left side of this equation is unit lower triangular and the right side is upper triangular; therefore both sides must equal the identity matrix, which means that L_1 = L_2 and U_1 = U_2, as required.

Equating leading principal submatrices in A = LU gives A_k = L_k U_k, which implies that \det(A_k) = \det(U_k) = u_{11} u_{22} \dots u_{kk}. Hence u_{kk} = \det(A_k)/\det(A_{k-1}). In fact, such determinantal formulas hold for all the elements of L and U:

\notag    \begin{aligned}    \ell_{ij} &= \frac{ \det\bigl( A( [1:j-1, \, i], 1:j ) \bigr) }{ \det( A_j ) },             \quad i > j, \\    u_{ij} &= \frac{ \det\bigl( A( 1:i, [1:i-1, \, j] ) \bigr) }                   { \det( A_{i-1} ) },             \quad i \le j.    \end{aligned}

Here, A(u,v), where u and v are vectors of subscripts, denotes the submatrix formed from the intersection of the rows indexed by u and the columns indexed by v.

Relation with Gaussian Elimination

LU factorization is intimately connected with Gaussian elimination. Recall that Gaussian elimination transforms a matrix A^{(1)} = A\in\mathbb{R}^{n\times n} to upper triangular form U = A^{(n)} in n-1 stages. At the kth stage, multiples of row k are added to the later rows to eliminate the elements below the diagonal in column k, using the formulas

\notag     a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik} a_{kj}^{(k)}, \quad                        i = k+1 \colon n, \; j = k+1 \colon n,

where the quantities m_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)} are the multipliers. Of course each a_{kk}^{(k)} must be nonzero for these formulas to be defined, and this is connected with the conditions of Theorem 1, since u_{kk} = a_{kk}^{(k)}. The final U is the upper triangular LU factor, with u_{ij} = a_{ij}^{(i)} for j\ge i, and \ell_{ij} = m_{ij} for i > j, that is, the multipliers make up the L factor (for a proof of these properties see any textbook on numerical linear algebra).

The matrix factorization viewpoint is well established as a powerful paradigm for thinking and computing. Separating the computation of LU factorization from its application is beneficial. For example, given A = LU we saw above how to solve Ax = b. If we need to solve for another right-hand side b_2 we can just solve Ly_2 = b_2 and Ux_2 = y_2, re-using the LU factorization. Similarly, solving A^Tz = c reduces to solving the triangular systems U^T w = c and L^Tz = w.


An LU factorization can be computed by directly solving for the components of L and U in the equation A = LU. Indeed because L has unit diagonal the first row of U is the same as the first row of A, and a_{k1} = \ell_{k1} u_{11} = \ell_{k1} a_{11} then determines the first column of L. One can go on to determine the kth row of U and the kth row of L, for k = 2\colon n. This leads to the Doolittle method, which involves inner products of partial rows of L and partial columns of U.

Given the equivalence between LU factorization and Gaussian elimination we can also employ the Gaussian elimination equations:

\notag \begin{array}{l} \%~kji~\mathrm{form~of~LU~factorization.}\\ \mbox{for}~k=1:n-1  \\ \qquad \mbox{for}~  j=k+1:n \\ \qquad \qquad \mbox{for}~  i=k+1:n \\ \qquad\qquad\qquad a_{ij}^{(k+1)} = a_{ij}^{(k)} - a_{ik}^{(k)}a_{kj}^{(k)} / a_{kk}^{(k)}\\ \qquad\qquad\mbox{end}\\ \qquad\mbox{end}\\ \mbox{end}\\ \end{array}

This kji ordering of the loops in the factorization is the basis of early Fortran implementations of LU factorization, such as that in LINPACK. The inner loop travels down the columns of A, accessing contiguous elements of A since Fortran stores arrays by column. Interchanging the two inner loops gives the kij ordering, which updates the matrix a row at a time, and is appropriate for a language such as C that stores arrays by row.

The ijk and jik orderings correspond to the Doolittle method. The last two of the 3! = 6 orderings are the ikj and jki orderings, to which we will return later.

Schur Complements

For A\in\mathbb{R}^{n\times n} with \alpha = a_{11} \ne 0 we can write

\notag   A =  \begin{bmatrix}         \alpha & b^T \\           c & D        \end{bmatrix}    =        \begin{bmatrix}         1 & 0 \\           c/\alpha & I_{n-1}        \end{bmatrix}       \begin{bmatrix}         \alpha  & b^T \\          0 & D - cb^T/\alpha        \end{bmatrix} = : L_1U_1. \qquad (2)

The (n-1)\times (n-1) matrix S = D - cb^T/\alpha is called the Schur complement of \alpha in A.

The first row and column of L_1 and U_1 have the correct forms for a unit lower triangular matrix and an upper triangular matrix, respectively. If we can find an LU factorization S = L_2U_2 then

\notag      A =        \begin{bmatrix}         1 & 0 \\           c/\alpha & L_2        \end{bmatrix}       \begin{bmatrix}         \alpha  & b^T \\          0 & U_2        \end{bmatrix}

is an LU factorization of A. Note that this is simply another way to express the kji algorithm above.

For several matrix structures it is immediate that \alpha \ne 0. If we can show that the Schur complement inherits the same structure then it follows by induction that we can compute the factorization for S, and so an LU factorization of A exists. Classes of matrix for which a_{11} \ne 0 and A being in the class implies the Schur complement S is also in the class include

  • symmetric positive definite matrices,
  • M-matrices,
  • matrices (block) diagonally dominant by rows or columns.

(The proofs of these properties are nontrivial.) Note that the matrix (1) is row diagonally dominant, as is its U factor, as must be the case since its rows are contained in Schur complements.

We say that A has upper bandwidth q if a_{ij} = 0 for j>i+q and lower bandwidth p if a_{ij} = 0 for i>j+p. Another use of (2) is to show that L and U inherit the bandwidths of A.

Theorem 2. Let A\in\mathbb{R}^{n\times n} have lower bandwidth p and upper bandwidth q. If A has an LU factorization then L has lower bandwidth p and U has upper bandwidth q.

Proof. In (2), the first column of L_1 and the first row of U_1 have the required structure and S has upper bandwidth q and lower bandwidth p, since c and b have only p and q nonzero components, respectively. The result follows by induction.

Block Implementations

In order to achieve high performance on modern computers with their hierarchical memories, LU factorization is implemented in a block form expressed in terms of matrix multiplication and the solution of multiple right-hand side triangular systems. We describe two block forms of LU factorization. First, consider a block form of (2) with block size p, where A_{11} is p \times p:

\notag   A =  \begin{bmatrix}          A_{11} & A_{12}\\          A_{21} & A_{22}        \end{bmatrix}    =        \begin{bmatrix}         L_{11} & 0 \\           L_{21}& I_{n-p}        \end{bmatrix}       \begin{bmatrix}         U_{11} & U_{12} \\          0 & S        \end{bmatrix}.

Here, S is the Schur complement of A_{11} in A, given by S = A_{22} - A_{21}A_{11}^{-1}A_{12}. This leads to the following algorithm:

  1. Factor A_{11} = L_{11}U_{11}.
  2. Solve L_{11}U_{12} = A_{12} for U_{12}.
  3. Solve L_{21}U_{11} = A_{21} for L_{21}.
  4. Form S = A_{22}-L_{21}U_{12}.
  5. Repeat steps 1–4 on S to obtain S = L_{22}U_{22}.

The factorization on step 1 can be done by any form of LU factorization. This algorithm is known as a right-looking algorithm, since it accesses data to the right of the block being worked on (in particular, at each stage lines 2 and 4 access the last few columns of the matrix).

An alternative algorithm can derived by considering a block 3\times 3 partitioning, in which we assume we have already computed the first block column of L and U:

\notag   A =  \begin{bmatrix}          A_{11} & A_{12} & A_{13}\\          A_{21} & A_{22} & A_{23}\\          A_{31} & A_{32} & A_{33}        \end{bmatrix}    =        \begin{bmatrix}         L_{11} & 0     & 0 \\         L_{21} & L_{22}& 0 \\         L_{31} & L_{32}  & I        \end{bmatrix}       \begin{bmatrix}         U_{11} & U_{12} & \times \\              0 & U_{22} & \times \\              0 &  0     & \times        \end{bmatrix}.

We now compute the middle block column of L and U, comprising p columns, by

  1. Solve L_{11}U_{12} = A_{12} for U_{12}.
  2. Factor A_{22}-L_{21}U_{12} = L_{22}U_{22}.
  3. Solve L_{32}U_{22} = A_{32} - L_{31}U_{12} for L_{32}.
  4. Repartition so that the first two block columns become a single block column and repeat steps 1–4.

This algorithm corresponds to the jki ordering. Note that the Schur complement is updated only a block column at a time. Because the algorithm accesses data only to the left of the block column being worked on, it is known as a left-looking algorithm.

Our description of these block algorithms emphasizes the mathematical ideas. The implementation details, especially for the left-looking algorithm, are not trivial. The optimal choice of block size p will depend on the machine, but p is typically in the range 64512.

An important point is that all these different forms of LU factorization, no matter which ijk ordering or which value of p, carry out the same operations. The only difference is the order in which the operations are performed (and the order in which the data is accessed). Even the rounding errors are the same for all versions (assuming the use of “plain vanilla” floating-point arithmetic).

Rectangular Matrices

Although it is most commonly used for square matrices, LU factorization is defined for rectangular matrices, too. If A\in\mathbb{R}^{m\times n} then the factorization has the form A = LU with L\in\mathbb{R}^{m\times m} lower triangular and U\in\mathbb{R}^{m\times n} upper trapezoidal. The conditions for existence and uniqueness of an LU factorization of A are the same as those for A(1\colon p, 1\colon p), where p = \min(m,n).

Block LU Factorization

Another form of LU factorization relaxes the structure of L and U from triangular to block triangular, with L having identity matrices on the diagonal:

\notag  L = \begin{bmatrix} I     &        &            &    \\                     L_{21} &  I     &            &    \\                     \vdots &        &  \ddots    &    \\                     L_{m1} &  \dots &  L_{m,m-1} &  I \end{bmatrix}, \quad   U = \begin{bmatrix} U_{11} & U_{12} & \dots      &  U_{1m}   \\                            & U_{22} &            &  \vdots   \\                            &        &  \ddots    & U_{m-1,m} \\                            &        &            & U_{mm}    \end{bmatrix}.

Note that U is not, in general, upper triangular.

An example of a block LU factorization is

\notag     A =      \left[ \begin{array}{rr|rr}      0  &  1  &  1  &  1  \\     -1  &  1  &  1  &  1  \\\hline     -2  &  3  &  4  &  2  \\     -1  &  2  &  1  &  3  \\             \end{array}      \right]      =      \left[ \begin{array}{cc|cc}      1  &  0  &  0  &  0  \\      0  &  1  &  0  &  0  \\\hline      1  &  2  &  1  &  0  \\      1  &  1  &  0  &  1  \\             \end{array}      \right]      \left[ \begin{array}{rr|rr}      0  &  1  &  1  &  1  \\     -1  &  1  &  1  &  1  \\\hline      0  &  0  &  1  & -1  \\      0  &  0  & -1  &  1  \\             \end{array}      \right].

LU factorization fails on A because of the zero (1,1) pivot. This block LU factorization corresponds to using the leading 2\times 2 principal submatrix of A to eliminate the elements in the (3\colon 4,1\colon 2) submatrix. In the context of a linear system Ax=b, we have effectively solved for the variables x_1 and x_2 in terms of x_3 and x_4 and then substituted for x_1 and x_2 in the last two equations.

Conditions for the existence of a block LU factorization are analogous to, but less stringent than, those for LU factorization in Theorem 1.

Theorem 3. The matrix A\in\mathbb{R}^{n\times n} has a unique block LU factorization if and only if the first m-1 leading principal block submatrices of A are nonsingular.

The conditions in Theorem 3 can be shown to be satisfied if A is block diagonally dominant by rows or columns.

Note that to solve a linear system Ax = b using a block LU factorization we need to solve Ly = b and Ux = y, but the latter system is not triangular and requires the solution of systems U_{ii}x_i = y_i involving the diagonal blocks of U, which would normally be done by (standard) LU factorization.


If A has a unique LU factorization then for a small enough perturbation \Delta A an LU factorization A + \Delta A = (L + \Delta L)(U + \Delta U) exists. To first order, this equation is \Delta A = \Delta L U + L \Delta U, which gives

\notag   L^{-1}\Delta A \mskip2mu U^{-1} = L^{-1}\Delta L  + \Delta U \mskip2mu  U^{-1}.

Since \Delta L is strictly lower triangular and \Delta U is upper triangular, we have, to first order,

\notag         \Delta L = L \mskip 1mu \mathrm{tril}( L^{-1}\Delta A U^{-1} ), \quad         \Delta U = \mathrm{triu}( L^{-1}\Delta A U^{-1} )U,

where \mathrm{tril} denotes the strictly lower triangular part and \mathrm{triu} the strictly upper triangular part. Clearly, the sensitivity of the LU factors depends on the inverses of L and U. However, in most situations, such as when we are solving a linear system Ax = b, it is the backward stability of the LU factors, not their sensitivity, that is relevant.

Pivoting and Numerical Stability

Since not all matrices have an LU factorization, we need the option of applying row and column interchanges to ensure that the pivots are nonzero unless the column in question is already in triangular form.

In finite precision computation it is important that computed LU factors \widehat L and \widehat U are numerically stable in the sense that \widehat L \widehat U = A + \Delta A with \|\Delta A\|\le c_n u \|A\|, where c_n is a constant and u is the unit roundoff. For certain matrix properties, such as diagonal dominance by rows or columns, numerical stability is guaranteed, but in general it is necessary to incorporate row interchanges, or row or column interchanges, in order to obtain a stable factorization.

See What Is the Growth Factor for Gaussian Elimination? for details of pivoting strategies and see Randsvd Matrices with Large Growth Factors for some recent research on growth factors.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

Fifty “What Is” Articles

Photo by Elisa on Unsplash

Last week I posted the fiftieth in my “What Is” series of articles. I began the series just over a year ago, in March 2020. The original aim was to provide “brief descriptions of important concepts in numerical analysis and related areas, with a focus on topics that arise in my research”, and the articles were meant to be short, widely accessible, and contain a minimum of mathematical symbols, equations, and citations. I have largely kept to these aims, though for some topics there is a lot to say and I have been more lengthy.

The articles are also available in PDF form on GitHub.

Below is a list of all the “What Is” articles published at the time of writing, in alphabetical order.

If there is a topic you would like me to cover, please put it in the comments below.

  1. What Is a Block Matrix?
  2. What Is a Cholesky Factorization?
  3. What Is a Companion Matrix?
  4. What Is a Condition Number?
  5. What Is a Correlation Matrix?
  6. What is a Diagonally Dominant Matrix?
  7. What Is a Fractional Matrix Power?
  8. What Is a Fréchet Derivative?
  9. What Is a Generalized Inverse?
  10. What Is a Hadamard Matrix?
  11. What Is a Householder Matrix?
  12. What Is a Matrix Function?
  13. What Is a Matrix Square Root?
  14. What Is a Matrix?
  15. What Is a Modified Cholesky Factorization?
  16. What Is a (Non)normal Matrix?
  17. What Is a QR Factorization?
  18. What Is a Random Orthogonal Matrix?
  19. What is a Sparse Matrix?
  20. What Is a Symmetric Positive Definite Matrix?
  21. What Is a Unitarily Invariant Norm?
  22. What Is an M-Matrix?
  23. What Is an Orthogonal Matrix?
  24. What Is Backward Error?
  25. What Is Bfloat16 Arithmetic?
  26. What Is Floating-Point Arithmetic?
  27. What Is IEEE Standard Arithmetic?
  28. What is Numerical Stability?
  29. What Is Rounding?
  30. What Is Stochastic Rounding?
  31. What Is the Adjugate of a Matrix?
  32. What is the Cayley–Hamilton Theorem?
  33. What Is the Complex Step Approximation?
  34. What Is the CS Decomposition?
  35. What Is the Gerstenhaber Problem?
  36. What Is the Growth Factor for Gaussian Elimination?
  37. What Is the Hilbert Matrix?
  38. What is the Kronecker Product?
  39. What Is the Log-Sum-Exp Function?
  40. What Is the Matrix Exponential?
  41. What Is the Matrix Logarithm?
  42. What Is the Matrix Sign Function?
  43. What Is the Matrix Unwinding Function?
  44. What Is the Nearest Positive Semidefinite Matrix?
  45. What Is the Nearest Symmetric Matrix?
  46. What is the Polar Decomposition?
  47. What Is the Sherman–Morrison–Woodbury Formula?
  48. What Is the Singular Value Decomposition?
  49. What Is the Softmax Function?
  50. What Is the Sylvester Equation?

What is a Diagonally Dominant Matrix?

Matrices arising in applications often have diagonal elements that are large relative to the off-diagonal elements. In the context of a linear system this corresponds to relatively weak interactions between the different unknowns. We might expect a matrix with a large diagonal to be assured of certain properties, such as nonsingularity. However, to ensure nonsingularity it is not enough for each diagonal element to be the largest in its row. For example, the matrix

\notag  \left[\begin{array}{rrr}     3 & -1 & -2\\    -2 &  3 & -1\\    -2 & -1 &  3   \end{array}\right] \qquad (1)

is singular because [1~1~1]^T is a null vector. A useful definition of a matrix with large diagonal requires a stronger property.

A matrix A\in\mathbb{C}^{n\times n} is diagonally dominant by rows if

\notag        |a_{ii}| \ge \displaystyle\sum_{j\ne i} |a_{ij}|, \quad i=1\colon n. \qquad (2)

It is strictly diagonally dominant by rows if strict inequality holds in (2) for all i. A is (strictly) diagonally dominant by columns if A^T is (strictly) diagonally dominant by rows.

Diagonal dominance on its own is not enough to ensure nonsingularity, as the matrix (1) shows. Strict diagonal dominance does imply nonsingularity, however.

Theorem 1.

If A\in\mathbb{C}^{n\times n} is strictly diagonally dominant by rows or columns then it is nonsingular.

Proof. Since A is nonsingular if and only if A^T is nonsingular, it suffices to consider diagonal dominance by rows. For any nonzero x let y = Ax and choose k so that |x_k| = \|x\|_{\infty}. Then the kth equation of y = Ax can be written

\notag    a_{kk}x_k = y_k - \displaystyle\sum_{j\ne k} a_{kj}x_j,

which gives

\notag    |a_{kk}|\|x\|_{\infty} = |a_{kk}||x_k|     \le |y_k| + \displaystyle\sum_{j\ne k} |a_{kj}||x_j|     \le |y_k| + \|x\|_\infty \displaystyle\sum_{j\ne k} |a_{kj}|.

Using (2), we have

\notag    |y_k| \ge \|x\|_{\infty} \Bigl(|a_{kk}| - \displaystyle\sum_{j\ne k} |a_{kj}|\Bigr) > 0.    \qquad (3)

Therefore y\ne0 and so A is nonsingular. ~\square

Diagonal dominance plus two further conditions is enough to ensure nonsingularity. We need the notion of irreducibility. A matrix A\in\mathbb{R}^{n\times n} is irreducible if there does not exist a permutation matrix P such that

\notag       P^TAP = \begin{bmatrix} A_{11} & A_{12} \\                                  0   & A_{22} \end{bmatrix}

with A_{11} and A_{22} square matrices. Irreducibility is equivalent to the directed graph of A being strongly connected.

Theorem 2.

If A\in\mathbb{C}^{n\times n} is irreducible and diagonally dominant by rows with strict inequality in (2) for some i then it is nonsingular.

Proof. The proof is by contradiction. Suppose there exists x\ne 0 such that Ax = 0. Define

\notag   G = \{\, j: |x_j| = \|x\|_{\infty} \,\},   \quad   H = \{\, j: |x_j| < \|x\|_{\infty} \,\}.

The ith equation of Ax = 0 can be written

\notag       a_{ii}x_i = - \displaystyle\sum_{j\ne i} a_{ij}x_j                 = - \displaystyle\sum_{j\in G \atop j\ne i } a_{ij}x_j                   - \displaystyle\sum_{j\in H \atop j\ne i } a_{ij}x_j. \qquad (4)

Hence for i = r\in G,

\notag |a_{rr}| \le \displaystyle\sum_{j\in G \atop j\ne r } |a_{rj}|       + \displaystyle\sum_{j\in H \atop j\ne r } |a_{rj}|\frac{|x_j|}{\|x\|_\infty}.

The set H is nonempty, because if it were empty then we would have |x_j| = \|x\|_\infty for all j and if there is strict inequality in (2) for i = m, then putting i = m in (4) would give |a_{mm}| \le \sum_{j\ne m} |a_{mj}| |x_j|/|x_m|             =  \sum_{j\ne m} |a_{mj}|, which is a contradiction. Hence as long as a_{rj}\ne0 for some j\in H, we obtain |a_{rr}| <  \sum_{j\ne r } |a_{rj}|, which contradicts the diagonal dominance. Therefore we must have a_{rj}= 0 for all j\in H and all r\in G. This means that all the rows indexed by G have zeros in the columns indexed by H, which means that A is reducible. This is a contradiction, so A must be nonsingular. ~\square

The obvious analogue of Theorem 2 holds for column diagonal dominance.

As an example, the n\times n symmetric tridiagonal matrix (minus the second difference matrix)

\notag  T_n = \left[\begin{array}{@{\mskip 5mu}c*{4}{@{\mskip 15mu} r}@{\mskip 5mu}}      2 &   -1  &          &         & \\     -1 &    2  &  -1      &         & \\        &    -1 &   2      &  \ddots & \\        &       &  \ddots  &  \ddots & -1\\        &       &          &  -1     & 2            \end{array}\right], \qquad (5)

is row diagonally dominant with strict inequality in the first and last diagonal dominance relations. It can also be shown to be irreducible and so it is nonsingular by Theorem 2. If we replace t_{11} or t_{nn} by 1, then T remains nonsingular by the same argument. What if we replace both t_{11} and t_{nn} by 1? We can answer this question by using an observation of Strang. If we define the rectangular matrix

\notag  L_n = \begin{bmatrix}                1  &      &        &  \\                -1 &  1   &        &  \\                   & -1   & \ddots &  \\                   &      & \ddots & 1 \\                   &      &        &  -1     \end{bmatrix} \in\mathbb{R}^{(n+1)\times n}

then T_n = L_n^T L_n and

\notag \widetilde{T}_{n+1}  = \begin{bmatrix}                        1 &-1      &        &      & \\                       -1 & 2      & \ddots &      & \\                          & \ddots & \ddots &  -1  & \\                          &        &   -1    &  2  & -1\\                          &        &         &  -1 & 1                    \end{bmatrix}                          = L_n L_n^T \in \mathbb{R}^{(n+1) \times (n+1)}.

Since in general AB and BA have the same nonzero eigenvalues, we conclude that \Lambda(\widetilde{T}_{n+1})  = \Lambda(T_n) \cup \{0\}, where \Lambda(\cdot) denotes the spectrum. Hence T_n is symmetric positive definite and \widetilde{T}_n is singular and symmetric positive semidefinite.

Relation to Gershgorin’s Theorem

Theorem 1 can be used to obtain information about the location of the eigenvalues of a matrix. Indeed if \lambda is an eigenvalue of A then A - \lambda I is singular and hence cannot be strictly diagonally dominant, by Theorem 1. So |a_{ii}-\lambda| > \sum_{j\ne i} |a_{ij}| cannot be true for all i. Gershgorin’s theorem is simply a restatement of this fact.

Theorem 3 (Gershgorin’s theorem).

The eigenvalues of A\in\mathbb{C}^{n\times n} lie in the union of the n discs in the complex plane

\notag      D_i = \Big\{ z\in\mathbb{C}: |z-a_{ii}| \le \displaystyle\sum_{j\ne i}      |a_{ij}|\Big\}, \quad i=1\colon n.

If A is symmetric with positive diagonal elements and satisfies the conditions of Theorem 1 or Theorem 2 then it is positive definite. Indeed the eigenvalues are real and so in Gershgorin’s theorem the discs are intervals and a_{ii} - z \le |z-a_{ii}| \le \sum_{j\ne i}^n |a_{ij}|, so z \ge |a_{ii}| - \sum_{j\ne i}^n |a_{ij}| \ge 0, so the eigenvalues are nonnegative, and hence positive since nonzero. This provides another proof that the matrix T_n in (5) is positive definite.

Generalized Diagonal Dominance

In some situations A is not diagonally dominant but a row or column scaling of it is. For example, the matrix

\notag   A = \begin{bmatrix}         1   & 1   & 0 \\         2/3 & 2   & 1/4 \\         2/3 & 1/2 & 1       \end{bmatrix}

is not diagonally dominant by rows or columns but

\notag   A \, \mathrm{diag}(3,2,4)    = \begin{bmatrix}         3   & 2   & 0 \\         2   & 4   & 1   \\         2   & 1   & 4       \end{bmatrix}

is strictly diagonally dominant by rows.

A matrix A\in\mathbb{C}^{n\times n} is generalized diagonally dominant by rows if AD is diagonally dominant by rows for some diagonal matrix D = \mathrm{diag}(d_i) with d_i > 0 for all i, that is, if

\notag      |a_{ii}|d_i \ge \displaystyle\sum_{j\ne i} |a_{ij}|d_j, \quad i=1\colon n. \qquad (6)

It is easy to see that if A is irreducible and there is strictly inequality in (6) for some i then A is nonsingular by Theorem 2.

It can be shown that A is generalized diagonally dominant by rows if and only if it is an H-matrix, where an H-matrix is a matrix for which the comparison matrix M(A), defined by

\notag   M(A) = (m_{ij}), \quad m_{ij} =    \begin{cases} |a_{ii}|, & i=j, \\                 -|a_{ij}|, & i\ne j,     \end{cases}

is an M-matrix (see What Is an M-Matrix?).

Block Diagonal Dominance

A matrix A\in\mathbb{C}^{n\times n} is block diagonally dominant by rows if, for a given norm and block m\times m partitioning A = (A_{ij}), the diagonal blocks A_{jj} are all nonsingular and

\notag     \displaystyle\sum_{j\ne i} \|A_{ij}\| \le  \|A_{ii}^{-1}\|^{-1}, \quad i = 1\colon m.   \label{bdd}

A is block diagonally dominant by columns if A^T is block diagonally dominant by rows. If the blocks are all 1\times 1 then block diagonal dominance reduces to the usual notion of diagonal dominance. Block diagonal dominance holds for certain block tridiagonal matrices arising in the discretization of PDEs.

Analogues of Theorems 1 and 2 giving conditions under which block diagonal dominance implies nonsingularity are given by Feingold and Varga (1962).

Bounding the Inverse

If a matrix is strictly diagonally dominant then we can bound its inverse in terms of the minimum amount of diagonal dominance. For full generality, we state the bound in terms of generalized diagonal dominance.

Theorem 4.

If A\in\mathbb{C}^{n\times n} and AD is strictly diagonally dominant by rows for a diagonal matrix D = \mathrm{diag}(d_i) with d_i > 0 for all i, then

\notag   \|A^{-1}\|_\infty \le \displaystyle\frac{\|D\|_{\infty}}{\alpha},

where \alpha = \min_i (|a_{ii}|d_i - \sum_{j\ne i} |a_{ij}|d_j).

Proof. Assume first that D = I. Let y satisfy \|A^{-1}\|_{\infty} = \|A^{-1}y\|_{\infty} / \|y\|_{\infty} and let x = A^{-1}y. Applying (3) gives \|A^{-1}\|_{\infty} = \|x\|_{\infty} / \|y\|_{\infty} \le \alpha^{-1}. The result is obtained on applying this bound to AD and using \|A^{-1}\|_{\infty} \le \|D\|_{\infty} \|(AD)^{-1}\|_{\infty}. ~\square.

Another bound for A^{-1} when A is strictly diagonally dominant by rows can be obtained by writing A = D(I - E), where D = \mathrm{diag}(a_{ii}), e_{ii} = 0, and e_{ij} = -a_{ij}/a_{ii} for i\ne j. It is easy to see that \|E\|_\infty < 1, which gives another proof that A is nonsingular. Then

\notag  \begin{aligned}   |A^{-1}| &= |(I-E)^{-1}D^{-1}|            = |I + E + E^2 + \cdots | |D^{-1}|\\            &\le (I + |E| + |E|^2 + \cdots ) |D|^{-1}\\             &= (I - |E|)^{-1} |D|^{-1}\\             &= M(A)^{-1}.  \end{aligned}

This bound implies that M(A)^{-1} \ge 0, so in view of its sign pattern M(A) is an M-matrix, which essentially proves one direction of the H-matrix equivalence in the previous section. The same bound holds if A is diagonally dominant by columns, by writing A = (I-E)D.

An upper bound also holds for block diagonal dominance.

Theorem 5.

If A\in\mathbb{C}^{n\times n} is block diagonally dominant by rows then

\notag   \|A^{-1}\|_\infty \le \displaystyle\frac{1}{\alpha}.

where \alpha = \min_i ( \|A_{ii}^{-1}\|^{-1} - \sum_{j\ne i} \|A_{ij}\| ).

It is interesting to note that the inverse of a strictly row diagonally dominant matrix enjoys a form of diagonal dominance, namely that the largest element in each column is on the diagonal.

Theorem 6.

If A\in\mathbb{C}^{n\times n} is strictly diagonally dominant by rows then B = A^{-1} satisfies |b_{ij}| < |b_{jj}| for all i\ne j.

Proof. For i\ne j we have \sum_{k=1}^n a_{ik}b_{kj} = 0. Let \beta_j = \max_k |b_{kj}|. Taking absolute values in a_{ii}b_{ij} = -\sum_{k\ne i}a_{ik}b_{kj} gives

\notag  |a_{ii}||b_{ij}| \le \beta_j \sum_{k\ne i} |a_{ik}| < \beta_j |a_{ii}|,

or |b_{ij}| < \beta_j, since a_{ii} \ne 0. This inequality holds for all i\ne j, so we must have \beta_j = |b_{jj}|, which gives the result.

Historical Remarks

Theorems 1 and 2 have a long history and have been rediscovered many times. Theorem 1 was first stated by Lévy (1881) with additional assumptions. In a short but influential paper, Taussky (1949) pointed out the recurring nature of the theorems and gave simple proofs (our proof of Theorem 2 is Taussky’s). Schneider (1977) attributes the surge in interest in matrix theory in the 1950s and 1960s to Taussky’s paper and a few others by her, Brauer, Ostrowski, and Wielandt. The history of Gershgorin’s theorem (published in 1931) is intertwined with that of Theorems 1 and 2; see Varga’s 2004 book for details.

Theorems 4 and 5 are from Varah (1975) and Theorem 6 is from Ostrowski (1952).


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is a Companion Matrix?

A companion matrix C\in\mathbb{C}^{n\times n} is an upper Hessenberg matrix of the form

\notag   C = \begin{bmatrix} a_{n-1} & a_{n-2} & \dots  &\dots & a_0 \\                 1       & 0       & \dots  &\dots &  0 \\                 0       & 1       & \ddots &      &  \vdots \\                 \vdots  &         & \ddots & 0    &  0 \\                 0       &  \dots  & \dots  & 1    &  0           \end{bmatrix}.

Alternatively, C can be transposed and permuted so that the coefficients a_i appear in the first or last column or the last row. By expanding the determinant about the first row it can be seen that

\notag    \det(\lambda I - C) = \lambda^n - a_{n-1}\lambda^{n-1} - \cdots                          - a_1\lambda - a_0, \qquad (*)

so the coefficients in the first row of C are the coefficients of its characteristic polynomial. (Alternatively, in \lambda I - C add \lambda^{n-j} times the jth column to the last column for j = 1:n-1, to obtain p(\lambda)e_1 as the new last column, and expand the determinant about the last column.) MacDuffee (1946) introduced the term “companion matrix” as a translation from the German “Begleitmatrix”.

Setting \lambda = 0 in (*) gives \det(C) = (-1)^{n+1} a_0, so C is nonsingular if and only if a_0 \ne 0. The inverse is

\notag   C^{-1} = \begin{bmatrix}                 0 & 1 & 0 &\dots& 0 \cr                 0 & 0 & 1 &\ddots& 0 \cr               \vdots & & \ddots & \ddots & 0\cr               \vdots & &       & \ddots & 1\cr     \displaystyle\frac{1}{a_0} & -\displaystyle\frac{a_{n-1}}{a_0}                   & -\displaystyle\frac{a_{n-2}}{a_0} & \dots                   & -\displaystyle\frac{a_1}{a_0}           \end{bmatrix}.

Note that P^{-1}C^{-1}P is in companion form, where P = I_n(n:-1:1,:) is the reverse identity matrix, and the coefficients are those of the polynomial -\lambda^n p(1/\lambda), whose roots are the reciprocals of those of p.

A companion matrix has some low rank structure. It can be expressed as a unitary matrix plus a rank-1 matrix:

\notag   C = \begin{bmatrix} 0 & 0 & \dots  &\dots & 1   \\                 1       & 0       & \dots  &\dots &  0 \\                 0       & 1       & \ddots &      &  0 \\                 \vdots  &         & \ddots & 0    &  0 \\                 0       &  \dots  & \dots  & 1    &  0           \end{bmatrix} + e_1 \begin{bmatrix} a_{n-1} & a_{n-2} & \dots & a_0-1           \end{bmatrix}. \qquad (1)

Also, C^{-T} differs from C in just the first and last columns, so C^{-T} = C + E, where E is a rank-2 matrix.

If \lambda is an eigenvalue of C then [\lambda^{n-1}, \lambda^{n-2}, \dots, \lambda, 1]^T is a corresponding eigenvector. The last n-1 rows of \lambda I - C are clearly linearly independent for any \lambda, which implies that C is nonderogatory, that is, no two Jordan blocks in the Jordan canonical form contain the same eigenvalue. In other words, the characteristic polynomial is the same as the minimal polynomial.

The MATLAB function compan takes as input a vector [p_1,p_2, \dots, p_{n+1}] of the coefficients of a polynomial, p_1x^n + p_2 x^{n-1} + \cdots + p_n x + p_{n+1}, and returns the companion matrix with a_{n-1} = -p_2/p_1, …, a_0 = -p_{n+1}/p_1.

Perhaps surprisingly, the singular values of C have simple representations, found by Kenney and Laub (1988):

\notag \begin{aligned}    \sigma_1^2 &= \displaystyle\frac{1}{2} \left( \alpha + \sqrt{\alpha^2 - 4 a_0^2} \right), \\    \sigma_i^2 &= 1, \qquad i=2\colon n-1, \\    \sigma_n^2 &= \displaystyle\frac{1}{2} \left( \alpha - \sqrt{\alpha^2 - 4 a_0^2} \right), \end{aligned}

where \alpha = 1 + a_0^2 + \cdots + a_{n-1}^2. These formulae generalize to block companion matrices, as shown by Higham and Tisseur (2003).


Companion matrices arise naturally when we convert a high order difference equation or differential equation to first order. For example, consider the Fibonacci numbers 1, 1, 2, 3, 5, \dots, which satisfy the recurrence f_n = f_{n-1} + f_{n-2} for n \ge 2, with f_0 = f_1 = 1. We can write

\notag     \begin{bmatrix} f_n \\ f_{n-1} \end{bmatrix}      = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}         \begin{bmatrix} f_{n-1} \\ f_{n-2} \end{bmatrix}       = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^2         \begin{bmatrix} f_{n-2} \\ f_{n-3} \end{bmatrix}      = \cdots       = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^{n-1}         \begin{bmatrix} f_{1} \\ f_{0} \end{bmatrix},

where \left[\begin{smallmatrix}1 & 1 \\ 1 & 0 \end{smallmatrix}\right] is a companion matrix. This expression can be used to compute f_n in O(\log_2n) operations by computing the matrix power using binary powering.

As another example, consider the differential equation

\notag        y''' = b_2 y'' + b_1 y' + b_0 y.

Define new variables

z_1 = y'', \quad z_2 = y', \quad z_3 = y.


\notag \begin{aligned}      z_1' &= b_2 z_1 + b_1 z_2 + b_0 z_3,\\      z_2' &= z_1\\      z_3' &=  z_2, \end{aligned}


\notag      \begin{bmatrix} z_1 \\ z_2\\ z_3 \end{bmatrix}'  =   \begin{bmatrix} b_2 & b_1 & b_0 \\                 1       & 0    &   0 \\                 0       & 1    &   0 \\           \end{bmatrix}      \begin{bmatrix} z_1 \\ z_2\\ z_3 \end{bmatrix},

so the third order scalar equation has been converted into a first order system with a companion matrix as coefficient matrix.

Computing Polynomial Roots

The MATLAB function roots takes as input a vector of the coefficients of a polynomial and returns the roots of the polynomial. It computes the eigenvalues of the companion matrix associated with the polynomial using the eig function. As Moler (1991) explained, MATLAB used this approach starting from the first version of MATLAB, but it does not take advantage of the structure of the companion matrix, requiring O(n^3) flops and O(n^2) storage instead of the O(n^2) flops and O(n) storage that should be possible given the structure of C. Since the early 2000s much research has aimed at deriving methods that achieve this objective, but numerically stable methods proved elusive. Finally, a backward stable algorithm requiring O(n^2) flops and O(n) storage was developed by Aurentz, Mach, Vandebril, and Watkins (2015). It uses the QR algorithm and exploits the unitary plus low rank structure shown in (1). Here, backward stability means that the computed roots are the eigenvalues of C + \Delta C for some \Delta C with \|\Delta C\| \le c_n u \|C\|. It is not necessarily the case that the computed roots are the exact roots of a polynomial with coefficients a_i + \Delta a_i with |\Delta a_i| \le c_n u \max_i |a_i| for all i.

Rational Canonical Form

It is an interesting observation that

\notag      \begin{bmatrix}                  0  &  0   & 0    & 1   \\                  0  &  0   & 1    & -a_3\\                  0  &  1   & -a_3 & -a_2 \\                  1  & -a_3 & -a_2 & -a_1      \end{bmatrix}      \begin{bmatrix}                a_3 & a_2  & a_1  & a_0 \\                 1    & 0  &   0  & 0 \\                 0    &  1 &   0  & 0 \\                 0    &  0 &   1  & 0      \end{bmatrix}                 =      \begin{bmatrix}                0 & 0    & 1     & 0 \\                0 & 1    & -a_3  & 0 \\                1 & -a_3 & -a_2  & 0  \\                0 & 0    & 0     & a_0 \\      \end{bmatrix}.

Multiplying by the inverse of the matrix on the left we express the 4\times 4 companion matrix as the product of two symmetric matrices. The obvious generalization of this factorization to n\times n matrices shows that we can write

\notag   C = S_1S_2,  \quad S_1 = S_1^T, \quad S_2= S_2^T. \qquad (2)

We need the rational canonical form of a matrix, described in the next theorem, which Halmos (1991) calls “the deepest theorem of linear algebra”. Let \mathbb{F} denote the field \mathbb{R} or \mathbb{C}.

Theorem 1 (rational canonical form).

If A\in\mathbb{F}^{n\times n} then A = X^{-1} C X where X\in\mathbb{F}^{n\times n} is nonsingular and C = \mathrm{diag}(C_i)\in\mathbb{F}^{n\times n}, with each C_i a companion matrix.

The theorem says that every matrix is similar over the underlying field to a block diagonal matrix composed of companion matrices. Since we do not need it, we have omitted from the statement of the theorem the description of the C_i in terms of the irreducible factors of the characteristic polynomial. Combining the factorization (2) and Theorem 1 we obtain

\notag \begin{aligned}   A &= X^{-1}CX = X^{-1} S_1S_2 X \\     & = X^{-1}S_1X^{-T} \cdot X^T S_2 X \\     & \equiv \widetilde{S}_1 \widetilde{S}_2,\quad    \widetilde{S}_1 = \widetilde{S}_1^T, \quad    \widetilde{S}_2 = \widetilde{S}_2^T. \end{aligned}

Since S_1 is nonsingular, and since S_2 can alternatively be taken nonsingular by considering the factorization of A^T, this proves a theorem of Frobenius.

Theorem 2 (Frobenius, 1910).

For any A\in\mathbb{F}^{n\times n} there exist symmetric S_1,S_2\in\mathbb{F}^{n\times n}, either one of which can be taken nonsingular, such that A = S_1 S_2.

Note that if A = S_1S_2 with the S_i symmetric then AS_1 = S_1S_2S_1 = S_1A^T = (AS_1)^T, so AS_1 is symmetric. Likewise, S_2A is symmetric.


Fiedler (2003) noted that a companion matrix can be factorized into the product of n simpler factors, n-1 of them being the identity matrix with a 2\times 2 block placed on the diagonal, and he used this factorization to determine a matrix \widetilde{C} similar to C. For n = 5 it is

\notag \widetilde{C} = \begin{bmatrix}    a_4 & a_3 & 1  & 0   & 0  \\      1 & 0   & 0  & 0   & 0  \\      0 & a_2 & 0  & a_1 & 1  \\      0 & 1   & 0  & 0   & 0 \\      0 & 0   & 0  & a_0 & 0 \end{bmatrix} = \left[\begin{array}{cc|cc|c}    a_4 & 1 &     &   &   \\      1 & 0 &     &   &   \\\hline        &   & a_2 & 1 &   \\        &   &  1  & 0 &   \\\hline        &   &     &   &  a_0 \end{array}\right] \left[\begin{array}{c|cc|cc}      1  &     &   &      & \\\hline         & a_3 & 1 &      & \\         &  1  & 0 &      & \\\hline         &     &   &  a_1 & 1 \\         &     &   &   1  & 0 \end{array}\right].

In general, Fielder’s construction yields an n\times n pentadiagonal matrix \widetilde{C} that is not simply a permutation similarity of C. The fact that \widetilde{C} has block diagonal factors opens the possibility of obtaining new methods for finding the eigenvalues of C. This line of research has been extensively pursued in the context of polynomial eigenvalue problems (see Mackey, 2013).


The companion matrix is associated with the monomial basis representation of the characteristic polynomial. Other polynomial bases can be used, notably orthogonal polynomials, and this leads to generalizations of the companion matrix having coefficients on the main diagonal and the subdiagonal and superdiagonal. Good (1961) calls the matrix resulting from the Chebyshev basis a colleague matrix. Barnett (1981) calls the matrices corresponding to orthogonal polynomials comrade matrices, and for a general polynomial basis he uses the term confederate matrices. Generalizations of the properties of companion matrices can be derived for these classes of matrices.

Bounds for Polynomial Roots

Since the roots of a polynomial are the eigenvalues of the associated companion matrix, or a Fiedler matrix similar to it, or indeed the associated comrade matrix or confederate matrix, one can obtain bounds on the roots by applying any available bounds for matrix eigenvalues. For example, since any eigenvalue \lambda of matrix A satisfies |\lambda| \le \|A\|, by taking the 1-norm and the \infty-norm of the companion matrix C we find that any root \lambda of the polynomial (*) satisfies

\notag \begin{aligned} |\lambda| &\le \max\bigl(|a_0|, 1 + \max_{j = 1:n-1} |a_j| \bigr), \\ |\lambda| &\le \max(1, |a_{n-1}| + |a_{n-2}| + \cdots + |a_0|), \end{aligned}

either of which can be the smaller. A rich variety of such bounds is available, and these techniques extend to matrix polynomials and the corresponding block companion matrices.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is an M-Matrix?

An M-matrix is a matrix A\in\mathbb{R}^{n \times n} of the form

\notag      A = sI - B, \quad \mathrm{where}~B \ge 0~\mathrm{and}~s > \rho(B).      \qquad (*)

Here, \rho(B) is the spectral radius of B, that is, the largest modulus of any eigenvalue of B, and B \ge 0 denotes that B has nonnegative entries. An M-matrix clearly has nonpositive off-diagonal elements. It also has positive diagonal elements, which can be shown using the result that

\notag  \rho(A) = \lim_{k\to\infty} \|A^k\|^{1/k}   \qquad (\dagger)

for any consistent matrix norm:

\notag  s > \rho(B) = \lim_{k\to\infty} \|B^k\|_{\infty}^{1/k}          \ge  \lim_{k\to\infty} \|\mathrm{diag}(b_{ii})^k\|_{\infty}^{1/k}          = \max_i b_{ii}.

Although the definition of an M-matrix does not specify s, we can set it to \max_i a_{ii}. Indeed let s satisfy (*) and set \widetilde{s} = \max_i a_{ii} and \widetilde{B} = \widetilde{s}I - A. Then \widetilde{B} \ge 0, since \widetilde{b}_{ii} = \widetilde{s} - a_{ii} \ge 0 and \widetilde{b}_{ij} = -a_{ij} = b_{ij} \ge 0 for i \ne j. Furthermore, for a nonnegative matrix the spectral radius is an eigenvalue, by the Perron–Frobenius theorem, so \rho(B) is an eigenvalue of B and \rho(\widetilde{B)} is an eigenvalue of \widetilde{B}. Hence \rho(\widetilde{B}) = \rho( (\widetilde{s}-s)I + B) = \widetilde{s}  -s + \rho(B) < \widetilde{s}.

The concept of M-matrix was introduced by Ostrowski in 1937. M-matrices arise in a variety of scientific settings, including in finite difference methods for PDEs, input-output analysis in economics, and Markov chains in stochastic processes.

An immediate consequence of the definition is that the eigenvalues of an M-matrix lie in the open right-half plane, which means that M-matrices are special cases of positive stable matrices. Hence an M-matrix is nonsingular and the determinant, being the product of the eigenvalues, is positive. Moreover, since C = s^{-1}B satisfies \rho(C) < 1,

\notag     A^{-1} = s^{-1}(I - C)^{-1}            = s^{-1}(I + C + C^2 + \cdots) \ge 0.

In fact, nonnegativity of the inverse characterizes M-matrices. Define

\notag     Z_n = \{ \, A \in \mathbb{R}^{n\times n}: a_{ij} \le 0, \; i\ne j \,\}.

Theorem 1.

A nonsingular matrix A\in Z_n is an M-matrix if and only if A^{-1} \ge 0.

Sometimes an M-matrix is defined to be a matrix with nonpositive off-diagonal elements and a nonnegative inverse. In fact, this condition is just one of a large number of conditions equivalent to a matrix with nonpositive off-diagonal elements being an M-matrix, fifty of which are given in Berman and Plemmons (1994, Chap. 6).

It is easy to check from the definitions, or using Theorem 1, that a triangular matrix T with positive diagonal and nonpositive off-diagonal is an M-matrix. An example is

\notag     T_4 = \left[\begin{array}{@{\mskip 5mu}c*{3}{@{\mskip 15mu} r}@{\mskip 5mu}}      1 &   -1  &  -1  & -1 \\        &    1  &  -1  & -1 \\        &       &   1  & -1 \\        &       &      &  1            \end{array}\right], \quad   T_4^{-1} =    \begin{bmatrix}    1 & 1 & 2 & 4\\   & 1 & 1 & 2\\   &   & 1 & 1\\   &   &   & 1    \end{bmatrix}.

Bounding the Norm of the Inverse

An M-matrix can be constructed from any nonsingular triangular matrix by taking the comparison matrix. The comparison matrix associated with a general B\in\mathbb{R}^{n \times n} is the matrix

\notag   M(B) = (m_{ij}), \quad m_{ij} =    \begin{cases} |b_{ii}|, & i=j, \\                 -|b_{ij}|, & i\ne j.     \end{cases}

For a nonsingular triangular T, M(T) is an M-matrix, and it easy to show that

\notag       |T^{-1}| \le |M(T)^{-1}|,

where the absolute value is taken componentwise. This bound, and weaker related bounds, can be useful for cheaply bounding the norm of the inverse of a triangular matrix. For example, with e denoting the vector of ones, since M(T)^{-1} is nonnegative we have

\notag    \|T^{-1}\|_{\infty} \le \|M(T)^{-1}\|_{\infty}                         = \|M(T)^{-1}e\|_{\infty},

and \|M(T)^{-1}e\|_{\infty} can be computed in O(n^2) flops by solving a triangular system, whereas computing T^{-1} costs O(n^3) flops.

More generally, if we have an LU factorization PA = LU of an M-matrix A\in\mathbb{R}^{n \times n} then, since A^{-1} \ge 0,

\notag   \|A^{-1}\|_{\infty} = \|A^{-1}e\|_{\infty}                       = \|U^{-1}L^{-1}Pe\|_{\infty}                       = \|U^{-1}L^{-1}e\|_{\infty}.

Therefore the norm of the inverse can be computed in O(n^2) flops with two triangular solves, instead of the O(n^3) flops that would be required if A^{-1} were to be formed explicitly.

Connections with Symmetric Positive Definite Matrices

There are many analogies between M-matrices and symmetric positive definite matrices. For example, every principal submatrix of a symmetric positive definite matrix is symmetric positive definite and every principal submatrix of an M-matrix is an M-matrix. Indeed if \widetilde{B} is a principal submatrix of a nonnegative B then \rho(\widetilde{B}) \le \rho(B), which follows from (\dagger) for the \infty-norm (for example). Hence on taking principal submatrices in (*) we have s > \rho(\widetilde{B}) with the same s.

A symmetric M-matrix is known as a Stieltjes matrix, and such a matrix is positive definite. An example of a Stieltjes matrix is minus the second difference matrix (the tridiagonal matrix arising from a central difference discretization of a second derivative), illustrated for n = 4 by

\notag     A_4 = \left[\begin{array}{@{\mskip 5mu}c*{3}{@{\mskip 15mu} r}@{\mskip 5mu}}      2 &   -1  &      &    \\     -1 &    2  &  -1  &    \\        &    -1 &   2  &  -1 \\        &       &  -1  &  2            \end{array}\right], \quad   A_4^{-1} =    \begin{bmatrix}   \frac{4}{5} & \frac{3}{5} & \frac{2}{5} & \frac{1}{5}\\[\smallskipamount]   \frac{3}{5} & \frac{6}{5} & \frac{4}{5} & \frac{2}{5}\\[\smallskipamount]   \frac{2}{5} & \frac{4}{5} & \frac{6}{5} & \frac{3}{5}\\[\smallskipamount]   \frac{1}{5} & \frac{2}{5} & \frac{3}{5} & \frac{4}{5} \end{bmatrix}.

LU Factorization

Since the leading principal submatrices of an M-matrix A have positive determinant it follows that A has an LU factorization with U having positive diagonal elements. However, more is true, as the next result shows.

Theorem 2.

An M-matrix A has an LU factorization A = LU in which L and U are M-matrices.

Proof. We can write

\notag    A =    \begin{array}[b]{@{\mskip27mu}c@{\mskip-20mu}c@{\mskip-10mu}c@{}}    \scriptstyle 1 &    \scriptstyle n-1 &    \\    \multicolumn{2}{c}{        \left[\begin{array}{c@{~}c@{~}}                  \alpha & b^T \\                  c^T    & E  \\              \end{array}\right]}    & \mskip-14mu\          \begin{array}{c}              \scriptstyle 1 \\              \scriptstyle n-1              \end{array}    \end{array},   \quad \alpha > 0, \quad b\le 0, \quad c \le 0.

The first stage of LU factorization is

\notag   A = \begin{bmatrix}           \alpha & b^T \\              c   & E       \end{bmatrix}    =       \begin{bmatrix}               1          & 0   \\              \alpha^{-1}c & I       \end{bmatrix}       \begin{bmatrix}               \alpha  & b^T   \\                 0     & S       \end{bmatrix} = L_1U_1, \quad S = E - \alpha^{-1} c\mskip1mu b^T,

where S is the Schur complement of \alpha in A. The first column of L_1 and the first row of U_1 are of the form required for a triangular M-matrix. We have

\notag   A^{-1} = U_1^{-1}L_1^{-1} =       \begin{bmatrix}               \alpha^{-1} & -\alpha^{-1}b^TS^{-1}  \\                     0     & S^{-1}       \end{bmatrix}       \begin{bmatrix}               1          & 0   \\              -\alpha^{-1}c & I       \end{bmatrix}     =       \begin{bmatrix}               \times          & \times   \\              \times & S^{-1}       \end{bmatrix}.

Since A^{-1} \ge 0 it follows that S^{-1} \ge 0. It is easy to see that S\in Z_n, and hence Theorem 1 shows that S is an M-matrix. The result follows by induction.

The question now arises of what can be said about the numerical stability of LU factorization of an M-matrix. To answer it we use another characterization of M-matrices, that DA is strictly diagonally dominant by columns for some diagonal matrix D = \mathrm{diag}(d_i) with d_i>0 for all i, that is,

\notag        d_j|a_{jj}| > \sum_{i\ne j} d_i |a_{ij}|, \quad j=1\colon n.

(This condition can also be written as d^TA > 0 because of the sign pattern of A.) Matrices that are diagonally dominant by columns have the properties that an LU factorization without pivoting exists, the growth factor \rho_n \le 2, and partial pivoting does not require row interchanges. The effect of row scaling on LU factorization is easy to see:

\notag   A = LU \;\Rightarrow\; DA = DLD^{-1} \cdot DU       \equiv \widetilde{L} \widetilde{U},

where \widetilde{L} is unit lower triangular, so that \widetilde{L} and \widetilde{U} are the LU factors of DA. It is easy to see that the growth factor bound of 2 for a matrix diagonally dominant by columns translates into the bound

\notag       \rho_n \le 2\mskip1mu\displaystyle\frac{\max_i d_i}{\min_i d_i} \qquad(\ddagger)

for an M-matrix, which was obtained by Funderlic, Neumann, and Plemmons (1982). Unfortunately, this bound can be large. Consider the matrix

\notag  A = \begin{bmatrix}             \epsilon& 0& -1\\            -1& 1& -1\\             0& 0&  1    \end{bmatrix} \in Z_3, \quad 0 < \epsilon < 1.

We have

\notag  A^{-1} =    \begin{bmatrix}        \displaystyle\frac{1}{\epsilon}& 0& \displaystyle\frac{1}{\epsilon}\\[\bigskipamount]        \displaystyle\frac{1}{\epsilon}& 1 & \displaystyle\frac{1 + \epsilon}{\epsilon}\\[\bigskipamount]                  0&           0&     1    \end{bmatrix} \ge 0,

so A is an M-matrix. The (2,3) element of the LU factor U of A is -1 - 1/\epsilon, which means that

\notag     \rho_3 \ge \displaystyle\frac{1}{\epsilon} + 1,

and this lower bound can be arbitrarily large. One can verify experimentally that numerical instability is possible when \rho_3 is large, in that the computed LU factors have a large relative residual. We conclude that pivoting is necessary for numerical stability in LU factorization of M-matrices.

Stationery Iterative Methods

A stationery iterative method for solving a linear system Ax = b is based on a splitting A = M - N with M nonsingular, and has the form Mx_{k+1} = Nx_k + b. This iteration converges for all starting vectors x_0 if \rho(M^{-1}N) < 1. Much interest has focused on regular splittings, which are defined as ones for which M^{-1}\ge 0 and N \ge 0. An M-matrix has the important property that \rho(M^{-1}N) < 1 for every regular splitting, and it follows that the Jacobi iteration, the Gauss-Seidel iteration, and the successive overrelaxation (SOR) iteration (with 0 < \omega \le 1) are all convergent for M-matrices.

Matrix Square Root

The principal square root A^{1/2} of an M-matrix A is an M-matrix, and it is the unique such square root. An expression for A^{1/2} follows from (*):

\notag \begin{aligned}      A^{1/2} &= s^{1/2}(I - C)^{1/2} \quad (C = s^{-1}B, ~\rho(C) < 1),\\              &= s^{1/2} \sum_{j=0}^{\infty} {\frac{1}{2} \choose j} (-C)^j. \end{aligned}

This expression does not necessarily provide the best way to compute A^{1/2}.

Singular M-Matrices

The theory of M-matrices extends to the case where the condition on s is relaxed to s \ge \rho(B) in (*), though the theory is more complicated and extra conditions such as irreducibility are needed for some results. Singular M-matrices occur in Markov chains (Berman and Plemmons, 1994, Chapter 8), for example. Let P be the transition matrix of a Markov chain. Then P is stochastic, that is, nonnegative with unit row sums, so Pe = e. A nonnegative vector y with y^Te = 1 such that y^T P = y^T is called a stationary distribution vector and is of interest for describing the properties of the Markov chain. To compute y we can solve the singular system Ay = (I - P^T)y = 0. Clearly, A\in Z_n and \rho(P) = 1, so A is a singular M-matrix.


A more general concept is that of H-matrix: A\in\mathbb{R}^{n \times n} is an H-matrix if the comparison matrix M(A) is an M-matrix. Many results for M-matrices extend to H-matrices. For example, for an H-matrix with positive diagonal elements the principal square root exists and is the unique square root that is an H-matrix with positive diagonal elements. Also, the growth factor bound (\ddagger) holds for any H-matrix for which DA is diagonally dominant by columns.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is a Unitarily Invariant Norm?

A norm on \mathbb{C}^{m \times n} is unitarily invariant if \|UAV\| = \|A\| for all unitary U\in\mathbb{C}^{m \times m} and V\in\mathbb{C}^{n\times n} and for all A\in\mathbb{C}^{m \times n}. One can restrict the definition to real matrices, though the term unitarily invariant is still typically used.

Two widely used matrix norms are unitarily invariant: the 2-norm and the Frobenius norm. The unitary invariance follows from the definitions. For the 2-norm, for any unitary U and V, using the fact that \|Uz\|_2 = \|z\|_2, we obtain

\notag \begin{aligned}    \|UAV\|_2 &= \max_{x\ne0} \frac{\|UAVx\|_2}{\|x\|_2}              = \max_{x\ne0} \frac{\|AVx\|_2}{\|x\|_2}\\              &= \max_{x\ne0} \frac{\|Ay\|_2}{\|V^*y\|_2} \quad(y = Vx)\\              &= \max_{y\ne0} \frac{\|Ay\|_2}{\|y\|_2}  = \|A\|_2. \end{aligned}

For the Frobenius norm, using \|A\|_F^2 = \mathrm{trace}(A^*A),

\notag \begin{aligned}    \|UAV\|_F^2 &= \mathrm{trace}(V^*A^*U^* \cdot UAV)\\                &= \mathrm{trace}(V^*AAV)\\                &= \mathrm{trace}(A^*A) = \|A\|_F^2, \end{aligned}

since the trace is invariant under similarity transformations.

More insight into unitarily invariant norms comes from recognizing a connection with the singular value decomposition

\notag   A = P\Sigma Q^*, \quad P^*P = I_m, \quad Q^*Q = I_n, \quad                 \Sigma = \mathrm{diag}(\sigma_i), \quad     \sigma_1 \ge \cdots \ge \sigma_q \ge 0.

Clearly, \|A\| = \|\Sigma\|, so \|A\| depends only on the singular values. Indeed, for the 2-norm and the Frobenius norm we have \|A\|_2 = \sigma_1 and \|A\|_F = (\sum_{i=1}^q \sigma_i^2)^{1/2}. Here, and throughout this article, q = \min(m,n). Another implication of the singular value dependence is that \|A\| = \|A^*\| for all A for any unitarily invariant norm.

There is a beautiful characterization of unitarily invariant norms in terms of symmetric gauge functions, which are functions f:\mathbb{R}^q\to\mathbb{R}^q such that f is an absolute norm on \mathbb{R}^q and f(Px) = f(x) for any permutation matrix P and all x. An absolute norm is one with the property that \|x\| = \||x|\| for all x, and this condition is equivalent to the monotonicity condition that |x| \le |y| implies \|x\| \le \|y\| for all x and y.


A norm on \mathbb{C}^{m \times n} is unitarily invariant if and only if \|A\| = f(\sigma_1, \dots, \sigma_q) for all A for some symmetric gauge function f, where \sigma_1, \dots, \sigma_q are the singular values of A.

The matrix 2-norm and the Frobenius norm correspond to f being the vector \infty-norm and the 2-norm, respectively. More generally, we can take for f any vector p-norm, obtaining the class of Schatten p-norms:

\notag   \|A\|_p^S  = \biggl(\displaystyle\sum_{i=1}^q \sigma_i^p\biggr)^{1/p}, \quad 1\le    p\le\infty.

The Schatten 1-norm is the sum of the singular values, \|A\|_1^S = \sum_{i=1}^q \sigma_i, which is called the trace norm or nuclear norm. It can act as a proxy for the rank of a matrix. The trace norm can be expressed as \|A\|_1^S = \mathrm{trace}(H)            = \mathrm{trace}\bigl((A^*A)^{1/2}\bigr), where A = UH is a polar decomposition.

Another class of unitarily invariant norms is the Ky Fan k-norms

\notag   \|A\|_{(k)} = \displaystyle\sum_{i=1}^k \sigma_i, \quad 1\le k \le n.

We have \|A\|_{(n)} = \|A\|_1^S and \|A\|_{(1)} = \|A\|_2.

Among unitarily invariant norms, the 2-norm and the Frobenius norm are widely usd in numerical analysis and matrix analysis. The nuclear norm is used in problems involving matrix rank minimization, such as matrix completion problems.

The benefit of the concept of unitarily invariant norm is that one can prove certain results for this whole class of norms, obtaining results for the particular norms of interest as special cases. Here are three important examples.

  • For A\in\mathbb{C}^{n\times n}, the matrix (A+A^*)/2 is the nearest Hermitian matrix to A in any unitarily invariant norm.
  • For A\in\mathbb{C}^{m\times n} (m\ge n), the unitary polar factor is the nearest matrix with orthonormal columns to A in any unitarily invariant norm.
  • The best rank-k approximation to A\in\mathbb{C}^{m\times n} in any unitarily invariant norm is obtained by setting all the singular values beyond the kth to zero in the SVD of A. This result, which generalizes the Eckart-Young theorem, which covers the 2- and Frobenius norm instances, is an easy consequence of the following result of Mirsky (1960).


    Let A,B\in\mathbb{C}^{m\times n} have SVDs with diagonal matrices \Sigma_A, \Sigma_B \in\mathbb{R}^{m\times n}, where the diagonal elements are arranged in nonincreasing order. Then \|A-B\| \ge \|\Sigma_A-\Sigma_B\| for every unitarily invariant norm.

We also give a useful matrix norm inequality. For any matrices A, B, and C for which the product ABC is defined,

\notag     \|ABC\| \le \|A\|_2 \mskip2mu \|B\| \mskip2mu \|C\|_2

holds for any unitarily invariant norm, and in fact, any two of the norms on the right-hand side can be 2-norms.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is the Nearest Positive Semidefinite Matrix?

Given a symmetric matrix and a nonnegative number \delta, what is the nearest symmetric matrix whose eigenvalues are all at least \delta? In other words, how can we project a symmetric matrix onto the set of symmetric positive semidefinite matrices with eigenvalues at least \delta?

Distance can be measured in any norm, but the most common choice is the Frobenius norm, \|A\|_F = (\sum_{i,j = 1}^n |a_{ij}|^2)^{1/2}.

This problem occurs in a very wide range of applications. A typical scenario is that a covariance matrix is approximated in some way that does not guarantee positive semidefinitess, for example by treating blocks of the matrix independently. In machine learning, some methods use indefinite kernels and these can require an indefinite similarity matrix to be replaced by a semidefinite one. When \delta > 0, the problem arises when the matrix is positive definite but ill conditioned and a matrix of smaller condition number is required, or when rounding errors result in small negative eigenvalues and a “safely positive definite” matrix is wanted.

The following theorem gives the solution to the problem for the Frobenius norm.

Theorem (Cheng and Higham, 1998).

Let the symmetric matrix A\in\mathbb{R}^{n \times n} have the spectral decomposition A = Q \mskip1mu\mathrm{diag}(\lambda_i) \mskip1muQ^T and let \delta \ge 0. The unique matrix with smallest eigenvalue at least \delta nearest to A in the Frobenius norm is given by

\notag        X = Q \mskip1mu \mathrm{diag}(\tau_i) \mskip1mu Q^T, \quad  \tau_i = \begin{cases} \lambda_i, & \lambda_i \geq \delta\\                         \delta, & \lambda_i < \delta.           \end{cases}

The theorem says that there is a unique nearest matrix and that is has the same eigenvectors as A with any eigenvalues less than \delta increased to \delta. When \delta = 0, another way to express this result is in terms of the polar decomposition A = UH, where U is orthogonal and H is symmetric positive definite (with H = Q \mskip1mu\mathrm{diag}(|\lambda_i|)Q^T, since A is symmetric): X = (A + H)/2.

One can pose the same nearness problem for nonsymmetric A. For the Frobenius norm there is again a unique nearest symmetric matrix whose eigenvalues are all at least \delta, and it is given by applying the theorem to the symmetric part B of A (which is the nearest symmetric matrix to A), where

\notag   A = \frac{1}{2} (A + A^T ) + \frac{1}{2} (A - A^T )  \equiv B + C,    \quad(B = B^T, ~C = -C^T).

For the 2-norm the answer is known for nonsymmetric A only for \delta = 0, so we take \delta = 0 from this point on. We write X \ge 0 to denote that the symmetric matrix X is positive semidefinite and use a superscript 1/2 to denote the positive semidefinite square root.

Theorem (Halmos, 1972).

The 2-norm distance from A\in\mathbb{R}^{n \times n} to the nearest positive semidefinite matrix is

\notag   d_2(A) = \min\{\, r \ge 0: r^2I + C^2 \ge 0 ~\mathrm{and}~ G(r) \ge 0 \,\},


\notag    G(r) = B + \bigl(r^2I + C^2\bigr)^{1/2},

and a nearest matrix is

\notag   X = B + (d_2(A)^2I + C^2)^{1/2}.

When A is symmetric, so that C = 0, the nearest matrix and the distance reduce to

\notag   X = B + d_2(A)I, \quad d_2(A) = \max\{\, |\lambda_i| : \lambda_i < 0 \,\}.

Clearly, X is not, in general, unique because when A is symmetric, instead of shifting B by a multiple of I we could just perturb the negative eigenvalue to zero, as with the optimal Frobenius norm perturbation, without changing the 2-norm of the perturbation. A distinguishing feature of X in Halmos’s theorem is that X-P \ge 0 for any other nearest positive semidefinite matrix P (Bouldin, 1973, Theorem 4.2); thus X has the minimum number of zero eigenvalues over all nearest positive semidefinite matrices.

The Frobenius norm solution X_F is an approximate minimizer of the 2-norm distance since (Higham, 1988)

\notag   d_2(A) \le \| A - X_F \|_F \le 2 d_2(A).

Halmos’s theorem simplifies the computation of d_2(A) because it reduces the minimization problem to one dimension. However, the problem is nonlinear and has no closed form solution for general A, so iterative methods are needed to compute d_2(A). Two algorithms are developed in Higham (1988). Both are based on the following properties of the matrix G(r) in the theorem: \lambda_{\min}(G(r)) is monotone increasing on [\,\rho(C), \infty), and either \lambda_{\min}(G(\rho(C))) \ge 0, in which case d_2(A) = \rho(C), or \lambda_{\min}(G(r)) has a unique zero r^* = d_2(A) > \rho(C). The first algorithm uses the bisection method, determining the sign of \lambda_{\min}(G(r)) by attempting a Cholesky factorization of G(r): the sign is nonnegative if the factorization exists and negative otherwise. This approach is attractive if d_2(A) is required only to low accuracy. For higher accuracy computations it is better to apply a more rapidly converging zero finder to f(r) = \lambda_{\min} (G(r)) = 0. A hybrid Newton–bisection method is used in Higham (1988).


For a numerical example (in MATLAB), we take the Jordan block

\notag  A =  \begin{bmatrix}  0  & 1  & 0  & 0  & 0  \\  0  & 0  & 1  & 0  & 0  \\  0  & 0  & 0  & 1  & 0  \\  0  & 0  & 0  & 0  & 1  \\  0  & 0  & 0  & 0  & 0  \end{bmatrix}.

The symmetric part of A, B = (A+A^T)/2, has two negative eigenvalues and a zero eigemnvalue:

>> A = gallery('jordbloc',5,0); As = sym(A); eig_B = eig((As+As')/2)'
eig_B =
[-1/2, 0, 1/2, -3^(1/2)/2, 3^(1/2)/2]
>> double(eig_B)
ans =
  -5.0000e-01            0   5.0000e-01  -8.6603e-01   8.6603e-01

The nearest symmetric positive semidefinite matrix to A in the Frobenius norm can be computed in MATLAB as

B = (A + A')/2; [Q,D] = eig(B); d = diag(D); X_F = Q*diag(max(d,0))*Q';

We can improve this code by using the implicit expansion feature of MATLAB to avoid forming a diagonal matrix. Since the computed result is not exactly symmetric because of rounding errors, we also need to replace it by the nearest symmetric matrix:

[Q,d] = eig(B,'vector'); X_F = Q*(max(d,0).*Q'); X_F = (X_F + X_F')/2;

We obtain

X_F = 
 1.972e-01   2.500e-01   1.443e-01  -4.163e-17  -5.283e-02  
 2.500e-01   3.415e-01   2.500e-01   9.151e-02  -1.665e-16  
 1.443e-01   2.500e-01   2.887e-01   2.500e-01   1.443e-01  
-4.163e-17   9.151e-02   2.500e-01   3.415e-01   2.500e-01  
-5.283e-02  -1.665e-16   1.443e-01   2.500e-01   1.972e-01

which has eigenvalues

-3.517e-17   7.246e-17   9.519e-17   5.000e-01   8.660e-01  

Notice that the three eigenvalues that would be zero in exact arithmetic are of order the unit roundoff. A nearest matrix in the 2-norm, as given by Halmos’s theorem. is

X_2 = 
 8.336e-01   5.000e-01   1.711e-01   0.000e+00  -1.756e-02  
 5.000e-01   6.625e-01   5.000e-01   1.887e-01   0.000e+00  
 1.711e-01   5.000e-01   6.450e-01   5.000e-01   1.711e-01  
 0.000e+00   1.887e-01   5.000e-01   6.625e-01   5.000e-01  
-1.756e-02   0.000e+00   1.711e-01   5.000e-01   8.336e-01 

and its eigenvalues are

-7.608e-17   1.281e-01   5.436e-01   1.197e+00   1.769e+00  

The distances are

\notag  \begin{aligned}   1.732  &= \|A - X_F\|_F < \|A - X_2\|_F = 2.207,\\   0.9872 &= \|A - X_2\|_2 < \|A - X_F\|_2 = 1.0355.   \end{aligned}


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is the Nearest Symmetric Matrix?

What is the nearest symmetric matrix to a real, nonsymmetric square matrix A? This question arises whenever a symmetric matrix is approximated by formulas that do not respect the symmetry. For example, a finite difference approximation to a Hessian matrix can be nonsymmetric even though the Hessian is symmetric. In some cases, lack of symmetry is caused by rounding errors. The natural way to symmetrize A is to replace it by (A + A^T)/2. Is this the best choice?

As our criterion of optimality we take that \| A - X \| is minimized over symmetric X for some suitable norm. Fan and Hoffman (1955) showed that (A + A^T)/2 is a solution in any unitarily invariant norm. A norm is unitarily invariant if \|A\| = \|UAV\| for all unitary U and V. Such a norm depends only on the singular values of A, and hence \|A\| = \|A^T\| since A and A^T have the same singular values. The most important examples of unitarily invariant norms are the 2-norm and the Frobenius norm.

The proof that (A+A^T)/2 is optimal is simple. For any symmetric Y,

\notag  \begin{aligned}   \Bigl\| A - \frac{1}{2}(A + A^T) \Bigr\|         &=  \frac{1}{2} \|A - A^T \|          = \frac{1}{2} \| A - Y + Y^T - A^T \| \\         &\le \frac{1}{2} \| A - Y \| + \frac{1}{2} \|(Y - A)^T \| \\         &= \| A - Y\|.   \end{aligned}

Simple examples of a matrix and a nearest symmetric matrix are

\notag    A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \;\;    X = \begin{bmatrix} 0 & \frac{1}{2} \\ \frac{1}{2} & 0 \end{bmatrix},\qquad    A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \;\;    X = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.

Note that any A can be written

\notag   A = \frac{1}{2} (A + A^T ) + \frac{1}{2} (A - A^T )  \equiv B + C,

where B and C are the symmetric part and the skew-symmetric part of A, respectively, so the nearest symmetric matrix to A is the symmetric part of A.

For the Frobenius norm, (A + A^T)/2 is the unique nearest symmetric matrix, which follows from the fact that \|S + K\|_F^2 = \|S\|_F^2 + \|K\|_F^2 for symmetric S and skew-symmetric K. For the 2-norm, however, the nearest symmetric matrix is not unique in general. An example of non-uniqueness is

\notag    A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad    X = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & \frac{1}{2} & 0 \end{bmatrix}, \quad    Y = \begin{bmatrix} \theta & 0 & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & \frac{1}{2} & 0 \end{bmatrix},

for which \|A - X\|_2 = 0.5, and \|A - Y\|_2 = 0.5 for any \theta such that |\theta| \le 0.5.

Entirely analogous to the above is the nearest skew-symmetric matrix problem, for which the solution is the skew-symmetric part for any unitarily invariant norm. For complex matrices, these results generalize in the obvious way: (A + A^*)/2 is the nearest Hermitian matrix to A and (A - A^*)/2 is the nearest skew-Hermitian matrix to A in any unitarily invariant norm.


This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is the Softmax Function?

The softmax function takes as input a real n-vector x and returns the vector g with elements given by

\notag   \qquad\qquad   g_j(x) =  \displaystyle\frac{\mathrm{e}^{x_j}}{\sum_{i=1}^n \mathrm{e}^{x_i}}, \quad          j=1\colon n.   \qquad\qquad (*)

It arises in machine learning, game theory, and statistics. Since \mathrm{e}^{x_j} \ge 0 and \sum_{j=1}^n g_j = 1, the softmax function is often used to convert a vector x into a vector of probabilities, with the more positive entries giving the larger probabilities.

The softmax function is the gradient of the log-sum-exp function

\notag      \mathrm{lse}(x) = \log\displaystyle\sum_{i=1}^n \mathrm{e}^{x_i},

where \log is the natural logarithm, that is, g_j(x) = (\partial/\partial x_j) \mathrm{lse}(x).

The following plots show the two components of softmax for n = 2. Note that they are constant on lines x_1 - x_2 = \mathrm{constant}, as shown by the contours.


Here are some examples:

>> softmax([-1 0 1])
ans =
>> softmax([-1 0 10])
ans =

Note how softmax increases the relative weighting of the larger components over the smaller ones. The MATLAB function softmax used here is available at https://github.com/higham/logsumexp-softmax.

A concise alternative formula, which removes the denominator of (*) by rewriting it as the exponential of \mathrm{lse}(x) and moving it into the numerator, is

\notag   \qquad\qquad     g_j = \exp\bigl(x_j - \mathrm{lse}(x)\bigr).   \qquad\qquad (\#)

Straightforward evaluation of softmax from either (*) or (\#) is not recommended, because of the possibility of overflow. Overflow can be avoided in (*) by shifting the components of x, just as for the log-sum-exp function, to obtain

\notag   \qquad\qquad   g_j(x) =  \displaystyle\frac{\mathrm{e}^{x_j-\max(x)}}{\sum_{i=1}^n \mathrm{e}^{x_i-\max(x)}}, \quad             j=1\colon n.   \qquad\qquad (\dagger)

where \max(x) = \max_i x_i. It can be shown that computing softmax via this formula is numerically reliable. The shifted version of (\#) tends to be less accurate, so (\dagger) is preferred.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is the Log-Sum-Exp Function?

The log-sum-exp function takes as input a real n-vector x and returns the scalar

\notag      \mathrm{lse}(x) = \log \displaystyle\sum_{i=1}^n \mathrm{e}^{x_i},

where \log is the natural logarithm. It provides an approximation to the largest element of x, which is given by the \max function, \max(x) = \max_i x_i. Indeed,

\notag   \mathrm{e}^{\max(x)} \le \displaystyle\sum_{i=1}^n \mathrm{e}^{x_i}                        \le n \mskip1mu \mathrm{e}^{\max(x)},

and on taking logs we obtain

\notag   \qquad\qquad       \max(x) \le \mathrm{lse}(x) \le \max(x) + \log n.   \qquad\qquad (*)

The log-sum-exp function can be thought of as a smoothed version of the max function, because whereas the max function is not differentiable at points where the maximum is achieved in two different components, the log-sum-exp function is infinitely differentiable everywhere. The following plots of \mathrm{lse}(x) and \max(x) for n = 2 show this connection.


The log-sum-exp function appears in a variety of settings, including statistics, optimization, and machine learning.

For the special case where x = [0~t]^T, we obtain the function f(t) = \log(1+\mathrm{e}^t), which is known as the softplus function in machine learning. The softplus function approximates the ReLU (rectified linear unit) activation function \max(t,0) and satisfies, by (*),

\notag     \max(t,0) \le f(t) \le \max(t,0) + \log 2.

Two points are worth noting.

  • While \log(x_1 + x_2) \ne \log x_1 + \log x_2, in general, we do (trivially) have \log(x_1 + x_2) = \mathrm{lse}(\log x_1,\log x_2), and more generally \log(x_1 + x_2 + \cdots + x_n) = \mathrm{lse}(\log x_1,\log   x_2,\dots,\log x_n).
  • The log-sum-exp function is not to be confused with the exp-sum-log function: \exp \sum_{i=1}^n \log x_i = x_1x_2\dots x_n.

Here are some examples:

>> format long e
>> logsumexp([1 2 3])
ans =

>> logsumexp([1 2 30])
ans =

>> logsumexp([1 2 -3])
ans =

The MATLAB function logsumexp used here is available at https://github.com/higham/logsumexp-softmax.

Straightforward evaluation of log-sum-exp from its definition is not recommended, because of the possibility of overflow. Indeed, \exp(x) overflows for x = 12, x = 89, and x = 710 in IEEE half, single, and double precision arithmetic, respectively. Overflow can be avoided by writing

\notag \begin{aligned}   \mathrm{lse}(x) &= \log \sum_{i=1}^n \mathrm{e}^{x_i}          = \log \sum_{i=1}^n \mathrm{e}^a \mathrm{e}^{x_i - a}          = \log  \left(\mathrm{e}^a\sum_{i=1}^n \mathrm{e}^{x_i - a}\right), \end{aligned}

which gives

\notag     \mathrm{lse}(x) = a + \log\displaystyle\sum_{i=1}^n \mathrm{e}^{x_i - a}.

We take a = \max(x), so that all exponentiations are of nonpositive numbers and therefore overflow is avoided. Any underflows are harmless. A refinement is to write

\notag     \qquad\qquad     \mathrm{lse}(x) = \max(x) + \mathrm{log1p}\Biggl( \displaystyle\sum_{i=1 \atop i\ne k}^n                                 \mathrm{e}^{x_i - \max(x)}\Biggr),     \qquad\qquad (\#)

where x_k = \max(x) (if there is more than one such k, we can take any of them). Here, \mathrm{log1p}(x) = \log(1+x) is a function provided in MATLAB and various other languages that accurately evaluates \log(1+x) even when x is small, in which case 1+x would suffer a loss of precision if it was explicitly computed.

Whereas the original formula involves the logarithm of a sum of nonnegative quantities, when \max(x) < 0 the shifted formula (\#) computes \mathrm{lse}(x) as the sum of two terms of opposite sign, so could potentially suffer from numerical cancellation. It can be shown by rounding error analysis, however, that computing log-sum-exp via (\#) is numerically reliable.


This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.