What Is a Pseudo-Orthogonal Matrix?

A matrix Q\in\mathbb{R}^{n\times n} is pseudo-orthogonal if

\notag      Q^T \Sigma Q = \Sigma, \qquad (1)

where \Sigma = \mathrm{diag}(\pm 1) is a signature matrix. A matrix Q satisfying (1) is also known as a J-orthogonal matrix, where J is another notation for a signature matrix. Of course, if \Sigma = I then Q is orthogonal.

It is easy to show that Q^T is also pseudo-orthogonal. Furthermore, Q is clearly nonsingular and it satisfies

\notag      Q = \Sigma Q^{-T}\Sigma. \qquad (2)

Since \Sigma is orthogonal, this equation implies that \|Q\|_\ell = \|Q^{-T}\|_\ell = \|Q^{-1}\|_\ell and hence that

\notag   \kappa_\ell(Q) = \|Q\|_\ell^2, \quad \ell = 2,F. \qquad (3)

What are some examples of pseudo-orthogonal matrices? For n = 2 and \Sigma = \left[\begin{smallmatrix}1 & 0 \\ 0 & -1 \end{smallmatrix}\right], Q is of the form

\notag   Q =   \begin{bmatrix}    a & b \\ c & d    \end{bmatrix}, \quad   ab - cd = 0, \quad a^2 - c^2 = 1, \quad b^2 - d^2 = -1,

which includes the matrices

\notag     Q = \begin{bmatrix} \cosh \theta & -\sinh\theta \\                         -\sinh\theta & \cosh\theta \end{bmatrix},       \quad \theta\in\mathbb{R}. \qquad (4)
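As a quick numerical check, the following MATLAB sketch builds the matrix (4) for one value of \theta and evaluates the residual of (1) together with the relative difference between the two sides of (3) in the 2-norm. The value \theta = 0.7 is an arbitrary choice made for illustration. Both printed quantities should be of the order of the unit roundoff.

```matlab
theta = 0.7;                                   % arbitrary illustrative value
Sigma = diag([1 -1]);
Q = [ cosh(theta) -sinh(theta)
     -sinh(theta)  cosh(theta)];               % the hyperbolic rotation (4)
res_def  = norm(Q'*Sigma*Q - Sigma,1)          % residual of (1)
res_cond = abs(cond(Q) - norm(Q)^2)/cond(Q)    % relative residual of (3), 2-norm
```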

The Lorentz group, representing symmetries of the spacetime of special relativity, corresponds to 4\times 4 matrices with \Sigma = \mathrm{diag}(1,1,1,-1).

Equation (2) shows that Q is similar to the inverse of its transpose and hence (since every matrix is similar to its transpose) similar to its inverse. It follows that if \lambda is an eigenvalue of Q then \lambda^{-1} is also an eigenvalue and it has the same algebraic and geometric multiplicities as \lambda.

By permuting rows and columns in (1) we can arrange that

\notag        \Sigma = \Sigma_{p,q} := \begin{bmatrix} I_p & 0   \\                             0  & -I_q            \end{bmatrix}. \qquad (5)

We assume that \Sigma has this form throughout the rest of this article. This form of \Sigma allows us to conveniently characterize matrices that are both orthogonal and pseudo-orthogonal. Such a matrix must satisfy \Sigma Q = Q\Sigma, which means that Q = \mathrm{diag}(Q_{11},Q_{22}), and any such orthogonal matrix is pseudo-orthogonal.

Applications

Pseudo-orthogonal matrices arise in hyperbolic problems, that is, problems where there is an underlying indefinite scalar product or weight matrix. An example is the problem of downdating the Cholesky factorization, where in the simplest case we have the Cholesky factorization C = R^T\!R of a symmetric positive definite C\in\mathbb{R}^{n\times n} and want the Cholesky factorization of \widetilde{C} = C - zz^T, which is assumed to be symmetric positive definite. A more general downdating problem is that we are given

\notag   A = \begin{array}[b]{cc}        \left[\begin{array}{@{}c@{}}                  A_1\\                  A_2              \end{array}\right]        & \mskip-22mu\          \begin{array}{l}              \scriptstyle p \\              \scriptstyle q          \end{array}    \end{array},    \quad p\ge n,

and the Cholesky factorization A^T\!A = R^T\!R and wish to obtain the Cholesky factor S of A_1^TA_1  = R^T\!R - A_2^TA_2. Note that R and S are n\times n. This task arises when we solve a regression problem after the observations corresponding to A_2 have been removed. The simple case above corresponds to removing one row (q = 1). Assuming that q \ll p, we would like to obtain S cheaply from R, and numerical stability considerations dictate that we should avoid explicitly forming A_1^TA_1. If we can find a pseudo-orthogonal matrix Q such that

\notag       Q \begin{bmatrix} R \\ A_2 \end{bmatrix}         =         \begin{bmatrix} S \\ 0 \end{bmatrix}, \qquad (6)

with \Sigma given by (5) and S\in\mathbb{R}^{n\times n} upper triangular, then

\notag     A_1^TA_1       = \begin{bmatrix} R   \\ A_2 \end{bmatrix}^T \Sigma         \begin{bmatrix} R   \\ A_2 \end{bmatrix}       = \begin{bmatrix} R   \\ A_2 \end{bmatrix}^T Q^T \Sigma Q         \begin{bmatrix} R   \\ A_2 \end{bmatrix}       = \begin{bmatrix} S   \\ 0   \end{bmatrix}^T \Sigma         \begin{bmatrix} S   \\ 0   \end{bmatrix}       = S^T\!S,

so S is the desired Cholesky factor.

The factorization (6) is called a hyperbolic QR factorization and it can be computed by using hyperbolic rotations to zero out the elements of A_2. A 2\times2 hyperbolic rotation has the form (4), and an n\times n hyperbolic rotation is an identity matrix with a 2\times 2 hyperbolic rotation embedded in it at the intersection of rows and columns i and j, for some i and j.
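The simplest case q = 1 can be sketched in MATLAB as follows. This is a minimal illustration under stated assumptions, not a recommended implementation: the test data B and z and the work vector w are names introduced here, the rotations are applied in the most direct way, and no claim is made about the numerical stability of this particular arrangement. At step k a rotation of the form (4) in the plane of rows k and n+1 zeros the kth element of the appended row.

```matlab
rng(0); n = 4;
B = randn(8,n); R = chol(B'*B);     % R'*R = B'*B
z = B(end,:)';                      % observation to remove; B'*B - z*z' is SPD
S = R; w = z';                      % w holds the row A_2 being annihilated
for k = 1:n
    t = w(k)/S(k,k);                % tanh(theta); abs(t) < 1 is needed
    c = 1/sqrt(1-t^2); s = t*c;     % cosh(theta) and sinh(theta)
    rk = S(k,k:n); wk = w(k:n);
    S(k,k:n) = c*rk - s*wk;         % first row of the rotated pair
    w(k:n)   = -s*rk + c*wk;        % second row; w(k) becomes zero
end
res = norm(S'*S - (R'*R - z*z'),1)/norm(R'*R,1)
```

The residual res measures how well S'*S reproduces the downdated matrix without that matrix being used to compute S.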

In general, a hyperbolic QR factorization of A\in\mathbb{R}^{m\times n} with m = p+q and p\ge n has the form QA = \left[\begin{smallmatrix} R \\ 0 \end{smallmatrix}\right] with Q pseudo-orthogonal with respect to \Sigma = \Sigma_{p,q} and R \in\mathbb{R}^{n\times n} upper triangular. The factorization exists if A^T\Sigma A is positive definite.

Another hyperbolic problem is the indefinite least squares problem

\notag        \min_x \,(b-Ax)^T \Sigma (b-Ax), \qquad (7)

where A\in\mathbb{R}^{m\times n}, m\ge n, and b\in\mathbb{R}^m are given, and \Sigma = \Sigma_{p,q} with m = p + q. For p=0 or q=0 we have the standard least squares (LS) problem and the quadratic form is definite, while for pq>0 the problem is to minimize a genuinely indefinite quadratic form. This problem arises, for example, in the area of optimization known as H^{\infty} smoothing.

The normal equations for (7) are A^T\Sigma Ax = A^T\Sigma b, and since the Hessian matrix of the quadratic objective function in (7) is A^T\Sigma A it follows that the indefinite least squares problem has a unique solution if and only if A^T\Sigma A is positive definite. To solve the problem we can use a hyperbolic QR factorization QA = \left[\begin{smallmatrix} R \\ 0 \end{smallmatrix}\right] to write

\notag \begin{aligned}     A^T\Sigma A &= A^T Q^T \Sigma Q A     = \begin{bmatrix} R \\ 0 \end{bmatrix}^T           \begin{bmatrix} I_p & 0   \\                           0  & -I_q                           \end{bmatrix}      \begin{bmatrix} R \\ 0 \end{bmatrix}      = R^T\!R, \\   A^T\Sigma b &= A^T Q^T\Sigma Q b          = \begin{bmatrix} R \\ 0 \end{bmatrix}^T \! \Sigma Q b. \end{aligned}

Solving the problem now reduces to solving the triangular system Rx = d, where d comprises the first n components of Qb. The same equation can also be obtained without using the normal equations by substituting the hyperbolic QR factorization into (7).
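For a small illustration of the problem (7), the MATLAB sketch below solves the normal equations directly rather than using a hyperbolic QR factorization; this substitution is made only to keep the example short and is not suggested as the numerical method of choice. The matrix A is constructed (an assumption of this example) so that A^T\Sigma A is guaranteed to be positive definite, and the objective is then checked to increase when x is perturbed.

```matlab
rng(1); n = 3; p = 5; q = 2;
A = [eye(n); randn(p-n,n); 0.1*ones(q,n)];   % makes A'*Sigma*A positive definite
b = randn(p+q,1);
Sigma = blkdiag(eye(p),-eye(q));
H = A'*Sigma*A;                     % Hessian of the objective (up to a factor 2)
[~,notpd] = chol(H);                % notpd = 0 means H is positive definite
x = H\(A'*Sigma*b);                 % solution of the normal equations
f = @(x) (b-A*x)'*Sigma*(b-A*x);    % objective function in (7)
d = randn(n,1);                     % arbitrary perturbation
increase = f(x+d) - f(x)            % equals d'*H*d, which is positive
```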

The Exchange Operator

A simple technique exists for converting pseudo-orthogonal matrices into orthogonal matrices and vice versa. Let A\in\mathbb{R}^{n\times n} with n = p + q, partition

\notag   A  = \mskip5mu    \begin{array}[b]{@{\mskip-20mu}c@{\mskip0mu}c@{\mskip-1mu}c@{}}    & \mskip10mu\scriptstyle p & \scriptstyle q \\       \mskip15mu          \begin{array}{r}              \scriptstyle p \\              \scriptstyle q          \end{array}~    &       \multicolumn{2}{c}{\mskip-15mu          \left[\begin{array}{c@{~}c@{~}}                  A_{11} & A_{12}\\                  A_{21} & A_{22}                \end{array}\right]       }    \end{array}, \qquad (8)

and assume A_{11} is nonsingular. The exchange operator is defined by

\notag    \mathrm{exc}(A) =       \begin{bmatrix}            A_{11}^{-1} & -A_{11}^{-1}A_{12} \\            A_{21}A_{11}^{-1} & A_{22} -A_{21}A_{11}^{-1}A_{12}      \end{bmatrix}.

It is easy to see that the exchange operator is involutory, that is,

\notag   \mathrm{exc}(\mathrm{exc}(A)) = A,

and moreover (recalling that \Sigma is given by (5)) that

\notag     \mathrm{exc}(\Sigma A\Sigma) = \Sigma \mathrm{exc}(A)\Sigma = \mathrm{exc}(A^T)^T.     \qquad(9)

The next result gives a formula for the inverse of \mathrm{exc}(A).

Lemma 1. Let A\in\mathbb{R}^{n\times n} with A_{11} nonsingular. If A is nonsingular and \mathrm{exc}(A^{-1}) exists then \mathrm{exc}(A) is nonsingular and \mathrm{exc}(A)^{-1} = \mathrm{exc}(A^{-1}).

Proof. Consider the equation

\notag    y =    \begin{bmatrix}    y_1 \\ y_2    \end{bmatrix}      =      \begin{bmatrix}            A_{11} & A_{12} \\            A_{21} & A_{22}      \end{bmatrix}    \begin{bmatrix}    x_1 \\ x_2    \end{bmatrix}  =  Ax.

By solving the first equation for x_1 and then eliminating x_1 from the second equation we obtain

\notag   \begin{bmatrix}    x_1 \\ y_2   \end{bmatrix}   =   \mathrm{exc}(A)   \begin{bmatrix}    y_1 \\ x_2   \end{bmatrix}. \qquad (10)

By the same argument applied to x = A^{-1}y, we have

\notag   \begin{bmatrix}    y_1 \\ x_2   \end{bmatrix}   =   \mathrm{exc}(A^{-1})   \begin{bmatrix}    x_1 \\ y_2   \end{bmatrix}.

Hence for any x_1 and y_2 there is a unique y_1 and x_2, which implies by (10) that \mathrm{exc}(A) is nonsingular and that \mathrm{exc}(A)^{-1} = \mathrm{exc}(A^{-1}). ~\square

Now we will show that the exchange operator maps pseudo-orthogonal matrices to orthogonal matrices and vice versa.

Theorem 2. Let A\in\mathbb{R}^{n\times n}. If A is pseudo-orthogonal then \mathrm{exc}(A) is orthogonal. If A is orthogonal and A_{11} is nonsingular then \mathrm{exc}(A) is pseudo-orthogonal.

Proof. If A is pseudo-orthogonal then A_{11}^TA_{11}  = I + A_{21}^TA_{21}, which implies that A_{11} is nonsingular. Since \Sigma A^T\Sigma = A^{-1}, it follows that A^{-1} also has a nonsingular (1,1) block and so \mathrm{exc}(A^{-1}) exists. Furthermore, using Lemma 1, \mathrm{exc}(\Sigma A^T\Sigma) = \mathrm{exc}(A^{-1}) = \mathrm{exc}(A)^{-1}. But (9) shows that \mathrm{exc}(\Sigma A^T\Sigma) = \mathrm{exc}(A)^T, and we conclude that \mathrm{exc}(A) is orthogonal.

Assume now that A is orthogonal with A_{11} nonsingular. Then \mathrm{exc}(A^T) = \mathrm{exc}(A^{-1}) exists and Lemma 1 shows that \mathrm{exc}(A) is nonsingular and \mathrm{exc}(A)^{-1} = \mathrm{exc}(A^{-1}) = \mathrm{exc}(A^T). Hence, using (9),

I = \mathrm{exc}(A^T) \mathrm{exc}(A) =         \Sigma\mathrm{exc}(A)^T\Sigma \cdot \mathrm{exc}(A),

which shows that \mathrm{exc}(A) is pseudo-orthogonal. ~\square

This MATLAB example uses the exchange operator to convert an orthogonal matrix obtained from a Hadamard matrix into a pseudo-orthogonal matrix.

>> p = 2; n = 4;
>> A = hadamard(n)/sqrt(n), Sigma = blkdiag(eye(p),-eye(n-p))
A =
   5.0000e-01   5.0000e-01   5.0000e-01   5.0000e-01
   5.0000e-01  -5.0000e-01   5.0000e-01  -5.0000e-01
   5.0000e-01   5.0000e-01  -5.0000e-01  -5.0000e-01
   5.0000e-01  -5.0000e-01  -5.0000e-01   5.0000e-01
Sigma =
     1     0     0     0
     0     1     0     0
     0     0    -1     0
     0     0     0    -1
>> Q = exc(A,p), Q'*Sigma*Q
Q =
     1     1    -1     0
     1    -1     0    -1
     1     0    -1    -1
     0     1    -1     1
ans =
     1     0     0     0
     0     1     0     0
     0     0    -1     0
     0     0     0    -1

The code uses the function

function X = exc(A,p)
%EXC     Exchange operator.
%   EXC(A,p) is the result of applying the exchange operator to 
%   the square matrix A, which is regarded as a block 2-by-2 
%   matrix with leading block of dimension p.  
%   p defaults to floor(n/2).

[m,n] = size(A);
if m ~= n, error('Matrix must be square.'), end
if nargin < 2, p = floor(n/2); end

A11 = A(1:p,1:p);
A12 = A(1:p,p+1:n);
A21 = A(p+1:n,1:p);
A22 = A(p+1:n,p+1:n);

X21 = A11\A12;
X = [inv(A11) -X21;
     A21/A11  A22-A21*X21];
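
The other properties of the exchange operator can be checked numerically in the same way. The sketch below is self-contained, so it uses a compact anonymous-function stand-in for exc (the name excf is introduced here and is not part of the original code). It tests the involutory property, equation (9), Lemma 1 on a random matrix, and the first part of Theorem 2 on a random pseudo-orthogonal matrix from gallery('randjorth') with condition number 10. All four printed quantities should be small multiples of the unit roundoff, assuming the random blocks are not badly conditioned.

```matlab
% Compact stand-in for the function exc (the name excf is ours).
excf = @(A,p) [inv(A(1:p,1:p)), -(A(1:p,1:p)\A(1:p,p+1:end)); ...
               A(p+1:end,1:p)/A(1:p,1:p), ...
               A(p+1:end,p+1:end) - A(p+1:end,1:p)*(A(1:p,1:p)\A(1:p,p+1:end))];
rng(3); n = 5; p = 2;
A = randn(n); Sigma = blkdiag(eye(p),-eye(n-p));
X = excf(A,p);
err_invol = norm(excf(X,p) - A,1)                          % exc(exc(A)) = A
err_eq9   = norm(excf(Sigma*A*Sigma,p) - Sigma*X*Sigma,1) ...
          + norm(Sigma*X*Sigma - excf(A',p)',1)            % equation (9)
err_lemma = norm(inv(X) - excf(inv(A),p),1)                % Lemma 1
Q = gallery('randjorth',p,n-p,10);                         % pseudo-orthogonal
Z = excf(Q,p);
err_orth  = norm(Z'*Z - eye(n),1)                          % Theorem 2, first part
```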

Hyperbolic CS Decomposition

For an orthogonal matrix expressed in block 2\times 2 form there is a close relationship between the singular value decompositions (SVDs) of the blocks, as revealed by the CS decomposition (see What Is the CS Decomposition?). An analogous decomposition holds for a pseudo-orthogonal matrix. Let Q\in\mathbb{R}^{n \times n} be pseudo-orthogonal with respect to \Sigma in (5), and suppose that Q is partitioned as

\notag    Q =    \begin{array}[b]{@{\mskip33mu}c@{\mskip-16mu}c@{\mskip-10mu}c@{}}    \scriptstyle p &    \scriptstyle n-p &    \\    \multicolumn{2}{c}{        \left[\begin{array}{c@{~}c@{~}}                  Q_{11}& Q_{12} \\                  Q_{21}& Q_{22} \\              \end{array}\right]}    & \mskip-12mu\          \begin{array}{c}              \scriptstyle p \\              \scriptstyle n-p              \end{array}    \end{array}, \quad p \le \displaystyle\frac{n}{2}.

Then there exist orthogonal matrices U_1,V_1\in\mathbb{R}^{p \times p} and U_2,V_2\in\mathbb{R}^{q \times q} such that

\notag    \begin{bmatrix}  U_1^T & 0\\                          0   & U_2^T    \end{bmatrix}    \begin{bmatrix}  Q_{11} & Q_{12}\\                          Q_{21} & Q_{22}    \end{bmatrix}    \begin{bmatrix}  V_1 & 0\\                          0   & V_2    \end{bmatrix}    =    \begin{array}[b]{@{\mskip35mu}c@{\mskip30mu}c@{\mskip-10mu}c@{}c}    \scriptstyle p &    \scriptstyle p &    \scriptstyle n-2p &    \\    \multicolumn{3}{c}{    \left[\begin{array}{c@{~}|c@{~}c}    C &   -S      & 0   \\    \hline   -S &    C      & 0   \\    0 &    0      & I_{n-2p}    \end{array}\right]}    & \mskip-12mu    \begin{array}{c}    \scriptstyle p \\    \scriptstyle p \\    \scriptstyle n-2p    \end{array}    \end{array}, \qquad (11)

where C = \mathrm{diag}(c_i), S = \mathrm{diag}(s_i), and C^2 - S^2  = I, with c_i > s_i \ge 0 for all i. This is the hyperbolic CS decomposition, and it can be proved by applying the CS decomposition of an orthogonal matrix to \mathrm{exc}(Q).

The leading principal submatrix \left[\begin{smallmatrix}C & -S \\ -S & C \end{smallmatrix}\right] in (11) generalizes the 2\times 2 matrix (4), and in fact it can be permuted into a direct sum of such matrices.

Note that the matrix on the right in (11) is symmetric positive definite. Therefore the singular values of Q are the eigenvalues of that matrix, namely

\notag    c_1 \pm s_1, \dots,  c_p \pm s_p; \quad    1~\mathrm{with~multiplicity~}n - 2p.

Since c_i^2 - s_i^2 = 1 for all i, the first 2p singular values occur in reciprocal pairs, hence the largest and smallest singular values satisfy \sigma_1 = \sigma_n^{-1}\ge 1 (with strict inequality unless S = 0). This gives another proof of (3).
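
The reciprocal pairing is easy to observe numerically. In this sketch the dimensions p = 2, q = 3 and the condition number 100 passed to gallery('randjorth') are arbitrary choices. Because svd returns the singular values in decreasing order, the pairing means that the ith largest times the ith smallest singular value should equal 1, with the middle singular value equal to 1 since n - 2p = 1.

```matlab
p = 2; q = 3; n = p + q;
rng(2); Q = gallery('randjorth',p,q,100);   % pseudo-orthogonal, cond(Q) about 100
s = svd(Q);                                 % sorted in decreasing order
pair_err = norm(s.*flipud(s) - 1,inf)       % sigma_i * sigma_{n+1-i} = 1
cond_err = abs(cond(Q) - s(1)^2)/cond(Q)    % relation (3) in the 2-norm
```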

Numerical Stability

While an orthogonal matrix is perfectly conditioned, a pseudo-orthogonal matrix can be arbitrarily ill conditioned, as follows from (3). For example, the MATLAB function gallery('randjorth') produces a random pseudo-orthogonal matrix with a default condition number of sqrt(1/eps).

>> rng(1); A = gallery('randjorth',2,2) % p = 2, n = 4
A =
   2.9984e+03  -4.2059e+02   1.5672e+03  -2.5907e+03
   1.9341e+03  -2.6055e+03   3.1565e+03  -7.5210e+02
   3.1441e+03  -6.2852e+02   1.8157e+03  -2.6427e+03
   1.6870e+03  -2.5633e+03   3.0204e+03  -5.4157e+02
>> cond(A)
ans =
   6.7109e+07

This means that algorithms that use pseudo-orthogonal matrices are potentially numerically unstable. Therefore algorithms need to be carefully constructed and rounding error analysis must be done to ensure that an appropriate form of numerical stability is obtained.

Notes

Pseudo-orthogonal matrices form the automorphism group of the scalar product defined by \langle x,y\rangle = x^T\Sigma y for x,y\in\mathbb{R}^n. More results for pseudo-orthogonal matrices can be obtained as special cases of results for automorphism groups of general scalar products. See, for example, Mackey, Mackey, and Tisseur (2006).

For \Sigma \ne \pm I the set of pseudo-orthogonal matrices is known to have four connected components, a topological property that can be proved using the hyperbolic CS decomposition (Motlaghian, Armandnejad, and Hall, 2018).

One can define pseudo-unitary matrices in an analogous way, as Q\in\mathbb{C}^{n\times n} such that Q^*\Sigma Q = \Sigma. These correspond to the automorphism group of the scalar product \langle x,y\rangle = x^*\Sigma y for x,y\in\mathbb{C}^n. The results we have discussed generalize in a straightforward way to pseudo-unitary matrices.

The exchange operator is also known as the principal pivot transform and as the sweep operator in statistics. Tsatsomeros (2000) gives a survey of its properties.

The hyperbolic CS decomposition was derived by Lee (1948) and, according to Lee, was present in work of Autonne (1912).

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is an LU Factorization?

An LU factorization of an n\times n matrix A is a factorization A = LU, where L is unit lower triangular and U is upper triangular. “Unit” means that L has ones on the diagonal. Example:

\notag    \left[\begin{array}{rrrr}      3 & -1 & 1 & 1\\     -1 & 3 & 1 & -1\\     -1 & -1 & 3 & 1\\      1 & 1 & 1 & 3    \end{array}\right]   =    \left[\begin{array}{rrrr}     1 & 0 & 0 & 0\\    -\frac{1}{3} & 1 & 0 & 0\\    -\frac{1}{3} & -\frac{1}{2} & 1 & 0\\     \frac{1}{3} & \frac{1}{2} & 0 & 1    \end{array}\right]    \left[\begin{array}{rrrr}    3 & -1 & 1 & 1\\    0 & \frac{8}{3} & \frac{4}{3} & -\frac{2}{3}\\    0 & 0 & 4 & 1\\    0 & 0 & 0 & 3    \end{array}\right]. \qquad (1)

An LU factorization simplifies the solution of many problems associated with linear systems. In particular, solving a linear system Ax = b reduces to solving the triangular systems Ly = b and Ux = y, since then b = L(Ux).
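
As a small illustration, the MATLAB lines below use the factors from (1) to solve a linear system by two triangular solves. The right-hand side is chosen (an assumption of this example) so that the exact solution is the vector of ones.

```matlab
A = [3 -1 1 1; -1 3 1 -1; -1 -1 3 1; 1 1 1 3];
L = [1 0 0 0; -1/3 1 0 0; -1/3 -1/2 1 0; 1/3 1/2 0 1];
U = [3 -1 1 1; 0 8/3 4/3 -2/3; 0 0 4 1; 0 0 0 3];
fact_err = norm(A - L*U,1)      % confirms the factorization (1)
b = A*ones(4,1);                % exact solution is the vector of ones
y = L\b;                        % forward substitution: L*y = b
x = U\y                         % back substitution:    U*x = y
```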

For a given A, an LU factorization may or may not exist, and if it does exist it may not be unique. Conditions for existence and uniqueness are given in the following result (see Higham, 2002, Thm. 9.1 for a proof). Denote by A_k = A(1\colon k,1\colon k) the leading principal submatrix of A of order k.

Theorem 1. The matrix A\in\mathbb{R}^{n\times n} has a unique LU factorization if and only if A_k is nonsingular for k=1\colon n-1. If A_k is singular for some 1\le k \le n-1 then the factorization may exist, but if so it is not unique.

Note that the (non)singularity of A plays no role in Theorem 1. However, if A is nonsingular and has an LU factorization then the factorization is unique. Indeed if A has LU factorizations A = L_1U_1 = L_2U_2 then the U_i are necessarily nonsingular and so L_2^{-1}L_1 = U_2U_1^{-1}. The left side of this equation is unit lower triangular and the right side is upper triangular; therefore both sides must equal the identity matrix, which means that L_1 = L_2 and U_1 = U_2, as required.

Equating leading principal submatrices in A = LU gives A_k = L_k U_k, which implies that \det(A_k) = \det(U_k) = u_{11} u_{22} \dots u_{kk}. Hence u_{kk} = \det(A_k)/\det(A_{k-1}). In fact, such determinantal formulas hold for all the elements of L and U:

\notag    \begin{aligned}    \ell_{ij} &= \frac{ \det\bigl( A( [1:j-1, \, i], 1:j ) \bigr) }{ \det( A_j ) },             \quad i > j, \\    u_{ij} &= \frac{ \det\bigl( A( 1:i, [1:i-1, \, j] ) \bigr) }                   { \det( A_{i-1} ) },             \quad i \le j.    \end{aligned}

Here, A(u,v), where u and v are vectors of subscripts, denotes the submatrix formed from the intersection of the rows indexed by u and the columns indexed by v.
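
These formulas can be evaluated directly for the matrix in (1), as in the following sketch. It is intended only as a check of the formulas, since computing determinants is far more expensive than elimination. It relies on MATLAB returning 1 for the determinant of an empty matrix, which handles the case i = 1.

```matlab
A = [3 -1 1 1; -1 3 1 -1; -1 -1 3 1; 1 1 1 3];   % matrix from (1)
n = size(A,1); L = eye(n); U = zeros(n);
for i = 1:n
    for j = 1:n
        if i > j
            L(i,j) = det(A([1:j-1, i], 1:j))/det(A(1:j,1:j));
        else
            U(i,j) = det(A(1:i, [1:i-1, j]))/det(A(1:i-1,1:i-1));
        end
    end
end
res = norm(A - L*U,1)           % L and U should reproduce the factors in (1)
```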

Relation with Gaussian Elimination

LU factorization is intimately connected with Gaussian elimination. Recall that Gaussian elimination transforms a matrix A^{(1)} = A\in\mathbb{R}^{n\times n} to upper triangular form U = A^{(n)} in n-1 stages. At the kth stage, multiples of row k are added to the later rows to eliminate the elements below the diagonal in column k, using the formulas

\notag     a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik} a_{kj}^{(k)}, \quad                        i = k+1 \colon n, \; j = k+1 \colon n,

where the quantities m_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)} are the multipliers. Of course each a_{kk}^{(k)} must be nonzero for these formulas to be defined, and this is connected with the conditions of Theorem 1, since u_{kk} = a_{kk}^{(k)}. The final U is the upper triangular LU factor, with u_{ij} = a_{ij}^{(i)} for j\ge i, and \ell_{ij} = m_{ij} for i > j, that is, the multipliers make up the L factor (for a proof of these properties see any textbook on numerical linear algebra).

The matrix factorization viewpoint is well established as a powerful paradigm for thinking and computing. Separating the computation of LU factorization from its application is beneficial. For example, given A = LU we saw above how to solve Ax = b. If we need to solve for another right-hand side b_2 we can just solve Ly_2 = b_2 and Ux_2 = y_2, re-using the LU factorization. Similarly, solving A^Tz = c reduces to solving the triangular systems U^T w = c and L^Tz = w.

Computation

An LU factorization can be computed by directly solving for the components of L and U in the equation A = LU. Indeed because L has unit diagonal the first row of U is the same as the first row of A, and a_{k1} = \ell_{k1} u_{11} = \ell_{k1} a_{11} then determines the first column of L. One can go on to determine the kth row of U and the kth column of L, for k = 2\colon n. This leads to the Doolittle method, which involves inner products of partial rows of L and partial columns of U.

Given the equivalence between LU factorization and Gaussian elimination we can also employ the Gaussian elimination equations:

\notag \begin{array}{l} \%~kji~\mathrm{form~of~LU~factorization.}\\ \mbox{for}~k=1:n-1  \\ \qquad \mbox{for}~  j=k+1:n \\ \qquad \qquad \mbox{for}~  i=k+1:n \\ \qquad\qquad\qquad a_{ij}^{(k+1)} = a_{ij}^{(k)} - a_{ik}^{(k)}a_{kj}^{(k)} / a_{kk}^{(k)}\\ \qquad\qquad\mbox{end}\\ \qquad\mbox{end}\\ \mbox{end}\\ \end{array}

This kji ordering of the loops in the factorization is the basis of early Fortran implementations of LU factorization, such as that in LINPACK. The inner loop travels down the columns of A, accessing contiguous elements of A since Fortran stores arrays by column. Interchanging the two inner loops gives the kij ordering, which updates the matrix a row at a time, and is appropriate for a language such as C that stores arrays by row.
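
A direct MATLAB transcription of the kji loops is sketched below for the matrix in (1). The only changes from the pseudocode, made here for convenience, are that the multipliers are computed once per stage and stored in L and that the eliminated entries are set to zero explicitly. The last lines re-use the factors to solve a system with A^T, with a right-hand side chosen so that the exact solution is known.

```matlab
A = [3 -1 1 1; -1 3 1 -1; -1 -1 3 1; 1 1 1 3];   % matrix from (1)
n = size(A,1); L = eye(n); U = A;
for k = 1:n-1
    L(k+1:n,k) = U(k+1:n,k)/U(k,k);      % multipliers m_ik
    U(k+1:n,k) = 0;                      % entries eliminated at stage k
    for j = k+1:n
        for i = k+1:n
            U(i,j) = U(i,j) - L(i,k)*U(k,j);
        end
    end
end
res = norm(A - L*U,1)
c = A'*(1:n)';                           % exact solution of A'*z = c is (1:n)'
w = U'\c; z = L'\w                       % two triangular solves
```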

The ijk and jik orderings correspond to the Doolittle method. The last two of the 3! = 6 orderings are the ikj and jki orderings, to which we will return later.

Schur Complements

For A\in\mathbb{R}^{n\times n} with \alpha = a_{11} \ne 0 we can write

\notag   A =  \begin{bmatrix}         \alpha & b^T \\           c & D        \end{bmatrix}    =        \begin{bmatrix}         1 & 0 \\           c/\alpha & I_{n-1}        \end{bmatrix}       \begin{bmatrix}         \alpha  & b^T \\          0 & D - cb^T/\alpha        \end{bmatrix} = : L_1U_1. \qquad (2)

The (n-1)\times (n-1) matrix S = D - cb^T/\alpha is called the Schur complement of \alpha in A.

The first row and column of L_1 and U_1 have the correct forms for a unit lower triangular matrix and an upper triangular matrix, respectively. If we can find an LU factorization S = L_2U_2 then

\notag      A =        \begin{bmatrix}         1 & 0 \\           c/\alpha & L_2        \end{bmatrix}       \begin{bmatrix}         \alpha  & b^T \\          0 & U_2        \end{bmatrix}

is an LU factorization of A. Note that this is simply another way to express the kji algorithm above.

For several matrix structures it is immediate that \alpha \ne 0. If we can show that the Schur complement inherits the same structure then it follows by induction that we can compute the factorization for S, and so an LU factorization of A exists. Classes of matrix for which a_{11} \ne 0 and A being in the class implies the Schur complement S is also in the class include

  • symmetric positive definite matrices,
  • M-matrices,
  • matrices (block) diagonally dominant by rows or columns.

(The proofs of these properties are nontrivial.) Note that the matrix (1) is row diagonally dominant, as is its U factor, as must be the case since its rows are contained in Schur complements.

We say that A has upper bandwidth q if a_{ij} = 0 for j>i+q and lower bandwidth p if a_{ij} = 0 for i>j+p. Another use of (2) is to show that L and U inherit the bandwidths of A.

Theorem 2. Let A\in\mathbb{R}^{n\times n} have lower bandwidth p and upper bandwidth q. If A has an LU factorization then L has lower bandwidth p and U has upper bandwidth q.

Proof. In (2), the first column of L_1 and the first row of U_1 have the required structure and S has upper bandwidth q and lower bandwidth p, since c and b have only p and q nonzero components, respectively. The result follows by induction.
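
The recursion based on (2) and the conclusion of Theorem 2 can be illustrated with the following sketch, which applies (2) repeatedly to successive Schur complements. The test matrix, with lower bandwidth 1 and upper bandwidth 2, is an arbitrary choice; it is strictly diagonally dominant by rows, so an LU factorization exists. The bandwidths are measured with the MATLAB function bandwidth.

```matlab
n = 7;
A = 4*eye(n) - diag(ones(n-1,1),-1) + diag(ones(n-1,1),1) + diag(ones(n-2,1),2);
L = eye(n); S = A;
for k = 1:n-1
    L(k+1:n,k) = S(k+1:n,k)/S(k,k);                            % c/alpha
    S(k+1:n,k+1:n) = S(k+1:n,k+1:n) - L(k+1:n,k)*S(k,k+1:n);   % D - c*b'/alpha
end
U = triu(S);                     % rows of the successive Schur complements
[pA,qA] = bandwidth(A)           % lower and upper bandwidths of A
pL = bandwidth(L,'lower')        % lower bandwidth of L
qU = bandwidth(U,'upper')        % upper bandwidth of U
res = norm(A - L*U,1)
```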

Block Implementations

In order to achieve high performance on modern computers with their hierarchical memories, LU factorization is implemented in a block form expressed in terms of matrix multiplication and the solution of multiple right-hand side triangular systems. We describe two block forms of LU factorization. First, consider a block form of (2) with block size p, where A_{11} is p \times p:

\notag   A =  \begin{bmatrix}          A_{11} & A_{12}\\          A_{21} & A_{22}        \end{bmatrix}    =        \begin{bmatrix}         L_{11} & 0 \\           L_{21}& I_{n-p}        \end{bmatrix}       \begin{bmatrix}         U_{11} & U_{12} \\          0 & S        \end{bmatrix}.

Here, S is the Schur complement of A_{11} in A, given by S = A_{22} - A_{21}A_{11}^{-1}A_{12}. This leads to the following algorithm:

  1. Factor A_{11} = L_{11}U_{11}.
  2. Solve L_{11}U_{12} = A_{12} for U_{12}.
  3. Solve L_{21}U_{11} = A_{21} for L_{21}.
  4. Form S = A_{22}-L_{21}U_{12}.
  5. Repeat steps 1–4 on S to obtain S = L_{22}U_{22}.

The factorization on step 1 can be done by any form of LU factorization. This algorithm is known as a right-looking algorithm, since it accesses data to the right of the block being worked on (in particular, at each stage steps 2 and 4 access the last few columns of the matrix).
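
A minimal MATLAB sketch of the right-looking algorithm follows. It is meant to show the structure of steps 1 to 5 rather than to be efficient. The block size is called r here (it plays the role of p in the text), the diagonal blocks are factored by unblocked elimination, and the test matrix is made strictly diagonally dominant by rows so that no pivoting is needed; all of these are choices made for this example.

```matlab
rng(4); n = 8; r = 3;                   % r is the block size
A = rand(n) + n*eye(n);                 % strictly diagonally dominant by rows
L = eye(n); U = zeros(n); S = A;
for k = 1:r:n
    i1 = k:min(k+r-1,n); i2 = i1(end)+1:n;
    % Step 1: factor the diagonal block, S(i1,i1) = L11*U11.
    m = numel(i1); L11 = eye(m); U11 = S(i1,i1);
    for j = 1:m-1
        L11(j+1:m,j) = U11(j+1:m,j)/U11(j,j);
        U11(j+1:m,j:m) = U11(j+1:m,j:m) - L11(j+1:m,j)*U11(j,j:m);
    end
    U11 = triu(U11);
    L(i1,i1) = L11; U(i1,i1) = U11;
    U(i1,i2) = L11\S(i1,i2);                    % Step 2: solve for U12
    L(i2,i1) = S(i2,i1)/U11;                    % Step 3: solve for L21
    S(i2,i2) = S(i2,i2) - L(i2,i1)*U(i1,i2);    % Step 4: Schur complement
end                                             % Step 5: repeat on S(i2,i2)
res = norm(A - L*U,1)/norm(A,1)
```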

An alternative algorithm can be derived by considering a block 3\times 3 partitioning, in which we assume we have already computed the first block column of L and U:

\notag   A =  \begin{bmatrix}          A_{11} & A_{12} & A_{13}\\          A_{21} & A_{22} & A_{23}\\          A_{31} & A_{32} & A_{33}        \end{bmatrix}    =        \begin{bmatrix}         L_{11} & 0     & 0 \\         L_{21} & L_{22}& 0 \\         L_{31} & L_{32}  & I        \end{bmatrix}       \begin{bmatrix}         U_{11} & U_{12} & \times \\              0 & U_{22} & \times \\              0 &  0     & \times        \end{bmatrix}.

We now compute the middle block column of L and U, comprising p columns, by

  1. Solve L_{11}U_{12} = A_{12} for U_{12}.
  2. Factor A_{22}-L_{21}U_{12} = L_{22}U_{22}.
  3. Solve L_{32}U_{22} = A_{32} - L_{31}U_{12} for L_{32}.
  4. Repartition so that the first two block columns become a single block column and repeat steps 1–4.

This algorithm corresponds to the jki ordering. Note that the Schur complement is updated only a block column at a time. Because the algorithm accesses data only to the left of the block column being worked on, it is known as a left-looking algorithm.

Our description of these block algorithms emphasizes the mathematical ideas. The implementation details, especially for the left-looking algorithm, are not trivial. The optimal choice of block size p will depend on the machine, but p is typically in the range 64–512.

An important point is that all these different forms of LU factorization, no matter which ijk ordering or which value of p, carry out the same operations. The only difference is the order in which the operations are performed (and the order in which the data is accessed). Even the rounding errors are the same for all versions (assuming the use of “plain vanilla” floating-point arithmetic).

Rectangular Matrices

Although it is most commonly used for square matrices, LU factorization is defined for rectangular matrices, too. If A\in\mathbb{R}^{m\times n} then the factorization has the form A = LU with L\in\mathbb{R}^{m\times m} lower triangular and U\in\mathbb{R}^{m\times n} upper trapezoidal. The conditions for existence and uniqueness of an LU factorization of A are the same as those for A(1\colon p, 1\colon p), where p = \min(m,n).

Block LU Factorization

Another form of LU factorization relaxes the structure of L and U from triangular to block triangular, with L having identity matrices on the diagonal:

\notag  L = \begin{bmatrix} I     &        &            &    \\                     L_{21} &  I     &            &    \\                     \vdots &        &  \ddots    &    \\                     L_{m1} &  \dots &  L_{m,m-1} &  I \end{bmatrix}, \quad   U = \begin{bmatrix} U_{11} & U_{12} & \dots      &  U_{1m}   \\                            & U_{22} &            &  \vdots   \\                            &        &  \ddots    & U_{m-1,m} \\                            &        &            & U_{mm}    \end{bmatrix}.

Note that U is not, in general, upper triangular.

An example of a block LU factorization is

\notag     A =      \left[ \begin{array}{rr|rr}      0  &  1  &  1  &  1  \\     -1  &  1  &  1  &  1  \\\hline     -2  &  3  &  4  &  2  \\     -1  &  2  &  1  &  3  \\             \end{array}      \right]      =      \left[ \begin{array}{cc|cc}      1  &  0  &  0  &  0  \\      0  &  1  &  0  &  0  \\\hline      1  &  2  &  1  &  0  \\      1  &  1  &  0  &  1  \\             \end{array}      \right]      \left[ \begin{array}{rr|rr}      0  &  1  &  1  &  1  \\     -1  &  1  &  1  &  1  \\\hline      0  &  0  &  1  & -1  \\      0  &  0  & -1  &  1  \\             \end{array}      \right].

LU factorization fails on A because of the zero (1,1) pivot. This block LU factorization corresponds to using the leading 2\times 2 principal submatrix of A to eliminate the elements in the (3\colon 4,1\colon 2) submatrix. In the context of a linear system Ax=b, we have effectively solved for the variables x_1 and x_2 in terms of x_3 and x_4 and then substituted for x_1 and x_2 in the last two equations.
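
The blocks in this example can be reproduced in MATLAB from the leading 2\times 2 block, as in the short sketch below (the index vectors i1 and i2 are names introduced here).

```matlab
A = [0 1 1 1; -1 1 1 1; -2 3 4 2; -1 2 1 3];
i1 = 1:2; i2 = 3:4;
L21 = A(i2,i1)/A(i1,i1)             % block multiplier A21*inv(A11)
U22 = A(i2,i2) - L21*A(i1,i2)       % Schur complement of A11 in A
L = [eye(2) zeros(2); L21 eye(2)];
U = [A(i1,:); zeros(2) U22];
res = norm(A - L*U,1)
```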

Conditions for the existence of a block LU factorization are analogous to, but less stringent than, those for LU factorization in Theorem 1.

Theorem 3. The matrix A\in\mathbb{R}^{n\times n} has a unique block LU factorization if and only if the first m-1 leading principal block submatrices of A are nonsingular.

The conditions in Theorem 3 can be shown to be satisfied if A is block diagonally dominant by rows or columns.

Note that to solve a linear system Ax = b using a block LU factorization we need to solve Ly = b and Ux = y, but the latter system is not triangular and requires the solution of systems U_{ii}x_i = y_i involving the diagonal blocks of U, which would normally be done by (standard) LU factorization.

Sensitivity

If A has a unique LU factorization then for a small enough perturbation \Delta A an LU factorization A + \Delta A = (L + \Delta L)(U + \Delta U) exists. To first order, this equation is \Delta A = \Delta L U + L \Delta U, which gives

\notag   L^{-1}\Delta A \mskip2mu U^{-1} = L^{-1}\Delta L  + \Delta U \mskip2mu  U^{-1}.

Since \Delta L is strictly lower triangular and \Delta U is upper triangular, we have, to first order,

\notag         \Delta L = L \mskip 1mu \mathrm{tril}( L^{-1}\Delta A U^{-1} ), \quad         \Delta U = \mathrm{triu}( L^{-1}\Delta A U^{-1} )U,

where \mathrm{tril} denotes the strictly lower triangular part and \mathrm{triu} the upper triangular part. Clearly, the sensitivity of the LU factors depends on the inverses of L and U. However, in most situations, such as when we are solving a linear system Ax = b, it is the backward stability of the LU factors, not their sensitivity, that is relevant.
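
The first-order expressions can be compared with the actual changes in the factors. In the sketch below the matrix is the one from (1), the perturbation of size about 10^{-6} is an arbitrary choice, and the factors are computed by elimination without pivoting. In MATLAB notation, tril(G,-1) is the strictly lower triangular part of G and triu(G) is the upper triangular part including the diagonal. The relative differences should be roughly of the size of the perturbation, reflecting the neglected second-order terms.

```matlab
A = [3 -1 1 1; -1 3 1 -1; -1 -1 3 1; 1 1 1 3];
rng(5); dA = 1e-6*randn(4);
M = {A, A+dA}; F = cell(1,2);
for t = 1:2                              % factor A and A + dA without pivoting
    S = M{t}; n = size(S,1); Lt = eye(n);
    for k = 1:n-1
        Lt(k+1:n,k) = S(k+1:n,k)/S(k,k);
        S(k+1:n,k:n) = S(k+1:n,k:n) - Lt(k+1:n,k)*S(k,k:n);
    end
    F{t} = {Lt, triu(S)};
end
L = F{1}{1}; U = F{1}{2};
dL = F{2}{1} - L; dU = F{2}{2} - U;      % actual perturbations of the factors
G = L\dA/U;                              % inv(L)*dA*inv(U)
dL1 = L*tril(G,-1); dU1 = triu(G)*U;     % first-order predictions
err_L = norm(dL - dL1,1)/norm(dL,1)
err_U = norm(dU - dU1,1)/norm(dU,1)
```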

Pivoting and Numerical Stability

Since not all matrices have an LU factorization, we need the option of applying row and column interchanges to ensure that the pivots are nonzero unless the column in question is already in triangular form.

In finite precision computation it is important that computed LU factors \widehat L and \widehat U are numerically stable in the sense that \widehat L \widehat U = A + \Delta A with \|\Delta A\|\le c_n u \|A\|, where c_n is a constant and u is the unit roundoff. For certain matrix properties, such as diagonal dominance by rows or columns, numerical stability is guaranteed, but in general it is necessary to incorporate row interchanges, or row and column interchanges, in order to obtain a stable factorization.
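
In MATLAB, row interchanges are provided by the three-output form of lu, which returns a permutation matrix P with PA = LU. The sketch below applies it to the matrix from the block LU example, which has a zero (1,1) element and so has no LU factorization without interchanges.

```matlab
A = [0 1 1 1; -1 1 1 1; -2 3 4 2; -1 2 1 3];   % zero (1,1) pivot
[L,U,P] = lu(A);                    % partial pivoting: P*A = L*U
res = norm(P*A - L*U,1)/norm(A,1)
max_mult = max(abs(L(:)))           % multipliers are bounded by 1 in magnitude
```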

See What Is the Growth Factor for Gaussian Elimination? for details of pivoting strategies and see Randsvd Matrices with Large Growth Factors for some recent research on growth factors.


What’s New in MATLAB R2021a?

In this post I discuss some of the new features in MATLAB R2021a. As usual in this series, I focus on a few of the features most relevant to my interests. See the release notes for a detailed list of the many changes in MATLAB and its toolboxes.

Name=Value Syntax

In function calls that accept “name, value” pairs, separated by a comma, the values can now be specified with an equals sign. Example:

x = linspace(0,2*pi,100); y = tan(x);

% Existing syntax
plot(x,y,'Color','red','LineWidth',2)
plot(x,y,"Color","red","LineWidth",2)

% New syntax
plot(x,y,Color = "red",LineWidth = 2)
lw = 2; plot(x,y,Color = "red",LineWidth = lw) 

Note that the string can be given as a character vector in single quotes or as a string array in double quotes (string arrays were introduced in R2016b).

There are some limitations, including that all name=value arguments must appear after any comma-separated pairs and after any positional arguments (arguments that must be passed to a function in a specific order).

Eigensystem of Skew-Symmetric Matrix

For skew-symmetric and skew-Hermitian matrices, the eig function now guarantees that the matrix of eigenvectors is unitary (to machine precision) and that the computed eigenvalues are pure imaginary. The code

rng(2); n = 5; A = gallery('randsvd',n,-1e3,2); A = 1i*A; 
[V,D] = eig(A); 
unitary_test = norm(V'*V-eye(n),1)
norm_real_part = norm(real(D),1)

produces

% R2020b
unitary_test =
   9.6705e-01
norm_real_part =
   8.3267e-17

% R2021a
unitary_test =
   1.9498e-15
norm_real_part =
     0

For this matrix MATLAB R2020b produces an eigenvector matrix that is far from being unitary and eigenvalues with a nonzero (but tiny) real part, whereas MATLAB R2021a produces pure imaginary eigenvalues and an eigenvector matrix that is unitary to machine precision.

Performance Improvements

Among the reported performance improvements are faster matrix multiplication for large sparse matrices (based on the use of GraphBLAS) and faster solution of systems with multiple right-hand sides and a sparse coefficient matrix, both resulting from added support for multithreading.

Symbolic Math Toolbox

An interesting addition to the Symbolic Math Toolbox is the symmatrix class, which represents a symbolic matrix. An example of usage is

>> A = symmatrix('A',[2 2]); B = symmatrix('B',[2 2]); whos A B
  Name      Size            Bytes  Class        Attributes

  A         2x2                 8  symmatrix              
  B         2x2                 8  symmatrix              

>> X = A*B, Y = symmatrix2sym(X), whos X Y
X =
A*B
Y =
[A1_1*B1_1 + A1_2*B2_1, A1_1*B1_2 + A1_2*B2_2]
[A2_1*B1_1 + A2_2*B2_1, A2_1*B1_2 + A2_2*B2_2]
  Name      Size            Bytes  Class        Attributes

  X         2x2                 8  symmatrix              
  Y         2x2                 8  sym    

The range of functions that can be applied to a symmatrix is as follows:

>> methods symmatrix

Methods for class symmatrix:

adjoint         horzcat         mldivide        symmatrix       
cat             isempty         mpower          symmatrix2sym   
conj            isequal         mrdivide        tan             
cos             isequaln        mtimes          times           
ctranspose      kron            norm            trace           
det             latex           plus            transpose       
diff            ldivide         power           uminus          
disp            length          pretty          uplus           
display         log             rdivide         vertcat         
eq              matlabFunction  sin             
exp             minus           size            

Static methods:

empty         

In order to invert A*B in this example, or find its eigenvalues, use inv(Y) or eig(Y).

Fifty “What Is” Articles


Last week I posted the fiftieth in my “What Is” series of articles. I began the series just over a year ago, in March 2020. The original aim was to provide “brief descriptions of important concepts in numerical analysis and related areas, with a focus on topics that arise in my research”, and the articles were meant to be short, widely accessible, and contain a minimum of mathematical symbols, equations, and citations. I have largely kept to these aims, though for some topics there is a lot to say and I have been more lengthy.

The articles are also available in PDF form on GitHub.

Below is a list of all the “What Is” articles published at the time of writing, in alphabetical order.

If there is a topic you would like me to cover, please put it in the comments below.

  1. What Is a Block Matrix?
  2. What Is a Cholesky Factorization?
  3. What Is a Companion Matrix?
  4. What Is a Condition Number?
  5. What Is a Correlation Matrix?
  6. What is a Diagonally Dominant Matrix?
  7. What Is a Fractional Matrix Power?
  8. What Is a Fréchet Derivative?
  9. What Is a Generalized Inverse?
  10. What Is a Hadamard Matrix?
  11. What Is a Householder Matrix?
  12. What Is a Matrix Function?
  13. What Is a Matrix Square Root?
  14. What Is a Matrix?
  15. What Is a Modified Cholesky Factorization?
  16. What Is a (Non)normal Matrix?
  17. What Is a QR Factorization?
  18. What Is a Random Orthogonal Matrix?
  19. What is a Sparse Matrix?
  20. What Is a Symmetric Positive Definite Matrix?
  21. What Is a Unitarily Invariant Norm?
  22. What Is an M-Matrix?
  23. What Is an Orthogonal Matrix?
  24. What Is Backward Error?
  25. What Is Bfloat16 Arithmetic?
  26. What Is Floating-Point Arithmetic?
  27. What Is IEEE Standard Arithmetic?
  28. What is Numerical Stability?
  29. What Is Rounding?
  30. What Is Stochastic Rounding?
  31. What Is the Adjugate of a Matrix?
  32. What is the Cayley–Hamilton Theorem?
  33. What Is the Complex Step Approximation?
  34. What Is the CS Decomposition?
  35. What Is the Gerstenhaber Problem?
  36. What Is the Growth Factor for Gaussian Elimination?
  37. What Is the Hilbert Matrix?
  38. What is the Kronecker Product?
  39. What Is the Log-Sum-Exp Function?
  40. What Is the Matrix Exponential?
  41. What Is the Matrix Logarithm?
  42. What Is the Matrix Sign Function?
  43. What Is the Matrix Unwinding Function?
  44. What Is the Nearest Positive Semidefinite Matrix?
  45. What Is the Nearest Symmetric Matrix?
  46. What is the Polar Decomposition?
  47. What Is the Sherman–Morrison–Woodbury Formula?
  48. What Is the Singular Value Decomposition?
  49. What Is the Softmax Function?
  50. What Is the Sylvester Equation?

What is a Diagonally Dominant Matrix?

Matrices arising in applications often have diagonal elements that are large relative to the off-diagonal elements. In the context of a linear system this corresponds to relatively weak interactions between the different unknowns. We might expect a matrix with a large diagonal to be assured of certain properties, such as nonsingularity. However, to ensure nonsingularity it is not enough for each diagonal element to be the largest in its row. For example, the matrix

\notag  \left[\begin{array}{rrr}     3 & -1 & -2\\    -2 &  3 & -1\\    -2 & -1 &  3   \end{array}\right] \qquad (1)

is singular because [1~1~1]^T is a null vector. A useful definition of a matrix with large diagonal requires a stronger property.

A matrix A\in\mathbb{C}^{n\times n} is diagonally dominant by rows if

\notag        |a_{ii}| \ge \displaystyle\sum_{j\ne i} |a_{ij}|, \quad i=1\colon n. \qquad (2)

It is strictly diagonally dominant by rows if strict inequality holds in (2) for all i. A is (strictly) diagonally dominant by columns if A^T is (strictly) diagonally dominant by rows.

Diagonal dominance on its own is not enough to ensure nonsingularity, as the matrix (1) shows. Strict diagonal dominance does imply nonsingularity, however.
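Definition (2) is easy to test numerically. The sketch below computes the row margins |a_{ii}| - \sum_{j\ne i}|a_{ij}| for the matrix (1); the function names are illustrative.

```python
def row_margins(A):
    """Return |a_ii| - sum_{j != i} |a_ij| for each row i of a square matrix A."""
    n = len(A)
    return [abs(A[i][i]) - sum(abs(A[i][j]) for j in range(n) if j != i)
            for i in range(n)]

def is_diagonally_dominant(A, strict=False):
    """Diagonal dominance by rows as in (2); strict=True requires strict inequality."""
    margins = row_margins(A)
    return all(m > 0 for m in margins) if strict else all(m >= 0 for m in margins)

# The matrix (1): largest element of each row on the diagonal, yet singular.
A1 = [[3, -1, -2],
      [-2, 3, -1],
      [-2, -1, 3]]

print(row_margins(A1))                          # [0, 0, 0]
print(is_diagonally_dominant(A1))               # True
print(is_diagonally_dominant(A1, strict=True))  # False
print([sum(row) for row in A1])                 # [0, 0, 0], so [1 1 1]^T is a null vector
```

Every margin is zero, so (1) is diagonally dominant by rows but not strictly so, which is consistent with its being singular.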

Theorem 1.

If A\in\mathbb{C}^{n\times n} is strictly diagonally dominant by rows or columns then it is nonsingular.

Proof. Since A is nonsingular if and only if A^T is nonsingular, it suffices to consider diagonal dominance by rows. For any nonzero x let y = Ax and choose k so that |x_k| = \|x\|_{\infty}. Then the kth equation of y = Ax can be written

\notag    a_{kk}x_k = y_k - \displaystyle\sum_{j\ne k} a_{kj}x_j,

which gives

\notag    |a_{kk}|\|x\|_{\infty} = |a_{kk}||x_k|     \le |y_k| + \displaystyle\sum_{j\ne k} |a_{kj}||x_j|     \le |y_k| + \|x\|_\infty \displaystyle\sum_{j\ne k} |a_{kj}|.

Using (2), we have

\notag    |y_k| \ge \|x\|_{\infty} \Bigl(|a_{kk}| - \displaystyle\sum_{j\ne k} |a_{kj}|\Bigr) > 0.    \qquad (3)

Therefore y\ne0 and so A is nonsingular. ~\square

Diagonal dominance plus two further conditions is enough to ensure nonsingularity. We need the notion of irreducibility. A matrix A\in\mathbb{C}^{n\times n} is irreducible if there does not exist a permutation matrix P such that

\notag       P^TAP = \begin{bmatrix} A_{11} & A_{12} \\                                  0   & A_{22} \end{bmatrix}

with A_{11} and A_{22} square matrices. Irreducibility is equivalent to the directed graph of A being strongly connected.
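The graph characterization gives a simple test for irreducibility. The following sketch, with the hypothetical helper name is_irreducible, builds the directed graph with an edge from i to j whenever a_{ij}\ne 0 for i\ne j and checks that every node can be reached from node 0 and that node 0 can be reached from every node, which together are equivalent to strong connectivity.

```python
def is_irreducible(A):
    """True if the directed graph of the square matrix A is strongly connected."""
    n = len(A)

    def reaches_all(adj):
        # Depth-first search from node 0.
        seen = {0}
        stack = [0]
        while stack:
            i = stack.pop()
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        return len(seen) == n

    forward = [[j for j in range(n) if j != i and A[i][j] != 0] for i in range(n)]
    backward = [[j for j in range(n) if j != i and A[j][i] != 0] for i in range(n)]
    return reaches_all(forward) and reaches_all(backward)

tridiagonal = [[2, -1, 0, 0],
               [-1, 2, -1, 0],
               [0, -1, 2, -1],
               [0, 0, -1, 2]]
block_triangular = [[1, 1],
                    [0, 1]]

print(is_irreducible(tridiagonal))       # True
print(is_irreducible(block_triangular))  # False
```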

Theorem 2.

If A\in\mathbb{C}^{n\times n} is irreducible and diagonally dominant by rows with strict inequality in (2) for some i then it is nonsingular.

Proof. The proof is by contradiction. Suppose there exists x\ne 0 such that Ax = 0. Define

\notag   G = \{\, j: |x_j| = \|x\|_{\infty} \,\},   \quad   H = \{\, j: |x_j| < \|x\|_{\infty} \,\}.

The ith equation of Ax = 0 can be written

\notag       a_{ii}x_i = - \displaystyle\sum_{j\ne i} a_{ij}x_j                 = - \displaystyle\sum_{j\in G \atop j\ne i } a_{ij}x_j                   - \displaystyle\sum_{j\in H \atop j\ne i } a_{ij}x_j. \qquad (4)

Hence for i = r\in G,

\notag |a_{rr}| \le \displaystyle\sum_{j\in G \atop j\ne r } |a_{rj}|       + \displaystyle\sum_{j\in H \atop j\ne r } |a_{rj}|\frac{|x_j|}{\|x\|_\infty}.

The set H is nonempty, because if it were empty then we would have |x_j| = \|x\|_\infty for all j and if there is strict inequality in (2) for i = m, then putting i = m in (4) would give |a_{mm}| \le \sum_{j\ne m} |a_{mj}| |x_j|/|x_m|             =  \sum_{j\ne m} |a_{mj}|, which is a contradiction. Hence as long as a_{rj}\ne0 for some j\in H, we obtain |a_{rr}| <  \sum_{j\ne r } |a_{rj}|, which contradicts the diagonal dominance. Therefore we must have a_{rj}= 0 for all j\in H and all r\in G. This means that all the rows indexed by G have zeros in the columns indexed by H, which means that A is reducible. This is a contradiction, so A must be nonsingular. ~\square

The obvious analogue of Theorem 2 holds for column diagonal dominance.

As an example, the n\times n symmetric tridiagonal matrix (minus the second difference matrix)

\notag  T_n = \left[\begin{array}{@{\mskip 5mu}c*{4}{@{\mskip 15mu} r}@{\mskip 5mu}}      2 &   -1  &          &         & \\     -1 &    2  &  -1      &         & \\        &    -1 &   2      &  \ddots & \\        &       &  \ddots  &  \ddots & -1\\        &       &          &  -1     & 2            \end{array}\right], \qquad (5)

is row diagonally dominant with strict inequality in the first and last diagonal dominance relations. It can also be shown to be irreducible and so it is nonsingular by Theorem 2. If we replace t_{11} or t_{nn} by 1, then T_n remains nonsingular by the same argument. What if we replace both t_{11} and t_{nn} by 1? We can answer this question by using an observation of Strang. If we define the rectangular matrix

\notag  L_n = \begin{bmatrix}                1  &      &        &  \\                -1 &  1   &        &  \\                   & -1   & \ddots &  \\                   &      & \ddots & 1 \\                   &      &        &  -1     \end{bmatrix} \in\mathbb{R}^{(n+1)\times n}

then T_n = L_n^T L_n and

\notag \widetilde{T}_{n+1}  = \begin{bmatrix}                        1 &-1      &        &      & \\                       -1 & 2      & \ddots &      & \\                          & \ddots & \ddots &  -1  & \\                          &        &   -1    &  2  & -1\\                          &        &         &  -1 & 1                    \end{bmatrix}                          = L_n L_n^T \in \mathbb{R}^{(n+1) \times (n+1)}.

Since in general AB and BA have the same nonzero eigenvalues, we conclude that \Lambda(\widetilde{T}_{n+1})  = \Lambda(T_n) \cup \{0\}, where \Lambda(\cdot) denotes the spectrum. Since T_n = L_n^T L_n is symmetric positive semidefinite and, by Theorem 2, nonsingular, it is symmetric positive definite, and hence \widetilde{T}_{n+1} is singular and symmetric positive semidefinite.
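The two factorizations can be confirmed directly for a small n. This sketch uses integer arithmetic and illustrative helper names; n = 4 is an arbitrary choice.

```python
def difference_matrix(n):
    """(n+1) x n matrix L_n with 1 on the diagonal and -1 on the subdiagonal."""
    L = [[0] * n for _ in range(n + 1)]
    for j in range(n):
        L[j][j] = 1
        L[j + 1][j] = -1
    return L

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

n = 4
L = difference_matrix(n)
T = matmul(transpose(L), L)        # n x n, the matrix T_n in (5)
T_tilde = matmul(L, transpose(L))  # (n+1) x (n+1), with 1 in both corner positions

for row in T:
    print(row)
for row in T_tilde:
    print(row)
print([sum(row) for row in T_tilde])  # all zeros: the vector of ones is a null vector
```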

Relation to Gershgorin’s Theorem

Theorem 1 can be used to obtain information about the location of the eigenvalues of a matrix. Indeed if \lambda is an eigenvalue of A then A - \lambda I is singular and hence cannot be strictly diagonally dominant, by Theorem 1. So |a_{ii}-\lambda| > \sum_{j\ne i} |a_{ij}| cannot be true for all i. Gershgorin’s theorem is simply a restatement of this fact.

Theorem 3 (Gershgorin’s theorem).

The eigenvalues of A\in\mathbb{C}^{n\times n} lie in the union of the n discs in the complex plane

\notag      D_i = \Big\{ z\in\mathbb{C}: |z-a_{ii}| \le \displaystyle\sum_{j\ne i}      |a_{ij}|\Big\}, \quad i=1\colon n.

If A is symmetric with positive diagonal elements and satisfies the conditions of Theorem 1 or Theorem 2 then it is positive definite. Indeed the eigenvalues are real and so in Gershgorin’s theorem the discs are intervals and a_{ii} - z \le |z-a_{ii}| \le \sum_{j\ne i} |a_{ij}|, so z \ge a_{ii} - \sum_{j\ne i} |a_{ij}| \ge 0, so the eigenvalues are nonnegative, and hence positive since A is nonsingular. This provides another proof that the matrix T_n in (5) is positive definite.
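As an illustration, the sketch below computes the Gershgorin discs of T_n for n = 4 and checks that the eigenvalues lie in their union. The eigenvalues are taken from the standard closed form 2 - 2\cos(k\pi/(n+1)), k = 1\colon n, which is not stated in this article and is used here only as a check.

```python
import math

def gershgorin_discs(A):
    """Return (centre, radius) pairs, one per row of the square matrix A."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]

def second_difference(n):
    """T_n from (5): 2 on the diagonal, -1 on the sub- and superdiagonals."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0 for j in range(n)]
            for i in range(n)]

n = 4
discs = gershgorin_discs(second_difference(n))
lower = min(c - r for c, r in discs)  # left end of the union of intervals
upper = max(c + r for c, r in discs)  # right end of the union of intervals
eigs = [2 - 2 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]

print(discs)                                       # [(2, 1), (2, 2), (2, 2), (2, 1)]
print(lower, upper)                                # 0 4
print(all(lower <= lam <= upper for lam in eigs))  # True
```

The discs alone give only z \ge 0. Positivity of the eigenvalues then follows from nonsingularity, as in the argument above.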

Generalized Diagonal Dominance

In some situations A is not diagonally dominant but a row or column scaling of it is. For example, the matrix

\notag   A = \begin{bmatrix}         1   & 1   & 0 \\         2/3 & 2   & 1/4 \\         2/3 & 1/2 & 1       \end{bmatrix}

is not diagonally dominant by rows or columns but

\notag   A \, \mathrm{diag}(3,2,4)    = \begin{bmatrix}         3   & 2   & 0 \\         2   & 4   & 1   \\         2   & 1   & 4       \end{bmatrix}

is strictly diagonally dominant by rows.
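The claims about this example can be verified in exact rational arithmetic. The sketch below reports the dominance margins |a_{ii}| - \sum_{j\ne i}|a_{ij}| of A by rows, of A by columns (via A^T), and of A\,\mathrm{diag}(3,2,4) by rows; the helper name row_margins is illustrative.

```python
from fractions import Fraction as F

def row_margins(A):
    """Return |a_ii| - sum_{j != i} |a_ij| for each row i."""
    n = len(A)
    return [abs(A[i][i]) - sum(abs(A[i][j]) for j in range(n) if j != i)
            for i in range(n)]

A = [[F(1),    F(1),    F(0)],
     [F(2, 3), F(2),    F(1, 4)],
     [F(2, 3), F(1, 2), F(1)]]
d = [3, 2, 4]

AD = [[A[i][j] * d[j] for j in range(3)] for i in range(3)]  # scale column j by d_j
At = [list(col) for col in zip(*A)]

print([str(m) for m in row_margins(A)])   # ['0', '13/12', '-1/6']
print([str(m) for m in row_margins(At)])  # ['-1/3', '1/2', '3/4']
print([str(m) for m in row_margins(AD)])  # ['1', '1', '1']
```

A negative margin appears in the third row and in the first column of A, while all row margins of the scaled matrix equal 1.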

A matrix A\in\mathbb{C}^{n\times n} is generalized diagonally dominant by rows if AD is diagonally dominant by rows for some diagonal matrix D = \mathrm{diag}(d_i) with d_i > 0 for all i, that is, if

\notag      |a_{ii}|d_i \ge \displaystyle\sum_{j\ne i} |a_{ij}|d_j, \quad i=1\colon n. \qquad (6)

It is easy to see that if A is irreducible and there is strict inequality in (6) for some i then A is nonsingular, by applying Theorem 2 to AD.

It can be shown that A is generalized diagonally dominant by rows if and only if it is an H-matrix, where an H-matrix is a matrix for which the comparison matrix M(A), defined by

\notag   M(A) = (m_{ij}), \quad m_{ij} =    \begin{cases} |a_{ii}|, & i=j, \\                 -|a_{ij}|, & i\ne j,     \end{cases}

is an M-matrix (see What Is an M-Matrix?).
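The comparison matrix is straightforward to form. The sketch below builds M(A) for the 3\times 3 example above and evaluates M(A)d for d = (3,2,4)^T. It relies on one assumption not stated in this article: that for a matrix with nonpositive off-diagonal elements, the existence of a positive vector d with M(A)d > 0 is one of the standard characterizations of a nonsingular M-matrix.

```python
from fractions import Fraction as F

def comparison_matrix(A):
    """M(A): |a_ii| on the diagonal and -|a_ij| off the diagonal."""
    n = len(A)
    return [[abs(A[i][j]) if i == j else -abs(A[i][j]) for j in range(n)]
            for i in range(n)]

A = [[F(1),    F(1),    F(0)],
     [F(2, 3), F(2),    F(1, 4)],
     [F(2, 3), F(1, 2), F(1)]]
d = [3, 2, 4]

M = comparison_matrix(A)
Md = [sum(M[i][j] * d[j] for j in range(3)) for i in range(3)]

print([str(t) for t in Md])  # ['1', '1', '1']
```

The components of M(A)d are exactly the differences between the two sides of (6), so M(A)d > 0 is strict generalized diagonal dominance written in matrix form.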

Block Diagonal Dominance

A matrix A\in\mathbb{C}^{n\times n} is block diagonally dominant by rows if, for a given norm and block m\times m partitioning A = (A_{ij}), the diagonal blocks A_{jj} are all nonsingular and

\notag     \displaystyle\sum_{j\ne i} \|A_{ij}\| \le  \|A_{ii}^{-1}\|^{-1}, \quad i = 1\colon m.

A is block diagonally dominant by columns if A^T is block diagonally dominant by rows. If the blocks are all 1\times 1 then block diagonal dominance reduces to the usual notion of diagonal dominance. Block diagonal dominance holds for certain block tridiagonal matrices arising in the discretization of PDEs.

Analogues of Theorems 1 and 2 giving conditions under which block diagonal dominance implies nonsingularity are given by Feingold and Varga (1962).

Bounding the Inverse

If a matrix is strictly diagonally dominant then we can bound its inverse in terms of the minimum amount of diagonal dominance. For full generality, we state the bound in terms of generalized diagonal dominance.

Theorem 4.

If A\in\mathbb{C}^{n\times n} and AD is strictly diagonally dominant by rows for a diagonal matrix D = \mathrm{diag}(d_i) with d_i > 0 for all i, then

\notag   \|A^{-1}\|_\infty \le \displaystyle\frac{\|D\|_{\infty}}{\alpha},

where \alpha = \min_i (|a_{ii}|d_i - \sum_{j\ne i} |a_{ij}|d_j).

Proof. Assume first that D = I. Let y satisfy \|A^{-1}\|_{\infty} = \|A^{-1}y\|_{\infty} / \|y\|_{\infty} and let x = A^{-1}y. Applying (3) gives \|A^{-1}\|_{\infty} = \|x\|_{\infty} / \|y\|_{\infty} \le \alpha^{-1}. The result is obtained on applying this bound to AD and using \|A^{-1}\|_{\infty} \le \|D\|_{\infty} \|(AD)^{-1}\|_{\infty}. ~\square
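Theorem 4 is equivalent to the statement that \|x\|_\infty \le (\|D\|_\infty/\alpha)\|Ax\|_\infty for all x. The sketch below tests this inequality for the 3\times 3 example of the previous section, with D = \mathrm{diag}(3,2,4), over all nonzero integer vectors with entries between -2 and 2. The sampling is a sanity check under these illustrative choices, not a proof.

```python
from fractions import Fraction
from itertools import product

A = [[Fraction(1),    Fraction(1),    Fraction(0)],
     [Fraction(2, 3), Fraction(2),    Fraction(1, 4)],
     [Fraction(2, 3), Fraction(1, 2), Fraction(1)]]
d = [3, 2, 4]
n = len(A)

# alpha = min_i ( |a_ii| d_i - sum_{j != i} |a_ij| d_j )
alpha = min(abs(A[i][i]) * d[i]
            - sum(abs(A[i][j]) * d[j] for j in range(n) if j != i)
            for i in range(n))
bound = Fraction(max(d)) / alpha  # ||D||_inf / alpha

def inf_norm(v):
    return max(abs(t) for t in v)

ok = True
for x in product(range(-2, 3), repeat=n):
    if any(x):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        ok = ok and inf_norm(x) <= bound * inf_norm(y)

print(alpha, bound, ok)  # 1 4 True
```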

Another bound for A^{-1} when A is strictly diagonally dominant by rows can be obtained by writing A = D(I - E), where D = \mathrm{diag}(a_{ii}), e_{ii} = 0, and e_{ij} = -a_{ij}/a_{ii} for i\ne j. It is easy to see that \|E\|_\infty < 1, which gives another proof that A is nonsingular. Then

\notag  \begin{aligned}   |A^{-1}| &= |(I-E)^{-1}D^{-1}|            = |I + E + E^2 + \cdots | |D^{-1}|\\            &\le (I + |E| + |E|^2 + \cdots ) |D|^{-1}\\             &= (I - |E|)^{-1} |D|^{-1}\\             &= M(A)^{-1}.  \end{aligned}

This bound implies that M(A)^{-1} \ge 0, so in view of its sign pattern M(A) is an M-matrix, which essentially proves one direction of the H-matrix equivalence in the previous section. The same bound holds if A is strictly diagonally dominant by columns, by writing A = (I-E)D.

An upper bound also holds for block diagonal dominance.

Theorem 5.

If A\in\mathbb{C}^{n\times n} is block diagonally dominant by rows then

\notag   \|A^{-1}\|_\infty \le \displaystyle\frac{1}{\alpha},

where \alpha = \min_i ( \|A_{ii}^{-1}\|^{-1} - \sum_{j\ne i} \|A_{ij}\| ).

It is interesting to note that the inverse of a strictly row diagonally dominant matrix enjoys a form of diagonal dominance, namely that the largest element in each column is on the diagonal.

Theorem 6.

If A\in\mathbb{C}^{n\times n} is strictly diagonally dominant by rows then B = A^{-1} satisfies |b_{ij}| < |b_{jj}| for all i\ne j.

Proof. For i\ne j we have \sum_{k=1}^n a_{ik}b_{kj} = 0. Let \beta_j = \max_k |b_{kj}|. Taking absolute values in a_{ii}b_{ij} = -\sum_{k\ne i}a_{ik}b_{kj} gives

\notag  |a_{ii}||b_{ij}| \le \beta_j \sum_{k\ne i} |a_{ik}| < \beta_j |a_{ii}|,

or |b_{ij}| < \beta_j, since a_{ii} \ne 0. This inequality holds for all i\ne j, so we must have \beta_j = |b_{jj}|, which gives the result. ~\square
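Theorem 6 can be observed on the strictly row diagonally dominant matrix A\,\mathrm{diag}(3,2,4) from the section on generalized diagonal dominance. The sketch below inverts it exactly by Gauss-Jordan elimination (the helper inverse is an illustrative implementation) and checks the column property.

```python
from fractions import Fraction

def inverse(A):
    """Exact inverse by Gauss-Jordan elimination on [A | I]."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first nonzero pivot
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

A = [[3, 2, 0],
     [2, 4, 1],
     [2, 1, 4]]
B = inverse(A)
n = len(A)

for row in B:
    print([str(t) for t in row])

# |b_ij| < |b_jj| for all i != j: the largest element of each column is on the diagonal.
column_dominant = all(abs(B[i][j]) < abs(B[j][j])
                      for j in range(n) for i in range(n) if i != j)
print(column_dominant)  # True
```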

Historical Remarks

Theorems 1 and 2 have a long history and have been rediscovered many times. Theorem 1 was first stated by Lévy (1881) with additional assumptions. In a short but influential paper, Taussky (1949) pointed out the recurring nature of the theorems and gave simple proofs (our proof of Theorem 2 is Taussky’s). Schneider (1977) attributes the surge in interest in matrix theory in the 1950s and 1960s to Taussky’s paper and a few others by her, Brauer, Ostrowski, and Wielandt. The history of Gershgorin’s theorem (published in 1931) is intertwined with that of Theorems 1 and 2; see Varga’s 2004 book for details.

Theorems 4 and 5 are from Varah (1975) and Theorem 6 is from Ostrowski (1952).


This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.