An LU factorization simplifies the solution of many problems associated with linear systems. In particular, solving a linear system $Ax = b$ reduces to solving the triangular systems $Ly = b$ and $Ux = y$, since then $b = Ly = L(Ux) = Ax$.
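To make this concrete, here is a small Python/NumPy sketch (Python rather than MATLAB, purely for illustration): factorize once, then solve with two triangular substitutions via `scipy.linalg.lu_factor` and `lu_solve`.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

lu, piv = lu_factor(A)       # one O(n^3) factorization (with row interchanges)
x = lu_solve((lu, piv), b)   # two O(n^2) triangular solves

print(np.allclose(A @ x, b))
```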

For a given $A\in\mathbb{C}^{n\times n}$, an LU factorization may or may not exist, and if it does exist it may not be unique. Conditions for existence and uniqueness are given in the following result (see Higham, 2002, Thm. 9.1 for a proof). Denote by $A_k = A(1\colon k, 1\colon k)$ the leading principal submatrix of $A$ of order $k$.

Theorem 1. The matrix $A\in\mathbb{C}^{n\times n}$ has a unique LU factorization if and only if $A_k$ is nonsingular for $k = 1\colon n-1$. If $A_k$ is singular for some $k \le n-1$ then the factorization may exist, but if so it is not unique.

Note that the (non)singularity of $A$ plays no role in Theorem 1. However, if $A$ is nonsingular and has an LU factorization then the factorization is unique. Indeed if $A$ has LU factorizations $A = L_1U_1 = L_2U_2$ then the $U_i$ are necessarily nonsingular and so $L_2^{-1}L_1 = U_2U_1^{-1}$. The left side of this equation is unit lower triangular and the right side is upper triangular; therefore both sides must equal the identity matrix, which means that $L_1 = L_2$ and $U_1 = U_2$, as required.

Equating leading principal submatrices in $A = LU$ gives $A_k = L_kU_k$, which implies that $\det(A_k) = \det(U_k) = u_{11}u_{22}\cdots u_{kk}$. Hence $u_{kk} = \det(A_k)/\det(A_{k-1})$. In fact, such determinantal formulas hold for all the elements of $L$ and $U$:

$$\ell_{ij} = \frac{\det\bigl(A([1\colon j-1,\; i],\, 1\colon j)\bigr)}{\det(A_j)}, \quad i > j, \qquad
  u_{ij} = \frac{\det\bigl(A(1\colon i,\, [1\colon i-1,\; j])\bigr)}{\det(A_{i-1})}, \quad i \le j.$$

Here, $A(u, v)$, where $u$ and $v$ are vectors of subscripts, denotes the submatrix formed from the intersection of the rows indexed by $u$ and the columns indexed by $v$.

LU factorization is intimately connected with Gaussian elimination. Recall that Gaussian elimination transforms a matrix $A^{(1)} = A\in\mathbb{C}^{n\times n}$ to upper triangular form in $n-1$ stages. At the $k$th stage, multiples of row $k$ are added to the later rows to eliminate the elements below the diagonal in column $k$, using the formulas

$$a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik}a_{kj}^{(k)}, \quad i = k+1\colon n, \quad j = k+1\colon n,$$

where the quantities $m_{ik} = a_{ik}^{(k)}/a_{kk}^{(k)}$ are the multipliers. Of course each pivot $a_{kk}^{(k)}$ must be nonzero for these formulas to be defined, and this is connected with the conditions of Theorem 1, since $a_{kk}^{(k)} = \det(A_k)/\det(A_{k-1})$. The final matrix $A^{(n)}$ is the upper triangular LU factor $U$, with $u_{ij} = a_{ij}^{(i)}$ for $j \ge i$, and $\ell_{ij} = m_{ij}$ for $i > j$, that is, the multipliers make up the $L$ factor (for a proof of these properties see any textbook on numerical linear algebra).

The matrix factorization viewpoint is well established as a powerful paradigm for thinking and computing. Separating the computation of LU factorization from its application is beneficial. For example, given $A = LU$ we saw above how to solve $Ax = b$. If we need to solve for another right-hand side $b_2$ we can just solve $Ly = b_2$ and $Ux_2 = y$, re-using the LU factorization. Similarly, solving $A^Tx = b$ reduces to solving the triangular systems $U^Ty = b$ and $L^Tx = y$.

An LU factorization can be computed by directly solving for the components of $L$ and $U$ in the equation $A = LU$. Indeed because $L$ has unit diagonal the first row of $U$ is the same as the first row of $A$, and $a_{i1} = \ell_{i1}u_{11}$ then determines the first column of $L$. One can go on to determine the $k$th row of $U$ and the $k$th column of $L$, for $k = 2\colon n$. This leads to the Doolittle method, which involves inner products of partial rows of $L$ and partial columns of $U$.
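The Doolittle method can be sketched in a few lines of Python/NumPy (an illustrative sketch, not production code: it assumes no pivoting is needed, i.e. all the relevant leading principal submatrices are nonsingular):

```python
import numpy as np

def doolittle_lu(A):
    """Doolittle LU: the kth step forms the kth row of U and the kth
    column of L from inner products of partial rows and columns."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for k in range(n):
        U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ U[:k, k]) / U[k, k]
    return L, U

A = np.array([[4., 3., 2.],
              [2., 4., 1.],
              [2., 1., 5.]])
L, U = doolittle_lu(A)
print(np.allclose(L @ U, A))
```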

Given the equivalence between LU factorization and Gaussian elimination we can also employ the Gaussian elimination equations:

    for k = 1:n-1
        A(k+1:n, k) = A(k+1:n, k)/A(k, k)    % multipliers
        for j = k+1:n
            for i = k+1:n
                A(i, j) = A(i, j) - A(i, k)*A(k, j)
            end
        end
    end

This kji ordering of the loops in the factorization is the basis of early Fortran implementations of LU factorization, such as that in LINPACK. The inner loop travels down the columns of $A$, accessing contiguous elements of $A$ since Fortran stores arrays by column. Interchanging the two inner loops gives the kij ordering, which updates the matrix a row at a time, and is appropriate for a language such as C that stores arrays by row.

The ijk and jik orderings correspond to the Doolittle method. The last two of the $3! = 6$ orderings are the ikj and jki orderings, to which we will return later.

For $A\in\mathbb{C}^{n\times n}$ with $\alpha = a_{11} \ne 0$ we can write

$$A = \begin{bmatrix} \alpha & b^T \\ c & D \end{bmatrix}
    = \begin{bmatrix} 1 & 0 \\ c/\alpha & I_{n-1} \end{bmatrix}
      \begin{bmatrix} \alpha & b^T \\ 0 & S \end{bmatrix}
    \equiv L_1 U_1. \qquad (2)$$

The matrix $S = D - cb^T/\alpha$ is called the *Schur complement* of $\alpha$ in $A$.

The first row and column of $L_1$ and $U_1$ have the correct forms for a unit lower triangular matrix and an upper triangular matrix, respectively. If we can find an LU factorization $S = L_2U_2$ of the Schur complement then

$$A = \begin{bmatrix} 1 & 0 \\ c/\alpha & L_2 \end{bmatrix}
      \begin{bmatrix} \alpha & b^T \\ 0 & U_2 \end{bmatrix}$$

is an LU factorization of $A$. Note that this is simply another way to express the algorithm above.
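The Schur complement recursion translates directly into code. A minimal Python/NumPy sketch, again assuming all pivots are nonzero:

```python
import numpy as np

def lu_schur(A):
    """LU factorization by recursion on the Schur complement
    S = D - c b^T / a11 (no pivoting; assumes nonzero pivots)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    a11, b, c, D = A[0, 0], A[0, 1:], A[1:, 0], A[1:, 1:]
    U[0, 0], U[0, 1:] = a11, b
    if n > 1:
        L[1:, 0] = c / a11
        S = D - np.outer(c, b) / a11        # Schur complement
        L[1:, 1:], U[1:, 1:] = lu_schur(S)  # recurse on S
    return L, U

A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])
L, U = lu_schur(A)
print(np.allclose(L @ U, A))
```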

For several matrix structures it is immediate that $a_{11} \ne 0$. If we can show that the Schur complement inherits the same structure then it follows by induction that we can compute the factorization $S = L_2U_2$, and so an LU factorization of $A$ exists. Classes of matrix for which $a_{11} \ne 0$ and $A$ being in the class implies the Schur complement is also in the class include

- symmetric positive definite matrices,
- $M$-matrices,
- matrices (block) diagonally dominant by rows or columns.

(The proofs of these properties are nontrivial.) Note that the matrix (1) is row diagonally dominant, as is its $U$ factor, as must be the case since its rows are contained in Schur complements.

We say that $A$ has *upper bandwidth* $q$ if $a_{ij} = 0$ for $j > i + q$ and *lower bandwidth* $p$ if $a_{ij} = 0$ for $i > j + p$. Another use of (2) is to show that $L$ and $U$ inherit the bandwidths of $A$.

Theorem 2. Let $A\in\mathbb{C}^{n\times n}$ have lower bandwidth $p$ and upper bandwidth $q$. If $A$ has an LU factorization then $L$ has lower bandwidth $p$ and $U$ has upper bandwidth $q$.

Proof. In (2), the first column of $L_1$ and the first row of $U_1$ have the required structure and $S$ has upper bandwidth $q$ and lower bandwidth $p$, since $b$ and $c$ have only $q$ and $p$ nonzero components, respectively. The result follows by induction.

In order to achieve high performance on modern computers with their hierarchical memories, LU factorization is implemented in a block form expressed in terms of matrix multiplication and the solution of multiple right-hand side triangular systems. We describe two block forms of LU factorization. First, consider a block form of (2) with block size $r$, where $A_{11}$ is $r\times r$:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
    = \begin{bmatrix} L_{11} & 0 \\ L_{21} & I_{n-r} \end{bmatrix}
      \begin{bmatrix} U_{11} & U_{12} \\ 0 & S \end{bmatrix}.$$

Here, $S$ is the Schur complement of $A_{11}$ in $A$, given by $S = A_{22} - A_{21}A_{11}^{-1}A_{12}$. This leads to the following algorithm:

- Factor $A_{11} = L_{11}U_{11}$.
- Solve $L_{11}U_{12} = A_{12}$ for $U_{12}$.
- Solve $L_{21}U_{11} = A_{21}$ for $L_{21}$.
- Form $S = A_{22} - L_{21}U_{12}$.
- Repeat steps 1–4 on $S$ to obtain $S = L_{22}U_{22}$.

The factorization in step 1 can be done by any form of LU factorization. This algorithm is known as a *right-looking* algorithm, since it accesses data to the right of the block being worked on (in particular, at each stage steps 2 and 4 access the last few columns of the matrix).
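The four steps can be sketched as follows in Python/NumPy (illustration only: there is no pivoting, so the example matrix has its diagonal shifted to keep the pivots away from zero):

```python
import numpy as np

def lu_nopivot(A):
    """Unpivoted Gaussian elimination, used to factor the diagonal blocks."""
    A = np.asarray(A, dtype=float).copy()
    n = A.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        L[k+1:, k] = A[k+1:, k] / A[k, k]
        A[k+1:, k:] -= np.outer(L[k+1:, k], A[k, k:])
    return L, np.triu(A)

def block_lu(A, r):
    """Right-looking block LU factorization with block size r."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    S = A.copy()                                         # trailing submatrix
    for k in range(0, n, r):
        b = min(r, n - k)
        L11, U11 = lu_nopivot(S[:b, :b])                 # 1. factor A11
        L[k:k+b, k:k+b], U[k:k+b, k:k+b] = L11, U11
        if k + b < n:
            U12 = np.linalg.solve(L11, S[:b, b:])        # 2. L11 U12 = A12
            L21 = np.linalg.solve(U11.T, S[b:, :b].T).T  # 3. L21 U11 = A21
            L[k+b:, k:k+b], U[k:k+b, k+b:] = L21, U12
            S = S[b:, b:] - L21 @ U12                    # 4. Schur complement
    return L, U

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6)) + 6 * np.eye(6)
L, U = block_lu(A, 2)
print(np.allclose(L @ U, A))
```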

An alternative algorithm can be derived by considering a block partitioning, in which we assume we have already computed the first block column of $L$ and $U$:

We now compute the middle block column of $L$ and $U$, comprising $r$ columns, by

- Solve for .
- Factor .
- Solve for .
- Repartition so that the first two block columns become a single block column and repeat steps 1–4.

This algorithm corresponds to the jki ordering. Note that the Schur complement is updated only a block column at a time. Because the algorithm accesses data only to the left of the block column being worked on, it is known as a *left-looking* algorithm.

Our description of these block algorithms emphasizes the mathematical ideas. The implementation details, especially for the left-looking algorithm, are not trivial. The optimal choice of block size will depend on the machine, but is typically in the range —.

An important point is that all these different forms of LU factorization, no matter which ordering or which value of , carry out the same operations. The only difference is the order in which the operations are performed (and the order in which the data is accessed). Even the rounding errors are the same for all versions (assuming the use of “plain vanilla” floating-point arithmetic).

Although it is most commonly used for square matrices, LU factorization is defined for rectangular matrices, too. If $A\in\mathbb{C}^{m\times n}$ then the factorization has the form $A = LU$ with $L\in\mathbb{C}^{m\times m}$ lower triangular and $U\in\mathbb{C}^{m\times n}$ upper trapezoidal. The conditions for existence and uniqueness of an LU factorization of $A$ are the same as those for $A(1\colon p, 1\colon p)$, where $p = \min(m,n)$.

Another form of LU factorization relaxes the structure of $L$ and $U$ from triangular to block triangular, with $L$ having identity matrices on the diagonal:

$$L = \begin{bmatrix} I & & & \\ L_{21} & I & & \\ \vdots & & \ddots & \\ L_{m1} & \cdots & L_{m,m-1} & I \end{bmatrix}, \qquad
  U = \begin{bmatrix} U_{11} & U_{12} & \cdots & U_{1m} \\ & U_{22} & & \vdots \\ & & \ddots & \vdots \\ & & & U_{mm} \end{bmatrix}.$$

Note that $U$ is not, in general, upper triangular.

An example of a block LU factorization is

LU factorization fails on this matrix because of the zero pivot. The block LU factorization corresponds to using the leading principal $2\times 2$ submatrix of $A$ to eliminate the elements in the last two rows of the first two columns. In the context of a linear system $Ax = b$, we have effectively solved for the variables $x_1$ and $x_2$ in terms of $x_3$ and $x_4$ and then substituted for $x_1$ and $x_2$ in the last two equations.

Conditions for the existence of a block LU factorization are analogous to, but less stringent than, those for LU factorization in Theorem 1.

Theorem 3. The matrix $A\in\mathbb{C}^{n\times n}$ has a unique block LU factorization if and only if its first $m-1$ leading principal block submatrices are nonsingular, where $m$ is the number of diagonal blocks.

The conditions in Theorem 3 can be shown to be satisfied if is block diagonally dominant by rows or columns.

Note that to solve a linear system $Ax = b$ using a block LU factorization we need to solve $Ly = b$ and $Ux = y$, but the latter system is not triangular and requires the solution of systems involving the diagonal blocks $U_{ii}$ of $U$, which would normally be done by (standard) LU factorization.

If $A$ has a unique LU factorization then for a small enough perturbation $\Delta A$ an LU factorization

$$A + \Delta A = (L + \Delta L)(U + \Delta U)$$

exists. To first order, this equation is $\Delta A = \Delta L\, U + L\, \Delta U$, which gives

$$L^{-1}\Delta A\, U^{-1} = L^{-1}\Delta L + \Delta U\, U^{-1}.$$

Since $L^{-1}\Delta L$ is strictly lower triangular and $\Delta U\, U^{-1}$ is upper triangular, we have, to first order,

$$\Delta L = L\,\mathrm{stril}(L^{-1}\Delta A\, U^{-1}), \qquad \Delta U = \mathrm{triu}(L^{-1}\Delta A\, U^{-1})\, U,$$

where $\mathrm{stril}(\cdot)$ denotes the strictly lower triangular part and $\mathrm{triu}(\cdot)$ the upper triangular part. Clearly, the sensitivity of the LU factors depends on the inverses of $L$ and $U$. However, in most situations, such as when we are solving a linear system $Ax = b$, it is the backward stability of the LU factors, not their sensitivity, that is relevant.

Since not all matrices have an LU factorization, we need the option of applying row and column interchanges to ensure that the pivots are nonzero unless the column in question is already in triangular form.

In finite precision computation it is important that the computed LU factors $\widehat L$ and $\widehat U$ are numerically stable in the sense that $\widehat L\widehat U = A + \Delta A$ with $\|\Delta A\| \le c_n u \|A\|$, where $c_n$ is a constant and $u$ is the unit roundoff. For certain matrix properties, such as diagonal dominance by rows or columns, numerical stability is guaranteed, but in general it is necessary to incorporate row interchanges, or row and column interchanges, in order to obtain a stable factorization.

See What Is the Growth Factor for Gaussian Elimination? for details of pivoting strategies and see Randsvd Matrices with Large Growth Factors for some recent research on growth factors.

This is a minimal set of references, which contain further useful references within.

- Jack J. Dongarra, Iain S. Duff, Danny C. Sorensen, and Henk A. Van der Vorst, Numerical Linear Algebra for High-Performance Computers, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1998. (For different implementations of LU factorization.)
- Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002.

- Randsvd Matrices with Large Growth Factors (2020)
- What Is a Block Matrix? (2020)
- What is a Diagonally Dominant Matrix? (2021)
- What Is the Growth Factor for Gaussian Elimination? (2020)

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

In function calls that accept “name, value” pairs, with the name and value separated by a comma, the value can now be specified with an equals sign. Example:

x = linspace(0,2*pi,100); y = tan(x);
% Existing syntax
plot(x,y,'Color','red','LineWidth',2)
plot(x,y,"Color","red","LineWidth",2)
% New syntax
plot(x,y,Color = "red",LineWidth = 2)
lw = 2; plot(x,y,Color = "red",LineWidth = lw)

Note that the string can be given as a character vector in single quotes or as a string array in double quotes (string arrays were introduced in R2016b).

There are some limitations, including that all name=value arguments must appear after any comma separated pairs and after any positional arguments (arguments that must be passed to a function in a specific order).

For skew-symmetric and skew-Hermitian matrices, the `eig` function now guarantees that the matrix of eigenvectors is unitary (to machine precision) and that the computed eigenvalues are pure imaginary. The code

rng(2); n = 5;
A = gallery('randsvd',n,-1e3,2); A = 1i*A;
[V,D] = eig(A);
unitary_test = norm(V'*V-eye(n),1)
norm_real_part = norm(real(D),1)

produces

% R2020b
unitary_test = 9.6705e-01
norm_real_part = 8.3267e-17

% R2021a
unitary_test = 1.9498e-15
norm_real_part = 0

For this matrix MATLAB R2020b produces an eigenvector matrix that is far from being unitary and eigenvalues with a nonzero (but tiny) real part, whereas MATLAB R2021a produces pure imaginary eigenvalues and an eigenvector matrix that is unitary to machine precision.

Among the reported performance improvements are faster matrix multiplication for large sparse matrices and faster solution of linear systems with multiple right-hand sides and a sparse coefficient matrix, both resulting from added support for multithreading.

An interesting addition to the Symbolic Math Toolbox is the `symmatrix` class, which represents a symbolic matrix. An example of usage is

>> A = symmatrix('A',[2 2]); B = symmatrix('B',[2 2]); whos A B
  Name      Size            Bytes  Class        Attributes
  A         2x2                 8  symmatrix
  B         2x2                 8  symmatrix
>> X = A*B, Y = symmatrix2sym(X), whos X Y
X = A*B
Y =
[A1_1*B1_1 + A1_2*B2_1, A1_1*B1_2 + A1_2*B2_2]
[A2_1*B1_1 + A2_2*B2_1, A2_1*B1_2 + A2_2*B2_2]
  Name      Size            Bytes  Class        Attributes
  X         2x2                 8  symmatrix
  Y         2x2                 8  sym

The range of functions that can be applied to a `symmatrix` is as follows:

>> methods symmatrix

Methods for class symmatrix:

adjoint     horzcat         mldivide   symmatrix
cat         isempty         mpower     symmatrix2sym
conj        isequal         mrdivide   tan
cos         isequaln        mtimes     times
ctranspose  kron            norm       trace
det         latex           plus       transpose
diff        ldivide         power      uminus
disp        length          pretty     uplus
display     log             rdivide    vertcat
eq          matlabFunction  sin
exp         minus           size

Static methods:

empty

In order to invert `A*B` in this example, or find its eigenvalues, use `inv(Y)` or `eig(Y)`.

Last week I posted the fiftieth in my “What Is” series of articles. I began the series just over a year ago, in March 2020. The original aim was to provide “brief descriptions of important concepts in numerical analysis and related areas, with a focus on topics that arise in my research”, and the articles were meant to be short, widely accessible, and contain a minimum of mathematical symbols, equations, and citations. I have largely kept to these aims, though for some topics there is a lot to say and I have been more lengthy.

The articles are also available in PDF form on GitHub.

Below is a list of all the “What Is” articles published at the time of writing, in alphabetical order.

If there is a topic you would like me to cover, please put it in the comments below.

- What Is a Block Matrix?
- What Is a Cholesky Factorization?
- What Is a Companion Matrix?
- What Is a Condition Number?
- What Is a Correlation Matrix?
- What is a Diagonally Dominant Matrix?
- What Is a Fractional Matrix Power?
- What Is a Fréchet Derivative?
- What Is a Generalized Inverse?
- What Is a Hadamard Matrix?
- What Is a Householder Matrix?
- What Is a Matrix Function?
- What Is a Matrix Square Root?
- What Is a Matrix?
- What Is a Modified Cholesky Factorization?
- What Is a (Non)normal Matrix?
- What Is a QR Factorization?
- What Is a Random Orthogonal Matrix?
- What is a Sparse Matrix?
- What Is a Symmetric Positive Definite Matrix?
- What Is a Unitarily Invariant Norm?
- What Is an M-Matrix?
- What Is an Orthogonal Matrix?
- What Is Backward Error?
- What Is Bfloat16 Arithmetic?
- What Is Floating-Point Arithmetic?
- What Is IEEE Standard Arithmetic?
- What is Numerical Stability?
- What Is Rounding?
- What Is Stochastic Rounding?
- What Is the Adjugate of a Matrix?
- What is the Cayley–Hamilton Theorem?
- What Is the Complex Step Approximation?
- What Is the CS Decomposition?
- What Is the Gerstenhaber Problem?
- What Is the Growth Factor for Gaussian Elimination?
- What Is the Hilbert Matrix?
- What is the Kronecker Product?
- What Is the Log-Sum-Exp Function?
- What Is the Matrix Exponential?
- What Is the Matrix Logarithm?
- What Is the Matrix Sign Function?
- What Is the Matrix Unwinding Function?
- What Is the Nearest Positive Semidefinite Matrix?
- What Is the Nearest Symmetric Matrix?
- What is the Polar Decomposition?
- What Is the Sherman–Morrison–Woodbury Formula?
- What Is the Singular Value Decomposition?
- What Is the Softmax Function?
- What Is the Sylvester Equation?

is singular because is a null vector. A useful definition of a matrix with large diagonal requires a stronger property.

A matrix $A\in\mathbb{C}^{n\times n}$ is *diagonally dominant by rows* if

$$|a_{ii}| \ge \sum_{j\ne i}|a_{ij}|, \quad i = 1\colon n. \qquad (2)$$

It is *strictly diagonally dominant by rows* if strict inequality holds in (2) for all $i$. $A$ is *(strictly) diagonally dominant by columns* if $A^T$ is (strictly) diagonally dominant by rows.

Diagonal dominance on its own is not enough to ensure nonsingularity, as the matrix (1) shows. Strict diagonal dominance does imply nonsingularity, however.

Theorem 1. If $A\in\mathbb{C}^{n\times n}$ is strictly diagonally dominant by rows or columns then it is nonsingular.

Proof. Since $A$ is nonsingular if and only if $A^T$ is nonsingular, it suffices to consider diagonal dominance by rows. For any nonzero $x$ let $y = Ax$ and choose $k$ so that $|x_k| = \max_i |x_i|$. Then the $k$th equation of $y = Ax$ can be written

$$a_{kk}x_k = y_k - \sum_{j\ne k} a_{kj}x_j,$$

which gives

$$|a_{kk}||x_k| \le |y_k| + \sum_{j\ne k}|a_{kj}||x_j| \le |y_k| + |x_k|\sum_{j\ne k}|a_{kj}|.$$

Using (2), we have

$$|y_k| \ge |x_k|\Bigl(|a_{kk}| - \sum_{j\ne k}|a_{kj}|\Bigr) > 0. \qquad (3)$$

Therefore $y \ne 0$ and so $A$ is nonsingular.
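A quick numerical illustration of Theorem 1 in Python/NumPy (the matrix is a made-up example):

```python
import numpy as np

def strictly_diag_dominant_rows(A):
    """True if |a_ii| > sum_{j != i} |a_ij| for every row i."""
    d = np.abs(np.diag(A))
    off = np.abs(A).sum(axis=1) - d
    return bool(np.all(d > off))

A = np.array([[ 4., -1.,  1.],
              [ 1.,  5.,  2.],
              [ 0., -2.,  6.]])
print(strictly_diag_dominant_rows(A))  # strictly dominant by rows...
print(abs(np.linalg.det(A)))           # ...so nonsingular (Theorem 1)
```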

Diagonal dominance plus two further conditions is enough to ensure nonsingularity. We need the notion of irreducibility. A matrix $A\in\mathbb{C}^{n\times n}$ is *irreducible* if there does not exist a permutation matrix $P$ such that

$$P^TAP = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}$$

with $A_{11}$ and $A_{22}$ square matrices. Irreducibility is equivalent to the directed graph of $A$ being strongly connected.

Theorem 2. If $A\in\mathbb{C}^{n\times n}$ is irreducible and diagonally dominant by rows with strict inequality in (2) for some $i$ then it is nonsingular.

Proof. The proof is by contradiction. Suppose there exists $x \ne 0$ such that $Ax = 0$, normalized so that $\max_i |x_i| = 1$. Define

$$J = \{\, k : |x_k| = 1 \,\}, \qquad J^c = \{1, 2, \dots, n\} \setminus J.$$

The $k$th equation of $Ax = 0$ can be written

$$a_{kk}x_k = -\sum_{j\ne k} a_{kj}x_j.$$

Hence for $k \in J$,

$$|a_{kk}| \le \sum_{j\in J,\, j\ne k} |a_{kj}| + \sum_{j\in J^c} |a_{kj}||x_j|. \qquad (4)$$

The set $J^c$ is nonempty, because if it were empty then we would have $|x_j| = 1$ for all $j$ and if there is strict inequality in (2) for $i = k$, then putting $i = k$ in (4) would give $|a_{kk}| \le \sum_{j\ne k}|a_{kj}| < |a_{kk}|$, which is a contradiction. Hence as long as $a_{kj} \ne 0$ for some $k\in J$ and $j\in J^c$, we obtain from (4), since $|x_j| < 1$ for $j\in J^c$, that $|a_{kk}| < \sum_{j\ne k}|a_{kj}|$, which contradicts the diagonal dominance. Therefore we must have $a_{kj} = 0$ for all $k\in J$ and all $j\in J^c$. This means that all the rows indexed by $J$ have zeros in the columns indexed by $J^c$, which means that $A$ is reducible. This is a contradiction, so $A$ must be nonsingular.

The obvious analogue of Theorem 2 holds for column diagonal dominance.

As an example, the symmetric tridiagonal matrix (minus the second difference matrix)

$$A = \begin{bmatrix} 2 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{bmatrix} \in\mathbb{R}^{n\times n} \qquad (5)$$

is row diagonally dominant with strict inequality in the first and last diagonal dominance relations. It can also be shown to be irreducible and so it is nonsingular by Theorem 2. If we replace $a_{11}$ or $a_{nn}$ by $1$, then $A$ remains nonsingular by the same argument. What if we replace *both* $a_{11}$ and $a_{nn}$ by $1$? We can answer this question by using an observation of Strang. If we define the rectangular matrix

$$L = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \\ & & & -1 \end{bmatrix} \in \mathbb{R}^{(n+1)\times n},$$

then $L^TL = A$ and

$$LL^T = \begin{bmatrix} 1 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 1 \end{bmatrix} \in\mathbb{R}^{(n+1)\times(n+1)},$$

which is the $(n+1)\times(n+1)$ version of $A$ with its first and last diagonal entries replaced by $1$. Since in general $X^TX$ and $XX^T$ have the same nonzero eigenvalues, we conclude that $\Lambda(LL^T) = \Lambda(L^TL)\cup\{0\}$, where $\Lambda(\cdot)$ denotes the spectrum. Hence $A = L^TL$ is symmetric positive definite and $LL^T$ is singular and symmetric positive semidefinite.

Theorem 1 can be used to obtain information about the location of the eigenvalues of a matrix. Indeed if $\lambda$ is an eigenvalue of $A$ then $A - \lambda I$ is singular and hence cannot be strictly diagonally dominant, by Theorem 1. So $|a_{ii} - \lambda| > \sum_{j\ne i}|a_{ij}|$ cannot be true for all $i$. Gershgorin’s theorem is simply a restatement of this fact.

Theorem 3 (Gershgorin’s theorem). The eigenvalues of $A\in\mathbb{C}^{n\times n}$ lie in the union of the $n$ discs in the complex plane

$$D_i = \Bigl\{\, z\in\mathbb{C} : |z - a_{ii}| \le \sum_{j\ne i}|a_{ij}| \,\Bigr\}, \quad i = 1\colon n.$$

If $A$ is symmetric with positive diagonal elements and satisfies the conditions of Theorem 1 or Theorem 2 then it is positive definite. Indeed the eigenvalues are real and so in Gershgorin’s theorem the discs are intervals $[a_{ii} - r_i, a_{ii} + r_i]$, with $r_i = \sum_{j\ne i}|a_{ij}| \le a_{ii}$, so the eigenvalues are nonnegative, and hence positive since nonzero. This provides another proof that the matrix in (5) is positive definite.
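Gershgorin’s theorem is easy to check numerically. A Python/NumPy sketch with an arbitrary example matrix:

```python
import numpy as np

def gershgorin_discs(A):
    """Centers a_ii and radii sum_{j != i} |a_ij| of the Gershgorin discs."""
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return centers, radii

A = np.array([[10., 1.,  0.],
              [ 2., 0.,  1.],
              [ 1., 1., -5.]])
centers, radii = gershgorin_discs(A)
in_some_disc = all(
    any(abs(lam - c) <= r + 1e-12 for c, r in zip(centers, radii))
    for lam in np.linalg.eigvals(A)
)
print(in_some_disc)
```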

In some situations is not diagonally dominant but a row or column scaling of it is. For example, the matrix

is not diagonally dominant by rows or columns but

is strictly diagonally dominant by rows.

A matrix $A\in\mathbb{C}^{n\times n}$ is *generalized diagonally dominant by rows* if $AD$ is diagonally dominant by rows for some diagonal matrix $D = \mathrm{diag}(d_i)$ with $d_i > 0$ for all $i$, that is, if

$$|a_{ii}|d_i \ge \sum_{j\ne i}|a_{ij}|d_j, \quad i = 1\colon n. \qquad (6)$$

It is easy to see that if $A$ is irreducible and there is strict inequality in (6) for some $i$ then $A$ is nonsingular by Theorem 2.

It can be shown that $A$ is generalized diagonally dominant by rows if and only if it is an $H$-matrix, where an $H$-matrix is a matrix for which the comparison matrix $M(A)$, defined by

$$M(A) = (m_{ij}), \qquad m_{ij} = \begin{cases} |a_{ii}|, & i = j, \\ -|a_{ij}|, & i \ne j, \end{cases}$$

is an $M$-matrix (see What Is an M-Matrix?).

A matrix $A\in\mathbb{C}^{n\times n}$ is *block diagonally dominant by rows* if, for a given norm and block partitioning $A = (A_{ij})$, the diagonal blocks $A_{jj}$ are all nonsingular and

$$\|A_{jj}^{-1}\|^{-1} \ge \sum_{k\ne j} \|A_{jk}\| \quad \text{for each } j.$$

$A$ is block diagonally dominant by columns if $A^T$ is block diagonally dominant by rows. If the blocks are all $1\times 1$ then block diagonal dominance reduces to the usual notion of diagonal dominance. Block diagonal dominance holds for certain block tridiagonal matrices arising in the discretization of PDEs.

Analogues of Theorems 1 and 2 giving conditions under which block diagonal dominance implies nonsingularity are given by Feingold and Varga (1962).

If a matrix is strictly diagonally dominant then we can bound the norm of its inverse in terms of the minimum amount of diagonal dominance. For full generality, we state the bound in terms of generalized diagonal dominance.

Theorem 4. If $A\in\mathbb{C}^{n\times n}$ and $AD$ is strictly diagonally dominant by rows for a diagonal matrix $D = \mathrm{diag}(d_i)$ with $d_i > 0$ for all $i$, then

$$\|A^{-1}\|_\infty \le \frac{\max_i d_i}{\alpha},$$

where $\alpha = \min_i \bigl( |a_{ii}|d_i - \sum_{j\ne i}|a_{ij}|d_j \bigr)$.

Proof. Assume first that $D = I$. Let $y$ satisfy $\|A^{-1}\|_\infty = \|A^{-1}y\|_\infty/\|y\|_\infty$ and let $x = A^{-1}y$. Applying (3) gives $\|y\|_\infty \ge \alpha\|x\|_\infty$, so $\|A^{-1}\|_\infty \le \alpha^{-1}$. The result is obtained on applying this bound to $AD$ and using $A^{-1} = D(AD)^{-1}$.

Another bound for $\|A^{-1}\|_\infty$ when $A$ is strictly diagonally dominant by rows can be obtained by writing $A = D(I + E)$, where $D = \mathrm{diag}(a_{ii})$, $e_{ii} = 0$, and $e_{ij} = a_{ij}/a_{ii}$ for $i \ne j$. It is easy to see that $\|E\|_\infty < 1$, which gives another proof that $A$ is nonsingular. Then

$$\|A^{-1}\|_\infty = \|(I+E)^{-1}D^{-1}\|_\infty \le \frac{\max_i |a_{ii}|^{-1}}{1 - \|E\|_\infty}.$$

Applied to $M(A)$, which is also strictly diagonally dominant by rows, this argument shows that $M(A)^{-1}$ exists and is nonnegative, so in view of its sign pattern $M(A)$ is an $M$-matrix, which essentially proves one direction of the $H$-matrix equivalence in the previous section. The same bound holds if $A$ is strictly diagonally dominant by columns, by writing $A = (I + F)D$.

An upper bound also holds for block diagonal dominance.

Theorem 5. If $A$ is block diagonally dominant by rows then

$$\|A^{-1}\| \le \frac{1}{\gamma},$$

where $\gamma = \min_j \bigl( \|A_{jj}^{-1}\|^{-1} - \sum_{k\ne j}\|A_{jk}\| \bigr)$.

It is interesting to note that the inverse of a strictly row diagonally dominant matrix enjoys a form of diagonal dominance, namely that the largest element in each column is on the diagonal.

Theorem 6. If $A\in\mathbb{C}^{n\times n}$ is strictly diagonally dominant by rows then $B = A^{-1}$ satisfies $|b_{ij}| < |b_{jj}|$ for all $i \ne j$.

Proof. For $i \ne j$ we have $\sum_k a_{ik}b_{kj} = \delta_{ij} = 0$. Let $|b_{mj}| = \max_k |b_{kj}|$. Taking absolute values in $a_{ii}b_{ij} = -\sum_{k\ne i} a_{ik}b_{kj}$ gives

$$|a_{ii}||b_{ij}| \le \sum_{k\ne i}|a_{ik}||b_{kj}| \le |b_{mj}|\sum_{k\ne i}|a_{ik}| < |a_{ii}||b_{mj}|,$$

or $|b_{ij}| < |b_{mj}|$, since $a_{ii} \ne 0$. This inequality holds for all $i \ne j$, so we must have $m = j$, which gives the result.

Theorems 1 and 2 have a long history and have been rediscovered many times. Theorem 1 was first stated by Lévy (1881) with additional assumptions. In a short but influential paper, Taussky (1949) pointed out the recurring nature of the theorems and gave simple proofs (our proof of Theorem 2 is Taussky’s). Schneider (1977) attributes the surge in interest in matrix theory in the 1950s and 1960s to Taussky’s paper and a few others by her, Brauer, Ostrowski, and Wielandt. The history of Gershgorin’s theorem (published in 1931) is intertwined with that of Theorems 1 and 2; see Varga’s 2004 book for details.

Theorems 4 and 5 are from Varah (1975) and Theorem 6 is from Ostrowski (1952).

This is a minimal set of references, which contain further useful references within.

- David G. Feingold and Richard S. Varga, Block Diagonally Dominant Matrices and Generalizations of the Gerschgorin Circle Theorem, Pacific J. Math. 12(4), 1241–1250, 1962.
- A. M. Ostrowski, Note on Bounds for Determinants with Dominant Principal Diagonal, Proc. Amer. Math. Soc. 3, 26–30, 1952.
- Hans Schneider, Olga Taussky-Todd’s Influence on Matrix Theory and Matrix Theorists: A Discursive Personal Tribute, Linear and Multilinear Algebra 5, 197–224, 1977.
- Olga Taussky, A Recurring Theorem on Determinants, Amer. Math. Monthly 56(2), 672–676, 1949.
- J. M. Varah, A Lower Bound for the Smallest Singular Value of a Matrix, Linear Algebra Appl. 11, 3–5, 1975.
- Richard Varga, Geršgorin and His Circles, Springer-Verlag, Berlin, 2004.


We denote by $\|\cdot\|$ any matrix norm, and we take the consistency condition $\|AB\| \le \|A\|\,\|B\|$ as one of the defining properties of a matrix norm.

It will be useful to note that

$$\begin{bmatrix} 1 & -\theta \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & \theta \\ 0 & 1 \end{bmatrix}$$

and that more generally the inverse of the upper triangular matrix $T(\theta)\in\mathbb{R}^{n\times n}$ with

$$t_{ij} = \begin{cases} 1, & i = j, \\ -\theta, & i < j, \end{cases} \qquad (1)$$

is given by

$$\bigl(T(\theta)^{-1}\bigr)_{ij} = \begin{cases} 1, & i = j, \\ \theta(1+\theta)^{j-i-1}, & i < j. \end{cases} \qquad (2)$$

First, we consider a general matrix $A\in\mathbb{C}^{n\times n}$ and let $\lambda$ be an eigenvalue with $|\lambda| = \rho(A)$ (the spectral radius) and $x$ a corresponding eigenvector. With $X = xe^T$, where $e$ is the vector of ones, $AX = \lambda X$, so

$$|\lambda|\,\|X\| = \|AX\| \le \|A\|\,\|X\|,$$

which implies $|\lambda| \le \|A\|$ since $X \ne 0$. Hence

$$\rho(A) \le \|A\|.$$

Let $T$ be a triangular matrix. Applying the latter bound to $T^{-1}$, whose eigenvalues are its diagonal entries $t_{ii}^{-1}$, gives

$$\|T^{-1}\| \ge \frac{1}{\min_i |t_{ii}|}. \qquad (3)$$

Combining this bound with the analogous bound for $\|T\|$ gives

$$\kappa(T) = \|T\|\,\|T^{-1}\| \ge \frac{\max_i |t_{ii}|}{\min_i |t_{ii}|}. \qquad (4)$$

We note that commonly used norms satisfy $\|A\| \ge \max_{i,j}|a_{ij}|$, which yields another proof of (3) and (4).

For any nonzero $x$ and $y$ such that $x = Ty$ we have the lower bound $\|T^{-1}\| \ge \|y\|/\|x\|$. We can choose $x$ and then solve the triangular system $Ty = x$ for $y$ to obtain the lower bound. Condition number estimation techniques, which we will describe in another article, provide ways to choose $x$ that usually yield estimates of $\|T^{-1}\|$ correct to within an order of magnitude.

For the $2$-norm, we can choose a starting vector $x$ and then compute $y = (T^*T)^{-k}x$ by repeated triangular solves with $T$ and $T^*$, obtaining the lower bound $(\|y\|_2/\|x\|_2)^{1/(2k)} \le \|T^{-1}\|_2$. This bound is simply the power method applied to $(T^*T)^{-1}$.
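The lower-bound idea costs only one triangular solve. A Python/NumPy sketch with a simple, deliberately unsophisticated choice of $x$, using the $\infty$-norm:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
n = 8
T = np.triu(rng.standard_normal((n, n))) + n * np.eye(n)

x = np.ones(n)                 # a simple (not necessarily good) choice of x
y = solve_triangular(T, x)     # solve T y = x
lower = np.linalg.norm(y, np.inf) / np.linalg.norm(x, np.inf)
exact = np.linalg.norm(np.linalg.inv(T), np.inf)
print(lower <= exact + 1e-12)  # ||y||/||x|| never exceeds ||T^{-1}||
```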

Let $T$ be an upper triangular matrix. The upper bounds for $\|T^{-1}\|$ that we will discuss depend only on the absolute values of the elements of $T$. This limits the ability of the bounds to distinguish between well-conditioned and ill-conditioned matrices. For example, consider

The bounds for and will be the same, yet the inverses are of different sizes (the more so as the dimension increases).

Let $T\in\mathbb{C}^{n\times n}$ be a nonsingular upper triangular matrix and write

$$T = D(I - N), \qquad D = \mathrm{diag}(t_{ii}),$$

where $N$ is strictly upper triangular and hence nilpotent with $N^n = 0$. Then

$$T^{-1} = (I - N)^{-1}D^{-1} = (I + N + N^2 + \cdots + N^{n-1})D^{-1}.$$

Taking absolute values and using the triangle inequality gives

$$|T^{-1}| \le (I + |N| + |N|^2 + \cdots + |N|^{n-1})\,|D^{-1}|, \qquad (5)$$

where the inequalities hold elementwise.

The comparison matrix $M(A)$ associated with a general $A\in\mathbb{C}^{n\times n}$ is the matrix with

$$m_{ij} = \begin{cases} |a_{ii}|, & i = j, \\ -|a_{ij}|, & i \ne j. \end{cases}$$

It is not hard to see that $M(T)^{-1} = (I + |N| + |N|^2 + \cdots + |N|^{n-1})\,|D^{-1}|$ is upper triangular with nonnegative elements and so the bound (5) is

$$|T^{-1}| \le M(T)^{-1}.$$

If we replace every element above the diagonal of $M(T)$ by the most negative off-diagonal element in its row we obtain the upper triangular matrix $W(T)$ with

$$w_{ij} = \begin{cases} |t_{ii}|, & i = j, \\ -\max_{k>i}|t_{ik}|, & i < j. \end{cases}$$

Then $M(T) \ge W(T)$, where the inequality holds elementwise, so

$$|T^{-1}| \le M(T)^{-1} \le W(T)^{-1}.$$

Finally, let $Z(T) = \alpha I - \beta E$, where $\alpha = \min_i|t_{ii}|$, $\beta = \max_{i<j}|t_{ij}|$, and $E$ is strictly upper triangular with every element above the diagonal equal to $1$, that is,

$$z_{ij} = \begin{cases} \alpha, & i = j, \\ -\beta, & i < j. \end{cases}$$

Then

$$|T^{-1}| \le M(T)^{-1} \le W(T)^{-1} \le Z(T)^{-1}.$$

We note that $M(T)$, $W(T)$, and $Z(T)$ are all nonsingular $M$-matrices. We summarize the bounds.

Theorem 1. If $T\in\mathbb{C}^{n\times n}$ is a nonsingular upper triangular matrix then

$$|T^{-1}| \le M(T)^{-1} \le W(T)^{-1} \le Z(T)^{-1}. \qquad (6)$$

We make two remarks.

- The bounds (6) are equally valid for lower triangular matrices as long as the maxima in the definitions of $W(T)$ and $Z(T)$ are taken over columns instead of rows.
- We could equally well have written $T = (I - N)D$. The comparison matrix $M(T)$ is unchanged, and (6) continues to hold as long as the maxima in the definitions of $W(T)$ and $Z(T)$ are taken over columns rather than rows.

It follows from the theorem that

$$\|T^{-1}\| \le \|M(T)^{-1}\| \le \|W(T)^{-1}\| \le \|Z(T)^{-1}\|$$

for the 1-, 2-, and $\infty$-norms and the Frobenius norm. Now $M(T)$, $W(T)$, and $Z(T)$ all have nonnegative inverses, and for a matrix $B$ with nonnegative inverse we have $\|B^{-1}\|_\infty = \|B^{-1}e\|_\infty$, where $e$ is the vector of ones. Hence

$$\|T^{-1}\|_\infty \le \underbrace{\|M(T)^{-1}e\|_\infty}_{O(n^2)} \le \underbrace{\|W(T)^{-1}e\|_\infty}_{O(n)} \le \underbrace{\|Z(T)^{-1}e\|_\infty}_{O(1)},$$

where the big-Oh expressions show the asymptotic cost in flops of evaluating each term by solving the relevant triangular system. As the bounds become less expensive to compute they become weaker. The quantity $\|Z(T)^{-1}\|_\infty$ can be explicitly evaluated, using (2). It has the same value for the $1$-norm, and since $\|A\|_2 \le \sqrt{\|A\|_1\|A\|_\infty}$ we have

$$\|T^{-1}\|_p \le \frac{(\alpha+\beta)^{n-1}}{\alpha^n}, \quad p = 1, 2, \infty. \qquad (7)$$
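The chain of bounds is cheap to verify numerically. A Python/NumPy sketch comparing $\|T^{-1}\|_\infty$ with the comparison-matrix bound $\|M(T)^{-1}e\|_\infty$ and with the explicit quantity $(\alpha+\beta)^{n-1}/\alpha^n$ (the matrix is a made-up example):

```python
import numpy as np
from scipy.linalg import solve_triangular

def comparison_matrix(T):
    """M(T): |t_ii| on the diagonal, -|t_ij| off it."""
    M = -np.abs(T)
    np.fill_diagonal(M, np.abs(np.diag(T)))
    return M

rng = np.random.default_rng(3)
n = 6
T = np.triu(rng.standard_normal((n, n))) + 3 * np.eye(n)

exact = np.linalg.norm(np.linalg.inv(T), np.inf)
m_bound = np.linalg.norm(solve_triangular(comparison_matrix(T), np.ones(n)),
                         np.inf)
alpha = np.abs(np.diag(T)).min()
beta = np.abs(np.triu(T, 1)).max()
z_bound = (alpha + beta) ** (n - 1) / alpha ** n
print(exact <= m_bound + 1e-12 and m_bound <= z_bound + 1e-12)
```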

This bound is an equality for $p = 1$ and $p = \infty$ for the matrix $T(\theta)$ in (1).

For the Frobenius norm, evaluating , and using , gives

For the $2$-norm, either of (7) and (8) can be the smaller bound, depending on the values of $n$, $\alpha$, and $\beta$.

For the special case of a bidiagonal matrix $B$ it is easy to show that $|B^{-1}| = M(B)^{-1}$, and so $\|B^{-1}\|_\infty = \|M(B)^{-1}e\|_\infty$ can be computed exactly in $O(n)$ flops.

These upper bounds can be arbitrarily weak, even for a fixed , as shown by the example

for which

As , . On the other hand, the overestimation is bounded as a function of for triangular matrices resulting from certain pivoting strategies.

Theorem 2. Suppose the upper triangular matrix $T\in\mathbb{C}^{n\times n}$ satisfies

$$|t_{ii}| \ge |t_{ij}|, \quad j > i. \qquad (9)$$

Then, for the $1$-, $2$-, and $\infty$-norms,

Proof. The first four inequalities are a combination of (3) and (6). The fifth inequality is obtained from the expression (7).

Condition (9) is satisfied for the triangular factors from QR factorization with column pivoting and for the transpose of the unit lower triangular factors from LU factorization with any form of pivoting.

The upper bounds we have described have been derived independently by several authors, as explained by Higham (2002).

- Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002. (Chapter 8.)

Alternatively, can be transposed and permuted so that the coefficients appear in the first or last column or the last row. By expanding the determinant about the first row it can be seen that

so the coefficients in the first row of are the coefficients of its characteristic polynomial. (Alternatively, in add times the th column to the last column for , to obtain as the new last column, and expand the determinant about the last column.) MacDuffee (1946) introduced the term “companion matrix” as a translation from the German “Begleitmatrix”.

Setting $\lambda = 0$ in the characteristic polynomial gives $\det(C) = \pm a_0$, so $C$ is nonsingular if and only if $a_0 \ne 0$. The inverse is

Note that is in companion form, where is the reverse identity matrix, and the coefficients are those of the polynomial , whose roots are the reciprocals of those of .

A companion matrix has some low rank structure. It can be expressed as a unitary matrix plus a rank-$1$ matrix: a cyclic permutation matrix accounts for the ones below the diagonal, and a rank-$1$ correction supplies the row of coefficients.

Also, differs from in just the first and last columns, so , where is a rank- matrix.

If $\lambda$ is an eigenvalue of $C$ then $[\lambda^{n-1}, \lambda^{n-2}, \dots, \lambda, 1]^T$ is a corresponding eigenvector. The last $n-1$ rows of $C - \lambda I$ are clearly linearly independent for any $\lambda$, which implies that $C$ is nonderogatory, that is, no two Jordan blocks in the Jordan canonical form contain the same eigenvalue. In other words, the characteristic polynomial is the same as the minimal polynomial.

The MATLAB function `compan` takes as input a vector $[p_1, p_2, \dots, p_{n+1}]$ of the coefficients of a polynomial, $p_1x^n + p_2x^{n-1} + \cdots + p_{n+1}$, and returns the companion matrix, which has first row $-p_2/p_1$, …, $-p_{n+1}/p_1$.

Perhaps surprisingly, the singular values of have simple representations, found by Kenney and Laub (1988):

where . These formulae generalize to block companion matrices, as shown by Higham and Tisseur (2003).

Companion matrices arise naturally when we convert a high order difference equation or differential equation to first order. For example, consider the Fibonacci numbers $1, 1, 2, 3, 5, 8, \ldots$, which satisfy the recurrence $f_k = f_{k-1} + f_{k-2}$ for $k > 2$, with $f_1 = f_2 = 1$. We can write

$$\begin{bmatrix} f_{k+1} \\ f_k \end{bmatrix}
 = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}
   \begin{bmatrix} f_k \\ f_{k-1} \end{bmatrix}
 = \cdots
 = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^{k-1}
   \begin{bmatrix} f_2 \\ f_1 \end{bmatrix},$$

where the $2\times 2$ coefficient matrix is a companion matrix. This expression can be used to compute $f_k$ in $O(\log_2 k)$ operations by computing the matrix power using binary powering.
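A binary powering sketch in plain Python (illustrative; exact integer arithmetic):

```python
def mat_mult(X, Y):
    """2x2 integer matrix product."""
    return ((X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]),
            (X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]))

def fib(k):
    """f_k in O(log k) matrix products via binary powering of [[1,1],[1,0]]."""
    M = ((1, 1), (1, 0))          # companion matrix of x^2 - x - 1
    P = ((1, 0), (0, 1))          # identity; accumulates M^(k-1)
    e = k - 1
    while e:
        if e & 1:
            P = mat_mult(P, M)
        M = mat_mult(M, M)
        e >>= 1
    # [f_{k+1}, f_k]^T = M^(k-1) [f_2, f_1]^T with f_1 = f_2 = 1
    return P[1][0] + P[1][1]

print([fib(k) for k in range(1, 11)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```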

As another example, consider the differential equation

Define new variables

Then

or

so the third order scalar equation has been converted into a first order system with a companion matrix as coefficient matrix.

The MATLAB function `roots` takes as input a vector of the coefficients of a polynomial and returns the roots of the polynomial. It computes the eigenvalues of the companion matrix associated with the polynomial using the `eig` function. As Moler (1991) explained, MATLAB has used this approach since the first version of MATLAB, but it does not take advantage of the structure of the companion matrix, requiring $O(n^3)$ flops and $O(n^2)$ storage instead of the $O(n^2)$ flops and $O(n)$ storage that should be possible given the structure of $C$. Since the early 2000s much research has aimed at deriving methods that achieve this objective, but numerically stable methods proved elusive. Finally, a backward stable algorithm requiring $O(n^2)$ flops and $O(n)$ storage was developed by Aurentz, Mach, Vandebril, and Watkins (2015). It uses the QR algorithm and exploits the unitary plus low rank structure of the companion matrix. Here, backward stability means that the computed roots are the eigenvalues of $C + \Delta C$ for some $\Delta C$ with $\|\Delta C\| \le c_n u \|C\|$, where $c_n$ is a constant and $u$ is the unit roundoff. It is not necessarily the case that the computed roots are the exact roots of a polynomial with coefficients $a_i + \Delta a_i$ with $|\Delta a_i| \le c_n u |a_i|$ for all $i$.
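The companion-matrix approach behind `roots` is easy to emulate in Python/NumPy (an $O(n^3)$ illustration of the idea, not the structured $O(n^2)$ algorithm):

```python
import numpy as np

def poly_roots(coeffs):
    """Roots of p(x) = c0*x^n + ... + cn via the eigenvalues of the
    companion matrix with the (normalized) coefficients in its first row."""
    c = np.asarray(coeffs, dtype=float)
    c = c / c[0]                   # make the polynomial monic
    n = c.size - 1
    C = np.zeros((n, n))
    C[0, :] = -c[1:]               # first row carries the coefficients
    C[1:, :-1] = np.eye(n - 1)     # subdiagonal of ones
    return np.linalg.eigvals(C)

r = np.sort(poly_roots([1., -3., 2.]))   # x^2 - 3x + 2 = (x - 1)(x - 2)
print(np.allclose(r, [1., 2.]))
```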

It is an interesting observation that

Multiplying by the inverse of the matrix on the left we express the companion matrix as the product of two symmetric matrices. The obvious generalization of this factorization to matrices shows that we can write

We need the rational canonical form of a matrix, described in the next theorem, which Halmos (1991) calls “the deepest theorem of linear algebra”. Let $\mathbb{F}$ denote the field $\mathbb{R}$ or $\mathbb{C}$.

Theorem 1 (rational canonical form). If $A \in \mathbb{F}^{n\times n}$ then there exists a nonsingular $X \in \mathbb{F}^{n\times n}$ such that $X^{-1}AX = \mathrm{diag}(C_1, C_2, \dots, C_k)$, with each $C_i$ a companion matrix.

The theorem says that every matrix is similar over the underlying field to a block diagonal matrix composed of companion matrices. Since we do not need it, we have omitted from the statement of the theorem the description of the $C_i$ in terms of the irreducible factors of the characteristic polynomial. Combining the factorization (2) and Theorem 1 we obtain

Since one of the two symmetric factors in this product is nonsingular, and the other can alternatively be taken nonsingular by applying the factorization to $A^T$, this proves a theorem of Frobenius.

Theorem 2 (Frobenius, 1910). For any $A \in \mathbb{C}^{n\times n}$ there exist symmetric $S_1, S_2 \in \mathbb{C}^{n\times n}$, either one of which can be taken nonsingular, such that $A = S_1S_2$.

Note that if $A = S_1S_2$ with the $S_i$ symmetric then $AS_1 = S_1S_2S_1$, so $AS_1$ is symmetric. Likewise, $S_2A = S_2S_1S_2$ is symmetric.

Fiedler (2003) noted that a companion matrix can be factorized into the product of simpler factors, most of them being the identity matrix with a $2 \times 2$ block placed on the diagonal, and he used this factorization to determine a matrix similar to $C$. An example is

In general, Fiedler’s construction yields an $n \times n$ pentadiagonal matrix that is not simply a permutation similarity of $C$. The fact that it has block diagonal factors opens the possibility of obtaining new methods for finding the eigenvalues of $C$. This line of research has been extensively pursued in the context of polynomial eigenvalue problems (see Mackey, 2013).

The companion matrix is associated with the monomial basis representation of the characteristic polynomial. Other polynomial bases can be used, notably orthogonal polynomials, and this leads to generalizations of the companion matrix having coefficients on the main diagonal and the subdiagonal and superdiagonal. Good (1961) calls the matrix resulting from the Chebyshev basis a *colleague matrix*. Barnett (1981) calls the matrices corresponding to orthogonal polynomials *comrade matrices*, and for a general polynomial basis he uses the term *confederate matrices*. Generalizations of the properties of companion matrices can be derived for these classes of matrices.

Since the roots of a polynomial are the eigenvalues of the associated companion matrix, or a Fiedler matrix similar to it, or indeed the associated comrade matrix or confederate matrix, one can obtain bounds on the roots by applying any available bounds for matrix eigenvalues. For example, since any eigenvalue $\lambda$ of a matrix $A$ satisfies $|\lambda| \le \|A\|$ for any consistent matrix norm, by taking the $1$-norm and the $\infty$-norm of the companion matrix we find that any root $\lambda$ of the polynomial satisfies

either of which can be the smaller. A rich variety of such bounds is available, and these techniques extend to matrix polynomials and the corresponding block companion matrices.

This is a minimal set of references, which contain further useful references within.

- Jared L. Aurentz, Thomas Mach, Raf Vandebril, and David S. Watkins, Fast and Backward Stable Computation of Roots of Polynomials, SIAM J. Matrix Anal. Appl. 36(3), 942–973, 2015.
- Stephen Barnett, Congenial Matrices, Linear Algebra Appl. 41, 277–298, 1981.
- Fernando De Terán and Froilán M. Dopico and Javier Pérez, New Bounds for Roots of Polynomials Based on Fiedler Companion Matrices, Linear Algebra Appl. 451, 197–230, 2014.
- Miroslav Fiedler, A Note on Companion Matrices, Linear Algebra Appl. 372, 325–331, 2003.
- Nicholas J. Higham and Françoise Tisseur, Bounds for Eigenvalues of Matrix Polynomials, Linear Algebra Appl. 358, 5–22, 2003.
- Charles S. Kenney and Alan J. Laub, Controllability and Stability Radii for Companion Form Systems, Math. Control Signals Systems 1, 239–256, 1988.
- Cyrus Colton MacDuffee, The Theory of Matrices, Chelsea, New York, 1946.
- D. Steven Mackey, The Continuing Influence of Fiedler’s Work on Companion Matrices, Linear Algebra Appl. 439, 810–817, 2013.
- Cleve Moler, ROOTS—of Polynomials, That Is, The MathWorks Newsletter 5(1), 1991.
- Olga Taussky, The Role of Symmetric Matrices in the Study of General Matrices, Linear Algebra Appl. 5, 147–154, 1972.

Here, $\rho(B)$ is the spectral radius of $B$, that is, the largest modulus of any eigenvalue of $B$, and $B \ge 0$ denotes that $B$ has nonnegative entries. An M-matrix clearly has nonpositive off-diagonal elements. It also has positive diagonal elements, which can be shown using the result that $\max_i b_{ii} \le \rho(B)$ for a nonnegative $B$:

$$a_{ii} = s - b_{ii} \ge s - \rho(B) > 0.$$

Although the definition of an M-matrix does not specify $s$, we can set it to $\max_i a_{ii}$. Indeed let $A = sI - B$ satisfy $s > \rho(B)$ and set $t = \max_i a_{ii}$ and $C = tI - A$. Then $C \ge 0$, since $c_{ii} = t - a_{ii} \ge 0$ and $c_{ij} = -a_{ij} = b_{ij} \ge 0$ for $i \ne j$. Furthermore, for a nonnegative matrix the spectral radius is an eigenvalue, by the Perron–Frobenius theorem, so $\rho(B)$ is an eigenvalue of $B$ and, since $C = (t-s)I + B$, $\rho(C) = t - s + \rho(B)$ is an eigenvalue of $C$. Hence $t - \rho(C) = s - \rho(B) > 0$, so $A = tI - C$ with $C \ge 0$ and $t > \rho(C)$, as required.

The concept of M-matrix was introduced by Ostrowski in 1937. M-matrices arise in a variety of scientific settings, including in finite difference methods for PDEs, input–output analysis in economics, and Markov chains in stochastic processes.

An immediate consequence of the definition is that the eigenvalues of an M-matrix lie in the open right half-plane, which means that M-matrices are special cases of positive stable matrices. Hence an M-matrix is nonsingular and its determinant, being the product of the eigenvalues, is positive. Moreover, since $A = sI - B$ satisfies $\rho(s^{-1}B) < 1$,

$$A^{-1} = s^{-1}(I - s^{-1}B)^{-1} = s^{-1}\sum_{k=0}^{\infty} (s^{-1}B)^k \ge 0.$$
In fact, nonnegativity of the inverse characterizes M-matrices. Define

$$Z^{n\times n} = \{\, A \in \mathbb{R}^{n\times n} : a_{ij} \le 0 \ \text{for all}\ i \ne j \,\}.$$

Theorem 1. A nonsingular $A \in Z^{n\times n}$ is an M-matrix if and only if $A^{-1} \ge 0$.

Sometimes an M-matrix is *defined* to be a matrix with nonpositive off-diagonal elements and a nonnegative inverse. In fact, this condition is just one of a large number of conditions equivalent to a matrix with nonpositive off-diagonal elements being an M-matrix, fifty of which are given in Berman and Plemmons (1994, Chap. 6).

It is easy to check from the definitions, or using Theorem 1, that a triangular matrix with positive diagonal elements and nonpositive off-diagonal elements is an M-matrix. An example is

An M-matrix can be constructed from any nonsingular triangular matrix by taking the comparison matrix. The comparison matrix $M(A)$ associated with a general $A \in \mathbb{C}^{n\times n}$ is the matrix with elements

$$m_{ij} = \begin{cases} |a_{ii}|, & i = j, \\ -|a_{ij}|, & i \ne j. \end{cases}$$

For a nonsingular triangular $T$, $M(T)$ is an M-matrix, and it is easy to show that

$$|T^{-1}| \le M(T)^{-1},$$

where the absolute value is taken componentwise. This bound, and weaker related bounds, can be useful for cheaply bounding the norm of the inverse of a triangular matrix. For example, with $e$ denoting the vector of ones, since $M(T)^{-1}$ is nonnegative we have

$$\|T^{-1}\|_{\infty} \le \|M(T)^{-1}\|_{\infty} = \|M(T)^{-1}e\|_{\infty},$$

and $M(T)^{-1}e$ can be computed in $O(n^2)$ flops by solving a triangular system, whereas computing $T^{-1}$ costs $O(n^3)$ flops.
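The bound is cheap to evaluate, as this Python/NumPy sketch illustrates (the test matrix is our own, and for brevity we use a general solve where a triangular solve would be used in practice):

```python
# Bounding the norm of the inverse of a triangular matrix via the
# comparison matrix M(T): since |inv(T)| <= inv(M(T)) componentwise and
# inv(M(T)) >= 0, norm(inv(T), inf) <= norm(inv(M(T)) @ e, inf), and
# inv(M(T)) @ e requires only one (triangular) solve.
import numpy as np

T = np.array([[ 4.0, -2.0,  1.0],
              [ 0.0,  3.0, -1.0],
              [ 0.0,  0.0,  2.0]])

# Comparison matrix: |t_ii| on the diagonal, -|t_ij| off the diagonal.
M = -np.abs(T)
np.fill_diagonal(M, np.abs(np.diag(T)))

e = np.ones(3)
bound = np.linalg.norm(np.linalg.solve(M, e), np.inf)  # one solve, O(n^2) if triangular
exact = np.linalg.norm(np.linalg.inv(T), np.inf)       # O(n^3)
print(exact <= bound)  # True
```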

More generally, if we have an LU factorization $A = LU$ of an M-matrix $A$ then, since $A^{-1} \ge 0$,

$$\|A^{-1}\|_{\infty} = \|A^{-1}e\|_{\infty} = \|U^{-1}L^{-1}e\|_{\infty}.$$

Therefore the norm of the inverse can be computed in $O(n^2)$ flops with two triangular solves, instead of the $O(n^3)$ flops that would be required if $A^{-1}$ were to be formed explicitly.

There are many analogies between M-matrices and symmetric positive definite matrices. For example, every principal submatrix of a symmetric positive definite matrix is symmetric positive definite and every principal submatrix of an M-matrix is an M-matrix. Indeed if $\widetilde{B}$ is a principal submatrix of a nonnegative $B$ then $\rho(\widetilde{B}) \le \rho(B)$, a standard consequence of the Perron–Frobenius theory of nonnegative matrices. Hence on taking principal submatrices in $A = sI - B$ we have $\widetilde{A} = sI - \widetilde{B}$ with the same $s > \rho(B) \ge \rho(\widetilde{B})$.

A symmetric M-matrix is known as a *Stieltjes matrix*, and such a matrix is positive definite. An example of a Stieltjes matrix is minus the second difference matrix (the tridiagonal matrix arising from a central difference discretization of a second derivative), illustrated for $n = 4$ by

$$A = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}.$$

Since the leading principal submatrices of an M-matrix have positive determinant it follows that an M-matrix $A$ has an LU factorization $A = LU$ with $U$ having positive diagonal elements. However, more is true, as the next result shows.

Theorem 2. An M-matrix $A$ has an LU factorization $A = LU$ in which $L$ and $U$ are M-matrices.

Proof. We can write

$$A = \begin{bmatrix} \alpha & b^T \\ c & E \end{bmatrix}, \qquad \alpha > 0, \quad b \le 0, \quad c \le 0.$$

The first stage of LU factorization is

$$A = \begin{bmatrix} 1 & 0 \\ c/\alpha & I \end{bmatrix} \begin{bmatrix} \alpha & b^T \\ 0 & S \end{bmatrix},$$

where $S = E - cb^T/\alpha$ is the Schur complement of $\alpha$ in $A$. The first column of $L$ and the first row of $U$ are of the form required for a triangular M-matrix. We have

$$S^{-1} = (A^{-1})(2\colon n, 2\colon n).$$

Since $A^{-1} \ge 0$ it follows that $S^{-1} \ge 0$. It is easy to see that $S \in Z^{(n-1)\times(n-1)}$, and hence Theorem 1 shows that $S$ is an M-matrix. The result follows by induction.

The question now arises of what can be said about the numerical stability of LU factorization of an M-matrix. To answer it we use another characterization of M-matrices: $DA$ is strictly diagonally dominant by columns for some diagonal matrix $D = \mathrm{diag}(d_i)$ with $d_i > 0$ for all $i$, that is,

$$d_j a_{jj} > \sum_{i \ne j} d_i |a_{ij}|, \quad j = 1\colon n.$$

(This condition can also be written as $d^TA > 0$ because of the sign pattern of $A$.) Matrices that are diagonally dominant by columns have the properties that an LU factorization without pivoting exists, the growth factor $\rho_n \le 2$, and partial pivoting does not require row interchanges. The effect of row scaling on LU factorization is easy to see: if $DA = LU$ then

$$A = D^{-1}LU = (D^{-1}LD)(D^{-1}U),$$

where $D^{-1}LD$ is unit lower triangular, so that $D^{-1}LD$ and $D^{-1}U$ are the LU factors of $A$. It is easy to see that the growth factor bound of $2$ for a matrix diagonally dominant by columns translates into the bound

for an M-matrix, which was obtained by Funderlic, Neumann, and Plemmons (1982). Unfortunately, this bound can be large. Consider the matrix

We have

so $A$ is an M-matrix. One element of the LU factor $U$ of $A$ grows with the matrix parameter, which means that

and this lower bound can be arbitrarily large. One can verify experimentally that numerical instability is possible when this element is large, in that the computed LU factors have a large relative residual. We conclude that pivoting is necessary for numerical stability in LU factorization of M-matrices.

A stationary iterative method for solving a linear system $Ax = b$ is based on a splitting $A = M - N$ with $M$ nonsingular, and has the form $Mx_{k+1} = Nx_k + b$. This iteration converges for all starting vectors $x_0$ if $\rho(M^{-1}N) < 1$. Much interest has focused on *regular splittings*, which are defined as ones for which $M^{-1} \ge 0$ and $N \ge 0$. An M-matrix has the important property that $\rho(M^{-1}N) < 1$ for every regular splitting, and it follows that the Jacobi iteration, the Gauss–Seidel iteration, and the successive overrelaxation (SOR) iteration (with $0 < \omega \le 1$) are all convergent for M-matrices.
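A small Python/NumPy illustration of a regular splitting (the Jacobi splitting, applied to a hypothetical M-matrix of our choosing):

```python
# For an M-matrix A, the Jacobi splitting A = M - N with M = diag(A) is a
# regular splitting (inv(M) >= 0, N >= 0), so the stationary iteration
# M x_{k+1} = N x_k + b converges: rho(inv(M) N) < 1.
import numpy as np

A = np.array([[ 4.0, -1.0, -1.0],
              [-2.0,  5.0, -1.0],
              [-1.0, -2.0,  4.0]])   # an M-matrix (Z-pattern, diagonally dominant)

M = np.diag(np.diag(A))
N = M - A
assert np.all(N >= 0) and np.all(np.linalg.inv(M) >= 0)   # regular splitting

rho = max(abs(np.linalg.eigvals(np.linalg.solve(M, N))))
print(rho < 1)   # True: the Jacobi iteration converges for this matrix

# Run the iteration and confirm it solves A x = b.
b = np.ones(3)
x = np.zeros(3)
for _ in range(200):
    x = np.linalg.solve(M, N @ x + b)
print(np.allclose(A @ x, b))   # True
```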

The principal square root $A^{1/2}$ of an M-matrix $A$ is an M-matrix, and it is the unique such square root. An expression for $A^{1/2}$ follows from $A = sI - B$:

$$A^{1/2} = s^{1/2}(I - s^{-1}B)^{1/2} = s^{1/2} \sum_{k=0}^{\infty} \binom{1/2}{k} (-s^{-1}B)^k.$$

This expression does not necessarily provide the best way to compute $A^{1/2}$.

The theory of M-matrices extends to the case where the condition on $A = sI - B$ is relaxed to $s \ge \rho(B)$, though the theory is more complicated and extra conditions such as irreducibility are needed for some results. Singular M-matrices occur in Markov chains (Berman and Plemmons, 1994, Chapter 8), for example. Let $P$ be the transition matrix of a Markov chain. Then $P$ is stochastic, that is, nonnegative with unit row sums, so $\rho(P) = 1$. A nonnegative vector $y$ with $\sum_i y_i = 1$ such that $y^TP = y^T$ is called a *stationary distribution vector* and is of interest for describing the properties of the Markov chain. To compute $y$ we can solve the singular system $(I - P^T)y = 0$. Clearly, $I - P^T \in Z^{n\times n}$ and $\rho(P^T) = \rho(P) = 1$, so $I - P^T$ is a singular M-matrix.
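A Python/NumPy sketch of computing a stationary distribution (the transition matrix here is invented for illustration; we extract the null vector of $I - P^T$ via the eigenpair of $P^T$ for the eigenvalue $1$):

```python
# Computing a stationary distribution y (y^T P = y^T) of a Markov chain:
# y spans the null space of the singular M-matrix I - P^T, equivalently
# y is an eigenvector of P^T for the eigenvalue 1.
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])     # stochastic: nonnegative, rows sum to 1

w, V = np.linalg.eig(P.T)
y = V[:, np.argmin(abs(w - 1))].real   # eigenvector for the eigenvalue 1
y = y / y.sum()                        # normalize so the entries sum to 1

print(np.allclose(y @ P, y))           # True: y is stationary
print(np.all(y >= 0))                  # True: y is a distribution
```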

A more general concept is that of H-matrix: $A \in \mathbb{C}^{n\times n}$ is an H-matrix if the comparison matrix $M(A)$ is an M-matrix. Many results for M-matrices extend to H-matrices. For example, for an H-matrix with positive diagonal elements the principal square root $A^{1/2}$ exists and is the unique square root that is an H-matrix with positive diagonal elements. Also, the growth factor bound above holds for any H-matrix $A$ for which $DA$ is diagonally dominant by columns for some positive diagonal $D$.

This is a minimal set of references, which contain further useful references within.

- Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1994.
- R. E. Funderlic, M. Neumann, and R. J. Plemmons, Decompositions of Generalized Diagonally Dominant Matrices, Numer. Math. 40, 57–69, 1982.
- Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002. (Chapter 8.)
- Nicholas J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008. (Section 6.8.3.)

- What Is a Matrix Square Root? (2020)
- What Is a Symmetric Positive Definite Matrix? (2020)
- What Is Numerical Stability? (2020)

The eigenvalues of a Hermitian matrix $A \in \mathbb{C}^{n\times n}$ are real and we order them $\lambda_n \le \lambda_{n-1} \le \cdots \le \lambda_1$. Note that in some references, such as Horn and Johnson (2013), the reverse ordering is used, with $\lambda_n$ the largest eigenvalue. When it is necessary to specify which matrix an eigenvalue is associated with we write $\lambda_k(A)$ for the $k$th largest eigenvalue of $A$. All the following results also hold for symmetric matrices over $\mathbb{R}$.

The function $f(x) = x^*Ax$ for $\|x\|_2 = 1$ is the quadratic form for $A$ evaluated on the unit sphere. As $A$ is Hermitian it has a spectral decomposition $A = Q\Lambda Q^*$, where $Q$ is unitary and $\Lambda = \mathrm{diag}(\lambda_i)$. Then

$$f(x) = x^*Q\Lambda Q^*x = y^*\Lambda y = \sum_{i=1}^n \lambda_i |y_i|^2, \qquad y = Q^*x, \quad \|y\|_2 = 1,$$

from which it is clear that

$$\lambda_n \le f(x) \le \lambda_1,$$

with equality when $x$ is an eigenvector corresponding to $\lambda_n$ and $\lambda_1$, respectively. This characterization of the extremal eigenvalues of $A$ as the extrema of $f$ is due to Lord Rayleigh (John William Strutt), and $x^*Ax/(x^*x)$ is called a Rayleigh quotient. The intermediate eigenvalues correspond to saddle points of $f$.

The Courant–Fischer theorem (1905) states that every eigenvalue of a Hermitian matrix is the solution of both a min-max problem and a max-min problem over suitable subspaces of $\mathbb{C}^n$.

Theorem (Courant–Fischer). For a Hermitian $A \in \mathbb{C}^{n\times n}$,

$$\lambda_k(A) = \max_{\dim(S) = k} \, \min_{0 \ne x \in S} \frac{x^*Ax}{x^*x} = \min_{\dim(S) = n-k+1} \, \max_{0 \ne x \in S} \frac{x^*Ax}{x^*x}, \quad k = 1\colon n.$$

Note that the equalities for $\lambda_1$ and $\lambda_n$ above are special cases of these characterizations.

In general there is no useful formula for the eigenvalues of a sum $A + B$ of Hermitian matrices. However, the Courant–Fischer theorem yields the upper and lower bounds

$$\lambda_k(A) + \lambda_n(B) \le \lambda_k(A+B) \le \lambda_k(A) + \lambda_1(B), \qquad (1)$$

from which it follows that

$$|\lambda_k(A+B) - \lambda_k(A)| \le \max(|\lambda_1(B)|, |\lambda_n(B)|) = \|B\|_2.$$

This inequality shows that the eigenvalues of a Hermitian matrix are well conditioned under perturbation. We can rewrite the inequality in the symmetric form

$$|\lambda_k(A) - \lambda_k(B)| \le \|A - B\|_2.$$
If $B$ is positive semidefinite then (1) gives

$$\lambda_k(A) \le \lambda_k(A+B), \quad k = 1\colon n, \qquad (2)$$

while if $B$ is positive definite then strict inequality holds for all $k$. These bounds are known as the Weyl monotonicity theorem.
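A quick numerical check of the monotonicity property, in Python with NumPy (the random test matrices are arbitrary):

```python
# Weyl monotonicity theorem: adding a positive semidefinite perturbation E
# does not decrease any eigenvalue of a Hermitian (here real symmetric) A.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                    # symmetric test matrix
C = rng.standard_normal((5, 2))
E = C @ C.T                          # positive semidefinite, rank 2

eig_A = np.sort(np.linalg.eigvalsh(A))
eig_AE = np.sort(np.linalg.eigvalsh(A + E))
print(np.all(eig_AE >= eig_A - 1e-12))  # True: every eigenvalue moved up (or stayed)
```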

Weyl’s inequalities (1912) bound the eigenvalues of $A + B$ in terms of those of $A$ and $B$.

Theorem (Weyl). For Hermitian $A, B \in \mathbb{C}^{n\times n}$,

$$\lambda_{i+j-1}(A+B) \le \lambda_i(A) + \lambda_j(B), \qquad i + j \le n + 1, \qquad (3)$$

$$\lambda_i(A) + \lambda_j(B) \le \lambda_{i+j-n}(A+B), \qquad i + j \ge n + 1. \qquad (4)$$

The Weyl inequalities yield much information about the effect of low rank perturbations. Consider a positive semidefinite rank-$1$ perturbation $E$. Inequality (3) with $j = 1$ gives

$$\lambda_i(A+E) \le \lambda_i(A) + \lambda_1(E), \quad i = 1\colon n$$

(which also follows from (1)). Inequality (3) with $j = 2$, combined with (2), gives

$$\lambda_{i+1}(A) \le \lambda_{i+1}(A+E) \le \lambda_i(A), \quad i = 1\colon n-1.$$

These inequalities confine each eigenvalue of $A+E$ to the interval between two adjacent eigenvalues of $A$; the eigenvalues of $A+E$ are said to *interlace* those of $A$. The following figure illustrates a possible configuration of the eigenvalues of $A$ and of $A+E$.

A specific example, in MATLAB, is

```
>> n = 4; eig_orig = 5:5+n-1
eig_orig =
     5     6     7     8
>> D = diag(eig_orig); eig_pert = eig(D + ones(n))'
eig_pert =
   5.2961e+00   6.3923e+00   7.5077e+00   1.0804e+01
```

Since $\mathrm{trace}(E) = n$ for $E = \mathrm{ones}(n)$ and the trace is the sum of the eigenvalues, we can write

$$\lambda_k(A + E) = \lambda_k(A) + \mu_k,$$

where the $\mu_k$ are nonnegative and sum to $n$. If we greatly increase $\|E\|$, the norm of the perturbation, then most of the increase in the eigenvalues is concentrated in the largest one, since the interlacing inequalities bound how much the smaller eigenvalues can change:

```
>> eig_pert = eig(D + 100*ones(n))'
eig_pert =
   5.3810e+00   6.4989e+00   7.6170e+00   4.0650e+02
```

More generally, if $E$ has $p$ positive eigenvalues and $q$ negative eigenvalues then (3) with $j = p + 1$ gives

$$\lambda_{i+p}(A + E) \le \lambda_i(A),$$

while (4) with $j = n - q$ gives

$$\lambda_i(A) \le \lambda_{i-q}(A + E).$$

So the inertia of $E$ (the number of negative, zero, and positive eigenvalues) determines how far the eigenvalues can move, as measured relative to the indexes of the eigenvalues of $A$.

An important implication of the last two inequalities is for the case $A = I$, for which we have

$$\lambda_{i+p}(I + E) \le 1 \le \lambda_{i-q}(I + E).$$

The first inequality covers the eigenvalues with indexes $p+1\colon n$ and the second those with indexes $1\colon n-q$, so the $n - p - q$ eigenvalues with indexes $p+1\colon n-q$ appear in both. Therefore $n - p - q$ of the eigenvalues are equal to $1$ and so only $r = p + q = \mathrm{rank}(E)$ eigenvalues can differ from $1$. So perturbing the identity matrix by a Hermitian matrix of rank $r$ changes at most $r$ of the eigenvalues. (In fact, it changes exactly $r$ eigenvalues, as can be seen from a spectral decomposition of $E$.)
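This is easy to confirm numerically; in this Python/NumPy sketch the rank-$2$ perturbation is built from a deliberately simple matrix:

```python
# Perturbing the identity by a Hermitian matrix of rank r changes exactly
# r of the eigenvalues; the other n - r remain equal to 1.
import numpy as np

n, r = 6, 2
C = np.arange(1.0, 13.0).reshape(n, r)   # 6x2, its columns are independent
E = C @ C.T                               # symmetric, rank 2

eigs = np.linalg.eigvalsh(np.eye(n) + E)
num_unchanged = np.sum(np.isclose(eigs, 1.0))
print(num_unchanged)                      # n - r = 4
```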

Finally, if $E$ has rank $r$ then $p \le r$ and $q \le r$, and so taking $j = r + 1$ in (3) and $j = n - r$ in (4) gives

$$\lambda_{i+r}(A+E) \le \lambda_i(A) \le \lambda_{i-r}(A+E).$$

The Cauchy interlace theorem relates the eigenvalues of successive leading principal submatrices of a Hermitian matrix. We denote the leading principal submatrix of $A \in \mathbb{C}^{n\times n}$ of order $k$ by $A_k$.

Theorem (Cauchy). For a Hermitian $A \in \mathbb{C}^{n\times n}$,

$$\lambda_{i+1}(A_{k+1}) \le \lambda_i(A_k) \le \lambda_i(A_{k+1}), \quad 1 \le i \le k \le n - 1.$$

The theorem says that the eigenvalues of $A_k$ interlace those of $A_{k+1}$ for all $k$. Two immediate implications are that (a) if $A$ is Hermitian positive definite then so are all its leading principal submatrices and (b) appending a row and a column to a Hermitian matrix does not decrease the largest eigenvalue or increase the smallest eigenvalue.

Since eigenvalues are unchanged under symmetric permutations of the matrix, the theorem can be reformulated to say that the eigenvalues of any principal submatrix of order $n-1$ interlace those of $A$. A generalization to principal submatrices of order $n - \ell$ is given in the next result.

Theorem. If $B$ is a principal submatrix of order $n - \ell$ of a Hermitian $A \in \mathbb{C}^{n\times n}$ then

$$\lambda_{i+\ell}(A) \le \lambda_i(B) \le \lambda_i(A), \quad i = 1\colon n - \ell.$$

It follows by taking $x$ to be a unit vector $e_i$ in the formula $\lambda_n \le x^*Ax \le \lambda_1$ that $\lambda_n \le a_{ii} \le \lambda_1$ for all $i$. And of course the trace of $A$ is the sum of the eigenvalues: $\sum_{i=1}^n a_{ii} = \sum_{i=1}^n \lambda_i$. These relations are the first and last in a sequence of inequalities relating sums of eigenvalues to sums of diagonal elements obtained by Schur in 1923.

Theorem (Schur). For a Hermitian $A \in \mathbb{C}^{n\times n}$,

$$\sum_{i=1}^k \lambda_i \ge \sum_{i=1}^k \widetilde{a}_{ii}, \quad k = 1\colon n,$$

with equality for $k = n$, where $\{\widetilde{a}_{ii}\}$ is the set of diagonal elements of $A$ arranged in decreasing order: $\widetilde{a}_{11} \ge \cdots \ge \widetilde{a}_{nn}$.

These inequalities say that the vector of eigenvalues *majorizes* the ordered vector of diagonal elements.

An interesting special case is a correlation matrix, a symmetric positive semidefinite matrix with unit diagonal, for which the inequalities are

$$\sum_{i=1}^k \lambda_i \ge k, \quad k = 1\colon n-1,$$

and $\sum_{i=1}^n \lambda_i = n$. Here is an illustration in MATLAB.

```
>> n = 5; rng(1); A = gallery('randcorr',n);
>> e = sort(eig(A)','descend'), partial_sums = cumsum(e)
e =
   2.2701e+00   1.3142e+00   9.5280e-01   4.6250e-01   3.6045e-04
partial_sums =
   2.2701e+00   3.5843e+00   4.5371e+00   4.9996e+00   5.0000e+00
```

Ky Fan (1949) proved a majorization relation between the eigenvalues of $A$, $B$, and $A + B$:

$$\sum_{i=1}^k \lambda_i(A+B) \le \sum_{i=1}^k \lambda_i(A) + \sum_{i=1}^k \lambda_i(B), \quad k = 1\colon n.$$

For $k = 1$, the inequality is the same as the upper bound of (1), and for $k = n$ it is an equality: $\mathrm{trace}(A+B) = \mathrm{trace}(A) + \mathrm{trace}(B)$.

For a Hermitian $A \in \mathbb{C}^{n\times n}$ and a nonsingular $X \in \mathbb{C}^{n\times n}$, the transformation $A \to X^*AX$ is a congruence transformation. Sylvester’s law of inertia says that congruence transformations preserve the inertia. A result of Ostrowski (1959) goes further by providing bounds on the ratios of the eigenvalues of the original and transformed matrices.

Theorem (Ostrowski). For a Hermitian $A \in \mathbb{C}^{n\times n}$ and nonsingular $X \in \mathbb{C}^{n\times n}$,

$$\lambda_k(X^*AX) = \theta_k \lambda_k(A), \quad k = 1\colon n,$$

where $\lambda_n(X^*X) \le \theta_k \le \lambda_1(X^*X)$.

If $X$ is unitary then $X^*X = I$ and so Ostrowski’s theorem reduces to the fact that a congruence with a unitary matrix is a similarity transformation and so preserves eigenvalues. The theorem shows that the further $X$ is from being unitary, the greater the potential change in the eigenvalues.

Ostrowski’s theorem can be generalized to the situation where is rectangular (Higham and Cheng, 1998).

The results we have described are strongly interrelated. For example, the Courant–Fischer theorem and the Cauchy interlacing theorem can be derived from each other, and Ostrowski’s theorem can be proved using the Courant–Fischer Theorem.

- Rajendra Bhatia, Linear Algebra to Quantum Cohomology: The Story of Alfred Horn’s Inequalities, Amer. Math. Monthly 108(4), 289–318, 2001.
- Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.
- Nicholas J. Higham and Sheung Hun Cheng, Modifying the Inertia of Matrices Arising in Optimization, Linear Algebra Appl. 275–276, 261–279, 1998.
- Beresford Parlett, The Symmetric Eigenvalue Problem, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1998.

*Minisymposium description:* Reduced precision floating-point arithmetic, such as IEEE half precision and bfloat16, is increasingly available in hardware. Low precision computations promise major increases in speed and reductions in data communication costs, but they also bring an increased risk of overflow, underflow, and loss of accuracy. One way to improve the results of low precision computations is to use stochastic rounding instead of round to nearest, and this is proving popular in machine learning. This minisymposium will discuss recent advances in exploitation and analysis of reduced precision arithmetic and stochastic rounding.

**Algorithms for Stochastically Rounded Elementary Arithmetic Operations in IEEE 754 Floating-Point Arithmetic** Massimiliano Fasi, Örebro University, Sweden; *Mantas Mikaitis*, University of Manchester, United Kingdom. Abstract. Slides.

**Reduced Precision Elementary Functions** *Jean-Michel Muller*, ENS Lyon, France. Abstract. Slides.

**Effect of Reduced Precision and Stochastic Rounding in the Numerical Solution of Parabolic Equations** *Matteo Croci* and Michael B. Giles, University of Oxford, United Kingdom. Abstract. Slides.

**Stochastic Rounding and its Probabilistic Backward Error Analysis** *Michael P. Connolly* and Nicholas J. Higham, University of Manchester, United Kingdom; Theo Mary, Sorbonne Universités and CNRS, France. Abstract. Slides.

**Stochastic Rounding in Weather and Climate Models** *Milan Kloewer*, Edmund Paxton, and Matthew Chantry, University of Oxford, United Kingdom. Abstract. Slides.

The rank of a matrix $A \in \mathbb{C}^{m\times n}$ is the maximum number of linearly independent columns, which is the dimension of the range space $\mathrm{range}(A) = \{\, Ax : x \in \mathbb{C}^n \,\}$ of $A$. An important but non-obvious fact is that this is the same as the maximum number of linearly independent rows (see (5) below).

A rank-$1$ matrix has the form $xy^*$, where $x$ and $y$ are nonzero vectors. Every column is a multiple of $x$ and every row is a multiple of $y^*$. A sum of $k$ rank-$1$ matrices has the form

$$A = \sum_{i=1}^k x_iy_i^* = XY^*, \qquad X = [x_1, \dots, x_k], \quad Y = [y_1, \dots, y_k].$$

Each column of $A$ is a linear combination of the vectors $x_1$, $x_2$, …, $x_k$, so $A$ has at most $k$ linearly independent columns, that is, $A$ has rank at most $k$. In fact, $\mathrm{rank}(A) = k$ if $X$ and $Y$ have rank $k$, as follows from (4) below. Any rank-$k$ matrix can be written in the form $XY^*$ with $X$ and $Y$ of rank $k$; indeed this is the full-rank factorization below.

Here are some fundamental rank equalities and inequalities.

The rank-nullity theorem says that

$$\mathrm{rank}(A) + \dim(\mathrm{null}(A)) = n, \quad A \in \mathbb{C}^{m\times n},$$

where $\mathrm{null}(A) = \{\, x : Ax = 0 \,\}$ is the null space of $A$.

The rank cannot exceed the number of columns, or, by (5) below, the number of rows:

$$\mathrm{rank}(A) \le \min(m, n), \quad A \in \mathbb{C}^{m\times n}. \qquad (1)$$

For any $A$ and $B$ of the same dimension,

$$|\mathrm{rank}(A) - \mathrm{rank}(B)| \le \mathrm{rank}(A+B) \le \mathrm{rank}(A) + \mathrm{rank}(B).$$

The upper bound follows from the fact that the dimension of the sum of two subspaces cannot exceed the sum of the dimensions of the subspaces. Interestingly, the upper bound is also a corollary of the bound (3) for the rank of a matrix product, because

$$A + B = \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} I \\ I \end{bmatrix}.$$

For the lower bound, writing $A = (A+B) + (-B)$ and applying the upper bound gives $\mathrm{rank}(A) \le \mathrm{rank}(A+B) + \mathrm{rank}(B)$, and likewise with the roles of $A$ and $B$ interchanged.

For any $A \in \mathbb{C}^{m\times n}$,

$$\mathrm{rank}(A^*A) = \mathrm{rank}(A). \qquad (2)$$

Indeed $Ax = 0$ implies $A^*Ax = 0$, and $A^*Ax = 0$ implies $x^*A^*Ax = \|Ax\|_2^2 = 0$, which implies $Ax = 0$. Hence the null spaces of $A$ and $A^*A$ are the same. The equality (2) follows from the rank-nullity theorem.

For any $A$ and $B$ for which the product $AB$ is defined,

$$\mathrm{rank}(AB) \le \min(\mathrm{rank}(A), \mathrm{rank}(B)). \qquad (3)$$

If $B = [b_1, \dots, b_p]$ then $AB = [Ab_1, \dots, Ab_p]$, so the columns of $AB$ are linear combinations of those of $A$ and so $AB$ cannot have more linearly independent columns than $A$, that is, $\mathrm{rank}(AB) \le \mathrm{rank}(A)$. Using (5) below, we then have

$$\mathrm{rank}(AB) = \mathrm{rank}((AB)^*) = \mathrm{rank}(B^*A^*) \le \mathrm{rank}(B^*) = \mathrm{rank}(B).$$

The latter inequality can be proved without using (5) (our proof of which uses (3)), as follows. Suppose $\mathrm{rank}(AB) = r > \mathrm{rank}(B)$. Let the columns of $X$ span $\mathrm{range}(AB)$, so that $X$ has $r$ linearly independent columns and $X = ABW$ for some matrix $W$ with $r$ columns. Now $\mathrm{rank}(BW) \le \mathrm{rank}(B) < r$ by the first part, so $BWz = 0$ for some nonzero $z$. But then $Xz = ABWz = 0$, which contradicts the linear independence of the columns of $X$, so we must have $\mathrm{rank}(AB) \le \mathrm{rank}(B)$.

We have

$$\mathrm{rank}(XY^*) = k \quad \text{if}\ X \in \mathbb{C}^{m\times k}\ \text{and}\ Y \in \mathbb{C}^{n\times k}\ \text{have rank}\ k. \qquad (4)$$

We note that $X^*X$ and $Y^*Y$ are both nonsingular $k \times k$ matrices by (2), so their product has rank $k$. Using (3),

$$k = \mathrm{rank}(X^*X \cdot Y^*Y) = \mathrm{rank}(X^*(XY^*)Y) \le \mathrm{rank}(XY^*) \le k,$$

and hence $\mathrm{rank}(XY^*) = k$.

Another important relation is

$$\mathrm{rank}(XAY) = \mathrm{rank}(A) \quad \text{for nonsingular}\ X\ \text{and}\ Y.$$

This is a consequence of the inequality (3): for nonsingular $X$ and $Y$, $\mathrm{rank}(XAY) \le \mathrm{rank}(A) = \mathrm{rank}(X^{-1}(XAY)Y^{-1}) \le \mathrm{rank}(XAY)$.

By (2) and (3) we have $\mathrm{rank}(A) = \mathrm{rank}(A^*A) \le \mathrm{rank}(A^*)$. Interchanging the roles of $A$ and $A^*$ gives $\mathrm{rank}(A^*) \le \mathrm{rank}(A)$ and so

$$\mathrm{rank}(A^*) = \mathrm{rank}(A). \qquad (5)$$

In other words, the rank of $A$ is equal to the maximum number of linearly independent rows as well as the maximum number of linearly independent columns.
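These relations are easy to check numerically; here is a Python/NumPy sketch with a deliberately rank-deficient matrix built from a full-rank factorization:

```python
# Checking the rank relations rank(A*A) = rank(A) and rank(A^T) = rank(A)
# on a 3x4 matrix of rank 2 constructed as a product of rank-2 factors.
import numpy as np

X = np.arange(1.0, 7.0).reshape(3, 2)    # 3x2, rank 2
Y = np.arange(1.0, 9.0).reshape(4, 2)    # 4x2, rank 2
A = X @ Y.T                               # 3x4 of rank 2 (full-rank factorization)

r = np.linalg.matrix_rank(A)
print(r)                                              # 2
print(np.linalg.matrix_rank(A.T @ A) == r)            # True
print(np.linalg.matrix_rank(A.T) == r)                # True
```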

$A \in \mathbb{C}^{m\times n}$ has rank $r$ if and only if $A = XY^*$ for some $X \in \mathbb{C}^{m\times r}$ and $Y \in \mathbb{C}^{n\times r}$, both of rank $r$, and this is called a full-rank factorization. The existence of such a factorization implies that $\mathrm{rank}(A) = r$ by (4). Conversely, suppose that $A$ has rank $r$. Let the columns of $X \in \mathbb{C}^{m\times r}$ form a basis for the range space of $A$. Then there are $r$-vectors $y_i$ such that $a_i = Xy_i$, $i = 1\colon n$, where $a_i$ is the $i$th column of $A$, and with $Y^* = [y_1, \dots, y_n]$ we have $A = XY^*$. Finally, $r = \mathrm{rank}(A) \le \mathrm{rank}(Y)$ by (3), and since $Y$ has $r$ columns we have $\mathrm{rank}(Y) = r$.

A characterization of rank that is sometimes used as the definition is that it is the size of the largest nonsingular square submatrix. Equivalently, the rank is the size of the largest nonzero minor, where a minor of size $k$ is the determinant of a $k \times k$ submatrix.

Although $AB$ and $BA$ have some properties in common when both products are defined (notably they have the same nonzero eigenvalues), $\mathrm{rank}(AB)$ is not always equal to $\mathrm{rank}(BA)$. A simple example is $A = x$ and $B = y^*$ with $x$ and $y$ orthogonal vectors: $AB = xy^*$ has rank $1$ but $BA = y^*x = 0$ has rank $0$. An example with square $A$ and $B$ is

Note that the matrices in this example can be expressed in terms of the matrices $e_ie_j^T$, where $e_i$ has $1$ in the $i$th position and zeros everywhere else. Such matrices are easy to manipulate in this form (for example, $(e_ie_j^T)(e_je_k^T) = e_ie_k^T$) and are useful for constructing examples.

If we have a full-rank factorization of $A$ then we can read off the rank from the dimensions of the factors. But finding a full-rank factorization is a nontrivial task. The ultimate full-rank factorization is the SVD

$$A = U\Sigma V^*,$$

where $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$ are unitary, $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_p) \in \mathbb{R}^{m\times n}$, where $p = \min(m,n)$, and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$. The rank of $A$ is the number of nonzero singular values.

In floating-point arithmetic, the standard algorithms for computing the SVD are numerically stable, that is, the computed singular values are the exact singular values of a matrix $A + \Delta A$ with $\|\Delta A\|_2 \le c_{m,n} u \|A\|_2$, where $c_{m,n}$ is a constant and $u$ is the unit roundoff. Unfortunately, $A + \Delta A$ will typically be full rank when $A$ is rank deficient. For example, consider this computation.

```
>> n = 4; A = zeros(n); A(:) = 1:n^2, svd(A)
A =
     1     5     9    13
     2     6    10    14
     3     7    11    15
     4     8    12    16
ans =
   3.8623e+01
   2.0713e+00
   1.5326e-15
   1.3459e-16
```

The matrix has rank $2$ and the two zero singular values are approximated by computed singular values of order $10^{-15}$. In general, we have no way to know whether tiny computed singular values signify exactly zero singular values. In practice, one typically defines a numerical rank based on a threshold and regards computed singular values less than the threshold as zero. Indeed the MATLAB `rank` function computes the rank as the number of singular values exceeding a threshold proportional to $\max(m,n)\,u\,\widehat{\sigma}_1$, where $\widehat{\sigma}_1$ is the largest computed singular value. If the data from which the matrix is constructed is uncertain then the definition of numerical rank should take into account the level of uncertainty in the data. Dealing with rank deficiency in the presence of data errors and in finite precision arithmetic is a tricky business.
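The same numerical rank computation can be sketched in Python with NumPy, whose `matrix_rank` uses a tolerance of the same form by default:

```python
# Numerical rank determination: treat computed singular values below a
# threshold of order max(m, n) * u * sigma_1 as zero. The matrix below
# reproduces the 4x4 MATLAB example A(:) = 1:n^2 (column-major order).
import numpy as np

n = 4
A = np.arange(1.0, n**2 + 1).reshape(n, n, order='F')
sigma = np.linalg.svd(A, compute_uv=False)
print(sigma)          # two O(1) values and two values at roundoff level

tol = max(A.shape) * np.finfo(float).eps * sigma[0]
numerical_rank = int(np.sum(sigma > tol))
print(numerical_rank)                      # 2
print(np.linalg.matrix_rank(A))            # 2, using its default tolerance
```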

An excellent reference for further rank relations is Horn and Johnson. Stewart describes some of the issues associated with rank-deficient matrices in practical computation.

- Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.
- G. W. Stewart, Rank Degeneracy, SIAM J. Sci. Statist. Comput. 5(2), 403–413, 1984.