What Is a Vector Norm?

A vector norm measures the size, or length, of a vector. For complex $n$ -vectors, a vector norm is a function $\|\cdot\| : \mathbb{C}^n \to \mathbb{R}$ satisfying

$\|x\| \ge 0$ with equality if and only if $x=0$ (nonnegativity),
$\|\alpha x\| =|\alpha| \|x\|$ for all $\alpha\in\mathbb{C}$ $x\in\mathbb{C}^n$ (homogeneity),
$\|x+y\| \le \|x\|+\|y\|$ for all $x,y\in\mathbb{C}^n$ (the triangle inequality).

An important class of norms is the Hölder $p$ -norms

$\|x\|_p = \biggl( \displaystyle\sum_{i=1}^n |x_i|^p \biggr)^{1/p}, \quad p\ge 1. \qquad\qquad (1)$

The $\infty$ -norm is interpreted as $\lim_{p\to\infty}\|x\|_p$ and is given by

$\notag \|x\|_{\infty} = \displaystyle\max_{1\le i\le n} |x_i|.$

Other important special cases are

$\notag \begin{alignedat}{2} \|x\|_1 &= \sum_{i=1}^n |x_i|, &\quad& \mbox{``Manhattan'' or ``taxi~cab'' norm}, \\ \|x\|_2 &= \biggl( \sum_{i=1}^n |x_i|^2 \biggr)^{1/2} = (x^*x)^{1/2}, &\quad& \mbox{Euclidean length}. \end{alignedat}$

A useful concept is that of the dual norm associated with a given vector norm, which is defined by

$\notag \|y\|^D = \displaystyle\max_{x\ne0} \displaystyle\frac{\mathop{\mathrm{Re}}x^* y}{\|x\|}.$

The maximum is attained because the definition is equivalent to $\|y\|^D = \max\{ \, \mathop{\mathrm{Re}} x^*y: \|x\| = 1\,\}$ , in which we are maximizing a continuous function over a compact set. Importantly, the dual of the dual norm is the original norm (Horn and Johnson, 2013, Thm. $\,$ 5.5.9(c)).

We can rewrite the definition of dual norm, using the homogeneity of vector norms, as

$\notag \|y\|^D = \displaystyle\max_{|c| = 1} \| cy \|^D = \max_{|c| = 1} \max_{x\ne 0} \frac{\mathop{\mathrm{Re}} x^*(cy) }{\|x\|} = \max_{x\ne 0} \max_{|c| = 1} \frac{\mathop{\mathrm{Re}} c(x^*y) }{\|x\|} = \max_{x\ne 0} \frac{ |x^*y| }{\|x\|}.$

Hence we have the attainable inequality

$\notag |x^*y| \le \|x\| \, \|y\|^D, \qquad\qquad (2)$

which is the generalized Hölder inequality.

The dual of the $p$ -norm is the $q$ -norm, where $p^{-1} + q^{-1} = 1$ , so for the $p$ -norms the inequality (2) becomes the (standard) Hölder inequality,

$\notag |x^*y| \le \|x\|_p \, \|y\|_q, \quad \displaystyle\frac{1}{p} + \frac{1}{q} = 1.$

An important special case is the Cauchy–Schwarz inequality,

$\notag |x^*y| \le \|x\|_2 \, \|y\|_2.$

The notation $\|x\|_0$ is used to denote the number of nonzero entries in $x$ , even though it is not a vector norm and is not obtained from (1) with $p = 0$ . In portfolio optimization, if $x_k$ specifies how much to invest in stock $k$ then the inequality $\|x\|_0 \le k$ says “invest in at most $k$ stocks”.

In numerical linear algebra, vector norms play a crucial role in the definition of a subordinate matrix norm, as we will explain in the next post in this series.

All norms on $\mathbb{C}^n$ are equivalent in the sense that for any two norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ there exist positive constants $\nu_1$ and $\nu_2$ such that

$\nu_1\|x\|_\alpha \le \|x\|_\beta \le \nu_2 \|x\|_\alpha \quad \mathrm{for~all}~x\in \mathbb{C}^n.$

For example, it is easy to see that

$\notag \begin{aligned} \|x\|_2 &\le \|x\|_1 \le \sqrt{n} \|x\|_2,\\ \|x\|_\infty &\le \|x\|_2 \le \sqrt{n} \|x\|_\infty,\\ \|x\|_\infty &\le \|x\|_1 \le n \|x\|_\infty. \end{aligned}$

The 2-norm is invariant under unitary transformations: if $Q^*Q = I$ , then $\|Qx\|^2 = x^*Q^* Qx = x^*x = \|x\|_2^2$ .

Care must be taken to avoid overflow and (damaging) underflow when evaluating a vector $p$ -norm in floating-point arithmetic for $p\ne 1,\infty$ . One can simply use the formula $\|x\|_p = \| (x/\|x\|_{\infty}) \|_p \|x\|_{\infty}$ , but this requires two passes over the data (the first to evaluate $\|x\|_{\infty}$ ). For more efficient one-pass algorithms for the $2$ -norm see Higham (2002, Sec. 21.8) and Harayama et al. (2021).

References

This is a minimal set of references, which contain further useful references within.

Takeyuki Harayama, Shuhei Kudo, Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi, A Rapid Euclidean Norm Calculation Algorithm that Reduces Overflow and Underflow, in Computational Science and Its Applications–ICCSA 2021, O. Gervasi et al., eds, 95–110, Springer, 2021.
Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics,
Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.

Note: This article was revised on October 12, 2021 to change the definition of dual norm to use $\mathrm{Re}$ .

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

Can We Solve Linear Algebra Problems at Extreme Scale and Low Precisions?

The Fugaku supercomputer that tops the HPL-AI mixed-precision benchmark in the June 2021 TOP500 list. It solved a linear system of order 10^7 using IEEE half precision arithmetic for most of the computations.

The largest dense linear systems being solved today are of order $n = 10^7$ , and future exascale computer systems will be able to tackle even larger problems. Rounding error analysis shows that the computed solution satisfies a componentwise backward error bound that, under favorable assumptions, is of order $nu$ , where $u$ is the unit roundoff of the floating-point arithmetic: $u \approx 10^{-16}$ for double precision and $u \approx 10^{-8}$ for single precision. This backward error bound cannot guarantee any stability for single precision solution of today’s largest problems and suggests a loss of half the digits in the backward error for double precision.

Half precision floating-point arithmetic is now readily available in hardware, in both the IEEE binary16 format and the bfloat16 format, and it is increasingly being used in machine learning and in scientific computing more generally. For the computation of the inner product of two $n$ -vectors the backward error bound is again of order $nu$ , and this bound exceeds $1$ for $n \ge 684$ for both half precision formats, suggesting a potentially complete loss of numerical stability. Yet inner products with $n \ge 684$ are successfully used in half precision computations in practice.

The error bounds I have referred to are upper bounds and so bound the worst-case over all possible rounding errors. Their main purpose is to reveal potential instabilities rather than to provide realistic error estimates. Yet we do need to know the limits of what we can compute, and for mission critical applications we need to be able to guarantee a successful computation..

Can we understand the behavior of linear algebra algorithms at extreme scale and in low precision floating-point arithmetics?

To a large extent the answer is yes if we exploit three different features to obtain smaller error bounds.

Blocked Algorithms

Many algorithms are implemented in blocked form. For example, an inner product $x^Ty$ of two $n$ -vectors $x$ and $y$ can computed as

$\notag \begin{aligned} s_i &= x((i-1)b+1:ib)^T y((i-1)b+1:ib), \quad i = 1:k,\\ s &= s_1 + s_2 + \dots + s_k, \end{aligned}$

where $n = kb$ and $b \ll n$ is the block size. The inner product has been broken into $k$ smaller inner products of size $b$ , which are computed independently then summed. Many linear algebra algorithms are blocked in an analogous way, where the blocking is into submatrices with $b$ rows or $b$ columns (or both). Careful analysis of the error analysis shows that a blocked algorithm has an error bound about a factor of $b$ smaller than that for the corresponding unblocked algorithm. Practical block sizes for matrix algorithms are typically $128$ or $256$ , so blocking brings a substantial reduction in the error bounds.

Backward errors for the inner product of two vectors with elements of the form -0.25 + randn, computed in single precision in MATLAB with block size 256.

In fact, one can do even better than an error bound of order $(n/b)u$ . By computing the sum $s= s_1 + s_2 + \dots + s_k$ with a more accurate summation method the error constant is further reduced to $bu + O(u^2)$ (this is the FABsum method of Blanchard et al. (2020)).

Architectural Features

Intel x86 processors support an 80-bit extended precision format with a 64-bit significand, which is compatible with that specified in the IEEE standard. When a compiler uses this format with 80-bit registers to accumulate sums and inner products it is effectively working with a unit roundoff of $2^{-64}$ rather than $2^{-53}$ for double precision, giving error bounds smaller by a factor up to $2^{11} = 2048$ .

Some processors have a fused multiply–add (FMA) operation, which computes a combined multiplication and addition $x + yz$ with one rounding error instead of two. This results in a reduction in error bounds by a factor $2$ .

Mixed precision block FMA operations $D = C + AB$ , with matrices $A,B,C,D$ of fixed size, are available on Google tensor processing units, NVIDIA GPUs, and in the ARMv8-A architecture. For half precision inputs these devices can produce results of single precision quality, which can give a significant boost in accuracy when block FMAs are chained together to form a matrix product of arbitrary dimension.

Probabilistic Bounds

Worst-case rounding error bounds suffer from the problem that they are not attainable for most specific sets of data and are unlikely to be nearly attained. Stewart (1990) noted that

To be realistic, we must prune away the unlikely. What is left is necessarily a probabilistic statement.

Theo Mary and I have recently developed probabilistic rounding error analysis, which makes probabilistic assumptions on the rounding errors and derives bounds that hold with a certain probability. The key feature of the bounds is that they are proportional to $\sqrt{n}u$ when a corresponding worst-case bound is proportional to $nu$ . In the most general form of the analysis (Connolly, Higham, and Mary, 2021), the rounding errors are assumed to be mean independent and of mean zero, where mean independence is a weaker assumption than independence.

Putting the Pieces Together

The different features we have described can be combined to obtain significantly smaller error bounds. If we use a blocked algorithm with block size $b \ll n$ then in an inner product the standard error bound of order $nu$ reduces to a probabilistic bound of order $(\sqrt{n/b})u$ , which is a significant reduction. Block FMAs and extended precision registers provide further reductions.

For example, for a linear system of order $10^7$ solved in single precision with a block size of $256$ , the probabilistic error bound is of order $10^{-5}$ versus $1$ for the standard worst-case bound. If FABsum is used then the bound is further reduced.

Our conclusion is that we can successfully solve linear algebra problems of greater size and at lower precisions than the standard rounding error analysis suggests. A priori bounds will always be pessimistic, though. One should compute a posteriori residuals or backward errors (depending on the problem) in order to assess the quality of a numerical solution.

For full details of the work summarized here, see Higham (2021).

References

Pierre Blanchard, Nicholas Higham, and Theo Mary, A Class of Fast and Accurate Summation Algorithms, SIAM J. Sci. Comput. 42(3), A1541–A1557, 2020.
Pierre Blanchard, Nicholas Higham, Florent Lopez, Theo Mary, and Srikara Pranesh, Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores, SIAM J. Sci. Comput. 42(3), C124-C141, 2020
Michael P. Connolly, Nicholas J. Higham, and Theo Mary, Stochastic Rounding and Its Probabilistic Backward Error Analysis, SIAM J. Sci. Comput. 43(1), A566–A585, 2021.
Nicholas J. Higham, Numerical Stability of Algorithms at Extreme Scale and Low Precisions, MIMS EPrint 2021.14, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, September 2021.
Nicholas Higham and Theo Mary, A New Approach to Probabilistic Rounding Error Analysis, SIAM J. Sci. Comput. 41(5), A2815–A2835, 2019.
G. W. Stewart, Stochastic Perturbation Theory, SIAM Rev. 32(4), 579–610, 1990.

Videos from New Directions in Numerical Linear Algebra and High Performance Computing Workshop

In July 2021, Sven Hammarling, Françoise Tisseur and I organized an online workshop New Directions in Numerical Linear Algebra and High Performance Computing. The workshop brought together researchers working in numerical linear algebra and high performance computing to discuss current developments and challenges in the light of evolving computer hardware. It was held to honour Jack Dongarra on the occasion of his 70th birthday. The workshop had been postponed from July 2020 as a result of the pandemic.

Videos of the talks are now available on the Numerical Linear Algebra Group’s YouTube channel and are included below. Slides for the talks are available on the workshop website.

Sven Hammarling (The University of Manchester), “Jack Dongarra”.

Iain Duff (STFC-RAL and CERFACS), “Jack”

James Demmel (University of California, Berkeley), “New Communication-Avoiding Algorithms, and Fixing Old Bugs in the BLAS and LAPACK”

Piotr Luszczek (University of Tennessee), “Numerical Methods and Across Scales, Precisions and Hardware Platforms”

Cleve Moler (MathWorks), “Computers That I Have Known”

Yves Robert (Ecole Normale Supérieure de Lyon), “25+ Years of Scheduling at ICL”

Françoise Tisseur (The University of Manchester), “Mixed Precision Tall and Thin QR Factorization with Applications”

David Keyes (King Abdullah University of Science and Technology), “Adaptive Nonlinear Preconditioning for PDEs with Error Bounds on Output Functionals”

Zhaojun Bai (University of California, Davis), “Many Eigenpair Computation Via Hotelling’S Deflation”,

Ilse Ipsen (North Carolina State University), “A Few Observations About Summation Algorithms”

Erich Strohmaier (TOP500), “TOP500 and Accidental Benchmarking”

Nick Higham (The University of Manchester), “Solving Dense Linear Systems: A Brief History and Future Directions ”

Jack Dongarra (University of Tennessee, Oak Ridge Laboratory and The University of Manchester), “Still Having Fun After 50 Years”,

SIAM AN21 Minisymposium on Bohemian Matrices and Applications

Image ViridisDragonEye10 courtesy of Rob Corless.

The two-part minisymposium Bohemian Matrices and Applications, organized by Rob Corless and I, took place at the SIAM Annual Meeting, July 22 and 23, 2021. This page makes available slides from some of the talks.

The minisymposium followed a two-part minisymposium on Bohemian matrices at the 2019 ICIAM meeting in Valencia and a 3-day workshop on Bohemian matrices in Manchester in 2018.

For more on Bohemian matrices see the Bohemian matrices website.

Minisymposium description: Bohemian matrices are matrices with entries drawn from a fixed discrete set of small integers (or some other discrete set). The term is a contraction of BOunded HEight Matrix of Integers. Such matrices arise in many applications, and include $(0,1)$ graph incidence matrices and $(-1,1)$ Bernoulli matrices. The questions of interest range from identifying structures in the spectra of particular classes of Bohemian matrix to searching for most ill conditioned matrices within a class, and applications include stress-testing algorithms and software. This minisymposium will report recent theoretical and computational progress as well as open questions.

Putting Skew-Symmetric Tridiagonal Bohemians on the Calendar. Robert M. Corless, Western University, Canada. Abstract. Rob did not use slides but gave his talk using this paper and this Maple worksheet.

Determinants of Normalized Bohemian Upper Hessenberg Matrices. Massimiliano Fasi, Örebro University, Sweden; Jishe Feng, Longdong University, China; Gian Maria Negri Porzio, University of Manchester, United Kingdom. Abstract. Slides.

Experiments on Upper Hessenberg and Toeplitz Bohemians. Eunice Chan, Western University, Canada. Abstract. Slides.

Eigenvalues of Magic Squares and Related Bohemian Matrices. Hariprasad Manjunath Hegde, Indian Institute of Science, Bengaluru, India. Abstract. Slides.

Calculating the 3D Kings Multiplicity Constant. Nicholas Cohen and Neil Calkin, Clemson University, U.S. Abstract. Slides.

Bohemian Inners Inverses: A First Step Toward Bohemian Generalized Inverses. Laureano Gonzalez-Vega, Universidad de Cantabria, Spain; Juan Rafael Sendra, Universidad Alcalá de Henares, Spain; Juana Sendra Pons, Universidad Politécnica de Madrid, Spain. Abstract. Slides.

Recent Progress in the Rational Factorisation of Integer Matrices. Matthew Lettington, Cardiff University, United Kingdom. Abstract. Slides.

Which Columns are Independent? Why does Row Rank = Column Rank? Gilbert Strang, Massachusetts Institute of Technology, U.S. Abstract. Slides.

Bohemian Matrices: the Symbolic Computation Approach. Juana Sendra, Universidad Autónoma de Madrid, Spain; Laureano González-Vega, Universidad de Estudios Financieros en Madrid, Spain; Juan Rafael Sendra, Universidad Alcalá de Henares, Spain. Abstract. Slides.

What Is a Totally Nonnegative Matrix?

The determinant of a square submatrix of a matrix is called a minor. A matrix $A\in\mathbb{R}^{n\times n}$ is totally positive if every minor is positive. It is totally nonnegative if every minor is nonnegative. These definitions require, in particular, that all the matrix elements must be nonnegative or positive, as must $\det(A)$ .

An important property is that total nonnegativity is preserved under matrix multiplication and hence under taking positive integer powers.

Theorem 1. If $A,B\in\mathbb{R}^{n\times n}$ are totally nonnegative then so is $AB$ .

Theorem 1 is a direct consequence of the Binet–Cauchy theorem on determinants (also known as the Cauchy–Binet theorem). To state it, we need a way of specifying submatrices. We say the vector $\alpha = [\alpha_1,\alpha_2,\dots,\alpha_k]$ is an index vector of order $k$ if its components are integers from the set $\{1,2,\dots,n\}$ satisfying $\alpha_1 < \alpha_2 < \cdots < \alpha_k$ . If $\alpha$ and $\beta$ are index vectors of order $k$ and $\ell$ , respectively, then $A(\alpha, \beta)$ denotes the $k\times \ell$ matrix with ( $i,j$ ) element $a_{\alpha_i,\beta_j}$ .

Theorem 2. (Binet–Cauchy) Let $A\in\mathbb{R}^{m\times n}$ , $B\in\mathbb{R}^{n\times p}$ , and $C = AB$ . If $\alpha$ and $\beta$ are index vectors of order $k$ and $1 \le k \le \min(m,n,p)$ then

$\notag \det(C(\alpha,\beta)) = \sum_{\gamma} \det( A(\alpha,\gamma) ) \det( B(\gamma,\beta) ), \qquad (1)$

where the sum is over all index vectors $\gamma$ of order $k$ .

Note than when $k = m = n = p$ , (1) reduces to the well-known relation $\det(AB) = \det(A)\det(B)$ , while when $k = 1$ , (1) reduces to the definition of matrix multiplication.

Totally nonnegative matrices have many interesting determinantal properties. For example, they satisfy Fischer’s inequality, first proved for symmetric positive definite matrices.

Theorem 3. (Fischer) If $A\in\mathbb{R}^{n\times n}$ is totally nonnegative then for any index vector $\alpha$ ,

$\notag \det(A) \le \det(A(\alpha)) \det(A(\alpha^c)), \qquad (2)$

where $\alpha^c$ comprises the indices not in $\alpha$ .

By repeatedly applying (2) with $\alpha$ containing just one element, we obtain Hadamard’s inequality for totally nonnegative $A$ :

$\notag \det(A) \le a_{11} a_{22} \cdots a_{nn}.$

Examples

We give some examples of totally positive matrices, showing how they can be generated in MATLAB. We use the Anymatrix toolbox.

A matrix well known to be positive definite, but which is also totally positive, is the Hilbert matrix $H\in\mathbb{R}^{n\times n}$ , with $h_{ij} = 1/(i+j-1)$ . The Hilbert matrix is a particular case of a Cauchy matrix $C$ , with $c_{ij} = 1/(x_i + y_j)$ for given vectors $x,y\in\mathbb{R}^{n\times n}$ . A Cauchy matrix is totally positive if $0 < x_1 < x_2 < \cdots < x_n$ and $0 < y_1 < y_2 < \cdots < y_n$ , which follows from the formula

$\notag \det(C_n) = \displaystyle\frac{\displaystyle\prod_{1\le i < j \le n} (x_j-x_i) (y_j-y_i) } {\displaystyle\prod_{1\le i,j \le n} (x_i+y_j) }.$

In MATLAB, the Hilbert matrix is hilb(n) and the Cauchy matrix can be generated by gallery('cauchy',x,y) (or anymatrix('gallery/cauchy',x,y)).

A Vandermonde matrix

$\notag V = V(x_1,x_2,\dots,x_n) = \begin{bmatrix} 1 & 1 & \dots & 1 \\ x_1 & x_2 & \dots & x_n \\ \vdots &\vdots & & \vdots \\ x_1^{n-1} & x_2^{n-1} & \dots & x_n^{n-1} \end{bmatrix} \in \mathbb{C}^{n\times n}$

is totally positive if the points $x_i$ satisfy $0 < x_1 < x_2 < \cdots < x_n$ . As a partial check, the general formula

$\notag \det(V) = \displaystyle\prod_{1\le i < j \le n}^n (x_i - x_j)$

shows that every leading principal minor is positive. In MATLAB, a Vandermonde matrix can be generated by anymatrix('core/vand',x).

The Pascal matrix $P_n\in\mathbb{R}^{n\times n}$ is defined by

$p_{ij} = \displaystyle\frac{ (i+j-2)! }{ (i-1)! (j-1)! } = {i+j-2 \choose j-1}.$

For example, in MATLAB:

>> P = pascal(5)
P =
     1     1     1     1     1
     1     2     3     4     5
     1     3     6    10    15
     1     4    10    20    35
     1     5    15    35    70

The Pascal matrix is totally positive for all $n$ (see the section below on bidiagonal factorizations).

The one-parameter correlation matrix $C_n(\theta)$ with off-diagonal elements given by $\theta$ with $0 \le \theta < 1$ , illustrated by

$\notag C_3(\theta) = \begin{bmatrix} 1 & \theta & \theta \\ \theta & 1 & \theta \\ \theta & \theta & 1 \\ \end{bmatrix},$

is not totally positive because while the principal minors are all positive, the submatrix $A([1,2],[2,3]) = \bigl[\begin{smallmatrix} \theta & \theta \\ 1 & \theta \end{smallmatrix}\bigr]$ has nonpositive determinant. However, the Kac–Murdock–Szegö matrix $K_n(\theta) = (\theta^{|i-j|})$ , with $0 \le \theta < 1$ , illustrated by

$\notag K_3(\theta) = \begin{bmatrix} 1 & \theta & \theta^2 \\ \theta & 1 & \theta \\ \theta^2 & \theta & 1 \\ \end{bmatrix}$

is totally positive thanks to the decay of the elements way from the diagonal. In MATLAB, the Kac–Murdock–Szegö matrix can be generated by gallery('kms',n,rho).

The lower Hessenberg Toeplitz matrix $H_n$ with all elements $1$ on and below the superdiagonal, illustrated for $n = 4$ by

$\notag H_4 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ \end{bmatrix},$

is totally nonnegative. It has $\lfloor n/2 \rfloor$ zero eigenvalues, which appear in a single Jordan block, and its largest eigenvalue is $2(1+\cos(2\pi/(n+2)))$ . In MATLAB, this matrix can be generated by anymatrix('core/hessfull01',n). This and other binary totally nonnegative matrices are studied by Brualdi and Kirkland (2010).

Finally, consider a nonnegative $4\times 4$ bidiagonal matrix factorized into a product of elementary nonnegative bidiagonal matrices (nonnegative means that the elements of the matrix are nonnegative):

$\notag \begin{aligned} L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ \ell_{21} & 1 & 0 & 0 \\ 0 & \ell_{32} & 1 & 0 \\ 0 & 9 & \ell_{43} & 1 \\ \end{bmatrix} &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ \ell_{21} & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & \ell_{32} & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \ell_{43} & 1 \\ \end{bmatrix}\\ &\equiv L_1(\ell_{21}) L_2(\ell_{32}) L_3(\ell_{43}). \end{aligned}$

It is easy to see by inspection that $L_1$ , $L_2$ , and $L_3$ are totally nonnegative, so $L$ is totally nonnegative by Theorem 1. With $D = \mathrm{diag}(1,-1,1,1,-1)$ , we have

$\notag \begin{aligned} DL^{-1}D &= (DLD)^{-1} = (DL_1D \cdot DL_2D \cdot DL_3D )^{-1}\\ &= (DL_3D)^{-1} (DL_2D)^{-1} (DL_3D)^{-1}\\ &= L_3(\ell_{43}) L_2(\ell_{32}) L_1(\ell_{21}), \qquad (3)\ \end{aligned}$

which is a product of totally nonnegative matrices and hence is totally nonnegative by Theorem 1. This example clearly generalizes to show that an $n\times n$ nonnegative bidiagonal matrix is totally nonnegative.

Inverse

Recall that the inverse of a nonsingular $A\in\mathbb{R}^{n\times n}$ is given by $A^{-1} = \mathrm{adj}(A)/\det(A)$ , where

$\mathrm{adj}(A) = \bigl( (-1)^{i+j} \det(A_{ji}) \bigr)$

and $A_{pq}$ denotes the submatrix of $A$ obtained by deleting row $p$ and column $q$ . If $A$ is nonsingular and totally nonnegative then it follows that $A^{-1}$ has a checkerboard (alternating) sign pattern. Indeed, we can write $A^{-1} = DBD$ , where $D = \mathrm{diag}((-1)^{i+1})$ and $B$ has nonnegative elements, and in fact it can be shown that $B$ is totally nonnegative using Theorem 1, Theorem 6, and (3). For example, here is the inverse of the $4\times 4$ Pascal matrix:

>> inv(sym(pascal(5)))
ans =
[  5, -10,  10,  -5,  1]
[-10,  30, -35,  19, -4]
[ 10, -35,  46, -27,  6]
[ -5,  19, -27,  17, -4]
[  1,  -4,   6,  -4,  1]

Eigensystem

A totally nonnegative matrix has nonnegative trace and determinant, so the sum and product of its eigenvalues are both nonnegative. In fact, all the eigenvalues are real and nonnegative. Since a Jordan block corresponding to a nonnegative eigenvalue is totally nonnegative any Jordan form with nonnegative eigenvalues is possible. More can be said of $A$ is irreducible. Recall that a matrix $A\in\mathbb{C}^{n\times n}$ is irreducible if there does not exist a permutation matrix $P$ such that

$\notag P^TAP = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}$

where $A_{11}$ and $A_{22}$ are square, nonempty submatrices.

Theorem 4. If $A\in\mathbb{R}^{n\times n}$ is totally nonnegative then its eigenvalues are all real and nonnegative. If $A$ is also irreducible then the positive eigenvalues are distinct.

If $A$ is nonsingular and totally nonnegative and irreducible then by the theorem we can write the eigenvalues as $\lambda_1 > \lambda_2 > \cdots > \lambda_n >0$ . It is known that the eigenvector $x_k$ associated with $\lambda_k$ has $k-1$ sign changes, that is, $(x_k)_{i+1}$ and ( $x_k)_i$ have opposite signs for $k-1$ values of $i$ (any zero elements are deleted before counting sign changes). Note that for $k=1$ , we already know from Perron–Frobenius theory that there is a positive eigenvector $x_1$ . This result is illustrated by the Pascal matrix above:

>> A = pascal(5); [V,d] = eig(A,'vector'); [~,k] = sort(d,'descend');
>> evals = d', evecs = V(:,k)
evals =
   1.0835e-02   1.8124e-01   1.0000e+00   5.5175e+00   9.2290e+01
evecs =
   1.7491e-02   2.4293e-01  -7.6605e-01  -5.7063e-01   1.6803e-01
   7.4918e-02   4.8079e-01  -3.8302e-01   5.5872e-01  -5.5168e-01
   2.0547e-01   6.1098e-01   1.6415e-01   2.5292e-01   7.0255e-01
   4.5154e-01   4.1303e-01   4.3774e-01  -5.1785e-01  -4.0710e-01
   8.6486e-01  -4.0736e-01  -2.1887e-01   1.7342e-01   9.0025e-02

Note that the number of sign changes (but not the number of negative elements) increases by $1$ as we go from one column to the next

The class of nonsingular totally nonnegative irreducible matrices is known as the oscillatory matrices, because such matrices arise in the analysis of small oscillations of elastic systems. An equivalent definition (in fact, the usual definition) is that an oscillatory matrix is a totally nonnegative matrix for which $A^q$ is totally positive for some positive integer $q$ .

LU Factorization

The next result shows that a totally nonnegative matrix has an LU factorization with special properties. We will need the following special case of Fischer’s inequality (Theorem 3):

$\notag \det(A) \le \det \bigl( A(1\colon p,1\colon p) \bigr) \det \bigl( A(p+1\colon n,p+1\colon n) \bigr), \quad p=1\colon n-1. \qquad (4)$

Theorem 5. If $A\in\mathbb{R}^{n\times n}$ is nonsingular and totally nonnegative then it has an LU factorization with $L$ and $U$ totally nonnegative and the growth factor $\rho_n = 1$ .

Proof. Since $A$ is nonsingular and every minor is nonnegative, (4) shows that $\det(A(1\colon p,1\colon p))>0$ for $p=1\colon n-1$ , which guarantees the existence of an LU factorization. That the elements of $L$ and $U$ are nonnegative follows from explicit determinantal formulas for the elements of $L$ and $U$ . The total nonnegativity of $L$ and $U$ is proved by Cryer (1976). Gaussian elimination starts with $A^{(1)} = A$ and computes $a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik}a_{kj}^{(k)} = a_{ij}^{(k)} - \ell_{ik} u_{kj} \le a_{ij}^{(k)}$ , since $\ell_{ik}, u_{kj} \ge 0$ . Thus $a_{ij} = a_{ij}^{(1)} \ge a_{ij}^{(2)} \ge \cdots \ge a_{ij}^{(r)}$ , $r = \min (i,j)$ . For $i > j$ , $a_{ij}^{(r)} \ge a_{ij}^{(r+1)} = \cdots = a_{ij}^{(n)} = 0$ ; for $j \ge i$ , $a_{ij}^{(r)} = \cdots = a_{ij}^{(n)} = u_{ij} \ge 0$ . Thus $0 \le a_{ij}^{(k)} \le a_{ij}$ for all $i,j,k$ and hence $\rho_n \le 1$ . But $\rho_n\ge1$ , so $\rho_n=1$ .

Theorem 5 implies that it is safe to compute the LU factorization without pivoting of a nonsingular totally nonnegativity matrix: the factorization does not break down and it is numerically stable. In fact, the computed LU factors have a strong componentwise form of stability. As shown by De Boor and Pinkus (1977), for small enough unit roundoff $u$ the computed factors $\widehat{L}$ and $\widehat{U}$ will have nonnegative elements and so from the standard backward error result for LU factorization,

$\notag \widehat{L}\widehat{U} = A + \Delta A, \quad |\Delta A| \le \gamma_n |\widehat{L}||\widehat{U}| \quad \Bigl(\gamma_n = \displaystyle\frac{nu}{1-nu} \Bigr),$

we have

$\notag |\widehat{L}||\widehat{U}| = |\widehat{L}\widehat{U}| = |A + \Delta A| \le |A| + \gamma_n |\widehat{L}||\widehat{U}|,$

which gives $|\widehat{L}||\widehat{U}| \le (1 - \gamma_n)^{-1}|A|$ and hence

$\notag \widehat{L}\widehat{U} = A + \Delta A, \quad |\Delta A| \le \displaystyle\frac{\gamma_n}{1-\gamma_n} |A|,$

which is about as strong a backward error result as we could hope for. The significance of this result is reduced, however, by the fact that for some important classes of totally nonnegative matrices, including Vandermonde matrices and Cauchy matrices, structure-exploiting linear system solvers exist that are substantially faster, and potentially more accurate, than LU factorization.

Factorization into a Product of Bidiagonal Matrices

We showed above that any nonnegative bidiagonal matrix is totally nonnegative. The next result shows that any nonsingular totally nonnegative matrix has an LU factorization in which $L$ and $U$ can be factorized into a product of nonnegative bidiagonal matrices.

Theorem 6. (Gasca and Peña, 1996) A nonsingular matrix $A\in\mathbb{R}^{n\times n}$ is totally nonnegative if and only if it it can be factorized as

$\notag A = L_{n-1} L_{n-2} \dots L_1 D U_1 U_2 \dots U_{n-1}, \qquad (5)$

where $D$ is a diagonal matrix with positive diagonal entries and $L_i$ and $U_i$ are unit lower and unit upper bidiagonal matrices, respectively, with the first $i-1$ entries along the subdiagonal of $L_i$ and $U_i^T$ zero and the rest nonnegative.

An analogue of Theorem 6 holds for totally positive matrices, the only difference being that the last $n-i+1$ subdiagonal entries of $L_i$ and $U_i^T$ are positive.

The factorization (5) can be computed by Neville elimination, which is a version of Gaussian elimination in which the eliminations are between adjacent rows, working from the bottom of each column upwards.

This factorization into bidiagonal factors can be used to obtain simple proofs of various properties of totally nonnegative matrices and totally positive matrices (Fallat, 2001). It also provides an efficient way to generates such matrices. If all the parameters in $D$ and the $L_i$ and $U_i$ are set to $1$ then the Pascal matrix is generated.

Testing for Total Positivity

An $n\times n$ matrix has $\sum_{i=1}^n {n \choose k} = 2^n-1$ principal minors (ones based on submatrices centred on the diagonal) and $\sum_{i=1}^n {n \choose k}^2 = {2n \choose n} -1 \approx 4^n/(n\pi)^{1/2}$ minors in total. However, it is not necessary to check all these minors to test for total positivity.

Theorem 7. (Gasca and Peña, 1996) The matrix $A\in\mathbb{R}^{n\times n}$ is totally positive if and only if $\det(A(\alpha,\beta)) > 0$ for all index vectors $\alpha$ and $\beta$ such that one of $\alpha$ and $\beta$ is $[1,2,\dots,k]$ and the entries of the other are $k$ consecutive integers.

Theorem 7 shows that only $n(n+1)$ minors need to be tested. Gasca and Peña have also show that total nonnegativity can be tested by checking about $2^{n+1} + n^2/2$ minors. A more efficient way to test for total nonnegativity is to compute the factorization in Theorem 6 and check the signs of the entries.

Notes

The results we have described show that totally nonnegative and totally positive matrices are analogous in many ways to symmetric positive (semi)definite matrices. The analogies go further because totally nonnegative and totally positive matrices also satisfy eigenvalue interlacing inequalities (albeit weaker than for symmetric matrices) and the eigenvalues of an oscillatory matrix majorize the diagonal elements. See Fallat and Johnson (2011) or Fallat (2014) for details.

References

This is a minimal set of references, which contain further useful references within.

Richard A. Brualdi and Steve Kirkland, Totally Nonnegative (0,1)-Matrices, Linear Algebra Appl. 432, 1650–1662, 2010.
Colin W. Cryer, Some Properties of Totally Positive Matrices, Linear Algebra Appl. 15, 1–25, 1976.
Carl de Boor and Allan Pinkus, Backward Error Analysis for Totally Positive Linear Systems, 27, 485–490, 1977.
Mariano Gasca and Juan M. Peña, On Factorizations of Totally Positive Matrices, in Mariano Gasca and Charles Micchelli, eds, Total Positivity and Its Applications, 109–130, Springer, 1996.
Shaun M. Fallat, Bidiagonal Factorizations of Totally Nonnegative Matrices, Amer. Math. Monthly 108 (6), 697–712, 2001.
Shaun M. Fallat, Totally Positive and Totally Nonnegative Matrices, in Handbook of Linear Algebra, Leslie Hogben, ed, 29.1–29.17, Chapman and Hall/CRC, 2014.
Shaun M. Fallat and Charles R. Johnson, Totally Nonnegative Matrices, Princeton University Press, Princeton, NJ, USA, 2011.

What Is the Perron–Frobenius Theorem?

A real matrix is nonnegative if all its elements are nonnegative and it is positive if all its elements are positive. Nonnegative matrices arise in a wide variety of applications, for example as matrices of probabilities in Markov processes and as adjacency matrices of graphs. Information about the eigensystem is often essential in these applications.

Perron (1907) proved results about the eigensystem of a positive matrix and Frobenius (1912) extended them to nonnegative matrices.

The following three results of increasing specificity summarize the key spectral properties of nonnegative matrices proved by Perron and Frobenius. Recall that a simple eigenvalue of an $n\times n$ matrix is one with algebraic multiplicity $1$ , that is, it occurs only once in the set of $n$ eigenvalues. We denote by $\rho(A)$ the spectral radius of $A$ , the largest absolute value of any eigenvalue of $A$ .

Theorem 1. (Perron–Frobenius) If $A\in\mathbb{R}^{n\times n}$ is nonnegative then

$\rho(A)$ is an eigenvalue of $A$ ,

there is a nonnegative eigenvector $x$ such that $Ax = \rho(A)x$ .

A matrix $A\in\mathbb{C}^{n\times n}$ is reducible if there is a permutation matrix $P$ such that

$\notag P^TAP = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}$

where $A_{11}$ and $A_{22}$ are square, nonempty submatrices; it is irreducible if it is not reducible. Examples of reducible matrices are triangular matrices and matrices with a zero row or column. A positive matrix is trivially irreducible.

Theorem 2. (Perron–Frobenius) If $A\in\mathbb{R}^{n\times n}$ is nonnegative and irreducible then

$\rho(A)$ is an eigenvalue of $A$ ,

$\rho(A)>0$ ,

there is a positive eigenvector $x$ such that $Ax = \rho(A) x$ ,

$\rho(A)$ is a simple eigenvalue.

Theorem 3. (Perron) If $A\in\mathbb{R}^{n\times n}$ is positive then Theorem 2 holds and, in addition, $|\lambda| < \rho(A)$ for any eigenvalue $\lambda$ with $\lambda \ne \rho(A)$ .

For nonnegative, irreducible $A$ , the eigenvalue $\rho(A)$ is called the Perron root of $A$ and the corresponding positive eigenvector $x$ , normalized so that $\|x\|_1 = 1$ , is called the Perron vector.

It is a good exercise to apply the theorems to all binary $2\times 2$ matrices. Here are some interesting cases.

$A = \bigl[\begin{smallmatrix}0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr]$ : Theorem 1 says that $\rho(A) = 0$ is an eigenvalue and and that it has a nonnegative eigenvector. Indeed $[1~0]^T$ is an eigenvector. Note that $A$ is reducible and $0$ is a repeated eigenvalue.
$A = \bigl[\begin{smallmatrix}0 & 1 \\ 1 & 0 \end{smallmatrix}\bigr]$ : $A$ is irreducible and Theorem 2 says that $\rho(A)$ is a simple eigenvalue with positive eigenvector. Indeed the eigenvalues are $\pm 1$ and $[1~1]^T/2$ is the Perron vector for the Perron root $1$ . This matrix has two eigenvalues of maximal modulus.
$A = \bigl[\begin{smallmatrix}1 & 1 \\ 1 & 1 \end{smallmatrix}\bigr]$ : Theorem 3 says that $\rho(A) = 2$ is an eigenvalue with positive eigenvector and that the other eigenvalue has modulus less than $2$ . Indeed the eigenvalues are the Perron root $2$ , with Perron vector $[1~1]^T/2$ , and $0$ .

For another example, consider the irreducible matrix

$\notag B = \begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix}, \quad \Lambda(B) = \bigl\{ 1, \textstyle\frac{1}{2}( -1 \pm \sqrt{3}\mskip1mu\mathrm{i} ) \bigr\}.$

Note that $B$ is a companion matrix and a permutation matrix. Theorem 2 correctly tells us that $\rho(A) = 1$ is an eigenvalue of $A$ , and that it has a corresponding positive eigenvector, the Perron vector $[1~1~1]^T/3$ . Two of the eigenvalues are complex, however, and all three eigenvalues have modulus 1, as they must because $B$ is orthogonal.

A stochastic matrix is a nonnegative matrix whose row sums are all equal to $1$ . A stochastic matrix satisfies $Ae = e$ , where $e = [1,1,\dots,1]^T$ , which means that $A$ has an eigenvalue $1$ , and so $\rho(A) \ge 1$ . Since $\rho(A) \le \|A\|$ for any norm, by taking the $\infty$ -norm we conclude that $\rho(A) = 1$ . For a stochastic matrix, Theorem 1 does not give any further information. If $A$ is irreducible then Theorem 2 tells us that $\rho(A)$ is a simple eigenvalue, and if $A$ is positive Theorem 3 tells us that every other eigenvalue has modulus less than $\rho(A)$ .

The next result is easily proved using Theorem 3 together with the Jordan canonical form. It shows that the powers of a positive matrix behave like multiples of a rank-1 matrix.

Theorem 4. If $A\in\mathbb{R}^{n\times n}$ is positive, $x$ is the Perron vector of $A$ , and $y$ is the Perron vector of $A^T$ then

$\notag \displaystyle\lim_{k\to\infty} \left( \displaystyle\frac{A}{\rho(A)} \right)^k = \displaystyle\frac{xy^T}{y^Tx}.$

Note that $y$ in the theorem is a left eigenvector of $A$ corresponding to $\rho(A)$ , that is, $y^TA = \rho(A)y^T$ (since $\rho(A^T) = \rho(A)$ ).

If $A$ is stochastic and positive then Theorem 4 is applicable and $x = n^{-1}e$ . If $A$ also has unit column sums, so that it is doubly stochastic, then $y = n^{-1}e$ and Theorem 4 says that $\lim_{k\to\infty}A^k = n^{-1}ee^T$ . We illustrate this result in MATLAB using a scaled magic square matrix.

>> n = 4; M = magic(n), A = M/sum(M(1,:)) % Doubly stochastic matrix.
A =
    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1
A =
   4.7059e-01   5.8824e-02   8.8235e-02   3.8235e-01
   1.4706e-01   3.2353e-01   2.9412e-01   2.3529e-01
   2.6471e-01   2.0588e-01   1.7647e-01   3.5294e-01
   1.1765e-01   4.1176e-01   4.4118e-01   2.9412e-02

>> for k = 8:8:32, fprintf('%11.2e',norm(A^k-ones(n)/n,1)), end, disp(' ')
   3.21e-05   7.37e-10   1.71e-14   8.05e-16

References

This is a minimal set of references, which contain further useful references within.

Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. Chapter 8. My review of the second edition.
Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000. Chapter 8.
Helene Shapiro, Linear Algebra and Matrices. Topics for a Second Course, American Mathematical Society, Providence, RI, USA, 2015. Chapter 17.

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is the Kac–Murdock–Szegö Matrix?

The Kac–Murdock–Szegö matrix is the symmetric Toeplitz matrix

$\notag A_n(\rho) = \begin{bmatrix} 1 & \rho & \rho^2 & \dots & \rho^{n-1} \\ \rho & 1 & \rho & \dots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \rho \\ \rho^{n-1} & \rho^{n-2} & \dots & \rho & 1 \end{bmatrix} \in\mathbb{R}^{n\times n}. \qquad(1)$

It was considered by Kac, Murdock, and Szegö (1953), who investigated its spectral properties. It arises in the autoregressive AR(1) model in statistics and signal processing.

The matrix is singular for $\rho=1$ , as $A_n(1)$ is the rank- $1$ matrix $ee^T$ , and it is also rank- $1$ for $\rho = -1$ , as in this case every column is a multiple of the vector with alternating elements $\pm 1$ . The determinant $\det(A_n(\rho)) = (1-\rho^2)^{n-1}$ . For $\rho \ne \pm 1$ , $A_n$ is nonsingular and the inverse is the tridiagonal (but not Toeplitz) matrix

$\notag A_n(\rho)^{-1} = \displaystyle\frac{1}{1-\rho^2} \begin{bmatrix} 1 & -\rho & 0 & \dots & \dots & 0 \\ -\rho & 1+\rho^2 & -\rho & \dots & \dots & 0 \\ 0 & -\rho & 1+\rho^2 & \ddots & \dots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \dots & \dots & -\rho & 1+\rho^2 & -\rho \\ 0 & \dots & \dots & 0 & -\rho & 1 \end{bmatrix}. \qquad (2)$

For $-1 < \rho < 1$ , $A_n(\rho)$ is positive definite, since every leading principal submatrix has positive determinant, as can also be seen by noting that the inverse is diagonally dominant with positive diagonal, so that $A_n^{-1}$ is positive definite and hence $A_n$ is positive definite.

For $-1 \le \rho \le 1$ , $A_n(\rho)$ is positive semidefinite, so it is a correlation matrix for $\rho$ in this range.

For $0 \le \rho \le 1$ , $A_n(\rho)$ is totally nonnegative, that is. every submatrix has nonnegative determinant. For $0 < \rho < 1$ , we know that $A_n(\rho)$ is nonsingular, and it is clearly irreducible, and together with the total nonnegativity these properties imply that the eigenvalues are distinct and positive (this can also be deduced from the fact that the inverse is tridiagonal with nonzero subdiagonal and superdiagonal entries).

It is straightforward to verify that $A_n$ has a factorization $A_n = LDL^*$ with $L$ the inverse of a unit lower bidiagonal matrix:

$\notag L = \begin{bmatrix} 1 & & & &\\ -\rho & 1 & & &\\ & -\rho & 1 & &\\ & & \ddots & \ddots &\\ & & &-\rho &1 \end{bmatrix}^{-1}, \quad D = \mathrm{diag}(1, 1-\rho^2, 1-\rho^2, \dots, 1-\rho^2). \qquad (3)$

This factorization can be used to prove all the properties stated above.

From (1) and (2) we can derive the formulas

$\notag \begin{aligned} \|A_n\|_{1,\infty} &= 2 \left(\displaystyle\frac{1-\rho^{k+1}}{1-\rho}\right) -1 - (2k-n+1) \rho^k, \quad k = \lfloor n/2 \rfloor, \\ \|A_n^{-1}\|_{\infty} &= (1+2\rho+\rho^2)/(1-\rho^2) = (1+\rho)/(1-\rho). \end{aligned}$

Hence we have an explicit formula for the condition number $\kappa_p(A_n) = \|A_n\|_{1,\infty} \|A_n^{-1}\|_{1,\infty}$ for $p = 1,\infty$ .

We can allow $\rho$ to be complex, in which case the definition (1) is modified to conjugate the elements below the diagonal. The factorization $A = LDL^*$ continues to hold with $D$ in $(2)$ replaced by $\mathrm{diag}(1, 1-|\rho|^2, 1-|\rho|^2, \dots, 1-|\rho|^2)$ .

The Kac–Murdock–Szegö matrix (for real or complex $\rho$ ) can be generated in MATLAB as gallery('kms',n,rho).

References

This is a minimal set of references, which contain further useful references within.

George Fikioris, Spectral Properties of Kac–Murdock-Szegö Matrices with a Complex Parameter, Linear Algebra Appl 553, 182–210, 2018.
M. Kac, W. L. Murdock, and G. Szegö, On the Eigen-values of Certain Hermitian Forms, Journal of Rational Mechanics and Analysis 2, 767–800, 1953.

Ian Gladwell (1944–2021)

By Len Freeman, Nick Higham and Jim Nagy.

Ian Gladwell giving talk “Software for the Numerical Solution of ODEs—a University of Manchester and NAG Library Perspective” at Numerical Analysis and Computers—50 Years of Progress, University of Manchester, June 16–17, 1998.

Ian Gladwell passed away on May 23, 2021 at the age of 76. He was born in Bolton, Lancashire in 1944. He did his secondary education at Thornleigh College, Bolton and was an undergraduate at Hertford College, University of Oxford, from where he graduated with a B.A. Hons. in Mathematics in 1966. He did his postgraduate studies at the University of Manchester, gaining an MSc in Numerical Analysis and Computing in 1967 and a PhD in Numerical Analysis in 1970. He was the first PhD student of Christopher T. H. Baker (1939–2017).

Ian was appointed Lecturer in the Department of Mathematics at the University of Manchester in 1969 and progressed to Senior Lecturer in 1980. He was a member of the Numerical Analysis Group (along with Christopher Baker, Len Freeman, George Hall, Will McLewin, Jack Williams (1943–2015), and Joan Walsh (1932–2017)) who, together with colleagues at UMIST, made Manchester a major centre of numerical analysis activity from the 1970s onwards.

Ian’s research focused on ordinary differential equation (ODE) initial value problems and boundary value problems, mathematical software, and parallel computing, and he had a wide knowledge of numerical analysis and scientific computing. He was perhaps best known for his pioneering work on mathematical software for the numerical solution of ODEs, much of which was published in the NAG Library and in the journal ACM Transactions on Mathematical Software. A particular topic of interest for Ian was algorithms and software for the numerical solution of almost block diagonal linear systems, which arise in discretizations of boundary value problems for ODEs and partial differential equations.

More details on Ian’s publications can be found at his MathSciNet author profile (subscription required). It lists 55 publications with 19 co-authors, among which Richard Brankin, Larry Shampine, Ruth Thomas, and Marcin Paprzycki are his most frequent co-authors.

In his time at Manchester he collaborated with a variety of colleagues both inside and outside the department, and he was always ready to offer advice to students and colleagues across the campus on numerical computing (as evidenced by the common sight of people waiting outside his office door to be seen).

Ian was instrumental in setting up the Manchester Numerical Analysis Reports, a long-running technical report series to which he contributed many items.

Ian had a five-month visit to the Department of Computer Science at the University of Toronto in 1975. Links between the Manchester and Toronto departments were strong, and over the years numerical analysts made several visits in both directions.

In the mid 1980s, Ian was one of the first people in the UK to have an email address: igladwel@uk.ac.ucl.cs. His email account was on a computer at University College London (UCL), because UCL hosted a gateway between JANET, the UK computer network, and ARPANET in the USA. Ian kindly allowed Nick Higham and Len Freeman use of the account to communicate with colleagues in the US.

Ian had long-standing collaborations with the Numerical Algorithms Group (NAG) Ltd., Oxford. He contributed many codes and associated documentation to the NAG Library, principally in ordinary differential equations. In a 1979 paper in ACM Trans. Math. Software he wrote

“When the NAG library structure was designed in the late 1960s, it was decided to devote a chapter, named DO2, to the numerical solution of systems of ordinary differential equations and that this chapter would be contributed by members of the Department of Mathematics, University of Manchester, and in particular by J. E. Walsh, G. Hall, and the author.”

Ian was a long-term member of NAG and of the NAG Technical Policy Committee, and during 1986 he held a Royal Society/Science and Engineering Research Council Industrial Fellowship at NAG.

Nick Higham was taught by Ian in an upper level undergraduate course “Numerical Linear Algebra” that Ian was giving for the first time, in 1981. As an MSc student and PhD student he benefited greatly from Ian’s advice about how to think about and do research.

Ian moved to the Department of Mathematics at Southern Methodist University (SMU), Dallas, as a Visiting Associate Professor in 1987, which became a permanent position in 1988. He had collaborated during the 1980s with Larry Shampine, who was working at Sandia National Laboratories until he moved to the SMU Mathematics Department in 1986.

Ian served as chair of the department 1988–1994 and again in 1998. He was also Director of Graduate Studies from 2005–2008. Ian excelled in these roles as mentor, which is recognized by a PhD fellowship in his honor. Jim Nagy was extremely fortunate to have Ian as his first department chair in 1992; Ian mentored him during the challenging tenure-track years, advising on research, teaching and more, including extensive editing of his first successful grant proposals.

Ian wrote the book Solving ODEs with MATLAB (2003) with Larry Shampine and Skip Thompson, which was described as “an excellent treatment of the fundamentals for solving ODEs using MATLAB” in Mathematical Reviews. It is Ian’s most highly cited work, with around 900 citations on Google Scholar at the time of writing.

Ian served as editor for ten journals, including as Associate Editor (2002–2005) and Editor-in-Chief (2005–2008) of ACM Transactions on Mathematical Software, as Associate Editor of the IMA Journal on Numerical Analysis (1988–2007), and as Associate Editor of Scalable Computing: Practice and Experience (2005–2010). A special issue of the latter journal in 2009 was dedicated to him on the occasion of his retirement from SMU

Ian was a long-term member of the Institute of Mathematics and Its Applications, of which he was a Fellow, and the Society for Industrial and Applied Mathematics.

According to the Mathematics Genealogy Project, Ian had 23 PhD students, equally split between Manchester and SMU, with one jointly supervised at the University of Bari.

What Is the Determinant of a Matrix?

The determinant of an $n\times n$ matrix $A$ is defined by

$\notag \det(A) = \displaystyle\sum_j (-1)^{\mathop{\mathrm{sgn}}j} a_{1,j_1}a_{2,j_2} \dots a_{n,j_n}, \qquad (1)$

where the sum is over all $n!$ permutations $j = (j_1,j_2,\dots,j_n$ ) of the sequence $(1,2,\dots,n)$ and $\mathop{\mathrm{sgn}}j$ is the number of inversions in $j$ , that is, the number of pairs $(j_k,j_\ell)$ with $k j_\ell$ . Each term in the sum is a signed product of $n$ entries of $A$ and the product contains one entry taken from each row and one from each column.

The determinant is sometimes written with vertical bars, as $|A|$ .

Three fundamental properties are

$\notag \begin{aligned} \det(\alpha A) &= \alpha^n \det(A)\; \mathrm{for~any~scalar~}\alpha,\qquad(2)\\ \det(A^T) &= \det(A), \qquad(3)\\ \det(AB) &= \det(A)\det(B) \mathrm{~for~} n\times n~ A \mathrm{~and~} B.\qquad(4) \end{aligned}$

The first property is immediate, the second can be proved using properties of permutations, and the third is proved in texts on linear algebra and matrix theory.

An alternative, recursive expression for the determinant is the Laplace expansion

$\notag \det(A) = \displaystyle\sum_{j=1}^n (-1)^{i+j} a_{ij} \det (A_{ij}). \qquad(5)$

for any $i\in\{1,2,\dots,n\}$ , where $A_{ij}$ denotes the $(n-1)\times (n-1)$ submatrix of $A$ obtained by deleting row $i$ and column $j$ , and $\det(a) = a$ for a scalar $a$ . This formula is called the expansion by minors because $\det (A_{ij})$ is a minor of $A$ .

For some types of matrices the determinant is easy to evaluate. If $T$ is triangular then $\det(T) = \prod_{i=1}^n t_{ii}$ . If $Q$ is unitary then $Q^*Q = I$ implies $|\det(Q)| = 1$ on using (3) and (4). An explicit formula exists for the determinant of a Vandermonde matrix.

The determinant of $A$ is connected with the eigenvalues $\lambda_i$ of $A$ via the property $\det(A) = \prod_{i=1}^n \lambda_i$ . Since the eigenvalues are the roots of the characteristic polynomial $\det(tI - A)$ , this relation follows by setting $t = 0$ in the expression

$\notag \det(tI - A) = t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0 = \displaystyle\prod_{i=1}^n (t - \lambda_i).$

For $n=2$ , the determinant is

$\notag \det\biggl( \begin{bmatrix} a & b \\ c & d \end{bmatrix} \biggr) = ad - bc,$

but already for $n=3$ the determinant is tedious to write down. If one must compute $\det(A)$ , the formulas (1) and (5) are too expensive unless $n$ is very small: they have an exponential cost. The best approach is to use a factorization of $A$ involving factors that are triangular or orthogonal, so that the determinants of the factors are easily computed. If $PA = LU$ is an LU factorization, with $P$ a permutation matrix, $L$ unit lower triangular, and $U$ upper triangular, then $\det(A) = \det(P) \prod_{i=1}^n u_{ii} = \pm \prod_{i=1}^n u_{ii}$ . As this expression indicates, the determinant is prone to overflow and underflow in floating-point arithmetic, so it may be preferable to compute $\log(|\det(A)|) = \sum_{i=1}^n \log|u_{ii}|$ .

The determinant features in the formula

$\notag A^{-1} = \displaystyle\frac{\mathrm{adj}(A)}{\det(A)}$

for the inverse, where $\mathrm{adj}(A)$ is the adjugate of $A$ (recall that $\mathrm{adj}(A)$ has $(i,j)$ element $(-1)^{i+j} \det(A_{ji})$ ). More generally, Cramer’s rule says that the components of the solution to a linear system $Ax = b$ are given by $x_i = \det(A_i(b))/\det(A)$ , where $A_i(b)$ denotes $A$ with its $i$ th column replaced by $b$ . While mathematically elegant, Cramer’s rule is of no practical use, as it is both expensive and numerically unstable in finite precision arithmetic.

Inequalities

A celebrated bound for the determinant of a Hermitian positive definite matrix $H\in\mathbb{C}^{n\times n}$ is Hadamard’s inequality. Note that for such $H$ , $\det(H)$ is real and positive (being the product of the eigenvalues, which are real and positive) and the diagonal elements are also real and positive (since $h_{ii} = e_i^*He_i > 0$ ).

Theorem 1 (Hadamard’s inequality). For a Hermitian positive definite matrix $H\in\mathbb{C}^{n\times n}$ ,

$\notag \det(H) \le \displaystyle\prod_{i=1}^n h_{ii},$

with equality if and only if $H$ is diagonal.

Theorem 1 is easy to prove using a Cholesky factorization.

The following corollary can be obtained by applying Theorem 1 to $H = A^*A$ or by using a QR factorization of $A$ .

Corollary 2. For $A = [a_1,a_2,\dots,a_n] \in\mathbb{C}^{n\times n}$ ,

$\notag \det(A) \le \displaystyle\prod_{i=1}^n \|a_i\|_2,$

with equality if and only if the columns of $A$ are orthogonal.

Obviously, one can apply the corollary to $A^*$ and obtain the analogous bound with column norms replaced by row norms.

The determinant of $A\in\mathbb{R}^{n\times n}$ can be interpreted as the volume of the parallelepiped $\{\, \sum_{i=1}^n t_ia_i : 0 \le t_i \le 1, ~ i = 1\colon n\,\}$ , whose sides are the columns of $A$ . Corollary 2 says that for columns of given lengths the volume is maximized when the columns are orthogonal.

Nearness to Singularity and Conditioning

The determinant characterizes nonsingularity: $A$ is singular if and only if $\det(A) = 0$ . It might be tempting to use $|\det(A)|$ as a measure of how close a nonsingular matrix $A$ is to being singular, but this measure is flawed, not least because of the sensitivity of the determinant to scaling. Indeed if $Q$ is unitary then $\det(\alpha Q) = \alpha^n \det(Q)$ can be given any value by a suitable choice of $\alpha$ , yet $\alpha Q$ is perfectly conditioned: $\kappa_2(\alpha Q) = 1$ , where $\kappa(A) = \|A\| \|A^{-1}\|$ is the condition number.

To deal with the poor scaling one might normalize the determinant: in view of Corollary 2,

$\notag \psi(A) = \displaystyle\frac{\prod_{i=1}^n \|a_i\|_2} {\det(A)}$

satisfies $\psi(A) \ge 1$ and $\psi(A) = 1$ if and only if the columns of $A$ are orthogonal. Birkhoff (1975) calls $\psi$ the Hadamard condition number. In general, $\psi$ is not related to the condition number $\kappa$ , but if $A$ has columns of unit $2$ -norm then it can be shown that $\kappa_2(A) < 2\psi(A)$ (Higham, 2002, Prob. 14.13). Dixon (1984) shows that for classes of $n\times n$ random matrices $A_n$ that include matrices with elements independently drawn from a normal distribution with mean $0$ , the probability that the inequality

$n^{1/4 - \epsilon} \mathrm{e}^{n/2} < \psi(A_n) < n^{1/4 + \epsilon} \mathrm{e}^{n/2}$

holds tends to $1$ as $n\to\infty$ for any $\epsilon > 0$ , so $\psi(A_n) \approx n^{1/4}\mathrm{e}^{n/2}$ for large $n$ . This exponential growth is much faster than the growth of $\kappa$ , for which Edelman (1998) showed that for the standard normal distribution, $\mathbb{E}(\log(\kappa_2(A_n))) \approx \log n + 1.537$ , where $\mathbb{E}$ denotes the mean value. This MATLAB example illustrates these points.

>> rng(1); n = 50; A = randn(n); 
>> psi = prod(sqrt(sum(A.*A)))/abs(det(A)), kappa2 = cond(A)
psi =
   5.3632e+10
kappa2 =
   1.5285e+02
>> ratio = psi/(n^(0.25)*exp(n/2))
ratio =
   2.8011e-01

The relative distance from $A$ to the set of singular matrices is equal to the reciprocal of the condition number.

Theorem 3 (Gastinel, Kahan). For $A\in\mathbb{C}^{n\times n}$ and any subordinate matrix norm,

$\notag \min \left\{ \displaystyle\frac{\|\Delta A\|}{\|A\|} : A+\Delta A\mathrm{~singular} \right\} = \displaystyle\frac{1}{\kappa(A)}.$

Notes

Determinants came before matrices, historically. Most linear algebra textbooks make significant use of determinants, but a lot can be done without them. Axler (1995) shows how the theory of eigenvalues can be developed without using determinants.

Determinants have little application in practical computations, but they are a useful theoretical tool in numerical analysis, particularly for proving nonsingularity.

There is a large number of formulas and identities for determinants. Sir Thomas Muir collected many of them in his five-volume magnum opus The Theory of Determinants in the Historical Order of Development, published between 1890 and 1930. Brualdi and Schneider (1983) give concise derivations of many identities using Gaussian elimination, bringing out connections between the identities.

The quantity obtained by modifying the definition (1) of determinant to remove the $(-1)^{\mathop{\mathrm{sgn}}j}$ term is the permanent. The permanent arises in combinatorics and quantum mechanics and is much harder to compute than the determinant: no algorithm is known for computing the permanent in $p(n)$ operations for a polynomial $p$ .

References

This is a minimal set of references, which contain further useful references within.

Sheldon Axler, Down With Determinants!, Amer. Math. Monthly 102, 139–154, 1995.
Garrett Birkhoff, Two Hadamard Numbers for Matrices, Comm. ACM 18, 25–29, 1975.
Richard Brualdi and Hans Schneider, Determinantal Identities: Gauss, Schur, Cauchy, Sylvester, Kronecker, Jacobi, Binet, Laplace, Muir, and Cayley, Linear Algebra Appl. 52–53, 769–791, 1983.
John Dixon, How Good is Hadamard’s Inequality for Determinants?, Canadian Math. Bulletin 27, 260–264, 1984.
Alan Edelman, Eigenvalues and Condition Numbers of Random Matrices, SIAM J. Matrix Anal. Appl. 9(4), 543–560, 1988.
Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002.

What Is a Vandermonde Matrix?

A Vandermonde matrix is defined in terms of scalars $x_1$ , $x_2$ , …, $x_n\in\mathbb{C}$ by

The $x_i$ are called points or nodes. Note that while we have indexed the nodes from $1$ , they are usually indexed from $0$ in papers concerned with algorithms for solving Vandermonde systems.

Vandermonde matrices arise in polynomial interpolation. Suppose we wish to find a polynomial $p_{n-1}(x) = a_nx^{n-1} + a_{n-1}x^{n-2} + \cdots + a_1$ of degree at most $n-1$ that interpolates to the data $(x_i,f_i)_{i=1}^n$ , that is, $p_{n-1}(x_i) = f_i$ , $i=1\colon n$ . These equations are equivalent to

$\notag V^Ta = f \quad \mathrm{(dual)},$

where $a = [a_1,a_2,\dots,a_n]^T$ is the vector of coefficients. This is known as the dual problem. We know from polynomial interpolation theory that there is a unique interpolant if the $x_i$ are distinct, so this is the condition for $V$ to be nonsingular.

The problem

$\notag Vy = b \quad \mathrm{(primal)}$

is called the primal problem, and it arises when we determine the weights for a quadrature rule: given moments $b_i$ find weights $y_i$ such that $\sum_{j=1}^n y_j^{} x_j^{\,i-1} = b_i$ , $i=1\colon n$ .

Determinant

The determinant of $V$ is a function of the $n$ points $x_i$ . If $x_i = x_j$ for some $i\ne j$ then $V$ has identical $i$ th and $j$ th columns, so is singular. Hence the determinant must have a factor $x_i - x_j$ . Consequently, we have

$\notag \det( V(x_1,x_2,\dots,x_n) ) = c \displaystyle\prod_{i,j = 1\atop i > j}^n (x_i - x_j),$

where, since both sides have degree $n(n-1)/2$ in the $x_i$ , $c$ is a constant. But $\det(V)$ contains a term $x_2 x_3^2 \dots x_n^{n-1}$ (from the main diagonal), so $c = 1$ . Hence

$\notag \det(V) = \displaystyle\prod_{i,j = 1\atop i > j}^n (x_i - x_j). \qquad (1)$

This formula confirms that $V$ is nonsingular precisely when the $x_i$ are distinct.

Inverse

Now assume that $V$ is nonsingular and let $V^{-1} = W = (w_{ij})_{i,j=1}^n$ . Equating elements in the $i$ th row of $WV = I$ gives

$\sum_{j=1}^n w_{ij} x_k^{\mskip1mu j-1} = \delta_{ik}, \quad k=1\colon n,$

where $\delta_{ij}$ is the Kronecker delta (equal to $1$ if $i=j$ and $0$ otherwise). These equations say that the polynomial $\sum_{j=1}^n w_{ij} x^{\mskip1mu j-1}$ takes the value $1$ at $x = x_i$ and $0$ at $x = x_k$ , $k\ne i$ . It is not hard to see that this polynomial is the Lagrange basis polynomial:

$\notag \sum_{j=1}^n w_{ij} x^{j-1} = \displaystyle\prod_{k=1\atop k\ne i}^n \left( \frac{x-x_k}{x_i-x_k} \right) =: \ell_i(x). \qquad (2)$

We deduce that

$\notag w_{ij} = \displaystyle\frac{ (-1)^{n-j} \sigma_{n-j}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_n) } { \displaystyle\prod_{k=1 \atop k\ne i}^n (x_i-x_k) }, \qquad (3)$

where $\sigma_k(y_1,\dots,y_n)$ denotes the sum of all distinct products of $k$ of the arguments $y_1,\dots,y_n$ (that is, $\sigma_k$ is the $k$ th elementary symmetric function).

From (1) and (3) we see that if the $x_i$ are real and positive and arranged in increasing order $0 < x_1 < x_2 < \cdots 0$ and $V^{-1}$ has a checkerboard sign pattern: the $(i,j)$ element has sign $(-1)^{i+j}$ .

Note that summing (2) over $i$ gives

$\notag \displaystyle\sum_{j=1}^n x^{j-1} \sum_{i=1}^n w_{ij} = \sum_{i=1}^n \ell_i(x) = 1,$

where the second equality follows from the fact that $\sum_{i=1}^n \ell_i(x)$ is a degree $n-1$ polynomial that takes the value $1$ at the $n$ distinct points $x_i$ . Hence

$\notag \displaystyle\sum_{i=1}^n w_{ij} = \delta_{j1},$

so the elements in the $j$ th column of the inverse sum to $1$ for $j = 1$ and $0$ for $j\ge 2$ .

Example

To illustrate the formulas above, here is an example, with $x_i = (i-1)/(n-1)$ and $n = 5$ :

$\notag V = \left[\begin{array}{ccccc} 1 & 1 & 1 & 1 & 1\\ 0 & \frac{1}{4} & \frac{1}{2} & \frac{3}{4} & 1\\[\smallskipamount] 0 & \frac{1}{16} & \frac{1}{4} & \frac{9}{16} & 1\\[\smallskipamount] 0 & \frac{1}{64} & \frac{1}{8} & \frac{27}{64} & 1\\[\smallskipamount] 0 & \frac{1}{256} & \frac{1}{16} & \frac{81}{256} & 1 \end{array}\right], \quad V^{-1} = \left[\begin{array}{ccccc} 1 & -\frac{25}{3} & \frac{70}{3} & -\frac{80}{3} & \frac{32}{3}\\[\smallskipamount] 0 & 16 & -\frac{208}{3} & 96 & -\frac{128}{3}\\ 0 & -12 & 76 & -128 & 64\\[\smallskipamount] 0 & \frac{16}{3} & -\frac{112}{3} & \frac{224}{3} & -\frac{128}{3}\\[\smallskipamount] 0 & -1 & \frac{22}{3} & -16 & \frac{32}{3} \end{array}\right],$

for which $\det(V) = 9/32768$ .

Conditioning

Vandermonde matrices are notorious for being ill conditioned. The ill conditioning stems from the monomials being a poor basis for the polynomials on the real line. For arbitrary distinct points $x_i$ , Gautschi showed that $V_n = V(x_1, x_2, \dots, x_n)$ satisfies

$\notag \displaystyle\max_i \displaystyle\prod_{j\ne i} \frac{ \max(1,|x_j|) }{ |x_i-x_j| } \le \|V_n^{-1}\|_{\infty} \le \displaystyle\max_i \prod_{j\ne i} \frac{ 1+|x_j| }{ |x_i-x_j| },$

with equality on the right when $x_j = |x_j| e^{\mathrm{i}\theta}$ for all $j$ with a fixed $\theta$ (in particular, when $x_j\ge0$ for all $j$ ). Note that the upper and lower bounds differ by at most a factor $2^{n-1}$ . It is also known that for any set of real points $x_i$ ,

$\notag \kappa_2(V_n) \ge \Bigl(\displaystyle\frac{2}{n}\Bigr)^{1/2} \, (1+\sqrt{2})^{n-2}$

and that for $x_i = 1/i$ we have $\kappa_{\infty}(V_n) > n^{n+1}$ , where the lower bound is an extremely fast growing function of the dimension!

These exponential lower bounds are alarming, but they do not necessarily rule out the use of Vandermonde matrices in practice. One of the reasons is that there are specialized algorithms for solving Vandermonde systems whose accuracy is not dependent on the condition number $\kappa$ , and which in some cases can be proved to be highly accurate. The first such algorithm is an $O(n^2)$ operation algorithm for solving $V_ny =b$ of Björck and Pereyra (1970). There is now a long list of generalizations of this algorithm in various directions, including for confluent Vandermonde-like matrices (Higham, 1990), as well as for more specialized problems (Demmel and Koev, 2005) and more general ones (Bella et al., 2009). Another important observation is that the exponential lower bounds are for real nodes. For complex nodes $V_n$ can be much better conditioned. Indeed when the $x_i$ are the roots of unity, $V_n/\sqrt{n}$ is the unitary Fourier matrix and so $V_n$ is perfectly conditioned.

Generalizations

Two ways in which Vandermonde matrices have been generalized are by allowing confluency of the points $x_i$ and by replacing the monomials by other polynomials. Confluency arises when the $x_i$ are not distinct. If we assume that equal $x_i$ are contiguous then a confluent Vandermonde matrix is obtained by “differentiating” the previous column for each of the repeated points. For example, with points $x_1, x_1, x_1, x_2, x_2$ we obtain

$\notag \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ x_1 & 1 & 0 & x_2 & 1 \\ x_1^2 & 2x_1 & 2 & x_2^2 & 2x_2 \\ x_1^3 & 3x_1^2 & 6x_1 & x_2^3 & 3x_2^2 \\ x_1^4 & 4x_1^3 & 12x_1^2 & x_2^4 & 4x_2^3 \end{bmatrix}. \qquad (4)$

The transpose of a confluent Vandermonde matrix arises in Hermite interpolation; it is nonsingular if the points corresponding to the “nonconfluent columns” are distinct (that is, if $x_1 \ne x_2$ in the case of (4)).

A Vandermonde-like matrix is defined in terms of a set of polynomials $\{p_i(x)\}_{i=0}^n$ with $p_i$ having degree $i$ :

$\notag \begin{bmatrix} p_0(x_1) & p_0(x_2) & \dots & p_0(x_n)\\ p_1(x_1) & p_1(x_2) & \dots & p_1(x_n)\\ \vdots & \vdots & \dots & \vdots\\ p_{n-1}(x_1) & p_{n-1}(x_2) & \dots & p_{n-1}(x_n)\\ \end{bmatrix}.$

Of most interest are polynomials that satisfy a three-term recurrence, in particular, orthogonal polynomials. Such matrices can be much better conditioned than general Vandermonde matrices.

Notes

Algorithms for solving confluent Vandermonde-like systems and their rounding error analysis are described in the chapter “Vandermonde systems” of Higham (2002).

Gautschi has written many papers on the conditioning of Vandermonde matrices, beginning in 1962. We mention just his most recent paper on this topic: Gautschi (2011).

References

This is a minimal set of references, which contain further useful references within.

T. Bella, Y. Eidelman. I. Gohberg. I. Koltracht, and V. Olshevsky, A Fast Björck–Pereyra-Type Algorithm for Solving Hessenberg-Quasiseparable-Vandermonde Systems, SIAM J. Matrix Anal. Appl. 31(2), 790–815, 2009.
James W. Demmel and Plamen Koev, The Accurate and Efficient Solution of a Totally Positive Generalized Vandermonde Linear System, SIAM J. Matrix Anal. Appl. 2791, 142–152, 2005.
Walter Gautschi, Optimally Scaled and Optimally Conditioned Vandermonde and Vandermonde-like matrices, BIT 51, 103–125, 2011.
Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002.
Nicholas J. Higham, Stability Analysis of Algorithms for Solving Confluent Vandermonde-like Systems, SIAM J. Matrix Anal. Appl. 11, 23–41, 1990.

References

Share this:

Blocked Algorithms

Architectural Features

Probabilistic Bounds

Putting the Pieces Together

References

Share this:

Share this:

Share this:

Examples

Inverse

Eigensystem

LU Factorization

Factorization into a Product of Bidiagonal Matrices

Testing for Total Positivity

Notes

References

Related Blog Posts

Share this:

References

Share this:

References

Related Blog Posts

Share this:

Share this:

Inequalities

Nearness to Singularity and Conditioning

Notes

References

Related Blog Posts

Share this:

Determinant

Inverse

Example

Conditioning

Generalizations

Notes

References

Related Blog Posts

Share this: