# What Is a Matrix?

A matrix is a rectangular array of numbers on which certain algebraic operations are defined. Matrices provide a convenient way of encapsulating many numbers in a single object and manipulating those numbers in useful ways.

An $m \times n$ matrix has $m$ rows and $n$ columns, and $m$ and $n$ are called the dimensions of the matrix. A matrix is square if it has the same number of rows and columns; otherwise it is rectangular.

An example of a square matrix is

$\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 2 & 3 & 4\\ 1 & 3 & 6 & 10\\ 1 & 4 & 10 & 20 \end{bmatrix}.$

This matrix is symmetric: $a_{ij} = a_{ji}$ for all $i$ and $j$, where $a_{ij}$ denotes the entry at the intersection of row $i$ and column $j$. Matrices are written either with square brackets, as in this example, or round brackets (parentheses).

Addition of matrices of the same dimensions is defined in the obvious way: by adding the corresponding entries.

Multiplication of matrices requires the inner dimensions to match. The product of an $m \times p$ matrix $A$ and a $p\times n$ matrix $B$ is an $m\times n$ matrix $C = AB$ defined by the formula

$c_{ij} = \displaystyle\sum_{k=1}^p a_{ik}b_{kj}, \quad 1 \le i \le m, \quad 1 \le j \le n.$

When $m = n$, both $AB$ and $BA$ are defined, but they are generally unequal: matrix multiplication is not commutative.
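
The formula for $c_{ij}$ translates directly into code. The following is a minimal Python sketch, assuming a matrix is stored as a list of rows; the name `matmul` and the two small test matrices are illustrative choices, not part of the original text.

```python
def matmul(a, b):
    """Product of an m-by-p matrix a and a p-by-n matrix b (lists of rows)."""
    m, p, n = len(a), len(b), len(b[0])
    if any(len(row) != p for row in a):
        raise ValueError("inner dimensions must match")
    # c[i][j] = sum over k of a[i][k] * b[k][j]
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(n)]
            for i in range(m)]


A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]

AB = matmul(A, B)   # multiplying by B on the right swaps the columns of A
BA = matmul(B, A)   # multiplying by B on the left swaps the rows of A

print(AB)           # [[2, 1], [4, 3]]
print(BA)           # [[3, 4], [1, 2]]
print(AB == BA)     # False
```

Here both products are defined and have the same dimensions, yet they differ, which illustrates the lack of commutativity.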

The inverse of a square matrix $A$ is a matrix $X$ such that $AX = XA = I$, where $I$ is the identity matrix, which has ones on the main diagonal (that is, in the $(i,i)$ position for all $i$) and zeros off the diagonal. For rectangular matrices various notions of generalized inverse exist.

The transpose of an $m \times n$ matrix $A$, written $A^T$, is the $n \times m$ matrix whose $(i,j)$ entry is $a_{ji}$. For a complex matrix, the conjugate transpose, written $A^*$ or $A^H$, has $(i,j)$ entry $\overline{a}_{ji}$.
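
As a small illustration, here is a Python sketch of the transpose and conjugate transpose, again assuming a list-of-rows representation. The helper names and the matrices `R` and `Z` are hypothetical; `P` is the symmetric example matrix given earlier.

```python
def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def conj_transpose(a):
    """Conjugate transpose: transpose and conjugate every entry."""
    return [[x.conjugate() for x in col] for col in zip(*a)]


P = [[1, 1, 1, 1],
     [1, 2, 3, 4],
     [1, 3, 6, 10],
     [1, 4, 10, 20]]

R = [[1, 2, 3],
     [4, 5, 6]]              # 2-by-3, so its transpose is 3-by-2

Z = [[1 + 2j, 3],
     [0, 4 - 1j]]

print(transpose(P) == P)     # True: P is symmetric
print(transpose(R))          # [[1, 4], [2, 5], [3, 6]]
print(conj_transpose(Z))
```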

In linear algebra, a matrix represents a linear transformation between two vector spaces in terms of particular bases for each space.

Vectors and scalars are special cases of matrices: column vectors are $n\times 1$, row vectors are $1\times n$, and scalars are $1\times1$.

Many programming languages and problem solving environments support arrays. It is important to note that operations on arrays are typically defined componentwise, so that, for example, multiplying two arrays multiplies the corresponding pairs of entries, which is not the same as matrix multiplication. The quintessential programming environment for matrices is MATLAB, in which a matrix is the core data type.
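
The difference between the two products is easy to see on a small example. This Python sketch uses plain nested lists and two arbitrary $2\times 2$ matrices chosen for illustration.

```python
A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

# Componentwise product: multiply corresponding entries.
componentwise = [[x * y for x, y in zip(row_a, row_b)]
                 for row_a, row_b in zip(A, B)]

# Matrix product: c_ij is the sum over k of a_ik * b_kj.
matrix_product = [[sum(x * y for x, y in zip(row_a, col_b))
                   for col_b in zip(*B)]
                  for row_a in A]

print(componentwise)    # [[5, 12], [21, 32]]
print(matrix_product)   # [[19, 22], [43, 50]]
```

In MATLAB the two operations are written `A.*B` and `A*B`, respectively.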

It is possible to give meaning to a matrix with one or both dimensions zero. MATLAB supports such empty matrices. Matrix multiplication generalizes in a natural way to allow empty dimensions:

    >> A = zeros(0,2)*zeros(2,3)
    A =
      0x3 empty double matrix

    >> A = zeros(2,0)*zeros(0,3)
    A =
         0     0     0
         0     0     0
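
The second result follows from the product formula: when the inner dimension $p$ is zero, each $c_{ij}$ is an empty sum, which is zero by convention. The Python sketch below mimics this behaviour. It is only an illustration under the assumption that matrices are lists of rows, and the dimensions are passed explicitly because such a list cannot record the column count of a matrix with no rows.

```python
def matmul(a, b, m, p, n):
    """Product of an m-by-p matrix a and a p-by-n matrix b (lists of rows)."""
    # When p == 0 the inner sum runs over no terms and evaluates to 0.
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(n)]
            for i in range(m)]


# (0-by-2) times (2-by-3): there are no rows to form, so the result is empty.
empty = matmul([], [[0, 0, 0], [0, 0, 0]], 0, 2, 3)

# (2-by-0) times (0-by-3): every entry is an empty sum.
zeros = matmul([[], []], [], 2, 0, 3)

print(empty)   # []
print(zeros)   # [[0, 0, 0], [0, 0, 0]]
```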


In linear algebra and numerical analysis, matrices are usually written with a capital letter and vectors with a lower case letter. In some contexts matrices are distinguished by boldface.

The term matrix was coined by James Joseph Sylvester in 1850. Arthur Cayley was the first to define matrix algebra, in 1858.

## References

• Arthur Cayley, A Memoir on the Theory of Matrices, Philos. Trans. Roy. Soc. London 148, 17–37, 1858.
• Nicholas J. Higham, Sylvester’s Influence on Applied Mathematics, Mathematics Today 50, 202–206, 2014. A version of the article with an extended bibliography containing additional historical references is available as a MIMS EPrint.
• Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.

# Update of Catalogue of Software for Matrix Functions

Edvin Hopkins and I have updated to version 3.0 the catalogue of software for matrix functions that we originally produced in 2014 and updated in 2016. It covers what is available in various languages (C++, Fortran, Java, Julia, Python, Rust), problem solving environments (GNU Octave, Maple, Mathematica, MATLAB and associated toolboxes, R, Scilab), and libraries (Armadillo, GNU Scientific Library, NAG Library, SLEPc, SLICOT).

Here are some highlights of changes in the last four years that are reflected in the new version.

• Several new MATLAB third-party functions have been written, by various authors, notably for $f(A)b$ and for arbitrary precision evaluation of the exponential and logarithm.
• Matrix function support in Julia has been expanded.
• Armadillo, Rust, SLEPc, and TensorFlow are included in new entries.

In addition, all URLs and references have been updated.

Suggestions for inclusion in a future revision are welcome.

# What Is Backward Error?

Backward error is a measure of error associated with an approximate solution to a problem. Whereas the forward error is the distance between the approximate and true solutions, the backward error is how much the data must be perturbed to produce the approximate solution.

For a function $f$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ and an approximation $y$ to $f(x)$, the backward error in $y$ is the smallest $\Delta x$ such that $y = f(x+\Delta x)$, for some appropriate measure of size. There can be many $\Delta x$ satisfying this equation, so the backward error is the solution to a minimization problem. Using a vector norm and measuring perturbations in a relative sense, we can define the backward error in $y$ as

$\eta(y) = \min\{ \, \epsilon: y = f(x+\Delta x), \; \|\Delta x\| \le \epsilon \|x\| \,\}.$

In the following figure the solid lines denote exact mappings and the dashed line shows the mapping that was actually computed.

The errors in question are usually rounding errors, but backward error can also be a useful way to characterize truncation errors (for example in deriving algorithms based on Padé approximation for computing matrix functions).

As an example, for the inner product $u^Tv$ of two vectors the backward error of an approximation $w$ can be defined as

$\eta(w) = \min \{\, \epsilon: w = (u + \Delta u)^Tv,\; \|\Delta u\|_2 \le \epsilon \|u\|_2 \,\},$

where $\|u\|_2 = (u^Tu)^{1/2}$. It can be shown that

$\eta(w) = \displaystyle\frac{ |w - u^Tv| }{ \|u\|_2 \|v\|_2 }.$

The definition of $\eta(w)$ is clearly unsymmetric in that $u$ is perturbed but $v$ is not. If $v$ is perturbed instead of $u$ then the same formula is obtained. If both $u$ and $v$ are perturbed then the constraint in the definition of $\eta(w)$ becomes nonlinear in $\Delta u$ and $\Delta v$ and no explicit formula is available for $\eta(w)$.
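
The formula for $\eta(w)$ can be checked numerically. In the Python sketch below the vectors and the approximation `w` are made-up data. The perturbation `delta_u`, a multiple of $v$, is one that attains the minimum: it reproduces `w` exactly (up to rounding) and its relative size equals $\eta(w)$.

```python
import math


def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm2(x):
    return math.sqrt(inner(x, x))


def backward_error(w, u, v):
    """Relative backward error of w as an approximation to u^T v, perturbing u only."""
    return abs(w - inner(u, v)) / (norm2(u) * norm2(v))


u = [1.0, 2.0, 3.0]
v = [4.0, -5.0, 6.0]
w = inner(u, v) + 1e-3        # hypothetical approximation to u^T v = 12

eta = backward_error(w, u, v)

# Minimum-norm perturbation: delta_u = (w - u^T v) v / (v^T v).
scale = (w - inner(u, v)) / inner(v, v)
delta_u = [scale * vi for vi in v]
u_pert = [ui + di for ui, di in zip(u, delta_u)]

print(eta)
print(abs(inner(u_pert, v) - w))            # tiny: (u + delta_u)^T v reproduces w
print(abs(norm2(delta_u) / norm2(u) - eta)) # tiny: delta_u has relative size eta
```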

For some problems a backward error may not exist. An example is computation of the outer product $A = uv^T$ of two $n$-vectors, which has rank 1. In floating-point arithmetic the computed matrix $\widehat{A}$ is unlikely to be of rank 1, so $\widehat{A} = (u + \Delta u)(v + \Delta v)^T$ is not in general possible for any $\Delta u$ and $\Delta v$. In this situation one can consider a mixed backward–forward error that perturbs $\widehat{A}$ as well as $u$ and $v$.

Backward error analysis refers to rounding error analysis in which a bound is obtained for a suitably defined backward error. If the backward error can be shown to be small then $y$ is the solution to a nearby problem. Indeed, if the backward error can be shown to be of the same size as any uncertainties in the data then $y$ is as good a solution as can be expected.

Backward error analysis was developed and popularized by James Wilkinson in the 1950s and 1960s. He first used it in the context of computing zeros of polynomials, but the method’s biggest successes came when he applied it to linear algebra computations.

Backward error analysis has also been used in the context of the numerical solution of differential equations, where it is used in different forms known as defect control and shadowing.

The forward error of $y\approx f(x)$ is bounded in terms of the backward error $\eta(y)$ by

$\displaystyle\frac{\|y - f(x)\|}{\|f(x)\|} \le \mathrm{cond}(f,x) \eta(y) + O(\eta(y)^2),$

in view of the definition of condition number. Consequently, we have the rule of thumb that

$\mbox{forward error} \lesssim \mbox{condition number}\times \mbox{backward error}.$
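
To see the rule of thumb in action, consider the scalar function $f(x) = \log x$, for which the backward error can be written down directly, since $y = \log(x + \Delta x)$ holds exactly when $\Delta x = e^y - x$. The Python sketch below uses a made-up "computed" value `y` with relative error $10^{-6}$; the point $x = 1.05$ is an arbitrary choice near $1$, where the logarithm is ill conditioned.

```python
import math

x = 1.05
exact = math.log(x)
y = exact * (1 + 1e-6)            # hypothetical computed value

# Backward error: the relative perturbation dx/x for which y = log(x + dx).
dx = math.exp(y) - x
backward = abs(dx) / abs(x)

# Forward error and condition number, cond(log, x) = 1/|log x|.
forward = abs(y - exact) / abs(exact)
cond = 1 / abs(math.log(x))

print(backward)                   # much smaller than the forward error
print(forward)
print(cond * backward)            # approximately equal to the forward error
```

The backward error is about twenty times smaller than the forward error, and the factor is the condition number.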


# What Is a Condition Number?

A condition number of a problem measures the sensitivity of the solution to small perturbations in the input data. The condition number depends on the problem and the input data, on the norm used to measure size, and on whether perturbations are measured in an absolute or a relative sense. The problem is defined by a function, which may be known explicitly or may be only implicitly defined (as when the problem is to solve an equation).

The most well known example of a condition number is the condition number of a nonsingular square matrix $A$, which is $\kappa(A) = \|A\| \|A^{-1}\|$. More correctly, this is the condition number with respect to inversion, because a relative change to $A$ of norm $\epsilon$ can change $A^{-1}$ by a relative amount as much as, but no more than, about $\kappa(A)\epsilon$ for small $\epsilon$. The same quantity $\kappa(A)$ is also the condition number for a linear system $Ax = b$ (exactly if $A$ is the data, but only approximately if both $A$ and $b$ are the data).

It is easy to see that $\kappa(A) \ge 1$ for any norm for which $\|I\| = 1$ (most common norms, but not the Frobenius norm, have this property) and that $\kappa(A)$ tends to infinity as $A$ tends to singularity.
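
Both properties can be observed on $2\times 2$ matrices. The following Python sketch assumes the $\infty$-norm (the maximum absolute row sum) and uses the explicit formula for the inverse of a $2\times 2$ matrix; the function names and the test matrices are illustrative.

```python
def norm_inf(a):
    """Infinity norm of a matrix: the maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def inverse_2x2(a):
    """Inverse of a 2-by-2 matrix from the explicit adjugate formula."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]


def kappa_inf(a):
    """Condition number with respect to inversion in the infinity norm."""
    return norm_inf(a) * norm_inf(inverse_2x2(a))


print(kappa_inf([[1, 0], [0, 1]]))    # 1.0, the smallest possible value
print(kappa_inf([[1, 2], [3, 4]]))    # 21.0

# The rows become nearly parallel as t -> 0, so kappa grows without bound.
kappas = [kappa_inf([[1, 1], [1, 1 + t]]) for t in (1e-1, 1e-3, 1e-5)]
print(kappas)
```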

A general definition of (relative) condition number, for a function $f$ from $\mathbb{R}^n$ to $\mathbb{R}^n$, is

$\mathrm{cond}(f,x) = \lim_{\epsilon\to0} \displaystyle\sup_{\|\Delta x\| \le \epsilon \|x\|} \displaystyle\frac{\|f(x+\Delta x) - f(x)\|}{\epsilon\|f(x)\|}.$

Taking a small, nonzero $\epsilon$, we have

$\displaystyle\frac{\|f(x+\Delta x) - f(x)\|}{\|f(x)\|} \lesssim \mathrm{cond}(f,x) \displaystyle\frac{\|\Delta x\|}{\|x\|}$

for small $\|\Delta x\|$, with approximate equality for some $\Delta x$.

An explicit expression for $\mathrm{cond}(f,x)$ can be given in terms of the Jacobian matrix, $J(x) = (\partial f_i/\partial x_j)$:

$\mathrm{cond}(f,x) = \displaystyle\frac{ \|x\| \| J(x) \| }{ \| f(x) \|}.$

We give two examples.

• If $f$ is a scalar function then $J(x) = f'(x)$, so $\mathrm{cond}(f,x) = |xf'(x)/f(x)|$. Hence, for example, $\mathrm{cond}(\log,x) = 1/|\log x|$.
• If $z$ is a simple (non-repeated) root of the polynomial $p(t) = a_n t^n + \cdots + a_1 t + a_0$ then the data is the vector of coefficients $a = [a_n,\dots,a_0]^T$. It can be shown that the condition number of the root $z$ is, for the $\infty$-norm,

$\mathrm{cond}(z,a) = \displaystyle\frac{ \max_i |a_i| \sum_{i=0}^n |z|^i } { |z p'(z)| }.$
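
Both formulas are straightforward to evaluate. The Python sketch below is a minimal illustration in which the function names and the quadratic $p(t) = t^2 - 3t + 2 = (t-1)(t-2)$ are assumptions made for the example.

```python
import math


def cond_scalar(f, fprime, x):
    """cond(f, x) = |x f'(x) / f(x)| for a scalar function f."""
    return abs(x * fprime(x) / f(x))


def cond_root(coeffs, z):
    """Infinity-norm condition number of a simple root z of a polynomial.

    coeffs = [a_n, ..., a_1, a_0], highest degree first.
    """
    n = len(coeffs) - 1
    # p'(z): the coefficient at index i multiplies t^(n-i).
    dp = sum((n - i) * c * z ** (n - i - 1) for i, c in enumerate(coeffs[:-1]))
    power_sum = sum(abs(z) ** i for i in range(n + 1))
    return max(abs(c) for c in coeffs) * power_sum / abs(z * dp)


# The logarithm is ill conditioned near x = 1, where log x is close to zero.
print(cond_scalar(math.log, lambda t: 1 / t, 1.001))
print(cond_scalar(math.log, lambda t: 1 / t, 100.0))

# Roots 1 and 2 of p(t) = t^2 - 3t + 2.
print(cond_root([1, -3, 2], 1))   # 9.0
print(cond_root([1, -3, 2], 2))   # 10.5
```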

A general theory of condition numbers was developed by Rice (1966).

A problem is said to be well conditioned if the condition number is small and ill conditioned if the condition number is large. The meaning of “small” and “large” depends on the problem and the context. This diagram illustrates a well-conditioned function $f$: small changes in $x$ produce small changes in $f$.

The next diagram depicts an ill-conditioned function $f$: small changes in $x$ can produce large changes in $f$ (but do not necessarily do so, as the closeness of $f(x_2)$ and $f(x_3)$ illustrates).

Here are a few key points about condition numbers.

• Even though an explicit expression may be available for it, computing $\mathrm{cond}(f,x)$ is usually as expensive as computing $f(x)$, so a lot of research has focused on obtaining inexpensive estimates of the condition number or bounds for it.
• While $\kappa(A) \ge 1$, it is not true for all functions that $\mathrm{cond}(f,x)$ is bounded below by $1$.
• For a range of functions that includes the matrix inverse, matrix eigenvalues, and a root of a polynomial, it is known that the condition number is the reciprocal of the relative distance to the nearest singular problem (one with an infinite condition number).
• As the condition number is itself a function, one can ask: What is the condition number of the condition number? For a variety of problems, including those mentioned in the previous point, the condition number of the condition number is (approximately) the condition number!


# “What Is” Series

I am starting a series of “What Is” posts. These will give brief descriptions of important concepts in numerical analysis and related areas, with a focus on topics that arise in my research. I will try to adhere to the following criteria.

• The posts are no more than two or three screenfuls (about 500 words).
• The posts contain a minimum of mathematical symbols, equations, and citations.
• The posts are accessible to upper level undergraduate students in any mathematically oriented subject.
• The list of references given for further reading is very small and concentrates on those that are broad, readable, and contain further references to the literature.

I will make PDF versions (from $\LaTeX$) of the posts available on GitHub in the repository What-Is. These are much better for printing. They contain hyperlinks, which will be clickable in the PDF but not visible in a printout.

I will be happy to receive suggestions for topics to cover.

From 2002 to 2019 the Notices of the American Mathematical Society published a “What Is …” series of articles, latterly in the Graduate Student section. An index of the articles is available here. Those articles are almost exclusively on pure mathematics and I don’t anticipate much, if any, overlap with my series.