In numerical linear algebra we are concerned with solving linear algebra problems accurately and efficiently and understanding the sensitivity of the problems to perturbations. We describe seven sins, whereby accuracy or efficiency is lost or misleading information about sensitivity is obtained.
1. Inverting a Matrix
In linear algebra courses we learn that the solution to a linear system $Ax = b$ of $n$ equations in $n$ unknowns can be written $x = A^{-1}b$, where $A^{-1}$ is the matrix inverse. What is not always emphasized is that there are very few circumstances in which one should compute $A^{-1}$. Indeed one would not solve the scalar ($n = 1$) system $ax = b$ by computing $x = a^{-1} \times b$, but rather would carry out a division $x = b/a$. In the $n \times n$ case, it is faster and more accurate to solve a linear system by LU factorization (Gaussian elimination) with partial pivoting than by inverting $A$ (which has, in any case, to be done by LU factorization).
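As a minimal MATLAB sketch of the comparison (the random matrix and its size are illustrative choices, not from the original), solve a square system by backslash, which uses LU factorization with partial pivoting, and by explicit inversion, and compare the relative residuals:

```matlab
% Sketch: LU-based solve (backslash) versus multiplying by the explicit inverse.
n = 500;
A = randn(n); b = randn(n,1);
x1 = A\b;                            % LU factorization with partial pivoting
x2 = inv(A)*b;                       % explicit inverse: more flops, typically less accurate
norm(b - A*x1)/(norm(A)*norm(x1))    % relative residual: of order the unit roundoff
norm(b - A*x2)/(norm(A)*norm(x2))    % typically noticeably larger
```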
Rare cases where $A^{-1}$ is required are in statistics, where the diagonal elements of the inverse of the covariance matrix are relevant quantities, and in certain algorithms for computing matrix functions.
2. Forming the Cross-Product Matrix A^TA
The solution to the linear least squares problem $\min_x \| Ax - b \|_2$, where $A$ is a full-rank $m \times n$ matrix with $m \ge n$, satisfies the normal equations $A^TA\,x = A^Tb$. It is therefore natural to form the symmetric positive definite matrix $A^TA$ and solve the normal equations by Cholesky factorization. While fast, this method is numerically unstable when $A$ is ill conditioned. By contrast, solving the least squares problem via QR factorization is always numerically stable.
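Here is a hedged sketch of the two approaches in MATLAB; the use of gallery('randsvd',...) to manufacture a matrix with a prescribed condition number is my choice for illustration, not the original's.

```matlab
% Sketch: normal equations (Cholesky) versus QR for an ill-conditioned least squares problem.
m = 100; n = 10;
A = gallery('randsvd',[m n],1e6);   % random m-by-n matrix with 2-norm condition number 1e6
b = randn(m,1);
% Normal equations: fast, but the error grows like cond(A)^2 times the unit roundoff.
R = chol(A'*A);
x_ne = R \ (R' \ (A'*b));
% QR factorization: backward stable.
[Q,R1] = qr(A,0);                   % economy-size QR
x_qr = R1 \ (Q'*b);
norm(x_ne - x_qr)/norm(x_qr)        % much larger than the unit roundoff, reflecting cond(A)^2
```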
What is wrong with the cross-product matrix (also known as the Gram matrix)? It squares the data, which can cause a loss of information in floating-point arithmetic. For example, if

$$A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix}, \qquad 0 < \epsilon < \sqrt{u},$$

where $u$ is the unit roundoff of the floating-point arithmetic, then

$$A^TA = \begin{bmatrix} 1+\epsilon^2 & 1 \\ 1 & 1+\epsilon^2 \end{bmatrix}$$

is positive definite but, since $\epsilon^2 < u$, in floating-point arithmetic $1+\epsilon^2$ rounds to $1$ and so

$$\mathrm{fl}(A^TA) = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$$

which is singular, and the information in $\epsilon$ has been lost.
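A small MATLAB check of this example (the variable eps_val stands for $\epsilon$; the factor $1/2$ simply keeps it safely below $\sqrt{u}$):

```matlab
% Sketch: the computed cross-product matrix loses the information in eps_val.
u = eps/2;                       % unit roundoff of IEEE double precision
eps_val = sqrt(u)/2;             % 0 < eps_val < sqrt(u)
A = [1 1; eps_val 0; 0 eps_val];
B = A'*A;                        % computed cross-product matrix
B(1,1) - 1                       % 0: the eps_val^2 term has been rounded away
det(B)                           % 0: the computed A'*A is exactly singular
```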
Another problem with the cross-product matrix is that the 2-norm condition number of $A^TA$ is the square of that of $A$, and this leads to numerical instability in algorithms that work with $A^TA$ when the condition number is large.
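The squaring is easy to observe (again using gallery('randsvd',...) as an illustrative way to fix the condition number):

```matlab
% Sketch: the condition number of A'*A is the square of that of A.
A = gallery('randsvd',10,1e6);   % random 10-by-10 matrix with 2-norm condition number 1e6
cond(A)                          % about 1e6
cond(A'*A)                       % about 1e12 = cond(A)^2
```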
3. Evaluating Matrix Products in an Inefficient Order
The cost of evaluating a matrix product depends on the order in which the product is evaluated (assuming the matrices are not all $n \times n$). More precisely, matrix multiplication is associative, so $A(BC) = (AB)C$, and in general the cost of the evaluation of a product depends on where one puts the parentheses. One order may be much superior to others, so one should not simply evaluate the product in a fixed left-right or right-left order. For example, if $x$, $y$, and $z$ are $n$-vectors then $xy^Tz$ can be evaluated as $(xy^T)z$, a vector outer product followed by a matrix–vector product costing $O(n^2)$ operations, or as $x(y^Tz)$, a vector scalar product followed by a vector scaling costing just $O(n)$ operations.
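A quick MATLAB timing sketch of the two orders (the vector length is an illustrative choice):

```matlab
% Sketch: two evaluation orders for x*y'*z with n-vectors x, y, z.
n = 2000;
x = randn(n,1); y = randn(n,1); z = randn(n,1);
tic, p1 = (x*y')*z; toc     % outer product, then matrix-vector product: O(n^2) operations
tic, p2 = x*(y'*z); toc     % inner product, then a vector scaling: O(n) operations
norm(p1 - p2)/norm(p2)      % the two orders agree up to rounding error
% Note that writing x*y'*z evaluates left to right, i.e., in the expensive order.
```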
In general, finding where to put the parentheses in a matrix product in order to minimize the operation count is a difficult problem, but for many cases that arise in practice it is easy to determine a good order.
4. Assuming that a Matrix is Positive Definite
Symmetric positive definite matrices (symmetric matrices with positive eigenvalues) are ubiquitous, not least because they arise in the solution of many minimization problems. However, a matrix that is supposed to be positive definite may fail to be so for a variety of reasons. Missing or inconsistent data in forming a covariance matrix or a correlation matrix can cause a loss of definiteness, and rounding errors can cause a tiny positive eigenvalue to go negative.
Definiteness implies that

- the diagonal entries are positive, $a_{ii} > 0$,
- $a_{ii}a_{jj} > a_{ij}^2$ for all $i \ne j$,

but neither of these conditions, nor even the two together, guarantees that the matrix has positive eigenvalues.
The best way to check definiteness is to compute a Cholesky factorization, which is often needed anyway. The MATLAB function chol
returns an error message if the factorization fails, and a second output argument can be requested, which is set to the number of the stage on which the factorization failed, or to zero if the factorization succeeded. In the case of failure, the partially computed factor is returned in the first argument, and it can be used to compute a direction of negative curvature (as needed in optimization), for example.
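The check is easy to carry out. In the sketch below the 3-by-3 matrix (chosen here purely for illustration) has positive diagonal and satisfies $a_{ii}a_{jj} > a_{ij}^2$ for $i \ne j$, yet chol reveals that it is not positive definite.

```matlab
% Sketch: testing definiteness with the second output argument of chol.
A = [1    0.9  0.9
     0.9  1   -0.9
     0.9 -0.9  1  ];
[R,p] = chol(A);
p          % nonzero: the factorization failed, so A is not positive definite
eig(A)     % confirms a negative eigenvalue (approximately -0.8)
```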
This sin takes the top spot in Schmelzer and Hauser’s Seven Sins in Portfolio Optimization, because in portfolio optimization a negative eigenvalue in the covariance matrix can identify a portfolio with negative variance, promising an arbitrarily large investment with no risk!
5. Not Exploiting Structure in the Matrix
One of the fundamental tenets of numerical linear algebra is that one should try to exploit any matrix structure that might be present. Sparsity (a matrix having a large number of zeros) is particularly important to exploit, since algorithms intended for dense matrices may be impractical for sparse matrices because of extensive fill-in (zeros becoming nonzero). Here are two examples of structures that can be exploited.
Matrices from saddle point problems are symmetric indefinite and of the form

$$C = \begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix},$$

with $A$ symmetric positive definite. Much work has been done on developing numerical methods for solving $Cx = b$ that exploit the block structure and possible sparsity in $A$ and $B$. A second example is a circulant matrix

$$C = \begin{bmatrix} c_1 & c_2 & \dots & c_n \\ c_n & c_1 & \dots & c_{n-1} \\ \vdots & & \ddots & \vdots \\ c_2 & c_3 & \dots & c_1 \end{bmatrix}.$$

Circulant matrices have the important property that they are diagonalized by a unitary matrix called the discrete Fourier transform matrix. Using this property one can solve $Cx = b$ in $O(n \log_2 n)$ operations, rather than the $O(n^3)$ operations required if the circulant structure is ignored.
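Here is a sketch of the FFT-based solve in MATLAB, parametrizing the circulant matrix by its first column c (an equivalent convention to the display above); the eigenvalues of such a matrix are fft(c), so the solve reduces to two FFTs, a division, and an inverse FFT.

```matlab
% Sketch: solving a circulant system in O(n log n) operations via the FFT.
n = 1024;
c = randn(n,1); b = randn(n,1);
C = toeplitz(c, [c(1); c(end:-1:2)]);   % circulant matrix with first column c
x_fft = real(ifft(fft(b)./fft(c)));     % FFT-based solve (real() since c and b are real)
x_ge  = C\b;                            % dense O(n^3) solve that ignores the structure
norm(x_fft - x_ge)/norm(x_ge)           % agreement to roughly the level of rounding error
```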
Ideally, linear algebra software would detect structure in a matrix and call an algorithm that exploits that structure. A notable example of such a meta-algorithm is the MATLAB backslash function x = A\b
for solving $Ax = b$. Backslash checks whether the matrix is triangular (or a permutation of a triangular matrix), upper Hessenberg, symmetric, or symmetric positive definite, and applies an appropriate method. It also allows $A$ to be rectangular and solves the least squares problem if there are more rows than columns and the underdetermined system if there are more columns than rows.
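For example (a small sketch, not from the original post):

```matlab
% Sketch: backslash on rectangular systems.
A = randn(100,10); b = randn(100,1);
x = A\b;                  % overdetermined: least squares solution, computed via QR
norm(A'*(b - A*x))        % small: the residual is orthogonal to the columns of A
B = randn(10,100); c = randn(10,1);
y = B\c;                  % underdetermined: a basic solution with at most 10 nonzeros
norm(B*y - c)             % small: the system is satisfied to rounding error
```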
6. Using the Determinant to Detect Near Singularity
An $n \times n$ matrix $A$ is nonsingular if and only if its determinant is nonzero. One might therefore expect that a small value for $\det(A)$ indicates a matrix that is nearly singular. However, the size of $\det(A)$ tells us nothing about near singularity. Indeed, since $\det(\alpha A) = \alpha^n \det(A)$ we can achieve any value for the determinant by multiplying $A$ by a scalar $\alpha$, yet $\alpha A$ is no more or less nearly singular than $A$ for $\alpha \ne 0$.
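A two-line MATLAB illustration:

```matlab
% Sketch: scaling changes the determinant at will but not the conditioning.
A = randn(5); alpha = 0.1;
det(alpha*A)/det(A)       % about alpha^5 = 1e-5
cond(alpha*A)/cond(A)     % about 1: alpha*A is no nearer to singularity than A
```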
Another limitation of the determinant is shown by the two matrices

$$T_n = \begin{bmatrix} 1 & -1 & \cdots & -1 \\ & 1 & \ddots & \vdots \\ & & \ddots & -1 \\ & & & 1 \end{bmatrix}, \qquad U_n = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ & 1 & \ddots & \vdots \\ & & \ddots & 1 \\ & & & 1 \end{bmatrix}. \qquad (1)$$

Both matrices have unit diagonal and off-diagonal elements bounded in modulus by $1$. So $\det(T_n) = \det(U_n) = 1$, yet

$$\kappa_\infty(T_n) = n2^{n-1}, \qquad \kappa_\infty(U_n) = 2n,$$

where $\kappa_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty$. So $T_n$ is ill conditioned for large $n$. In fact, if we change the $(n,1)$ element of $T_n$ to $-2^{2-n}$ then the matrix becomes singular! By contrast, $U_n$ is always very well conditioned. The determinant cannot distinguish between the ill-conditioned $T_n$ and the well-conditioned $U_n$.
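These claims are easy to verify numerically (a sketch, assuming MATLAB):

```matlab
% Sketch: T_n and U_n both have determinant 1, but very different conditioning.
n = 25;
U = triu(ones(n));          % U_n: unit diagonal, +1 above the diagonal
T = 2*eye(n) - U;           % T_n: unit diagonal, -1 above the diagonal
[det(T) det(U)]             % both equal to 1
[cond(T,inf) cond(U,inf)]   % n*2^(n-1) (about 4.2e8) versus 2n (= 50)
T(n,1) = -2^(2-n);          % the tiny change to the (n,1) element described above
min(svd(T))                 % of rounding-error size: the perturbed T_n is exactly singular
```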
7. Using Eigenvalues to Estimate Conditioning
For any $n \times n$ matrix $A$ and any consistent matrix norm it is true that $\|A\| \ge |\lambda_i|$ for all $i$, where the $\lambda_i$ are the eigenvalues of $A$. Since the eigenvalues of $A^{-1}$ are $\lambda_i^{-1}$, it follows that the matrix condition number $\kappa(A) = \|A\|\,\|A^{-1}\|$ is bounded below by the ratio of largest to smallest eigenvalue in absolute value, that is,

$$\kappa(A) \ge \frac{\max_i |\lambda_i|}{\min_i |\lambda_i|}.$$

But as the matrix $T_n$ in (1) shows, this bound can be very weak.
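Indeed, all the eigenvalues of $T_n$ equal $1$, so the bound gives only $\kappa(T_n) \ge 1$ (a small MATLAB check):

```matlab
% Sketch: the eigenvalue ratio bound is very weak for T_n from (1).
n = 25;
T = 2*eye(n) - triu(ones(n));   % T_n: unit diagonal, -1 above the diagonal
lam = eig(T);
max(abs(lam))/min(abs(lam))     % 1: the eigenvalue-based lower bound
cond(T,inf)                     % about 4.2e8: the true condition number is huge
```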
It is singular values, not eigenvalues, that characterize the condition number for the 2-norm. Specifically,

$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n},$$

where $A = U\Sigma V^T$ is a singular value decomposition (SVD), with $U$ and $V$ orthogonal and $\Sigma = \mathrm{diag}(\sigma_i)$, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$. If $A$ is symmetric, for example, then the sets $\{|\lambda_i|\}$ and $\{\sigma_i\}$ are the same, but in general the eigenvalues $\lambda_i$ and singular values $\sigma_i$ can be very different.
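A final MATLAB sketch, checking the singular value characterization and the symmetric case:

```matlab
% Sketch: kappa_2(A) = sigma_1/sigma_n, and |eigenvalues| = singular values for symmetric A.
A = randn(6);
s = svd(A);
[s(1)/s(end) cond(A)]                    % identical: cond computes the 2-norm condition number from the SVD
S = A + A';                              % a symmetric matrix
norm(sort(abs(eig(S))) - sort(svd(S)))   % of rounding-error size: the two sets coincide
```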