What Is a Symmetric Indefinite Matrix?

A symmetric indefinite matrix $A$ is a symmetric matrix for which the quadratic form $x^TAx$ takes both positive and negative values. By contrast, for a positive definite matrix $x^TAx > 0$ for all nonzero $x$, and for a negative definite matrix $x^TAx < 0$ for all nonzero $x$.

A neat way to express the indefiniteness is that there exist vectors $x$ and $y$ for which $(x^TAx)(y^TAy) < 0$.

A symmetric indefinite matrix has both positive and negative eigenvalues and in some sense is a typical symmetric matrix. For example, a random symmetric matrix is usually indefinite:

>> rng(3); B = rand(4); A = B + B'; eig(A)'
ans =
  -8.9486e-01  -6.8664e-02   1.1795e+00   3.9197e+00

In general it is difficult to tell if a symmetric matrix is indefinite or definite, but there is one easy-to-spot sufficient condition for indefiniteness: if the matrix has a zero diagonal element that has a nonzero element in its row then it is indefinite. Indeed, if $a_{kk} = 0$ then $e_k^TAe_k = a_{kk} = 0$, where $e_k$ is the $k$th unit vector, so $A$ cannot be positive definite or negative definite. The existence of a nonzero element in the row of the zero rules out the matrix being positive semidefinite ($x^TAx \ge 0$ for all $x$) or negative semidefinite ($x^TAx \le 0$ for all $x$).
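The role of the nonzero element in the row can be seen in a minimal MATLAB sketch. The $3\times 3$ matrix and the index names k and j are hypothetical choices made for illustration, with $a_{kk} = 0$ and $a_{kj} \ne 0$. For $x = e_j + te_k$ the quadratic form reduces to $a_{jj} + 2ta_{kj}$, which is linear in $t$ and so takes both signs.

```matlab
% Hypothetical example: a_22 = 0 and a_21 = 1 is a nonzero entry in row 2.
A = [2 1 0; 1 0 0; 0 0 3];
k = 2; j = 1;                    % a_kk = 0 and a_kj ~= 0
e = eye(3);

% For x = e_j + t*e_k we have x'*A*x = a_jj + 2*t*a_kj.
% Choose t so that the quadratic form equals +1 and then -1.
x = e(:,j) + ((1 - A(j,j))/(2*A(k,j)))*e(:,k);
y = e(:,j) - ((1 + A(j,j))/(2*A(k,j)))*e(:,k);

qx = x'*A*x                      % positive
qy = y'*A*y                      % negative
```

Since qx and qy have opposite signs, these x and y also satisfy $(x^TAx)(y^TAy) < 0$.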

An example of a symmetric indefinite matrix is a saddle point matrix, which has the block $2\times 2$ form

$\notag C = \begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix},$

where $A$ is symmetric positive definite and $B\ne0$ . When $A$ is the identity matrix, $C$ is the augmented system matrix associated with a least squares problem $\min_x \|Bx - d\|_2$ . Another example is the $n\times n$ reverse identity matrix $J_n$ , illustrated by

$\notag J_4 = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix},$

which has eigenvalues $\pm1$ (exercise: how many $1$ s and how many $-1$ s?). A third example is a Toeplitz tridiagonal matrix with zero diagonal:

>> A = full(gallery('tridiag',5,1,0,1)), eig(sym(A))'
A =
     0     1     0     0     0
     1     0     1     0     0
     0     1     0     1     0
     0     0     1     0     1
     0     0     0     1     0
ans =
[-1, 0, 1, 3^(1/2), -3^(1/2)]

How can we exploit symmetry in solving a linear system $Ax = b$ with a symmetric indefinite matrix $A$ ? A Cholesky factorization does not exist, but we could try to compute a factorization $A = LDL^T$ , where $L$ is unit lower triangular and $D$ is diagonal with both positive and negative diagonal entries. However, this factorization does not always exist and if it does, its computation in floating-point arithmetic can be numerically unstable. The simplest example of nonexistence is the matrix

$\notag \begin{bmatrix} 0 & 1\\ 1 & 1 \end{bmatrix} \ne \begin{bmatrix} 1 & 0 \\ \ell_{21} & 1 \end{bmatrix} \begin{bmatrix} d_{11} & 0 \\ 0 & d_{22}\end{bmatrix} \begin{bmatrix} 1 & \ell_{21}\\ 0 & 1\end{bmatrix}.$
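
To see why no choice of $\ell_{21}$, $d_{11}$, and $d_{22}$ works, it may help to multiply out the product of a unit lower triangular $L$, a diagonal $D$, and $L^T$ (a short check using only the symbols above):

$\notag \begin{bmatrix} 1 & 0 \\ \ell_{21} & 1 \end{bmatrix} \begin{bmatrix} d_{11} & 0 \\ 0 & d_{22}\end{bmatrix} \begin{bmatrix} 1 & \ell_{21}\\ 0 & 1\end{bmatrix} = \begin{bmatrix} d_{11} & d_{11}\ell_{21} \\ \ell_{21}d_{11} & \ell_{21}^2 d_{11} + d_{22} \end{bmatrix}.$

Matching the $(1,1)$ elements forces $d_{11} = 0$, but then the $(2,1)$ element $\ell_{21}d_{11}$ is zero and cannot equal $1$.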

The way round this is to allow $D$ to have $2 \times 2$ blocks. We can compute a block $\mathrm{LDL^T}$ factorization $PAP^T = LDL^T$, where $P$ is a permutation matrix, $L$ is unit lower triangular, and $D$ is block diagonal with diagonal blocks of size $1$ or $2$. Various pivoting strategies, which determine $P$, are possible, but the recommended one is the symmetric rook pivoting strategy of Ashcraft, Grimes, and Lewis (1998), which has the key property of producing a bounded $L$ factor. Solving $Ax = b$ now reduces to substitutions with $L$ and a solve with $D$, which involves solving $2\times 2$ linear systems for the $2\times 2$ blocks and doing divisions for the $1\times 1$ blocks (scalars).

MATLAB implements $\mathrm{LDL^T}$ factorization in its ldl function. Here is an example using Anymatrix:

>> A = anymatrix('core/blockhouse',4), [L,D,P] = ldl(A), eigA = eig(A)'
A =
  -4.0000e-01  -8.0000e-01  -2.0000e-01   4.0000e-01
  -8.0000e-01   4.0000e-01  -4.0000e-01  -2.0000e-01
  -2.0000e-01  -4.0000e-01   4.0000e-01  -8.0000e-01
   4.0000e-01  -2.0000e-01  -8.0000e-01  -4.0000e-01
L =
   1.0000e+00            0            0            0
            0   1.0000e+00            0            0
   5.0000e-01  -8.3267e-17   1.0000e+00            0
  -2.2204e-16  -5.0000e-01            0   1.0000e+00
D =
  -4.0000e-01  -8.0000e-01            0            0
  -8.0000e-01   4.0000e-01            0            0
            0            0   5.0000e-01  -1.0000e+00
            0            0  -1.0000e+00  -5.0000e-01
P =
     1     0     0     0
     0     1     0     0
     0     0     1     0
     0     0     0     1
eigA =
  -1.0000e+00  -1.0000e+00   1.0000e+00   1.0000e+00

Notice the $2\times 2$ blocks on the diagonal of $D$ , each of which contains one negative eigenvalue and one positive eigenvalue. The eigenvalues of $D$ are not the same as those of $A$ , but since $A$ and $D$ are congruent they have the same number of positive, zero, and negative eigenvalues.
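
The solve described earlier (substitutions with $L$ and a solve with $D$) can be sketched in MATLAB as follows. This is a minimal illustration, not a tuned solver: the test matrix and right-hand side are arbitrary choices, and backslash is applied to $D$ for brevity rather than exploiting its block structure. Note that, as far as I know, the ldl function documents its factors as satisfying P'*A*P = L*D*L', so its P plays the role of $P^T$ in the notation above.

```matlab
% Symmetric indefinite test matrix: Toeplitz tridiagonal with zero diagonal.
A = full(gallery('tridiag',4,1,0,1));
b = A*ones(4,1);                 % right-hand side with known solution ones(4,1)

% Block LDL^T factorization, assumed convention: P'*A*P = L*D*L'.
[L,D,P] = ldl(A);

% Solve A*x = b: permute, forward substitution with L, solve with D,
% back substitution with L', then undo the permutation.
x = P*(L'\(D\(L\(P'*b))));
relres = norm(b - A*x)/norm(b)

% Congruence preserves the numbers of positive and negative eigenvalues.
inertiaA = [sum(eig(A) > 0), sum(eig(A) < 0)]
inertiaD = [sum(eig(D) > 0), sum(eig(D) < 0)]
```

The two inertia vectors should agree, which is the congruence property noted above (Sylvester's law of inertia).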

References

Cleve Ashcraft, Roger Grimes, and John Lewis, Accurate Symmetric Indefinite Linear Equation Solvers, SIAM J. Matrix Anal. Appl. 20, 513–561, 1998.
Nicholas J. Higham and Mantas Mikaitis, Anymatrix: An Extendable MATLAB Matrix Collection, Numer. Algorithms, 90:3, 1175–1196, 2021.

What Is a Toeplitz Matrix?

$T\in\mathbb{C}^{n\times n}$ is a Toeplitz matrix if $t_{ij} = t_{i-j}$ for $2n-1$ parameters $t_{1-n},\dots,t_{n-1}$ . A Toeplitz matrix has constant diagonals. For $n = 4$ :

$\notag T = \begin{bmatrix} t_0 & t_{-1} & t_{-2} & t_{-3}\\ t_1 & t_0 & t_{-1} & t_{-2}\\ t_2 & t_1 & t_0 & t_{-1}\\ t_3 & t_2 & t_1 & t_0 \end{bmatrix}.$

Toeplitz matrices arise in various problems, including analysis of time series, discretization of constant coefficient differential equations, and discretization of convolution equations $\int a(t-s)x(s)\,\mathrm{d} s = b(t)$ .

Since a Toeplitz matrix depends on just $2n-1$ parameters it is reasonable to expect that a linear system $Tx = b$ can be solved in less than the $O(n^3)$ flops that would be required by LU factorization. Indeed methods are available that require only $O(n^2)$ flops; see Golub and Van Loan (2013) for details.

Upper triangular Toeplitz matrices can be written in the form

$\notag T = \sum_{j=1}^n t_{1-j} N^{j-1}, \quad N = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix},$

where $N$ is upper bidiagonal with a superdiagonal of ones and $N^n = 0$ . It follows that the product of two upper triangular Toeplitz matrices is again upper triangular Toeplitz, upper triangular Toeplitz matrices commute, and $T^{-1}$ is also an upper triangular Toeplitz matrix (assuming $t_0$ is nonzero, so that $T$ is nonsingular).
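
These closure properties are easy to check numerically. The following MATLAB sketch uses two hypothetical $4\times 4$ upper triangular Toeplitz matrices with small integer entries (so the products are computed exactly) and a helper istoep, introduced here for illustration, that compares a matrix with the Toeplitz matrix built from its own first column and first row.

```matlab
n = 4;
z = zeros(n-1,1);
% Upper triangular Toeplitz matrices, given by first column and first row.
T1 = toeplitz([1; z], [1 2 3 4]);
T2 = toeplitz([2; z], [2 -1 0 5]);

% Hypothetical helper: X is Toeplitz (to within tol) if it matches the
% Toeplitz matrix defined by its first column and first row.
istoep = @(X, tol) norm(X - toeplitz(X(:,1), X(1,:)), 1) <= tol;

prod_is_toeplitz = istoep(T1*T2, 0)
commute = isequal(T1*T2, T2*T1)
inv_is_toeplitz = istoep(inv(T1), 1e-12)   % inv is used only to inspect structure
```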

Tridiagonal Toeplitz matrices arise frequently:

$\notag T(c,d,e) = \begin{bmatrix} d & e & & \\ c & d & \ddots & \\ & \ddots & \ddots & e \\ & & c & d \end{bmatrix} \in\mathbb{C}^{n\times n}.$

The eigenvalues of $T(c,d,e)$ are

$\notag d + 2 (ce)^{1/2} \cos\biggl( \displaystyle\frac{k \pi}{n+1} \biggr), \quad k = 1:n.$
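
As a quick sanity check of this formula, here is a small MATLAB sketch comparing it with computed eigenvalues. The values $n = 5$, $c = e = 1$, $d = 0$ are chosen to match the zero-diagonal tridiagonal example from the symmetric indefinite post; any real $c$ and $e$ with $ce > 0$ could be substituted.

```matlab
n = 5; c = 1; d = 0; e = 1;
T = full(gallery('tridiag',n,c,d,e));     % subdiagonal c, diagonal d, superdiagonal e

k = (1:n)';
lambda_formula = sort(d + 2*sqrt(c*e)*cos(k*pi/(n+1)));
lambda_eig = sort(eig(T));

err = norm(lambda_formula - lambda_eig)   % should be at rounding-error level
```

For these parameters the formula gives $2\cos(k\pi/6)$, that is, $\pm\sqrt{3}$, $\pm 1$, and $0$.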

The Kac–Murdock–Szegö matrix is the symmetric Toeplitz matrix

$\notag A(\rho) = \begin{bmatrix} 1 & \rho & \rho^2 & \dots & \rho^{n-1} \\ \rho & 1 & \rho & \dots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \rho \\ \rho^{n-1} & \rho^{n-2} & \dots & \rho & 1 \end{bmatrix}.$

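It has a number of interesting properties: for example, for real $\rho$ with $|\rho| < 1$ it is positive definite, and for $|\rho| \ne 1$ its inverse is tridiagonal.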

In MATLAB, a Toeplitz matrix can be constructed using toeplitz(c,r), which produces the matrix with first column c and first row r. Example:

>> n = 5; A = toeplitz(1:n,[1 -2:-1:-n])
A =
     1    -2    -3    -4    -5
     2     1    -2    -3    -4
     3     2     1    -2    -3
     4     3     2     1    -2
     5     4     3     2     1

References

Gene Golub and Charles F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins University Press, Baltimore, MD, USA, 2013. Section 4.7.

Seven Sins of Numerical Linear Algebra

In numerical linear algebra we are concerned with solving linear algebra problems accurately and efficiently and understanding the sensitivity of the problems to perturbations. We describe seven sins, whereby accuracy or efficiency is lost or misleading information about sensitivity is obtained.

1. Inverting a Matrix

In linear algebra courses we learn that the solution to a linear system $Ax = b$ of $n$ equations in $n$ unknowns can be written $x = A^{-1}b$ , where $A^{-1}$ is the matrix inverse. What is not always emphasized is that there are very few circumstances in which one should compute $A^{-1}$ . Indeed one would not solve the scalar ( $n=1$ ) system $7x = 21$ by computing $x = 7^{-1} \times 21$ , but rather would carry out a division $x = 21/7$ . In the $n\times n$ case, it is faster and more accurate to solve a linear system by LU factorization (Gaussian elimination) with partial pivoting than by inverting $A$ (which has, in any case, to be done by LU factorization).

Rare cases where $A^{-1}$ is required are in statistics, where the diagonal elements of the inverse of the covariance matrix are relevant quantities, and in certain algorithms for computing matrix functions.
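
The comparison between solving and inverting can be tried out with a short MATLAB sketch. The dimension, the condition number $10^{10}$, and the random seed are arbitrary assumptions made for illustration; the observed numbers will vary with the matrix, but the residual from the explicit inverse is typically much larger than the one from backslash.

```matlab
rng(1);                               % fixed seed, arbitrary choice
n = 200;
A = gallery('randsvd', n, 1e10);      % random matrix with 2-norm condition number 1e10
x_true = randn(n,1);
b = A*x_true;

x_lu  = A\b;                          % LU factorization with partial pivoting
x_inv = inv(A)*b;                     % explicit inverse (not recommended)

% Normwise relative residual: a value of order the unit roundoff
% indicates a backward stable solve.
res = @(x) norm(b - A*x)/(norm(A)*norm(x));
res_lu  = res(x_lu)
res_inv = res(x_inv)
```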

2. Forming the Cross-Product Matrix A^TA

The solution to the linear least squares problem $\min_x\| b - Ax \|_2$ , where $A$ is a full-rank $m\times n$ matrix with $m\ge n$ , satisfies the normal equations $A^T\!A x = A^Tb$ . It is therefore natural to form the symmetric positive definite matrix $A^T\!A$ and solve the normal equations by Cholesky factorization. While fast, this method is numerically unstable when $A$ is ill conditioned. By contrast, solving the least squares problem via QR factorization is always numerically stable.

What is wrong with the cross-product matrix $A^T\!A$ (also known as the Gram matrix)? It squares the data, which can cause a loss of information in floating-point arithmetic. For example, if

$A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \end{bmatrix}, \quad 0 < \epsilon < \sqrt{u},$

where $u$ is the unit roundoff of the floating-point arithmetic, then

$A^T\!A = \begin{bmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 \end{bmatrix}$

is positive definite but, since $\epsilon^2<u$ , in floating-point arithmetic $1+\epsilon^2$ rounds to $1$ and so

$\mathrm{f\kern.2ptl}( A^T\!A) = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$

which is singular, and the information in $\epsilon$ has been lost.

Another problem with the cross-product matrix is that the $2$-norm condition number of $A^T\!A$ is the square of that of $A$, and this leads to numerical instability in algorithms that work with $A^T\!A$ when the condition number is large.
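
The $2\times 2$ example can be reproduced in IEEE double precision with the following MATLAB sketch. The particular value of $\epsilon$ is an assumption; any value with $0 < \epsilon < \sqrt{u}$ should behave the same way.

```matlab
u = eps/2;                       % unit roundoff for IEEE double precision
epsilon = sqrt(u)/10;            % satisfies 0 < epsilon < sqrt(u)
A = [1 1; epsilon 0];
G = A'*A;                        % computed cross-product matrix

G_is_ones = isequal(G, ones(2))  % true when 1 + epsilon^2 has rounded to 1
rank_A = rank(A)                 % A itself still has full rank

% Cholesky factorization of the computed G breaks down: p > 0 signals failure.
[~, p] = chol(G);
chol_failed = (p > 0)
```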

3. Evaluating Matrix Products in an Inefficient Order

The cost of evaluating a matrix product depends on the order in which the product is evaluated (assuming the matrices are not all $n\times n$ ). More precisely, matrix multiplication is associative, so $A(BC) = (AB)C$ , and in general the cost of the evaluation of a product depends on where one puts the parentheses. One order may be much superior to others, so one should not simply evaluate the product in a fixed left-right or right-left order. For example, if $x$ , $y$ , and $z$ are $n$ -vectors then $xy^Tz$ can be evaluated as

$(xy^T)z$ : a vector outer product followed by a matrix–vector product, costing $O(n^2)$ operations, or
$x (y^Tz)$ : a vector scalar product followed by a vector scaling, costing just $O(n)$ operations.

In general, finding where to put the parentheses in a matrix product $A_1A_2\dots A_k$ in order to minimize the operation count is a difficult problem, but for many cases that arise in practice it is easy to determine a good order.
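
The two ways of evaluating $xy^Tz$ can be compared with a short MATLAB sketch. The dimension and the random vectors are arbitrary assumptions, and the timings reported by timeit will depend on the machine, so no particular speedup is claimed here.

```matlab
rng(0);                          % fixed seed, arbitrary choice
n = 2000;
x = rand(n,1); y = rand(n,1); z = rand(n,1);

f_outer = @() (x*y')*z;          % forms an n-by-n matrix: O(n^2) operations
f_inner = @() x*(y'*z);          % scalar product then scaling: O(n) operations

% The two orderings agree up to rounding error.
reldiff = norm(f_outer() - f_inner())/norm(f_inner())

% Their run times differ; the gap widens as n grows.
t_outer = timeit(f_outer)
t_inner = timeit(f_inner)
```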

4. Assuming that a Matrix is Positive Definite

Symmetric positive definite matrices (symmetric matrices with positive eigenvalues) are ubiquitous, not least because they arise in the solution of many minimization problems. However, a matrix that is supposed to be positive definite may fail to be so for a variety of reasons. Missing or inconsistent data in forming a covariance matrix or a correlation matrix can cause a loss of definiteness, and rounding errors can cause a tiny positive eigenvalue to go negative.

Definiteness implies that

the diagonal entries are positive,
$\det(A) > 0$ ,
$|a_{ij}| < \sqrt{a_{ii}a_{jj}}$ for all $i \ne j$ ,

but none of these conditions, or even all taken together, guarantees that the matrix has positive eigenvalues.

The best way to check definiteness is to compute a Cholesky factorization, which is often needed anyway. The MATLAB function chol returns an error message if the factorization fails, and a second output argument can be requested, which is set to the number of the stage on which the factorization failed, or to zero if the factorization succeeded. In the case of failure, the partially computed $R$ factor is returned in the first argument, and it can be used to compute a direction of negative curvature (as needed in optimization), for example.
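
The following MATLAB sketch illustrates both points: a matrix can satisfy all three necessary conditions listed above without being positive definite, and the two-output form of chol detects this without throwing an error. The $6\times 6$ matrix is a hypothetical example constructed for illustration; it has two negative eigenvalues, so its determinant is positive.

```matlab
% B has unit diagonal, off-diagonal entries -0.9, and eigenvalues -0.8, 1.9, 1.9.
B = toeplitz([1 -0.9 -0.9]);
A = blkdiag(B, B);               % two negative eigenvalues, so det(A) > 0

% The three necessary conditions all hold.
diag_positive = all(diag(A) > 0)
det_positive = det(A) > 0
S = sqrt(diag(A)*diag(A)');      % S(i,j) = sqrt(a_ii*a_jj)
off = ~eye(size(A));             % logical mask selecting off-diagonal entries
offdiag_bounded = all(abs(A(off)) < S(off))

% The second output of chol is zero exactly when the factorization succeeds.
[R, p] = chol(A);
is_pos_def = (p == 0)
```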

This sin takes the top spot in Schmelzer and Hauser’s Seven Sins in Portfolio Optimization, because in portfolio optimization a negative eigenvalue in the covariance matrix can identify a portfolio with negative variance, promising an arbitrarily large investment with no risk!

5. Not Exploiting Structure in the Matrix

One of the fundamental tenets of numerical linear algebra is that one should try to exploit any matrix structure that might be present. Sparsity (a matrix having a large number of zeros) is particularly important to exploit, since algorithms intended for dense matrices may be impractical for sparse matrices because of extensive fill-in (zeros becoming nonzero). Here are two examples of structures that can be exploited.

Matrices from saddle point problems are symmetric indefinite and of the form

$\notag C = \begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix},$

with $A$ symmetric positive definite. Much work has been done on developing numerical methods for solving $Cx = b$ that exploit the block structure and possible sparsity in $A$ and $B$ . A second example is a circulant matrix

$\notag C = \begin{bmatrix} c_1 & c_2 & \dots & c_n \\ c_n & c_1 & \dots & \vdots \\ \vdots & \ddots & \ddots & c_2 \\ c_2 & \dots & c_n & c_1 \\ \end{bmatrix}.$

Circulant matrices have the important property that they are diagonalized by a unitary matrix called the discrete Fourier transform matrix. Using this property one can solve $Cx = v$ in $O(n \log_2n)$ operations, rather than the $O(n^3)$ operations required if the circulant structure is ignored.
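
A minimal MATLAB sketch of the fast solve follows. It relies on the fact that multiplying by a circulant matrix is a circular convolution with its first column, so the fast Fourier transform turns $Cx = v$ into an elementwise division. The entries of c and v are hypothetical values chosen so that $C$ is diagonally dominant and hence nonsingular; the dense solve is included only as a reference.

```matlab
c = [4 1 0 0 2];                 % first row c_1,...,c_n of C (hypothetical values)
n = numel(c);
col = [c(1) c(n:-1:2)].';        % first column of C: c_1, c_n, ..., c_2
C = toeplitz(col, c);            % dense circulant matrix, for reference only

v = (1:n).';

% fft(col) holds the eigenvalues of C, so the solve is a division in
% Fourier space, costing O(n log n) operations.
x_fft = real(ifft(fft(v)./fft(col)));

x_ref = C\v;                     % dense solve that ignores the structure
relerr = norm(x_fft - x_ref)/norm(x_ref)
```

The call to real discards the tiny imaginary parts that rounding errors can introduce when the data are real.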

Ideally, linear algebra software would detect structure in a matrix and call an algorithm that exploits that structure. A notable example of such a meta-algorithm is the MATLAB backslash function x = A\b for solving $Ax = b$ . Backslash checks whether the matrix is triangular (or a permutation of a triangular matrix), upper Hessenberg, symmetric, or symmetric positive definite, and applies an appropriate method. It also allows $A$ to be rectangular and solves the least squares problem if there are more rows than columns and the underdetermined system if there are more columns than rows.

6. Using the Determinant to Detect Near Singularity

An $n\times n$ matrix $A$ is nonsingular if and only if its determinant is nonzero. One might therefore expect that a small value for $\det(A)$ indicates a matrix that is nearly singular. However, the size of $\det(A)$ tells us nothing about near singularity. Indeed, since $\det(\alpha A) = \alpha^n \det(A)$ we can achieve any value for the determinant by multiplying by a scalar $\alpha$ , yet $\alpha A$ is no more or less nearly singular than $A$ for $\alpha \ne 0$ .

Another limitation of the determinant is shown by the two matrices

$\notag T = \begin{bmatrix} 1 & -1 & -1 & \dots & -1\\ & 1 & -1 & \dots & -1\\ & & 1 & \dots & \vdots\\ & & & \ddots & -1 \\ & & & & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & 1 & 1 & \dots & 1\\ & 1 & 1 & \dots & 1\\ & & 1 & \dots & \vdots\\ & & & \ddots & 1 \\ & & & & 1 \end{bmatrix} \qquad (1)$

Both matrices are triangular with unit diagonal, so $\det(T) = \det(U) = 1$, and their off-diagonal elements are bounded in modulus by $1$, yet

$\notag T^{-1} = \begin{bmatrix} 1 & 1 & 2 & \dots & 2^{n-2}\\ & 1 & 1 & \dots & \vdots\\ & & 1 & \ddots & 2\\ & & & \ddots & 1 \\ & & & & 1 \end{bmatrix}, \quad U^{-1} = \begin{bmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & 1 & \ddots & \\ & & & \ddots & -1 \\ & & & & 1 \end{bmatrix}.$

So $T$ is ill conditioned for large $n$. In fact, if we change the $(n,1)$ element of $T$ to $-2^{2-n}$ then the matrix becomes singular! By contrast, $U$ is always very well conditioned. The determinant cannot distinguish between the ill-conditioned $T$ and the well-conditioned $U$.
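
These claims can be checked with a short MATLAB sketch. The choice $n = 30$ is arbitrary. The perturbation placed in the $(n,1)$ position is $-1/(T^{-1})_{1n}$, the value for which $T$ plus a rank-one change in that position becomes singular.

```matlab
n = 30;
T = eye(n) - triu(ones(n), 1);   % unit diagonal, -1 above the diagonal
U = triu(ones(n));               % unit diagonal, +1 above the diagonal

dets  = [det(T), det(U)]         % both determinants equal 1
conds = [cond(T), cond(U)]       % 2-norm condition numbers differ enormously

% Set the (n,1) entry to -1/(T^{-1})_{1n} = -2^(2-n). The resulting matrix
% is singular, so its smallest singular value is at rounding-error level.
Ts = T;
Ts(n,1) = -2^(2-n);
sigma_min = min(svd(Ts))
```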

7. Using Eigenvalues to Estimate Conditioning

For any $n\times n$ matrix $A$ and any consistent matrix norm it is true that $\|A\| \ge |\lambda_i|$ for all $i$, where the $\lambda_i$ are the eigenvalues of $A$. Since the eigenvalues of $A^{-1}$ are $\lambda_i^{-1}$, it follows that the matrix condition number $\kappa(A) = \|A\| \, \|A^{-1}\|$ is bounded below by the ratio of largest to smallest eigenvalue in absolute value, that is,

$\notag \kappa(A) \ge \displaystyle\frac{ \max_i |\lambda_i| } { \min_i |\lambda_i| }.$

But as the matrix $T$ in (1) shows, this bound can be very weak.

It is singular values not eigenvalues that characterize the condition number for the 2-norm. Specifically,

$\notag \kappa_2(A) = \displaystyle\frac{\sigma_1}{\sigma_n},$

where $A = U\Sigma V^T$ is a singular value decomposition (SVD), with $U$ and $V$ orthogonal and $\Sigma = \mathrm{diag}(\sigma_i)$ , $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$ . If $A$ is symmetric, for example, then the sets $\{ |\lambda_i| \}$ and $\{\sigma_i \}$ are the same, but in general the eigenvalues $\lambda_i$ and singular values $\sigma_i$ can be very different.
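
The gap between the eigenvalue ratio and the condition number can be seen for the matrix $T$ in (1) with a short MATLAB sketch. The dimension $n = 30$ is an arbitrary choice.

```matlab
n = 30;
T = eye(n) - triu(ones(n), 1);   % the matrix T of (1)

lambda = eig(T);                 % T is triangular, so every eigenvalue is 1
eig_ratio = max(abs(lambda))/min(abs(lambda))

sigma = svd(T);                  % singular values in decreasing order
kappa2 = sigma(1)/sigma(end)     % 2-norm condition number
```

The eigenvalue ratio is $1$, which is the smallest value the lower bound can take, while the singular values reveal the ill conditioning.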

Cleve Moler Wins ICIAM Industry Prize 2023

Congratulations to Cleve Moler, who has won the inaugural ICIAM Industry Prize 2023 for “outstanding contributions to the development of mathematical and computational tools and methods for the solution of science and engineering problems and his invention of MATLAB”.

I first saw Cleve demonstrate the original Fortran version of MATLAB on an IBM PC at the Gatlinburg meeting at the University of Waterloo in 1984. The commercial version of MATLAB was released soon after, and it has been my main programming environment ever since.

MATLAB succeeded for a number of reasons, some of which Dennis Sherwood and I describe in one of the creativity stories in our recent book How to Be Creative: A Practical Guide for the Mathematical Sciences. But there is one reason that is rarely mentioned.

From the beginning, MATLAB supported complex arithmetic—indeed, the basic data type has always been a complex matrix. The original 1980 MATLAB Users’ Guide says

MATLAB works with essentially only one kind of object, a rectangular matrix with complex elements. If the imaginary parts of the elements are all zero, they are not printed, but they still occupy storage.

By contrast, early competing packages usually supported only real arithmetic (see my 1989 SIAM News article Matrix Computations on a PC for a comparison of PC-MATLAB and GAUSS). Cleve understood the fundamental need to compute in the complex plane in real life problems, as opposed to textbook examples, and he appreciated how tedious it is to program with real and imaginary parts stored in separate arrays. The storing of zero imaginary parts of real numbers was a small price to pay for the convenience. Of course, the commercial version of MATLAB was optimized not to store the imaginary part of reals. Control engineers—a group who were early adopters of MATLAB—appreciated the MATLAB approach, because the stability of control systems depends on eigenvalues, which are in general complex.

Another wise choice was that MATLAB allows the imaginary unit to be written as i or j, thus keeping mathematicians and electrical engineers happy!

Here is Cleve demonstrating MATLAB in October 2000:

Month: October 2022
