When Does Thresholding Preserve Positive Definiteness?

Does a symmetric positive definite matrix remain positive definite when we set one or more elements to zero? This question arises in thresholding, in which elements of absolute value less than some tolerance are set to zero. Thresholding is used in some applications to remove correlations thought to be spurious, so that only statistically significant ones are retained.

We will focus on the case where just one element is changed and consider an arbitrary target value rather than zero. Given an n\times n symmetric positive definite matrix A we define A(t) to be the matrix resulting from adding t to the (i,j) and (j,i) elements and we ask when is A(t) positive definite. We can write

\notag   A(t) = A + t(e_i^{}e_j^T + e_j^{}e_i^T) \equiv A + tE_{ij},

where e_i is the ith column of the identity matrix. The perturbation E_{ij} has rank 2, with eigenvalues -1, 1, and 0 repeated n-2 times. Hence we can write E_{ij} in the form E_{ij} = pp^T - qq^T, where p^Tp = q^Tq = 1 and p^Tq = 0. Adding pp^T to A causes each eigenvalue to increase or stay the same, while subtracting qq^T decreases or leaves unchanged each eigenvalue. However, more is true: after each of these rank-1 perturbations the eigenvalues of the original and perturbed matrices interlace, by Weyl’s theorem. Hence, with the eigenvalues of A ordered as \lambda_n(A) \le \cdots \le \lambda_1(A), we have (Horn and Johnson, Cor. 4.3.7)

\notag \begin{aligned}   \lambda_n(A(t)) &\le \lambda_{n-1}(A), \\   \lambda_{i+1}(A) &\le \lambda_i(A(t)) \le \lambda_{i-1}(A),    \quad i = 2\colon n-1, \\   \lambda_2(A) &\le \lambda_1(A(t)). \end{aligned}

Because A is positive definite these inequalities imply that \lambda_{n-1}(A(t)) \ge \lambda_n(A) > 0, so A(t) has at most one negative eigenvalue. Since \det(A(t)) is the product of the eigenvalues of A(t) this means that A(t) is positive definite precisely when \det(A(t)) > 0.

There is a simple expression for \det(A(t)), which follows from a lemma of Chan (1984), as explained by Georgescu, Higham, and Peters (2018):

\notag  \det(A(t)) = \det(A)\big(1+ 2t b_{ij} + t^2(b_{ij}^2-b_{ii}b_{jj})\big),

where B = A^{-1}. Hence the condition for A(t) to be positive definite is

\notag  q_{ij}(t) = 1 + 2t b_{ij} + t^2(b_{ij}^2-b_{ii}b_{jj}) > 0.

We can factorize

\notag     q_{ij}(t) = \Bigl( t\bigl(b_{ij}  - \sqrt{b_{ii}b_{jj}}\bigr) + 1 \Bigr)                 \Bigl( t\bigl(b_{ij}  + \sqrt{b_{ii}b_{jj}}\bigr) + 1 \Bigr),

so q_{ij}(t) > 0 for

\notag    t\in \left( \displaystyle\frac{-1}{ \sqrt{b_{ii}b_{jj}} + b_{ij} },                 \displaystyle\frac{1}{ \sqrt{b_{ii}b_{jj}} - b_{ij} } \right) =: I_{ij},

where the endpoints are finite because B, like A, is positive definite and so |b_{ij}| < \sqrt{b_{ii}b_{jj}}.

The condition for A to remain positive definite when a_{ij} is set to zero is q_{ij}(-a_{ij}) > 0, or equivalently -a_{ij} \in I_{ij}. To check either of these conditions we need just b_{ij}, b_{ii}, and b_{jj}. These elements can be computed without computing the whole inverse by solving the equations Ab_k = e_k for k = i,j, for the kth column b_k of B, making use of a Cholesky factorization of A.

As an example, we consider the 4\times 4 Lehmer matrix, which has (i,j) element i/j for i \ge j:

\notag   A = \begin{bmatrix}         1           & \frac{1}{2}  & \frac{1}{3} & \frac{1}{4} \\[3pt]         \frac{1}{2} &           1  & \frac{2}{3} & \frac{1}{2} \\[3pt]         \frac{1}{3} &  \frac{2}{3} & 1           & \frac{3}{4} \\[3pt]         \frac{1}{4} &  \frac{1}{2} & \frac{3}{4} &  1         \end{bmatrix}.

The smallest eigenvalue of A is 0.208. Any off-diagonal element except the (2,4) element can be zeroed without destroying positive definiteness, and if the (2,4) element is zeroed then the new matrix has smallest eigenvalue -0.0249. For i=2 and j=4, the following plot shows in red \lambda_{\min}(A(t)) and in blue q_{24}(t); the black dots are the endpoints of the closure of the interval I_{24} = (-0.453,0.453) and the vertical black line is the value -a_{24}. Clearly, -a_{24} lies outside I_{24}, which is why zeroing this element causes a loss of positive definiteness. Note that I_{24} also tells us that we can increase a_{24} to any number less than 0.953 without losing definiteness.


Given a positive definite matrix and a set S of elements to be modified we may wish to determine subsets (including a maximal subset) of S for which the modifications preserve definiteness. Efficiently determining these subsets appears to be an open problem.

In practical applications thresholding may lead to an indefinite matrix. Definiteness must then be restored to obtain a valid correlation matrix. One way to do this is to find the nearest correlation matrix in the Frobenius norm such that the zeroed elements remain zero. This can be done by the alternating projections method with a projection to keep the zeroed elements fixed. Since the nearest correlation matrix is positive semidefinite, it is also desirable to to incorporate a lower bound \delta > 0 on the smallest eigenvalue, which corresponds to another projection. Both these projections are supported in the algorithm of Higham and Strabić (2016), implemented in the code at https://github.com/higham/anderson-accel-ncm. For the Lehmer matrix, the nearest correlation matrix with zero (2,4) element and eigenvalues at least \delta = 0.01 is (to four significant figures)

\notag   \begin{bmatrix}    1       &    0.4946  &    0.3403  &    0.2445  \\    0.4946  &    1       &    0.6439  &    0       \\    0.3403  &    0.6439  &    1       &    0.7266  \\    0.2445  &    0       &    0.7266  &    1    \end{bmatrix}.

A related question is for what patterns of elements that are set to zero is positive definiteness guaranteed to be preserved for all positive definite A? Clearly, setting all the off-diagonal elements to zero preserves definiteness, since the diagonal of a positive definite matrix is positive. Guillot and Rajaratnam (2012) show that the answer to the question is that the new matrix must be a symmetric permutation of a block diagonal matrix. However, for particular A this restriction does not necessarily hold, as the Lehmer matrix example shows.


What Is a Correlation Matrix?

In linear algebra terms, a correlation matrix is a symmetric positive semidefinite matrix with unit diagonal. In other words, it is a symmetric matrix with ones on the diagonal whose eigenvalues are all nonnegative.

The term comes from statistics. If x_1, x_2, \dots, x_n are column vectors with m elements, each vector containing samples of a random variable, then the corresponding n\times n covariance matrix V has (i,j) element

v_{ij} = \mathrm{cov}(x_i,x_j) = \displaystyle\frac{1}{n-1}            (x_i - \overline{x}_i)^T (x_j - \overline{x}_j),

where \overline{x}_i is the mean of the elements in x_i. If v has nonzero diagonal elements then we can scale the diagonal to 1 to obtain the corresponding correlation matrix

C = D^{-1/2} V D^{-1/2},

where D = \mathrm{diag}(v_{ii}). The (i,j) element c_{ij} = v_{ii}^{-1/2} v_{ij} v_{jj}^{-1/2} is the correlation between the variables x_i and x_j.

Here are a few facts.

  • The elements of a correlation matrix lie on the interval [-1, 1].
  • The eigenvalues of a correlation matrix lie on the interval [0,n].
  • The eigenvalues of a correlation matrix sum to n (since the eigenvalues of a matrix sum to its trace).
  • The maximal possible determinant of a correlation matrix is 1.

It is usually not easy to tell whether a given matrix is a correlation matrix. For example, the matrix

A = \begin{bmatrix}      1  &  1 &   0\\      1  &  1 &   1\\      0  &  1 &   1      \end{bmatrix}

is not a correlation matrix: it has eigenvalues -0.4142, 1.0000, 2.4142. The only value of a_{13} and a_{31} that makes A a correlation matrix is 1.

A particularly simple class of correlation matrices is the one-parameter class A_n with every off-diagonal element equal to w, illustrated for n = 3 by

A_3 = \begin{bmatrix}      1  &  w &   w\\      w  &  1 &   w\\      w  &  w &   1      \end{bmatrix}.

The matrix A_n is a correlation matrix for -1/(n-1) \le w \le 1.

In some applications it is required to generate random correlation matrices, for example in Monte-Carlo simulations in finance. A method for generating random correlation matrices with a specified eigenvalue distribution was proposed by Bendel and Mickey (1978); Davies and Higham (2000) give improvements to the method. This method is implemented in the MATLAB function gallery('randcorr').

Obtaining or estimating correlations can be difficult in practice. In finance, market data is often missing or stale; different assets may be sampled at different time points (e.g., some daily and others weekly); and the matrices may be generated from different parametrized models that are not consistent. Similar problems arise in many other applications. As a result, correlation matrices obtained in practice may not be positive semidefinite, which can lead to undesirable consequences such as an investment portfolio with negative risk.

In risk management and insurance, matrix entries may be estimated, prescribed by regulations or assigned by expert judgement, but some entries may be unknown.

Two problems therefore commonly arise in connection with correlation matrices.

Nearest Correlation Matrix

Here, we have an approximate correlation matrix A that has some negative eigenvalues and we wish to replace it by the nearest correlation matrix. The natural choice of norm is the Frobenius norm, \|A\|_F = \bigl(\sum_{i,j} a_{ij}^2\bigr)^{1/2}, so we solve the problem

\min \{ \, \|A-C\|_F: C~\textrm{is a correlation matrix} \,\}.

We may also have a requirement that certain elements of C remain fixed. And we may want to weight some elements more than others, by using a weighted Frobenius norm. These are convex optimization problems and have a unique solution that can be computed using the alternating projections method (Higham, 2002) or a Newton algorithm (Qi and Sun, 2006; Borsdorf and Higham, 2010).

Another variation requires C to have factor structure, which means that the off-diagonal agrees with that of a rank-k matrix for some given k (Borsdorf, Higham, and Raydan, 2010). Yet another variation imposes a constraint that C has a certain rank or a rank no larger than a certain value. These problems are non-convex, because of the objective function and the rank constraint, respectively.

Another approach that can be used for restoring definiteness, although it does not in general produce the nearest correlation matrix, is shrinking, which constructs a convex linear combination A = \alpha C + (1-\alpha)M, where M is a target correlation matrix (Higham, Strabić, and Šego, 2016). Shrinking can readily incorporate fixed blocks and weighting.

Correlation Matrix Completion

Here, we have a partially specified matrix and we wish to complete it, that is, fill in the missing elements in order to obtain a correlation matrix. It is known that a completion is possible for any set of specified entries if the associate graph is chordal (Grone et al., 1994). In general, if there is one completion there are many, but there is a unique one of maximal determinant, which is elegantly characterized by the property that the inverse contains zeros in the positions of the unspecified entries.


This is a minimal set of references, and they cite further useful references.

Related Blog Posts

A Collection of Invalid Correlation Matrices


I’ve written before (here) about the increasingly common problem of matrices that are supposed to be correlation matrices (symmetric and positive semidefinite with ones on the diagonal) turning out to have some negative eigenvalues. This is usually bad news because it means that subsequent computations are unjustified and even dangerous. The problem occurs in a wide variety of situations. For example in portfolio optimization a consequence could be to take arbitrarily large positions in a stock, as discussed by Schmelzer and Hauser in Seven Sins in Portfolio Optimization.

Much research has been done over the last fifteen years or so on how to compute the nearest correlation matrix to a given matrix, and these techniques provide a natural way to correct an “invalid” correlation matrix. Of course, other approaches can be used, such as going back to the underlying data and massaging it appropriately, but it is clear from the literature that this is not always possible and practitioners may not have the mathematical or statistical knowledge to do it.

Nataša Strabić and I have built up a collection of invalid correlation matrices, which we used most recently in work on Bounds for the Distance to the Nearest Correlation Matrix. These are mostly real-life matrices, which makes them valuable for test purposes.

We have made our collection of invalid correlation matrices available in MATLAB form on GitHub as the repository matrices-correlation-invalid. I am delighted to be able to include, with the permission of investment company Orbis, two relatively large matrices, of dimensions 1399 and 3120, arising in finance. These were the matrices I used in my original 2002 paper.

Anderson Acceleration

Anderson acceleration, also known in quantum chemistry as Pulay mixing or direct inversion in the iterative subspace (DIIS), is a technique for accelerating the convergence of a fixed-point iteration. It has been widely used in electronic structure computations, but does not seem to be well known to numerical analysts.

Anderson’s original paper is from 1965 and is well cited, as Google Scholar shows: and65-gs.jpg I learned about Anderson acceleration in the minisymposium Anderson Acceleration and Applications organized by Tim Kelley at the SIAM Conference on Computational Science and Engineering in Salt Lake City in March 2015. Tim gave an excellent overview of the topic in the opening talk. The slides for that talk are available on Tim’s website.

PhD student Nataša Strabić and I have shown that Anderson acceleration works very well for speeding up the alternating projections method for computing the nearest correlation matrix. It typically gives a reduction in the number of iterations by a factor at least 2 for the standard nearest correlation matrix problem and by at least a factor 3 when additional constraints are imposed on the matrix (specified elements fixed and a lower bound on the smallest eigenvalue). In some cases the reduction is by a factor of as much as 25. Since the overhead of Anderson acceleration is small, significant speedups are obtained.

In my 2013 post The Nearest Correlation Matrix I included a MATLAB code nearcorr.m. In place of this I now recommend our new accelerated code nearcorr_aa.m, which is available from the repository anderson-accel-ncm on GitHub. Our paper describing this work is available on MIMS EPrints.

For me this project is an excellent illustration of the importance of going to conferences in order to learn of new ideas and new developments.

The Nearest Correlation Matrix

A correlation matrix is a symmetric matrix with unit diagonal and nonnegative eigenvalues. In 2000 I was approached by a London fund management company who wanted to find the nearest correlation matrix (NCM) in the Frobenius norm to an almost correlation matrix: a symmetric matrix having a significant number of (small) negative eigenvalues. This problem arises when the data from which the correlations are constructed is asynchronous or incomplete, or when models are stress-tested by artificially adjusting individual correlations. Solving the NCM problem (or obtaining a true correlation matrix some other way) is important in order to avoid subsequent calculations breaking down due to negative variances or volatilities, for example.


The convexity properties of the problem mean that there is a unique nearest correlation matrix, which is hence a global minimizer. In the 1990s several algorithms had been proposed for computing it, but none was guaranteed to work. Prompted by the approach from the company, I investigated the problem. I proved some results characterizing the solution and derived an alternating projections algorithm for computing it 1. The algorithm repeatedly projects onto the set of matrices with unit diagonal and the cone of symmetric positive semidefinite matrices. It is guaranteed to converge to the minimum, but does so at a linear rate. An important feature of the algorithm is that other projections can be added on. Thus, for example, if we want to leave the trailing principal submatrix of order three unchanged, we simply restore it at the end of each iteration 2, 3.

The alternating projections algorithm is widely used, but can be slow to converge, especially for large matrices 4. In 2006, Qi and Sun 5 derived a Newton method for the NCM problem. They work with the dual of the original problem, which is unconstrained. The objective function of the dual is not twice continuously differentiable, but by using the theory of strongly semismooth matrix functions Qi and Sun show that Newton’s method nevertheless has global quadratic convergence.

Ruediger Borsdorf and I, building on work in his M.Sc. thesis 3, built an algorithm that solves the Newton equations using minres with a Jacobi preconditioner (a nontrivial task since the coefficient matrix is not explicitly available), and has some other refinements described in 6. This algorithm has been implemented in the NAG Library 7.

In subsequent work, Borsdorf, Marcos Raydan and I 8 , 9 used the spectral projected gradient method (SPGM) to solve the k-factor NCM, in which the correlation matrix is constrained to have the form of a diagonal matrix plus a rank-k matrix. This problem variant arises in multifactor normal copula models, collateralized debt obligations (CDOs), and multivariate time series. One existing previous algorithm can fail to converge or solve the problem, but the SPGM has guaranteed convergence to a stationary point. This algorithm has also been implemented in the NAG Library.

The NCM problem has proved to be of very wide interest beyond the world of finance, as indicated by the fact that 1 is now my third best cited paper on the Web of Science. Recent applications in which the problem arises include reconstructing 20th century sea levels, genetic evaluations for thoroughbred horse breeding, modelling public health data sets, modelling storm damage of buildings, and a Kriging model for reservoirs.


I regularly receive emails asking for software implementing algorithms for the NCM problem. I thought it would be useful to summarize what is available. In general, the Newton method is preferred, but the alternating projections method is more flexible as regards incorporating additional constraints.

Original NCM Problem

Alternating Projections Method

Newton Method

k-Factor NCM Problem

A MATLAB Alternating Projections Function

I thought it would be useful to provide my own MATLAB function nearcorr.m implementing the alternating projections algorithm. The listing is below. To see how it compares with the NAG code g02aa.m I ran the test code

%NEARCORR_TEST  Compare g02aa and nearcorr.

rng(10)                                  % Seed random number generators.
n = 100;
A = gallery('randcorr',n);               % Random correlation matrix. 
E  = randn(n)*1e-1;  A = A + (E + E')/2; % Perturb it.
tol = 1e-10;

% A = cor1399; tol = 1e-4;

maxits = int64(-1);  % For linear equation solver.
maxit = int64(-1);   % For Newton iteration.
[~,X1,iter1,feval,nrmgrd,ifail] = g02aa(A,'errtol',tol,'maxits',maxits, ...

fprintf('  Newton steps taken: %d\n', iter1);
fprintf('  Norm of gradient of last Newton step: %6.4f\n', nrmgrd);
if ifail > 0, fprintf('  g02aa failed with ifail = %g\n', ifail), end

[X2,iter2] = nearcorr(A,tol,[],[],[],[],1);
fprintf('  Number of iterations: %d\n', iter2);

fprintf('  Normwise relative difference between computed solutions:')
fprintf('%9.2e\n', norm(X1-X2,1)/norm(X1,1))

Running under Windows 7 on an Ivy Bridge Core i7 processor @4.4Ghz I obtained the following results, where the “real-life” matrix is based on stock data:

Matrix Code Time (secs) Iterations
1. Random (100), tol = 1e-10 g02aa 0.023 4
nearcorr 0.052 15
2. Random (500), tol = 1e-10 g02aa 0.48 4
nearcorr 3.01 26
3. Real-life (1399), tol = 1e-4 g02aa 6.8 5
nearcorr 100.6 68

The results show that while nearcorr can be fast for small dimensions, the number of iterations, and hence its run time, tends to increase with the dimension and it can be many times slower than the Newton method. This is a stark illustration of the difference between quadratic convergence and linear (with problem-dependent constant) convergence. Here is my MATLAB function nearcorr.m.

function [X,iter] = nearcorr(A,tol,flag,maxits,n_pos_eig,w,prnt)
%NEARCORR    Nearest correlation matrix.
%   finds the nearest correlation matrix to the symmetric matrix A.
%   TOL is a convergence tolerance, which defaults to 16*EPS.
%   If using FLAG == 1, TOL must be a 2-vector, with first component
%   the convergence tolerance and second component a tolerance
%   for defining "sufficiently positive" eigenvalues.
%   FLAG = 0: solve using full eigendecomposition (EIG).
%   FLAG = 1: treat as "highly non-positive definite A" and solve
%             using partial eigendecomposition (EIGS).
%   MAXITS is the maximum number of iterations (default 100, but may
%   need to be increased).
%   N_POS_EIG (optional) is the known number of positive eigenvalues of A.
%   W is a vector defining a diagonal weight matrix diag(W).
%   PRNT = 1 for display of intermediate output.

%   By N. J. Higham, 13/6/01, updated 30/1/13, 15/11/14, 07/06/15.
%   Reference:  N. J. Higham, Computing the nearest correlation
%   matrix---A problem from finance. IMA J. Numer. Anal.,
%   22(3):329-343, 2002.

if ~isequal(A,A'), error('A must by symmetric.'), end
if nargin < 2 || isempty(tol), tol = length(A)*eps*[1 1]; end
if nargin < 3 || isempty(flag), flag = 0; end
if nargin < 4 || isempty(maxits), maxits = 100; end
if nargin < 6 || isempty(w), w = ones(length(A),1); end
if nargin < 7, prnt = 1; end

n = length(A);
if flag >= 1
   if nargin < 5 || isempty(n_pos_eig)
      [V,D] = eig(A); d = diag(D);
      n_pos_eig = sum(d >= tol(2)*d(n));
   if prnt, fprintf('n = %g, n_pos_eig = %g\n', n, n_pos_eig), end

X = A; Y = A;
iter = 0;
rel_diffX = inf; rel_diffY = inf; rel_diffXY = inf;
dS = zeros(size(A));

w = w(:); Whalf = sqrt(w*w');

while max([rel_diffX rel_diffY rel_diffXY]) > tol(1)

   Xold = X;
   R = Y - dS;
   R_wtd = Whalf.*R;
   if flag == 0
      X = proj_spd(R_wtd);
   elseif flag == 1
      [X,np] = proj_spd_eigs(R_wtd,n_pos_eig,tol(2));
   X = X ./ Whalf;
   dS = X - R;
   Yold = Y;
   Y = proj_unitdiag(X);
   rel_diffX = norm(X-Xold,'fro')/norm(X,'fro');
   rel_diffY = norm(Y-Yold,'fro')/norm(Y,'fro');
   rel_diffXY = norm(Y-X,'fro')/norm(Y,'fro');
   iter = iter + 1;
   if prnt
      fprintf('%2.0f:  %9.2e  %9.2e  %9.2e', ...
               iter, rel_diffX, rel_diffY, rel_diffXY)
      if flag >= 1, fprintf('  np = %g\n',np), else fprintf('\n'), end
   if iter > maxits
       error(['Stopped after ' num2str(maxits) ' its. Try increasing MAXITS.'])


function A = proj_spd(A)

if ~isequal(A,A'), error('Not symmetric!'), end
[V,D] = eig(A);
A = V*diag(max(diag(D),0))*V';
A = (A+A')/2; % Ensure symmetry.

function [A,n_pos_eig_found] = proj_spd_eigs(A,n_pos_eig,tol)

if ~isequal(A,A'), error('Not symmetric!'), end
k = n_pos_eig + 10; % 10 is safety factor.
if k > length(A), k = n_pos_eig; end
opts.disp = 0;
[V,D] = eigs(A,k,'LA',opts); d = diag(D);
j = (d > tol*max(d));
n_pos_eig_found = sum(j);
A = V(:,j)*D(j,j)*V(:,j)';  % Build using only the selected eigenpairs.
A = (A+A')/2; % Ensure symmetry.

function A = proj_unitdiag(A)

n = length(A);
A(1:n+1:n^2) = 1;


  • Links updated August 4, 2014.
  • nearcorr.m corrected November 15, 2014: iter was incorrectly initialized (thanks to Mike Croucher for pointing this out).
  • Added link to Mike Croucher’s Python alternating directions code, November 17, 2014.
  • Corrected an error in the convergence test, June 7, 2015. Effect on performance will be minimal (thanks to Nataša Strabić for pointing this out).



Nicholas J. Higham, Computing the Nearest Correlation Matrix—A Problem from Finance, IMA J. Numer. Anal. 22, 329–343, 2002.


Craig Lucas, Computing Nearest Covariance and Correlation Matrices, M.Sc. Thesis, University of Manchester, 2001.


Ruediger Borsdorf, A Newton Algorithm for the Nearest Correlation Matrix, M.Sc. Thesis, University of Manchester, 2007.


Rene Escalante and Marcos Raydan, Alternating Projection Methods, SIAM, 2011.


Hou-Duo Qi and Defeng Sun, A Quadratically Convergent Newton Method for Computing the Nearest Correlation Matrix, SIAM J. Matrix Anal. Appl. 28, 360-385, 2006


Ruediger Borsdorf and Nicholas J. Higham, A Preconditioned Newton Algorithm for the Nearest Correlation Matrix, IMA J. Numer. Anal. 30, 94-107, 2010.


Ruediger Borsdorf, Nicholas Higham and Marcos Raydan, Computing a Nearest Correlation Matrix with Factor Structure, SIAM J. Matrix Anal., Appl. 31, 2603-2622, 2010


Ruediger Borsdorf, Structured Matrix Nearness Problems: Theory and Algorithms, Ph.D. Thesis, University of Manchester, 2012.