When Does Thresholding Preserve Positive Definiteness?

Does a symmetric positive definite matrix remain positive definite when we set one or more elements to zero? This question arises in thresholding, in which elements of absolute value less than some tolerance are set to zero. Thresholding is used in some applications to remove correlations thought to be spurious, so that only statistically significant ones are retained.

We will focus on the case where just one element is changed and consider an arbitrary target value rather than zero. Given an n\times n symmetric positive definite matrix A we define A(t) to be the matrix resulting from adding t to the (i,j) and (j,i) elements and we ask when is A(t) positive definite. We can write

\notag   A(t) = A + t(e_i^{}e_j^T + e_j^{}e_i^T) \equiv A + tE_{ij},

where e_i is the ith column of the identity matrix. The perturbation E_{ij} has rank 2, with eigenvalues -1, 1, and 0 repeated n-2 times. Hence we can write E_{ij} in the form E_{ij} = pp^T - qq^T, where p^Tp = q^Tq = 1 and p^Tq = 0. Adding pp^T to A causes each eigenvalue to increase or stay the same, while subtracting qq^T decreases or leaves unchanged each eigenvalue. However, more is true: after each of these rank-1 perturbations the eigenvalues of the original and perturbed matrices interlace, by Weyl’s theorem. Hence, with the eigenvalues of A ordered as \lambda_n(A) \le \cdots \le \lambda_1(A), we have (Horn and Johnson, Cor. 4.3.7)

\notag \begin{aligned}   \lambda_n(A(t)) &\le \lambda_{n-1}(A), \\   \lambda_{i+1}(A) &\le \lambda_i(A(t)) \le \lambda_{i-1}(A),    \quad i = 2\colon n-1, \\   \lambda_2(A) &\le \lambda_1(A(t)). \end{aligned}

Because A is positive definite these inequalities imply that \lambda_{n-1}(A(t)) \ge \lambda_n(A) > 0, so A(t) has at most one negative eigenvalue. Since \det(A(t)) is the product of the eigenvalues of A(t) this means that A(t) is positive definite precisely when \det(A(t)) > 0.

There is a simple expression for \det(A(t)), which follows from a lemma of Chan (1984), as explained by Georgescu, Higham, and Peters (2018):

\notag  \det(A(t)) = \det(A)\big(1+ 2t b_{ij} + t^2(b_{ij}^2-b_{ii}b_{jj})\big),

where B = A^{-1}. Hence the condition for A(t) to be positive definite is

\notag  q_{ij}(t) = 1 + 2t b_{ij} + t^2(b_{ij}^2-b_{ii}b_{jj}) > 0.

We can factorize

\notag     q_{ij}(t) = \Bigl( t\bigl(b_{ij}  - \sqrt{b_{ii}b_{jj}}\bigr) + 1 \Bigr)                 \Bigl( t\bigl(b_{ij}  + \sqrt{b_{ii}b_{jj}}\bigr) + 1 \Bigr),

so q_{ij}(t) > 0 for

\notag    t\in \left( \displaystyle\frac{-1}{ \sqrt{b_{ii}b_{jj}} + b_{ij} },                 \displaystyle\frac{1}{ \sqrt{b_{ii}b_{jj}} - b_{ij} } \right) =: I_{ij},

where the endpoints are finite because B, like A, is positive definite and so |b_{ij}| < \sqrt{b_{ii}b_{jj}}.

The condition for A to remain positive definite when a_{ij} is set to zero is q_{ij}(-a_{ij}) > 0, or equivalently -a_{ij} \in I_{ij}. To check either of these conditions we need just b_{ij}, b_{ii}, and b_{jj}. These elements can be computed without computing the whole inverse by solving the equations Ab_k = e_k for k = i,j, for the kth column b_k of B, making use of a Cholesky factorization of A.

As an example, we consider the 4\times 4 Lehmer matrix, which has (i,j) element i/j for i \ge j:

\notag   A = \begin{bmatrix}         1           & \frac{1}{2}  & \frac{1}{3} & \frac{1}{4} \\[3pt]         \frac{1}{2} &           1  & \frac{2}{3} & \frac{1}{2} \\[3pt]         \frac{1}{3} &  \frac{2}{3} & 1           & \frac{3}{4} \\[3pt]         \frac{1}{4} &  \frac{1}{2} & \frac{3}{4} &  1         \end{bmatrix}.

The smallest eigenvalue of A is 0.208. Any off-diagonal element except the (2,4) element can be zeroed without destroying positive definiteness, and if the (2,4) element is zeroed then the new matrix has smallest eigenvalue -0.0249. For i=2 and j=4, the following plot shows in red \lambda_{\min}(A(t)) and in blue q_{24}(t); the black dots are the endpoints of the closure of the interval I_{24} = (-0.453,0.453) and the vertical black line is the value -a_{24}. Clearly, -a_{24} lies outside I_{24}, which is why zeroing this element causes a loss of positive definiteness. Note that I_{24} also tells us that we can increase a_{24} to any number less than 0.953 without losing definiteness.

pdplot_lehmer.jpg

Given a positive definite matrix and a set S of elements to be modified we may wish to determine subsets (including a maximal subset) of S for which the modifications preserve definiteness. Efficiently determining these subsets appears to be an open problem.

In practical applications thresholding may lead to an indefinite matrix. Definiteness must then be restored to obtain a valid correlation matrix. One way to do this is to find the nearest correlation matrix in the Frobenius norm such that the zeroed elements remain zero. This can be done by the alternating projections method with a projection to keep the zeroed elements fixed. Since the nearest correlation matrix is positive semidefinite, it is also desirable to to incorporate a lower bound \delta > 0 on the smallest eigenvalue, which corresponds to another projection. Both these projections are supported in the algorithm of Higham and Strabić (2016), implemented in the code at https://github.com/higham/anderson-accel-ncm. For the Lehmer matrix, the nearest correlation matrix with zero (2,4) element and eigenvalues at least \delta = 0.01 is (to four significant figures)

\notag   \begin{bmatrix}    1       &    0.4946  &    0.3403  &    0.2445  \\    0.4946  &    1       &    0.6439  &    0       \\    0.3403  &    0.6439  &    1       &    0.7266  \\    0.2445  &    0       &    0.7266  &    1    \end{bmatrix}.

A related question is for what patterns of elements that are set to zero is positive definiteness guaranteed to be preserved for all positive definite A? Clearly, setting all the off-diagonal elements to zero preserves definiteness, since the diagonal of a positive definite matrix is positive. Guillot and Rajaratnam (2012) show that the answer to the question is that the new matrix must be a symmetric permutation of a block diagonal matrix. However, for particular A this restriction does not necessarily hold, as the Lehmer matrix example shows.

References

Leave a comment