I’ve written before (here) about the increasingly common problem of matrices that are supposed to be correlation matrices (symmetric and positive semidefinite with ones on the diagonal) turning out to have some negative eigenvalues. This is usually bad news because it means that subsequent computations are unjustified and even dangerous. The problem occurs in a wide variety of situations. For example in portfolio optimization a consequence could be to take arbitrarily large positions in a stock, as discussed by Schmelzer and Hauser in Seven Sins in Portfolio Optimization.

Much research has been done over the last fifteen years or so on how to compute the nearest correlation matrix to a given matrix, and these techniques provide a natural way to correct an “invalid” correlation matrix. Of course, other approaches can be used, such as going back to the underlying data and massaging it appropriately, but it is clear from the literature that this is not always possible and practitioners may not have the mathematical or statistical knowledge to do it.

Nataša Strabić and I have built up a collection of invalid correlation matrices, which we used most recently in work on Bounds for the Distance to the Nearest Correlation Matrix. These are mostly real-life matrices, which makes them valuable for test purposes.

We have made our collection of invalid correlation matrices available in MATLAB form on GitHub as the repository matrices-correlation-invalid. I am delighted to be able to include, with the permission of investment company Orbis, two relatively large matrices, of dimensions 1399 and 3120, arising in finance. These were the matrices I used in my original 2002 paper.