Updated Catalogue of Software for Matrix Functions

help-matfun.jpg
From “help matfun” in MATLAB.

Edvin Deadman and I have updated the catalogue of software for matrix functions that we produced in 2014 (and which was discussed in this post). The new version, which has undergone some minor reorganization, is available here. It covers what is available in languages (C++, Fortran, Java, Julia, Python), problem solving environments (GNU Octave, Maple, Mathematica, MATLAB, R, Scilab), and libraries (GNU Scientific Library, NAG Library, SLICOT).

nag-mark25.jpg
From NAG Library Mark 25 News.

Here are some highlights of changes in the last two years that are reflected in the new version.

Other changes to the catalogue include these.

  • SLICOT has been added.
  • Two more R packages are included.

Suggestions for inclusion in a future revision are welcome.

The Improved MATLAB Functions Expm and Logm

pcam-p97-exp-short.jpg
Equation from the Princeton Companion to Applied Mathematics, article “Functions of Matrices” (p. 97)

The matrix exponential is a ubiquitous matrix function, important both for theory and for practical computation. The matrix logarithm, an inverse to the exponential, is also increasingly used (see my earlier post, 400 Years of Logarithms).

MATLAB R2015b introduced new versions of the expm and logm functions. The Release Notes say

The algorithms for expm, logm, and sqrtm show increased accuracy, with logm and sqrtm additionally showing improved performance.

The help text for expm and logm is essentially unchanged from the previous versions, so what’s different about the new functions? (I will discuss sqrtm in a future post.)

The answer is that both functions make use of new backward error bounds that can be much smaller than the old ones for very nonnormal matrices, and so help to avoid a phenomenon known as overscaling. The key change is that when bounding a matrix power series p(X) = a_0 I + a_1 X + a_2 X^2 + \cdots, instead of bounding the kth term a_k X^k by |a_k| \|X\|^k, a potentially smaller bound is used.

This is best illustrated by example. Suppose we want to bound \|X^{12}\| and are not willing to compute X^{12} but are willing to compute lower powers of X. We have 12 = 6 \times 2 = 4 \times 3, so \|X^{12}\| is bounded by each of the terms (\|X^2\|)^6, (\|X^3\|)^4, (\|X^4\|)^3, and (\|X^6\|)^2. But it is easy to see that (\|X^6\|)^2 \le (\|X^2\|)^6 and (\|X^6\|)^2 \le (\|X^3\|)^4, so we can discard two of the bounds, ending up with

\|X^{12}\| \le \min( \|X^4\|^3, \|X^6\|^2 ).

This argument can be generalized so that every power of X is bounded in terms of the norms of X^p for values of p up to some small, fixed value. The gains can be significant. Consider the matrix

X = \begin{bmatrix}1 & 100 \\ 0 & 1 \end{bmatrix}.

We have \|X\|^{12} \approx 10^{24}, but

X^k = \begin{bmatrix}1 & 100k \\ 0 & 1 \end{bmatrix},

so the bound above is roughly \|X^{12}\| \le 3.6 \times 10^{5}, which is a significant improvement.
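This is easy to verify numerically. The following MATLAB snippet checks the bounds for the example above (an illustration only; it is not the code used inside expm or logm):

X = [1 100; 0 1];
bound_old = norm(X)^12                     % About 1e24.
bound_new = min(norm(X^4)^3, norm(X^6)^2)  % About 3.6e5.
exact     = norm(X^12)                     % About 1.2e3.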

One way to understand what is happening is to note the inequality

\rho(X) \le \| X^k\| ^{1/k} \le \|X\|,

where \rho is the spectral radius (the largest modulus of any eigenvalue). The upper bound corresponds to the usual analysis. The lower bound cannot be used to bound the norm of the power series, since \rho(X)^k does not in general bound \|X^k\|. The middle term is what the new bounds use, and as k\to\infty it converges to the lower bound, which can be arbitrarily smaller than the upper bound.

What is the effect of these bounds on the algorithms in expm and logm? Both algorithms make use of Padé approximants, which are accurate only for matrices of small norm, so the algorithms begin by reducing the norm of the input matrix. Backward error bounds derived by bounding a power series as above guide the norm reduction. If the bounds are weak then the norm is reduced more than necessary, which can result in loss of accuracy in floating point arithmetic: the phenomenon of overscaling. The new bounds greatly reduce the chance of overscaling.

In his blog post A Balancing Act for the Matrix Exponential, Cleve Moler describes a badly scaled 3-by-3 matrix for which the original expm suffers from overscaling and a loss of accuracy, but notes that the new algorithm does an excellent job.

The new logm has another fundamental change: it applies inverse scaling and squaring and Padé approximation to the whole triangular Schur factor, whereas the previous logm applied this technique to the individual diagonal blocks in conjunction with the Parlett recurrence.

For more on the algorithms underlying the new codes see these papers. The details of how the norm of a matrix power series is bounded are given in Section 4 of the first paper.

Corless and Fillion’s A Graduate Introduction to Numerical Methods from the Viewpoint of Backward Error Analysis

cofi13_cover.jpg

I acquired this book when it first came out in 2013 and have been dipping into it from time to time ever since. At 868 pages long, the book contains a lot of material and I have only sampled a small part of it. In this post I will not attempt to give a detailed review but rather will explain the distinctive features of the book and why I like it.

As the title suggests, the book is pitched at graduate level, but it will also be useful for advanced undergraduate courses. The book covers all the main topics of an introductory numerical analysis course: floating point arithmetic, interpolation, nonlinear equations, numerical linear algebra, quadrature, numerical solution of differential equations, and more.

In order to stand out in the crowded market of numerical analysis textbooks, a book needs to offer something different. This one certainly does.

  • The concepts of backward error and conditioning are used throughout—not just in the numerical linear algebra chapters.
  • Complex analysis, and particularly the residue theorem, is exploited throughout the book, with contour integration used as a fundamental tool in deriving interpolation formulas. I was pleased to see section 11.3.2 on the complex step approximation to the derivative of a real-valued function, which provides an interesting alternative to finite differences (a minimal sketch appears after this list). Appendix B, titled “Complex Numbers”, provides in just 8 pages excellent advice on the practical usage of complex numbers and functions of a complex variable that would be hard to find in complex analysis texts. For example, it has a clear discussion of branch cuts, making use of Kahan’s counterclockwise continuity principle (eschewing Riemann surfaces, which have “almost no traction in the computer world”), and makes use of the unwinding number introduced by Corless, Hare, and Jeffrey in 1996.
  • The barycentric formulation of Lagrange interpolation is used extensively, possibly for the first time in a numerical analysis text. This approach was popularized by Berrut and Trefethen in their 2004 SIAM Review paper, and my proof of the numerical stability of the formulas has helped it to gain popularity. Polynomial interpolation and rational interpolation are both covered.
  • Both numerical and symbolic computation are employed—whichever is the most appropriate tool for the topic or problem at hand. Corless is well known for his contributions to symbolic computation and to Maple, but he is equally at home in the world of numerics. Chebfun is also used in a number of places. In addition, section 11.7 gives a 2-page treatment of automatic differentiation.

This is a book that one can dip into at any page and quickly find something that is interesting and beyond standard textbook content. Not many numerical analysis textbooks include the Lambert W function, a topic on which Corless is uniquely qualified to write. (I note that Corless and Jeffrey wrote an excellent article on the Lambert W function for the Princeton Companion to Applied Mathematics.) And not so many use pseudospectra.

I like Notes and References sections and this book has lots of them, with plenty of detail, including references that I was unaware of.

As regards the differential equation content, it includes initial and boundary value problems for ODEs, as well as delay differential equations (DDEs) and PDEs. The DDE chapter uses the MATLAB dde23 and ddesd functions for illustration and, like the other differential equation chapters, discusses conditioning.

The book would probably have benefited from editing to reduce its length. The index is thorough, but many long entries need breaking up into subentries. Navigation of the book would be easier if algorithms, theorems, definitions, remarks, etc., had been numbered in one sequence instead of as separate sequences.

Part of the book’s charm is its sometimes unexpected content. How many numerical analysis textbooks recommend reading a book on the programming language Forth (a small, reverse Polish notation-based language popular on microcomputers when I was a student)? And how many would point out the 1994 “rediscovery” of the trapezoidal rule in an article in the journal Diabetes Care? (Google “Tai’s model” for some interesting responses to that article.)

I bought the book from SpringerLink via the MyCopy feature, whereby any book available electronically via my institution’s subscription can be bought in (monochrome) hard copy for 24.99 euros, dollars, or pounds (the same price in each currency!).

I give the last word to John Butcher, who concludes the Foreword with “I love this book.”

Anderson Acceleration

Anderson acceleration, also known in quantum chemistry as Pulay mixing or direct inversion in the iterative subspace (DIIS), is a technique for accelerating the convergence of a fixed-point iteration. It has been widely used in electronic structure computations, but does not seem to be well known to numerical analysts.
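In outline, given a fixed-point iteration x_{k+1} = g(x_k), Anderson acceleration takes the new iterate to be a linear combination of the last few values of g, with the weights chosen to minimize the norm of the corresponding combination of residuals g(x_i) - x_i in a least squares sense. Here is a minimal MATLAB sketch in the unconstrained least squares form popularized by Walker and Ni; it is an illustration only (the function name is mine), not the code from the work described below.

function x = anderson_accel(g, x, m, maxit, tol)
%ANDERSON_ACCEL  Minimal sketch of Anderson acceleration for x = g(x).
%   X is the starting (column) vector and M is the window size;
%   M = 0 gives the unaccelerated iteration.
DG = []; DF = [];              % Differences of g-values and of residuals.
gx = g(x); f = gx - x;         % g-value and residual at current iterate.
for k = 1:maxit
    if norm(f) <= tol*norm(x), return, end
    xnew = gx;                 % Unaccelerated step.
    if ~isempty(DF)
        gamma = DF \ f;        % Least squares problem for the weights.
        xnew = gx - DG*gamma;  % Accelerated step.
    end
    gxnew = g(xnew); fnew = gxnew - xnew;
    DG = [DG, gxnew - gx]; DF = [DF, fnew - f];
    if size(DF,2) > m          % Keep at most m difference pairs.
        DG(:,1) = []; DF(:,1) = [];
    end
    x = xnew; gx = gxnew; f = fnew;
end

For example, x = anderson_accel(@cos, 1, 1, 100, 1e-12) solves x = cos(x) in far fewer iterations than the unaccelerated iteration takes.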

Anderson’s original paper is from 1965 and is well cited, as Google Scholar shows:

and65-gs.jpg

I learned about Anderson acceleration in the minisymposium Anderson Acceleration and Applications organized by Tim Kelley at the SIAM Conference on Computational Science and Engineering in Salt Lake City in March 2015. Tim gave an excellent overview of the topic in the opening talk. The slides for that talk are available on Tim’s website.

PhD student Nataša Strabić and I have shown that Anderson acceleration works very well for speeding up the alternating projections method for computing the nearest correlation matrix. It typically reduces the number of iterations by a factor of at least 2 for the standard nearest correlation matrix problem and by a factor of at least 3 when additional constraints are imposed on the matrix (specified elements fixed and a lower bound on the smallest eigenvalue). In some cases the reduction is by a factor of as much as 25. Since the overhead of Anderson acceleration is small, significant speedups are obtained.

In my 2013 post The Nearest Correlation Matrix I included a MATLAB code nearcorr.m. In place of this I now recommend our new accelerated code nearcorr_aa.m, which is available from the repository anderson-accel-ncm on GitHub. Our paper describing this work is available on MIMS EPrints.

For me this project is an excellent illustration of the importance of going to conferences in order to learn of new ideas and new developments.

The Spotlight Factor

In my Handbook of Writing for the Mathematical Sciences I described the spotlight factor, originally introduced by Tompa in 1989. The spotlight factor is defined for the first author of a paper with n authors listed alphabetically, the assumption being that the paper comes from a community where it is the custom to order authors alphabetically.

The spotlight factor is the probability that if n-1 coauthors are chosen independently at random they will all have surnames later in the alphabet than the first author’s. This definition is not precise, since it is not clear what the sample space of all possible names should be, so it is better to regard the spotlight factor as being defined by the formula given by Tompa, which is implemented in the MATLAB function below.

The smallest spotlight factor I have found is the value 0.0244 for Zielinski, for the paper

Pawel Zielinski and Krystyna Zietak, The Polar Decomposition—Properties, Applications and Algorithms, Applied Mathematics, Ann. Pol. Math. Soc. 38, 23-49, 1995.

This beats the best factor of 0.0251 reported by Tompa in a 1990 follow-up paper.

Can you do better?

Here is a MATLAB M-file to compute the spotlight factor, preceded by an example of its usage:

>> spotlight('zielinski',1)
ans =
   2.4414e-02

function s = spotlight(x, k)
%SPOTLIGHT   Tompa's spotlight factor of authorship.
%   SPOTLIGHT(X, K) is the spotlight factor for the author whose 
%   last name is specified in the string X, with K coauthors.
%   Mixed upper and lower case can be used.
%   Smaller spotlight factors correspond to rarer events.

%   Reference:
%   Martin Tompa, Figures of Merit, SIGACT News 20 (1), 62-71, 1989

if ~ischar(x), error('First argument must be a string.'), end
if nargin < 2, error('Must give two arguments.'), end

x = double(upper(x)) - double('A') + 1;
x(x < 0 | x > 26) = 0;          % Handle punctuation and spaces.

s = 0;

% Ideally use Horner's rule, but the following is clearer.

for i=1:length(x)
    t = x(i);
    s = s + t/27^i;
end

s = (1 - s)^k;

Making Sense of Multivalued Matrix Functions with the Matrix Unwinding Function

Try the following quiz. Let A be an n\times n real or complex matrix. Consider the principal logarithm—the one for which \log z has imaginary part in (-\pi,\pi]—and define z^{t} = e^{t \log z} for t\in\mathbb{C} (an important special case being t = 1/p for an integer p).

True or false:

  1. \log e^A = A for all A, in other words passing A through the exponential then the logarithm takes us on a round trip.
  2. (I-A^2)^{1/2} = (I-A)^{1/2}(I+A)^{1/2} for all A.
  3. (AB)^{t} = A^{t}B^{t} whenever A and B commute.

The answers are

  1. False. Yet e^{\log A} = A is always true.
  2. True. Yet the similar identity (A^2-I)^{1/2}=(A-I)^{1/2}(A+I)^{1/2} is false.
  3. False.

At first sight these results may seem rather strange. How can we understand them? If you take the viewpoint that each occurrence of \log and a power t in the above expressions stands for the families of all possible logarithms and powers then the identities are all true. But from a computational viewpoint we are usually concerned with a particular branch of each function, the principal branch, so equality cannot be taken for granted.

An excellent tool for understanding these identities is a new matrix function called the matrix unwinding function. This function is defined for any square matrix A by U(A) = (A - \log e^A)/(2\pi i), and it arises from the scalar unwinding number introduced by Corless, Hare and Jeffrey in 1996 [1], [2]. There is nothing special about A and B being matrices in this quiz; the answers are the same if they are scalars. But the matrix unwinding function neatly handles the extra subtleties of the matrix case.

130712-1101-57-2430.jpg
Mary’s talk at the 2013 SIAM Annual Meeting in San Diego.

From the definition we have \log e^A = A - 2\pi i U(A), so the relation in the first quiz question is clearly valid when U(A) = 0, which is the case when the eigenvalues of A have imaginary parts lying in the interval (-\pi,\pi]. Each of the above identities can be understood by deriving an exact relation in which the unwinding function provides the discrepancy between the left- and right-hand sides. For example,

(AB)^t = A^t B^t e^{-2\pi t i U(\log A + \log B)}.

Mary Aprahamian and I have recently published the paper The Matrix Unwinding Function, with an Application to Computing the Matrix Exponential (SIAM J. Matrix Anal. Appl., 35, 88-109, 2014), in which we introduce the matrix unwinding function and develop its many interesting properties. We analyze the identities discussed above, along with various others. Thanks to the University of Manchester’s Open Access funds, that paper is available for anyone to download from the SIAM website, using the given link.

The matrix unwinding function has another use. Note that e^A=e^{\log e^A}=e^{A-2\pi i U(A)} and the matrix A-2\pi i U(A) has eigenvalues with imaginary parts in (-\pi,\pi]. The scaling and squaring method for computing the matrix exponential is at its most efficient when A has norm of order 1, and this argument reduction operation tends to reduce the norm of A when A has eigenvalues with large imaginary part. In the paper we develop this argument reduction and show that it can lead to substantial computational savings.
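As a quick numerical check of this identity, using the unwindm function listed below and the example matrix from the end of this section:

A = [1 4; -1 1]*4;       % Eigenvalues 4 +/- 8i, so U(A) is nonzero.
U = unwindm(A);
norm(expm(A) - expm(A - 2*pi*1i*U))/norm(expm(A))  % Of order rounding error.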

How can we compute U(A)? The following incomplete MATLAB code implements the Schur algorithm developed in the paper. The full code is available.

function U = unwindm(A,flag)
%UNWINDM  Matrix unwinding function.
%   UNWINDM(A) is the matrix unwinding function of the square matrix A.

%   Reference: M. Aprahamian and N. J. Higham.
%   The matrix unwinding function, with an application to computing the
%   matrix exponential.  SIAM J. Matrix Anal. Appl., 35(1):88-109, 2014.

%   Mary Aprahamian and Nicholas J. Higham, 2013.

if nargin < 2, flag = 1; end
[Q,T] = schur(A,'complex');

ord = blocking(T);
[ord, ind] = swapping(ord);  % Gives the blocking.
ord = max(ord)-ord+1;        % Since ORDSCHUR puts highest index top left.
[Q,T] = ordschur(Q,T,ord);
U = Q * unwindm_tri(T) * Q';

%%%%%%%%%%%%%%%%%%%%%%%%%%%
function F = unwindm_tri(T)
%UNWINDM_tri   Unwinding matrix of upper triangular matrix.

n = length(T);
F = diag( unwind( diag(T) ) );

% Compute off-diagonal of F by scalar Parlett recurrence.
for j=2:n
     for i = j-1:-1:1
         if F(i,i) == F(j,j)
            F(i,j) = 0;        % We're within a diagonal block.
         else   
            s = T(i,j)*(F(i,i)-F(j,j));
            if j-i >= 2
               k = i+1:j-1;
               s = s + F(i,k)*T(k,j) - T(i,k)*F(k,j);
            end
            F(i,j) = s/(T(i,i)-T(j,j));
         end
     end   
end

%%%%%%%%%%%%%%%%%%%%%%
function u = unwind(z)
%UNWIND  Unwinding number.
%   UNWIND(Z) is the (scalar) unwinding number of Z.

u = ceil( (imag(z) - pi)/(2*pi) );

... Other subfunctions omitted

Here is an example. As it illustrates, the unwinding matrix of a real matrix is usually pure imaginary.

>> A = [1 4; -1 1]*4, U = unwindm(A)
A =
     4    16
    -4     4
U =
   0.0000 + 0.0000i   0.0000 - 2.0000i
   0.0000 + 0.5000i  -0.0000 + 0.0000i
>> residual = A - logm(expm(A))
residual =
   -0.0000   12.5664
   -3.1416   -0.0000
>> residual - 2*pi*i*U
ans =
   1.0e-15 *
  -0.8882 + 0.0000i   0.0000 + 0.0000i
  -0.8882 + 0.0000i  -0.8882 + 0.3488i

Footnotes:

1

Robert Corless and David Jeffrey, The Unwinding Number, SIGSAM Bull. 30, 28-35, 1996.

2

David Jeffrey, D. E. G. Hare and Robert Corless, Unwinding the Branches of the Lambert W Function, Math. Scientist 21, 1-7, 1996.

Cataloguing Software for Matrix Functions

I began working on functions of matrices just over thirty years ago, when I was an MSc student, my original interest being in the matrix square root. In those days relatively little research had been done on the topic and no software for evaluating matrix functions was generally available. Since then interest in matrix functions has grown greatly.

Functions of interest include the exponential, the logarithm, and real powers, along with all kinds of trigonometric functions and some less generally known functions such as the sign function and the unwinding function.

130410-1301-49-0762.jpg
Gil Strang at the Advances in Matrix Functions and Matrix Equations Workshop in Manchester, April 2013.

A large amount of software for evaluating matrix functions now exists, covering many languages (C++, Fortran, Julia, Python, …) and problem solving environments (GNU Octave, Maple, Mathematica, MATLAB, R, Scilab, …). Some of it is part of a core product or package, while other codes are available individually on researchers’ web sites. It is hard to keep up with what is available.

Edvin Deadman and I therefore decided to produce a catalogue of matrix function software, which is available in the form of MIMS EPrint 2014.8. We have organized the catalogue by language/package and documented the algorithms that are implemented. The EPrint also contains a summary of the rapidly growing number of applications in which matrix functions are used, including some that we discovered only recently, and a list of what we regard as the best current algorithms.

Producing the catalogue took more work than I expected. Many packages are rather poorly documented and we sometimes had to delve deep into documentation or source code in order to find out which algorithms had been implemented.

One thing our survey shows is that the most complete and up-to-date collection of codes for matrix functions currently available is that in the NAG Library, which contains over 40 codes, all implementing state-of-the-art algorithms. This is no accident. We recently completed a three-year Knowledge Transfer Partnership (KTP) funded by the Technology Strategy Board, the University of Manchester, and EPSRC, whose purpose was to translate matrix function algorithms into software for the NAG Engine (the underlying code base from which all NAG products are built) and to embed processes and expertise in developing matrix function software into the company. Edvin was the Associate employed on the project and wrote all the codes, which are in Fortran.

A video about the KTP project has been made by the University of Manchester, and more information about the project can be obtained from Edvin’s posts at the NAG blog.

We intend to update the catalogue from time to time and welcome notification of errors and omissions.

The Nearest Correlation Matrix

A correlation matrix is a symmetric matrix with unit diagonal and nonnegative eigenvalues. In 2000 I was approached by a London fund management company who wanted to find the nearest correlation matrix (NCM) in the Frobenius norm to an almost correlation matrix: a symmetric matrix having a significant number of (small) negative eigenvalues. This problem arises when the data from which the correlations are constructed is asynchronous or incomplete, or when models are stress-tested by artificially adjusting individual correlations. Solving the NCM problem (or obtaining a true correlation matrix some other way) is important in order to avoid subsequent calculations breaking down due to negative variances or volatilities, for example.

Algorithms

The convexity properties of the problem mean that there is a unique nearest correlation matrix, which is hence a global minimizer. In the 1990s several algorithms had been proposed for computing it, but none was guaranteed to work. Prompted by the approach from the company, I investigated the problem. I proved some results characterizing the solution and derived an alternating projections algorithm for computing it [1]. The algorithm repeatedly projects onto the set of matrices with unit diagonal and the cone of symmetric positive semidefinite matrices. It is guaranteed to converge to the minimum, but does so at a linear rate. An important feature of the algorithm is that other projections can be added on. Thus, for example, if we want to leave the trailing principal submatrix of order three unchanged, we simply restore it at the end of each iteration [2], [3].

The alternating projections algorithm is widely used, but can be slow to converge, especially for large matrices [4]. In 2006, Qi and Sun [5] derived a Newton method for the NCM problem. They work with the dual of the original problem, which is unconstrained. The objective function of the dual is not twice continuously differentiable, but by using the theory of strongly semismooth matrix functions Qi and Sun show that Newton’s method nevertheless has global quadratic convergence.

Ruediger Borsdorf and I, building on work in his M.Sc. thesis [3], developed an algorithm that solves the Newton equations using MINRES with a Jacobi preconditioner (a nontrivial task, since the coefficient matrix is not explicitly available) and incorporates some other refinements described in [6]. This algorithm has been implemented in the NAG Library [7].

In subsequent work, Borsdorf, Marcos Raydan and I [8], [9] used the spectral projected gradient method (SPGM) to solve the k-factor NCM problem, in which the correlation matrix is constrained to have the form of a diagonal matrix plus a rank-k matrix. This problem variant arises in multifactor normal copula models, collateralized debt obligations (CDOs), and multivariate time series. An earlier algorithm can fail to converge or to solve the problem, whereas the SPGM has guaranteed convergence to a stationary point. This algorithm has also been implemented in the NAG Library.

The NCM problem has proved to be of very wide interest beyond the world of finance, as indicated by the fact that [1] is now my third best-cited paper on the Web of Science. Recent applications in which the problem arises include reconstructing 20th century sea levels, genetic evaluations for thoroughbred horse breeding, modelling public health data sets, modelling storm damage of buildings, and a Kriging model for reservoirs.

Software

I regularly receive emails asking for software implementing algorithms for the NCM problem. I thought it would be useful to summarize what is available. In general, the Newton method is preferred, but the alternating projections method is more flexible as regards incorporating additional constraints.

Original NCM Problem

Alternating Projections Method

Newton Method

k-Factor NCM Problem

A MATLAB Alternating Projections Function

I thought it would be useful to provide my own MATLAB function nearcorr.m implementing the alternating projections algorithm. The listing is below. To see how it compares with the NAG code g02aa.m I ran the test code

%NEARCORR_TEST  Compare g02aa and nearcorr.

rng(10)                                  % Seed random number generators.
n = 100;
A = gallery('randcorr',n);               % Random correlation matrix. 
E  = randn(n)*1e-1;  A = A + (E + E')/2; % Perturb it.
tol = 1e-10;

% A = cor1399; tol = 1e-4;

fprintf('g02aa:\n')
maxits = int64(-1);  % For linear equation solver.
maxit = int64(-1);   % For Newton iteration.
tic
[~,X1,iter1,feval,nrmgrd,ifail] = g02aa(A,'errtol',tol,'maxits',maxits, ...
                                          'maxit',maxit);
toc

fprintf('  Newton steps taken: %d\n', iter1);
fprintf('  Norm of gradient of last Newton step: %6.4f\n', nrmgrd);
if ifail > 0, fprintf('  g02aa failed with ifail = %g\n', ifail), end

fprintf('nearcorr:\n')
tic
[X2,iter2] = nearcorr(A,tol,[],[],[],[],1);
toc
fprintf('  Number of iterations: %d\n', iter2);

fprintf('  Normwise relative difference between computed solutions:')
fprintf('%9.2e\n', norm(X1-X2,1)/norm(X1,1))

Running under Windows 7 on an Ivy Bridge Core i7 processor at 4.4 GHz, I obtained the following results, where the “real-life” matrix is based on stock data:

Matrix                           Code      Time (secs)  Iterations
1. Random (100), tol = 1e-10     g02aa       0.023           4
                                 nearcorr    0.052          15
2. Random (500), tol = 1e-10     g02aa       0.48            4
                                 nearcorr    3.01           26
3. Real-life (1399), tol = 1e-4  g02aa       6.8             5
                                 nearcorr  100.6            68

The results show that while nearcorr can be fast for small dimensions, the number of iterations, and hence its run time, tends to increase with the dimension and it can be many times slower than the Newton method. This is a stark illustration of the difference between quadratic convergence and linear (with problem-dependent constant) convergence. Here is my MATLAB function nearcorr.m.

function [X,iter] = nearcorr(A,tol,flag,maxits,n_pos_eig,w,prnt)
%NEARCORR    Nearest correlation matrix.
%   X = NEARCORR(A,TOL,FLAG,MAXITS,N_POS_EIG,W,PRNT)
%   finds the nearest correlation matrix to the symmetric matrix A.
%   TOL is a convergence tolerance, which defaults to LENGTH(A)*EPS.
%   If using FLAG == 1, TOL must be a 2-vector, with first component
%   the convergence tolerance and second component a tolerance
%   for defining "sufficiently positive" eigenvalues.
%   FLAG = 0: solve using full eigendecomposition (EIG).
%   FLAG = 1: treat as "highly non-positive definite A" and solve
%             using partial eigendecomposition (EIGS).
%   MAXITS is the maximum number of iterations (default 100, but may
%   need to be increased).
%   N_POS_EIG (optional) is the known number of positive eigenvalues of A.
%   W is a vector defining a diagonal weight matrix diag(W).
%   PRNT = 1 for display of intermediate output.

%   By N. J. Higham, 13/6/01, updated 30/1/13, 15/11/14, 07/06/15.
%   Reference:  N. J. Higham, Computing the nearest correlation
%   matrix---A problem from finance. IMA J. Numer. Anal.,
%   22(3):329-343, 2002.

if ~isequal(A,A'), error('A must be symmetric.'), end
if nargin < 2 || isempty(tol), tol = length(A)*eps*[1 1]; end
if nargin < 3 || isempty(flag), flag = 0; end
if nargin < 4 || isempty(maxits), maxits = 100; end
if nargin < 6 || isempty(w), w = ones(length(A),1); end
if nargin < 7, prnt = 1; end

n = length(A);
if flag >= 1
   if nargin < 5 || isempty(n_pos_eig)
      [V,D] = eig(A); d = diag(D);
      n_pos_eig = sum(d >= tol(2)*d(n));
   end
   if prnt, fprintf('n = %g, n_pos_eig = %g\n', n, n_pos_eig), end
end

X = A; Y = A;
iter = 0;
rel_diffX = inf; rel_diffY = inf; rel_diffXY = inf;
dS = zeros(size(A));

w = w(:); Whalf = sqrt(w*w');

while max([rel_diffX rel_diffY rel_diffXY]) > tol(1)

   Xold = X;
   R = Y - dS;
   R_wtd = Whalf.*R;
   if flag == 0
      X = proj_spd(R_wtd);
   elseif flag == 1
      [X,np] = proj_spd_eigs(R_wtd,n_pos_eig,tol(2));
   end
   X = X ./ Whalf;
   dS = X - R;
   Yold = Y;
   Y = proj_unitdiag(X);
   rel_diffX = norm(X-Xold,'fro')/norm(X,'fro');
   rel_diffY = norm(Y-Yold,'fro')/norm(Y,'fro');
   rel_diffXY = norm(Y-X,'fro')/norm(Y,'fro');
   iter = iter + 1;
   if prnt
      fprintf('%2.0f:  %9.2e  %9.2e  %9.2e', ...
               iter, rel_diffX, rel_diffY, rel_diffXY)
      if flag >= 1, fprintf('  np = %g\n',np), else fprintf('\n'), end
   end
   if iter > maxits
       error(['Stopped after ' num2str(maxits) ' its. Try increasing MAXITS.'])
   end

end

%%%%%%%%%%%%%%%%%%%%%%%%
function A = proj_spd(A)
%PROJ_SPD   Projection onto the cone of symmetric positive semidefinite matrices.

if ~isequal(A,A'), error('Not symmetric!'), end
[V,D] = eig(A);
A = V*diag(max(diag(D),0))*V';
A = (A+A')/2; % Ensure symmetry.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [A,n_pos_eig_found] = proj_spd_eigs(A,n_pos_eig,tol)
%PROJ_SPD_EIGS   Projection onto the positive semidefinite cone via partial eigendecomposition.

if ~isequal(A,A'), error('Not symmetric!'), end
k = n_pos_eig + 10; % 10 is safety factor.
if k > length(A), k = n_pos_eig; end
opts.disp = 0;
[V,D] = eigs(A,k,'LA',opts); d = diag(D);
j = (d > tol*max(d));
n_pos_eig_found = sum(j);
A = V(:,j)*D(j,j)*V(:,j)';  % Build using only the selected eigenpairs.
A = (A+A')/2; % Ensure symmetry.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function A = proj_unitdiag(A)
%PROJ_UNITDIAG   Projection onto the set of matrices with unit diagonal.

n = length(A);
A(1:n+1:n^2) = 1;

Updates

  • Links updated August 4, 2014.
  • nearcorr.m corrected November 15, 2014: iter was incorrectly initialized (thanks to Mike Croucher for pointing this out).
  • Added link to Mike Croucher’s Python alternating projections code, November 17, 2014.
  • Corrected an error in the convergence test, June 7, 2015. Effect on performance will be minimal (thanks to Nataša Strabić for pointing this out).

Footnotes:

1

Nicholas J. Higham, Computing the Nearest Correlation Matrix—A Problem from Finance, IMA J. Numer. Anal. 22, 329–343, 2002.

2

Craig Lucas, Computing Nearest Covariance and Correlation Matrices, M.Sc. Thesis, University of Manchester, 2001.

3

Ruediger Borsdorf, A Newton Algorithm for the Nearest Correlation Matrix, M.Sc. Thesis, University of Manchester, 2007.

4

Rene Escalante and Marcos Raydan, Alternating Projection Methods, SIAM, 2011.

5

Hou-Duo Qi and Defeng Sun, A Quadratically Convergent Newton Method for Computing the Nearest Correlation Matrix, SIAM J. Matrix Anal. Appl. 28, 360-385, 2006.

6

Ruediger Borsdorf and Nicholas J. Higham, A Preconditioned Newton Algorithm for the Nearest Correlation Matrix, IMA J. Numer. Anal. 30, 94-107, 2010.

8

Ruediger Borsdorf, Nicholas J. Higham and Marcos Raydan, Computing a Nearest Correlation Matrix with Factor Structure, SIAM J. Matrix Anal. Appl. 31, 2603-2622, 2010.

9

Ruediger Borsdorf, Structured Matrix Nearness Problems: Theory and Algorithms, Ph.D. Thesis, University of Manchester, 2012.

Trefethen’s Approximation Theory and Approximation Practice

This new 305-page SIAM book by Nick Trefethen presents a modern approach to approximation by polynomials and rational functions. Much of the theory here underlies the Chebfun software package and almost every page of the book contains examples computed using Chebfun.

tref12_cover.jpg

The book is certainly a must-read for anyone interested in numerical computation. But the most unusual feature of the book is not immediately obvious: it was entirely produced from 29 MATLAB M-files, one for each chapter. Each M-file contains the book’s text in comment lines intertwined with the MATLAB code that generates the examples and the figures. The book was created by using the MATLAB command publish to generate LaTeX output, which was then run through LaTeX (with a few tweaks for the actual printed book). Nick has made the M-files available at the book’s web page and you can generate the book by running them all through publish.
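If you want to try this workflow yourself, something along the following lines publishes one chapter file to LaTeX (here chap1.m is a placeholder name; format and outputDir are standard publish options):

opts = struct('format', 'latex', 'outputDir', fullfile(pwd, 'latex-output'));
publish('chap1.m', opts);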

When I ran publish on one of the M-files it gave a strange error beginning

No method 'createTextNode' with matching signature found for class
'org.apache.xerces.dom.DocumentImpl'.

and I got the same error whatever M-file I tried to publish. This seems to be caused by a clash with some nonstandard M-file on my path, because if I reset the MATLAB path with the matlabrc command (and then add back chebfun to the path) everything works fine.