What Is the CS Decomposition?

The CS (cosine-sine) decomposition reveals close relationships between the singular value decompositions (SVDs) of the blocks of an orthogonal matrix expressed in $2\times 2$ block form. In full generality it applies when the diagonal blocks are not necessarily square. We focus here mainly on the most practically important case of square diagonal blocks.

Let $Q\in\mathbb{R}^{n \times n}$ be orthogonal and suppose that $n = 2p$ is even and $Q$ is partitioned into four equally sized blocks:

$\notag Q = \begin{array}[b]{@{\mskip35mu}c@{\mskip-20mu}c@{\mskip-10mu}c@{}} \scriptstyle p & \scriptstyle p & \\ \multicolumn{2}{c}{ \left[\begin{array}{c@{~}c@{~}} Q_{11}& Q_{12} \\ Q_{21}& Q_{22} \\ \end{array}\right]} & \mskip-12mu\ \begin{array}{c} \scriptstyle p \\ \scriptstyle p \end{array} \end{array}.$

Then there exist orthogonal matrices $U_1,U_2,V_1,V_2\in\mathbb{R}^{p \times p}$ such that

$\notag \begin{bmatrix} U_1^T & 0\\ 0 & U_2^T \end{bmatrix} \begin{bmatrix} Q_{11} & Q_{12}\\ Q_{21} & Q_{22} \end{bmatrix} \begin{bmatrix} V_1 & 0\\ 0 & V_2 \end{bmatrix} = \begin{array}[b]{@{\mskip36mu}c@{\mskip-13mu}c@{\mskip-10mu}c@{}} \scriptstyle p & \scriptstyle p & \\ \multicolumn{2}{c}{ \left[\begin{array}{@{\mskip3mu}rr@{~}} C & S \\ -S & C \end{array}\right]} & \mskip-12mu\ \begin{array}{c} \scriptstyle p \\ \scriptstyle p \end{array} \end{array},$

where $C = \mathrm{diag}(c_i)$ and $S = \mathrm{diag}(s_i)$ with $c_i \ge 0$, $s_i \ge 0$, and $c_i^2 + s_i^2 = 1$ for all $i$. This CS decomposition comprises four SVDs:

$\notag \begin{alignedat}{2} Q_{11} &= U_1CV_1^T, &\quad Q_{12} &= U_1 S V_2^T, \\ Q_{21} &= U_2 (-S) V_1^T, &\quad Q_{22} &= U_2C V_2^T. \end{alignedat}$

(Strictly speaking, for $Q_{21}$ we need to move the minus sign from $S$ to $U_2$ or $V_1$ to obtain an SVD.) The orthogonality of $Q$ ensures that there are only four different singular vector matrices instead of eight, and it closely links the singular values of the blocks. We also obtain SVDs of four cross products of the blocks: $Q_{11}^TQ_{12} = V_1 CS V_2^T$, etc.
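These relations are easy to check numerically. Here is a sketch in Python with NumPy (the helper `rand_orth` and the construction by assembling the three factors are purely illustrative, not library routines): we build an orthogonal $Q$ with the block structure above and confirm that the singular values of $Q_{11}$ are the $c_i$ and those of $Q_{12}$ are the $s_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3

def rand_orth(k):
    # Random orthogonal matrix via QR of a Gaussian matrix (illustrative helper).
    Q, R = np.linalg.qr(rng.standard_normal((k, k)))
    return Q * np.sign(np.diag(R))

U1, U2, V1, V2 = (rand_orth(p) for _ in range(4))
theta = rng.uniform(0, np.pi/2, p)
C, S = np.diag(np.cos(theta)), np.diag(np.sin(theta))

# Assemble Q = diag(U1,U2) * [C S; -S C] * diag(V1,V2)^T, which is orthogonal.
M = np.block([[C, S], [-S, C]])
Q = np.block([[U1, np.zeros((p, p))], [np.zeros((p, p)), U2]]) @ M \
    @ np.block([[V1, np.zeros((p, p))], [np.zeros((p, p)), V2]]).T
assert np.allclose(Q.T @ Q, np.eye(2*p))

# Singular values of Q11 are the c_i; singular values of Q12 are the s_i.
sv11 = np.linalg.svd(Q[:p, :p], compute_uv=False)
sv12 = np.linalg.svd(Q[:p, p:], compute_uv=False)
assert np.allclose(sv11, np.sort(np.cos(theta))[::-1])
assert np.allclose(sv12, np.sort(np.sin(theta))[::-1])
```

The same check applied to $Q_{21}$ and $Q_{22}$ reproduces the remaining two SVDs.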

Note that for $p = 1$, the CS decomposition reduces to the fact that any $2\times 2$ orthogonal matrix is of the form $\left[\begin{smallmatrix} c & s \\ -s & c \end{smallmatrix}\right]$ (a rotation) up to multiplication of a row or column by $-1$.

A consequence of the decomposition is that $Q_{11}$ and $Q_{22}$ have the same 2-norms and Frobenius norms, as do their inverses if they are nonsingular. The same is true for $Q_{12}$ and $Q_{21}$.

Now we drop the requirement that $n$ is even and consider diagonal blocks of different sizes:

$\notag Q = \begin{array}[b]{@{\mskip33mu}c@{\mskip-16mu}c@{\mskip-10mu}c@{}} \scriptstyle p & \scriptstyle n-p & \\ \multicolumn{2}{c}{ \left[\begin{array}{c@{~}c@{~}} Q_{11}& Q_{12} \\ Q_{21}& Q_{22} \\ \end{array}\right]} & \mskip-12mu\ \begin{array}{c} \scriptstyle p \\ \scriptstyle n-p \end{array} \end{array}, \quad p \le \displaystyle\frac{n}{2}.$

The CS decomposition now has the form

$\notag \begin{bmatrix} U_1^T & 0\\ 0 & U_2^T \end{bmatrix} \begin{bmatrix} Q_{11} & Q_{12}\\ Q_{21} & Q_{22} \end{bmatrix} \begin{bmatrix} V_1 & 0\\ 0 & V_2 \end{bmatrix} = \begin{array}[b]{@{\mskip35mu}c@{\mskip30mu}c@{\mskip-10mu}c@{}c} \scriptstyle p & \scriptstyle p & \scriptstyle n-2p & \\ \multicolumn{3}{c}{ \left[\begin{array}{c@{~}|c@{~}c} C & S & 0 \\ \hline -S & C & 0 \\ 0 & 0 & I_{n-2p} \end{array}\right]} & \mskip-12mu \begin{array}{c} \scriptstyle p \\ \scriptstyle p \\ \scriptstyle n-2p \end{array} \end{array},$

with $U_1,V_1\in\mathbb{R}^{p\times p}$ and $U_2,V_2\in\mathbb{R}^{(n-p)\times(n-p)}$ orthogonal, and $C$ and $S$ having the same properties as before. The new feature for $p < n/2$ is the identity matrix in the bottom right-hand corner on the right-hand side. Here is an example with $p = 2$ and $n=5$, with elements shown to two decimal places:

$\notag \begin{aligned} \left[\begin{array}{rr|rrr} 0.71 & -0.71 & 0 & 0 & 0 \\ -0.71 & -0.71 & 0 & 0 & 0 \\\hline 0 & 0 & 0.17 & 0.61 & -0.78 \\ 0 & 0 & -0.58 & -0.58 & -0.58 \\ 0 & 0 & -0.80 & 0.54 & 0.25 \\ \end{array}\right] \left[\begin{array}{rr|rrr} -0.60 & -0.40 & -0.40 & -0.40 & -0.40 \\ 0.40 & 0.60 & -0.40 & -0.40 & -0.40 \\\hline 0.40 & -0.40 & 0.60 & -0.40 & -0.40 \\ 0.40 & -0.40 & -0.40 & 0.60 & -0.40 \\ 0.40 & -0.40 & -0.40 & -0.40 & 0.60 \\ \end{array}\right] \\ \times \left[\begin{array}{rr|rrr} -0.71 & 0.71 & 0 & 0 & 0 \\ -0.71 & -0.71 & 0 & 0 & 0 \\\hline 0 & 0 & 0.17 & 0.58 & -0.80 \\ 0 & 0 & 0.61 & 0.58 & 0.54 \\ 0 & 0 & -0.78 & 0.58 & 0.25 \\ \end{array}\right] = \left[\begin{array}{rr|rrr} 1.00 & 0 & 0 & 0 & 0 \\ 0 & 0.20 & 0 & 0.98 & 0 \\\hline 0 & 0 & 1.00 & 0 & 0 \\ 0 & -0.98 & 0 & 0.20 & 0 \\ 0 & 0 & 0 & 0 & 1.00 \\ \end{array}\right]. \end{aligned}$

We mention two interesting consequences of the CS decomposition.

• With $p=1$: if $q_{11} = 0$ then $Q_{22}$ is singular, since $c_1 = |q_{11}| = 0$ is then a singular value of $Q_{22}$.
• For unequally sized diagonal blocks it is no longer always true that $Q_{11}$ and $Q_{22}$ have the same norms, but their inverses do: $\|Q_{11}^{-1}\|_2 = \|Q_{22}^{-1}\|_2 = 1/\min_ic_i \ge 1$. When $p = 1$, this relation becomes $\|Q_{22}^{-1}\|_2 = 1/|q_{11}|$.
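The inverse-norm relation can be checked numerically. The following sketch in Python with NumPy is illustrative: it assembles $Q$ from the right-hand side for $p < n/2$ and random block diagonal orthogonal factors, then compares the $2$-norms of the inverses of the diagonal blocks.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 2
theta = rng.uniform(0.1, np.pi/2 - 0.1, p)
c, s = np.cos(theta), np.sin(theta)

# Middle factor [C S 0; -S C 0; 0 0 I] for p < n/2.
M = np.eye(n)
M[:p, :p] = np.diag(c); M[:p, p:2*p] = np.diag(s)
M[p:2*p, :p] = -np.diag(s); M[p:2*p, p:2*p] = np.diag(c)

def rand_orth(k):
    Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
    return Q

U = np.block([[rand_orth(p), np.zeros((p, n-p))],
              [np.zeros((n-p, p)), rand_orth(n-p)]])
V = np.block([[rand_orth(p), np.zeros((p, n-p))],
              [np.zeros((n-p, p)), rand_orth(n-p)]])
Q = U @ M @ V.T

Q11, Q22 = Q[:p, :p], Q[p:, p:]
inv_norm_11 = np.linalg.norm(np.linalg.inv(Q11), 2)
inv_norm_22 = np.linalg.norm(np.linalg.inv(Q22), 2)
assert np.allclose(inv_norm_11, inv_norm_22)   # equal inverse 2-norms
assert np.allclose(inv_norm_11, 1/np.min(c))   # both equal 1/min_i c_i
```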

The CS decomposition also exists for a rectangular matrix with orthonormal columns,

$\notag Q = \begin{array}[b]{@{\mskip-25mu}c@{\mskip-20mu}c@{}} \scriptstyle n \\ \multicolumn{1}{c}{ \left[\begin{array}{@{}c@{}} Q_{1}\\ Q_{2} \end{array}\right]} & \mskip-12mu\ \begin{array}{c} \scriptstyle p \\ \scriptstyle q \end{array} \end{array}, \quad p\ge n, \quad q \ge n.$

Now the decomposition takes the form

$\notag \begin{bmatrix} U_1^T & 0\\ 0 & U_2^T \end{bmatrix} \begin{bmatrix} Q_{1}\\ Q_{2} \end{bmatrix} V = \begin{array}[b]{@{\mskip-25mu}c@{\mskip-20mu}c@{}} \scriptstyle n \\ \multicolumn{1}{c}{ \left[\begin{array}{c@{~}} C\\ S \end{array}\right]} & \mskip-12mu\ \begin{array}{c} \scriptstyle p \\ \scriptstyle q \end{array} \end{array},$

where $U_1\in\mathbb{R}^{p\times p}$, $U_2\in\mathbb{R}^{q\times q}$, and $V\in\mathbb{R}^{n\times n}$ are orthogonal and $C$ and $S$ have the same form as before except that they are rectangular.
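This form implies $Q_1^TQ_1 + Q_2^TQ_2 = V(C^TC + S^TS)V^T = I$, so the singular values of $Q_1$ and $Q_2$ pair up as $c_i^2 + s_i^2 = 1$. A minimal check in Python with NumPy (illustrative only), taking $Q$ from the QR factorization of a random matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, n = 6, 5, 4
Q, _ = np.linalg.qr(rng.standard_normal((p + q, n)))   # orthonormal columns
Q1, Q2 = Q[:p, :], Q[p:, :]

c = np.linalg.svd(Q1, compute_uv=False)   # descending c_i
s = np.linalg.svd(Q2, compute_uv=False)   # descending s_i
# The smallest c_i pairs with the largest s_i, so reverse one ordering.
assert np.allclose(np.sort(c**2) + np.sort(s**2)[::-1], 1.0)
assert np.allclose(Q1.T @ Q1 + Q2.T @ Q2, np.eye(n))
```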

The most general form of the CS decomposition is for an orthogonal matrix with diagonal blocks that are not square. Now the matrix on the right-hand side has a more complicated block structure (see the references for details).

The CS decomposition arises in measuring angles and distances between subspaces. These are defined in terms of the orthogonal projectors onto the subspaces, so the singular values of matrices with orthonormal columns naturally arise.

Software for computing the CS decomposition is available in LAPACK, based on an algorithm of Sutton (2009). We used a MATLAB interface to it, available on MathWorks File Exchange, for the numerical example. Note that the output of this code is not quite in the form in which we have presented the decomposition, so some post-processing is required to achieve it.

References

This is a minimal set of references, which contain further useful references within.

What’s New in MATLAB R2020a and R2020b?

In this post I discuss new features in MATLAB R2020a and R2020b. As usual in this series, I focus on a few of the features most relevant to my work. See the release notes for a detailed list of the many changes in MATLAB and its toolboxes.

exportgraphics (R2020a)

The exportgraphics function is very useful for saving to a file a tightly cropped version of a figure with the border white instead of gray. Simple usages are

exportgraphics(gca,'image.pdf')
exportgraphics(gca,'image.jpg','Resolution',200)


I have previously used the export_fig function, which is not built into MATLAB but is available from File Exchange; I think I will be using exportgraphics instead from now on.

svdsketch (R2020b)

The new svdsketch function computes the singular value decomposition (SVD) $USV^T$ of a low rank approximation to a matrix ($U$ and $V$ orthogonal, $S$ diagonal with nonnegative diagonal entries). It is mainly intended for use with matrices that are close to having low rank, as is the case in various applications.

This function uses a randomized algorithm that computes a sketch of the given $m$-by-$n$ matrix $A$, which is essentially a product $Q^TA$, where the columns of $Q$ form an orthonormal basis for the range of $A\Omega$, with $\Omega$ a random $n$-by-$k$ matrix. The value of $k$ is chosen automatically to achieve $\|USV^T-A\|_F \le \mathrm{tol}\|A\|_F$, where $\mathrm{tol}$ is a tolerance that defaults to $\epsilon^{1/4}$ and must not be less than $\epsilon^{1/2}$, where $\epsilon$ is the machine epsilon ($2\times 10^{-16}$ for double precision). The algorithm includes a power method iteration that refines the sketch before computing the SVD.

The output of the function is an SVD in which $U$ and $V$ are numerically orthogonal and the singular values in $S$ of size $\mathrm{tol}$ or larger are good approximations to singular values of $A$, but smaller singular values in $S$ may not be good approximations to singular values of $A$.
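The basic sketch idea can be illustrated in a few lines of Python with NumPy. This is a simplified sketch of the technique, not the algorithm that svdsketch implements (in particular, svdsketch chooses the subspace dimension adaptively from a tolerance, whereas here $k$ is fixed by hand):

```python
import numpy as np

def sketch_svd(A, k, power_iters=2, rng=None):
    """Simplified randomized SVD sketch (illustrative, not MathWorks' algorithm).
    Returns U, S, V with A ~= U @ diag(S) @ V.T of rank at most k."""
    rng = rng or np.random.default_rng()
    m, n = A.shape
    Y = A @ rng.standard_normal((n, k))     # sample the range of A
    Q, _ = np.linalg.qr(Y)
    for _ in range(power_iters):            # power iteration refines the sketch
        Q, _ = np.linalg.qr(A @ (A.T @ Q))
    B = Q.T @ A                             # the k-by-n sketch
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Ub, S, Vt.T

# A matrix with rapidly decaying singular values (analogue of gallery('randsvd')).
rng = np.random.default_rng(3)
n = 8
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** -np.arange(n)               # 1, 1e-1, ..., 1e-7
A = U @ np.diag(sigma) @ V.T

Uk, S, Vk = sketch_svd(A, k=5, rng=rng)
rel_res = np.linalg.norm(A - Uk @ np.diag(S) @ Vk.T) / np.linalg.norm(A)
assert rel_res < 1e-4   # close to the optimal rank-5 error, about 1e-5
```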

Here is an example. The code

n = 8; rng(1); A = gallery('randsvd',n,1e8,3);
[U,S,V] = svdsketch(A,1e-3);
rel_res = norm(A-U*S*V')/norm(A)
singular_values = [svd(A) [diag(S); zeros(n-length(S),1)]]


produces the following output, with the exact singular values in the first column and the approximate ones in the second column:

rel_res =
1.9308e-06
singular_values =
1.0000e+00   1.0000e+00
7.1969e-02   7.1969e-02
5.1795e-03   5.1795e-03
3.7276e-04   3.7276e-04
2.6827e-05   2.6827e-05
1.9307e-06            0
1.3895e-07            0
1.0000e-08            0


The approximate singular values are correct down to around $10^{-5}$, which is better accuracy than the $10^{-3}$ requested. This is a difficult matrix for svdsketch because there is no clear gap in the singular values of $A$.

Axis Padding (R2020b)

The new padded option of the axis function puts some padding between the extremes of the data and the surrounding axes box, so that the plot does not touch the box. The code

x = linspace(0,2*pi,50); plot(x,tan(x),'linewidth',1.4)
title('Original axis')


produces a plot with the default axis limits, in which the curve extends right up to the box; following the plot command with axis padded adds the padding.

Turbo Colormap (R2020b)

The default colormap changed from jet (the rainbow color map) to parula in R2014b (with a tweak in R2017a), because parula is more perceptually uniform and maintains information when printed in monochrome. The new turbo colormap is a more perceptually uniform version of jet, as these examples show. Notice that turbo has a longer transition through the greens and yellows. If you can’t give up on jet, use turbo instead.

Turbo:

Jet:

Parula:

ND Arrays (R2020b)

The new pagemtimes function performs matrix multiplication on pages of $n$-dimensional arrays, while pagetranspose and pagectranspose carry out the transpose and conjugate transpose, respectively, on pages of $n$-dimensional arrays.

Performance

Both releases report significantly improved speed of certain functions, including some of the ODE solvers.

What Is the Singular Value Decomposition?

A singular value decomposition (SVD) of a matrix $A\in\mathbb{R}^{m\times n}$ is a factorization

$\notag A = U\Sigma V^T,$

where $U\in\mathbb{R}^{m\times m}$ and $V\in\mathbb{R}^{n\times n}$ are orthogonal, $\Sigma = \mathrm{diag}(\sigma_1,\dots, \sigma_p)\in\mathbb{R}^{m\times n}$, where $p = \min(m,n)$, and $\sigma_1\ge \sigma_2\ge \cdots \ge \sigma_p \ge 0$.

Partition $U =[ u_1,\dots,u_m]$ and $V = [v_1,\dots, v_n]$. The $\sigma_i$ are called the singular values of $A$ and the $u_i$ and $v_i$ are the left and right singular vectors. We have $Av_i = \sigma_i u_i$, $i = 1 \colon p$. The matrix $\Sigma$ is unique but $U$ and $V$ are not. The form of $\Sigma$ is

$\notag \Sigma = \left[\begin{array}{ccc}\sigma_1&&\\ &\ddots&\\& &\sigma_n\\\hline &\rule{0cm}{15pt} \text{\Large 0} & \end{array}\right] ~\mathrm{for}~ m \ge n, \quad \Sigma = \begin{bmatrix} \begin{array}{ccc|c@{\mskip5mu}}\sigma_1&&\\ &\ddots& & \text{\Large 0} \\& &\sigma_m\end{array}\\ \end{bmatrix} ~\mathrm{for}~ m \le n.$

Here is an example, in which the entries of $A$ have been specially chosen to give simple forms for the elements of the factors:

$\notag A = \left[\begin{array}{rr} 0 & \frac{4}{3}\\[\smallskipamount] -1 & -\frac{5}{3}\\[\smallskipamount] -2 & -\frac{2}{3} \end{array}\right] = \underbrace{ \displaystyle\frac{1}{3} \left[\begin{array}{rrr} 1 & -2 & -2\\ -2 & 1 & -2\\ -2 & -2 & 1 \end{array}\right] }_U \mskip5mu \underbrace{ \left[\begin{array}{cc} 2\,\sqrt{2} & 0\\ 0 & \sqrt{2}\\ 0 & 0 \end{array}\right] }_{\Sigma} \mskip5mu \underbrace{ \displaystyle\frac{1}{\sqrt{2}} \left[\begin{array}{cc} 1 & 1\\ 1 & -1 \end{array}\right] }_{V^T}.$

The power of the SVD is that it reveals a great deal of useful information about norms, rank, and subspaces of a matrix and it enables many problems to be reduced to a trivial form.

Since $U$ and $V$ are nonsingular, $\mathrm{rank}(A) = \mathrm{rank}(\Sigma) = r$, where $r \le p$ is the number of nonzero singular values. Since the $2$-norm and Frobenius norm are invariant under orthogonal transformations, $\|A\| = \|\Sigma\|$ for both norms, giving

$\notag \|A\|_2 = \sigma_1, \quad \|A\|_F = \Bigl(\displaystyle\sum_{i=1}^r \sigma_i^2\Bigr)^{1/2},$

and hence $\|A\|_2 \le \|A\|_F \le r^{1/2} \|A\|_2$. The range space and null space of $A$ are given in terms of the columns of $U$ and $V$ by

$\notag \begin{aligned} \mathrm{null}(A) &= \mathrm{span} \{ v_{r+1}, \dots,v_n \},\\ \mathrm{range}(A) &= \mathrm{span} \{u_1,u_2,\dots, u_r\}. \end{aligned}$
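These facts are straightforward to verify numerically; here is a minimal check, sketched in Python with NumPy for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 3
A = rng.standard_normal((m, n))
U, sigma, Vt = np.linalg.svd(A)   # full SVD: U is m-by-m, Vt is n-by-n

# A v_i = sigma_i u_i for i = 1:min(m,n); rows of Vt are the v_i^T.
for i in range(n):
    assert np.allclose(A @ Vt[i], sigma[i] * U[:, i])

# Norms and rank come straight from the singular values.
assert np.isclose(np.linalg.norm(A, 2), sigma[0])
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sigma**2)))
assert np.linalg.matrix_rank(A) == np.sum(sigma > 1e-12)
```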

We can write the SVD as

$\notag \qquad\qquad A = \begin{bmatrix} u_1, u_2, \dots, u_r \end{bmatrix} \mathrm{diag}(\sigma_1,\dots, \sigma_r) \begin{bmatrix} v_1^T\\ v_2^T\\ \vdots\\ v_r^T \end{bmatrix} = \displaystyle\sum_{i=1}^{r} \sigma_i u_i v_i^T, \qquad\qquad(*)$

which expresses $A$ as a sum of $r$ rank-$1$ matrices, the $i$th of which has $2$-norm $\sigma_i$. The famous Eckart–Young theorem (1936) says that

$\notag \min_{\mathrm{rank}(B) = k} \|A-B\|_q = \begin{cases} \sigma_{k+1}, & q = 2, \\ \Bigl(\sum_{i=k+1}^r \sigma_i^2\Bigr)^{1/2}, & q = F, \end{cases}$

and that the minimum is attained at

$\notag A_k = U D_k V^T, \quad D_k = \mathrm{diag}(\sigma_1, \dots, \sigma_k, 0, \dots, 0).$

In other words, truncating the sum $(*)$ after $k < r$ terms gives the best rank-$k$ approximation to $A$ in both the $2$-norm and the Frobenius norm. In particular, this result implies that when $A$ has full rank the distance from $A$ to the nearest rank-deficient matrix is $\sigma_r$.
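A minimal numerical check of the Eckart–Young theorem, sketched in Python with NumPy for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Truncate the dyadic expansion (*) after k terms: the best rank-k approximation.
Ak = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k]
assert np.linalg.matrix_rank(Ak) == k
assert np.isclose(np.linalg.norm(A - Ak, 2), sigma[k])   # 2-norm error = sigma_{k+1}
assert np.isclose(np.linalg.norm(A - Ak, 'fro'),
                  np.sqrt(np.sum(sigma[k:]**2)))         # Frobenius-norm error
```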

Relations with Symmetric Eigenvalue Problem

The SVD is not directly related to the eigenvalues and eigenvectors of $A$. However, for $m\ge n$, $A = U \Sigma V^T$ implies

$\notag A^T\!A = V \mathrm{diag}(\sigma_1^2,\dots,\sigma_n^2) V^T, \quad AA^T = U \mathrm{diag}(\sigma_1^2,\dots,\sigma_n^2,\underbrace{0,\dots,0}_{m-n}) U^T,$

so the singular values of $A$ are the square roots of the eigenvalues of the symmetric positive semidefinite matrices $A^T\!A$ and $AA^T$ (modulo $m-n$ zeros in the latter case), and the singular vectors are eigenvectors. Moreover, the eigenvalues of the $(m+n)\times (m+n)$ matrix

$\notag C = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$

are plus and minus the singular values of $A$, together with $|m-n|$ additional zeros if $m \ne n$, and the eigenvectors of $C$ and the singular vectors of $A$ are also related.

Consequently, by applying results or algorithms for the eigensystem of a symmetric matrix to $A^T\!A$, $AA^T$, or $C$ one obtains results or algorithms for the singular value decomposition of $A$.
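Both relations are easy to confirm for a random rectangular matrix; here is a sketch in Python with NumPy:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 3
A = rng.standard_normal((m, n))
sigma = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of A^T A are the squared singular values.
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(eig_AtA, sigma**2)

# Eigenvalues of C = [0 A; A^T 0] are +/- sigma_i plus |m - n| zeros.
C = np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])
eig_C = np.sort(np.linalg.eigvalsh(C))
expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(m - n)]))
assert np.allclose(eig_C, expected)
```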

Connections with Other Problems

The pseudoinverse of a matrix $A\in\mathbb{R}^{m\times n}$ can be expressed in terms of the SVD as

$\notag A^+ = V\mathrm{diag}(\sigma_1^{-1},\dots,\sigma_r^{-1},0,\dots,0)U^T.$

The least squares problem $\min_x \|b - Ax\|_2$, where $A\in\mathbb{R}^{m\times n}$ with $m \ge n$ is solved by $x = A^+b$, and when $A$ is rank-deficient this is the solution of minimum $2$-norm. For $m < n$ this is an underdetermined system and $x = A^+b$ gives the minimum 2-norm solution.

We can write $A = U\Sigma V^T = UV^T \cdot V \Sigma V^T \equiv PQ$, where $P$ is orthogonal and $Q$ is symmetric positive semidefinite. This decomposition $A = PQ$ is the polar decomposition and $Q = (A^T\!A)^{1/2}$ is unique. This connection between the SVD and the polar decomposition is useful both theoretically and computationally.
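A minimal sketch in Python with NumPy of obtaining the polar decomposition from the SVD in this way:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4))
U, sigma, Vt = np.linalg.svd(A)

P = U @ Vt                          # orthogonal polar factor, P = U V^T
Q = Vt.T @ np.diag(sigma) @ Vt      # symmetric positive semidefinite, Q = V Sigma V^T
assert np.allclose(A, P @ Q)
assert np.allclose(P @ P.T, np.eye(4))            # P orthogonal
assert np.allclose(Q, Q.T)                        # Q symmetric
assert np.all(np.linalg.eigvalsh(Q) >= -1e-12)    # Q positive semidefinite
```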

Applications

The SVD is used in a very wide variety of applications—too many and varied to attempt to summarize here. We just mention two.

The SVD can be used to help identify to which letters vowels and consonants have been mapped in a substitution cipher (Moler and Morrison, 1983).

An inverse use of the SVD is to construct test matrices by forming a diagonal matrix of singular values from some distribution then pre- and post-multiplying by random orthogonal matrices. The result is matrices with known singular values and 2-norm condition number that are nevertheless random. Such “randsvd” matrices are widely used to test algorithms in numerical linear algebra.
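The construction is easy to sketch in Python with NumPy; this mimics the idea behind randsvd matrices, with an arbitrarily chosen geometric distribution of singular values:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
sigma = np.logspace(0, -5, n)   # prescribed singular values: 1 down to 1e-5

# Pre- and post-multiply by random orthogonal matrices (QR of Gaussian matrices).
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(sigma) @ V.T

# A is "random" yet has known singular values and 2-norm condition number.
assert np.allclose(np.linalg.svd(A, compute_uv=False), sigma)
assert np.isclose(np.linalg.cond(A), sigma[0] / sigma[-1])
```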

History and Computation

The SVD was introduced independently by Beltrami in 1873 and Jordan in 1874. Golub popularized the SVD as an essential computational tool and developed the first reliable algorithms for computing it. The Golub–Reinsch algorithm, dating from the late 1960s and based on bidiagonalization and the QR algorithm, is the standard way to compute the SVD. Various alternatives are available; see the references.

References

This is a minimal set of references, which contain further useful references within.

What Is the Complex Step Approximation?

In many situations we need to evaluate the derivative of a function but we do not have an explicit formula for the derivative. The complex step approximation approximates the derivative (and the function value itself) from a single function evaluation. The catch is that it involves complex arithmetic.

For an analytic function $f$ we have the Taylor expansion

$\notag \qquad\qquad\qquad\qquad f(x + \mathrm{i}h) = f(x) + \mathrm{i}h f'(x) - h^2\displaystyle\frac{f''(x)}{2} + O(h^3), \qquad\qquad\qquad\qquad(*)$

where $\mathrm{i} = \sqrt{-1}$ is the imaginary unit. Assume that $f$ maps the real line to the real line and that $x$ and $h$ are real. Then equating real and imaginary parts in $(*)$ gives $\mathrm{Re} f(x+\mathrm{i}h) = f(x) + O(h^2)$ and $\mathrm{Im} f(x+\mathrm{i}h) = hf'(x) + O(h^3)$. This means that for small $h$, the approximations

$\notag f(x) \approx \mathrm{Re} f(x+\mathrm{i}h), \quad f'(x) \approx \mathrm{Im} \displaystyle\frac{f(x+\mathrm{i}h)}{h}$

both have error $O(h^2)$. So a single evaluation of $f$ at a complex argument gives, for small $h$, a good approximation to $f'(x)$, as well as a good approximation to $f(x)$ if we need it.
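Here is a minimal sketch in Python (the helper `complex_step_derivative` is an illustrative name, not a library function):

```python
import cmath
import math

def complex_step_derivative(f, x, h=1e-100):
    """Approximate f'(x) for a real analytic f from one complex evaluation."""
    return f(complex(x, h)).imag / h

# Example: f(x) = exp(x)*sin(x), with f'(x) = exp(x)*(sin(x) + cos(x)).
f = lambda z: cmath.exp(z) * cmath.sin(z)
x = 1.0
exact = math.exp(x) * (math.sin(x) + math.cos(x))
approx = complex_step_derivative(f, x)
assert abs(approx - exact) / abs(exact) < 1e-13   # full double precision accuracy
```

Note that $h = 10^{-100}$ causes no difficulty: there is no subtraction, so no cancellation.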

The usual way to approximate derivatives is with finite differences, for example by the forward difference approximation

$\notag f'(x) \approx \displaystyle\frac{f(x+h) - f(x)}{h}.$

This approximation has error $O(h)$ so it is less accurate than the complex step approximation for a given $h$, but more importantly it is prone to numerical cancellation. For small $h$, $f(x+h)$ and $f(x)$ agree to many significant digits and so in floating-point arithmetic the difference approximation suffers a loss of significant digits. Consequently, as $h$ decreases the error in the computed approximation eventually starts to increase. As numerical analysis textbooks explain, the optimal choice of $h$ that balances truncation error and rounding errors is approximately

$\notag h_{\mathrm{opt}} = 2\Bigl|\displaystyle\frac{u f(x)}{f''(x)} \Bigr|^{1/2},$

where $u$ is the unit roundoff. The optimal error is therefore of order $u^{1/2}$.

A simple example illustrates these ideas. For the function $f(x) = \mathrm{e}^x$ with $x = 1$, we plot in the figure below the relative error for the finite difference, in blue, and the relative error for the complex step approximation, in orange, for $h$ ranging from about $10^{-5}$ to $10^{-11}$. The dotted lines show $u$ and $u^{1/2}$. The computations are in double precision ($u \approx 1.1\times 10^{-16}$). The finite difference error decreases with $h$ until it reaches about $h_{\mathrm{opt}} = 2.1\times 10^{-8}$; thereafter the error grows, giving the characteristic V-shaped error curve. The complex step error decreases steadily until it is of order $u$ for $h \approx u^{1/2}$, and for each $h$ it is about the square of the finite difference error, as expected from the theory.

Remarkably, one can take $h$ extremely small in the complex step approximation (e.g., $h = 10^{-100}$) without any ill effects from roundoff.

The complex step approximation carries out a form of approximate automatic differentiation, with the variable $h$ functioning like a symbolic variable that propagates through the computations in the imaginary parts.

The complex step approximation applies to gradient vectors and it can be extended to matrix functions. If $f$ is analytic and maps real $n\times n$ matrices to real $n\times n$ matrices and $A$ and $E$ are real then (Al-Mohy and Higham, 2010)

$\notag L_f(A,E) \approx \mathrm{Im} \displaystyle\frac{f(A+\mathrm{i}hE)}{h},$

where $L_f(A,E)$ is the Fréchet derivative of $f$ at $A$ in the direction $E$. It is important to note that the method used to evaluate $f$ must not itself use complex arithmetic (as methods based on the Schur decomposition do); if it does, then the interaction of those complex terms with the much smaller $\mathrm{i}hE$ term can lead to damaging subtractive cancellation.
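For an $f$ that can be evaluated without complex arithmetic of its own, such as a matrix polynomial, the approximation is straightforward to sketch in Python with NumPy. Here `frechet_complex_step` is an illustrative helper, and $f(A) = A^3$ is chosen because its Fréchet derivative, $L_f(A,E) = A^2E + AEA + EA^2$, is known in closed form:

```python
import numpy as np

def frechet_complex_step(f, A, E, h=1e-100):
    """L_f(A,E) ~= Im f(A + i*h*E)/h for real analytic f evaluated without
    complex arithmetic of its own (here, a matrix polynomial)."""
    return np.imag(f(A + 1j * h * E)) / h

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4))
E = rng.standard_normal((4, 4))

f = lambda X: X @ X @ X                        # f(A) = A^3
L_exact = A @ A @ E + A @ E @ A + E @ A @ A    # Frechet derivative of A^3 at A
L_approx = frechet_complex_step(f, A, E)
assert np.allclose(L_approx, L_exact)
```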

The complex step approximation has also been extended to higher derivatives by using “different imaginary units” in different components (Lantoine et al., 2012).

Here are some applications where the complex step approximation has been used.

• Sensitivity analysis in engineering applications (Giles et al., 2003).
• Approximating gradients in deep learning (Goodfellow et al., 2016).
• Approximating the exponential of an operator in option pricing (Ackerer and Filipović, 2019).

Software has been developed for automatically carrying out the complex step method—for example, by Shampine (2007).

The complex step approximation has been rediscovered many times. The earliest published appearance that we are aware of is in a paper by Squire and Trapp (1998), who acknowledge earlier work of Lyness and Moler on the use of complex variables to approximate derivatives.

References

This is a minimal set of references, which contain further useful references within.