What Is the Pseudoinverse of a Matrix?

The pseudoinverse is an extension of the concept of the inverse of a nonsingular square matrix to singular matrices and rectangular matrices. It is one of many generalized inverses, but it is the one most useful in practice because it has a number of special properties.

The pseudoinverse of a matrix A\in\mathbb{C}^{m\times n} is an n \times m matrix X that satisfies the Moore–Penrose conditions

\notag \begin{array}{rcrc}  \mathrm{(1)}   &    AXA = A,  \; & \mathrm{(2)} & XAX=X,\\  \mathrm{(3)} & (AX)^* = AX, \; & \mathrm{(4)} & (XA)^* = XA. \end{array}

Here, the superscript * denotes the conjugate transpose. It can be shown that there is a unique X satisfying these equations. The pseudoinverse is denoted by A^+; some authors write A^\dagger.

The most important property of the pseudoinverse is that for any system of linear equations Ax = b (overdetermined or underdetermined), the vector x = A^+b minimizes \|Ax - b\|_2 and has the minimum 2-norm over all minimizers. In other words, the pseudoinverse provides the minimum 2-norm least squares solution to Ax = b.
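For example, here is a quick MATLAB check of this property on a small rank-deficient system; the matrix and right-hand side are an illustrative choice, not taken from any particular application.

A = [1 1; 1 1; 1 1]; b = [1; 2; 3];   % rank(A) = 1, so there are many minimizers
x = pinv(A)*b                         % minimum 2-norm minimizer, equal to [1; 1]
norm(A*x - b)                         % minimal residual, sqrt(2)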

The pseudoinverse can be expressed in terms of the singular value decomposition (SVD). If A = U\Sigma V^* is an SVD, where the m\times m matrix U and the n\times n matrix V are unitary and the m\times n matrix \Sigma = \mathrm{diag}(\sigma_1,\dots , \sigma_p), with p = \min(m,n), has diagonal entries \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots =\sigma_p = 0 (so that \mathrm{rank}(A) = r), then

\notag    A^+ = V\mathrm{diag}(\sigma_1^{-1},\dots,\sigma_r^{-1},0,\dots,0)U^*,    \qquad (1)

where the diagonal matrix is n\times m. This formula gives an easy way to prove many identities satisfied by the pseudoinverse. In MATLAB, the function pinv computes A^+ using this formula.
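The following MATLAB sketch forms A^+ directly from (1); the example matrix and the tolerance used to determine the numerical rank are illustrative assumptions.

A = randn(5,3);                        % example matrix
[U, S, V] = svd(A);
s = diag(S);
r = sum(s > max(size(A))*eps(s(1)));   % numerical rank (illustrative tolerance)
Splus = zeros(size(A'));               % the n-by-m diagonal matrix in (1)
Splus(1:r,1:r) = diag(1./s(1:r));
X = V*Splus*U';
norm(X - pinv(A))                      % of the order of machine epsilon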

From the Moore–Penrose conditions or (1) it can be shown that (A^+)^+ = A and (A^*)^+ = (A^+)^*.

For full rank A we have the concise formulas

\notag     A^+ =     \begin{cases}     (A^*A) ^{-1}A^*, & \textrm{if}~\mathrm{rank}(A) = n \le m, \\     A^*(AA^*)^{-1},  & \textrm{if}~ \mathrm{rank}(A) = m \le n.     \end{cases}    \qquad (2)

Consequently,

\notag   A^+A = I_n ~~\mathrm{if}~ \mathrm{rank}(A) = n, \qquad   AA^+ = I_m ~~\mathrm{if}~ \mathrm{rank}(A) = m.
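As a quick MATLAB check of the first formula in (2), with a random matrix (which has full column rank with probability 1):

A = randn(6,3);         % example matrix
X = (A'*A)\A';          % (A^*A)^{-1}A^*, computed via a linear solve
norm(X - pinv(A))       % small
norm(X*A - eye(3))      % A^+A = I_n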

Some special cases are worth noting.

  • The pseudoinverse of a zero m\times n matrix is the zero n\times m matrix.
  • The pseudoinverse of a nonzero vector x\in\mathbb{C}^n is x^*/(x^*x).
  • For x,y\in\mathbb{C}^n, (xy^*)^+ = (y^*)^+ x^+ and if x and y are nonzero then (xy^*)^+ = yx^*/ (x^*x\cdot y^*y).
  • The pseudoinverse of a Jordan block with eigenvalue 0 is the transpose:

\notag       \begin{bmatrix}         0 & 1 & 0 \\         0 & 0 & 1 \\         0 & 0 & 0 \\       \end{bmatrix}^+    =       \begin{bmatrix}         0 & 0 & 0 \\         1 & 0 & 0 \\         0 & 1 & 0         \end{bmatrix}.
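The vector and rank-1 cases above are easy to verify numerically; here is a brief MATLAB sketch with illustrative vectors x and y.

x = [1; 2i; -1]; y = [3; 0; 4];            % illustrative nonzero vectors
norm(pinv(x) - x'/(x'*x))                  % pseudoinverse of a nonzero vector
norm(pinv(x*y') - y*x'/((x'*x)*(y'*y)))    % pseudoinverse of a rank-1 matrix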

The pseudoinverse differs from the usual inverse in various respects. For example, the pseudoinverse of a triangular matrix is not necessarily triangular (here we are using MATLAB with the Symbolic Math Toolbox):

>> A = sym([1 1 1; 0 0 1; 0 0 1]), X = pinv(A)
A =
[1, 1, 1]
[0, 0, 1]
[0, 0, 1]
X =
[1/2, -1/4, -1/4]
[1/2, -1/4, -1/4]
[  0,  1/2,  1/2]

Furthermore, it is not generally true that (AB)^+ = B^+A^+ for A\in\mathbb{C}^{m\times n} and B\in\mathbb{C}^{n\times p}. A sufficient condition for this equality to hold is that \mathrm{rank}(A) = \mathrm{rank}(B) = n.
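A minimal MATLAB counterexample, with an illustrative choice of A and B for which \mathrm{rank}(A) = \mathrm{rank}(B) = 1 < n = 2:

A = [1 0]; B = [1; 1];
pinv(A*B)           % 1
pinv(B)*pinv(A)     % 0.5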

It is not usually necessary to compute the pseudoinverse, but if it is required it can be obtained using (1) or (2) or from the Newton–Schulz iteration

\notag       X_{k+1} = 2X_k - X_kAX_k, \quad X_0 = \alpha A^*,

for which X_k \to A^+ as k\to\infty if 0 < \alpha < 2/\|A\|_2^2. The convergence is at an asymptotically quadratic rate. However, about 2\log_2\kappa_2(A) iterations are required to reach the asymptotic phase, where \kappa_2(A) = \|A\|_2 \|A^+\|_2, so the iteration is slow to converge when A is ill conditioned.
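Here is a minimal MATLAB sketch of the iteration; the example matrix, the choice \alpha = 1/\|A\|_2^2 (which lies inside the required interval), and the stopping test are illustrative assumptions.

A = randn(8,5);                          % example matrix
X = A'/norm(A)^2;                        % X_0 = alpha*A^* with alpha = 1/||A||_2^2
for k = 1:100
    Xnew = 2*X - X*A*X;
    if norm(Xnew - X,1) <= 1e-14*norm(Xnew,1), X = Xnew; break, end
    X = Xnew;
end
norm(X - pinv(A),1)/norm(pinv(A),1)      % relative error, of order machine epsilon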

Notes and References

The pseudoinverse was first introduced by Eliakim Moore in 1920 and was independently discovered by Roger Penrose in 1955. For more on the pseudoinverse see, for example, Ben-Israel and Greville (2003) or Campbell and Meyer (2009). For analysis of the Newton–Schulz iteration see Pan and Schreiber (1991).

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is the Numerical Range of a Matrix?

The numerical range of a matrix A\in\mathbb{C}^{n\times n}, also known as the field of values, is the set of complex numbers

W(A) = \biggl\{\, \displaystyle\frac{z^*Az}{z^*z}: 0\ne z\in\mathbb{C}^n \,\biggr\}.

The set W(A) is compact and convex (a nontrivial property proved by Toeplitz and Hausdorff), and it contains all the eigenvalues of A. For normal matrices it is the convex hull of the eigenvalues. For a Hermitian matrix, W(A) is a segment of the real axis, while for a skew-Hermitian matrix it is a segment of the imaginary axis.
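As a quick illustration in MATLAB, sampling Rayleigh quotients of a Hermitian matrix with random vectors produces real numbers lying between the extreme eigenvalues (the matrix and the number of samples are illustrative choices):

A = randn(8); A = (A + A')/2;                % Hermitian example
z = randn(8,100) + 1i*randn(8,100);          % random sample vectors
w = sum(conj(z).*(A*z)) ./ sum(abs(z).^2);   % z^*Az/(z^*z) for each column
[min(eig(A)) min(real(w)) max(real(w)) max(eig(A))]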

The following figure plots in blue the boundaries of the numerical ranges of nine 16\times 16 matrices, with the eigenvalues shown as black dots. The boundaries were plotted using the function fv in the Matrix Computation Toolbox. The matrices are from Anymatrix.

[Figure: fov_plots.jpg]

A method for computing the boundary of the numerical range is based on the observation that for any z\in\mathbb{C}^n,

\notag  z^*Az = \underbrace{z^* \frac{1}{2}(A+A^*) z}_{\mathrm{\phantom{1234}real\phantom{1234}}}         + \underbrace{z^* \frac{1}{2}(A-A^*) z}_{\mathrm{pure~imaginary}}        \qquad (1)

This implies that the real part of z^*Az/z^*z lies between the largest and smallest eigenvalues of the Hermitian matrix (A+A^*)/2, which define a vertical strip in which the numerical range lies. Since W(\mathrm{e}^{\mathrm{i}\theta} A) = \mathrm{e}^{\mathrm{i}\theta} W(A), we can apply the same reasoning to the rotated matrix A(\theta) = \mathrm{e}^{\mathrm{i}\theta} A, and taking a range of \theta\in[0,\pi] we obtain an approximation to the boundary of the numerical range.
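A minimal MATLAB sketch of this method follows; it is equivalent to the description above but sweeps \theta over [0,2\pi) and takes only the largest eigenvalue at each angle. The example matrix and the number of angles are illustrative choices.

A = randn(16) + 1i*randn(16);            % example matrix
theta = linspace(0, 2*pi, 200);
bndry = zeros(size(theta));
for k = 1:length(theta)
    B = exp(1i*theta(k))*A;              % rotated matrix
    [V, D] = eig((B + B')/2);            % Hermitian part
    [~, j] = max(real(diag(D)));         % largest eigenvalue
    z = V(:,j);
    bndry(k) = z'*A*z/(z'*z);            % a boundary point of W(A)
end
plot(real(bndry), imag(bndry), '-', real(eig(A)), imag(eig(A)), '.k')
axis equal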

The quantity

\notag  \omega(A) = \max \bigl\{\, \mathrm{Re}\, w : w \in W(A) \,\bigr\}

is called the numerical abscissa, and by (1), \omega(A) is the largest eigenvalue of the Hermitian matrix (A+A^*)/2. The numerical abscissa determines the rate of growth of \|\mathrm{e}^{tA}\| for small positive t.
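In MATLAB the numerical abscissa is a one-liner, and for small t > 0 the growth of \|\mathrm{e}^{tA}\|_2 is captured to first order by \mathrm{e}^{t\omega(A)} (the example matrix and value of t are illustrative):

A = randn(10);                     % example matrix
omega = max(eig((A + A')/2));      % numerical abscissa
t = 1e-3;
[norm(expm(t*A))  exp(t*omega)]    % agree to first order in t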

Associated with the numerical range is the numerical radius

\notag      r(A) = \max\{\, |w| : w \in W(A) \,\}.

Note that r(A^*) = r(A). Also, r(A) \ge \rho(A), where \rho is the spectral radius (the largest absolute value of any eigenvalue), since W(A) contains the eigenvalues of A.

The numerical radius differs by at most a factor 2 from the 2-norm:

\notag       \frac{1}{2} \|A\|_2 \le r(A) \le \|A\|_2. \qquad (2)

When A is normal, r(A) = \|A\|_2.

The numerical radius is a matrix norm, but not a consistent norm (that is, r(AB) \le r(A)r(B) does not hold in general). However, it is true that

\notag     r(A^k) \le r(A)^k, \quad k\ge 1.

Combining this with the lower bound in (2) gives

\notag  \|A^k\|_2 \le 2 r(A)^k, \quad k \ge 1,

so if we know r(A) then we can bound \|A^k\|_2 for all k.
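As a rough numerical check of this bound, the sketch below estimates r(A) by maximizing the numerical abscissa of the rotated matrices \mathrm{e}^{\mathrm{i}\theta}A over a grid of angles (so it slightly underestimates r(A)); the example matrix and grid size are illustrative choices.

A = randn(10);                                % example matrix
theta = linspace(0, 2*pi, 1000);
r = 0;
for t = theta
    B = exp(1i*t)*A;
    r = max(r, max(real(eig((B + B')/2))));   % omega(e^{i*theta}A)
end
for k = 1:5
    fprintf('k = %d: ||A^k||_2 = %9.3e, 2 r(A)^k = %9.3e\n', ...
            k, norm(A^k), 2*r^k)
end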

The numerical radius can be characterized as the solution of an optimization problem over the Hermitian matrices:

\notag      r(A) = - \max\biggl\{\, \lambda_{\min}\biggl(     \begin{bmatrix} Z\!\! & A \\ A^*\!\! & -Z \end{bmatrix} \biggr):                Z=Z^*\in\mathbb{C}^{n\times n} \,\biggr\}.

Notes and References

For proofs of the results given here see Horn and Johnson (2013) or Horn and Johnson (1991), and see the latter for details of the algorithm for computing the numerical range. See Benzi (2020) for a discussion of applications of the numerical range in numerical analysis.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.