# What Is a Generalized Inverse?

The matrix inverse is defined only for square nonsingular matrices. A generalized inverse is an extension of the concept of inverse that applies to square singular matrices and rectangular matrices. There are many definitions of generalized inverses, all of which reduce to the usual inverse when the matrix is square and nonsingular.

A large class of generalized inverses of an $m \times n$ matrix $A$ can be defined in terms of the Moore–Penrose conditions, in which $X$ is $n\times m$:

$\begin{array}{rcrc} \mathrm{(1)} & AXA = A, \; & \mathrm{(2)} & XAX=X,\\ \mathrm{(3)} & (AX)^* = AX, \; & \mathrm{(4)} & (XA)^* = XA. \end{array}$

Here, the superscript $*$ denotes the conjugate transpose. A 1-inverse is any $X$ satisfying condition (1), a (1,3)-inverse is any $X$ satisfying conditions (1) and (3), and so on for any subset of the four conditions.

Condition (1) implies that if the system $Ax = b$ is consistent, so that $Ax = b$ for some $x$, then $A(Xb) = A (XAx) = AXAx = Ax = b$. Hence $Xb$ solves the equation, which means that any 1-inverse is an equation-solving inverse. Condition (2) implies that $X = 0$ if $A = 0$, since then $X = XAX = 0$.
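
As a small worked illustration (the matrix is chosen for this example only), take $A = \mathrm{diag}(1,0)$ and write $X = (x_{ij})$. Multiplying out gives

$AXA = \begin{bmatrix} x_{11} & 0 \\ 0 & 0 \end{bmatrix},$

so the 1-inverses of $A$ are exactly the matrices with $x_{11} = 1$, with the other three entries arbitrary. For a consistent right-hand side $b = [\beta, 0]^T$ we have $Xb = [\beta, x_{21}\beta]^T$ and $A(Xb) = [\beta, 0]^T = b$, whatever the value of $x_{21}$.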

If $X$ is a (1,3)-inverse of $A$ then it can be shown that $x = Xb$ is a least squares solution of an inconsistent linear system $Ax = b$. If $X$ is a (1,4)-inverse then it can be shown that $x = Xb$ is the minimum 2-norm solution of a consistent linear system (where the 2-norm is defined by $\|x\|_2 = (x^*x)^{1/2}$).

In general, a matrix satisfying only one, two, or three of the Moore–Penrose conditions is not unique. But there is a unique matrix satisfying all four of the conditions, and it is called the Moore–Penrose pseudoinverse, denoted by $A^+$ or $A^{\dagger}$. For any system of linear equations $Ax = b$, $x = A^+b$ minimizes $\|Ax - b\|_2$ and has the minimum 2-norm over all minimizers.
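
The minimizing property can be observed numerically. The sketch below uses NumPy (an assumption, since NumPy is not mentioned above) on a hypothetical rank-deficient, inconsistent system. It compares $A^+b$ with another least squares solution obtained by adding a null vector of $A$.

```python
import numpy as np

# Hypothetical rank-1 system: the columns of A are equal and b is not in the
# range of A, so Ax = b is inconsistent and the minimizer is not unique.
A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, 3.0, 5.0])

x_pinv = np.linalg.pinv(A) @ b

# Another least squares solution: add a null vector of A to x_pinv.
z = np.array([1.0, -1.0])            # A @ z = 0
x_other = x_pinv + z

res_pinv = np.linalg.norm(A @ x_pinv - b)
res_other = np.linalg.norm(A @ x_other - b)

# lstsq also returns the minimum 2-norm least squares solution.
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

print("residuals:", res_pinv, res_other)
print("norms:    ", np.linalg.norm(x_pinv), np.linalg.norm(x_other))
```

Both vectors give the same residual, but `x_pinv` has the smaller 2-norm.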

The pseudoinverse can be expressed in terms of the singular value decomposition (SVD). If $A = U\Sigma V^*$ is an SVD, where the $m\times m$ matrix $U$ and $n\times n$ matrix $V$ are unitary, and $\Sigma = \mathrm{diag}(\sigma_1,\dots , \sigma_p)$ is $m\times n$ with $p = \min(m,n)$ and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots =\sigma_p = 0$ (so that $\mathrm{rank}(A) = r$), then

$A^+ = V\mathrm{diag}(\sigma_1^{-1},\dots,\sigma_r^{-1},0,\dots,0)U^*,$

where the diagonal matrix is $n\times m$.

In MATLAB, the function pinv computes $A^+$ using this formula. If $A$ has full column rank, that is, $\mathrm{rank}(A) = n$, then the concise formula $A^+ = (A^*A)^{-1}A^*$ holds.
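
The SVD formula translates directly into code. The following NumPy sketch is an illustration only: the function name `pinv_svd` and the tolerance used to decide which computed singular values count as zero are assumptions, and it is not a description of how MATLAB's pinv is implemented.

```python
import numpy as np


def pinv_svd(A, tol=None):
    """Pseudoinverse from the SVD; singular values <= tol are treated as zero."""
    A = np.asarray(A)
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    if tol is None:
        # Assumed tolerance: a modest multiple of the unit roundoff times sigma_1.
        tol = max(A.shape) * np.finfo(s.dtype).eps * (s[0] if s.size else 0.0)
    s_inv = np.zeros_like(s)
    nonzero = s > tol
    s_inv[nonzero] = 1.0 / s[nonzero]
    # A^+ = V diag(s_inv) U^*; the broadcast scales column j of V by s_inv[j].
    return (Vh.conj().T * s_inv) @ U.conj().T


# Hypothetical rank-deficient example (rank 1).
A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
A_plus = pinv_svd(A)

# Hypothetical full column rank example, for which A^+ = (A^* A)^{-1} A^*.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
B_plus = pinv_svd(B)
B_plus_formula = np.linalg.inv(B.T @ B) @ B.T

print(A_plus)
print(np.allclose(B_plus, B_plus_formula))
```

In floating point arithmetic the choice of tolerance matters, because a computed singular value that should be zero is usually a tiny nonzero number and inverting it would produce a huge spurious entry.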

For square matrices, the Drazin inverse is the unique matrix $A^D$ such that

$A^D A A^D = A^D, \quad A A ^D = A^D A, \quad A^{k+1} A^D = A^k,$

where $k = \mathrm{index}(A)$. The first condition is the same as the second of the Moore–Penrose conditions, but the second and third have a different flavour. The index of a matrix $A$ is the smallest nonnegative integer $k$ such that $\mathrm{rank}(A^k) = \mathrm{rank}(A^{k+1})$; it is characterized as the dimension of the largest Jordan block of $A$ with eigenvalue zero.
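
The rank definition of the index suggests a simple computation. The NumPy sketch below is a hedged illustration: the function name `matrix_index` is hypothetical, and it relies on numerically determined ranks, so it is trustworthy only for small, well-scaled examples such as the ones shown.

```python
import numpy as np


def matrix_index(A):
    """Smallest nonnegative k with rank(A^k) == rank(A^(k+1)), taking A^0 = I."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    power = np.eye(n)                  # A^0
    rank = n                           # rank(A^0)
    for k in range(n + 1):             # the index never exceeds n
        next_power = power @ A
        next_rank = np.linalg.matrix_rank(next_power)
        if next_rank == rank:
            return k
        power, rank = next_power, next_rank
    raise RuntimeError("numerical ranks did not stabilize")


# Hypothetical examples.
J = np.array([[2.0, 0.0, 0.0],        # one 2-by-2 Jordan block for eigenvalue 0
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
N = np.array([[0.0, 1.0, 0.0],        # a single 3-by-3 nilpotent Jordan block
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

print(matrix_index(np.eye(2)), matrix_index(J), matrix_index(N))  # 0 2 3
```

The results agree with the Jordan block characterization: a nonsingular matrix has index $0$, and in the other two examples the index equals the size of the largest Jordan block for the zero eigenvalue.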

If $\mathrm{index}(A)=1$ then $A^D$ is also known as the group inverse of $A$ and is denoted by $A^\#$. The Drazin inverse is an equation-solving inverse precisely when $\mathrm{index}(A)\le 1$, for then $AA^DA=A$, which is the first of the Moore–Penrose conditions.

The Drazin inverse can be represented explicitly as follows. If

$A = P \begin{bmatrix} B & 0 \\ 0 & N \end{bmatrix} P^{-1},$

where $P$ and $B$ are nonsingular and $N$ has only zero eigenvalues, then

$A^D = P \begin{bmatrix} B^{-1} & 0 \\ 0 & 0 \end{bmatrix} P^{-1}.$

Here are the pseudoinverse and the Drazin inverse for a particular matrix with index $2$:

$A = \left[\begin{array}{rrr} 1 & -1 & -1\\[3pt] 0 & 0 & -1\\[3pt] 0 & 0 & 0 \end{array}\right], \quad A^+ = \left[\begin{array}{rrr} \frac{1}{2} & -\frac{1}{2} & 0\\[3pt] -\frac{1}{2} & \frac{1}{2} & 0\\[3pt] 0 & -1 & 0 \end{array}\right], \quad A^D = \left[\begin{array}{rrr} 1 & -1 & 0\\[3pt] 0 & 0 & 0\\[3pt] 0 & 0 & 0 \end{array}\right].$
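
These two matrices can be reproduced numerically. The NumPy sketch below does not use the block representation above, which would require computing the decomposition. It uses instead the known identity $A^D = A^k (A^{2k+1})^{+} A^k$, valid for any $k \ge \mathrm{index}(A)$. The function name `drazin` is hypothetical, and forming high powers of $A$ can be numerically unreliable for general matrices, so this is an illustration and not a recommended algorithm.

```python
import numpy as np


def drazin(A, k=None):
    """Drazin inverse via A^D = A^k (A^(2k+1))^+ A^k, for any k >= index(A)."""
    A = np.asarray(A, dtype=float)
    if k is None:
        k = A.shape[0]                 # safe choice: index(A) <= dimension
    Ak = np.linalg.matrix_power(A, k)
    return Ak @ np.linalg.pinv(np.linalg.matrix_power(A, 2 * k + 1)) @ Ak


# The index-2 matrix from the example above.
A = np.array([[1.0, -1.0, -1.0],
              [0.0,  0.0, -1.0],
              [0.0,  0.0,  0.0]])

A_pinv = np.linalg.pinv(A)
A_drazin = drazin(A, k=2)

print(A_pinv)
print(A_drazin)

# Defining properties of the Drazin inverse with k = 2.
A2, A3 = np.linalg.matrix_power(A, 2), np.linalg.matrix_power(A, 3)
print(np.allclose(A_drazin @ A @ A_drazin, A_drazin),
      np.allclose(A @ A_drazin, A_drazin @ A),
      np.allclose(A3 @ A_drazin, A2))
```

Up to rounding errors, the computed matrices match $A^+$ and $A^D$ as displayed. Note that $AA^DA \ne A$ here, consistent with the fact that the Drazin inverse is not an equation-solving inverse when the index exceeds $1$.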

## Applications

The Moore–Penrose pseudoinverse is intimately connected with orthogonality, whereas the Drazin inverse has spectral properties related to those of the original matrix. The pseudoinverse occurs in all kinds of least squares problems. Applications of the Drazin inverse include population modelling, Markov chains, and singular systems of linear differential equations. It is not usually necessary to compute generalized inverses, but they are valuable theoretical tools.
