When a nonsingular matrix is perturbed by a matrix of rank , the inverse also undergoes a rank- perturbation. More precisely, if has rank and is nonsingular then the identity shows that
The Sherman–Morrison–Woodbury formula provides an explicit formula for the inverse of the perturbed matrix .
We will begin with the simpler case of a rank- perturbation: , where and are -vectors, and we consider first the case where . We might expect that for some (consider a binomial expansion of the inverse). Multiplying out, we obtain
so the product equals the identity matrix when . The condition that be nonsingular is (as can also be seen from , derived in What Is a Block Matrix?). So
For the general case write . Inverting this equation and applying the previous result gives
subject to the nonsingularity condition . This is known as the Sherman–Morrison formula. It explicitly identifies the rank- change to the inverse.
As an example, if we take and (where is the th column of the identity matrix) then, writing , we have
The Frobenius norm of the change to is
If is sufficiently small then this quantity is approximately maximized for and such that the product of the norms of th column and th row of is maximized. For an upper triangular matrix and are likely to give the maximum, which means that the inverse of an upper triangular matrix is likely to be most sensitive to perturbations in the element of the matrix. To illustrate, we consider the matrix
The element of the following matrix is :
As our analysis suggests, the entry is the most sensitive to perturbation.
Now consider a perturbation , where and are . This perturbation has rank at most , and its rank is if and are both of rank . If is nonsingular then is nonsingular and
which is the Sherman–Morrison–Woodbury formula. The significance of this formula is that is , so if and is known then it is much cheaper to evaluate the right-hand side than to invert directly. In practice, of course, we rarely invert matrices, but rather exploit factorizations of them. If we have an LU factorization of then we can use it in conjunction with the Sherman–Morrison–Woodbury formula to solve in flops, as opposed to the flops required to factorize from scratch.
The Sherman–Morrison–Woodbury formula is straightforward to verify, by showing that the product of the two sides is the identity matrix. How can the formula be derived in the first place? Consider any two matrices and such that and are both defined. The associative law for matrix multiplication gives , or , which can be written as . Postmultiplying by gives
Setting and gives the special case of the Sherman–Morrison–Woodbury formula with , and the general formula follows from .
We will give a different derivation of an even more general formula using block matrices. Consider the block matrix
where is , and are , and is . We will obtain a formula for by looking at .
It is straightforward to verify that
In the block we see the right-hand side of a Sherman–Morrison–Woodbury-like formula, but it is not immediately clear how this relates to . Let , and note that . Then
and applying the above formula (appropriately renaming the blocks) gives, with denoting a block whose value does not matter,
Hence . Equating our two formulas for gives
provided that is nonsingular.
To see one reason why this formula is useful, suppose that the matrix and its perturbation are symmetric and we wish to preserve symmetry in our formulas. The Sherman–Morrison–Woodbury requires us to write the perturbation as , so the perturbation must be positive semidefinite. In , however, we can write an arbitrary symmetric perturbation as , with symmetric but possibly indefinite, and obtain a symmetric formula.
The matrix is the Schur complement of in . Consequently the inversion formula is intimately connected with the theory of Schur complements. By manipulating the block matrices in different ways it is possible to derive variations of . We mention just the simple rewriting
which is valid if is singular, as long as is nonsingular. Note that the formula is not symmetric when and . This variant can also be obtained by replacing by in the Sherman–Morrison–Woodbury formula.
Formulas for the change in a matrix inverse under low rank perturbations have a long history. They have been rediscovered on multiple occasions, sometimes appearing without comment within other formulas. Equation is given by Duncan (1944), which is the earliest appearance in print that I am aware of. For discussions of the history of these formulas see Henderson and Searle (1981) or Puntanen and Styan (2005).
This is a minimal set of references, which contain further useful references within.
- W. J. Duncan, LXXVIII. Some Devices for the Solution of Large Sets of Simultaneous Linear Equations, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 35, 660–670, 1944.
- H. V. Henderson and S. R. Searle, On Deriving the Inverse of a Sum of Matrices, SIAM Rev. 23(1), 53–60, 1981.
- Simo Puntanen and George Styan, Historical Introduction: Issai Schur and the Early Development of the Schur Complement, pages 1-16 in Fuzhen Zhang, ed., The Schur Complement and Its Applications, Springer-Verlag, New York, 2005.
Related Blog Posts
- What Is a Block Matrix? (2020)
This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.
4 thoughts on “What Is the Sherman–Morrison–Woodbury Formula?”
Right before the section ‘General Formula’, it seems that the expression is missing a parenthesis: A + UV* = A(1 – A)^-1 UV*
Thanks for this nice post!
Thanks – corrected.
I clearly mistaken the factorization of A. Thanks for the correction!
While using Sherman-Morrison-Woodbury for deriving results for exponential and other functions of perturbed matrices for some time, I finally stumbled across your Theorem 1.35 in “N. J. Higham, Functions of Matrices: Theory and Computation, SIAM, 2008.” This theorem is simply: beautiful! I love it! It is a wonderful generalization of Sherman-Morrison-Woodbury and renders my former tedious calculations short and easy ones while lifting the caveats on corner cases. Easy to code, too. Thanks!
So, if you are in need for calculating functions on low-rank perturbed matrices: take a look at Theorem 1.35 🙂