## What Is the Singular Value Decomposition?

A singular value decomposition (SVD) of a matrix $A\in\mathbb{R}^{m\times n}$ is a factorization $\notag A = U\Sigma V^T,$

where $U\in\mathbb{R}^{m\times m}$ and $V\in\mathbb{R}^{n\times n}$ are orthogonal, $\Sigma = \mathrm{diag}(\sigma_1,\dots, \sigma_p)\in\mathbb{R}^{m\times n}$, where $p = \min(m,n)$, and $\sigma_1\ge \sigma_2\ge \cdots \ge \sigma_p \ge 0$.

Partition $U =[ u_1,\dots,u_m]$ and $V = [v_1,\dots, v_n]$. The $\sigma_i$ are called the singular values of $A$ and the $u_i$ and $v_i$ are the left and right singular vectors. We have $Av_i = \sigma_i u_i$, $i = 1 \colon p$. The matrix $\Sigma$ is unique but $U$ and $V$ are not. The form of $\Sigma$ is $\notag \Sigma = \begin{bmatrix} \mathrm{diag}(\sigma_1,\dots,\sigma_n) \\ 0 \end{bmatrix} ~\mathrm{for}~ m \ge n, \quad \Sigma = \begin{bmatrix} \mathrm{diag}(\sigma_1,\dots,\sigma_m) & 0 \end{bmatrix} ~\mathrm{for}~ m \le n.$

Here is an example, in which the entries of $A$ have been specially chosen to give simple forms for the elements of the factors: $\notag A = \left[\begin{array}{rr} 0 & \frac{4}{3}\\[\smallskipamount] -1 & -\frac{5}{3}\\[\smallskipamount] -2 & -\frac{2}{3} \end{array}\right] = \underbrace{ \displaystyle\frac{1}{3} \left[\begin{array}{rrr} 1 & -2 & -2\\ -2 & 1 & -2\\ -2 & -2 & 1 \end{array}\right] }_U \mskip5mu \underbrace{ \left[\begin{array}{cc} 2\,\sqrt{2} & 0\\ 0 & \sqrt{2}\\ 0 & 0 \end{array}\right] }_{\Sigma} \mskip5mu \underbrace{ \displaystyle\frac{1}{\sqrt{2}} \left[\begin{array}{cc} 1 & 1\\ 1 & -1 \end{array}\right] }_{V^T}.$
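As a quick numerical check (a sketch, not part of the original example), NumPy's `svd` recovers the singular values $2\sqrt{2}$ and $\sqrt{2}$ of this matrix:

```python
import numpy as np

# The example matrix A, with entries chosen to give simple SVD factors.
A = np.array([[ 0.0,   4.0/3.0],
              [-1.0,  -5.0/3.0],
              [-2.0,  -2.0/3.0]])

U, s, Vt = np.linalg.svd(A)   # full SVD: A = U @ S @ Vt

# The singular values match the exact values 2*sqrt(2) and sqrt(2).
assert np.allclose(s, [2*np.sqrt(2), np.sqrt(2)])

# Reconstruct A from the computed factors (S is 3x2, as in the text).
S = np.zeros((3, 2))
S[:2, :2] = np.diag(s)
assert np.allclose(U @ S @ Vt, A)
```

Note that the computed $U$ and $V$ may differ from the factors displayed above by sign changes of corresponding columns, since the SVD is not unique.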

The power of the SVD is that it reveals a great deal of useful information about norms, rank, and subspaces of a matrix and it enables many problems to be reduced to a trivial form.

Since $U$ and $V$ are nonsingular, $\mathrm{rank}(A) = \mathrm{rank}(\Sigma) = r$, where $r \le p$ is the number of nonzero singular values. Since the $2$-norm and Frobenius norm are invariant under orthogonal transformations, $\|A\| = \|\Sigma\|$ for both norms, giving $\notag \|A\|_2 = \sigma_1, \quad \|A\|_F = \Bigl(\displaystyle\sum_{i=1}^r \sigma_i^2\Bigr)^{1/2},$

and hence $\|A\|_2 \le \|A\|_F \le r^{1/2} \|A\|_2$. The range space and null space of $A$ are given in terms of the columns of $U$ and $V$ by $\notag \begin{aligned} \mathrm{null}(A) &= \mathrm{span} \{ v_{r+1}, \dots,v_n \},\\ \mathrm{range}(A) &= \mathrm{span} \{u_1,u_2,\dots, u_r\}. \end{aligned}$
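These relations are easy to verify numerically. The following sketch (my illustration, assuming a rank-deficient test matrix built as a product of thin factors) extracts the rank, the norms, and a null-space basis from the SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
# A 5x4 matrix of rank 2: a product of 5x2 and 2x4 Gaussian factors.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))                    # numerical rank

assert r == np.linalg.matrix_rank(A)
assert np.isclose(np.linalg.norm(A, 2), s[0])            # ||A||_2 = sigma_1
assert np.isclose(np.linalg.norm(A, 'fro'),
                  np.sqrt(np.sum(s[:r]**2)))             # Frobenius norm
# The trailing right singular vectors span null(A).
assert np.allclose(A @ Vt[r:].T, 0, atol=1e-10)
```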

We can write the SVD as $\notag \qquad\qquad A = \begin{bmatrix} u_1, u_2 \dots, u_r \end{bmatrix} \mathrm{diag}(\sigma_1,\dots, \sigma_r) \begin{bmatrix} v_1^T\\ v_2^T\\ \vdots\\ v_r^T \end{bmatrix} = \displaystyle\sum_{i=1}^{r} \sigma_i u_i v_i^T, \qquad\qquad(*)$

which expresses $A$ as a sum of $r$ rank- $1$ matrices, the $i$th of which has $2$-norm $\sigma_i$. The famous Eckart–Young theorem (1936) says that $\notag \min_{\mathrm{rank}(B) = k} \|A-B\|_q = \begin{cases} \sigma_{k+1}, & q = 2, \\ \Bigl(\sum_{i=k+1}^r \sigma_i^2\Bigr)^{1/2}, & q = F, \end{cases}$

and that the minimum is attained at $\notag A_k = U D_k V^T, \quad D_k = \mathrm{diag}(\sigma_1, \dots, \sigma_k, 0, \dots, 0).$

In other words, truncating the sum $(*)$ after $k < r$ terms gives the best rank-$k$ approximation to $A$ in both the $2$-norm and the Frobenius norm. In particular, this result implies that when $A$ has full rank the distance from $A$ to the nearest rank-deficient matrix is $\sigma_r$.

## Relations with the Symmetric Eigenvalue Problem

The SVD is not directly related to the eigenvalues and eigenvectors of $A$. However, for $m\ge n$, $A = U \Sigma V^T$ implies $\notag A^T\!A = V \mathrm{diag}(\sigma_1^2,\dots,\sigma_n^2) V^T, \quad AA^T = U \mathrm{diag}(\sigma_1^2,\dots,\sigma_n^2,\underbrace{0,\dots,0}_{m-n}) U^T,$

so the singular values of $A$ are the square roots of the eigenvalues of the symmetric positive semidefinite matrices $A^T\!A$ and $AA^T$ (modulo $m-n$ zeros in the latter case), and the singular vectors are eigenvectors. Moreover, the eigenvalues of the $(m+n)\times (m+n)$ matrix $\notag C = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$

are plus and minus the singular values of $A$, together with $|m-n|$ additional zeros if $m \ne n$, and the eigenvectors of $C$ and the singular vectors of $A$ are also related.

Consequently, by applying results or algorithms for the eigensystem of a symmetric matrix to $A^T\!A$, $AA^T$, or $C$ one obtains results or algorithms for the singular value decomposition of $A$.
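The eigenvalue relation for $C$ is easily illustrated. This sketch (my own, for a small rectangular example) confirms that the spectrum of $C$ consists of $\pm\sigma_i$ plus $|m-n|$ zeros:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 3
A = rng.standard_normal((m, n))
s = np.linalg.svd(A, compute_uv=False)

# The symmetric (m+n) x (m+n) matrix C = [0 A; A^T 0].
C = np.block([[np.zeros((m, m)), A],
              [A.T, np.zeros((n, n))]])
evals = np.sort(np.linalg.eigvalsh(C))

# Eigenvalues of C: plus and minus the singular values, plus |m-n| zeros.
expected = np.sort(np.concatenate([s, -s, np.zeros(abs(m - n))]))
assert np.allclose(evals, expected)
```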

## Connections with Other Problems

The pseudoinverse of a matrix $A\in\mathbb{R}^{m\times n}$ of rank $r$ can be expressed in terms of the SVD as $\notag A^+ = V\mathrm{diag}(\sigma_1^{-1},\dots,\sigma_r^{-1},0,\dots,0)U^T.$

The least squares problem $\min_x \|b - Ax\|_2$, where $A\in\mathbb{R}^{m\times n}$ with $m \ge n$, is solved by $x = A^+b$, and when $A$ is rank-deficient this is the solution of minimum $2$-norm. For $m < n$ the system is underdetermined and $x = A^+b$ again gives the minimum $2$-norm solution.
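As a brief illustration (not from the post), NumPy's `pinv` computes $A^+$ via the SVD, and $A^+ b$ agrees with the least squares solution returned by `lstsq`:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))   # overdetermined, full rank
b = rng.standard_normal(6)

# Least squares solution x = A^+ b, with A^+ computed from the SVD.
x = np.linalg.pinv(A) @ b
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
```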

We can write $A = U\Sigma V^T = UV^T \cdot V \Sigma V^T \equiv PQ$, where $P$ is orthogonal and $Q$ is symmetric positive semidefinite. This decomposition $A = PQ$ is the polar decomposition and $Q = (A^T\!A)^{1/2}$ is unique. This connection between the SVD and the polar decomposition is useful both theoretically and computationally.
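The construction of the polar factors from the SVD is a one-liner in practice. A minimal sketch (my illustration, for a square matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

# From A = U Sigma V^T: P = U V^T is orthogonal and
# Q = V Sigma V^T is symmetric positive semidefinite, with A = P Q.
U, s, Vt = np.linalg.svd(A)
P = U @ Vt
Q = Vt.T @ np.diag(s) @ Vt

assert np.allclose(P @ Q, A)
assert np.allclose(P.T @ P, np.eye(4))           # P is orthogonal
assert np.allclose(Q, Q.T)                       # Q is symmetric
assert np.all(np.linalg.eigvalsh(Q) >= -1e-12)   # Q is positive semidefinite
```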

## Applications

The SVD is used in a very wide variety of applications—too many and varied to attempt to summarize here. We just mention two.

The SVD can be used to help identify to which letters vowels and consonants have been mapped in a substitution cipher (Moler and Morrison, 1983).

An inverse use of the SVD is to construct test matrices by forming a diagonal matrix of singular values from some distribution then pre- and post-multiplying by random orthogonal matrices. The result is matrices with known singular values and 2-norm condition number that are nevertheless random. Such “randsvd” matrices are widely used to test algorithms in numerical linear algebra.
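The construction can be sketched as follows; here `randsvd` is a hypothetical helper of my own (not a library routine), and the random orthogonal factors are obtained from QR factorizations of Gaussian matrices:

```python
import numpy as np

def randsvd(m, n, sigma, rng):
    """Random m x n matrix with prescribed singular values sigma.

    Forms diag(sigma) and pre- and post-multiplies by random
    orthogonal matrices (Q-factors of Gaussian matrices).
    """
    U, _ = np.linalg.qr(rng.standard_normal((m, m)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    p = min(m, n)
    S = np.zeros((m, n))
    S[:p, :p] = np.diag(sigma)
    return U @ S @ V.T

rng = np.random.default_rng(5)
sigma = np.array([10.0, 5.0, 1.0, 0.1])   # prescribed singular values
A = randsvd(6, 4, sigma, rng)

# The constructed matrix has exactly the prescribed singular values,
# hence known 2-norm condition number sigma_1/sigma_4 = 100.
assert np.allclose(np.linalg.svd(A, compute_uv=False), sigma)
```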

## History and Computation

The SVD was introduced independently by Beltrami in 1873 and Jordan in 1874. Golub popularized the SVD as an essential computational tool and developed the first reliable algorithms for computing it. The Golub–Reinsch algorithm, dating from the late 1960s and based on bidiagonalization and the QR algorithm, is the standard way to compute the SVD. Various alternatives are available; see the references.

## References

This is a minimal set of references, which contain further useful references within.
