Singular Value Inequalities

Recall that the singular value decomposition (SVD) of a matrix A \in\mathbb{C}^{m\times n} is a factorization A = U\Sigma V^*, where U\in\mathbb{C}^{m\times m} and V\in\mathbb{C}^{n\times n} are unitary and \Sigma = \mathrm{diag}(\sigma_1,\dots, \sigma_p)\in\mathbb{R}^{m\times n}, with \sigma_1\ge \sigma_2\ge \cdots \ge \sigma_p \ge 0, where where p = \min(m,n). We sometimes write \sigma_i(A) to specify the matrix to which the singular value belongs.

A standard technique for obtaining singular value inequalities for A is to apply eigenvalue inequalities to the Hermitian positive semidefinite matrices A^*A or AA^*, whose eigenvalues are the squares of the singular values of A, or to the Hermitian matrix

\notag    \begin{bmatrix}      0   & A \\      A^* & 0     \end{bmatrix}, \qquad (1)

whose eigenvalues are plus and minus the singular values of A together with |m-n| zero eigenvalues if m\ne n.

We begin with a variational characterization of singular values.

Theorem 1. For A\in\mathbb{C}^{m\times n},

\notag \begin{aligned}   \sigma_k &= \min_{\dim(S)=n-k+1} \, \max_{0\ne x\in S} \frac{\|Ax\|_2}{\|x\|_2}\\            &= \max_{\dim(S)= k} \, \min_{0\ne x\in S} \frac{\|Ax\|_2}{\|x\|_2},                  \quad k=1\colon \min(m,n), \end{aligned}

where S\subseteq \mathbb{C}^n.

Proof. The result is obtained by applying the Courant–Fischer theorem (a variational characterization of eigenvalues) to A^*A. ~\square

As a special case of Theorem 1, we have

\notag    \sigma_1 = \displaystyle\max_{x \ne 0}\frac{ \|Ax\|_2 }{ \|x\|_2 },               \qquad (2)

and, for m\ge n,

\notag    \sigma_n = \displaystyle\min_{x \ne 0}\frac{ \|Ax\|_2 }{ \|x\|_2 }.               \qquad (3)

The expression in the theorem can be rewritten using \|x\|_2 = \max_{y\ne 0}|y^*x|/\|y\|_2 (the equality case in the Cauchy–Schwarz inequality). For example, (2) is equivalent to

\notag   \sigma_1 = \displaystyle\max_{0\ne x\in \mathbb{C}^n\atop 0 \ne y \in \mathbb{C}^m}         \displaystyle\frac{|y^*Ax|}{\|x\|_2\|y\|_2}.

Our first perturbation result bounds the change in a singular value.

Theorem 2. For A,B\in\mathbb{C}^{m\times n},

\notag   |\sigma_i(A) - \sigma_i(B)| \le \|A - B \|_2, \quad i = 1\colon \min(m,n).    \qquad (4)

Proof. The bound is obtained by applying the corresponding result for the Hermitian eigenvalue problem to (1). ~\square

The bound (4) says that the singular values of a matrix are well conditioned under additive perturbation. Now we consider multiplicative perturbations.

The next result is an analogue for singular values of Ostrowski’s theorem for eigenvalues.

Theorem 3. For A\in \mathbb{C}^{m\times n} and nonsingular X\in\mathbb{C}^{n\times n} and Y\in\mathbb{C}^{m\times m},

\notag     \sigma_i(Y^*AX) = \theta_i \sigma_i(A), \quad i = 1\colon \min(m,n),      \qquad (5)

where \sigma_n(X)\sigma_m(Y) \le \theta_i \le \sigma_1(X) \sigma_1(Y).

A corollary of this result is

\notag   |\sigma_i(A) - \sigma_i(Y^*AX)| \le \sigma_i(A) \epsilon, \quad i = 1\colon \min(m,n),    \qquad (6)

where \epsilon = \max(\|X^*X - I\|_2,\|Y^*Y - I\|_2). The bounds (5) and (6) are intuitively reasonable, because unitary transformations preserve singular values and the bounds quantify in different ways how close X and Y are to being unitary.

Next, we have an interlacing property.

Theorem 4. Let A\in\mathbb{C}^{m\times n}, A_k = A(:,1\colon k), and q = \min(m,k). Then

\notag  \sigma_{i+1}(A_{k+1}) \le \sigma_i(A_k) \le \sigma_i(A_{k+1}), \quad   i=1\colon q, \quad k = 1\colon n-1,

where we define \sigma_{q+1}(A_{k+1}) = 0 if m < k+1.

Proof. The result is obtained by applying the Cauchy interlace theorem to A^*A, noting that A_k^*A_k is the leading principal submatrix of order k of A^*A. ~\square

An analogous result holds with rows playing the role of columns (just apply Theorem 4 to A^*).

Theorem 4 encompasses two different cases, which we illustrate with i = q and k = n-1. The first case is m \ge n, so that q = n-1 and

\notag   \sigma_n(A) \le \sigma_{n-1}(A_{n-1}) \le \sigma_{n-1}(A).

The second case is m < n, so q = m and

\notag   0 \le \sigma_m(A_{n-1}) \le \sigma_m(A).

Therefore Theorem 3 shows that removing a column from A does not increase any singular value and that when m\ge n no singular value decreases below \sigma_n(A). However, when m < n the smallest singular value of A_{n-1} may be less than the smallest singular value of A.

Here is a numerical example. Note that transposing A does not change its singular values.

>> rng(1), A = rand(5,4); % Case 1.
>> B = A(:,1:end-1); sv_A = svd(A)', sv_B = svd(B)'
sv_A =
   1.7450e+00   6.4492e-01   5.5015e-01   3.2587e-01
sv_B =
   1.5500e+00   5.8472e-01   3.6128e-01
> A = A'; B = A(:,1:end-1); sv_B = svd(B)' % Case 2
sv_B =
   1.7098e+00   6.0996e-01   4.6017e-01   1.0369e-01

By applying Theorem 4 repeatedly we find that if we partition A = [A_{11}~A_{12}] then \sigma_i(A_{11}) \le \sigma_i(A) for all i for which the left-hand side is defined.

References