A matrix norm is a function $\|\cdot\| \colon \mathbb{C}^{m\times n} \to \mathbb{R}$ satisfying

- $\|A\| \ge 0$ with equality if and only if $A = 0$ (nonnegativity),
- $\|\alpha A\| = |\alpha|\,\|A\|$ for all $\alpha\in\mathbb{C}$ and $A\in\mathbb{C}^{m\times n}$ (homogeneity),
- $\|A + B\| \le \|A\| + \|B\|$ for all $A, B\in\mathbb{C}^{m\times n}$ (the triangle inequality).

These are analogues of the defining properties of a vector norm.
An important class of matrix norms is the subordinate matrix norms. Given a vector norm $\|\cdot\|$ on $\mathbb{C}^n$, the corresponding subordinate matrix norm on $\mathbb{C}^{m\times n}$ is defined by

$$\|A\| = \max_{x\ne0} \frac{\|Ax\|}{\|x\|},$$

or, equivalently,

$$\|A\| = \max_{\|x\|=1} \|Ax\|.$$

For the 1-, 2-, and $\infty$-vector norms it can be shown that

$$\|A\|_1 = \max_j \sum_{i=1}^m |a_{ij}| \quad \text{(max column sum)},$$

$$\|A\|_\infty = \max_i \sum_{j=1}^n |a_{ij}| \quad \text{(max row sum)},$$

$$\|A\|_2 = \sigma_{\max}(A),$$

where $\sigma_{\max}(A)$ denotes the largest singular value of $A$. To remember the formulas for the 1- and $\infty$-norms, note that $1$ is a vertical symbol (for columns) and $\infty$ is a horizontal symbol (for rows). For the general Hölder $p$-norm no explicit formula is known for $\|A\|_p$ for $p \ne 1, 2, \infty$.
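These formulas are easy to check numerically. Here is a minimal MATLAB sketch (the example matrix is arbitrary):

```matlab
% Check the 1-, 2-, and infinity-norm formulas on a small example.
A = [1 -2 3; -4 5 -6];
max_col_sum = max(sum(abs(A),1));   % 1-norm: largest column sum
max_row_sum = max(sum(abs(A),2));   % infinity-norm: largest row sum
sigma_max   = max(svd(A));          % 2-norm: largest singular value
[max_col_sum - norm(A,1), sigma_max - norm(A,2), max_row_sum - norm(A,inf)]
% ans = 0 0 0 (up to roundoff)
```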
An immediate implication of the definition of subordinate matrix norm is that $\|AB\| \le \|A\|\,\|B\|$ whenever the product is defined. A norm satisfying this condition is called consistent or submultiplicative.
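A quick numerical illustration of consistency for the 1-norm, with arbitrary random matrices:

```matlab
% Consistency: norm(A*B) <= norm(A)*norm(B) for a subordinate norm.
rng(1); A = randn(4); B = randn(4);
norm(A*B,1) <= norm(A,1)*norm(B,1)   % logical 1
```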
Another commonly used norm is the Frobenius norm,

$$\|A\|_F = \biggl(\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\biggr)^{1/2} = \bigl(\mathrm{trace}(A^*A)\bigr)^{1/2}.$$

The Frobenius norm is not subordinate to any vector norm (since $\|I_n\|_F = n^{1/2}$, whereas $\|I_n\| = 1$ for any subordinate norm), but it is consistent.
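Both claims are easily illustrated in MATLAB:

```matlab
% Frobenius norm via the trace formula, and the norm of the identity.
A = magic(4);
abs(norm(A,'fro') - sqrt(trace(A'*A)))  % zero up to roundoff
[norm(eye(5),'fro'), norm(eye(5),1)]    % sqrt(5) versus 1: not subordinate
```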
The 2-norm and the Frobenius norm are unitarily invariant: they satisfy $\|UAV\| = \|A\|$ for any unitary matrices $U$ and $V$. For the Frobenius norm the invariance follows easily from the trace formula.
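Here is a minimal numerical check, with random orthogonal $U$ and $V$ obtained from QR factorizations:

```matlab
% Unitary invariance of the 2-norm and the Frobenius norm.
rng(0); A = randn(4,3);
[U,~] = qr(randn(4)); [V,~] = qr(randn(3));
[norm(U*A*V,2) - norm(A,2), norm(U*A*V,'fro') - norm(A,'fro')]
% ans is of order machine epsilon
```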
As for vector norms, all matrix norms are equivalent in that any two norms differ from each other by at most a constant. This table shows the optimal constants $c_{\alpha\beta}$ such that $\|A\|_\alpha \le c_{\alpha\beta} \|A\|_\beta$ for $A\in\mathbb{C}^{m\times n}$ and the norms mentioned above. Each inequality in the table is attained for some $A$.

| $\alpha\backslash\beta$ | $1$ | $2$ | $F$ | $\infty$ |
| --- | --- | --- | --- | --- |
| $1$ | $1$ | $m^{1/2}$ | $m^{1/2}$ | $m$ |
| $2$ | $n^{1/2}$ | $1$ | $1$ | $m^{1/2}$ |
| $F$ | $n^{1/2}$ | $\min(m,n)^{1/2}$ | $1$ | $m^{1/2}$ |
| $\infty$ | $n$ | $n^{1/2}$ | $n^{1/2}$ | $1$ |
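As a sketch of how two of the entries can be checked (the column of ones attains the constant in $\|A\|_1 \le m^{1/2}\|A\|_2$):

```matlab
% ||A||_1 <= sqrt(m)*||A||_2, with equality for a column of ones.
m = 5; A = ones(m,1);
[norm(A,1), sqrt(m)*norm(A,2)]                  % both equal 5
% ||A||_F <= sqrt(min(m,n))*||A||_2 for any matrix.
B = randn(4,6);
norm(B,'fro') <= sqrt(min(size(B)))*norm(B,2)   % logical 1
```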
For any $A\in\mathbb{C}^{n\times n}$ and any consistent matrix norm,

$$\rho(A) \le \|A\|, \qquad (1)$$

where $\rho$ is the spectral radius (the largest absolute value of any eigenvalue). To prove this inequality, let $\lambda$ be an eigenvalue of $A$ and $x$ the corresponding eigenvector (necessarily nonzero), and form the matrix $X = [x,x,\dots,x] \in \mathbb{C}^{n\times n}$. Then $AX = \lambda X$, so $|\lambda|\,\|X\| \le \|A\|\,\|X\|$, giving $|\lambda| \le \|A\|$ since $X\ne0$. For a subordinate norm it suffices to take norms in the equation $Ax = \lambda x$.
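A numerical illustration of (1) with several consistent norms:

```matlab
% The spectral radius is a lower bound for every consistent norm.
A = gallery('frank',5);
rho = max(abs(eig(A)));
[rho, norm(A,1), norm(A,2), norm(A,'fro'), norm(A,inf)]
% rho is the smallest of the five values
```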
A useful relation is

$$\|A\|_2 \le \bigl(\|A\|_1 \|A\|_\infty\bigr)^{1/2}, \qquad (2)$$

which follows from $\|A\|_2^2 = \|A^*A\|_2 = \rho(A^*A) \le \|A^*A\|_\infty \le \|A^*\|_\infty \|A\|_\infty = \|A\|_1\|A\|_\infty$, where the first two equalities are obtained from the singular value decomposition and we have used (1).
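And a one-line check of (2):

```matlab
% ||A||_2 <= sqrt(||A||_1*||A||_inf) for an arbitrary matrix.
rng(2); A = randn(5);
norm(A,2) <= sqrt(norm(A,1)*norm(A,inf))   % logical 1
```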
Mixed Subordinate Matrix Norms
A more general subordinate matrix norm can be defined by taking different vector norms in the numerator and denominator:

$$\|A\|_{\alpha,\beta} = \max_{x\ne0} \frac{\|Ax\|_\beta}{\|x\|_\alpha}.$$

Some authors denote this norm by $\|A\|_{\alpha\to\beta}$.
A useful characterization of $\|A\|_{\alpha,\beta}$ is given in the next result. Recall that $\|\cdot\|_{\alpha^D}$ denotes the dual of the vector norm $\|\cdot\|_\alpha$.

Theorem 1. For $A\in\mathbb{C}^{m\times n}$,

$$\|A\|_{\alpha,\beta} = \max_{x\ne0,\,y\ne0} \frac{\mathrm{Re}\,y^*Ax}{\|y\|_{\beta^D}\|x\|_\alpha}.$$

Proof. We have

$$\|A\|_{\alpha,\beta} = \max_{x\ne0} \frac{\|Ax\|_\beta}{\|x\|_\alpha} = \max_{x\ne0}\max_{y\ne0} \frac{\mathrm{Re}\,y^*Ax}{\|y\|_{\beta^D}\|x\|_\alpha},$$

where the second equality follows from the definition of dual vector norm and the fact that the dual of the dual norm is the original norm.
We can now obtain a connection between the norms of $A$ and $A^*$. Here, $\|A\|_{\alpha^D\!,\beta^D}$ denotes $\max_{x\ne0} \|Ax\|_{\beta^D}/\|x\|_{\alpha^D}$.

Theorem 2. If $A\in\mathbb{C}^{m\times n}$ then $\|A\|_{\alpha,\beta} = \|A^*\|_{\beta^D\!,\alpha^D}$.
Proof. Using Theorem 1, we have

$$\|A\|_{\alpha,\beta} = \max_{x\ne0,\,y\ne0} \frac{\mathrm{Re}\,y^*Ax}{\|y\|_{\beta^D}\|x\|_\alpha} = \max_{x\ne0,\,y\ne0} \frac{\mathrm{Re}\,x^*A^*y}{\|y\|_{\beta^D}\|x\|_{(\alpha^D)^D}} = \|A^*\|_{\beta^D\!,\alpha^D}.$$

If we take the $\alpha$- and $\beta$-norms to be the same $p$-norm then we have $\|A\|_p = \|A^*\|_q$, where $p^{-1} + q^{-1} = 1$ (giving, in particular, $\|A\|_1 = \|A^*\|_\infty$ and $\|A\|_2 = \|A^*\|_2$, which are easily obtained directly).
Now we give explicit formulas for the $(\alpha,\beta)$ norm when $\alpha$ or $\beta$ is $1$ or $\infty$ and the other is a general norm.

Theorem 3. For $A\in\mathbb{C}^{m\times n}$,

$$\|A\|_{1,\beta} = \max_j \|A(:,j)\|_\beta, \qquad (3)$$

$$\|A\|_{\alpha,\infty} = \max_i \|A(i,:)^*\|_{\alpha^D}. \qquad (4)$$

For $A\in\mathbb{R}^{m\times n}$,

$$\|A\|_{\infty,\beta} = \max_{v\in V} \|Av\|_\beta, \qquad (5)$$

where $V = \{\, v\in\mathbb{R}^n : v_i = \pm1 \,\}$, and if $A$ is symmetric positive semidefinite then

$$\|A\|_{\infty,1} = \max_{v\in V} v^TAv.$$
Proof. For (3),

$$\|Ax\|_\beta = \biggl\|\sum_{j=1}^n x_j A(:,j)\biggr\|_\beta \le \max_j \|A(:,j)\|_\beta \sum_{j=1}^n |x_j| = \max_j \|A(:,j)\|_\beta \,\|x\|_1,$$

with equality for $x = e_k$, the $k$th unit vector, where the maximum is attained for $j = k$. For (4), using the Hölder inequality,

$$\|Ax\|_\infty = \max_i |A(i,:)x| \le \max_i \|A(i,:)^*\|_{\alpha^D} \|x\|_\alpha.$$

Equality is attained for an $x$ that gives equality in the Hölder inequality involving the $k$th row of $A$, where the maximum is attained for $i = k$.
Turning to (5), we have $\|A\|_{\infty,\beta} = \max_{\|x\|_\infty\le1} \|Ax\|_\beta$. The unit cube $C = \{\, x : -e \le x \le e \,\}$, where $e = [1,1,\dots,1]^T$, is a convex polyhedron, so any point within it is a convex combination of the vertices, which are the elements of $V$. Hence $x\in C$ implies $x = \sum_i \lambda_i v_i$ with $\lambda_i \ge 0$, $\sum_i \lambda_i = 1$, and $v_i\in V$, and then

$$\|Ax\|_\beta = \biggl\|\sum_i \lambda_i Av_i\biggr\|_\beta \le \sum_i \lambda_i \|Av_i\|_\beta \le \max_{v\in V} \|Av\|_\beta.$$

Hence $\|A\|_{\infty,\beta} \le \max_{v\in V}\|Av\|_\beta$, but trivially $\|A\|_{\infty,\beta} \ge \max_{v\in V}\|Av\|_\beta$ and (5) follows.
Finally, if $A$ is symmetric positive semidefinite let $v, w\in V$. Then, using a Cholesky factorization $A = R^TR$ (which exists even if $A$ is singular) and the Cauchy–Schwarz inequality,

$$w^TAv = (Rw)^T(Rv) \le \|Rw\|_2 \|Rv\|_2 = (w^TAw)^{1/2}(v^TAv)^{1/2} \le \max_{z\in V} z^TAz,$$

and since $\|Av\|_1 = \max_{w\in V} w^TAv$ it follows that $\max_{v\in V}\|Av\|_1 \le \max_{z\in V} z^TAz$. Conversely, for $z\in V$ we have

$$z^TAz \le \|z\|_\infty \|Az\|_1 = \|Az\|_1,$$

so $\max_{z\in V} z^TAz \le \max_{v\in V}\|Av\|_1$. Hence $\|A\|_{\infty,1} = \max_{z\in V} z^TAz$, using (5).
As special cases of (3) and (4) we have

$$\|A\|_{1,2} = \max_j \|A(:,j)\|_2, \qquad \|A\|_{2,\infty} = \max_i \|A(i,:)^*\|_2,$$

and

$$\|A\|_{1,\infty} = \max_{i,j} |a_{ij}|, \qquad (6)$$

which is the max norm. We also obtain by using Theorem 2 and (5), for $A\in\mathbb{R}^{m\times n}$,

$$\|A\|_{\alpha,1} = \|A^T\|_{\infty,\alpha^D} = \max_{v\in V} \|A^Tv\|_{\alpha^D},$$

where now $V$ comprises the vectors in $\mathbb{R}^m$ with elements $\pm1$.
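The formulas in Theorem 3 translate directly into a few lines of MATLAB. The following sketch is mine (it uses vecnorm, available from R2017b; note the $2^n$ cost of evaluating (5) by enumeration, consistent with the NP-hardness results mentioned in the Notes below):

```matlab
% Mixed subordinate norms of a small matrix via (3)-(6).
A = gallery('frank',4); n = size(A,2);
norm_1_2   = max(vecnorm(A,2,1));  % (1,2)-norm: largest column 2-norm, by (3)
norm_2_inf = max(vecnorm(A,2,2));  % (2,inf)-norm: largest row 2-norm, by (4)
max_norm   = max(abs(A(:)));       % (1,inf)-norm: the max norm (6)
norm_inf_1 = 0;                    % (inf,1)-norm by enumeration, from (5)
for k = 0:2^n-1
    v = 1 - 2*(dec2bin(k,n).' - '0');   % vertex of the unit cube: v in {-1,1}^n
    norm_inf_1 = max(norm_inf_1, norm(A*v,1));
end
[norm_1_2, norm_2_inf, max_norm, norm_inf_1]
```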
The $(2,\infty)$-norm has recently found use in statistics (Cape, Tang, and Priebe, 2019), the motivation being that because it satisfies

$$\|A\|_{2,\infty} \le \|A\|_2,$$

the $(2,\infty)$-norm can be much smaller than the 2-norm when $m \gg n$ and so can be a better norm to use in bounds. The $(\infty,2)$- and $(2,\infty)$-norms are used by Rebrova and Vershynin (2018) in bounding the 2-norm of a random matrix after zeroing a submatrix. They note that the 2-norm of a random $n\times n$ matrix involves maximizing over infinitely many random variables, while the $(\infty,2)$-norm and $(2,\infty)$-norm involve only $2^n$ and $n$ random variables, respectively.
The $(\alpha,\beta)$ norm is not consistent, but for any vector norm $\|\cdot\|_\gamma$, we have

$$\|AB\|_{\alpha,\beta} \le \|A\|_{\gamma,\beta}\, \|B\|_{\alpha,\gamma}.$$
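For instance, taking $\alpha = 1$, $\beta = \infty$, and $\gamma = 2$ bounds the max norm of a product by a product of row and column 2-norms, which is easily checked:

```matlab
% ||A*B||_{1,inf} <= ||A||_{2,inf}*||B||_{1,2}.
rng(3); A = randn(3,4); B = randn(4,5);
max(max(abs(A*B))) <= max(vecnorm(A,2,2))*max(vecnorm(B,2,1))  % logical 1
```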
Matrices With Constant p-Norms
Let $A$ be a nonnegative square matrix whose row and column sums are all equal to $\mu$. This class of matrices includes magic squares and doubly stochastic matrices. We have $\|A\|_1 = \|A\|_\infty = \mu$, so $\|A\|_2 \le \mu$ by (2). But $Ae = \mu e$ for the vector $e$ of $1$s, so $\mu$ is an eigenvalue of $A$ and hence $\|A\|_2 \ge \mu$ by (1). Hence $\|A\|_2 = \mu$. In fact, $\|A\|_p = \mu$ for all $1 \le p \le \infty$ (see Higham (2002, Prob. 6.4) and Stoer and Witzgall (1962)).
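A magic square gives a neat illustration with built-in MATLAB functions:

```matlab
% For magic(5) all row and column sums equal mu = 65, and ||A||_p = mu.
A = magic(5);
mu = unique([sum(A,1), sum(A,2).'])   % 65
[norm(A,1), norm(A,2), norm(A,inf)]   % all equal 65
```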
Computing Norms
The problem of computing or estimating $\|A\|_{\alpha,\beta}$ may be difficult for two reasons: we may not have an explicit formula for the norm, or $A$ may be given implicitly, as $B^{-1}$, $B^{-1}C$, or $e^B$, for example, and we may wish to compute the norm without forming $A$. Both difficulties are overcome by a generalization of the power method proposed by Tao in 1975 for arbitrary norms and independently by Boyd in 1984 for $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ both Hölder $p$-norms (see Higham, 2002, Sec. 15.2). The power method accesses $A$ only through matrix–vector products with $A$ and $A^*$, so $A$ does not need to be explicitly available.
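To make the idea concrete, here is a bare-bones MATLAB sketch of the power method for the $p$-norm with $1 < p < \infty$ and real $A$; the function name pnormest_sketch and the crude convergence test are my own simplifications, and pnorm in the Matrix Computation Toolbox is a more careful implementation:

```matlab
function gamma = pnormest_sketch(A,p)
%PNORMEST_SKETCH Power method estimate of norm(A,p) for real A, 1 < p < inf.
%   Returns a lower bound gamma <= norm(A,p). Only matrix-vector products
%   with A and A' are required, so A need not be known explicitly.
q = 1/(1 - 1/p);                        % dual exponent: 1/p + 1/q = 1
dualvec = @(y,r) sign(y).*(abs(y)/norm(y,r)).^(r-1);  % unit vector in dual norm
x = ones(size(A,2),1); x = x/norm(x,p); % arbitrary starting vector
gamma = 0;
for iter = 1:50
    y = A*x;
    gamma_old = gamma; gamma = norm(y,p);          % current norm estimate
    z = A'*dualvec(y,p);
    if norm(z,q) <= z'*x, break, end               % stationarity test
    if gamma <= gamma_old*(1 + 1e-12), break, end  % negligible progress
    x = dualvec(z,q);                              % next iterate, unit p-norm
end
end
```

For example, pnormest_sketch(gallery('frank',4),pi) produces a lower bound for the $\pi$-norm estimated in the session below.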
Let us focus on the case where $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are the same $p$-norm. The power method is implemented in the Matrix Computation Toolbox as `pnorm`, and we use it here to estimate the $\pi$-norm and the $99$-norm, which we compare with the 1-, 2-, and $\infty$-norms.
```
>> A = gallery('frank',4)
A =
     4     3     2     1
     3     3     2     1
     0     2     2     1
     0     0     1     1
>> [norm(A,1) norm(A,2) pnorm(A,pi), pnorm(A,99), norm(A,inf)]
ans =
   8.0000e+00   7.6237e+00   8.0714e+00   9.8716e+00   1.0000e+01
```
The plot in the top half of the following figure shows the estimated $p$-norm for the matrix `gallery('binomial',8)` for a range of values of $p$. This shape of curve is typical because, as the plot in the lower half indicates, $\|A\|_p$ is a convex function of $1/p$ for $1 \le p \le \infty$ (see Higham, 2002, Sec. 6.3).
The power method for the $(1,\infty)$ norm, which is the max norm in (6), is investigated by Higham and Relton (2016) and extended to estimate the $t$ largest values $|a_{ij}|$ or $a_{ij}$.
A Norm Related to Grothendieck’s Inequality
A norm on $\mathbb{C}^{m\times n}$ can be defined by

$$\|A\|_G = \max\Bigl\{\, \Bigl|\sum_{i=1}^m\sum_{j=1}^n a_{ij}\, y_i^*x_j\Bigr| : \|x_j\|_2 \le 1,\ \|y_i\|_2 \le 1 \,\Bigr\},$$

where the $x_j$ and $y_i$ are vectors of any dimension. Note that

$$\|A\|_{\infty,1} = \max\Bigl\{\, \Bigl|\sum_{i=1}^m\sum_{j=1}^n a_{ij}\, \bar{y}_i x_j\Bigr| : |x_j| \le 1,\ |y_i| \le 1 \,\Bigr\},$$

which is the same expression with the $x_j$ and $y_i$ restricted to scalars. A famous inequality of Grothendieck states that

$$\|A\|_G \le K_G\, \|A\|_{\infty,1}$$

for all $A$, for some constant $K_G$ independent of $m$ and $n$. Much work has been done to estimate the smallest possible constant $K_G$, which is known to be in the interval $(1.338, 1.405)$, or $(1.676, 1.783)$ for the analogous inequality for real data. See Friedland, Lim, and Zhang (2018) and the references therein.
Notes
The formula (5) with $\beta = 1$ is due to Rohn (2000), who shows that evaluating it is NP-hard. For general $\beta$, the formula is given by Lewis (2010).
Computing mixed subordinate matrix norms based on $p$-norms is generally NP-hard (Hendrickx and Olshevsky, 2010).
References
This is a minimal set of references, which contain further useful references within.
- Joshua Cape, Minh Tang, and Carey Priebe, The Two-To-Infinity Norm and Singular Subspace Geometry with Applications to High-Dimensional Statistics, Ann. Statist. 47(5), 2405–2439, 2019.
- Shmuel Friedland, Lek-Heng Lim, and Jinjie Zhang, An Elementary and Unified Proof of Grothendieck’s Inequality, L’Enseignement Mathématique 64(3), 327–351, 2018.
- Julien Hendrickx and Alex Olshevsky, Matrix $p$-Norms are NP-Hard to Approximate if $p \ne 1, 2, \infty$, SIAM J. Matrix Anal. Appl. 31(5), 2802–2812, 2010.
- Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002.
- Nicholas J. Higham and Samuel Relton, Estimating the Largest Elements of a Matrix, SIAM J. Sci. Comput. 38(5), C584–C601, 2016.
- Andrew D. Lewis, A Top Nine List: Most Popular Induced Matrix Norms, Manuscript, 2010.
- E. Rebrova and R. Vershynin, Norms of Random Matrices: Local and Global Problems, Adv. Math. 324, 40–83, 2018.
- Josef Stoer and C. Witzgall, Transformations by Diagonal Matrices in a Normed Space, Numer. Math. 4(1), 158–171, 1962.
Related Blog Posts
- What Is a Unitarily Invariant Norm? (2021)
- What Is a Vector Norm? (2021)
This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.
Q: How could we compute the mixed $L_1$/$L_\infty$ induced matrix norm?
A: By (3) it is $\max_{i,j} |a_{ij}|$.