What Is Fast Matrix Multiplication?

The definition of matrix multiplication says that for $n\times n$ matrices $A$ and $B$ , the product $C = AB$ is given by $c_{ij} = \sum_{k=1}^n a_{ik}b_{kj}$ . Each element of $C$ is an inner product of a row of $A$ and a column of $B$ , so if this formula is used then the cost of forming $C$ is $n^2(2n-1)$ additions and multiplications, that is, $O(n^3)$ operations. For over a century after the development of matrix algebra in the 1850s by Cayley, Sylvester and others, all methods for matrix multiplication were based on this formula and required $O(n^3)$ operations.

In 1969 Volker Strassen showed that when $n=2$ the product can be computed from the formulas

$\notag \begin{gathered} p_1 =(a_{11}+a_{22})(b_{11}+b_{22}), \\ p_2=(a_{21}+a_{22})b_{11}, \qquad p_3=a_{11}(b_{12}-b_{22}), \\ p_4=a_{22}(b_{21}-b_{11}), \qquad p_5=(a_{11}+a_{12})b_{22}, \\ p_6=(a_{21}-a_{11})(b_{11}+b_{12}), \qquad p_7=(a_{12}-a_{22})(b_{21}+b_{22}), \\ \noalign{\smallskip} C = \begin{bmatrix} p_1+p_4-p_5+p_7 & p_3+p_5 \\ p_2+p_4 & p_1+p_3-p_2+p_6 \end{bmatrix}. \end{gathered}$

The evaluation requires $7$ multiplications and $18$ additions instead of $8$ multiplications and $4$ additions for the usual formulas.

At first sight, Strassen’s formulas may appear simply to be a curiosity. However, the formulas do not rely on commutativity so are valid when the $a_{ij}$ and $b_{ij}$ are matrices, in which case for large dimensions the saving of one multiplication greatly outweighs the extra $14$ additions. Assuming $n$ is a power of $2$ , we can partition $A$ and $B$ into four blocks of size $n/2$ , apply Strassen’s formulas for the multiplication, and then apply the same formulas recursively on the half-sized matrix products.

Let us examine the number of multiplications for the recursive Strassen algorithm. Denote by $M(k)$ the number of scalar multiplications required to multiply two $2^k \times 2^k$ matrices. We have $M(k) = 7M(k-1)$ , so

$M(k) = 7M(k-1) = 7^2M(k-2) = \cdots = 7^k M(0) = 7^k.$

But $7^k = 2^{\log_2{7^k}} = (2^k)^{\log_2 7} = n^{\log_2 7} = n^{2.807\dots}$ . The number of additions can be shown to be of the same order of magnitude, so the algorithm requires $O(n^{\log_27})=O(n^{2.807\dots})$ operations.

Strassen’s work sparked interest in finding matrix multiplication algorithms of even lower complexity. Since there are $O(n^2)$ elements of data, which must each participate in at least one operation, the exponent of $n$ in the operation count must be at least $2$ .

The current record upper bound on the exponent is $2.37286$ , proved by Alman and Vassilevska Williams (2021) which improved on the previous record of $2.37287$ , proved by Le Gall (2014) The following figure plots the best upper bound for the exponent for matrix multiplication over time.

In the methods that achieve exponents lower than 2.775, various intricate techniques are used, based on representing matrix multiplication in terms of bilinear or trilinear forms and their representation as tensors having low rank. Laderman, Pan, and Sha (1993) explain that for these methods “very large overhead constants are hidden in the ` $O$ ‘ notation”, and that the methods “improve on Strassen’s (and even the classical) algorithm only for immense numbers $N$ .”

Strassen’s method, when carefully implemented, can be faster than conventional matrix multiplication for reasonable dimensions. In practice, one does not recur down to $1\times 1$ matrices, but rather uses conventional multiplication once $n_0\times n_0$ matrices are reached, where the parameter $n_0$ is tuned for the best performance.

Strassen’s method has the drawback that it satisfies a weaker form of rounding error bound that conventional multiplication. For conventional multiplication $C = AB$ of $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times p}$ we have the componentwise bound

$\notag |C - \widehat{C}| \le \gamma^{}_n |A|\, |B|, \qquad(1)$

where $\gamma^{}_n = nu/(1-nu)$ and $u$ is the unit roundoff. For Strassen’s method we have only a normwise error bound. The following result uses the norm $\|A\| = \max_{i,j} |a_{ij}|$ , which is not a consistent norm.

Theorem 1 (Brent). Let $A,B \in \mathbb{R}^{n\times n}$ , where $n=2^k$ . Suppose that $C=AB$ is computed by Strassen’s method and that $n_0=2^r$ is the threshold at which conventional multiplication is used. The computed product $\widehat{C}$ satisfies

$\notag \|C - \widehat{C}\| \le \left[ \Bigl( \displaystyle\frac{n}{n_0} \Bigr)^{\log_2{12}} (n_0^2+5n_0) - 5n \right] u \|A\|\, \|B\| + O(u^2). \qquad(2)$

With full recursion ( $n_0=1$ ) the constant in (2) is $6 n^{\log_2 12}-5n \approx 6 n^{3.585}-5n$ , whereas with just one level of recursion ( $n_0 = n/2$ ) it is $3n^2+25n$ . These compare with $n^2u + O(u^2)$ for conventional multiplication (obtained by taking norms in (1)). So the constant for Strassen’s method grows at a faster rate than that for conventional multiplication no matter what the choice of $n_0$ .

The fact that Strassen’s method does not satisfy a componentwise error bound is a significant weakness of the method. Indeed Strassen’s method cannot even accurately multiply by the identity matrix. The product

$\notag C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \epsilon \\ \epsilon & \epsilon^2 \end{bmatrix}, \quad 0 < \epsilon \ll 1$

is evaluated exactly in floating-point arithmetic by conventional multiplication, but Strassen’s method computes

$c_{22} = 2(1+\epsilon^2) + (\epsilon-\epsilon^2) - 1 - (1+\epsilon).$

Because $c_{22}$ involves subterms of order unity, the error $c_{22} - \widehat c_{22}$ will be of order $u$ . Thus the relative error $|c_{22} - \widehat c_{22}| / |c_{22}| = O(u/\epsilon^2) \gg O(u)$ ,

Another weakness of Strassen’s method is that while the scaling $AB \to (AD) (D^{-1}B)$ , where $D$ is diagonal, leaves (1) unchanged, it can alter (2) by an arbitrary amount. Dumitrescu (1998) suggests computing $D_1(D_1^{-1}A\cdot B D_2^{-1})D_2$ , where the diagonal matrices $D_1$ and $D_2$ are chosen to equilibrate the rows of $A$ and the columns of $B$ in the $\infty$ -norm; he shows that this scaling can improve the accuracy of the result. Further investigations along these lines are made by Ballard et al. (2016).

Should one use Strassen’s method in practice, assuming that an implementation is available that is faster than conventional multiplication? Not if one needs a componentwise error bound, which ensures accurate products of matrices with nonnegative entries and ensures that the column scaling of $A$ and row scaling of $B$ has no effect on the error. But if a normwise error bound with a faster growing constant than for conventional multiplication is acceptable then the method is worth considering.

Notes

For recent work on high-performance implementation of Strassen’s method see Huang et al. (2016, 2020).

Theorem 1 is from an unpublished technical report of Brent (1970). A proof can be found in Higham (2002, §23.2.2).

For more on fast matrix multiplication see Bini (2014) and Higham (2002, Chapter 23).

References

This is a minimal set of references, which contain further useful references within.

Josh Alman and Virginia Vassilevska Williams. A refined laser method and faster matrix multiplication. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), Society for Industrial and Applied Mathematics, January 2021, pages 522–539.
Grey Ballard, Austin R. Benson, Alex Druinsky, Benjamin Lipshitz, and Oded Schwartz. Improving the numerical stability of fast matrix multiplication. SIAM J. Matrix Anal. Appl, 37(4):1382–1418, 2016.
Benson, Alex Druinsky, Benjamin Lipshitz, and Oded Schwartz. Improving the numerical stability of fast matrix multiplication. SIAM J. Matrix Anal. Appl., 37(4):1382–1418, 2016.
Dario A. Bini. Fast matrix multiplication. In Handbook of Linear Algebra, Leslie Hogben, editor, second edition, Chapman and Hall/CRC, Boca Raton, FL, USA, 2014, pages 61.1–61.17.
Bogdan Dumitrescu. Improving and estimating the accuracy of Strassen’s algorithm. Numer. Math., 79:485–499, 1998.
Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002.
Jianyu Huang, Tyler M. Smith, Greg M. Henry, and Robert A. van de Geijn. Strassen’s algorithm reloaded. In SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE, November 2016.
Jianyu Huang, Chenhan D. Yu, and Robert A. van de Geijn. Strassen’s algorithm reloaded on GPUs, ACM Trans. Math. Software, 46(1):1:1–1:22, 2020.
Julian Laderman, Victor Pan, and Xuan-He Sha. On practical algorithms for accelerated matrix multiplication. Linear Algebra Appl., 162–164:557–588, 1992.
François Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, 2014, pages 296–303.

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is the Pascal Matrix?

The Pascal matrix $P_n\in\mathbb{R}^{n\times n}$ is the symmetric matrix defined by

$p_{ij} = \displaystyle{i+j-2 \choose j-1} = \displaystyle\frac{ (i+j-2)! }{ (i-1)! (j-1)! }.$

It contains the rows of Pascal’s triangle along the anti-diagonals. For example:

$\notag P_5 = \left[\begin{array}{ccccc} 1 & 1 & 1 & 1 & 1\\ 1 & 2 & 3 & 4 & 5\\ 1 & 3 & 6 & 10 & 15\\ 1 & 4 & 10 & 20 & 35\\ 1 & 5 & 15 & 35 & 70 \end{array}\right].$

In MATLAB, the matrix is pascal(n).

The Pascal matrix is positive definite and has the Cholesky factorization

$\notag P_n = L_nL_n^T, \qquad(1)$

where the rows of $L_n$ are the rows of Pascal’s triangle. For example,

$\notag L_5 = \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0\\ 1 & 2 & 1 & 0 & 0\\ 1 & 3 & 3 & 1 & 0\\ 1 & 4 & 6 & 4 & 1 \end{array}\right]\\$

From (1) we have $\det(P_n) = \det(L_n)^2 = 1$ . Form this equation, or by inverting (1), it follows that $P_n^{-1}$ has integer elements. Indeed the inverse is known to have $(i,j)$ element

$\notag (-1)^{i-j} \displaystyle\sum_{k=0}^{n-i} {i+k-1 \choose k} {i+k-1 \choose i+k-j}, \quad i \ge j. \qquad(2)$

The Cholesky factor $L_n$ can be factorized as

$\notag L_n = M_{n-1} M_{n-2} \dots M_1, \qquad(3)$

where $M_i$ is unit lower bidiagonal with the first $i-1$ entries along the subdiagonal of $M_i$ zero and the rest equal to $1$ . For example,

$\notag \begin{aligned} L_5 = \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right] \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right] \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right] \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 \end{array}\right]. \end{aligned}$

The factorization (3) shows that $P_n$ is totally positive, that is, every minor (a determinant of a square submatrix) is positive. Indeed each bidiagonal factor $M_i$ is totally nonnegative, that is, every minor is nonnegative, and the product of totally nonnegative matrices is totally nonnegative. Further results in the theory of totally positive matrices show that the product is actually totally positive.

The positive definiteness of $P_n$ implies that the eigenvalues are real and positive. The total positivity, together with the fact that $P_n$ is (trivially) irreducible, implies that the eigenvalues are distinct.

For a symmetric positive semidefinite matrix $A$ with nonnegative entries, we define $A^{\circ t} = (a_{ij}^t)$ , which is the matrix with every entry raised to the power $t\in \mathbb{R}$ . We say that $A$ is infinitely divisible if $A^{\circ t}$ is positive semidefinite for all nonnegative $t$ . The Pascal matrix is infinitely divisible.

It is possible to show that

$\notag L_n^{-1} = DL_nD, \qquad (4)$

where $D = \mathrm{diag}((-1)^i)$ . In other words, $Y_n = L_nD$ is involutory, that is, $Y_n^2 = I$ . It follows from $P_n = Y_n Y_n^T$ that

$\notag \begin{aligned} P_n^{-1} = Y_n^{-T} Y_n^{-1} = Y_n^T Y_n = Y_n^{-1} (Y_n Y_n^T) Y_n = Y_n^{-1} P_n Y_n, \end{aligned}$

so $P_n$ and $P_n^{-1}$ are similar and hence have the same eigenvalues. This means that the eigenvalues of $P_n$ appear in reciprocal pairs and that the characteristic polynomial is palindromic. Here is an illustration in MATLAB:

>> P = pascal(5); evals = eig(P); [evals 1./evals], coeffs = charpoly(P)
ans =
   1.0835e-02   9.2290e+01
   1.8124e-01   5.5175e+00
   1.0000e+00   1.0000e+00
   5.5175e+00   1.8124e-01
   9.2290e+01   1.0835e-02
coeffs =
     1   -99   626  -626    99    -1

Now

$p_{nn} \le \|P\|_2 \le (\|P\|_1 \|P\|_{\infty})^{1/2} = \biggl(\displaystyle\frac{2n-1}{n}\biggr) p_{nn},$

where for the equality we used a binomial coefficient summation identity. The fact that $P_n$ is positive definite with reciprocal eigenvalues implies that $\kappa_2(P) = \|P\|_2^2$ . Hence, using Stirling’s approximation ( $n! \sim \sqrt{2\pi n} (n/\mathrm{e})^n$ ),

$\kappa_2(P_n) \sim p_{nn}^2 \sim\left( \displaystyle\frac{ (2n)! }{ (n!)^2 } \right)^2 \sim \displaystyle\frac{16^n}{n \pi}.$

Thus $P_n$ is exponentially ill conditioned as $n\to\infty$ .

The matrix $Y_n$ is obtained in MATLAB with pascal(n,1); this is a lower triangular square root of the identity matrix. Turnbull (1927) noted that by rotating $Y_n$ through 90 degrees one obtains a cube root of the identity matrix. This matrix is returned by pascal(n,2). For example, with $n = 5$ :

$\notag \hspace*{-1cm} X = \left[\begin{array}{rrrrr} 1 & 1 & 1 & 1 & 1\\ -4 & -3 & -2 & -1 & 0\\ 6 & 3 & 1 & 0 & 0\\ -4 & -1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0 \end{array}\right], \quad X^2 = \left[\begin{array}{rrrrr} 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & -1 & -4\\ 0 & 0 & 1 & 3 & 6\\ 0 & -1 & -2 & -3 & -4\\ 1 & 1 & 1 & 1 & 1 \end{array}\right], \quad X^3 = I.$

The logarithm of $L_n$ is explicitly known: it is the upper bidiagonal matrix

$\notag \log L_n = \left[\begin{array}{ccccc} 0 & 1 & & & \\ & 0 & 2 & & \\[-5pt] & & 0 & \ddots & \\[-5pt] & & & \ddots & n-1\\ & & & & 0 \end{array}\right]. \qquad (5)$

Notes

For proofs of (2) and (4) see Cohen (1975) and Call and Velleman (1993). respectively. For (5), see Edelman and Strang (2004). The infinite divisibility of the Pascal matrix is infinitely is shown by Bhatia (2006). For the total positivity property see Fallat and Johnson (2011).

References

Rajendra Bhatia, Infinitely Divisible Matrices, Amer. Math. Monthly 113, 221–235, 2006
Gregory Call and Daniel Velleman, Pascal’s Matrices, Amer. Math. Monthly 100, 372–376, 1993
A. M. Cohen, The Inverse of a Pascal Matrix, Math, Gaz. 59(408), 111–112, 1975.
Alan Edelman and Gilbert Strang, Pascal Matrices, Amer. Math. Monthly 111, 189–197, 2004.
Shaun Fallat and Charles Johnson, Totally Nonnegative Matrices, Princeton University Press, 2011.
H. W. Turnbull, The Matrix Square and Cube Roots of Unity, J. London Math. Soc. 2, 242–244, 1927.

What Is a Schur Decomposition?

A Schur decomposition of a matrix $A\in\mathbb{C}^{n\times n}$ is a factorization $A = QTQ^*$ , where $Q$ is unitary and $T$ is upper triangular. The diagonal entries of $T$ are the eigenvalues of $A$ , and they can be made to appear in any order by choosing $Q$ appropriately. The columns of $Q$ are called Schur vectors.

A subspace $\mathcal{X}$ of $\mathbb{C}^{n\times n}$ is an invariant subspace of $A$ if $Ax\in\mathcal{X}$ for all $x\in\mathcal{X}$ . If we partition $Q$ and $T$ conformably we can write

$\notag A [Q_1,~Q_2] = [Q_1,~Q_2] \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \\ \end{bmatrix},$

which gives $A Q_1 = Q_1 T_{11}$ , showing that the columns of $Q_1$ span an invariant subspace of $A$ . Furthermore, $Q_1^*AQ_1 = T_{11}$ . The first column of $Q$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_1 = t_{11}$ , but the other columns are not eigenvectors, in general. Eigenvectors can be computed by solving upper triangular systems involving $T - \lambda I$ , where $\lambda$ is an eigenvalue.

Write $T = D+N$ , where $D = \mathrm{diag}(\lambda_i)$ and $N$ is strictly upper triangular. Taking Frobenius norms gives $\|A\|_F^2 = \|D\|_F^2 + \|N\|_F^2$ , or

$\notag \|N\|_F^2 = \|A\|_F^2 - \displaystyle\sum_{i=1}^n |\lambda_i|^2.$

Hence $\|N\|_F$ is independent of the particular Schur decomposition and it provides a measure of the departure from normality. The matrix $A$ is normal (that is, $A^*A = AA^*$ ) if and only if $N = 0$ . So a normal matrix is unitarily diagonalizable: $A = QDQ^*$ .

An important application of the Schur decomposition is to compute matrix functions. The relation $f(A) = Qf(T)Q^*$ shows that computing $f(A)$ reduces to computing a function of a triangular matrix. Matrix functions illustrate what Van Loan (1975) describes as “one of the most basic tenets of numerical algebra”, namely “anything that the Jordan decomposition can do, the Schur decomposition can do better!”. Indeed the Jordan canonical form is built on a possibly ill conditioned similarity transformation while the Schur decomposition employs a perfectly conditioned unitary similarity, and the full upper triangular factor of the Schur form can do most of what the Jordan form’s bidiagonal factor can do.

An upper quasi-triangular matrix is a block upper triangular matrix

$\notag R = \begin{bmatrix} R_{11} & R_{12} & \dots & R_{1m}\\ & R_{22} & \dots & R_{2m}\\ & & \ddots& \vdots\\ & & & R_{mm} \end{bmatrix}$

whose diagonal blocks $R_{ii}$ are either $1\times1$ or $2\times2$ . A real matrix $A\in\mathbb{R}^{n \times n}$ has a real Schur decomposition $A = QRQ^T$ in which in which all the factors are real, $Q$ is orthogonal, and $R$ is upper quasi-triangular with any $2\times2$ diagonal blocks having complex conjugate eigenvalues. If $A$ is normal then the $2\times 2$ blocks $R_{ii}$ have the form

$R_{ii} = \left[\begin{array}{@{}rr@{\mskip2mu}} a & b \\ -b & a \end{array}\right], \quad b \ne 0,$

which has eigenvalues $a \pm \mathrm{i}b$ .

The Schur decomposition can be computed by the QR algorithm at a cost of about $25n^3$ flops for $Q$ and $T$ , or $10n^3$ flops for $T$ only.

In MATLAB, the Schur decomposition is computed with the schur function: the command [Q,T] = schur(A) returns the real Schur form if $A$ is real and otherwise the complex Schur form. The complex Schur form for a real matrix can be obtained with [Q,T] = schur(A,'complex'). The rsf2csf function converts the real Schur form to the complex Schur form. The= ordschur function takes a Schur decomposition and modifies it so that the eigenvalues appear in a specified order along the diagonal of $T$ .

References

Nicholas J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.
C. F. Van Loan, A Study of the Matrix Exponential, Numerical Analysis Report No. 10, University of Manchester, UK, 1975.

What Is a Permutation Matrix?

A permutation matrix is a square matrix in which every row and every column contains a single $1$ and all the other elements are zero. Such a matrix, $P$ say, is orthogonal, that is, $P^TP = PP^T = I_n$ , so it is nonsingular and has determinant $\pm 1$ . The total number of $n\times n$ permutation matrices is $n!$ .

Premultiplying a matrix by $P$ reorders the rows and postmultiplying by $P$ reorders the columns. A permutation matrix $P$ that has the desired reordering effect is constructed by doing the same operations on the identity matrix.

Examples of permutation matrices are the identity matrix $I_n$ , the reverse identity matrix $J_n$ , and the shift matrix $K_n$ (also called the cyclic permutation matrix), illustrated for $n = 4$ by

$\notag I_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad J_4 = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}, \qquad K_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}.$

Pre- or postmultiplying a matrix by $J_n$ reverses the order of the rows and columns, respectively. Pre- or postmultiplying a matrix by $K_n$ shifts the rows or columns, respectively, one place forward and moves the first one to the last position—that is, it cyclically permutes the rows or columns. Note that $J_n$ is a symmetric Hankel matrix and $K_n$ is a circulant matrix.

An elementary permutation matrix $P$ differs from $I_n$ in just two rows and columns, $i$ and $j$ , say. It can be written $P = I_n - (e_i-e_j)(e_i-e_j)^T$ , where $e_i$ is the $i$ th column of $I_n$ . Such a matrix is symmetric and so satisfies $P^2 = I_n$ , and it has determinant $-1$ . A general permutation matrix can be written as a product of elementary permutation matrices $P = P_1P_2\dots P_k$ , where $k$ is such that $\det(P) = (-1)^k$ .

It is easy to show that $\det(\lambda I - K_n) = \lambda^n - 1$ , which means that the eigenvalues of $K_n$ are $1, w, w^2, \dots, w^{n-1}$ , where $w = \exp(2\pi\mathrm{i}/n)$ is the $n$ th root of unity. The matrix $K_n$ has two diagonals of $1$ s, which move up through the matrix as it is powered: $K_n^i \ne I$ for $i< n$ and $K_n^n = I$ . The following animated gif superposes MATLAB spy plots of $K_{64}$ , $K_{64}^2$ , …, $K_{64}^{64} = I_{64}$ .

The shift matrix $K_n$ plays a fundamental role in characterizing irreducible permutation matrices. Recall that a matrix $A\in\mathbb{C}^{n\times n}$ is irreducible if there does not exist a permutation matrix $P$ such that

$\notag P^TAP = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix},$

where $A_{11}$ and $A_{22}$ are square, nonempty submatrices.

Theorem 1. For a permutation matrix $P \in \mathbb{R}^{n \times n}$ the following conditions are equivalent.

$P$ is irreducible.

There exists a permutation matrix $Q$ such that $Q^{-1}PQ = K_n$

The eigenvalues of $P$ are $1, w, w^2, \dots, w^{n-1}$ .

One consequence of Theorem 1 is that for any irreducible permutation matrix $P$ , $\mathrm{rank}(P - I) = \mathrm{rank}(K_n - I) = n-1$ .

The next result shows that a reducible permutation matrix can be expressed in terms of irreducible permutation matrices.

Theorem 2. Every reducible permutation matrix is permutation similar to a direct sum of irreducible permutation matrices.

Another notable permutation matrix is the vec-permutation matrix, which relates $A\otimes B$ to $B\otimes A$ , where $\otimes$ is the Kronecker product.

A permutation matrix is an example of a doubly stochastic matrix: a nonnegative matrix whose row and column sums are all equal to $1$ . A classic result characterizes doubly stochastic matrices in terms of permutation matrices.

Theorem 3 (Birkhoff). A matrix is doubly stochastic if and only if it is a convex combination of permutation matrices.

In coding, memory can be saved by representing a permutation matrix $P$ as an integer vector $p$ , where $p_i$ is the column index of the $1$ within the $i$ th row of $P$ . MATLAB functions that return permutation matrices can also return the permutation in vector form. Here is an example with the MATLAB lu function that illustrates how permuting a matrix can be done using the vector permutation representation.

>> A = gallery('frank',4), [L,U,P] = lu(A); P
A =
     4     3     2     1
     3     3     2     1
     0     2     2     1
     0     0     1     1
P =
     1     0     0     0
     0     0     1     0
     0     0     0     1
     0     1     0     0
>> P*A
ans =
     4     3     2     1
     0     2     2     1
     0     0     1     1
     3     3     2     1
>> [L,U,p] = lu(A,'vector'); p
p =
     1     3     4     2
>> A(p,:)
ans =
     4     3     2     1
     0     2     2     1
     0     0     1     1
     3     3     2     1

For more on handling permutations in MATLAB see section 24.3 of MATLAB Guide.

Notes

For proofs of Theorems 1–3 see Zhang (2011, Sec. 5.6). Theorem 3 is also proved in Horn and Johnson (2013, Thm. 8.7.2).

Permutations play a key role in the fast Fourier transform and its efficient implementation; see Van Loan (1992).

References

Desmond Higham and Nicholas Higham, MATLAB Guide, third edition, SIAM, 2017.
Roger A. Horn and Charles R. Johnson, Matrix Analysis, second edition, Cambridge University Press, 2013. My review of the second edition.
Charles F. Van Loan, Computational Frameworks for the Fast Fourier Transform, SIAM, 1992.
Fuzhen Zhang, Matrix Theory: Basic Results and Techniques, Springer, 2011.

What Is the Matrix Inverse?

The inverse of a matrix $A\in\mathbb{C}^{n\times n}$ is a matrix $X\in\mathbb{C}^{n\times n}$ such that $AX = I$ , where $I$ is the identity matrix (which has ones on the diagonal and zeros everywhere else). The inverse is written as $A^{-1}$ . If the inverse exists then $A$ is said to be nonsingular or invertible, and otherwise it is singular.

The inverse $X$ of $A$ also satisfies $XA = I$ , as we now show. The equation $AX = I$ says that $Ax_j = e_j$ for $j=1\colon n$ , where $x_j$ is the $j$ th column of $A$ and $e_j$ is the $j$ th unit vector. Hence the $n$ columns of $A$ span $\mathbb{C}^n$ , which means that the columns are linearly independent. Now $A(I-XA) = A - AXA = A -A = 0$ , so every column of $I - XA$ is in the null space of $A$ . But this contradicts the linear independence of the columns of $A$ unless $I - XA = 0$ , that is, $XA = I$ .

The inverse of a nonsingular matrix is unique. If $AX = AW = I$ then premultiplying by $X$ gives $XAX = XAW$ , or, since $XA = I$ , $X = W$ .

The inverse of the inverse is the inverse: $(A^{-1})^{-1} = A$ , which is just another way of interpreting the equations $AX = XA = I$ .

Connections with the Determinant

Since the determinant of a product of matrices is the product of the determinants, the equation $AX = I$ implies $\det(A) \det(X) = 1$ , so the inverse can only exist when $\det(A) \ne 0$ . In fact, the inverse always exists when $\det(A) \ne 0$ .

An explicit formula for the inverse is

$\notag A^{-1} = \displaystyle\frac{\mathrm{adj}(A)}{\det(A)}, \qquad (1)$

where the adjugate $\mathrm{adj}$ is defined by

$\bigl(\mathrm{adj}(A)\bigr)_{ij} = (-1)^{i+j} \det(A_{ji})$

and where $A_{pq}$ denotes the submatrix of $A$ obtained by deleting row $p$ and column $q$ . A special case is the formula

$\notag \begin{bmatrix} a & b \\ c& d \end{bmatrix}^{-1} = \displaystyle\frac{1}{ad-bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$

The equation $AA^{-1} = I$ implies $\det(A^{-1}) = 1/\det(A)$ .

Conditions for Nonsingularity

The following result collects some equivalent conditions for a matrix to be nonsingular. We denote by $\mathrm{null}(A)$ the null space of $A$ (also called the kernel).

Theorem 1. For $A \in \mathbb{C}^{n \times n}$ the following conditions are equivalent to $A$ being nonsingular:

$\mathrm{null}(A) = \{0\}$ ,

$\mathrm{rank}(A) = n$ ,

$Ax=b$ has a unique solution $x$ , for any $b$ ,

none of the eigenvalues of $A$ is zero,

$\det(A) \ne 0$ .

A useful formula is

$\notag (AB)^{-1} = B^{-1}A^{-1}.$

Here are some facts about the inverses of $n\times n$ matrices of special types.

A diagonal matrix $D = \mathrm{diag}(d_i)$ is nonsingular if $d_i\ne0$ for all $i$ , and $D^{-1} = \mathrm{diag}(d_i^{-1})$ .
An upper (lower) triangular matrix $T$ is nonsingular if its diagonal elements are nonzero, and the inverse is upper (lower) triangular with $(i,i)$ element $t_{ii}^{-1}$ .
If $x,y\in\mathbb{C}^n$ and $y^*A^{-1}x \ne -1$ , then $A + xy^*$ is nonsingular and
$\notag \bigl(A + xy^*\bigr)^{-1} = A^{-1} - \displaystyle\frac{A^{-1}x y^* A^{-1}}{1 + y^*A^{-1}x}.$

This is the Sherman–Morrison formula.

The Inverse as a Matrix Polynomial

The Cayley-–Hamilton theorem says that a matrix satisfies its own characteristic equation, that is, if $p(t) = \det(tI - A) = t^n + c_{n-1} t^{n-1} + \cdots + c_0$ , then $p(A) = 0$ . In other words, $A^n + c_{n-1} A^{n-1} + \cdots + c_0I = 0$ , and if $A$ is nonsingular then multiplying through by $A^{-1}$ gives (since $c_0 = p(0) = (-1)^n\det(A) \ne 0)$

$A^{-1} = -\displaystyle\frac{1}{c_0} (A^{n-1} + c_{n-1} A^{n-2} + \cdots + c_1 I). \qquad (2)$

This means that $A^{-1}$ is expressible as a polynomial of degree at most $n-1$ in $A$ (with coefficients that depend on $A$ ).

To Compute or Not to Compute the Inverse

The inverse is an important theoretical tool, but it is rarely necessary to compute it explicitly. If we wish to solve a linear system of equations $Ax = b$ then computing $A^{-1}$ and then forming $x = A^{-1}b$ is both slower and less accurate in floating-point arithmetic than using LU factorization (Gaussian elimination) to solve the system directly. Indeed, for $n = 1$ one would not solve $3x = 1$ by computing $3^{-1} \times 1$ .

For sparse matrices, computing the inverse may not even be practical, as the inverse is usually dense.

If one needs to compute the inverse, how should one do it? We will consider the cost of different methods, measured by the number of elementary arithmetic operations (addition, subtraction, division, multiplication) required. Using (1), the cost is that of computing one determinant of order $n$ and $n^2$ determinants of order $n-1$ . Since computing a $k\times k$ determinant costs at least $O(k^3)$ operations by standard methods, this approach costs at least $O(n^5)$ operations, which is prohibitively expensive unless $n$ is very small. Instead one can compute an LU factorization with pivoting and then solve the systems $Ax_i = e_i$ for the columns $x_i$ of $A^{-1}$ , at a total cost of $2n^3 + O(n^2)$ operations.

Equation (2) does not give a good method for computing $A^{-1}$ , because computing the coefficients $c_i$ and evaluating a matrix polynomial are both expensive.

It is possible to exploit fast matrix multiplication methods, which compute the product of two $n\times n$ matrices in $O(n^\alpha)$ operations for some $\alpha < 3$ . By using a block LU factorization recursively, one can reduce matrix inversion to matrix multiplication. If we use Strassen’s fast matrix multiplication method, which has $\alpha = \log_2 7 \approx 2.807$ , then we can compute $A^{-1}$ in $O(n^{2.807})$ operations.

Slash Notation

MATLAB uses the backslash and forward slash for “matrix division”, with the meanings $A \backslash B = A^{-1}B$ and $A / B = AB^{-1}$ . Note that because matrix multiplication is not commutative, $A \backslash B \ne A / B$ , in general. We have $A\backslash I = I/A = A^{-1}$ and $I\backslash A = A/I = A$ . In MATLAB, the inverse can be compute with inv(A), which uses LU factorization with pivoting.

Rectangular Matrices

If $A$ is $m\times n$ then the equation $AX = I_m$ requires $X$ to be $n\times m$ , as does $XA = I_n$ . Rank considerations show that at most one of these equations can hold if $m\ne n$ . For example, if $A = a^*$ is a nonzero row vector, then $AX = 1$ for $X = a/a^*a$ , but $XA = aa^*/a^*a\ne I$ . This is an example of a generalized inverse.

An Interesting Inverse

Here is a triangular matrix with an interesting inverse. This example is adapted from the LINPACK Users’ Guide, which has the matrix, with “LINPACK” replacing “INVERSE” on the front cover and the inverse on the back cover.

$\notag \left[\begin{array}{ccccccc} I & N & V & E & R & S & E\\ 0 & N & V & E & R & S & E\\ 0 & 0 & V & E & R & S & E\\ 0 & 0 & 0 & E & R & S & E\\ 0 & 0 & 0 & 0 & R & S & E\\ 0 & 0 & 0 & 0 & 0 & S & E\\ 0 & 0 & 0 & 0 & 0 & 0 & E \end{array}\right]^{-1} = \left[\begin{array}{*{7}{r@{\hspace{4pt}}}} 1/I & -1/I & 0 & 0 & 0 & 0 & 0\\ 0 & 1/N & -1/N & 0 & 0 & 0 & 0\\ 0 & 0 & 1/V & -1/V & 0 & 0 & 0\\ 0 & 0 & 0 & 1/E & -1/E & 0 & 0\\ 0 & 0 & 0 & 0 & 1/R & -1/R & 0\\ 0 & 0 & 0 & 0 & 0 & 1/S & -1/S\\ 0 & 0 & 0 & 0 & 0 & 0 & 1/E \end{array}\right].$

What Is A\A?

In a recent blog post What is $A\backslash A$ ?, Cleve Moler asked what the MATLAB operation $A \backslash A$ returns. I will summarize what backslash does in general, for $A \backslash B$ and then consider the case $B = A$ .

$A \backslash B$ is a solution, in some appropriate sense, of the equation

$\notag AX = B, \quad A \in\mathbb{C}^{m\times n} \quad X \in\mathbb{C}^{n\times p} \quad B \in\mathbb{C}^{m\times p}. \qquad (1)$

It suffices to consider the case $p = 1$ , because backslash treats the columns independently, and we write this as

$\notag Ax = b, \quad A \in\mathbb{C}^{m\times n} \quad x \in\mathbb{C}^{n} \quad b \in\mathbb{C}^{m}.$

The MATLAB backslash operator handles several cases depending on the relative sizes of the row and column dimensions of $A$ and whether it is rank deficient.

Square Matrix: $m = n$

When $A$ is square, backslash returns $x = A^{-1}b$ , computed by LU factorization with partial pivoting (and of course without forming $A^{-1}$ ). There is no special treatment for singular matrices, so for them division by zero may occur and the output may contain NaNs (in practice, what happens will usually depend on the rounding errors). For example:

>> A = [1 0; 0 0], b = [1 0]', x = A\b
A =
     1     0
     0     0
b =
     1
     0
Warning: Matrix is singular to working precision. 

x =
     1
   NaN

Backslash take advantage of various kinds of structure in $A$ ; see MATLAB Guide (section 9.3) or doc mldivide in MATLAB.

Overdetermined System: $m > n$

An overdetermined system has no solutions, in general. Backslash yields a least squares (LS) solution, which is unique if $A$ has full rank. If $A$ is rank-deficient then there are infinitely many LS solutions, and backslash returns a basic solution: one with at most $\mathrm{rank}(A)$ nonzeros. Such a solution is not, in general, unique.

Underdetermined System: $m < n$

An underdetermined system has fewer equations than unknowns, so either there is no solution of there are infinitely many. In the latter case $A\backslash b$ produces a basic solution and in the former case a basic LS solution. Example:

>> A = [1 1 1; 1 1 0]; b = [3 2]'; x = A\b
x =
   2.0000e+00
            0
   1.0000e+00

Another basic solution is $[0~2~1]^T$ , and the minimum $2$ -norm solution is $[1~1~1]^T$ .

A\A

Now we turn to the special case $A\backslash A$ , which in terms of equation (1) is a solution to $AX = A$ . If $A = 0$ then $X = I$ is not a basic solution, so $A\backslash A \ne I$ ; in fact, $0\backslash 0 = 0$ if $m\ne n$ and it is matrix of NaNs if $m = n$ .

For an underdetermined system with full-rank $A$ , $A\backslash A$ is not necessarily the identity matrix:

>> A = [1 0 1; 0 1 0], X = A\A
A =
     1     0     1
     0     1     0
X =
     1     0     1
     0     1     0
     0     0     0

But for an overdetermined system with full-rank $A$ , $A\backslash A$ is the identity matrix:

>> A'\A'
ans =
   1.0000e+00            0
  -1.9185e-17   1.0000e+00

Minimum Frobenius Norm Solution

The MATLAB definition of $A\backslash b$ is a pragmatic one, as it computes a solution or LS solution to $Ax = b$ in the most efficient way, using LU factorization ( $m = n$ ) or QR factorization $(m\ne n$ ). Often, one wants the solution of minimum $2$ -norm, which can be expressed as $A^+b$ , where $A^+$ is the pseudoinverse of $A$ . In MATLAB, $A^+b$ can be computed by lsqminnorm(A,b) or pinv(A)*b, the former expression being preferred as it avoids the unnecessary computation of $A^+$ and it uses a complete orthogonal factorization instead of an SVD.

When the right-hand side is a matrix, $B$ , lsqminnorm(A,B) and pinv(A)*B give the solution of minimal Frobenius norm, which we write as $A \backslash\backslash B$ . Then $A\backslash\backslash A = A^+A$ , which is the orthogonal projector onto $\mathrm{range}(A^*)$ , and it is equal to the identity matrix when $m\ge n$ and $A$ has full rank. For the matrix above:

>> A = [1 0 1; 0 1 0], X = lsqminnorm(A,A)
A =
     1     0     1
     0     1     0
X =
   5.0000e-01            0   5.0000e-01
            0   1.0000e+00            0
   5.0000e-01            0   5.0000e-01

What Is the Jordan Canonical Form?

How close can similarity transformations take a matrix towards diagonal form? The answer is given by the Jordan canonical form, which achieves the largest possible number of off-diagonal zero entries (Brualdi, Pei, and Zhan, 2008).

Theorem (Jordan canonical form). Any matrix $A\in\mathbb{C}^{n\times n}$ can be expressed as

$\notag \begin{aligned} A &= ZJZ^{-1}, \quad J = \mathrm{diag}(J_1, J_2, \dots, J_p), \\ J_k &= J_k(\lambda_k) = \begin{bmatrix} \lambda_k & 1 & & \\ & \lambda_k & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_k \end{bmatrix} \in \mathbb{C}^{m_k\times m_k}, \label{Jk} \end{aligned}$

where $Z$ is nonsingular and $m_1 + m_2 + \cdots + m_p = n$ . The matrix $J$ is unique up to the ordering of the blocks $J_k$ .

The matrix $J$ is (up to reordering of the diagonal blocks) the Jordan canonical form of $A$ (or the Jordan form, for short).

The bidiagonal matrices $J_k$ are called Jordan blocks. Clearly, the eigenvalues of $J_k$ are $\lambda_k$ repeated $m_k$ times and $J_k$ has a single eigenvector, $e_1\in\mathbb{R}^{m_k}$ . Two different Jordan blocks can have the same eigenvalues.

In total, $J$ has $p$ linearly independent eigenvectors, and the same is true of $A$ .

The Jordan canonical form is an invaluable tool in matrix analysis, as it provides a concrete way to prove and understand many results. However, the Jordan form can not be reliably computed in finite precision arithmetic, so it is of little use computationally, except in special cases such as when $A$ is Hermitian or normal.

For a Jordan block $J_k = J_k(\lambda_k)\in\mathbb{C}^{m_k\times m_k}$ we have

$\notag \begin{aligned} J_k - \lambda_k I &= \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix}, \quad (J_k - \lambda_k I)^2 = \begin{bmatrix} 0 & 0 & 1 & & \\ & 0 & 0 & \ddots & \\ & & \ddots & \ddots& 1 \\ & & & \ddots & 0 \\ & & & & 0 \end{bmatrix},\\ \dots,\quad (J_k - \lambda_k I)^{m_k-1} &= \begin{bmatrix} 0 & 0 & \dots & 1 \\ & 0 & \dots & 0 \\ & & \ddots & \vdots \\ & & & 0 \end{bmatrix}, \quad (J_k - \lambda_k I)^{m_k} = 0. \qquad (*)\notag \end{aligned}$

The superdiagonal of ones moves up to the right with each increase in the index of the power, until it disappears off the top corner of the matrix.

It is easy to see that $(A - \lambda I_n)^{j} = Z(J - \lambda I_n)^{j} Z^{-1} = Z\mathrm{diag}\bigl((J_k(\lambda_k) - \lambda I_{m_k})^{j}\bigr) Z^{-1}$ , and so

$\mathrm{rank}( (A - \lambda I_n)^{j} ) = \sum_{k = 1}^p\mathrm{rank}\bigl( (J_k(\lambda_k) - \lambda I_{m_k})^{j} \bigr).$

For $\lambda = \lambda_k$ , these quantities provide information about the size of the Jordan blocks associated with $\lambda_k$ . To be specific, let

$\notag d_j = \mathrm{rank}( (A - \lambda_kI_n)^{j}), \quad j\ge 1, \quad \quad d_0 = n$

and

$\notag \omega_j = d_{j-1} - d_j, \quad j \ge 1.$

By considering the equations $(*)$ above, it can be shown that $\omega_j$ is the number of Jordan blocks of size at least $j$ in which $\lambda_k$ appears. Moreover, the number of Jordan blocks of size $j$ is $\omega_j - \omega_{j+1} = d_{j-1} - 2d_j + d_{j+1}$ . Therefore if we know the eigenvalues and the ranks of $(A - \lambda_k I_n)^j$ for each eigenvalue $\lambda_k$ and appropriate $j$ then we can determine the Jordan structure. As an important special case, if $\mathrm{rank}(A - \lambda_k I_n) = n-1$ then we know that $\lambda_k$ appears in a single Jordan block. The sequence of $\omega_j$ is known as the Weyr characteristic, and it satisfies $\omega_1 \ge \omega_2 \ge \cdots$ .

As an example of a matrix for which we can easily deduce the Jordan form consider the nilpotent matrix $B = \bigl[\begin{smallmatrix} 0_r & I_r\\0_r & 0_r \end{smallmatrix}\bigr]$ , for which $B^2 = 0$ and all the eigenvalues are zero. Since $\mathrm{rank}(B) = r$ , we have $d_0 = 2r$ , $d_1 = r$ , and $d_2 = 0$ . Hence $\omega_1 = 0$ and $\omega_2 = r$ , so there are $r$ $2\times 2$ Jordan blocks. (In fact, $A$ can be permuted into Jordan form by a similarity transformation.)

Here is an example with $A$ the $11\times 11$ matrix anymatrix('core/collatz',11).

$\notag A = \left[\begin{array}{ccccccccccc} 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right]$

We have $\mathrm{rank}(A) = 10$ and $\mathrm{rank}(A-2I) = 10$ , so $0$ and $2$ are simple eigenvalues. All the other eigenvalues are $1$ and they have the following $d_j$ and $\omega_j$ values:

$\notag \begin{array}{cccc} j & d_j &\omega_j &\omega_j - \omega_{j+1}\\\hline 0 & 11 & & \\ 1 & 7 & 4 & 2 \\ 2 & 5 & 2 & 1 \\ 3 & 4 & 1 & 0 \\ 4 & 3 & 1 & 0 \\ 5 & 2 & 1 & 1 \\ 6 & 2 & 0 & 0 \\ \end{array}$

We conclude that the eigenvalue $1$ occurs in one block of order $5$ , one block of order $2$ , and two blocks of order $1$ .

A matrix and its transpose have the same Jordan form. One way to see this is to note that $A = ZJZ^{-1}$ implies

$A^T = Z^{-T}J^TZ^T = Z^{-T}P \cdot PJ^TP \cdot PZ^T = (ZP)^{-T}J \,(ZP)^T,$

where $P$ is the identity matrix with the its columns reversed. A consequence is that $A$ and $A^T$ are similar.

Real Jordan Form

A version of the Jordan form with $Z$ and $J$ real exists for $A\in\mathbb{R}^{n\times n}$ . The main change is how complex eigenvalues are represented. Since the eigenvalues now occur in complex conjugate pairs $\lambda$ and $\overline{\lambda}$ , and each of the pair has the same Jordan structure (which follows from the fact that a matrix and its complex conjugate have the same rank), pairs of Jordan blocks corresponding to $\lambda$ and $\overline{\lambda}$ are combined into a real block of twice the size. For example, Jordan blocks

$\notag \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}, \; \begin{bmatrix}\,\overline{\lambda} & 1 \\ 0 & \overline{\lambda} \end{bmatrix} \in\mathbb{C}^{2 \times 2}$

become

$\notag \left[\begin{array}{@{\mkern3mu}rr|rr@{\mkern7mu}} a & b & 1 & 0 \\ -b & a & 0 & 1 \\\hline 0 & 0 & a & b \\ 0 & 0 &-b & a \end{array}\right] \in\mathbb{R}^{4 \times 4}$

in the real Jordan form, where $\lambda = a + \mathrm{i} b$ . Note that the eigenvalues of $\bigl[\begin{smallmatrix} a & b \\ -b & a \end{smallmatrix}\bigr]$ are $a \pm \mathrm{i} b$ .

Notes

Proofs of the Jordan canonical form and its real variant can be found in many textbooks. See also Brualdi (1987) and Fletcher and Sorensen (1983), who give proofs that go via the Schur decomposition.

References

Richard A. Brualdi. The Jordan Canonical Form: An Old Proof. Amer. Math. Monthly, 94(3):257–267,1987.
Richard A. Brualdi, Pei Pei, and Xingzhi Zhan. An Extremal Sparsity Property of the Jordan Canonical Form. Linear Algebra Appl., 429(10):2367–2372, 2008.
R. Fletcher and D. C. Sorensen. An Algorithmic Derivation of the Jordan Canonical Form. Amer. Math. Monthly, 90(1):12–16, 1983.

This article is part of the “What Is” series, available from https://nhigham.com/index-of-what-is-articles/ and in PDF form from the GitHub repository https://github.com/higham/what-is.

What Is the Second Difference Matrix?

The second difference matrix is the tridiagonal matrix $T_n$ with diagonal elements $2$ and sub- and superdiagonal elements $-1$ :

$\notag T_n = \left[ \begin{array}{@{}*{4}{r@{\mskip10mu}}r} 2 & -1 & & & \\ -1 & 2 & -1 & & \\[-5pt] & -1 & 2 & \ddots & \\ & & \ddots & \ddots & -1 \\ & & & -1 & 2 \end{array}\right] \in\mathbb{R}^{n\times n}.$

It arises when a second derivative is approximated by the central second difference $f''(x) \approx (f(x+h) -2 f(x) + f(x-h))/h^2$ . (Accordingly, the second difference matrix is sometimes defined as $-T_n$ .) In MATLAB, $T_n$ can be generated by gallery('tridiag',n), which is returned as a aparse matrix.

This is Gil Strang’s favorite matrix. The photo, from his home page, shows a birthday cake representation of the matrix.

The second difference matrix is symmetric positive definite. The easiest way to see this is to define the full rank rectangular matrix

$\notag L_n = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & -1 & \ddots & \\ & & \ddots & 1 \\ & & & -1 \end{bmatrix} \in\mathbb{R}^{(n+1)\times n}$

and note that $T_n = L_n^T L_n$ . The factorization corresponds to representing a central difference as the product of a forward difference and a backward difference. Other properties of the second difference matrix are that it is diagonally dominant, a Toeplitz matrix, and an $M$ -matrix.

Cholesky Factorization

In an LU factorization $A = LU$ the pivots $u_{ii}$ are $2$ , $3/2$ , $4/3$ , …, $(n+1)/n$ . Hence the pivots form a decreasing sequence tending to 1 as $n\to\infty$ . The diagonal of the Cholesky factor contains the square roots of the pivots. This means that in the Cholesky factorization $A = R^*R$ with $R$ upper bidiagonal, $r_{nn} \to 1$ and $r_{n,n-1}\to -1$ as $n\to\infty$ .

Determinant, Inverse, Condition Number

Since the determinant is the product of the pivots, $\det(T_n) = n+1$ .

The inverse of $T_n$ is full, with $(i,j)$ element $i(n-j+1)/(n+1)$ for $j\ge i$ . For example,

$\notag T_5^{-1} = \displaystyle\frac{1}{6} \begin{bmatrix} 5 & 4 & 3 & 2 & 1\\ 4 & 8 & 6 & 4 & 2\\ 3 & 6 & 9 & 6 & 3\\ 2 & 4 & 6 & 8 & 4\\ 1 & 2 & 3 & 4 & 5 \end{bmatrix}.$

The $2$ -norm condition number satisfies $\kappa_2(T_n) \sim 4n^2/\pi^2$ (as follows from the formula (1) below for the eigenvalues).

Eigenvalues and Eigenvectors

The eigenvalues of $T_n$ are

$\notag \mu_k = 4 \sin^2\Bigl(\displaystyle\frac{k \phi}{2} \Bigr), \quad k = 1:n, \qquad (1)$

where $\phi = \pi/(n+1)$ , with corresponding eigenvector

$\notag v_k = \begin{bmatrix} \sin(k\phi), &\sin(2k\phi), &\dots, &\sin(nk\phi) \end{bmatrix}^T.$

The matrix $Q$ with

$\notag q_{ij} = \Bigl(\displaystyle\frac{2}{n+1}\Bigr)^{1/2} \sin\Bigl( \frac{ij\pi}{n+1} \Bigr)$

is therefore an eigenvector matrix for $T_n$ : $Q^*AQ = \mathrm{diag}(\mu_k)$ .

Variations

Various modifications of the second difference matrix arise and similar results can be derived. For example, consider the matrix obtained by changing the $(n,n)$ element to $1$ :

$\notag \widetilde{T}_n = \left[ \begin{array}{@{}*{4}{r@{\mskip10mu}}r} 2 & -1 & & & \\ -1 & 2 & -1 & & \\[-5pt] & -1 & 2 & \ddots & \\ & & \ddots & \ddots & -1 \\ & & & -1 & 1 \end{array}\right] \in\mathbb{R}^{n\times n}.$

It can be shown that $\widetilde{T}_n^{-1}$ has $(i,j)$ element $\min(i,j)$ and eigenvalues $4\cos( j \pi/(2n+1))^2$ , $j=1:n$ .

If we perturb the $(1,1)$ of $\widetilde{T}_n$ to $1$ , we obtain a singular matrix, but suppose we perturb the $(1,1)$ element to $3$ :

$\notag \widehat{T}_n = \left[ \begin{array}{@{}*{4}{r@{\mskip10mu}}r} 3 & -1 & & & \\ -1 & 2 & -1 & & \\[-5pt] & -1 & 2 & \ddots & \\ & & \ddots & \ddots & -1 \\ & & & -1 & 1 \end{array}\right] \in\mathbb{R}^{n\times n}.$

The inverse is $\widehat{T}_n^{-1} = G/2$ , where $G$ with $(i,j)$ element $2\min(i,j)-1$ is a matrix of Givens, and the eigenvalues are $4\cos((2j-1)\pi/(4n))^2$ , $j=1:n$ .

Notes

The factorization $T_n = L_n^TL_n$ is noted by Strang (2012).

For a derivation of the eigenvalues and eigenvectors see Todd (1977, p. 155 ff.). For the eigenvalues of $\widetilde{T}_n$ see Fortiana and Cuadras (1997). Givens’s matrix is described by Newman and Todd (1958) and Todd (1977).

References

This is a minimal set of references, which contain further useful references within.

J. Fortiana and C. N. Cuadras, A Family of Matrices, the Discretized Brownian Bridge, and Distance-Based Regression, Linear Algebra Appl. 264, 173–188, 1997.
Morris Newman and John Todd, The Evaluation of Matrix Inversion Programs, J. Soc. Indust. Appl. Math. 6(4), 466–476, 1958.
Gilbert Strang, Essays in Linear Algebra, Wellesley-Cambridge Press, Wellesley, MA, USA, 2012. Chapter A.3: “My Favorite Matrix”.
John Todd, Basic Numerical Mathematics, Vol. $2$ : Numerical Algebra, Birkhäuser, Basel, and Academic Press, New York, 1977.

What Is the Frank Matrix?

The $n\times n$ upper Hessenberg matrix

$\notag F_n = \left[\begin{array}{*{6}c} n & n-1 & n-2 & \dots & 2 & 1 \\ n-1 & n-1 & n-2 & \dots & 2 & 1 \\ 0 & n-2 & n-2 & \dots & 2 & 1 \\[-3pt] \vdots & 0 & \ddots & \ddots & \vdots & 1 \\[-3pt] \vdots & \vdots & \dots & 2 & 2 & 1 \\ 0 & 0 & \dots & 0 & 1 & 1 \\ \end{array}\right]$

was introduced by Frank in 1958 as a test matrix for eigensolvers.

Determinant

Taking the Laplace expansion about the first column, we obtain $\det(F_n) = n\det(F_{n-1}) - (n-1)\det(F_{n-1}) = \det(F_{n-1})$ , and since $\det(F_1) = 1$ we have $\det(F_n) = 1$ .

In MATLAB, the computed determinant of the matrix and its transpose can be far from the exact values of $1$ :

>> n = 20; F = anymatrix('gallery/frank',n); 
>> [det(F), det(F')]
ans =
   1.0000e+00  -1.4320e+01
>> n = 49; F = anymatrix('gallery/frank',n); det(F)
ans =
    -1.406934439401568e+45

This behavior illustrates the sensitivity of the determinant rather than numerical instability in the evaluation and it is very dependent on the arithmetic (different results may be obtained in different releases of MATLAB). The sensitivity can be seen by noting that

$\notag \det\bigl(F_n + \epsilon e_1e_n^T\bigr) = 1 + (-1)^{n-1} (n-1)!\epsilon, \qquad (1)$

which means that changing the $(1,n)$ element from 1 to $1+\epsilon$ changes the determinant by $(n-1)!\epsilon$ .

Inverse and Condition Number

It is easily verified that

$\notag F_n = \begin{bmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & 1 & \ddots & \\ & & & \ddots & -1\\ & & & & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & & & & \\ n-1 & 1 & & & \\ & n-2& 1 & & \\ & & \ddots& \ddots & \\ & & & 1 & 1 \end{bmatrix} \equiv U^{-1}L. \qquad (2)$

Hence $F_n^{-1} = L^{-1}U$ is lower Hessenberg with integer entries. This factorization provides another way to see that $\det(F_n) = 1$ .

From (1) we see that $F_n + \epsilon e_1e_n^T$ is singular for $\epsilon = (-1)^n/(n-1)!$ , which implies that

$\notag \kappa(F_n) \ge (n-1)! \|F_n\|$

for any subordinate matrix norm, showing that the condition number grows very rapidly with $n$ . In fact, this lower bound is within a factor $3.5$ of the condition number for the $1$ -, $2$ -, and $\infty$ -norms for $n\le 20$ .

Eigenvalues

Denote by $p_n(t) = \det(F_n - tI)$ the characteristic polynomial of $F_n$ . By expanding about the first column one can show that

$\label{Fnpn-rec} p_n(t) = (1-t)p_{n-1}(t) - (n-1)t p_{n-2}(t). \qquad (3)$

Using (3), one can show by induction that

$\notag p_n(t^{-1}) = (-1)^n t^{-n} p_n(t).$

This means that the eigenvalues of $F_n$ occur in reciprocal pairs, and hence that $\lambda = 1$ is an eigenvalue when $n$ is odd. It also follows that $p_n$ is palindromic when $n$ is even and anti-palindromic when $n$ is odd. Examples:

>> charpoly( anymatrix('gallery/frank',6) )
ans =
     1   -21   120  -215   120   -21     1
>> charpoly( anymatrix('gallery/frank',7) )
ans =
     1   -28   231  -665   665  -231    28    -1

Since $F_n$ has nonzero subdiagonal entries, $\mathrm{rank}(F_n - t I) \ge n-1$ for any $t$ , and hence $F_n$ is nonderogatory, that is, no eigenvalue appears in more than one Jordan block in the Jordan canonical form. It can be shown that the eigenvalues are real and positive and that they can be expressed in terms of the zeros of Hermite polynomials. Furthermore, the eigenvalues are distinct.

If each eigenvalue of an $n\times n$ matrix undergoes a small perturbation then the determinant, being the product of the eigenvalues, undergoes a perturbation up to about $n$ times as large. From (1) we see that a change to $F_n$ of order $\epsilon$ can perturb $\det(F_n)$ by $(n-1)!\epsilon$ , and it follows that some eigenvalues must undergo a large relative perturbation. The condition number of a simple eigenvalue is defined as the reciprocal of the cosine of the angle between its left and right eigenvectors. For the Frank matrix it is the small eigenvalues that are ill conditioned, as shown here for $n = 9$ .

>> n = 9; F = anymatrix('gallery/frank',n);
>> [V,D,cond_evals] = condeig(F); evals = diag(D);
>> [~,k] = sort(evals,'descend');
>> [evals(k) cond_evals(k)]
ans =
   2.2320e+01   1.9916e+00
   1.2193e+01   2.3903e+00
   6.1507e+00   1.5268e+00
   2.6729e+00   3.6210e+00
   1.0000e+00   6.8615e+01
   3.7412e-01   1.5996e+03
   1.6258e-01   1.1907e+04
   8.2016e-02   2.5134e+04
   4.4803e-02   1.4762e+04

Notes

Frank found that $F_n$ “gives our selected procedures difficulties” and that “accuracy was lost in the smaller roots”. The difficulties encountered by Frank’s codes were shown by Wilkinson to be caused by the inherent sensitivity of the eigenvalues to perturbations in the matrix.

References

This is a minimal set of references, which contain further useful references within.

Werner Frank, Computing Eigenvalues of Complex Matrices by Determinant Evaluation and by Methods of Danilewski and Wielandt, J. Soc. Indust. Appl. Math. 6(4), 378–392, 1958.
Efruz Özlem Mersin and Mustafa Bahçsi, Sturm Theorem for the Generalized Frank Matrix, Hacettepe Journal of Mathematics and Statistics 50, 1002–1011, 2021.
J. H. Wilkinson, Error Analysis of Floating-Point Computation, Numer. Math. 2(1), 319–340, 1960.
J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, 1965, Section 5.45.

What Is the Logarithmic Norm?

The logarithmic norm of a matrix $A\in\mathbb{C}^{n\times n}$ (also called the logarithmic derivative) is defined by

$\notag \mu(A) = \displaystyle\lim_{\epsilon \to 0+} \frac{ \|I + \epsilon A\| - 1}{\epsilon},$

where the norm is assumed to satisfy $\|I\| = 1$ .

Note that the limit is taken from above. If we take the limit from below then we obtain a generally different quantity: writing $\delta = -\epsilon$ ,

$\notag \displaystyle\lim_{\epsilon \to 0-} \frac{ \|I + \epsilon A\| - 1}{\epsilon} = \lim_{\delta \to 0+} \frac{ \|I - \delta A\| - 1}{-\delta} = -\mu(-A).$

The logarithmic norm is not a matrix norm; indeed it can be negative: $\mu(-I) = -1$ .

The logarithmic norm can also be expressed in terms of the matrix exponential.

Lemma 1. For $A\in\mathbb{C}^{n\times n}$ ,

$\notag \mu(A) = \displaystyle\lim_{\epsilon\to 0+} \frac{\log\, \| \mathrm{e}^{\epsilon A}\|} {\epsilon} = \lim_{\epsilon\to 0+} \frac{\| \mathrm{e}^{\epsilon A}\| - 1} {\epsilon}.$

Properties

We give some basic properties of the logarithmic norm.

It is easy to see that

$\notag -\|A\| \le \mu(A) \le \|A\|. \qquad (1)$

For $z\in\mathbb{C}$ , we define $\mathrm{sign}(z) = z/|z|$ for $z\ne 0$ and $\mathrm{sign}(0) = 0$ .

Lemma 2. For $A,B\in\mathbb{C}^{n\times n}$ and $\lambda\in\mathbb{C}$ ,

$\mu(\lambda A) = |\lambda| \mu\bigl(\mathrm{sign}(\lambda)A\bigr)$ ,

$\mu(A + \lambda I) = \mu(A) + \mathrm{Re}\,\lambda$ ,

$\mu(A + B ) \le \mu(A) + \mu(B)$ ,

$|\mu(A) - \mu(B)| \le \|A - B\|$ .

The next result shows the crucial property that $\mu(A)$ features in an easily evaluated bound for the norm of $\mathrm{e}^{At}$ and that, moreover, $\mu(A)$ is the smallest constant that can appear in this bound.

Theorem 3. For $A\in\mathbb{C}^{n\times n}$ and a consistent matrix norm,

$\notag \|\mathrm{e}^{At}\| \le \mathrm{e}^{\mu(A)t}, \quad t\ge0. \qquad (2)$

Moreover,

$\notag \mu(A) = \min\{\, \theta\in\mathbb{R}: \|\mathrm{e}^{At}\| \le \mathrm{e}^{\theta t} ~\mathrm{for~all}~t\ge 0 \,\}.$

Proof. Given any $\delta > 0$ , by Lemma 1 there exists $h > 0$ such that

$\notag \displaystyle\frac{ \| \mathrm{e}^{At}\| - 1}{t} - \mu(A) < \delta, \quad t\in[0,h],$

or

$\notag \| \mathrm{e}^{At}\| \le 1 + (\mu(A) + \delta)t \le \mathrm{e}^{(\mu(A) + \delta)t}, \quad t\in[0,h]$

(since $\mathrm{e}^x \ge 1+x$ for all $x$ ). Then for any integer $k$ , $\|\mathrm{e}^{Atk}\| = \|( \mathrm{e}^{At})^k\| \le \| \mathrm{e}^{At}\|^k \le \mathrm{e}^{(\mu(A) + \delta)tk}$ , and hence $\|\mathrm{e}^{At}\| \le \mathrm{e}^{(\mu(A) + \delta)t}$ holds for all $t\in[0,\infty)$ . Since $\delta$ is arbitrary, it follows that $\|\mathrm{e}^{At}\| \le \mathrm{e}^{\mu(A)t}$ .

Finally, if $\| \mathrm{e}^{At}\| \le \mathrm{e}^{\theta t}$ for all $t\ge 0$ then $(\| \mathrm{e}^{At}\| -1)/t\le (\mathrm{e}^{\theta t}-1)/t$ for all $t\ge0$ and taking $\lim_{t\to 0+}$ we conclude that $\mu(A) \le (d/dt) \mathrm{e}^{\theta t}\mid_{t = 0} \,= \theta$ .

The logarithmic norm was introduced by Dahlquist (1958) and Lozinskii (1958) in order to obtain error bounds for numerical methods for solving differential equations. The bound (2) can alternatively be proved by using differential inequalities (see Söderlind (2006)). Not only is (2) sharper than $\|\mathrm{e}^{At}\| \le \mathrm{e}^{\|A\|t}$ , but $\mathrm{e}^{\|A\|t}$ is increasing in $t$ while $\mathrm{e}^{\mu(A)t}$ potentially decays, since $\mu(A) < 0$ is possible.

The spectral abscissa is defined by

$\notag \alpha(A) = \max \{\, \mathrm{Re} \lambda : \lambda \in \Lambda(A)\,\},$

where $\Lambda(A)$ denotes the spectrum of $A$ (the set of eigenvalues). Whereas the norm bounds the spectral radius ( $\rho(A) \le \|A\|$ ), the logarithmic norm bounds the spectral abscissa, as shown by the next result.

Theorem 4. For $A\in\mathbb{C}^{n\times n}$ and a consistent matrix norm,

$\notag -\mu(-A) \le \alpha(A) \le \mu(A).$

Combining (1) with (2) gives

$\notag -\|A\| \le -\mu(-A) \le \alpha(A) \le \mu(A) \le \|A\|.$

In view of Lemma 1, the logarithmic norm $\mu(A)$ can be expressed as the one-sided derivative of $\|\mathrm{e}^{tA}\|$ at $t = 0$ , so $\mu(A)$ determines the initial rate of change of $\|\mathrm{e}^{tA}\|$ (as well as providing the bound $\mathrm{e}^{\mu(A)t}$ for all $t$ ). The long-term behavior is determined by the spectral abscissa $\alpha(A)$ , since $\|\mathrm{e}^{tA}\| \to 0$ as $t\to\infty$ if and only if $\alpha(A) < 0$ , which can be proved using the Jordan canonical form.

The next result provides a bound on the norm of the inverse of a matrix in terms of the logarithmic norm.

Theorem 5. For a nonsingular matrix $A\in\mathbb{C}^{n\times n}$ and a subordinate matrix norm, if $\mu(A) < 0$ or $\mu(-A) < 0$ then

$\notag \|A^{-1}\| \le \displaystyle\frac{1}{\max\bigl(-\mu(A),-\mu(-A)\bigr)}. \qquad (3)$

Formulas for Logarithmic Norms

Explicit formulas are available for the logarithmic norm s corresponding to the $1$ , $2$ , and $\infty$ -norms.

Theorem 6. For $A\in\mathbb{C}^{n\times n}$ ,

$\notag \begin{aligned} \mu_1(A) &= \max_j \biggl( \sum_{i\ne j} |a_{ij}| + \mathrm{Re}\, a_{jj} \biggr ),\\ \mu_{\infty}(A) &= \max_i \biggl( \sum_{j\ne i} |a_{ij}| + \mathrm{Re}\, a_{ii} \biggr ),\\ \mu_2(A) &= \lambda_{\max}\Bigl( \frac{A+A^*}{2} \Bigr), \qquad (4) \end{aligned}$

where $\lambda_{\max}$ denotes the largest eigenvalue of a Hermitian matrix.

Proof. We have

$\notag \begin{aligned} \mu_{\infty}(A) &= \lim_{\epsilon \to 0+} \frac{ \|I + \epsilon A\|_{\infty} - 1}{\epsilon}\\ &= \lim_{\epsilon \to 0+} \frac{ \max_i \bigl(\sum_{j\ne i} |\epsilon a_{ij}| + |1 + \epsilon a_{ii}| \bigr) -1}{\epsilon}\\ &= \lim_{\epsilon \to 0+} \frac{ \max_i \bigl(\sum_{j\ne i} |\epsilon a_{ij}| + \sqrt{(1 + \epsilon\mathrm{Re}\, a_{ii})^2 + (\epsilon\mathrm{Im}\,a_{ii})^2}\, \bigr) -1}{\epsilon}\\ &= \lim_{\epsilon \to 0+} \frac{ \max_i \bigl( \sum_{j\ne i} |\epsilon a_{ij}| + \epsilon\mathrm{Re}\, a_{ii} + O(\epsilon^2) \bigr)}{\epsilon}\\ &= \max_i \biggl( \sum_{j\ne i} |a_{ij}| + \mathrm{Re}\, a_{ii} \biggr ). \end{aligned}$

The formula for $\mu_1(A)$ follows, since $\|A\|_1 = \|A^*\|_\infty$ implies $\mu_1(A) = \mu_{\infty}(A^*)$ . For the $2$ -norm, using $\|A\|_2 = \rho(A^*A)^{1/2}$ , we have

$\notag \begin{aligned} \mu_2(A) &= \lim_{\epsilon \to 0+} \frac{ \|I + \epsilon A\|_2 - 1}{\epsilon}\\ &= \lim_{\epsilon \to 0+} \frac{ \rho\bigl( I + \epsilon(A+A^*) + \epsilon^2 A^*A\bigr)^{1/2} -1} {\epsilon}\\ &= \lim_{\epsilon \to 0+} \frac{ 1 + \frac{1}{2}\epsilon\lambda_{\max}(A+A^*) + O(\epsilon^2) -1} {\epsilon}\\ &= \lambda_{\max}\Bigl( \frac{A+A^*}{2} \Bigr). \end{aligned}$

As a special case of (4), if $A$ is normal, so that $A = QDQ^*$ with $Q$ unitary and $D = \mathrm{diag}(\lambda_i)$ , then $\mu_2(A) = \max_i (\lambda_i + \overline{\lambda_i})/2 = \max_i \mathrm{Re} \lambda_i = \alpha(A)$ .

We can make some observations about $\mu_\infty(A)$ for $A\in\mathbb{R}^{n\times n}$ .

If $A\ge 0$ then $\mu_\infty(A) = \|A\|_\infty$ .
$\mu_\infty(A) < 0$ if and only if $a_{ii} < 0$ for all $i$ and $A$ is strictly row diagonally dominant.
For the $\infty$ -norm the bound (3) is the same as a bound based on diagonal dominance except that it is applicable only when the diagonal is one-signed.

For a numerical example, consider the $n\times n$ tridiagonal matrix anymatrix('gallery/lesp'), which has the form illustrated for $n = 6$ by

$\notag A_6 = \left[\begin{array}{cccccc} -5 & 2 & 0 & 0 & 0 & 0\\ \frac{1}{2} & -7 & 3 & 0 & 0 & 0\\ 0 & \frac{1}{3} & -9 & 4 & 0 & 0\\ 0 & 0 & \frac{1}{4} & -11 & 5 & 0\\ 0 & 0 & 0 & \frac{1}{5} & -13 & 6\\ 0 & 0 & 0 & 0 & \frac{1}{6} & -15 \end{array}\right].$

We find that $\alpha(A_6) = -4.55$ and $\mu_2(A_6) = -4.24$ , and it is easy to see that $\mu_1(A_n) = -4.5$ and $\mu_{\infty}(A_n) = -3$ for all $n$ . Therefore Theorem 4 shows that $\mathrm{e}^{At}\to 0$ as $t\to \infty$ and $\mu_1$ gives a faster decaying bound than $\mu_2$ and $\mu_\infty$ .

Now consider the subordinate matrix norm $\|\cdot\|_G$ based on the vector norm $\|x\|_G = (x^*Gx)^{1/2}$ , where $G$ is a Hermitian positive definite matrix. The corresponding logarithmic norm $\mu_G$ can be expressed as the largest eigenvalue of a Hermitian definite generalized eigenvalue problem.

Theorem 7. For $A\in\mathbb{C}^{n\times n}$ ,

$\notag \mu_G(A) = \max\{\, \lambda: \det(GA + A^*G - 2\lambda G) = 0 \,\}.$

Theorem 7 allows us to make a connection with the theory of ordinary differential equations (ODEs)

$\notag y' = f(t,y), \quad f: \mathbb{R} \times \mathbb{R}^n\to \mathbb{R}^n. \qquad (5)$

Let $G\in\mathbb{R}^{n\times n}$ be symmetric positive definite and consider the inner product $\langle x, y \rangle = x^*Gy$ and the corresponding norm defined by $\|x\|_G^2 = \langle x, x \rangle = (x^*Gx)^{1/2}$ . It can be shown that for $A\in\mathbb{R}^{n\times }$ ,

$\notag \mu_G(A) = \max_x \displaystyle\frac{\langle Ax, x\rangle}{\|x\|_G^2}. \qquad (6)$

The function $f$ satisfies a one-sided Lipschitz condition if there is a function $v(t)$ such that

$\langle f(t,y) - f(t,z), y - z \rangle \le v(t) \|y-z\|^2$

for all $y,z$ in some region and all $a\le t \le b$ . For the linear differential equation with $f(t,y) = A(t)y$ in (5), using (6) we obtain

$\langle f(t,y) - f(t,z), y - z \rangle = \langle A(t)(y-z), y - z \rangle \le \mu_G(A(t)) \|y-z\|_G^2,$

and so the logarithmic norm $\mu_G(A(t))$ can be taken as a one-sided Lipschitz constant. This observation leads to results on contractivity of ODEs; see Lambert (1991) for details.

Notes

For more on the use of the logarithmic norm in differential equations see Coppel (1965) and Söderlind (2006).

The proof of Theorem 3 is from Hinrichsen and Pritchard (2005).

References

This is a minimal set of references, which contain further useful references within.

W. A. Coppel, Stability and Asymptotic Behavior of Differential Equations}. D. C. Heath and Company, Boston, MA. USA, 1965.
Germund Dahlquist. Stability and Error Bounds in the Numerical Integration of Ordinary Differential Equations. PhD thesis, Royal Inst. of Technology, Stockholm, Sweden, 1958.
Diederich Hinrichsen and Anthony J. Pritchard. Mathematical Systems Theory I. Modelling, State Space Analysis, Stability and Robustness. Springer-Verlag, Berlin, Germany, 2005.
J. D. Lambert. Numerical Methods for Ordinary Differential Systems. The Initial Value Problem. John Wiley, Chichester, 1991.
Gustaf Söderlind, The Logarithmic Norm. History and Modern Theory. BIT, 46(3):631–652, 2006.
Torsten Ström. On Logarithmic Norms. SIAM J. Numer. Anal., 12(5):741–753, 1975.

Notes

References

Share this:

Notes

References

Related Blog Posts

Share this:

References

Related Blog Posts

Share this:

Notes

References

Related Blog Posts

Share this:

Connections with the Determinant

Conditions for Nonsingularity

The Inverse as a Matrix Polynomial

To Compute or Not to Compute the Inverse

Slash Notation

Rectangular Matrices

An Interesting Inverse

Related Blog Posts

Share this:

Square Matrix:

Overdetermined System:

Underdetermined System:

A\A

Minimum Frobenius Norm Solution

Related Blog Posts

Share this:

Real Jordan Form

Notes

References

Share this:

Cholesky Factorization

Determinant, Inverse, Condition Number

Eigenvalues and Eigenvectors

Variations

Notes

References

Related Blog Posts

Share this:

Determinant

Inverse and Condition Number

Eigenvalues

Notes

References

Related Blog Posts

Share this:

Properties

Formulas for Logarithmic Norms

Notes

References

Related Blog Posts

Share this:

Square Matrix: $m = n$

Overdetermined System: $m > n$

Underdetermined System: $m < n$