The Ten Most-Viewed Posts of 2022


According to WordPress statistics, this blog received over 192,000 visitors and 281,000 views in 2022. These are the ten most-viewed posts published during the year.

  1. Seven Sins of Numerical Linear Algebra
  2. What Is an Eigenvalue?
  3. The Big Six Matrix Factorizations
  4. What Is a Schur Decomposition?
  5. What Is a Permutation Matrix?
  6. What Is the Logarithmic Norm?
  7. What Is the Jordan Canonical Form?
  8. What Is the Frank Matrix?
  9. What Is a Circulant Matrix?
  10. What Is a Toeplitz Matrix?

Eight of the posts are from the What Is series, which now contains 83 articles; more will follow in 2023.

What Is a Stochastic Matrix?

A stochastic matrix is an n\times n matrix with nonnegative entries and unit row sums. If A\in\mathbb{R}^{n\times n} is stochastic then Ae = e, where e = [1,1,\dots,1]^T is the vector of ones. This means that e is an eigenvector of A corresponding to the eigenvalue 1.

The identity matrix is stochastic, as is any permutation matrix. Here are some other examples of stochastic matrices:

\notag
\begin{aligned}
  A_n &= n^{-1}ee^T, \quad \textrm{in particular}~~
    A_3 = \begin{bmatrix}
    \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt]
    \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt]
    \frac{1}{3} & \frac{1}{3} & \frac{1}{3}
    \end{bmatrix}, \qquad (1)\\
  B_n &= \frac{1}{n-1}(ee^T - I), \quad \textrm{in particular}~~
    B_3 = \begin{bmatrix}
              0 & \frac{1}{2} & \frac{1}{2}\\[2pt]
    \frac{1}{2} & 0           & \frac{1}{2}\\[2pt]
    \frac{1}{2} & \frac{1}{2} & 0
    \end{bmatrix}, \qquad (2)\\
  C_n &= \begin{bmatrix}
             1         &                   &           &    \\
      \frac{1}{2}      &  \frac{1}{2}      &           &    \\
      \vdots           &  \vdots           &\ddots     &    \\
      \frac{1}{n}      &  \frac{1}{n}      &\cdots     &  \frac{1}{n}
      \end{bmatrix}. \qquad (3)
\end{aligned}
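As a quick numerical check, here is a short Python/NumPy sketch (NumPy assumed available; the helper name is ours) verifying that A_3, B_3, and C_3 are stochastic and that e is an eigenvector for the eigenvalue 1:

```python
import numpy as np

def is_stochastic(A, tol=1e-12):
    """Check for nonnegative entries and unit row sums (illustrative helper)."""
    return bool(np.all(A >= -tol) and np.allclose(A.sum(axis=1), 1.0))

n = 3
e = np.ones(n)
A3 = np.outer(e, e) / n                                   # equation (1)
B3 = (np.outer(e, e) - np.eye(n)) / (n - 1)               # equation (2)
C3 = np.tril(np.ones((n, n)) / np.arange(1, n + 1)[:, None])  # equation (3)

for M in (A3, B3, C3):
    assert is_stochastic(M)
    assert np.allclose(M @ e, e)  # Ae = e: e is an eigenvector for eigenvalue 1
```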

For any matrix A, the spectral radius \rho(A) = \max\{ |\lambda| : \lambda \mathrm{~is~an~eigenvalue~of~} A \} satisfies \rho(A) \le \|A\| for any consistent matrix norm. For a stochastic matrix, taking the \infty-norm (the maximum row sum of absolute values) gives \rho(A) \le 1, and since we know that 1 is an eigenvalue, \rho(A) = 1. It can be shown that 1 is a semisimple eigenvalue, that is, if there are k eigenvalues equal to 1 then there are k linearly independent eigenvectors corresponding to 1 (Meyer, 2000, p. 696).
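We can confirm numerically that \rho(A) = 1 for, say, B_4; a sketch in Python/NumPy (assumed available):

```python
import numpy as np

n = 4
B = (np.ones((n, n)) - np.eye(n)) / (n - 1)  # the matrix B_n from (2)
evals = np.linalg.eigvals(B)

rho = np.max(np.abs(evals))                  # spectral radius
assert np.isclose(rho, 1.0)                  # rho(B) = 1
assert np.isclose(np.max(evals.real), 1.0)   # and 1 itself is an eigenvalue
```

Here the eigenvalues of B_n are 1 (once) and -1/(n-1) (n-1 times), so the maximum modulus is exactly 1.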

A lower bound on the modulus of every eigenvalue can be obtained from Gershgorin’s theorem. The ith Gershgorin disc is defined by |\lambda - a_{ii}| \le \sum_{j\ne i} |a_{ij}| = 1 - a_{ii}, which implies |\lambda| \ge a_{ii} - |\lambda - a_{ii}| \ge 2a_{ii}-1. Every eigenvalue \lambda lies in the union of the n discs and so must satisfy

\notag         2\min_i a_{ii}-1 \le |\lambda| \le 1.

The lower bound is positive if A is strictly diagonally dominant by rows.
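To illustrate, here is a Python/NumPy sketch (NumPy assumed; the particular matrix is our own construction) with a strictly diagonally dominant stochastic matrix, for which the Gershgorin lower bound 2\min_i a_{ii}-1 is positive:

```python
import numpy as np

n = 4
B = (np.ones((n, n)) - np.eye(n)) / (n - 1)
A = 0.8 * np.eye(n) + 0.2 * B   # stochastic and strictly diagonally dominant

evals = np.linalg.eigvals(A)
lower = 2 * A.diagonal().min() - 1   # Gershgorin lower bound; here 0.6
assert lower > 0
assert np.all(np.abs(evals) >= lower - 1e-12)  # every |lambda| >= lower bound
assert np.all(np.abs(evals) <= 1 + 1e-12)      # and |lambda| <= 1
```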

If A and B are stochastic then AB is nonnegative and ABe = Ae = e, so AB is stochastic. In particular, any positive integer power of A is stochastic. Does A^k converge as k\to\infty? The answer is that it does, and the limit is stochastic, as long as 1 is the only eigenvalue of modulus 1, and this will be the case if all the elements of A are positive (by Perron’s theorem). The simplest example of non-convergence is the stochastic matrix

\notag    A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},

which has eigenvalues 1 and -1. Since A^2 = I, all even powers are equal to I and all odd powers are equal to A. For the matrix (1), A_n^k = A_n for all k, while for (2), B_n^k \to A_n as k\to\infty. For (3), C_n^k converges to the matrix with 1 in every entry of the first column and zeros everywhere else.
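These limits are easy to check numerically; a Python/NumPy sketch (NumPy assumed):

```python
import numpy as np

n = 4
e = np.ones(n)
An = np.outer(e, e) / n
Bn = (np.outer(e, e) - np.eye(n)) / (n - 1)

# B_n^k -> A_n: the subdominant eigenvalue -1/(n-1) decays geometrically.
assert np.allclose(np.linalg.matrix_power(Bn, 50), An)

# A_n is idempotent, so A_n^k = A_n for all k.
assert np.allclose(An @ An, An)

# The 2x2 example oscillates: even powers are I, odd powers are A.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
assert np.allclose(np.linalg.matrix_power(A, 2), np.eye(2))
assert np.allclose(np.linalg.matrix_power(A, 3), A)
```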

Stochastic matrices arise in Markov chains. A transition matrix for a finite homogeneous Markov chain is a matrix whose (i,j) element is the probability of moving from state i to state j over a time step. It has nonnegative entries and the rows sum to 1, so it is a stochastic matrix. In applications including finance and healthcare, a transition matrix may be estimated for a certain time period, say one year, but a transition matrix for a shorter period, say one month, may be needed. If A is a transition matrix for a time period p then a stochastic pth root of A, which is a stochastic solution X of the equation X^p = A, is a transition matrix for a time period a factor p smaller. Therefore p = 12 (years to months) and p = 7 (weeks to days) are among the values of interest. Unfortunately, a stochastic pth root may not exist. For example, the matrix

\notag \begin{bmatrix}    0 & 1 & 0 \\    0 & 0 & 1 \\    0 & 0 & 1 \end{bmatrix}

has no pth roots at all, let alone stochastic ones. Yet many stochastic matrices do have stochastic roots. The matrix (3) has a stochastic pth root for all p, as shown by Higham and Lin (2011), who give a detailed analysis of pth roots of stochastic matrices. For example, to four decimal places,

\notag
C_6^{1/7} = \begin{bmatrix}
1.000  &        &        &        &        &        \\
0.094  & 0.906  &        &        &        &        \\
0.043  & 0.102  & 0.855  &        &        &        \\
0.027  & 0.050  & 0.103  & 0.820  &        &        \\
0.019  & 0.032  & 0.052  & 0.103  & 0.795  &        \\
0.014  & 0.023  & 0.034  & 0.053  & 0.102  & 0.774  \\
\end{bmatrix}.
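One way to compute a principal pth root numerically is SciPy's fractional_matrix_power (a sketch, assuming SciPy is available; for the matrix C_n the principal root is the stochastic one, by the Higham–Lin analysis):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

n, p = 3, 7
C = np.tril(np.ones((n, n)) / np.arange(1, n + 1)[:, None])  # the matrix (3)

X = fractional_matrix_power(C, 1.0 / p).real  # principal pth root of C

assert np.allclose(np.linalg.matrix_power(X, p), C)  # X^p = C
assert np.all(X >= -1e-12)                  # entries are nonnegative
assert np.allclose(X.sum(axis=1), 1.0)      # unit row sums: X is stochastic
```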

A stochastic matrix is sometimes called a row-stochastic matrix to distinguish it from a column-stochastic matrix, which is a nonnegative matrix for which e^TA = e^T (so that A^T is row-stochastic). A matrix that is both row-stochastic and column-stochastic is called doubly stochastic. A permutation matrix is an example of a doubly stochastic matrix. If U is a unitary matrix then the matrix with a_{ij} = |u_{ij}|^2 is doubly stochastic. A magic square scaled by the magic sum is also doubly stochastic. For example,

>> M = magic(4), A = M/sum(M(1,:)) % A is doubly stochastic.
M =
    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1
A =
   4.7059e-01   5.8824e-02   8.8235e-02   3.8235e-01
   1.4706e-01   3.2353e-01   2.9412e-01   2.3529e-01
   2.6471e-01   2.0588e-01   1.7647e-01   3.5294e-01
   1.1765e-01   4.1176e-01   4.4118e-01   2.9412e-02
>> [sum(A) sum(A')]
ans =
     1     1     1     1     1     1     1     1
>> eig(A)'
ans =
   1.0000e+00   2.6307e-01  -2.6307e-01   8.5146e-18
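The unitary construction is just as easy to check; a Python/NumPy sketch (NumPy assumed), building a random unitary matrix via a QR factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
# A unitary U from the QR factorization of a random complex matrix.
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(Z)

A = np.abs(U) ** 2   # a_ij = |u_ij|^2
assert np.allclose(A.sum(axis=1), 1.0)  # unit row sums
assert np.allclose(A.sum(axis=0), 1.0)  # unit column sums: doubly stochastic
```

This works because the rows and columns of a unitary matrix have unit 2-norm.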


  • Nicholas J. Higham and Lijing Lin, On pth Roots of Stochastic Matrices, Linear Algebra Appl. 435, 448–463, 2011.
  • Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.


What Is the Inertia of a Matrix?

The inertia of a real symmetric n\times n matrix A is a triple, written \mathrm{In}(A) = (i_+(A),i_-(A),i_0(A)), where i_+(A) is the number of positive eigenvalues of A, i_-(A) is the number of negative eigenvalues of A, and i_0(A) is the number of zero eigenvalues of A.

The rank of A is i_+(A) + i_-(A). The difference i_+(A) - i_-(A) is called the signature.
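These quantities are simple to compute from the eigenvalues; a Python/NumPy sketch (NumPy assumed, example matrix our own):

```python
import numpy as np

A = np.diag([3.0, -2.0, 0.0, 5.0])   # eigenvalues are the diagonal entries
evals = np.linalg.eigvalsh(A)

tol = 1e-12
i_plus  = int(np.sum(evals >  tol))
i_minus = int(np.sum(evals < -tol))
i_zero  = int(np.sum(np.abs(evals) <= tol))

assert (i_plus, i_minus, i_zero) == (2, 1, 1)        # the inertia In(A)
assert i_plus + i_minus == np.linalg.matrix_rank(A)  # rank = i_+ + i_-
signature = i_plus - i_minus                         # here 2 - 1 = 1
```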

In general it is not possible to determine the inertia by inspection, but some deductions can be made. If A has both positive and negative diagonal elements then i_+(A) \ge 1 and i_-(A) \ge 1. But in general the diagonal elements do not tell us much about the inertia. For example, here is a matrix that has positive diagonal elements but only one positive eigenvalue (and this example works for any n):

>> n = 4; A = -eye(n) + 2*ones(n), eigA = eig(sym(A))'
A =
     1     2     2     2
     2     1     2     2
     2     2     1     2
     2     2     2     1
eigA =
[-1, -1, -1, 7]

A congruence transformation of a symmetric matrix A is a transformation A \to X^T\!AX for a nonsingular matrix X. The result is clearly symmetric. Sylvester’s law of inertia (1852) says that the inertia is preserved under congruence transformations.

Theorem 1 (Sylvester’s law of inertia).

If A\in\mathbb{R}^{n\times n} is symmetric and X\in\mathbb{R}^{n\times n} is nonsingular then \mathrm{In}(A) = \mathrm{In}(X^T\!AX).

Sylvester’s law gives a way to determine the inertia without computing eigenvalues: find a congruence transformation that transforms A to a matrix whose inertia can be easily determined. A factorization PAP^T = LDL^T does the job, where P is a permutation matrix, L is unit lower triangular, and D is diagonal. Then \mathrm{In}(A) = \mathrm{In}(D), and \mathrm{In}(D) can be read off the diagonal of D. This factorization does not always exist, and if it does exist it can be numerically unstable. A block LDL^T factorization, in which D is block diagonal with diagonal blocks of size 1 or 2, always exists, and its computation is numerically stable with a suitable pivoting strategy such as symmetric rook pivoting.

For the matrix above we can compute a block LDL^T factorization using the MATLAB ldl function:

>> [L,D,P] = ldl(A); D
D =
   1.0000e+00   2.0000e+00            0            0
   2.0000e+00   1.0000e+00            0            0
            0            0  -1.6667e+00            0
            0            0            0  -1.4000e+00

Since the leading 2-by-2 block of D has negative determinant and hence one positive eigenvalue and one negative eigenvalue, it follows that A has one positive eigenvalue and three negative eigenvalues.
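The same computation can be sketched in Python with scipy.linalg.ldl, counting eigenvalue signs block by block (SciPy assumed available; the helper inertia is ours):

```python
import numpy as np
from scipy.linalg import ldl

def inertia(A):
    """Inertia of symmetric A from a block LDL^T factorization (illustrative helper)."""
    _, D, _ = ldl(A)
    pos = neg = zero = 0
    i, n = 0, A.shape[0]
    while i < n:
        if i + 1 < n and D[i + 1, i] != 0:     # 2-by-2 diagonal block
            evals = np.linalg.eigvalsh(D[i:i+2, i:i+2])
            pos += int(np.sum(evals > 0))
            neg += int(np.sum(evals < 0))
            i += 2
        else:                                   # 1-by-1 diagonal block
            d = D[i, i]
            pos += int(d > 0); neg += int(d < 0); zero += int(d == 0)
            i += 1
    return pos, neg, zero

n = 4
A = -np.eye(n) + 2 * np.ones((n, n))
assert inertia(A) == (1, 3, 0)   # one positive, three negative eigenvalues
```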

A congruence transformation preserves the signs of the eigenvalues but not their magnitudes. A result of Ostrowski (1959) bounds the ratios of the eigenvalues of the original and transformed matrices. Let the eigenvalues of a symmetric matrix be ordered \lambda_n \le \cdots \le \lambda_1.

Theorem 2 (Ostrowski).

For a symmetric A\in \mathbb{R}^{n\times n} and X\in\mathbb{R}^{n\times n},

\lambda_k(X^T\!AX) = \theta_k \lambda_k(A), \quad k=1\colon n,

where \lambda_n(X^TX) \le \theta_k \le \lambda_1(X^TX).

The theorem shows that the further X is from being orthogonal the greater the potential change in the eigenvalues.
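Ostrowski's bounds are easy to verify numerically; a Python/NumPy sketch (NumPy assumed) with a random symmetric A and congruence factor X:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # random symmetric matrix
X = rng.standard_normal((n, n))                      # congruence factor

lamA = np.sort(np.linalg.eigvalsh(A))[::-1]          # lambda_1 >= ... >= lambda_n
lamT = np.sort(np.linalg.eigvalsh(X.T @ A @ X))[::-1]
s = np.linalg.eigvalsh(X.T @ X)                      # ascending eigenvalues of X^T X

theta = lamT / lamA                                  # the ratios theta_k
assert np.all(theta >= s[0]  - 1e-8)  # theta_k >= lambda_n(X^T X)
assert np.all(theta <= s[-1] + 1e-8)  # theta_k <= lambda_1(X^T X)
```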

Finally, we note that everything here generalizes to complex Hermitian matrices by replacing transpose by conjugate transpose.


This article is part of the “What Is” series, available from and in PDF form from the GitHub repository