# The Ten Most-Viewed Posts of 2022

According to the WordPress statistics, this blog received over 192,000 visitors and 281,000 views in 2022. These are the ten most-viewed posts published during the year.

Eight of the posts are from the What Is series, which now contains 83 articles; more will follow in 2023.

# What Is a Stochastic Matrix?

A stochastic matrix is an $n\times n$ matrix with nonnegative entries and unit row sums. If $A\in\mathbb{R}^{n\times n}$ is stochastic then $Ae = e$, where $e = [1,1,\dots,1]^T$ is the vector of ones. This means that $e$ is an eigenvector of $A$ corresponding to the eigenvalue $1$.

The identity matrix is stochastic, as is any permutation matrix. Here are some other examples of stochastic matrices:

$\notag \begin{aligned} A_n &= n^{-1}ee^T, \quad \textrm{in particular}~~ A_3 = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt] \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\[3pt] \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}, \qquad (1)\\ B_n &= \frac{1}{n-1}(ee^T -I), \quad \textrm{in particular}~~ B_3 = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2}\\[2pt] \frac{1}{2} & 0 & \frac{1}{2}\\[2pt] \frac{1}{2} & \frac{1}{2} & 0 \end{bmatrix}, \qquad (2)\\ C_n &= \begin{bmatrix} 1 & & & \\ \frac{1}{2} & \frac{1}{2} & & \\ \vdots & \vdots &\ddots & \\ \frac{1}{n} & \frac{1}{n} &\cdots & \frac{1}{n} \end{bmatrix}. \qquad (3) \end{aligned}$
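These definitions are easy to check numerically. Here is a quick NumPy sketch (the function names simply mirror equations (1)–(3)) that builds the three matrices and verifies that each has nonnegative entries and unit row sums:

```python
import numpy as np

def A_mat(n):   # equation (1): n^{-1} e e^T
    return np.ones((n, n)) / n

def B_mat(n):   # equation (2): (e e^T - I)/(n - 1)
    return (np.ones((n, n)) - np.eye(n)) / (n - 1)

def C_mat(n):   # equation (3): row i has i entries equal to 1/i
    return np.tril(np.ones((n, n))) / np.arange(1, n + 1)[:, None]

for M in (A_mat(3), B_mat(3), C_mat(6)):
    assert np.all(M >= 0)                    # nonnegative entries
    assert np.allclose(M.sum(axis=1), 1)     # unit row sums
```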

For any matrix $A$, the spectral radius $\rho(A) = \max\{ |\lambda| : \lambda \mathrm{~is~an~eigenvalue~of~} A \}$ is bounded by $\rho(A) \le \|A\|$ for any consistent matrix norm. For a stochastic matrix, taking the $\infty$-norm (the maximum row sum of absolute values) gives $\rho(A) \le 1$, so since we know that $1$ is an eigenvalue, $\rho(A) = 1$. It can be shown that $1$ is a semisimple eigenvalue, that is, if there are $k$ eigenvalues equal to $1$ then there are $k$ linearly independent eigenvectors corresponding to $1$ (Meyer, 2000, p. 696).

A lower bound on the spectral radius can be obtained from Gershgorin’s theorem. The $i$th Gershgorin disc is defined by $|\lambda - a_{ii}| \le \sum_{j\ne i} |a_{ij}| = 1 - a_{ii}$, which implies $|\lambda| \ge a_{ii} - (1 - a_{ii}) = 2a_{ii}-1$. Every eigenvalue $\lambda$ lies in the union of the $n$ discs and so must satisfy

$\notag 2\min_i a_{ii}-1 \le |\lambda| \le 1.$

The lower bound is positive if $A$ is strictly diagonally dominant by rows.
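As a small numerical illustration, here is a strictly row diagonally dominant stochastic matrix (diagonal $0.6$, equal off-diagonal entries, a choice made purely for this sketch) for which the Gershgorin lower bound $2\min_i a_{ii} - 1 = 0.2$ can be checked against the computed eigenvalue moduli:

```python
import numpy as np

n = 4
# Strictly row diagonally dominant stochastic matrix:
# diagonal 0.6, off-diagonal entries 0.4/(n-1).
A = 0.6 * np.eye(n) + 0.4 * (np.ones((n, n)) - np.eye(n)) / (n - 1)

lower = 2 * np.min(np.diag(A)) - 1        # Gershgorin lower bound, here 0.2
moduli = np.abs(np.linalg.eigvals(A))
assert np.all(moduli >= lower - 1e-12) and np.all(moduli <= 1 + 1e-12)
```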

If $A$ and $B$ are stochastic then $AB$ is nonnegative and $ABe = Ae = e$, so $AB$ is stochastic. In particular, any positive integer power of $A$ is stochastic. Does $A^k$ converge as $k\to\infty$? The answer is that it does, and the limit is stochastic, as long as $1$ is the only eigenvalue of modulus $1$, and this will be the case if all the elements of $A$ are positive (by Perron’s theorem). The simplest example of non-convergence is the stochastic matrix

$\notag A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$

which has eigenvalues $1$ and $-1$. Since $A^2 = I$, all even powers are equal to $I$ and all odd powers are equal to $A$. For the matrix (1), $A_n^k = A_n$ for all $k$, while for (2), $B_n^k \to A_n$ as $k\to\infty$. For (3), $C_n^k$ converges to the matrix with $1$ in every entry of the first column and zeros everywhere else.
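These limits are easy to observe numerically. The following NumPy sketch checks the three claims for $n = 4$, with the powers simply taken large enough for convergence to rounding level:

```python
import numpy as np
from numpy.linalg import matrix_power

n = 4
An = np.ones((n, n)) / n                                       # matrix (1)
Bn = (np.ones((n, n)) - np.eye(n)) / (n - 1)                   # matrix (2)
Cn = np.tril(np.ones((n, n))) / np.arange(1, n + 1)[:, None]   # matrix (3)

assert np.allclose(matrix_power(An, 5), An)    # A_n is idempotent
assert np.allclose(matrix_power(Bn, 100), An)  # B_n^k -> A_n

limit = np.zeros((n, n)); limit[:, 0] = 1      # ones in the first column
assert np.allclose(matrix_power(Cn, 200), limit)
```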

Stochastic matrices arise in Markov chains. A transition matrix for a finite homogeneous Markov chain is a matrix whose $(i,j)$ element is the probability of moving from state $i$ to state $j$ over a time step. It has nonnegative entries and the rows sum to $1$, so it is a stochastic matrix. In applications including finance and healthcare, a transition matrix may be estimated for a certain time period, say one year, but a transition matrix for a shorter period, say one month, may be needed. If $A$ is a transition matrix for a time period $p$ then a stochastic $p$th root of $A$, which is a stochastic solution $X$ of the equation $X^p = A$, is a transition matrix for a time period a factor $p$ smaller. Therefore $p = 12$ (years to months) and $p = 7$ (weeks to days) are among the values of interest. Unfortunately, a stochastic $p$th root may not exist. For example, the matrix

$\notag \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$

has no $p$th roots at all, let alone stochastic ones. Yet many stochastic matrices do have stochastic roots. The matrix (3) has a stochastic $p$th root for all $p$, as shown by Higham and Lin (2011), who give a detailed analysis of $p$th roots of stochastic matrices. For example, to four decimal places,

$\notag C_6^{1/7} = \begin{bmatrix} 1.000 & & & & & \\ 0.094 & 0.906 & & & & \\ 0.043 & 0.102 & 0.855 & & & \\ 0.027 & 0.050 & 0.103 & 0.820 & & \\ 0.019 & 0.032 & 0.052 & 0.103 & 0.795 & \\ 0.014 & 0.023 & 0.034 & 0.053 & 0.102 & 0.774 \\ \end{bmatrix}.$
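One way to reproduce this computation is with SciPy's fractional_matrix_power, which returns the principal $p$th root. The principal root of a stochastic matrix is not stochastic in general, but for $C_6$ it is, as the following sketch checks:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

n = 6
C = np.tril(np.ones((n, n))) / np.arange(1, n + 1)[:, None]   # matrix (3)

X = np.real(fractional_matrix_power(C, 1 / 7))  # principal 7th root

assert np.all(X >= -1e-10)                        # nonnegative entries
assert np.allclose(X.sum(axis=1), 1)              # unit row sums
assert np.allclose(np.linalg.matrix_power(X, 7), C)
```

The (2,2) entry is $(1/2)^{1/7} \approx 0.906$, matching the displayed matrix.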

A stochastic matrix is sometimes called a row-stochastic matrix to distinguish it from a column-stochastic matrix, which is a nonnegative matrix for which $e^TA = e^T$ (so that $A^T$ is row-stochastic). A matrix that is both row-stochastic and column-stochastic is called doubly stochastic. A permutation matrix is an example of a doubly stochastic matrix. If $U$ is a unitary matrix then the matrix with $a_{ij} = |u_{ij}|^2$ is doubly stochastic. A magic square scaled by the magic sum is also doubly stochastic. For example,

>> M = magic(4), A = M/sum(M(1,:)) % A is doubly stochastic.
M =
    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1
A =
   4.7059e-01   5.8824e-02   8.8235e-02   3.8235e-01
   1.4706e-01   3.2353e-01   2.9412e-01   2.3529e-01
   2.6471e-01   2.0588e-01   1.7647e-01   3.5294e-01
   1.1765e-01   4.1176e-01   4.4118e-01   2.9412e-02
>> [sum(A) sum(A')]
ans =
     1     1     1     1     1     1     1     1
>> eig(A)'
ans =
   1.0000e+00   2.6307e-01  -2.6307e-01   8.5146e-18
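The unitary example is just as easy to check: take the unitary factor from a QR factorization of a random complex matrix and form the squared moduli (a NumPy sketch; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(Z)            # unitary factor of a QR factorization

A = np.abs(U) ** 2                # a_ij = |u_ij|^2
# Rows and columns of a unitary matrix have unit 2-norm,
# so A is doubly stochastic.
assert np.allclose(A.sum(axis=0), 1) and np.allclose(A.sum(axis=1), 1)
```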


# What Is the Inertia of a Matrix?

The inertia of a real symmetric $n\times n$ matrix $A$ is a triple, written $\mathrm{In}(A) = (i_+(A),i_-(A),i_0(A))$, where $i_+(A)$ is the number of positive eigenvalues of $A$, $i_-(A)$ is the number of negative eigenvalues of $A$, and $i_0(A)$ is the number of zero eigenvalues of $A$.

The rank of $A$ is $i_+(A) + i_-(A)$. The difference $i_+(A) - i_-(A)$ is called the signature.

In general it is not possible to determine the inertia by inspection, but some deductions can be made. If $A$ has both positive and negative diagonal elements then $i_+(A) \ge 1$ and $i_-(A) \ge 1$. But in general the diagonal elements do not tell us much about the inertia. For example, here is a matrix that has positive diagonal elements but only one positive eigenvalue (and this example works for any $n$):

>> n = 4; A = -eye(n) + 2*ones(n), eigA = eig(sym(A))'
A =
     1     2     2     2
     2     1     2     2
     2     2     1     2
     2     2     2     1
eigA =
[-1, -1, -1, 7]


A congruence transformation of a symmetric matrix $A$ is a transformation $A \to X^T\!AX$ for a nonsingular matrix $X$. The result is clearly symmetric. Sylvester’s law of inertia (1852) says that the inertia is preserved under congruence transformations.

Theorem 1 (Sylvester’s law of inertia).

If $A\in\mathbb{R}^{n\times n}$ is symmetric and $X\in\mathbb{R}^{n\times n}$ is nonsingular then $\mathrm{In}(A) = \mathrm{In}(X^T\!AX)$.
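A quick numerical illustration of the theorem: compare the inertia of a random symmetric matrix with that of a random congruence of it (a NumPy sketch; the helper inertia simply counts eigenvalue signs with a tolerance):

```python
import numpy as np

def inertia(A, tol=1e-10):
    """Return (i_+, i_-, i_0), counting eigenvalue signs with a tolerance."""
    evals = np.linalg.eigvalsh(A)
    return (int(np.sum(evals > tol)),
            int(np.sum(evals < -tol)),
            int(np.sum(np.abs(evals) <= tol)))

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)); A = A + A.T   # random symmetric matrix
X = rng.standard_normal((5, 5))                # almost surely nonsingular

# Congruence preserves the inertia.
assert inertia(A) == inertia(X.T @ A @ X)
```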

Sylvester’s law gives a way to determine the inertia without computing eigenvalues: find a congruence transformation that transforms $A$ to a matrix whose inertia can be easily determined. A factorization $PAP^T = LDL^T$ does the job, where $P$ is a permutation matrix, $L$ is unit lower triangular, and $D$ is diagonal. Then $\mathrm{In}(A) = \mathrm{In}(D)$, and $\mathrm{In}(D)$ can be read off the diagonal of $D$. This factorization does not always exist, and if it does exist its computation can be numerically unstable. A block $LDL^T$ factorization, in which $D$ is block diagonal with diagonal blocks of size $1$ or $2$, always exists, and its computation is numerically stable with a suitable pivoting strategy such as symmetric rook pivoting.

For the matrix above we can compute a block $LDL^T$ factorization using the MATLAB ldl function:

>> [L,D,P] = ldl(A); D
D =
   1.0000e+00   2.0000e+00            0            0
   2.0000e+00   1.0000e+00            0            0
            0            0  -1.6667e+00            0
            0            0            0  -1.4000e+00


Since the leading 2-by-2 block of $D$ has negative determinant and hence one positive eigenvalue and one negative eigenvalue, it follows that $A$ has one positive eigenvalue and three negative eigenvalues.
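The same procedure can be sketched in Python with scipy.linalg.ldl, which computes a block $LDL^T$ factorization using LAPACK's Bunch-Kaufman pivoting (not rook pivoting, but it still delivers the inertia). Since $D$ is block diagonal with $1\times 1$ and $2\times 2$ blocks, its eigenvalues are cheap to obtain:

```python
import numpy as np
from scipy.linalg import ldl

n = 4
A = -np.eye(n) + 2 * np.ones((n, n))   # the matrix from the example above

# Block LDL^T factorization (LAPACK Bunch-Kaufman pivoting).
L, D, perm = ldl(A)

# D is block diagonal, so computing its eigenvalues amounts to
# inspecting 1x1 and 2x2 blocks.
evals = np.linalg.eigvalsh(D)
i_plus = int(np.sum(evals > 1e-10))
i_minus = int(np.sum(evals < -1e-10))
i_zero = n - i_plus - i_minus
assert (i_plus, i_minus, i_zero) == (1, 3, 0)
```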

A congruence transformation preserves the signs of the eigenvalues but not their magnitudes. A result of Ostrowski (1959) bounds the ratios of the eigenvalues of the original and transformed matrices. Let the eigenvalues of a symmetric matrix be ordered $\lambda_n \le \cdots \le \lambda_1$.

Theorem 2 (Ostrowski).

For a symmetric $A\in \mathbb{R}^{n\times n}$ and $X\in\mathbb{R}^{n\times n}$,

$\notag \lambda_k(X^T\!AX) = \theta_k \lambda_k(A), \quad k=1\colon n,$

where $\lambda_n(X^TX) \le \theta_k \le \lambda_1(X^TX)$.

The theorem shows that the further $X$ is from being orthogonal the greater the potential change in the eigenvalues.
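The theorem can be illustrated numerically: for random $A$ and $X$, the ratios $\theta_k = \lambda_k(X^T\!AX)/\lambda_k(A)$ land in the interval $[\lambda_n(X^TX), \lambda_1(X^TX)]$. The following sketch assumes $A$ has no zero eigenvalues, which holds almost surely for the random choice here:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)); A = A + A.T   # random symmetric matrix
X = rng.standard_normal((n, n))                # almost surely nonsingular

lam_A  = np.linalg.eigvalsh(A)                 # ascending order
lam_XA = np.linalg.eigvalsh(X.T @ A @ X)       # same pairing of indices
lam_XX = np.linalg.eigvalsh(X.T @ X)           # all positive

theta = lam_XA / lam_A                         # assumes no zero eigenvalues
tol = 1e-8 * lam_XX.max()
assert np.all(theta >= lam_XX.min() - tol)
assert np.all(theta <= lam_XX.max() + tol)
```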

Finally, we note that everything here generalizes to complex Hermitian matrices by replacing transpose by conjugate transpose.