What Is a Cholesky Factorization?

The Cholesky factorization of a symmetric positive definite matrix A is the factorization A = R^T\!R, where R is upper triangular with positive diagonal elements. It is a generalization of the property that a positive real number has a unique positive square root. The Cholesky factorization always exists and the requirement that the diagonal of R be positive ensures that it is unique.

As an example, the Cholesky factorization of the 4\times 4 matrix with (i,j) element \gcd(i,j) (gallery('gcdmat',4) in MATLAB) is

G_4 = \left[\begin{array}{cccc} 1 & 1 & 1 & 1\\ 1 & 2 & 1 & 2\\ 1 & 1 & 3 & 1\\ 1 & 2 & 1 & 4 \end{array}\right] = \left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0\\ 1 & 0 & \sqrt{2} & 0\\ 1 & 1 & 0 & \sqrt{2} \end{array}\right] \left[\begin{array}{cccc} 1 & 1 & 1 & 1\\ 0 & 1 & 0 & 1\\ 0 & 0 & \sqrt{2} & 0\\ 0 & 0 & 0 & \sqrt{2} \end{array}\right].

The Cholesky factorization of an n\times n matrix contains n-1 other Cholesky factorizations within it: A_k = R_k^TR_k, k=1:n-1, where A_k = A(1\colon k,1\colon k) is the leading principal submatrix of order k. For example, for G_4 with k = 2, \bigl[\begin{smallmatrix}1 & 1\\ 1& 2\end{smallmatrix}\bigr]=  \bigl[\begin{smallmatrix}1 & 0\\ 1& 1\end{smallmatrix}\bigr]  \bigl[\begin{smallmatrix}1 & 1\\ 0& 1\end{smallmatrix}\bigr].

Inverting the Cholesky equation gives A^{-1} = R^{-1} R^{-T}, which implies the interesting relation that the (n,n) element of A^{-1} is r_{nn}^{-2}. So G_4^{-1} has (4,4) element 1/2. We also have \det(A) = \det(R)^2 = (r_{11}\dots r_{nn})^2, so \det(G_4) = 4 for this matrix.

The Cholesky factorization is named after André-Louis Cholesky (1875–1918), a French military officer involved in geodesy and surveying in Crete and North Africa, who developed it for solving the normal equations arising in least squares problems.

The existence and uniqueness of the factorization can be proved by induction on n, and the proof also serves as a method for computing it. Partition A\in\mathbb{R}^{n\times n} as

A = \begin{bmatrix}              A_{n-1} & c \\              c^T & \alpha              \end{bmatrix}.

We know that A_{n-1} is positive definite (any principal submatrix of a positive definite matrix is easily shown to be positive definite). Assume that A_{n-1} has a unique Cholesky factorization A_{n-1} = R_{n-1}^TR_{n-1} and define the upper triangular matrix

R = \begin{bmatrix}             R_{n-1} & r \\             0 & \beta            \end{bmatrix}.

Then

R^T\!R = \begin{bmatrix}                R_{n-1}^T R_{n-1} & R_{n-1}^T r \\                    r^T R_{n-1}   & r^Tr + \beta^2              \end{bmatrix},

which equals A if and only if

\begin{aligned}                   R_{n-1}^T r    &= c,        \\                   r^Tr + \beta^2 &= \alpha.     \end{aligned}

The first equation has a unique solution since R_{n-1} is nonsingular. Then the second equation gives \beta^2 = \alpha - r^Tr. It remains to check that there is a unique real, positive \beta satisfying this equation. From the inequality

0 < \det(A) = \det(R^T) \det(R) = \det(R_{n-1})^2 \beta^2

we see that \beta^2>0, hence there is a unique \beta>0. This completes the inductive step.

The quantity \alpha - r^Tr = \alpha - c^T A_{n-1}^{-1}c is the Schur complement of A_{n-1} in A. The essential reason why Cholesky factorization works is that the Schur complements of a positive definite matrix are themselves positive definite. Indeed we have the congruence

\begin{bmatrix} I                  & 0 \\  -c^T A_{n-1}^{-1} & 1 \end{bmatrix} \begin{bmatrix} A_{n-1}          & c \\ c^T               & \alpha \end{bmatrix} \begin{bmatrix} I  & -A_{n-1}^{-1}c\\ 0  & 1 \end{bmatrix}  = \begin{bmatrix} A_{n-1} & 0 \\ 0      & \alpha - c^T A_{n-1}^{-1} c \end{bmatrix},

and since congruences preserve definiteness it follows that \alpha - c^T A_{n-1}^{-1} c > 0.

In textbooks it is common to see element-level equations for computing the Cholesky factorization, which come from directly solving the matrix equation A = R^T\!R. If we equate elements on both sides, taking them a column at a time, starting with a_{11} = r_{11}^2, then the following algorithm is obtained:

\begin{array}{l} 1~ \mbox{for}~j=1:n  \\ 2~      \quad \mbox{for}~  i=1:j-1 \\ 3~     \qquad r_{ij} = \bigl(a_{ij} - \sum_{k=1}^{i-1} r_{ki}r_{kj} \bigr) /r_{ii} \\ 4~     \quad \mbox{end} \\ 5~     \quad r_{jj} = \bigl(a_{jj} - \sum_{k=1}^{j-1} r_{kj}^2\bigr)^{1/2} \\ 6~ \mbox{end} \end{array}

What happens if this algorithm is executed on a general (indefinite) symmetric matrix that is, one that has both positive and negative eigenvalues? Line 5 will either attempt to take the square root of a negative number for some j or it will produce r_{jj} = 0 and on the next iteration of the loop we will have a division by zero. In floating-point arithmetic it is possible for the algorithm to fail for a positive definite matrix, but only if it is numerically singular: the algorithm is guaranteed to run to completion if the condition number \kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 is safely less than u^{-1}, where u is the unit roundoff.

The MATLAB function chol normally returns an error message if the factorization fails. But a second output argument can be requested, which is set to the number of the stage on which the factorization failed, or to zero if the factorization succeeded. In the case of failure, the partially computed R factor, returned in the first argument, can be used to compute a direction of negative curvature for A, which is a vector z such that z^T\!Az < 0. Here is an example. The code

n = 8;
A = gallery('lehmer',n) - 0.3*eye(n); % Indefinite matrix.
[R,p] = chol(A)
z = [-R\(R'\A(1:p-1,p)); 1; zeros(n-p,1)];
neg_curve = z'*A*z

produces the output

R =
   8.3666e-01   5.9761e-01   3.9841e-01
            0   5.8554e-01   7.3193e-01
            0            0   7.4536e-02
p =
     4
neg_curve =
  -9.1437e+00

Cholesky factorization has excellent numerical stability. The computed factor \widehat{R} satisfies

A  + \Delta A = \widehat{R}^T\widehat{R}, \quad       \|\Delta A\|_2 \le c_nu\|A\|_2,

where c_n is a constant. Unlike for LU factorization there is no possibility of element growth; indeed for i\le j,

r_{ij}^2 \le \displaystyle \sum_{k=1}^j r_{kj}^2 = a_{ij},

so the elements of R are nicely bounded relative to those of A.

Once we have a Cholesky factorization we can use it to solve a linear system Ax = b, by solving the lower triangular system R^Ty = b and then the upper triangular system Rx = y. The computed solution \widehat{x} can be shown to satisfy

(A+\Delta A)\widehat{x} = b, \quad \|\Delta A\|_2 \le d_nu\|A\|_2

where d_n is another constant. Thus \widehat{x} has a small backward error.

Finally, what if A is only symmetric positive semidefinite, that is, x^T\!Ax \ge 0 for all x, so that A is possibly singular? There always exists a Cholesky factorization (we can take the R factor in the QR factorization of the positive semidefinite square root of A) but it may not be unique. For example, we have

A = \left[\begin{array}{cccc}  1 & 1 & 1 & 1\\  1 & 1 & 1 & 1\\  1 & 1 & 2 & 2\\  1 & 1 & 2 & 4\\ \end{array}\right] = \left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0\\ 1 & 1 & x & y\\ \end{array}\right] \left[\begin{array}{cccc} 1 & 1 & 1 & 1\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & x\\ 0 & 0 & 0 & y\\ \end{array}\right]  = R^T\!R

for any x and y such that x^2 + y^2 = 2. Note that A has rank 3 but R has two zero diagonal elements. When A is positive semidefinite of rank r what is usually wanted is a Cholesky factorization in which R is zero in its last n-r rows. Such a factorization can be obtained by using complete pivoting, which at each stage permutes the largest remaining diagonal element into the pivot position, which gives a factorization

\Pi^T\mskip-5mu A\Pi = R^T\!R,      \quad      R = \begin{bmatrix}          R_{11} & R_{12} \\               0 & 0          \end{bmatrix}, \quad R_{11}\in\mathbb{R}^{r \times r},

where \Pi is a permutation matrix. For example, with \Pi the identity matrix with its columns in reverse order,

\Pi^T\mskip-5mu A\Pi  = \left[\begin{array}{cccc} 4 & 2 & 1 & 1\\ 2 & 2 & 1 & 1\\ 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 \end{array}\right]    = \left[\begin{array}{cccc} 2 & 0 & 0 & 0\\ 1 & 1 & 0 & 0\\ \frac{1}{2} & \frac{1}{2} & \frac{\sqrt{2}}{2} & 0\\[2pt] \frac{1}{2} & \frac{1}{2} & \frac{\sqrt{2}}{2} & 0 \end{array}\right] \left[\begin{array}{cccc} 2 & 1 & \frac{1}{2} & \frac{1}{2}\\[2pt] 0 & 1 & \frac{1}{2} & \frac{1}{2}\\[2pt] 0 & 0 & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\ 0 & 0 & 0 & 0 \end{array}\right],

which clearly displays the rank of A.

References

This is a minimal set of references, which contain further useful references within.

Related Blog Posts

This article is part of the “What Is” series, available from https://nhigham.com/category/what-is and in PDF form from the GitHub repository https://github.com/higham/what-is.

This entry was posted in what-is. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s