# What Is an LU Factorization?

An LU factorization of an $n\times n$ matrix $A$ is a factorization $A = LU$, where $L$ is unit lower triangular and $U$ is upper triangular. “Unit” means that $L$ has ones on the diagonal. Example:

$\notag \left[\begin{array}{rrrr} 3 & -1 & 1 & 1\\ -1 & 3 & 1 & -1\\ -1 & -1 & 3 & 1\\ 1 & 1 & 1 & 3 \end{array}\right] = \left[\begin{array}{rrrr} 1 & 0 & 0 & 0\\ -\frac{1}{3} & 1 & 0 & 0\\ -\frac{1}{3} & -\frac{1}{2} & 1 & 0\\ \frac{1}{3} & \frac{1}{2} & 0 & 1 \end{array}\right] \left[\begin{array}{rrrr} 3 & -1 & 1 & 1\\ 0 & \frac{8}{3} & \frac{4}{3} & -\frac{2}{3}\\ 0 & 0 & 4 & 1\\ 0 & 0 & 0 & 3 \end{array}\right]. \qquad (1)$

An LU factorization simplifies the solution of many problems associated with linear systems. In particular, solving a linear system $Ax = b$ reduces to solving the triangular systems $Ly = b$ and $Ux = y$, since then $Ax = L(Ux) = Ly = b$.
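
For example, with the factors $L$ and $U$ displayed in (1), the two triangular solves can be carried out with SciPy's `solve_triangular` (a minimal sketch, not production code):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[3., -1, 1, 1], [-1, 3, 1, -1], [-1, -1, 3, 1], [1, 1, 1, 3]])
L = np.array([[1., 0, 0, 0], [-1/3, 1, 0, 0], [-1/3, -1/2, 1, 0], [1/3, 1/2, 0, 1]])
U = np.array([[3., -1, 1, 1], [0, 8/3, 4/3, -2/3], [0, 0, 4, 1], [0, 0, 0, 3]])

b = np.array([1., 2, 3, 4])
y = solve_triangular(L, b, lower=True, unit_diagonal=True)  # forward substitution: L y = b
x = solve_triangular(U, y)                                  # back substitution:    U x = y
print(np.allclose(A @ x, b))  # True
```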

For a given $A$, an LU factorization may or may not exist, and if it does exist it may not be unique. Conditions for existence and uniqueness are given in the following result (see Higham, 2002, Thm. 9.1 for a proof). Denote by $A_k = A(1\colon k,1\colon k)$ the leading principal submatrix of $A$ of order $k$.

Theorem 1. The matrix $A\in\mathbb{R}^{n\times n}$ has a unique LU factorization if and only if $A_k$ is nonsingular for $k=1\colon n-1$. If $A_k$ is singular for some $1\le k \le n-1$ then the factorization may exist, but if so it is not unique.

Note that the (non)singularity of $A$ plays no role in Theorem 1. However, if $A$ is nonsingular and has an LU factorization then the factorization is unique. Indeed if $A$ has LU factorizations $A = L_1U_1 = L_2U_2$ then the $U_i$ are necessarily nonsingular and so $L_2^{-1}L_1 = U_2U_1^{-1}$. The left side of this equation is unit lower triangular and the right side is upper triangular; therefore both sides must equal the identity matrix, which means that $L_1 = L_2$ and $U_1 = U_2$, as required.

Equating leading principal submatrices in $A = LU$ gives $A_k = L_k U_k$, which implies that $\det(A_k) = \det(U_k) = u_{11} u_{22} \dots u_{kk}$. Hence $u_{kk} = \det(A_k)/\det(A_{k-1})$. In fact, such determinantal formulas hold for all the elements of $L$ and $U$:

$\notag \begin{aligned} \ell_{ij} &= \frac{ \det\bigl( A( [1\colon j-1, \, i], 1\colon j ) \bigr) }{ \det( A_j ) }, \quad i > j, \\ u_{ij} &= \frac{ \det\bigl( A( 1\colon i, [1\colon i-1, \, j] ) \bigr) } { \det( A_{i-1} ) }, \quad i \le j. \end{aligned}$

Here, $A(u,v)$, where $u$ and $v$ are vectors of subscripts, denotes the submatrix formed from the intersection of the rows indexed by $u$ and the columns indexed by $v$.
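
These formulas are easy to check numerically. The following sketch (our own helper functions, taking the 1-based indices of the text and converting to 0-based internally) evaluates them for the matrix in (1) and reproduces entries of the $L$ and $U$ shown there:

```python
import numpy as np

A = np.array([[3., -1, 1, 1], [-1, 3, 1, -1], [-1, -1, 3, 1], [1, 1, 1, 3]])

def ell(A, i, j):   # l_ij for i > j (1-based indices, as in the formulas above)
    rows = list(range(j - 1)) + [i - 1]
    cols = list(range(j))
    return np.linalg.det(A[np.ix_(rows, cols)]) / np.linalg.det(A[:j, :j])

def u(A, i, j):     # u_ij for i <= j; np.linalg.det of the empty matrix is 1
    rows = list(range(i))
    cols = list(range(i - 1)) + [j - 1]
    return np.linalg.det(A[np.ix_(rows, cols)]) / np.linalg.det(A[:i-1, :i-1])

print(ell(A, 3, 2))  # -0.5,     the (3,2) entry of L in (1)
print(u(A, 2, 3))    # 1.333..., the (2,3) entry of U in (1)
```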

## Relation with Gaussian Elimination

LU factorization is intimately connected with Gaussian elimination. Recall that Gaussian elimination transforms a matrix $A^{(1)} = A\in\mathbb{R}^{n\times n}$ to upper triangular form $U = A^{(n)}$ in $n-1$ stages. At the $k$th stage, multiples of row $k$ are added to the later rows to eliminate the elements below the diagonal in column $k$, using the formulas

$\notag a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik} a_{kj}^{(k)}, \quad i = k+1 \colon n, \; j = k+1 \colon n,$

where the quantities $m_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)}$ are the multipliers. Of course each $a_{kk}^{(k)}$ must be nonzero for these formulas to be defined, and this is connected with the conditions of Theorem 1, since $u_{kk} = a_{kk}^{(k)}$. The final $U$ is the upper triangular LU factor, with $u_{ij} = a_{ij}^{(i)}$ for $j\ge i$, and $\ell_{ij} = m_{ij}$ for $i > j$, that is, the multipliers make up the $L$ factor (for a proof of these properties see any textbook on numerical linear algebra).

The matrix factorization viewpoint is well established as a powerful paradigm for thinking and computing. Separating the computation of LU factorization from its application is beneficial. For example, given $A = LU$ we saw above how to solve $Ax = b$. If we need to solve for another right-hand side $b_2$ we can just solve $Ly_2 = b_2$ and $Ux_2 = y_2$, re-using the LU factorization. Similarly, solving $A^Tz = c$ reduces to solving the triangular systems $U^T w = c$ and $L^Tz = w$.
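
In SciPy, for instance, this re-use is expressed by factorizing once with `lu_factor` and solving repeatedly with `lu_solve` (note that SciPy incorporates partial pivoting, so it actually factorizes $PA$ for a permutation matrix $P$; the interface hides this detail):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[3., -1, 1, 1], [-1, 3, 1, -1], [-1, -1, 3, 1], [1, 1, 1, 3]])
lu, piv = lu_factor(A)                 # factor once
b, b2, c = np.ones(4), np.arange(4.), np.array([1., 0, 0, 1])
x  = lu_solve((lu, piv), b)            # solve A x  = b
x2 = lu_solve((lu, piv), b2)           # solve A x2 = b2, re-using the factorization
z  = lu_solve((lu, piv), c, trans=1)   # solve A^T z = c via U^T w = c, L^T z = w
print(np.allclose(A @ x, b), np.allclose(A.T @ z, c))  # True True
```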

## Computation

An LU factorization can be computed by directly solving for the components of $L$ and $U$ in the equation $A = LU$. Indeed, because $L$ has a unit diagonal the first row of $U$ is the same as the first row of $A$, and $a_{k1} = \ell_{k1} u_{11} = \ell_{k1} a_{11}$ then determines the first column of $L$. One can go on to determine the $k$th row of $U$ and the $k$th column of $L$, for $k = 2\colon n$. This leads to the Doolittle method, which involves inner products of partial rows of $L$ and partial columns of $U$.
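
Here is a minimal NumPy sketch of the Doolittle method (`lu_doolittle` is our name, not a library routine); it assumes every pivot $u_{kk}$ is nonzero, since there is no pivoting:

```python
import numpy as np

def lu_doolittle(A):
    """Doolittle LU: build the kth row of U and kth column of L in turn."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    for k in range(n):
        U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]                     # kth row of U
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ U[:k, k]) / U[k, k]   # kth column of L
    return L, U

A = np.array([[3., -1, 1, 1], [-1, 3, 1, -1], [-1, -1, 3, 1], [1, 1, 1, 3]])
L, U = lu_doolittle(A)
print(np.allclose(L @ U, A))  # True, reproducing the factors in (1)
```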

Given the equivalence between LU factorization and Gaussian elimination we can also employ the Gaussian elimination equations:

$\notag \begin{array}{l} \%~kji~\mathrm{form~of~LU~factorization.}\\ \mbox{for}~k=1:n-1 \\ \qquad \mbox{for}~ j=k+1:n \\ \qquad \qquad \mbox{for}~ i=k+1:n \\ \qquad\qquad\qquad a_{ij}^{(k+1)} = a_{ij}^{(k)} - a_{ik}^{(k)}a_{kj}^{(k)} / a_{kk}^{(k)}\\ \qquad\qquad\mbox{end}\\ \qquad\mbox{end}\\ \mbox{end}\\ \end{array}$

This $kji$ ordering of the loops in the factorization is the basis of early Fortran implementations of LU factorization, such as that in LINPACK. The inner loop travels down the columns of $A$, accessing contiguous elements of $A$ since Fortran stores arrays by column. Interchanging the two inner loops gives the $kij$ ordering, which updates the matrix a row at a time, and is appropriate for a language such as C that stores arrays by row.
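
For concreteness, here is a direct Python translation of the $kji$ pseudocode above (a sketch, with no pivoting); the innermost ($i$) loop is vectorized down each column:

```python
import numpy as np

def lu_kji(A):
    A = np.array(A, dtype=float)   # working copy, overwritten by the a^(k) updates
    n = A.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        L[k+1:, k] = A[k+1:, k] / A[k, k]         # multipliers m_ik
        for j in range(k + 1, n):
            A[k+1:, j] -= L[k+1:, k] * A[k, j]    # update column j below row k
    return L, np.triu(A)
```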

The $ijk$ and $jik$ orderings correspond to the Doolittle method. The last two of the $3! = 6$ orderings are the $ikj$ and $jki$ orderings, to which we will return later.

## Schur Complements

For $A\in\mathbb{R}^{n\times n}$ with $\alpha = a_{11} \ne 0$ we can write

$\notag A = \begin{bmatrix} \alpha & b^T \\ c & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ c/\alpha & I_{n-1} \end{bmatrix} \begin{bmatrix} \alpha & b^T \\ 0 & D - cb^T/\alpha \end{bmatrix} = : L_1U_1. \qquad (2)$

The $(n-1)\times (n-1)$ matrix $S = D - cb^T/\alpha$ is called the Schur complement of $\alpha$ in $A$.

The first column of $L_1$ and the first row of $U_1$ have the correct forms for a unit lower triangular matrix and an upper triangular matrix, respectively. If we can find an LU factorization $S = L_2U_2$ then

$\notag A = \begin{bmatrix} 1 & 0 \\ c/\alpha & L_2 \end{bmatrix} \begin{bmatrix} \alpha & b^T \\ 0 & U_2 \end{bmatrix}$

is an LU factorization of $A$. Note that this is simply another way to express the $kji$ algorithm above.
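
The recursion (2) translates directly into code. A minimal sketch, assuming every pivot $\alpha$ encountered is nonzero:

```python
import numpy as np

def lu_schur(A):
    """Recursive LU via the Schur complement identity (2)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return np.eye(1), A.copy()
    alpha, b, c = A[0, 0], A[0, 1:], A[1:, 0]
    S = A[1:, 1:] - np.outer(c, b) / alpha   # Schur complement of alpha in A
    L2, U2 = lu_schur(S)                     # factor S = L2 U2
    L, U = np.eye(n), np.zeros((n, n))
    L[1:, 0], L[1:, 1:] = c / alpha, L2
    U[0, 0], U[0, 1:], U[1:, 1:] = alpha, b, U2
    return L, U
```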

For several matrix structures it is immediate that $\alpha \ne 0$. If we can show that the Schur complement inherits the structure of $A$ then it follows by induction that an LU factorization of $S$, and hence of $A$, exists. Classes of matrices for which $a_{11} \ne 0$ and for which the Schur complement $S$ remains in the class include

• symmetric positive definite matrices,
• $M$-matrices,
• matrices (block) diagonally dominant by rows or columns.

(The proofs of these properties are nontrivial.) Note that the matrix in (1) is diagonally dominant by rows, as is its $U$ factor, as must be the case since the rows of $U$ are contained in Schur complements.

We say that $A$ has upper bandwidth $q$ if $a_{ij} = 0$ for $j>i+q$ and lower bandwidth $p$ if $a_{ij} = 0$ for $i>j+p$. Another use of (2) is to show that $L$ and $U$ inherit the bandwidths of $A$.

Theorem 2. Let $A\in\mathbb{R}^{n\times n}$ have lower bandwidth $p$ and upper bandwidth $q$. If $A$ has an LU factorization then $L$ has lower bandwidth $p$ and $U$ has upper bandwidth $q$.

Proof. In (2), the first column of $L_1$ and the first row of $U_1$ have the required structure and $S$ has upper bandwidth $q$ and lower bandwidth $p$, since $c$ and $b$ have only $p$ and $q$ nonzero components, respectively. The result follows by induction.
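
Theorem 2 is easy to observe numerically; for instance, re-using the `lu_doolittle` sketch from the Computation section on a diagonally dominant tridiagonal matrix ($p = q = 1$), for which the factorization exists without pivoting:

```python
import numpy as np

n = 8
A = 4*np.eye(n) + np.diag(np.ones(n-1), 1) + np.diag(np.ones(n-1), -1)
L, U = lu_doolittle(A)                  # the sketch from the Computation section
print(np.allclose(np.tril(L, -2), 0))   # True: L has lower bandwidth 1
print(np.allclose(np.triu(U, 2), 0))    # True: U has upper bandwidth 1
```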

## Block Implementations

In order to achieve high performance on modern computers with their hierarchical memories, LU factorization is implemented in a block form expressed in terms of matrix multiplication and the solution of multiple right-hand side triangular systems. We describe two block forms of LU factorization. First, consider a block form of (2) with block size $p$, where $A_{11}$ is $p \times p$:

$\notag A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 \\ L_{21}& I_{n-p} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ 0 & S \end{bmatrix}.$

Here, $S$ is the Schur complement of $A_{11}$ in $A$, given by $S = A_{22} - A_{21}A_{11}^{-1}A_{12}$. This leads to the following algorithm:

1. Factor $A_{11} = L_{11}U_{11}$.
2. Solve $L_{11}U_{12} = A_{12}$ for $U_{12}$.
3. Solve $L_{21}U_{11} = A_{21}$ for $L_{21}$.
4. Form $S = A_{22}-L_{21}U_{12}$.
5. Repeat steps 1–4 on $S$ to obtain $S = L_{22}U_{22}$.

The factorization in step 1 can be done by any form of LU factorization. This algorithm is known as a right-looking algorithm, since it accesses data to the right of the block being worked on (in particular, at each stage steps 2 and 4 access the last few columns of the matrix).
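
A sketch of this right-looking algorithm, built on the `lu_doolittle` and `solve_triangular` pieces used earlier, and again assuming no pivoting is needed:

```python
import numpy as np
from scipy.linalg import solve_triangular

def block_lu_right(A, p=2):
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    for k in range(0, n, p):
        e = min(k + p, n)
        L11, U11 = lu_doolittle(A[k:e, k:e])                   # step 1
        L[k:e, k:e], U[k:e, k:e] = L11, U11
        if e < n:
            U[k:e, e:] = solve_triangular(L11, A[k:e, e:],     # step 2: L11 U12 = A12
                                          lower=True, unit_diagonal=True)
            L[e:, k:e] = solve_triangular(U11, A[e:, k:e].T,   # step 3: L21 U11 = A21,
                                          trans='T').T         #         solved transposed
            A[e:, e:] -= L[e:, k:e] @ U[k:e, e:]               # step 4: Schur update
    return L, U
```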

An alternative algorithm can be derived by considering a block $3\times 3$ partitioning, in which we assume we have already computed the first block column of $L$ and $U$:

$\notag A = \begin{bmatrix} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 & 0 \\ L_{21} & L_{22}& 0 \\ L_{31} & L_{32} & I \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} & \times \\ 0 & U_{22} & \times \\ 0 & 0 & \times \end{bmatrix}.$

We now compute the middle block column of $L$ and $U$, comprising $p$ columns, by

1. Solve $L_{11}U_{12} = A_{12}$ for $U_{12}$.
2. Factor $A_{22}-L_{21}U_{12} = L_{22}U_{22}$.
3. Solve $L_{32}U_{22} = A_{32} - L_{31}U_{12}$ for $L_{32}$.
4. Repartition so that the first two block columns become a single block column and repeat steps 1–4.

This algorithm corresponds to the $jki$ ordering. Note that the Schur complement is updated only a block column at a time. Because the algorithm accesses data only to the left of the block column being worked on, it is known as a left-looking algorithm.
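
A corresponding sketch of the left-looking algorithm, under the same no-pivoting assumptions as above (`lu_doolittle` is the earlier sketch); note that the trailing submatrix is never touched until its own block column is reached:

```python
import numpy as np
from scipy.linalg import solve_triangular

def block_lu_left(A, p=2):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    for k in range(0, n, p):
        e = min(k + p, n)
        if k > 0:   # step 1: U12 from L11 U12 = A12, L11 = everything computed so far
            U[:k, k:e] = solve_triangular(L[:k, :k], A[:k, k:e],
                                          lower=True, unit_diagonal=True)
        col = A[k:, k:e] - L[k:, :k] @ U[:k, k:e]   # update only this block column
        L22, U22 = lu_doolittle(col[:e-k, :])       # step 2: factor diagonal block
        L[k:e, k:e], U[k:e, k:e] = L22, U22
        if e < n:   # step 3: L32 from L32 U22 = A32 - L31 U12
            L[e:, k:e] = solve_triangular(U22, col[e-k:, :].T, trans='T').T
    return L, U
```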

Our description of these block algorithms emphasizes the mathematical ideas. The implementation details, especially for the left-looking algorithm, are not trivial. The optimal choice of block size $p$ will depend on the machine, but $p$ is typically in the range 64–512.

An important point is that all these different forms of LU factorization, no matter which $ijk$ ordering or which value of $p$, carry out the same operations. The only difference is the order in which the operations are performed (and the order in which the data is accessed). Even the rounding errors are the same for all versions (assuming the use of “plain vanilla” floating-point arithmetic).

## Rectangular Matrices

Although it is most commonly used for square matrices, LU factorization is defined for rectangular matrices, too. If $A\in\mathbb{R}^{m\times n}$ then the factorization has the form $A = LU$ with $L\in\mathbb{R}^{m\times m}$ lower triangular and $U\in\mathbb{R}^{m\times n}$ upper trapezoidal. The conditions for existence and uniqueness of an LU factorization of $A$ are the same as those for $A(1\colon p, 1\colon p)$, where $p = \min(m,n)$.
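
SciPy's `lu` handles rectangular matrices. Note that it incorporates partial pivoting (so it factorizes $PA$) and returns "economy" shapes, $L\in\mathbb{R}^{m\times p}$ and $U\in\mathbb{R}^{p\times n}$ with $p = \min(m,n)$, which is the factorization above with the zero rows of $U$ and the corresponding trailing columns of $L$ dropped:

```python
import numpy as np
from scipy.linalg import lu

A = np.random.default_rng(1).standard_normal((5, 3))   # 5-by-3
P, L, U = lu(A)                # P A-permutation, L lower trapezoidal, U upper triangular
print(L.shape, U.shape)        # (5, 3) (3, 3)
print(np.allclose(P @ L @ U, A))   # True
```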

## Block LU Factorization

Another form of LU factorization relaxes the structure of $L$ and $U$ from triangular to block triangular, with $L$ having identity matrices on the diagonal:

$\notag L = \begin{bmatrix} I & & & \\ L_{21} & I & & \\ \vdots & & \ddots & \\ L_{m1} & \dots & L_{m,m-1} & I \end{bmatrix}, \quad U = \begin{bmatrix} U_{11} & U_{12} & \dots & U_{1m} \\ & U_{22} & & \vdots \\ & & \ddots & U_{m-1,m} \\ & & & U_{mm} \end{bmatrix}.$

Note that $U$ is not, in general, upper triangular.

An example of a block LU factorization is

$\notag A = \left[ \begin{array}{rr|rr} 0 & 1 & 1 & 1 \\ -1 & 1 & 1 & 1 \\\hline -2 & 3 & 4 & 2 \\ -1 & 2 & 1 & 3 \\ \end{array} \right] = \left[ \begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\\hline 1 & 2 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ \end{array} \right] \left[ \begin{array}{rr|rr} 0 & 1 & 1 & 1 \\ -1 & 1 & 1 & 1 \\\hline 0 & 0 & 1 & -1 \\ 0 & 0 & -1 & 1 \\ \end{array} \right].$

LU factorization fails on $A$ because of the zero $(1,1)$ pivot. This block LU factorization corresponds to using the leading $2\times 2$ principal submatrix of $A$ to eliminate the elements in the $(3\colon 4,1\colon 2)$ submatrix. In the context of a linear system $Ax=b$, we have effectively solved for the variables $x_1$ and $x_2$ in terms of $x_3$ and $x_4$ and then substituted for $x_1$ and $x_2$ in the last two equations.
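
This factorization is easy to reproduce numerically by forming the Schur complement of the leading $2\times 2$ block explicitly (a small NumPy check):

```python
import numpy as np

A = np.array([[0., 1, 1, 1], [-1, 1, 1, 1], [-2, 3, 4, 2], [-1, 2, 1, 3]])
A11, A12, A21, A22 = A[:2, :2], A[:2, 2:], A[2:, :2], A[2:, 2:]
L21 = A21 @ np.linalg.inv(A11)          # eliminates the (3:4, 1:2) block
S = A22 - L21 @ A12                     # Schur complement of A11 in A
L = np.block([[np.eye(2), np.zeros((2, 2))], [L21, np.eye(2)]])
U = np.block([[A11, A12], [np.zeros((2, 2)), S]])
print(np.allclose(L @ U, A))   # True, reproducing the factorization above
```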

Conditions for the existence of a block LU factorization are analogous to, but less stringent than, those for LU factorization in Theorem 1.

Theorem 3. The matrix $A\in\mathbb{R}^{n\times n}$ has a unique block LU factorization if and only if the first $m-1$ leading principal block submatrices of $A$ are nonsingular.

The conditions in Theorem 3 can be shown to be satisfied if $A$ is block diagonally dominant by rows or columns.

Note that to solve a linear system $Ax = b$ using a block LU factorization we need to solve $Ly = b$ and $Ux = y$, but the latter system is not triangular and requires the solution of systems $U_{ii}x_i = y_i$ involving the diagonal blocks of $U$, which would normally be done by (standard) LU factorization.

## Sensitivity

If $A$ has a unique LU factorization then for a small enough perturbation $\Delta A$ an LU factorization $A + \Delta A = (L + \Delta L)(U + \Delta U)$ exists. To first order, this equation is $\Delta A = \Delta L U + L \Delta U$, which gives

$\notag L^{-1}\Delta A \mskip2mu U^{-1} = L^{-1}\Delta L + \Delta U \mskip2mu U^{-1}.$

Since $\Delta L$ is strictly lower triangular and $\Delta U$ is upper triangular, we have, to first order,

$\notag \Delta L = L \mskip 1mu \mathrm{tril}( L^{-1}\Delta A U^{-1} ), \quad \Delta U = \mathrm{triu}( L^{-1}\Delta A U^{-1} )U,$

where $\mathrm{tril}$ denotes the strictly lower triangular part and $\mathrm{triu}$ the upper triangular part. Clearly, the sensitivity of the LU factors depends on the inverses of $L$ and $U$. However, in most situations, such as when we are solving a linear system $Ax = b$, it is the backward stability of the LU factors, not their sensitivity, that is relevant.
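
These first-order formulas can be checked numerically. The following sketch (re-using the `lu_doolittle` sketch from above) perturbs the matrix in (1) by a random $\Delta A$ of norm about $10^{-6}$ and compares the predicted and actual changes in the factors; the residuals are of order $\|\Delta A\|^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[3., -1, 1, 1], [-1, 3, 1, -1], [-1, -1, 3, 1], [1, 1, 1, 3]])
L, U = lu_doolittle(A)
dA = 1e-6 * rng.standard_normal((4, 4))
F = np.linalg.solve(L, dA) @ np.linalg.inv(U)   # F = L^{-1} dA U^{-1}
dL_pred = L @ np.tril(F, -1)    # strictly lower triangular part of F
dU_pred = np.triu(F) @ U        # upper triangular part (including the diagonal)
Lp, Up = lu_doolittle(A + dA)
print(np.linalg.norm(Lp - L - dL_pred),   # both roughly 1e-13,
      np.linalg.norm(Up - U - dU_pred))   # i.e., O(||dA||^2)
```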

## Pivoting and Numerical Stability

Since not all matrices have an LU factorization, we need the option of applying row and column interchanges to ensure that the pivots are nonzero unless the column in question is already in triangular form.

In finite precision computation it is important that computed LU factors $\widehat L$ and $\widehat U$ are numerically stable in the sense that $\widehat L \widehat U = A + \Delta A$ with $\|\Delta A\|\le c_n u \|A\|$, where $c_n$ is a constant and $u$ is the unit roundoff. For certain matrix properties, such as diagonal dominance by rows or columns, numerical stability is guaranteed, but in general it is necessary to incorporate row or column interchanges (or both) in order to obtain a stable factorization.

See What Is the Growth Factor for Gaussian Elimination? for details of pivoting strategies and see Randsvd Matrices with Large Growth Factors for some recent research on growth factors.

## References

This is a minimal set of references, which contain further useful references within.

- Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, SIAM, 2002.