Bounds for the Norm of the Inverse of a Triangular Matrix

In many situations we need to estimate or bound the norm of the inverse of a matrix, for example to compute an error bound or to check whether an iterative process is guaranteed to converge. This is the same problem as bounding the condition number \kappa(A) = \|A\| \|A^{-1}\|, assuming \|A\| is easy to compute or estimate. Here, we focus on triangular matrices. The bounds we derive can be applied to a general matrix if an LU or QR factorization is available.

We denote by \|\cdot\| any matrix norm, and we take the consistency condition \|AB\| \le \|A\| \|B\| as one of the defining properties of a matrix norm.

It will be useful to note that

\notag       \left[\begin{array}{crrrr}       1 & -\theta & -\theta & -\theta & -\theta\\         & 1 & -\theta & -\theta & -\theta\\         &   & 1 & -\theta & -\theta\\         &   &   & 1 & -\theta\\         &   &   &   & 1 \end{array}\right]^{-1} =      \left[\begin{array}{ccccc}      1 & \theta & \theta(1+\theta) & \theta(1+\theta)^2 & \theta(1+\theta)^3\\        & 1 & \theta & \theta(1+\theta) & \theta(1+\theta)^2\\        &   & 1 & \theta & \theta(1+\theta)\\        &   &   & 1 & \theta\\        &   &   &   & 1      \end{array}\right]

and that more generally the inverse of the n\times n upper triangular matrix T(\theta) with

\notag   (T(\theta))_{ij} = \begin{cases} 1, & i=j, \\                     -\theta, & i<j, \end{cases} \qquad (1)

is given by

\notag   \bigl(T(\theta)^{-1}\bigr)_{ij} =     \begin{cases} 1, & i=j, \\                 \theta(1+\theta)^{j-i-1}, & j > i. \end{cases} \qquad (2)
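The formula (2) is easy to verify numerically. Here is a minimal NumPy sketch (the helper name t_theta and the choices n = 5, \theta = 2 are mine, purely for illustration) that builds T(\theta) and checks its computed inverse against (2).

```python
import numpy as np

def t_theta(n, theta):
    """T(theta) in (1): ones on the diagonal, -theta everywhere above it."""
    return np.eye(n) - theta * np.triu(np.ones((n, n)), k=1)

n, theta = 5, 2.0
T = t_theta(n, theta)

# Explicit inverse from (2): 1 on the diagonal, theta*(1+theta)^(j-i-1) above it.
Tinv = np.eye(n)
for i in range(n):
    for j in range(i + 1, n):
        Tinv[i, j] = theta * (1 + theta) ** (j - i - 1)

print(np.allclose(np.linalg.inv(T), Tinv))  # should print True
```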

Lower Bound

First, we consider a general matrix A\in\mathbb{C}^{n\times n} and let \lambda be an eigenvalue with |\lambda| = \rho(A) (the spectral radius) and x a corresponding eigenvector. With X = xe^T \in\mathbb{C}^{n\times n}, where e is the vector of ones, AX = \lambda X, so

\notag     |\lambda| \|X\| = \|\lambda X\| = \| AX \| \le \|A\| \|X\|,

which implies |\lambda| \le \|A\| since X\ne 0. Hence

\notag           \|A\| \ge \rho(A).

Let T be a nonsingular triangular matrix. Applying the latter bound to T^{-1}, whose eigenvalues are its diagonal entries t_{ii}^{-1}, gives

\notag       \|T^{-1}\| \ge \displaystyle\frac{1}{\min_i |t_{ii}|}.  \qquad (3)

Combining this bound with the analogous bound for \|T\| gives

\notag       \kappa(T) \ge \displaystyle\frac{\max_i |t_{ii}|}{\min_i |t_{ii}|}. \qquad (4)

We note that commonly used norms satisfy \|A\| \ge \max_{i,j}|a_{ij}|, which yields another proof of (3) and (4).

For any x and y such that y = Tx we have the lower bound \|x\| / \|y\| \le \| T^{-1} \|. We can choose y and then solve the triangular system Tx = y for x to obtain the lower bound. Condition number estimation techniques, which we will describe in another article, provide ways to choose y that usually yield estimates of \| T^{-1} \| correct to within an order of magnitude.

For the 2-norm, we can choose y and then compute x = (T^*T)^{-k}y by repeated triangular solves, obtaining the lower bound (\|x\|_2 / \|y\|_2)^{\frac{1}{2k}} \le \| T^{-1} \|_2. This bound is simply the power method applied to (T^*T)^{-1}.
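Here is a minimal sketch of this power-method lower bound using SciPy's triangular solver; the function name, the default starting vector of ones, and the default k = 3 are my own choices.

```python
import numpy as np
from scipy.linalg import solve_triangular

def norm_inv_lower_bound_2(T, k=3, y=None):
    """Lower bound on ||T^{-1}||_2 for an upper triangular T.

    Computes x = (T^* T)^{-k} y by 2k triangular solves and returns
    (||x||_2/||y||_2)^(1/(2k)), which cannot exceed ||T^{-1}||_2.
    """
    if y is None:
        y = np.ones(T.shape[0])
    x = y
    for _ in range(k):
        x = solve_triangular(T, x, lower=False, trans='C')  # x <- T^{-*} x
        x = solve_triangular(T, x, lower=False)              # x <- T^{-1} x
    return (np.linalg.norm(x) / np.linalg.norm(y)) ** (1 / (2 * k))

# Example with the matrix T(2) of (1) for n = 5: compare the lower bound
# with the exact 2-norm of the inverse.
T = np.eye(5) - 2 * np.triu(np.ones((5, 5)), k=1)
print(norm_inv_lower_bound_2(T), np.linalg.norm(np.linalg.inv(T), 2))
```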

Upper Bounds

Let T\in\mathbb{C}^{n\times n} be an upper triangular matrix. The upper bounds for \|T^{-1}\| that we will discuss depend only on the absolute values of the elements of T. This limits the ability of the bounds to distinguish between well-conditioned and ill-conditioned matrices. For example, consider

\notag \begin{gathered}      T_1 =       \left[\begin{array}{crrrr} 1 & -2 & -2 & -2 & -2\\         & 1 & -2 & -2 & -2\\         &   & 1 & -2 & -2\\         &   &   & 1 & -2\\         &   &   &   & 1 \end{array}\right], \quad      T_1^{-1} =      \left[\begin{array}{ccccc}      1 & 2 & 6 & 18 & 54\\        & 1 & 2 & 6 & 18\\        &   & 1 & 2 & 6\\        &   &   & 1 & 2\\        &   &   &   & 1      \end{array}\right], \\     T_2 =     \left[\begin{array}{ccccc}     1 & 2 & 2 & 2 & 2\\       & 1 & 2 & 2 & 2\\       &   & 1 & 2 & 2\\       &   &   & 1 & 2\\       &   &   &   & 1     \end{array}\right], \quad   T_2^{-1} =     \left[\begin{array}{crrrr}      1 & -2 & 2 & -2 & 2\\        & 1 & -2 & 2 & -2\\        &   & 1 & -2 & 2\\        &   &   & 1 & -2\\        &   &   &   & 1     \end{array}\right]. \end{gathered}

The bounds for T_1^{-1} and T_2^{-1} will be the same, yet the inverses are of different sizes (the more so as the dimension increases).

Let D = \mathrm{diag}(T) and write

\notag    T = D(I - N),

where N is strictly upper triangular and hence nilpotent with N^n = 0. Then

\notag    T^{-1} = (I + N + N^2 + \cdots + N^{n-1}) D^{-1}.

Taking absolute values and using the triangle inequality gives

\notag   |T^{-1}| \le (I + |N| + |N|^2 + \cdots + |N|^{n-1}) |D|^{-1}, \qquad(5)

where the inequalities hold elementwise.

The comparison matrix M(A) associated with a general A\in\mathbb{C}^{n \times n} is the matrix with

\notag   (M(A))_{ij} =    \begin{cases} |a_{ii}|, & i=j, \\                 -|a_{ij}|, & i\ne j.     \end{cases}

It is not hard to see that M(T) is upper triangular with M(T) = |D| (I - |N|) and so the bound (5) is

\notag   |T^{-1}| \le M(T)^{-1}.

If we replace every element above the diagonal of M(T) by the most negative off-diagonal element in its row we obtain the upper triangular matrix W(T) with

\notag     (W(T))_{ij} = \begin{cases}                     |t_{ii}|, & i=j, \\                             -\max_{i+1\le k\le n}|t_{ik}|, & i<j. \\                 \end{cases}

Then W(T) = |D| (I - |N_1|), where |N| \le |N_1|, so

\notag \begin{aligned}   M(T)^{-1} &= (I + |N| + |N|^2 + \cdots + |N|^{n-1}) |D|^{-1}\\   & \le (I + |N_1| + |N_1|^2 + \cdots + |N_1|^{n-1}) |D|^{-1} = W(T)^{-1}. \end{aligned}

Finally, let Z(T) = \min_i|t_{ii}|(I - |N_2|), where N_2 is strictly upper triangular with every element above the diagonal equal to the maximum element of |N_1|, that is,

\notag      (Z(T))_{ij} = \begin{cases}       \alpha, & i=j, \\                               -\alpha\beta, & i<j, \\                 \end{cases} \qquad     \alpha = \min_i|t_{ii}|, \quad      \beta = \max_{i < j}|t_{ij}|/|t_{ii}|.

Then

\notag \begin{aligned}   W(T)^{-1} &= (I + |N_1| + |N_1|^2 + \cdots + |N_1|^{n-1}) |D|^{-1} \\             &\le \alpha^{-1} (I + |N_2| + |N_2|^2 + \cdots + |N_2|^{n-1}) = Z(T)^{-1}. \end{aligned}

We note that M(T), W(T), and Z(T) are all nonsingular M-matrices. We summarize the bounds.

Theorem 1.

If T\in\mathbb{C}^{n\times n} is a nonsingular upper triangular matrix then

\notag      |T^{-1}|  \le M(T)^{-1}                \le W(T)^{-1}                \le Z(T)^{-1}. \qquad (6)

We make two remarks.

  • The bounds (6) are equally valid for lower triangular matrices as long as the maxima in the definitions of W(T) and Z(T) are taken over columns instead of rows.
  • We could equally well have written T = (I-N)D. The comparison matrix M(T) = (I - |N|)|D| is unchanged, and (6) continues to hold as long as the maxima in the definitions of W(T) and Z(T) are taken over columns rather than rows.

It follows from the theorem that

\notag    \|T^{-1}\|  \le \|M(T)^{-1}\|                \le \|W(T)^{-1}\|                \le \|Z(T)^{-1}\|

for the 1-, 2-, and \infty-norms and the Frobenius norm. Now M(T), W(T), and Z(T) all have nonnegative inverses, and for a matrix A with nonnegative inverse we have \|A^{-1}\|_{\infty} = \|A^{-1}e\|_{\infty}. Hence

\notag    \|T^{-1}\|_{\infty}                \le \|M(T)^{-1}e\|_{\infty}                \le \|W(T)^{-1}e\|_{\infty}                \le \|Z(T)^{-1}e\|_{\infty},

where the four terms cost O(n^3), O(n^2), O(n), and O(1) flops, respectively, to evaluate by solving the relevant triangular system (the last term is available from the explicit formula below). As the bounds become less expensive to compute they become weaker. The quantity \|Z(T)^{-1}\|_p can be explicitly evaluated for p = \infty, using (2). It has the same value for p = 1, and since \|A\|_2 \le (\|A\|_1\|A\|_{\infty})^{1/2} we have

\notag    \|T^{-1}\|_p \le \displaystyle\frac{ (\beta + 1)^{n-1}}{\alpha}, \quad p = 1,2,\infty.    \qquad(7)

This bound is an equality for p = 1,\infty for the matrix T(\theta) in (1).
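The bounds in (6) and (7) are straightforward to compute. The sketch below (the function names are mine; for simplicity it uses dense triangular solves rather than the cheaper structured evaluations described above) forms M(T) and W(T), evaluates \|M(T)^{-1}e\|_{\infty} and \|W(T)^{-1}e\|_{\infty}, and computes the explicit bound (7).

```python
import numpy as np
from scipy.linalg import solve_triangular

def comparison_matrix(T):
    """M(T): |t_ii| on the diagonal, -|t_ij| off the diagonal."""
    M = -np.abs(T)
    np.fill_diagonal(M, np.abs(np.diag(T)))
    return M

def inf_norm_bounds(T):
    """Return (||M(T)^{-1}e||_inf, ||W(T)^{-1}e||_inf, (beta+1)^(n-1)/alpha)
    for a nonsingular upper triangular T."""
    n = T.shape[0]
    d = np.abs(np.diag(T))
    e = np.ones(n)

    M = comparison_matrix(T)
    bound_M = np.linalg.norm(solve_triangular(M, e, lower=False), np.inf)

    # W(T): each entry above the diagonal replaced by minus the largest
    # absolute off-diagonal entry in its row of T.
    W = np.diag(d)
    for i in range(n - 1):
        W[i, i + 1:] = -np.max(np.abs(T[i, i + 1:]))
    bound_W = np.linalg.norm(solve_triangular(W, e, lower=False), np.inf)

    # Z(T): explicit bound (7).
    alpha = d.min()
    beta = max(np.max(np.abs(T[i, i + 1:])) / d[i] for i in range(n - 1))
    bound_Z = (beta + 1) ** (n - 1) / alpha

    return bound_M, bound_W, bound_Z

# T_2 above: all three bounds equal 81, against the true value ||T_2^{-1}||_inf = 9.
T2 = np.eye(5) + 2 * np.triu(np.ones((5, 5)), k=1)
print(np.linalg.norm(np.linalg.inv(T2), np.inf), inf_norm_bounds(T2))
```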

For the Frobenius norm, evaluating \|Z(T)^{-1}\|_F, and using \|A\|_2 \le \|A\|_F, gives

\notag  \|T^{-1}\|_{2,F} \le        \displaystyle\frac{ \bigl( (\beta + 1)^{2n} + 2n(\beta + 2) - 1 \bigr)^{1/2}}             {\alpha(\beta + 2)}.       \qquad(8)

For the 2-norm, either of (7) and (8) can be the smaller bound depending on \beta.

For the special case of a bidiagonal matrix B it is easy to show that |B^{-1}| = M(B)^{-1}, and so \|B^{-1}\|_{\infty} = \|M(B)^{-1}\|_{\infty} = \|M(B)^{-1}e\|_{\infty} can be computed exactly in O(n) flops.
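Here is a minimal sketch of that O(n) computation for an upper bidiagonal B, represented (my convention) by its diagonal d and superdiagonal f; it back-substitutes through M(B)x = e and returns the largest entry of x.

```python
import numpy as np

def norm_inv_inf_bidiagonal(d, f):
    """||B^{-1}||_inf for the upper bidiagonal B with diagonal d (length n)
    and superdiagonal f (length n-1), computed exactly in O(n) flops."""
    n = len(d)
    x = np.empty(n)
    x[n - 1] = 1.0 / abs(d[n - 1])
    for i in range(n - 2, -1, -1):
        # Row i of M(B) x = e reads |d_i| x_i - |f_i| x_{i+1} = 1.
        x[i] = (1.0 + abs(f[i]) * x[i + 1]) / abs(d[i])
    return x.max()  # x >= 0, so this is ||M(B)^{-1}e||_inf = ||B^{-1}||_inf
```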

These upper bounds can be arbitrarily weak, even for a fixed n, as shown by the example

\notag   U(\theta) = \begin{bmatrix} \theta^{-1} &   1      & 1       \\                       0     &  \theta^{-1} & \theta^{-1} \\                       0     &   0      & \theta^{-2} \end{bmatrix},     \quad \theta > 0,

for which

\notag   U(\theta)^{-1} =           \begin{bmatrix} \theta      &  -\theta^2   & 0       \\                       0     &  \theta      & -\theta^2   \\                       0     &   0      & \theta^2    \end{bmatrix},           \quad   M(U(\theta))^{-1} =           \begin{bmatrix} \theta      &  \theta^2    & 2\theta^3   \\                       0     &  \theta      & \theta^2    \\                       0     &   0      & \theta^2    \end{bmatrix}.

As \theta\to\infty, \|M(U(\theta))^{-1}\|_{\infty} /\|U(\theta)^{-1}\|_{\infty} \approx 2\theta. On the other hand, the overestimation is bounded as a function of n for triangular matrices resulting from certain pivoting strategies.
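A quick numerical check of this growth for U(\theta) (the choice \theta = 10^3 is arbitrary):

```python
import numpy as np

theta = 1e3
U = np.array([[1/theta, 1.0,     1.0],
              [0.0,     1/theta, 1/theta],
              [0.0,     0.0,     1/theta**2]])
M = -np.abs(U)
np.fill_diagonal(M, np.abs(np.diag(U)))

ratio = np.linalg.norm(np.linalg.inv(M), np.inf) / np.linalg.norm(np.linalg.inv(U), np.inf)
print(ratio, 2 * theta)  # the ratio approaches 2*theta as theta grows
```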

Theorem 2.

Suppose the upper triangular matrix T\in\mathbb{C}^{n\times n} satisfies

\notag       |t_{ii}| \ge |t_{ij}|, \quad j>i. \qquad (9)

Then, for the 1-, 2-, and \infty-norms,

\notag     \displaystyle\frac{1}{\min_i|t_{ii}|} \le \|T^{-1}\| \le \|M(T)^{-1}\|                               \le \|W(T)^{-1}\|                               \le \|Z(T)^{-1}\|                               \le \displaystyle\frac{2^{n-1}}{{\min_i|t_{ii}|}}.

Proof. The first four inequalities are a combination of (3) and (6). The fifth inequality is obtained from the expression (7) for \|Z(T)^{-1}\|, since (9) implies \beta \le 1.

Condition (9) is satisfied for the triangular factors from QR factorization with column pivoting and for the transpose of the unit lower triangular factors from LU factorization with any form of pivoting.
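As an illustration of the last point, the sketch below computes a QR factorization with column pivoting of a random matrix (the seed and dimension are arbitrary) using scipy.linalg.qr, checks that the R factor satisfies (9), and compares \|R^{-1}\|_{\infty} with the 2^{n-1}/\min_i|r_{ii}| bound of the theorem.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))

Q, R, p = qr(A, pivoting=True)  # A[:, p] = Q @ R

# Condition (9): each diagonal entry of R dominates the rest of its row.
n = R.shape[0]
print(all(abs(R[i, i]) >= abs(R[i, j]) for i in range(n) for j in range(i + 1, n)))

# Compare ||R^{-1}||_inf with the 2^(n-1)/min|r_ii| bound.
bound = 2.0 ** (n - 1) / np.min(np.abs(np.diag(R)))
print(np.linalg.norm(np.linalg.inv(R), np.inf), bound)
```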

The upper bounds we have described have been derived independently by several authors, as explained by Higham (2002).

References

Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, SIAM, Philadelphia, 2002.
