Lecture 8 — Cayley–Hamilton Theorem & Diagonalization

1. Cayley–Hamilton Theorem

Definition

The Cayley–Hamilton theorem states that every square matrix satisfies its own characteristic equation.

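For a square matrix \(A\) of order \(n\) with characteristic polynomial \[ p(\lambda) = \det(\lambda I - A) = \lambda^n + c_{1}\lambda^{n-1} + \cdots + c_{n}, \] the theorem states that \[ p(A) = A^n + c_{1}A^{n-1} + \cdots + c_{n}I = 0. \] The alternative convention \(\det(A - \lambda I) = (-1)^n p(\lambda)\) differs from \(p\) only by a sign, so it leads to the same matrix equation.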

Properties

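It expresses \(A^n\) as a linear combination of \(I, A, \dots, A^{n-1}\), so every higher power of \(A\) reduces to a polynomial in \(A\) of degree at most \(n-1\).
Useful in computing \(A^{-1}\) (when \(A\) is invertible, i.e. \(c_n \neq 0\)), \(A^k\), and simplifying polynomials of \(A\).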
Holds for all square matrices (real or complex).

Examples

Example 1 (2×2)

Let \[ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}. \] Characteristic polynomial: \[ \det(A-\lambda I) = (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3. \] So by Cayley–Hamilton: \[ A^2 - 4A + 3I = 0. \]
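
As a small illustration of how this relation reduces higher powers (a worked sketch using only the identity above): since \(A^2 = 4A - 3I\), \[ A^3 = A\cdot A^2 = 4A^2 - 3A = 4(4A - 3I) - 3A = 13A - 12I = \begin{bmatrix} 14 & 13 \\ 13 & 14 \end{bmatrix}. \]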

Example 2 (3×3)

Let \[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}. \] Characteristic polynomial: \[ (\lambda-1)(\lambda-2)(\lambda-3) = \lambda^3 - 6\lambda^2 + 11\lambda - 6. \] So: \[ A^3 - 6A^2 + 11A - 6I = 0. \]

Example 3 (3×3)

Let \[ A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}. \] Characteristic polynomial: \[ \det(A-\lambda I) = -\lambda^3 + 1, \] which is \(-(\lambda^3 - 1)\); the overall sign does not affect the equation. So: \[ A^3 - I = 0. \]
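
The three examples above can be checked numerically. The following is a minimal sketch in plain Python, not a library routine: the helper names `matmul` and `poly_at_matrix` are hypothetical, and the coefficient lists are the monic characteristic polynomials from the examples, highest power first.

```python
def matmul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at_matrix(coeffs, A):
    """Evaluate coeffs[0]*A^m + ... + coeffs[m]*I using Horner's rule."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = matmul(result, A)  # multiply the running value by A
        for i in range(n):
            result[i][i] += c       # then add c * I
    return result


# (monic characteristic polynomial, matrix) for Examples 1, 2 and 3
examples = [
    ([1, -4, 3], [[2, 1], [1, 2]]),
    ([1, -6, 11, -6], [[1, 0, 0], [0, 2, 0], [0, 0, 3]]),
    ([1, 0, 0, -1], [[0, 1, 0], [0, 0, 1], [1, 0, 0]]),
]
residuals = [poly_at_matrix(coeffs, A) for coeffs, A in examples]
for residual in residuals:
    print(residual)  # Cayley-Hamilton predicts the zero matrix each time
```

Integer arithmetic keeps the check exact, so a nonzero entry would point to a wrong coefficient rather than to rounding.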

2×2 — Example 1

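Cayley–Hamilton also gives the inverse of an invertible matrix as a polynomial in the matrix, as the following examples show. Matrix \(A=\begin{pmatrix}2 & 1\\[2pt]1 & 2\end{pmatrix}\).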

\(\operatorname{tr}A = 2+2 = 4\). \(\det A = 2\cdot2-1\cdot1=3.\)
C–H: \(A^2 - 4A + 3I = 0\). Multiply by \(A^{-1}\) (it exists because \(\det A = 3 \neq 0\)): \(A - 4I + 3A^{-1} = 0\), so \(A^{-1} = \frac{1}{3}(4I - A)\).
Compute \(4I-A = \begin{pmatrix}2 & -1\\-1 & 2\end{pmatrix}\). So \(A^{-1} = \frac{1}{3}\begin{pmatrix}2 & -1\\-1 & 2\end{pmatrix} = \begin{pmatrix}2/3 & -1/3\\ -1/3 & 2/3\end{pmatrix}.\)

2×2 — Example 2

Matrix \(B=\begin{pmatrix}1 & 2\\[2pt]3 & 4\end{pmatrix}\).

\(\operatorname{tr}B = 5,\ \det B = 1\cdot4-2\cdot3=-2.\)
C–H: \(B^2 - 5B -2I = 0\). Multiplying by \(B^{-1}\) gives \(B - 5I - 2B^{-1} = 0\), so \( -2 B^{-1} = 5I - B\) and \[ B^{-1} = -\frac{1}{2}(5I - B). \]
Compute \(5I - B = \begin{pmatrix}4 & -2\\ -3 & 1\end{pmatrix}\). Thus \(B^{-1} = -\tfrac12\begin{pmatrix}4 & -2\\ -3 & 1\end{pmatrix} = \begin{pmatrix}-2 & 1\\ 3/2 & -1/2\end{pmatrix}.\)

2×2 — Example 3

Matrix \(C=\begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\).

\(\operatorname{tr}C = 0, \det C = 1.\)
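C–H: \(C^2 - 0\cdot C + 1\cdot I = C^2 + I = 0\), so \(C^2 = -I\) and multiplying by \(C^{-1}\) gives \(C^{-1} = -C\). Equivalently, the general \(2\times2\) formula \(M^{-1} = (1/d)(tI - M)\), with \(t=\operatorname{tr}M\) and \(d=\det M\neq 0\), gives \(C^{-1}= (1/1)(0\cdot I - C) = -C\).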
Hence \(C^{-1} = -C = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}.\)
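
The three 2×2 computations follow one pattern, \(M^{-1} = (1/d)(tI - M)\) with \(t=\operatorname{tr}M\) and \(d=\det M\). A minimal sketch of that pattern is given below, assuming integer (or rational) entries so that `fractions.Fraction` keeps the result exact; the function name `inverse_2x2_cayley_hamilton` is hypothetical.

```python
from fractions import Fraction


def inverse_2x2_cayley_hamilton(M):
    """Invert a 2x2 matrix via M^2 - t*M + d*I = 0, i.e. M^-1 = (t*I - M) / d."""
    (a, b), (c, e) = M
    t = a + e            # trace
    d = a * e - b * c    # determinant
    if d == 0:
        raise ValueError("matrix is singular; no inverse exists")
    # Entries of t*I - M, each divided exactly by d
    return [[Fraction(t - a, d), Fraction(-b, d)],
            [Fraction(-c, d), Fraction(t - e, d)]]


A_inv = inverse_2x2_cayley_hamilton([[2, 1], [1, 2]])
B_inv = inverse_2x2_cayley_hamilton([[1, 2], [3, 4]])
C_inv = inverse_2x2_cayley_hamilton([[0, 1], [-1, 0]])
print(A_inv, B_inv, C_inv, sep="\n")
```

The singular case is rejected explicitly because the formula divides by \(\det M\).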

3×3 — Example 4 (diagonal)

Matrix \(D=\operatorname{diag}(1,2,3)=\begin{pmatrix}1&0&0\\0&2&0\\0&0&3\end{pmatrix}.\)

\(t=\operatorname{tr}D = 1+2+3=6.\) \(D^2=\operatorname{diag}(1,4,9)\) so \(\operatorname{tr}(D^2)=14.\)
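\(s=\tfrac12\big(t^2-\operatorname{tr}(D^2)\big)=\tfrac12(36-14)=11\) (the sum of the principal \(2\times2\) minors). \(d=\det D = 1\cdot2\cdot3 = 6.\)
C–H for a \(3\times3\) matrix reads \(D^3 - tD^2 + sD - dI = 0\), here \(D^3 - 6D^2 + 11D - 6I = 0.\) Multiplying by \(D^{-1}\) and solving for it gives \[ D^{-1} = \frac{1}{6}(D^2 - 6D + 11I). \]
Check on the diagonal: \(D^2 - 6D + 11I = \operatorname{diag}(6,3,2)\), so \(D^{-1}=\operatorname{diag}(6,3,2)/6 = \operatorname{diag}(1,1/2,1/3)\), as expected for a diagonal matrix.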

3×3 — Example 5 (non-diagonal)

Matrix \(E=\begin{pmatrix}2 & 1 & 0\\[2pt]0 & 1 & 1\\[2pt]1 & 0 & 1\end{pmatrix}.\)

Compute traces and determinant: \(\displaystyle t=\operatorname{tr}E=2+1+1=4.\) \(E^2=\begin{pmatrix}4&3&1\\[2pt]1&1&2\\[2pt]3&1&1\end{pmatrix}\) so \(\operatorname{tr}(E^2)=6.\)
\(s=\tfrac12(t^2-\operatorname{tr}E^2)=\tfrac12(16-6)=5.\) \(\det E = 3.\)
C–H: \(E^3 - 4E^2 + 5E - 3I = 0.\) Using the inverse formula: \[ E^{-1} = \frac{1}{3}\big(E^2 - 4E + 5I\big). \]
Compute numerically: \[ E^2 - 4E + 5I = \begin{pmatrix}4&3&1\\1&1&2\\3&1&1\end{pmatrix} -4\begin{pmatrix}2&1&0\\0&1&1\\1&0&1\end{pmatrix} +5\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix} \] which equals \(\begin{pmatrix}1 & -1 & 1\\ 1 & 2 & -2\\ -1 & 1 & 2\end{pmatrix}\). Dividing by \(3\) gives \[ E^{-1} = \begin{pmatrix}1/3 & -1/3 & 1/3\\ 1/3 & 2/3 & -2/3\\ -1/3 & 1/3 & 2/3\end{pmatrix}, \] which matches the direct inverse.
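
The 3×3 recipe used in Examples 4 and 5, \(M^{-1} = \frac{1}{d}(M^2 - tM + sI)\) with \(t=\operatorname{tr}M\), \(s=\tfrac12(t^2-\operatorname{tr}M^2)\) and \(d=\det M\), can be sketched as follows. This is an illustrative sketch assuming integer entries, with exact arithmetic via `fractions.Fraction`; all helper names are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def inverse_3x3_cayley_hamilton(M):
    """Invert a 3x3 matrix via M^3 - t*M^2 + s*M - d*I = 0."""
    M2 = matmul(M, M)
    t = trace(M)
    s = Fraction(t * t - trace(M2), 2)
    d = det3(M)
    if d == 0:
        raise ValueError("matrix is singular; no inverse exists")
    # (M^2 - t*M + s*I) / d, entry by entry
    return [[Fraction(M2[i][j] - t * M[i][j] + (s if i == j else 0), d)
             for j in range(3)] for i in range(3)]


D = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
E = [[2, 1, 0], [0, 1, 1], [1, 0, 1]]
D_inv = inverse_3x3_cayley_hamilton(D)
E_inv = inverse_3x3_cayley_hamilton(E)
print(matmul(E, E_inv))  # should be the identity if the formula is right
```

Multiplying `E` by `E_inv` checks the result independently of the formula.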

2. Diagonalization

Definition

A square matrix \(A\) is said to be diagonalizable if there exist an invertible matrix \(P\) and a diagonal matrix \(D\) such that \[ A = P D P^{-1}. \] Here, \(D\) has the eigenvalues of \(A\) on its diagonal, and the columns of \(P\) are corresponding eigenvectors, listed in the same order.

Properties

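An \(n\times n\) matrix is diagonalizable exactly when it has \(n\) linearly independent eigenvectors; in particular, a matrix with \(n\) distinct eigenvalues is always diagonalizable.
Real symmetric matrices are always diagonalizable, and the eigenvectors can be chosen orthonormal, so that \(P\) is orthogonal (\(P^{-1} = P^T\)).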
Diagonalization simplifies computing powers: \(A^k = P D^k P^{-1}\).

Examples

Example 1 (2×2)

\[ A = \begin{bmatrix} 4 & 1 \\ 0 & 2 \end{bmatrix}. \] Characteristic polynomial: \((\lambda-4)(\lambda-2)\). Eigenvalues: \(\lambda_1=4, \lambda_2=2\). Eigenvectors: \([1,0]^T\), \([1,-2]^T\). So \[ P = \begin{bmatrix} 1 & 1 \\ 0 & -2 \end{bmatrix}, \quad D = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}. \]
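
With \(P\) and \(D\) from this example, the identity \(A^k = P D^k P^{-1}\) can be tried out directly. The sketch below is a minimal illustration restricted to the 2×2 case, with exact arithmetic via `fractions.Fraction`; the helper names are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def power_via_diagonalization(P, eigenvalues, k):
    """Return P * D^k * P^-1, where D = diag(eigenvalues) and P is 2x2."""
    n = len(P)
    Dk = [[eigenvalues[i] ** k if i == j else 0 for j in range(n)]
          for i in range(n)]
    return matmul(matmul(P, Dk), inverse_2x2(P))


P = [[1, 1], [0, -2]]   # eigenvectors as columns
eigenvalues = [4, 2]    # same order as the columns of P
A = [[4, 1], [0, 2]]
A_cubed = power_via_diagonalization(P, eigenvalues, 3)
print(A_cubed == matmul(A, matmul(A, A)))  # compare with direct multiplication
```

Only the diagonal entries are raised to the power \(k\), which is what makes this cheaper than repeated multiplication for large \(k\).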

Example 2 (3×3)

\[ A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix}. \] Already diagonal: \(D = A\). \(P = I\).

Example 3 (3×3)

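\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}. \] Eigenvalues: \(\lambda=1\) (algebraic multiplicity 2), \(\lambda=2\). For \(\lambda=1\), \((A-I)v=0\) forces \(v_2=0\) and \(v_3=0\), so the eigenspace is spanned by \([1,0,0]^T\) alone; \([0,1,0]^T\) is not an eigenvector, since \(A[0,1,0]^T = [1,1,0]^T\). For \(\lambda=2\) the eigenvector is \([0,0,1]^T\). Only two linearly independent eigenvectors exist, so no invertible \(P\) can be formed and \(A\) is not diagonalizable. (Taking \(P = I\) would require \(A = D\), which fails because \(A\) is not diagonal.) This shows that a repeated eigenvalue can prevent diagonalization.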
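
Whether a matrix with a repeated eigenvalue is diagonalizable depends on how many independent eigenvectors each eigenvalue has: the dimension of the null space of \(A - \lambda I\) (the geometric multiplicity) must equal the algebraic multiplicity. A minimal sketch of that check for the matrix of Example 3 follows, using exact Gaussian elimination; the helper names `rank` and `geometric_multiplicity` are hypothetical.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def geometric_multiplicity(A, lam):
    """Dimension of the eigenspace of A for eigenvalue lam: n - rank(A - lam*I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


A = [[1, 1, 0], [0, 1, 0], [0, 0, 2]]
total = geometric_multiplicity(A, 1) + geometric_multiplicity(A, 2)
print(total, "independent eigenvectors for a", len(A), "x", len(A), "matrix")
```

If the total is smaller than \(n\), there are too few eigenvectors to fill the columns of an invertible \(P\).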