Diagonalization

Author: Gao


What is diagonalization?

For an $n \times n$ matrix $A$, diagonalization is the process of finding a diagonal matrix $D$ and an invertible $n \times n$ matrix $P$ such that the following equality holds: $A = PDP^{-1}$.

Why do we need diagonalization?

We often need to calculate powers of a matrix. This operation $A^{k}$ is expensive when $A$ and $k$ are large. However, if the matrix $A$ is diagonalizable, the power operation becomes much cheaper, because $A^{k} = (PDP^{-1})^k = PD^{k}P^{-1}$ and powers of a diagonal matrix are straightforward:

$$D^{k} = \begin{bmatrix} \lambda_{1}^{k} & 0 & \dots & 0\\ 0 & \lambda_{2}^{k} & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \lambda_{n}^{k} \end{bmatrix}$$
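As a quick illustration (a minimal sketch in NumPy; the matrix $A$ below is an arbitrary diagonalizable example), we can compute $A^{k}$ through the eigendecomposition and check it against repeated multiplication:

```python
import numpy as np

# Arbitrary example matrix with distinct eigenvalues (5 and 2)
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
k = 10

# Columns of P are eigenvectors; eigvals holds the diagonal of D
eigvals, P = np.linalg.eig(A)

# A^k = P D^k P^{-1}; powering D only powers its diagonal entries
A_k = P @ np.diag(eigvals**k) @ np.linalg.inv(P)

assert np.allclose(A_k, np.linalg.matrix_power(A, k))
```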

The relationship of diagonalization with homogeneous systems

Because $A = PDP^{-1} \implies AP = PD$, we can expand the calculation:

$$\begin{aligned}
\begin{bmatrix} a_{11} & \dots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \dots & a_{nn} \end{bmatrix}
\begin{bmatrix} p_{11} & \dots & p_{1n}\\ \vdots & \ddots & \vdots\\ p_{n1} & \dots & p_{nn} \end{bmatrix}
&=
\begin{bmatrix} p_{11} & \dots & p_{1n}\\ \vdots & \ddots & \vdots\\ p_{n1} & \dots & p_{nn} \end{bmatrix}
\begin{bmatrix} \lambda_{1} & \dots & 0\\ \vdots & \ddots & \vdots\\ 0 & \dots & \lambda_{n} \end{bmatrix}\\
\begin{bmatrix} (a_{11}p_{11}+\dots+a_{1n}p_{n1}) & \dots & (a_{11}p_{1n}+\dots+a_{1n}p_{nn})\\ \vdots & \ddots & \vdots\\ (a_{n1}p_{11}+\dots+a_{nn}p_{n1}) & \dots & (a_{n1}p_{1n}+\dots+a_{nn}p_{nn}) \end{bmatrix}
&=
\begin{bmatrix} p_{11}\lambda_{1} & \dots & p_{1n}\lambda_{n}\\ \vdots & \ddots & \vdots\\ p_{n1}\lambda_{1} & \dots & p_{nn}\lambda_{n} \end{bmatrix}
\end{aligned}$$

Comparing the $j$-th column on both sides, we have the following observation:

$$\begin{pmatrix} a_{11}p_{1j}+\dots+a_{1n}p_{nj}\\ \vdots\\ a_{n1}p_{1j}+\dots+a_{nn}p_{nj} \end{pmatrix} = \begin{pmatrix} p_{1j}\lambda_{j}\\ \vdots\\ p_{nj}\lambda_{j} \end{pmatrix} \implies (A-\lambda_{j}I) \cdot \begin{pmatrix} p_{1j}\\ \vdots\\ p_{nj} \end{pmatrix} = \vec{0}$$

The above observation is important because it is the same as solving a homogeneous system of equations

$$(A-\lambda_{j}I) \cdot \vec{p}_{j} = \vec{0}$$

where $\vec{p}_{j}$ is

$$\begin{pmatrix} p_{1j}\\ \vdots\\ p_{nj} \end{pmatrix}$$

$\vec{p}_{j}$ is called an eigenvector, and $\lambda_{j}$ is the corresponding eigenvalue.
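We can check this relationship numerically (a sketch in NumPy, reusing the arbitrary example matrix from above): each column of the eigenvector matrix should solve the corresponding homogeneous system.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, P = np.linalg.eig(A)  # column j of P is the eigenvector p_j

# Each p_j solves the homogeneous system (A - lambda_j I) p_j = 0
for j in range(A.shape[0]):
    residual = (A - eigvals[j] * np.eye(A.shape[0])) @ P[:, j]
    assert np.allclose(residual, 0.0)
```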

Geometric meaning of eigenvalues and eigenvectors

From $(A-\lambda I) \cdot \vec{p} = \vec{0}$ we know that $A\vec{p} = \lambda\vec{p}$. So, from a geometric perspective, an eigenvector $\vec{p}$ is only stretched by the transformation $A$, and $\lambda$ is its corresponding stretch ratio.

Imagine drawing, in 2D space, the displacement $v^{\prime} - v$ that results from applying the transformation of a $2 \times 2$ matrix $A$, where $v$ is the original vector and $v^{\prime}$ is the vector after the transformation.

[Figure: Eigenvectors]

We notice that the vectors pointing in the same direction as the eigenvectors (red arrows) do not change direction after the transformation.

If we use the eigenvectors as a basis, the transformation becomes easier, because we only need to multiply each component by the corresponding eigenvalue. $P$ is actually the change-of-basis transformation from the basis formed by the eigenvectors to the standard basis, and $P^{-1}$ goes from the standard basis to the basis formed by the eigenvectors.
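To make the change of basis concrete (a sketch, using the same arbitrary example matrix as above): applying $A$ to a vector is the same as converting it to eigenvector coordinates with $P^{-1}$, scaling each coordinate by its eigenvalue, and converting back with $P$.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals, P = np.linalg.eig(A)

v = np.array([1.0, 2.0])

coords = np.linalg.inv(P) @ v   # coordinates of v in the eigenvector basis
Av = P @ (eigvals * coords)     # scale per eigenvalue, return to standard basis

assert np.allclose(Av, A @ v)
```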

When a matrix is diagonalizable

An $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.

What matrices are not diagonalizable

By the criterion above, a matrix with fewer than $n$ linearly independent eigenvectors is not diagonalizable. A classic example is the shear matrix $\begin{pmatrix} 1 & 1\\ 0 & 1 \end{pmatrix}$: its only eigenvalue is $\lambda = 1$, and its eigenvectors span only one dimension, so no invertible $P$ exists.
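We can observe this numerically (a sketch; floating-point rank checks are fragile in general, but they illustrate the criterion here):

```python
import numpy as np

# Shear matrix: eigenvalue 1 appears twice, but eigenvectors span one dimension
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])

eigvals, P = np.linalg.eig(S)

# Diagonalizable iff P has n linearly independent columns, i.e. full rank
print(np.linalg.matrix_rank(P))  # 1 (< 2), so S is not diagonalizable
```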

How to find all the eigenvalues and corresponding eigenvectors

From the previous conclusion $(A-\lambda_{j}I) \cdot \vec{p}_{j} = \vec{0}$ and the invertible matrix theorem, this homogeneous system has a non-trivial solution exactly when $A-\lambda I$ is singular. So a number $\lambda$ is an eigenvalue of $A$ if and only if $f(\lambda) = \det(A-\lambda I) = 0$, where $f(\lambda)$ is called the characteristic polynomial of $A$.

The process of finding all the eigenvalues and corresponding eigenvectors is the following (see the sketch after the list):

  1. Find all the roots $\lambda$ of the characteristic polynomial.
  2. For each $\lambda$, compute the corresponding eigenvectors by finding linearly independent non-trivial solutions of the homogeneous system of equations $(A-\lambda I) \cdot \vec{p} = \vec{0}$.
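A minimal sketch of both steps using SymPy (the $2 \times 2$ matrix is an arbitrary example):

```python
import sympy as sp

A = sp.Matrix([[4, 1],
               [2, 3]])
lam = sp.symbols('lambda')

# Step 1: roots of the characteristic polynomial det(A - lambda*I)
char_poly = (A - lam * sp.eye(2)).det()
eigenvalues = sp.solve(char_poly, lam)          # [2, 5]

# Step 2: for each root, non-trivial solutions of (A - lambda*I) p = 0
for ev in eigenvalues:
    eigenvectors = (A - ev * sp.eye(2)).nullspace()
    print(ev, eigenvectors)
```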

Application of diagonalization

One important application of diagonalization is PCA (principal component analysis). In PCA we need to calculate the eigenvectors and eigenvalues of the covariance matrix. Since the covariance matrix is symmetric, it is always diagonalizable (by the spectral theorem, it is even orthogonally diagonalizable).
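A minimal PCA sketch in NumPy (random data, just to show where the eigendecomposition enters; `eigh` is the eigensolver specialized for symmetric matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))    # 200 samples, 3 features

# Covariance matrix of the centered data: symmetric, hence diagonalizable
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Eigenvalues come back in ascending order for eigh
eigvals, eigvecs = np.linalg.eigh(cov)

# Project onto the eigenvectors with the largest eigenvalues (top 2 components)
components = eigvecs[:, ::-1]
X_pca = Xc @ components[:, :2]
```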
