For a
n×n
matrix
A
the diagonalization is the process to find a diagonal matrix
D
and a invertible
n×n
matrix
P
that holds following equality
A=PDP−1
Why we need diagonalization?
We often need to calculate the power of the matrix, this
operation
Ak
is expensive to calculate when
A
and
k
is big enough. However, if the matrix
A
is diagonalizable. The power operation is then cheaper because
Ak=(PDP−1)k=PDkP−1
and the powers of diagonal matrix is straight forward.
Dk=λ1k0⋮00λ2k⋮0……⋱…00⋮λnk
The relationship of diagonalization with homogeneous system
Because
A=PDP−1⟹AP=PD, if we expand the calculation
The above observation is important because it is the same as
solving a homogeneous system of equations
(A−λjI)⋅pj=0
which
pj
is
p1j⋮pnj
pj
is called eigenvector,
λj
is the corresponding eigenvalue.
Geometric meaning of eigenvalues and eigenvectors
From
(A−λI)⋅p=0
we know that
Ap=λp, so from geometrical perspective, eigenvector
p
only suffer stretch for a given transformation
A
and
λ
its corresponding stretch ratio.
If we draw in a 2D space, the effect
v′−v
of applying the transformation of a
2×2
matrix
A, where
v
is the original vector and
v′
the vector after the transformation.
We notice the vectors with the same direction as eigenvectors
(red arrows) haven't changed the direction after the
transformation.
If we use eigenvectors as basis, then the transformation is
easier because we only need to multiply each component per
corresponding eigenvalue.
P
is actually the linear transformation that change basis formed
by eigenvectors to standard basis and
P−1
from standard basis to basis formed by eigenvectors.
When a matrix is diagonalizable
A
n×n
matrix
A
is diagonalizable if and only if
A
has
n
linearly independent eigenvectors.
What matrices are not diagonalizable
A rotation matrix is not diagonalizable over real fields
How to find all the eigenvalues and corresponding eigenvectors
From the previous conclusion
(A−λjI)⋅pj=0
and invertible matrix theorem, then a number
λ
is an eigenvalue of
A
if and only if
f(λ)=det(A−λI)=0,
f(λ)
is called the characteristic polynomial of
A.
The process of finding all the eigenvalues and corresponding
eigenvectors is following
Find all the roots
λ
of the characteristic polynomial.
For each
λ, compute corresponding eigenvectors by finding linearly
independent non-trivial solutions of the homogeneous system of
equations
(A−λI)⋅p=0
Application of diagonalization
One of important application of diagonalization is
PCA. In PCA we need to calculate eigenvectors and eigenvalues of
covariance matrix. Since covariance matrix is
symmetric matrix hence it is always diagonalizable.