[Linear Algebra Series Part 20] SVD and an Introduction to Dimensionality Reduction


What this post covers

This post introduces the singular value decomposition (SVD) and shows how it leads naturally to dimensionality reduction.

  • How SVD factorizes a matrix
  • What singular values measure
  • Why low-rank approximation works
  • How SVD connects to compression, recommendation systems, and embeddings

Key terms

  • SVD (singular value decomposition): the factorization A = U Σ V^T
  • Singular values: the nonnegative diagonal entries of Σ, listed in decreasing order
  • Low-rank approximation: replacing A with a simpler matrix built from only its largest singular values
  • Truncated SVD: the rank-k approximation formed from the top k singular values and their singular vectors

Core idea

SVD factorizes a matrix as

A = U Σ V^T

with shapes

A (m x n) = U (m x m) Σ (m x n) V^T (n x n)

The big idea is simple: SVD reveals which directions a matrix stretches most, which directions it barely changes, and how to describe data using fewer important coordinates.

A useful way to read this is:

  • V^T gives orthonormal directions in the input space (perpendicular unit vectors forming a clean coordinate system)
  • Σ tells us how much each direction is stretched
  • U gives the corresponding orthonormal directions in the output space

So SVD tells us which directions matter most and how strongly the matrix acts on them.

That is why SVD naturally leads to dimensionality reduction. If some singular values are tiny, then the corresponding directions contribute very little, so we can often discard them and keep a shorter coordinate description.
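The factorization and its shapes can be checked directly with NumPy. This is a minimal sketch using a small made-up matrix for illustration:

```python
import numpy as np

# A small rectangular matrix (m = 4, n = 3), chosen arbitrarily.
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])

# full_matrices=True gives U (m x m) and V^T (n x n);
# s holds the singular values in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)   # (4, 4) (3,) (3, 3)

# Rebuild Sigma as an m x n matrix and confirm A = U Sigma V^T.
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)
print(np.allclose(A, U @ Sigma @ Vt))   # True
```

Note that NumPy returns Σ as a 1-D array of singular values, so we rebuild the full m x n diagonal matrix ourselves to match the textbook formula.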

Step-by-step examples

Example 1) Image compression

If an image is stored as a matrix, then keeping only the largest singular values can preserve the main visual structure while reducing storage. For example, if an image effectively uses only the top 20 singular values out of hundreds, then we can store a much smaller approximation while still keeping the main shapes and contrast patterns.
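A minimal sketch of the compression idea, using a synthetic nearly-rank-1 "image" (a gradient plus light noise) rather than a real photo:

```python
import numpy as np

# Hypothetical "image": a rank-1 gradient pattern plus small noise,
# so a few singular values carry almost all of the structure.
rng = np.random.default_rng(0)
base = np.outer(np.linspace(0, 1, 64), np.linspace(1, 0, 48))
img = base + 0.01 * rng.standard_normal((64, 48))

U, s, Vt = np.linalg.svd(img, full_matrices=False)

k = 5  # keep only the top-k singular values
img_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Relative reconstruction error in Frobenius norm.
err = np.linalg.norm(img - img_k) / np.linalg.norm(img)
print(f"rank-{k} relative error: {err:.4f}")

# Storage cost: m*k + k + k*n numbers instead of m*n.
m, n = img.shape
print(m * k + k + k * n, "numbers vs", m * n)
```

The same recipe applies to a real grayscale image loaded as a matrix; the interesting question is how fast the error falls as k grows.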

Example 2) Recommendation systems

A user-item matrix can be huge and sparse, but often most of its structure can be approximated by a lower-rank representation. That is why low-rank methods are useful in recommendation systems.
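A toy illustration with a made-up 4x4 user-item rating matrix (real systems use far larger sparse matrices and handle missing entries more carefully):

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items.
# Users 1-2 and users 3-4 have similar tastes by construction.
R = np.array([[5.0, 4.0, 0.0, 1.0],
              [4.0, 5.0, 1.0, 0.0],
              [1.0, 0.0, 5.0, 4.0],
              [0.0, 1.0, 4.0, 5.0]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)

# A rank-2 approximation captures the two dominant "taste" directions.
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(R_k, 2))
```

Even with k = 2, the approximation preserves the block structure of the two taste groups, which is the kind of signal a recommender exploits.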

Example 3) Embeddings and document matrices

High-dimensional text or embedding matrices often contain structure that can be summarized in fewer directions. SVD helps compress while still preserving important relationships.

Math notes

  • Every real or complex matrix has an SVD, including rectangular matrices.
  • The singular values are the nonnegative square roots of the eigenvalues of A^T A.
  • If v_i is a right singular vector, then A v_i = σ_i u_i for the corresponding left singular vector u_i.

  • Truncated SVD gives the optimal rank-k approximation in both the Frobenius and spectral norms (the Eckart–Young theorem).
  • PCA is equivalent to taking the SVD of the mean-centered data matrix: the right singular vectors are the principal directions.
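The identities above can be verified numerically on a random matrix:

```python
import numpy as np

# Random rectangular matrix; these identities hold for any real matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values are the square roots of the eigenvalues of A^T A.
# eigvalsh returns ascending order, so reverse to match s.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]
print(np.allclose(s, np.sqrt(eigvals)))   # True

# A v_i = sigma_i u_i: the rows of Vt are the right singular vectors v_i.
print(all(np.allclose(A @ Vt[i], s[i] * U[:, i]) for i in range(len(s))))  # True
```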

Common mistakes

Thinking SVD is just a library call

SVD is not only a numerical routine. It is also a way to read the structure of a matrix: its dominant directions, its effective rank, and how strongly it acts along each direction.

Confusing eigendecomposition with SVD

They are related, but SVD is more general and works for rectangular matrices too.

Thinking small singular values are always useless

Whether a direction matters depends on the goal. Sometimes subtle structure matters.

Practice or extension

  1. Why can keeping only a few large singular values still preserve a lot of structure?
  2. What advantage does low-rank approximation give in recommendation systems?
  3. If the singular values are [10, 5, 1, 0.1], how much of the total squared magnitude is kept by the top two?
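If you want to check your answer to question 3, the squared-magnitude ratio takes only a few lines:

```python
import numpy as np

# Singular values from practice question 3.
s = np.array([10.0, 5.0, 1.0, 0.1])

# Fraction of total squared magnitude (Frobenius norm squared)
# kept by the top two singular values.
energy = s ** 2
kept = energy[:2].sum() / energy.sum()
print(f"top-2 keeps {kept:.4%} of the total squared magnitude")
```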

Wrap-up

This post introduced SVD and dimensionality reduction.

  • SVD breaks a matrix into meaningful directions and scaling.
  • Large singular values point to important structure.
  • Low-rank approximation helps with compression and representation.
  • This completes the main arc of the linear algebra series.
