What this post covers
This post gives you the map of the whole linear algebra series before we dive into the details.
- Why programmers need linear algebra at all
- Why the 20-post sequence is arranged in this order
- How the major sections connect to machine learning, graphics, search, and recommender systems
- How to choose a practical reading path instead of treating all 20 posts as equally urgent from day one
Key terms
You do not need to master these yet. Treat them as guideposts you will return to later.
- vector: the basic object used to represent multiple values in an ordered way
- matrix: the coordinate representation of a rule that sends one vector to another
- linear transformation: a rule that preserves vector addition and scalar multiplication
- basis: a linearly independent set of vectors that spans a space
- dimension: the number of vectors in any basis of that space
Core idea
Many people remember linear algebra as “the course with matrix calculations.” From a programming perspective, that description is too small. Linear algebra is better understood as the language for representing data as vectors, transforming those vectors with matrices, and explaining which information is preserved, compressed, or discarded along the way.
A sentence embedding in a search system is a vector. A user preference profile in a recommender system is a vector. Position and velocity in a game engine are vectors. Even an image can be viewed as a large vector if you flatten its pixel values. A matrix is how we write a linear transformation once we choose coordinates or bases. Linear algebra gives all of that a shared grammar.
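That "shared grammar" is easy to see in code. A small sketch in NumPy; the shapes and values here are illustrative, not taken from any real model:

```python
import numpy as np

# A tiny grayscale "image" as a 2x3 pixel grid
image = np.array([[0.1, 0.5, 0.9],
                  [0.2, 0.6, 1.0]])

# Flattening turns it into a single 6-dimensional vector
v = image.flatten()
print(v.shape)  # (6,)

# A sentence embedding is the same kind of object, just longer
embedding = np.zeros(768)
print(embedding.shape)  # (768,)
```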
This series does not follow the most traditional “determinants first, heavy proof later” route. It starts from the kinds of questions programmers actually run into.
- When does data count as a vector?
- What does it mean to combine features mathematically?
- What is a linear layer really doing?
- Why can some outputs be produced while others cannot?
- When we reduce dimension, what survives and what disappears?
To answer those questions, the series is split into four blocks.
1) Learn the language of vectors and matrices
Parts 2-8 build the basic language. We separate scalars from vectors, learn vector addition and scalar multiplication, use length and dot products to talk about similarity, and read a matrix as a transformation instead of as a mere table.
This block matters because most later ideas start here. In plain language, you learn what vectors are, how to combine them, how to measure their size or similarity, and how matrices transform them. If you want to understand a neural network's linear layer, you need to be able to read matrix-vector multiplication as a weighted combination of columns.
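The column-combination reading mentioned above can be checked directly. A minimal sketch with an arbitrary example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([10.0, 100.0])

# Matrix-vector product
y = A @ x

# The same result, read as a weighted combination of A's columns
y_as_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

print(np.allclose(y, y_as_columns))  # True
```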
2) Understand Ax=b and the structure of solutions
Parts 9-16 cover the core structure of linear algebra. We package systems of linear equations as Ax=b, solve them with elimination, and classify whether a system has one solution, no solution, or infinitely many. Then we move into the more formal language: span, linear independence, basis, dimension, subspaces, column space, null space, rank, and nullity.
This is not just a block about technique. It is where we learn to ask structural questions in plain terms: What outputs can this matrix produce? Which inputs get lost? How many genuinely independent directions are there in the data? Those questions later connect to latent factors in recommender systems, dimension reduction, and feature redundancy.
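A minimal sketch of asking those structural questions in NumPy; the matrix is made up, with a deliberately redundant column:

```python
import numpy as np

# The second column is 2x the first, so one column carries no new information
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 1.0]])

# Rank counts the genuinely independent directions the matrix can produce
rank = np.linalg.matrix_rank(A)
print(rank)  # 2

# Nullity counts the input directions that disappear into the null space
nullity = A.shape[1] - rank
print(nullity)  # 1
```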
3) Understand orthogonality and approximation
Parts 17 and 18 focus on orthogonality, projection, and least squares. Real data rarely fits perfectly. Sensor data includes noise, user behavior data fluctuates, and observed values include error. So instead of looking for an exact solution, we often look for the best approximate one.
This block is essential for understanding the linear-algebra backbone of regression. It explains why least squares is natural, why error turns into an orthogonality condition, and why projection gives the closest point once a dot product gives us a notion of perpendicularity.
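As a preview of that orthogonality condition, here is a small sketch of least squares in NumPy; the data points are invented for illustration:

```python
import numpy as np

# Overdetermined system: 4 noisy observations, 2 unknowns (slope, intercept)
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0],
              [4.0, 1.0]])
b = np.array([2.1, 3.9, 6.2, 7.8])

# lstsq finds the x minimizing ||Ax - b||; no exact solution exists here
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# The residual r = b - Ax is orthogonal to every column of A
r = b - A @ x
print(np.allclose(A.T @ r, 0.0))  # True
```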
4) Understand decomposition and dimension reduction
Parts 19 and 20 cover eigenvalues, eigenvectors, and singular value decomposition (SVD). Many learners find this part intimidating, but for programmers it is one of the most practical sections.
- Principal component analysis (PCA) focuses on directions of large variance.
- SVD handles more general matrices and leads naturally to low-dimensional approximation.
- PageRank can be explained in terms of dominant eigenvectors in a special stochastic-matrix setting.
So the final block is not a collection of “fancy advanced topics.” It is where the earlier ideas about representation, spaces, and orthogonality turn into practical decomposition tools.
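A small sketch of that decomposition idea with NumPy's SVD, using a made-up nearly rank-1 matrix:

```python
import numpy as np

# A small matrix that is nearly rank-1, plus a bit of noise
rng = np.random.default_rng(0)
M = np.outer([1.0, 2.0, 3.0], [4.0, 5.0]) + 0.01 * rng.standard_normal((3, 2))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep only the largest singular value: a rank-1 approximation
M1 = s[0] * np.outer(U[:, 0], Vt[0, :])

# The Frobenius error equals the discarded singular value (Eckart-Young)
print(np.linalg.norm(M - M1))
print(s[1])
```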
How to read this series
You do not have to read all 20 parts with the same depth at the start. A better route depends on what you need.
- If you are completely new: read Parts 1-8 first, then read 9-12 carefully enough to understand Ax=b and representability.
- If you mainly care about machine learning links: read 1-8, then 17-20, then come back to basis, dimension, and rank.
- If you want the structure to feel mathematically solid: read 1-20 in order and compute the examples yourself from Parts 12-18 onward.
A practical learning flow looks like this:
vector and matrix intuition -> Ax=b structure -> subspaces and dimension -> orthogonality and approximation -> decomposition and dimension reduction
What this series will and will not cover
This series focuses on finite-dimensional real vector spaces, the part of linear algebra that shows up most often in programming, machine learning, graphics, and data work.
- We will focus on intuition, structure, notation, and practical interpretation.
- We will mention numerical stability, condition number, regularization, and computational complexity when they clarify the idea.
- We will not turn into a proof-heavy abstract algebra course.
- We will not cover infinite-dimensional spaces, functional analysis, or a full numerical linear algebra treatment.
The 20-part sequence
Here is the full structure of the series, with a one-line goal for each post.
| Part | Filename | Topic | One-line goal |
|---|---|---|---|
| Part 1 | 01-roadmap | A linear algebra roadmap for programmers | Get the big picture and choose a learning path |
| Part 2 | 02-scalars-and-vectors | Seeing scalars and vectors through a programmer's lens | Understand when data is truly a vector |
| Part 3 | 03-vector-addition-and-scalar-multiplication | Vector addition, scalar multiplication, and the start of linearity | Learn the core operations behind linear combinations |
| Part 4 | 04-norm-and-distance | Length and distance | Read vector size and error numerically |
| Part 5 | 05-dot-product-and-cosine-similarity | Dot products and cosine similarity | Build the starting point for direction, similarity, and orthogonality |
| Part 6 | 06-matrices-as-transformations | A matrix is a transformation, not just a table | Read matrices as coordinate representations of transformations |
| Part 7 | 07-matrix-vector-multiplication | What matrix-vector multiplication really does | Interpret Ax as a linear combination of columns |
| Part 8 | 08-matrix-multiplication-and-composition | Matrix multiplication and composition | Understand matrix multiplication as composition of transformations |
| Part 9 | 09-linear-systems-ax-equals-b | Systems of equations and Ax=b | View many equations as one structure |
| Part 10 | 10-gaussian-elimination | Gaussian elimination | Read both the solving procedure and the pivot structure |
| Part 11 | 11-solution-structures-of-linear-systems | One solution, none, or infinitely many | Classify the structure of the solution set |
| Part 12 | 12-linear-combinations-and-span | Linear combinations and span | See representability as a space question |
| Part 13 | 13-linear-independence | Linear independence and dependence | Judge what counts as non-redundant information |
| Part 14 | 14-basis-and-dimension | Basis and dimension | Understand minimal directions and degrees of freedom |
| Part 15 | 15-subspaces-column-space-and-null-space | Subspaces, column space, and null space | Read the spaces a matrix creates and destroys |
| Part 16 | 16-rank-and-nullity | Rank and nullity | Summarize preserved versus lost information |
| Part 17 | 17-orthogonality-and-projection | Orthogonality and projection | Understand the structure of the closest approximation |
| Part 18 | 18-least-squares-and-linear-regression | Least squares and the basics of linear regression | Find the best approximation when no exact solution exists |
| Part 19 | 19-eigenvalues-and-eigenvectors | Eigenvalues and eigenvectors | Find the directions preserved by repeated transformations |
| Part 20 | 20-svd-and-dimensionality-reduction | SVD and an introduction to dimension reduction | Understand matrix decomposition and low-dimensional approximation |
Step-by-step examples
The full series becomes easier to follow if you picture how it connects to real programming problems.
Example 1) Embeddings in search systems
Suppose a model turns a sentence into a 768-dimensional vector. The sentence is no longer just a string. It becomes a vector of numbers. Parts 2 and 3 teach you how to read such a vector and what it means to combine features.
Then Parts 4 and 5 compare two sentence vectors using distance and dot products. Cosine similarity, which appears all the time in retrieval work, enters naturally there.
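That comparison can be sketched in a few lines; the vectors here are toy stand-ins for real embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # dot product of the vectors, divided by the product of their lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two similar "sentence vectors" and one pointing in a different direction
u = np.array([1.0, 0.0, 1.0])
v = np.array([1.0, 0.0, 0.9])
w = np.array([0.0, 1.0, 0.0])

print(cosine_similarity(u, v) > cosine_similarity(u, w))  # True
```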
Example 2) A neural network's linear layer
Suppose you have an input feature vector x and a weight matrix W. The core computation of a linear layer looks like this.
Wx
That line is not just arithmetic. It is a transformation that sends the input vector into a new feature space. Parts 6-8 train you to read that computation as transformation language, not only as a formula.
Example 3) Recommender systems and latent factors
Suppose a movie platform has a huge user-movie rating matrix with many missing entries. The system wants to infer hidden patterns such as “this user prefers action movies” or “this movie appeals to viewers who like slow dramas.” Those hidden patterns are often called latent factors. Parts 9-16 give you the space language needed to reason about that setting: what combinations are representable, what structure is redundant, and how many independent directions the data is really moving along.
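A toy sketch of latent factors via truncated SVD; the rating matrix is invented and, unlike real data, has no missing entries:

```python
import numpy as np

# A toy user-movie rating matrix (rows = users, columns = movies)
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep 2 latent factors: each user and movie gets a 2-dimensional profile
k = 2
user_factors = U[:, :k] * s[:k]   # shape (3, 2)
movie_factors = Vt[:k, :]         # shape (2, 3)

# Reconstructing from the factors approximates the original ratings
R_approx = user_factors @ movie_factors
print(R_approx.shape)  # (3, 3)
```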
Example 4) Noisy real-world data
Real data often does not satisfy Ax=b exactly. That is why projection and least squares matter in Parts 17 and 18. Once you understand what it means to pick the closest solution, regression and error minimization feel much less mysterious.
Example 5) Compression and dimension reduction
The final parts teach matrix decompositions. Finding the important directions and keeping the large structure while discarding the rest is exactly what shows up in PCA, SVD, and matrix factorization for recommendation systems.
A short code glimpse
This is what linear algebra often looks like in actual code.
```python
import numpy as np

# a 2D input vector
x = np.array([1.0, 2.0])

# a matrix that scales the first coordinate by 2 and the second by 3
W = np.array([[2.0, 0.0], [0.0, 3.0]])

# apply the transformation
y = W @ x
# y = [2.0, 6.0]
```
Mathematically that is just y = Wx, but in code it appears immediately as array computation. This series keeps connecting the math notation to that programming view.
Math notes
- The central question of linear algebra is: what can be represented, and how can that representation be compressed without unnecessary redundancy?
- Span, basis, rank, eigenvalues, and singular values are all different angles on that same question.
- A basis is a linearly independent set that spans the space, and dimension is the number of vectors in any basis. The important theorem is that every basis of the same finite-dimensional space has the same size.
- A linear transformation preserves addition and scalar multiplication: T(u + v) = T(u) + T(v) and T(cu) = cT(u).
- When we say that information is preserved, we mean that inputs can still be distinguished or reconstructed. That depends on invertibility, on rank, and on which directions disappear into null space.
- PCA and SVD are related but not identical. PCA is a variance-oriented data-analysis method; SVD is a general matrix decomposition that works for any matrix.
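The two linearity conditions above can be verified numerically for any matrix transformation; a quick sketch with an arbitrary matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

def T(v):
    # the transformation represented by the matrix A
    return A @ v

u = np.array([1.0, -2.0])
v = np.array([0.5, 4.0])
c = 3.0

print(np.allclose(T(u + v), T(u) + T(v)))  # True
print(np.allclose(T(c * u), c * T(u)))     # True
```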
Common mistakes
"If I can calculate matrices, I am fine"
That is enough for a few early exercises, but later ideas like span, basis, rank, and projection start feeling disconnected. Linear algebra is not mainly about calculation. It is about reading structure.
"Eigenvalues and SVD are enough for practical work"
They are important practical topics, but learning only the final tools without vectors, transformations, subspaces, and orthogonality makes them feel like recipes with no explanation.
"If I understand linear models, I basically understand real systems"
Linear algebra gives a powerful backbone, but many real models include nonlinearity. Activation functions in neural networks, nonlinear optimization, and kernel methods all go beyond purely linear structure. Still, even those systems keep using linear algebra internally.
Practice or extension
You do not need to start heavy calculation practice yet. Instead, answer these questions first.
- Which of the data I work with can be represented as vectors?
- What does each coordinate or feature axis mean?
- Which operations are true transformations, not just storage-format changes?
- If I had to compress the result, what information would I want to keep and what would I be willing to discard?
From a practical perspective, it also helps to keep these keywords in mind early.
- numerical stability
- condition number
- regularization
- computational complexity
They will return later in the series when we move from concept to implementation.
Wrap-up
The goal of this series is to reorganize linear algebra as math that programmers actually use.
- Early parts build the language of vectors and matrices.
- Middle parts explain the structure of solutions and spaces.
- Final parts move into approximation, decomposition, and dimension reduction.
- Along the way, the series keeps connecting the math to code and practical modeling notes.
The next post starts with the most basic distinction of all: scalars versus vectors, and when a collection of numbers is truly a vector.