What this post covers
This post gives you the map of the whole linear algebra series before we dive into the details.
- Why programmers need linear algebra at all
- Why the 20-post sequence is arranged in this order
- How the major sections connect to machine learning, graphics, search, and recommender systems
- How to choose a practical reading path instead of treating all 20 posts as equally urgent from day one
Key terms
You do not need to master these yet. Treat them as guideposts you will return to later.
- vector: the basic object used to represent multiple values in an ordered way
- matrix: the coordinate representation of a rule that sends one vector to another
- linear transformation: a rule that preserves vector addition and scalar multiplication
- basis: a linearly independent set of vectors that spans a space
- dimension: the number of vectors in any basis of that space
Core idea
Many people remember linear algebra as “the course with matrix calculations.” From a programming perspective, that description is too small. Linear algebra is better understood as the language for representing data as vectors, transforming those vectors with matrices, and explaining which information is preserved, compressed, or discarded along the way.
A sentence embedding in a search system is a vector. A user preference profile in a recommender system is a vector. Position and velocity in a game engine are vectors. Even an image can be viewed as a large vector if you flatten its pixel values. A matrix is how we write a linear transformation once we choose coordinates or bases. Linear algebra gives all of that a shared grammar.
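That "shared grammar" is easy to see in code. A small sketch in NumPy; the shapes and values here are illustrative, not taken from any real model:

```python
import numpy as np

# A tiny grayscale "image" as a 2x3 pixel grid
image = np.array([[0.1, 0.5, 0.9],
                  [0.2, 0.6, 1.0]])

# Flattening turns it into a single 6-dimensional vector
v = image.flatten()
print(v.shape)  # (6,)

# A sentence embedding is the same kind of object, just longer
embedding = np.zeros(768)
print(embedding.shape)  # (768,)
```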
This series does not follow the most traditional “determinants first, heavy proof later” route. It starts from the kinds of questions programmers actually run into.
- When does data count as a vector?
- What does it mean to combine features mathematically?
- What is a linear layer really doing?
- Why can some outputs be produced while others cannot?
- When we reduce dimension, what survives and what disappears?
To answer those questions, the series is split into four blocks.
1) Learn the language of vectors and matrices
Parts 2-8 build the basic language. We separate scalars from vectors, learn vector addition and scalar multiplication, use length and dot products to talk about similarity, and read a matrix as a transformation instead of as a mere table.
This block matters because most later ideas start here. In plain language, you learn what vectors are, how to combine them, how to measure their size or similarity, and how matrices transform them. If you want to understand a neural network's linear layer, you need to be able to read matrix-vector multiplication as a weighted combination of columns.
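The column-combination reading mentioned above can be checked directly. A minimal sketch with an arbitrary example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([10.0, 100.0])

# Matrix-vector product
y = A @ x

# The same result, read as a weighted combination of A's columns
y_as_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

print(np.allclose(y, y_as_columns))  # True
```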
2) Understand Ax=b and the structure of solutions
Parts 9-16 cover the core structure of linear algebra. We package systems of linear equations as Ax=b, solve them with elimination, and classify whether a system has one solution, no solution, or infinitely many. Then we move into the more formal language: span, linear independence, basis, dimension, subspaces, column space, null space, rank, and nullity.
This is not just a block about technique. It is where we learn to ask structural questions in plain terms: What outputs can this matrix produce? Which inputs get lost? How many genuinely independent directions are there in the data? Those questions later connect to latent factors in recommender systems, dimension reduction, and feature redundancy.
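A minimal sketch of asking those structural questions in NumPy; the matrix is made up, with a deliberately redundant column:

```python
import numpy as np

# The second column is 2x the first, so one column carries no new information
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 1.0]])

# Rank counts the genuinely independent directions the matrix can produce
rank = np.linalg.matrix_rank(A)
print(rank)  # 2

# Nullity counts the input directions that disappear into the null space
nullity = A.shape[1] - rank
print(nullity)  # 1
```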
3) Understand orthogonality and approximation
Parts 17 and 18 focus on orthogonality, projection, and least squares. Real data rarely fits perfectly. Sensor data includes noise, user behavior data fluctuates, and observed values include error. So instead of looking for an exact solution, we often look for the best approximate one.
This block is essential for understanding the linear-algebra backbone of regression. It explains why least squares is natural, why error turns into an orthogonality condition, and why projection gives the closest point once a dot product gives us a notion of perpendicularity.
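As a preview of that orthogonality condition, here is a small sketch of least squares in NumPy; the data points are invented for illustration:

```python
import numpy as np

# Overdetermined system: 4 noisy observations, 2 unknowns (slope, intercept)
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0],
              [4.0, 1.0]])
b = np.array([2.1, 3.9, 6.2, 7.8])

# lstsq finds the x minimizing ||Ax - b||; no exact solution exists here
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# The residual r = b - Ax is orthogonal to every column of A
r = b - A @ x
print(np.allclose(A.T @ r, 0.0))  # True
```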
4) Understand decomposition and dimension reduction
Parts 19 and 20 cover eigenvalues, eigenvectors, and singular value decomposition (SVD). Many learners find this part intimidating, but for programmers it is one of the most practical sections.
- Principal component analysis (PCA) focuses on directions of large variance.
- SVD handles more general matrices and leads naturally to low-dimensional approximation.
- PageRank can be explained in terms of dominant eigenvectors in a special stochastic-matrix setting.
So the final block is not a collection of “fancy advanced topics.” It is where the earlier ideas about representation, spaces, and orthogonality turn into practical decomposition tools.
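A small sketch of that decomposition idea with NumPy's SVD, using a made-up nearly rank-1 matrix:

```python
import numpy as np

# A small matrix that is nearly rank-1, plus a bit of noise
rng = np.random.default_rng(0)
M = np.outer([1.0, 2.0, 3.0], [4.0, 5.0]) + 0.01 * rng.standard_normal((3, 2))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep only the largest singular value: a rank-1 approximation
M1 = s[0] * np.outer(U[:, 0], Vt[0, :])

# The Frobenius error equals the discarded singular value (Eckart-Young)
print(np.linalg.norm(M - M1))
print(s[1])
```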
How to read this series
You do not have to read all 20 parts with the same depth at the start. A better route depends on what you need.
- If you are completely new: read Parts 1-8 first, then read 9-12 carefully enough to understand Ax=b and representability.
- If you mainly care about machine learning links: read 1-8, then 17-20, then come back to basis, dimension, and rank.
- If you want the structure to feel mathematically solid: read 1-20 in order and compute the examples yourself from Parts 12-18 onward.
A practical learning flow looks like this:
vector and matrix intuition -> Ax=b structure -> subspaces and dimension -> orthogonality and approximation -> decomposition and dimension reduction
What this series will and will not cover
This series focuses on finite-dimensional real vector spaces, the part of linear algebra that shows up most often in programming, machine learning, graphics, and data work.
- We will focus on intuition, structure, notation, and practical interpretation.
- We will mention numerical stability, condition number, regularization, and computational complexity when they clarify the idea.
- We will not turn into a proof-heavy abstract algebra course.
- We will not cover infinite-dimensional spaces, functional analysis, or a full numerical linear algebra treatment.
The 20-part sequence
Here is the full structure of the series, with a one-line goal for each post.
| Part | Filename | Topic | One-line goal |
|---|---|---|---|
| Part 1 | 01-roadmap | A linear algebra roadmap for programmers | Get the big picture and choose a learning path |
| Part 2 | 02-scalars-and-vectors | Seeing scalars and vectors through a programmer's lens | Understand when data is truly a vector |
| Part 3 | 03-vector-addition-and-scalar-multiplication | Vector addition, scalar multiplication, and the start of linearity | Learn the core operations behind linear combinations |
| Part 4 | 04-norm-and-distance | Length and distance | Read vector size and error numerically |
| Part 5 | 05-dot-product-and-cosine-similarity | Dot products and cosine similarity | Build the starting point for direction, similarity, and orthogonality |
| Part 6 | 06-matrices-as-transformations | A matrix is a transformation, not just a table | Read matrices as coordinate representations of transformations |
| Part 7 | 07-matrix-vector-multiplication | What matrix-vector multiplication really does | Interpret Ax as a linear combination of columns |
| Part 8 | 08-matrix-multiplication-and-composition | Matrix multiplication and composition | Understand matrix multiplication as composition of transformations |
| Part 9 | 09-linear-systems-ax-equals-b | Systems of equations and Ax=b | View many equations as one structure |
| Part 10 | 10-gaussian-elimination | Gaussian elimination | Read both the solving procedure and the pivot structure |
| Part 11 | 11-solution-structures-of-linear-systems | One solution, none, or infinitely many | Classify the structure of the solution set |
| Part 12 | 12-linear-combinations-and-span | Linear combinations and span | See representability as a space question |
| Part 13 | 13-linear-independence | Linear independence and dependence | Judge what counts as non-redundant information |
| Part 14 | 14-basis-and-dimension | Basis and dimension | Understand minimal directions and degrees of freedom |
| Part 15 | 15-subspaces-column-space-and-null-space | Subspaces, column space, and null space | Read the spaces a matrix creates and destroys |
| Part 16 | 16-rank-and-nullity | Rank and nullity | Summarize preserved versus lost information |
| Part 17 | 17-orthogonality-and-projection | Orthogonality and projection | Understand the structure of the closest approximation |
| Part 18 | 18-least-squares-and-linear-regression | Least squares and the basics of linear regression | Find the best approximation when no exact solution exists |
| Part 19 | 19-eigenvalues-and-eigenvectors | Eigenvalues and eigenvectors | Find the directions preserved by repeated transformations |
| Part 20 | 20-svd-and-dimensionality-reduction | SVD and an introduction to dimension reduction | Understand matrix decomposition and low-dimensional approximation |
Step-by-step examples
The full series becomes easier to follow if you picture how it connects to real programming problems.
Example 1) Embeddings in search systems
Suppose a model turns a sentence into a 768-dimensional vector. The sentence is no longer just a string. It becomes a vector of numbers. Parts 2 and 3 teach you how to read such a vector and what it means to combine features.
Then Parts 4 and 5 compare two sentence vectors using distance and dot products. Cosine similarity, which appears all the time in retrieval work, enters naturally there.
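That comparison can be sketched in a few lines; the vectors here are toy stand-ins for real embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # dot product of the vectors, divided by the product of their lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two similar "sentence vectors" and one pointing in a different direction
u = np.array([1.0, 0.0, 1.0])
v = np.array([1.0, 0.0, 0.9])
w = np.array([0.0, 1.0, 0.0])

print(cosine_similarity(u, v) > cosine_similarity(u, w))  # True
```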
Example 2) A neural network's linear layer
Suppose you have an input feature vector x and a weight matrix W. The core computation of a linear layer looks like this.
Wx
That line is not just arithmetic. It is a transformation that sends the input vector into a new feature space. Parts 6-8 train you to read that computation as transformation language, not only as a formula.
Example 3) Recommender systems and latent factors
Suppose a movie platform has a huge user-movie rating matrix with many missing entries. The system wants to infer hidden patterns such as “this user prefers action movies” or “this movie appeals to viewers who like slow dramas.” Those hidden patterns are often called latent factors. Parts 9-16 give you the space language needed to reason about that setting: what combinations are representable, what structure is redundant, and how many independent directions the data is really moving along.
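A toy sketch of latent factors via truncated SVD; the rating matrix is invented and, unlike real data, has no missing entries:

```python
import numpy as np

# A toy user-movie rating matrix (rows = users, columns = movies)
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep 2 latent factors: each user and movie gets a 2-dimensional profile
k = 2
user_factors = U[:, :k] * s[:k]   # shape (3, 2)
movie_factors = Vt[:k, :]         # shape (2, 3)

# Reconstructing from the factors approximates the original ratings
R_approx = user_factors @ movie_factors
print(R_approx.shape)  # (3, 3)
```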
Example 4) Noisy real-world data
Real data often does not satisfy Ax=b exactly. That is why projection and least squares matter in Parts 17 and 18. Once you understand what it means to pick the closest solution, regression and error minimization feel much less mysterious.
Example 5) Compression and dimension reduction
The final parts teach matrix decompositions. Finding the important directions and keeping the large structure while discarding the rest is exactly what shows up in PCA, SVD, and matrix factorization for recommendation systems.
A short code glimpse
This is what linear algebra often looks like in actual code.
```python
import numpy as np

# a 2D input vector
x = np.array([1.0, 2.0])

# a matrix that scales the first coordinate by 2 and the second by 3
W = np.array([[2.0, 0.0], [0.0, 3.0]])

# apply the transformation
y = W @ x
# y = [2.0, 6.0]
```
Mathematically that is just y = Wx, but in code it appears immediately as array computation. This series keeps connecting the math notation to that programming view.
Math notes
- The central question of linear algebra is: what can be represented, and how can that representation be compressed without unnecessary redundancy?
- Span, basis, rank, eigenvalues, and singular values are all different angles on that same question.
- A basis is a linearly independent set that spans the space, and dimension is the number of vectors in any basis. The important theorem is that every basis of the same finite-dimensional space has the same size.
- A linear transformation preserves addition and scalar multiplication: T(u + v) = T(u) + T(v) and T(cu) = cT(u).
- When we say that information is preserved, we mean that inputs can still be distinguished or reconstructed. That depends on invertibility, on rank, and on which directions disappear into null space.
- PCA and SVD are related but not identical. PCA is a variance-oriented data-analysis method; SVD is a general matrix decomposition that works for any matrix.
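The two linearity conditions above can be verified numerically for any matrix transformation; a quick sketch with an arbitrary matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

def T(v):
    # the transformation represented by the matrix A
    return A @ v

u = np.array([1.0, -2.0])
v = np.array([0.5, 4.0])
c = 3.0

print(np.allclose(T(u + v), T(u) + T(v)))  # True
print(np.allclose(T(c * u), c * T(u)))     # True
```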
Common mistakes
"If I can calculate matrices, I am fine"
That is enough for a few early exercises, but later ideas like span, basis, rank, and projection start feeling disconnected. Linear algebra is not mainly about calculation. It is about reading structure.
"Eigenvalues and SVD are enough for practical work"
They are important practical topics, but learning only the final tools without vectors, transformations, subspaces, and orthogonality makes them feel like recipes with no explanation.
"If I understand linear models, I basically understand real systems"
Linear algebra gives a powerful backbone, but many real models include nonlinearity. Activation functions in neural networks, nonlinear optimization, and kernel methods all go beyond purely linear structure. Still, even those systems keep using linear algebra internally.
Practice or extension
You do not need to start heavy calculation practice yet. Instead, answer these questions first.
- Which of the data I work with can be represented as vectors?
- What does each coordinate or feature axis mean?
- Which operations are true transformations, not just storage-format changes?
- If I had to compress the result, what information would I want to keep and what would I be willing to discard?
From a practical perspective, it also helps to keep these keywords in mind early.
- numerical stability
- condition number
- regularization
- computational complexity
They will return later in the series when we move from concept to implementation.
Wrap-up
The goal of this series is to reorganize linear algebra as math that programmers actually use.
- Early parts build the language of vectors and matrices.
- Middle parts explain the structure of solutions and spaces.
- Final parts move into approximation, decomposition, and dimension reduction.
- Along the way, the series keeps connecting the math to code and practical modeling notes.
The next post starts with the most basic distinction of all: scalars versus vectors, and when a collection of numbers is truly a vector.