What this post covers
This post introduces least squares and linear regression.
- Why approximate solutions are needed in real data
- What least squares minimizes
- How projection and the residual are connected
- How regression fits into a linear-algebra viewpoint
Key terms
- least squares: minimizing the squared length of the error vector
- linear regression: fitting a linear model to data
- normal equation: the equation A^T A x = A^T b
- residual: the difference between an observation and the model output
- column space: the space of outputs the model can explain
Core idea
Real data rarely fits a line or plane exactly. For example, the three points (1, 2), (2, 3), and (3, 5.5) do not all lie on one perfect line. That means Ax = b often has no exact solution.
Instead of demanding perfection, we look for the best approximation by minimizing
||Ax - b||^2
This is the least-squares problem.
We square the error because signed errors can cancel out, and because the squared Euclidean norm is smooth and easy to optimize.
The vector Ax is what the model can explain. The vector b is what we actually observe. Their difference is the error, or residual.
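As a quick numeric check on the example above, the slopes between consecutive pairs of the three points differ, so no single line passes through all of them:

```python
# Points from the example: (1, 2), (2, 3), (3, 5.5).
# If all three lay on one line, consecutive slopes would match.
slope_12 = (3 - 2) / (2 - 1)    # slope between the first two points: 1.0
slope_23 = (5.5 - 3) / (3 - 2)  # slope between the last two points: 2.5
print(slope_12 == slope_23)     # False: the system Ax = b has no exact solution
```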
Step-by-step examples
Example 1) Fitting a line
Suppose we want a line y = β0 + β1 x for data points (x1, y1), ..., (xn, yn). Each data point gives one equation β0 + β1 x_i = y_i, and stacking those equations gives one matrix problem.
Then we can write
A = [ 1  x1
      1  x2
      ...
      1  xn ]

β_hat = [ β0
          β1 ]

b_vec = [ y1
          y2
          ...
          yn ]
So the problem becomes
A β_hat ≈ b_vec
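This setup can be sketched in NumPy. The sketch below uses the three example points from earlier, builds A by stacking a column of ones next to the x values, and solves the least-squares problem with `np.linalg.lstsq`:

```python
import numpy as np

# One equation β0 + β1·x_i = y_i per data point,
# using the example points (1, 2), (2, 3), (3, 5.5).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.0, 5.5])

A = np.column_stack([np.ones_like(x), x])  # each row is [1, x_i]
beta_hat, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)  # [β0, β1] minimizing ||A β_hat - b_vec||^2
```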
Example 2) The projection viewpoint
The outputs a model can explain form the column space of A.
If b lies outside that space, no exact solution exists. The least-squares solution chooses A β_hat to be the projection of b onto Col(A).
That means the residual
r = b - A β_hat
is orthogonal to the column space.
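The orthogonality claim can be verified numerically: multiplying the residual by A^T should give (approximately) the zero vector, since the residual is perpendicular to every column of A. A minimal sketch with the same example data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.0, 5.5])
A = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

r = y - A @ beta_hat  # residual r = b - A β_hat
print(A.T @ r)        # ≈ [0, 0]: r is orthogonal to every column of A
```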
Example 3) The normal equation
If the residual is orthogonal to every column of A, then
A^T r = 0
Substituting r = b - Ax gives
A^T(b - Ax) = 0
A^T A x = A^T b
which is the normal equation.
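Solving the normal equation directly should give the same coefficients as the least-squares solver, which the following sketch checks:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.0, 5.5])
A = np.column_stack([np.ones_like(x), x])

# Normal equation: A^T A x = A^T b
beta_normal = np.linalg.solve(A.T @ A, A.T @ y)
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(beta_normal, beta_lstsq))  # True: both routes agree
```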
Math notes
- Least squares minimizes the squared Euclidean norm of the residual.
- If A has full column rank, the least-squares solution is unique.
- If not, multiple least-squares solutions exist, and one often chooses the minimum-norm solution.
- In practice, QR decomposition or SVD is often preferred to solving the normal equation directly.
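The QR route mentioned above can be sketched as follows: with A = QR and Q having orthonormal columns, the normal equation reduces to the triangular system R x = Q^T b, so A^T A is never formed explicitly (its condition number is the square of A's, which hurts accuracy):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.0, 5.5])
A = np.column_stack([np.ones_like(x), x])

# Reduced QR: A = Q R, with Q (3x2) orthonormal and R (2x2) upper triangular.
Q, R = np.linalg.qr(A)
beta_qr = np.linalg.solve(R, Q.T @ y)  # solve R x = Q^T b instead of A^T A x = A^T b
```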
Linear regression is also a statistical model, but here the main focus is its linear-algebra skeleton.
Common mistakes
Thinking regression is only a statistics formula sheet
It is also a projection problem in linear algebra.
Thinking “no exact solution” means “no useful answer”
Approximation is the normal case in real data.
Assuming least-squares solutions are always unique
Uniqueness depends on the columns of A being independent enough.
Practice or extension
- Why does real data often fail to satisfy Ax = b exactly?
- Why do we square the error instead of just summing signed errors?
- What does it mean geometrically to project b onto Col(A)?
Wrap-up
This post introduced least squares and linear regression.
- When exact solutions fail, best approximations still exist.
- Least squares minimizes squared residual size.
- The solution can be read as a projection.
- Next, we move to eigenvalues and eigenvectors, where transformations reveal their special directions.