What this post covers
This post explains matrix-vector multiplication.
- What Ax means at both the component level and the structural level
- Why Ax can be read as a linear combination of columns
- How the row view and column view complement each other
- Why this matters for linear layers, feature mixing, and output spaces
Key terms
- matrix: a rule that maps vectors from one space to another
- vector: the input and output object in matrix-vector multiplication
- linear transformation: the viewpoint that turns Ax into more than a formula
Core idea
If the previous post treated a matrix as a transformation, this post explains what the actual calculation Ax is doing.
In this series, we use the column-vector convention. We write a column vector like [1; 3], where the semicolon means the entries are stacked vertically. If A is m x n, then x must have n entries, and the product Ax is an m x 1 vector.
At first glance, matrix-vector multiplication looks like a row-by-row calculation: each output component is the dot product of one row of A with x.
That is correct, but the more important structural interpretation is:
Ax = a weighted sum of the columns of A
If the columns of A are a1, a2, ..., an and the components of x are x1, x2, ..., xn, then
Ax = x1 a1 + x2 a2 + ... + xn an
So the output is made by mixing the columns of the matrix according to the entries of the input vector.
This is the viewpoint that later becomes span, column space, and rank. For now, you can read it in plain language as: the output has to be built from the matrix columns.
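The column-combination identity is easy to check numerically. The sketch below uses an arbitrary 3 x 2 matrix (the specific numbers are just an illustration) and confirms that weighting the columns by the entries of x reproduces the ordinary product:

```python
import numpy as np

# An arbitrary 3x2 matrix and a 2-entry input, chosen only for illustration.
A = np.array([[1.0, -2.0],
              [0.0,  3.0],
              [4.0,  1.0]])
x = np.array([2.0, -1.0])

# Column view: Ax = x1*a1 + x2*a2
combo = x[0] * A[:, 0] + x[1] * A[:, 1]

# It matches the standard matrix-vector product.
assert np.allclose(A @ x, combo)
```

Swapping in any other matrix and vector of compatible shapes gives the same agreement, because the identity holds in general, not just for these numbers.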
The row view also matters
The same calculation can be read row-wise.
- Row view: each output coordinate is computed as a row-dot-product with the input.
- Column view: the whole output vector is built as a combination of columns.
Both views are valid. The row view is great for seeing how each output number is computed. The column view is great for understanding what outputs are even possible.
Step-by-step examples
Example 1) A small calculation
Let
A = [1 2
3 4]
x = [5
6]
Then
Ax = [1*5 + 2*6
3*5 + 4*6]
= [17
39]
That is the row view.
Now read the same computation through columns. The columns of A are [1; 3] and [2; 4], so
Ax = 5[1; 3] + 6[2; 4]
Same answer, different interpretation. The row view is computational; the column view is structural and geometric.
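Both readings of Example 1 can be written out explicitly. In this sketch, the row view builds each output entry from a row dot product, and the column view mixes the columns [1; 3] and [2; 4] by 5 and 6:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
x = np.array([5, 6])

# Row view: each output entry is one row of A dotted with x.
row_view = np.array([A[0] @ x, A[1] @ x])

# Column view: mix the columns of A by the entries of x.
col_view = 5 * A[:, 0] + 6 * A[:, 1]

print(row_view)  # [17 39]
print(col_view)  # [17 39]
```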
Example 2) Outputs you can and cannot make
Consider
A = [1 2
2 4]
The second column is just twice the first. So for any input x = [x1; x2],
Ax = x1[1; 2] + x2[2; 4]
= x1[1; 2] + 2x2[1; 2]
= (x1 + 2x2)[1; 2]
That means every output lies on the single line spanned by [1; 2]. The matrix cannot produce a vector like [1; 0], because [1; 0] is not on that line.
So the matrix does not produce every possible vector in R^2. It only produces vectors inside a restricted output set. Later we will call that set the column space of A.
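This restriction can be verified numerically: whatever input we feed in, the output is always a multiple of [1; 2], and the matrix rank confirms the output set is one-dimensional. A minimal check:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # second column = 2 * first column

# Every output equals (x1 + 2*x2) * [1, 2], for any input.
for x1, x2 in [(1.0, 0.0), (0.0, 1.0), (3.0, -1.0), (0.5, 2.0)]:
    y = A @ np.array([x1, x2])
    scale = x1 + 2 * x2
    assert np.allclose(y, scale * np.array([1.0, 2.0]))

# Rank 1 means the column space is a single line.
print(np.linalg.matrix_rank(A))  # 1
```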
Example 3) Feature mixing in a linear layer
In a neural network, Wx does not merely copy the input features. It mixes them into new output features. For example, if x = [red; green; blue] and W is a 2 x 3 matrix, then Wx can produce two learned features such as [feature1; feature2], each built as a weighted mix of the RGB values.
That is why matrix-vector multiplication is a good model for “re-expressing data in a new feature space.”
In practice, frameworks may store tensors with batch dimensions or use transposed layouts, so the same math may appear with different shapes in code. But the underlying idea is the same.
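As a concrete sketch of the RGB example, here is a hypothetical 2 x 3 weight matrix (the particular weights are invented for illustration, not learned) mixing three input channels into two output features:

```python
import numpy as np

# Hypothetical weights: each row defines one output feature
# as a weighted mix of the RGB inputs.
W = np.array([[0.7, 0.2, 0.1],   # feature1: mostly red
              [0.1, 0.3, 0.6]])  # feature2: mostly blue
x = np.array([0.9, 0.5, 0.2])    # [red, green, blue] intensities

features = W @ x  # a 3-dimensional input re-expressed as 2 features
print(features)        # [0.75 0.36]
print(features.shape)  # (2,)
```

Equivalently, in the column view, the output is 0.9 times the "red" column plus 0.5 times the "green" column plus 0.2 times the "blue" column.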
Example 4) Coordinate transformation
When a matrix acts on a point or displacement vector in 2D, it moves that object according to one consistent rule. So Ax is simultaneously:
- one concrete output calculation, and
- one example of how the whole space is transformed.
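To make the "one rule for the whole space" point concrete, the sketch below uses a 90-degree counterclockwise rotation as an example transformation and applies the same matrix to two different points:

```python
import numpy as np

# A 90-degree counterclockwise rotation, as an example transformation.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

p = np.array([1.0, 0.0])   # a point on the x-axis
print(R @ p)               # [0. 1.] -> rotated onto the y-axis

# The same rule acts consistently on every vector in the plane.
q = np.array([2.0, 3.0])
print(R @ q)               # [-3.  2.]
```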
Math notes
- Reading Ax as a column combination shows that every output lies in the column space of A, meaning the set of all linear combinations of its columns.
- That is why the equation Ax = b is really asking whether b belongs to the column space.
- If A is m x n, then it has m rows and n columns, so it maps an n-dimensional input to an m-dimensional output.
So matrix-vector multiplication is not just arithmetic. It is the gateway to understanding what a matrix can produce.
Common mistakes
Memorizing the rule as pure mechanics
If you only remember “row times column,” you can compute numbers but miss why the calculation matters.
Assuming any output vector is possible
No. The outputs must lie inside the space generated by the matrix columns.
Thinking the row view and column view compete with each other
They do not. They answer different questions.
Mixing up input and output dimension
An m x n matrix takes n-dimensional inputs and produces m-dimensional outputs. This matters constantly in model code.
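A quick shape check makes this mistake visible immediately. In NumPy, multiplying a 2 x 3 matrix by a correctly sized 3-vector yields a 2-vector, while a wrongly sized input raises an error:

```python
import numpy as np

A = np.ones((2, 3))  # m=2 rows, n=3 columns

x_good = np.ones(3)  # n-dimensional input
print((A @ x_good).shape)  # (2,) -> m-dimensional output

x_bad = np.ones(2)   # wrong input dimension
try:
    A @ x_bad
except ValueError as e:
    print("shape mismatch:", e)
```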
Practice or extension
- If a 3 x 2 matrix has two linearly independent columns, what dimension can its output space have?
- What is the output dimension of a 2 x 3 matrix multiplied by a 3 x 1 vector?
- Why does the column view make Ax = b easier to interpret?
- Describe the same multiplication once through rows and once through columns.
A good exercise is to compute the same example in both ways and confirm the answers match.
Wrap-up
This post gave a structural reading of matrix-vector multiplication.
- Ax is a weighted sum of the columns of A.
- At the same time, each output coordinate is computed row by row.
- Outputs live inside the column space of the matrix.
- This idea leads directly to Ax = b, column space, and rank.
In the next post, we will move from Ax to matrix-matrix multiplication and interpret it as composition of transformations.