[Linear Algebra Series Part 7] What Does Matrix-Vector Multiplication Actually Do?


What this post covers

This post unpacks what the product Ax is actually doing.

  • What Ax means at both the component level and the structural level
  • Why Ax can be read as a linear combination of columns
  • How the row view and column view complement each other
  • Why this matters for linear layers, feature mixing, and output spaces

Key terms

  • matrix: a rule that maps vectors from one space to another
  • vector: the input and output object in matrix-vector multiplication
  • linear transformation: the viewpoint that turns Ax into more than a formula

Core idea

If the previous post treated a matrix as a transformation, this post explains what the actual calculation Ax is doing.

In this series, we use the column-vector convention. We write a column vector like [1; 3], where the semicolon means the entries are stacked vertically. If A is m x n, then x must have n entries, and the product Ax is an m x 1 vector.
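As a quick sanity check on the shape rule, here is a minimal NumPy sketch (the specific numbers are arbitrary, chosen only to illustrate the shapes):

```python
import numpy as np

# A is m x n with m = 2, n = 3: it maps 3-entry inputs to 2-entry outputs.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
x = np.array([1, 0, -1])  # x has n = 3 entries, matching A's column count

y = A @ x      # the product is an m-entry vector
print(y.shape)  # (2,)
```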

At first glance, matrix-vector multiplication looks like a row-by-row calculation: each output component is the dot product of one row of A with x.

That is correct, but the more important structural interpretation is:

Ax = a weighted sum of the columns of A

If the columns of A are a1, a2, ..., an and the components of x are x1, x2, ..., xn, then

Ax = x1 a1 + x2 a2 + ... + xn an

So the output is made by mixing the columns of the matrix according to the entries of the input vector.

This is the viewpoint that later becomes span, column space, and rank. For now, you can read it in plain language as: the output has to be built from the matrix columns.

The row view also matters

The same calculation can be read row-wise.

  • Row view: each output coordinate is computed as a row-dot-product with the input.
  • Column view: the whole output vector is built as a combination of columns.

Both views are valid. The row view is great for seeing how each output number is computed. The column view is great for understanding what outputs are even possible.

Step-by-step examples

Example 1) A small calculation

Let

A = [1 2
     3 4]

x = [5
     6]

Then

Ax = [1*5 + 2*6
      3*5 + 4*6]
   = [17
      39]

That is the row view.

Now read the same computation through columns. The columns of A are [1; 3] and [2; 4], so

Ax = 5[1; 3] + 6[2; 4]

Same answer, different interpretation. The row view is computational; the column view is structural and geometric.
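The two readings of Example 1 can be confirmed in a few lines (a NumPy sketch of the same numbers):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
x = np.array([5, 6])

# Row view: each output entry is one row of A dotted with x.
row_view = np.array([A[0] @ x, A[1] @ x])

# Column view: 5 * [1; 3] + 6 * [2; 4].
col_view = 5 * A[:, 0] + 6 * A[:, 1]

print(row_view)  # [17 39]
print(col_view)  # [17 39]
```

Same vector either way; only the reading changes.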

Example 2) Outputs you can and cannot make

Consider

A = [1 2
     2 4]

The second column is just twice the first. So for any input x = [x1; x2],

Ax = x1[1; 2] + x2[2; 4]
   = x1[1; 2] + 2x2[1; 2]
   = (x1 + 2x2)[1; 2]

That means every output lies on the single line spanned by [1; 2]. The matrix cannot produce a vector like [1; 0], because [1; 0] is not on that line.

So the matrix does not produce every possible vector in R^2. It only produces vectors inside a restricted output set. Later we will call that set the column space of A.
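You can see the restriction numerically: since both columns are multiples of [1; 2], every output Ax has its second entry equal to twice its first. A NumPy sketch (the random inputs are just for illustration):

```python
import numpy as np

A = np.array([[1, 2],
              [2, 4]])

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.normal(size=2)  # an arbitrary input
    y = A @ x
    # Every output lies on the line spanned by [1; 2]:
    assert np.isclose(y[1], 2 * y[0])

# No input reaches [1; 0]: if Ax = [1; 0], the second entry would
# force 0 = 2 * 1, a contradiction.
print("all outputs lie on the line spanned by [1; 2]")
```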

Example 3) Feature mixing in a linear layer

In a neural network, Wx does not merely copy the input features. It mixes them into new output features. For example, if x = [red; green; blue] and W is a 2 x 3 matrix, then Wx can produce two learned features such as [feature1; feature2], each built as a weighted mix of the RGB values.

That is why matrix-vector multiplication is a good model for “re-expressing data in a new feature space.”

In practice, frameworks may store tensors with batch dimensions or use transposed layouts, so the same math may appear with different shapes in code. But the underlying idea is the same.
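Here is a minimal sketch of that idea. The weight values below are made up for illustration, not learned:

```python
import numpy as np

# Hypothetical 2 x 3 weight matrix: each ROW defines one output feature
# as a weighted mix of the three input channels.
W = np.array([[0.5,  0.5, 0.0],   # feature1: average of red and green
              [0.0, -1.0, 1.0]])  # feature2: blue minus green

x = np.array([0.8, 0.2, 0.6])     # [red; green; blue]

features = W @ x  # a 2-entry vector in the new feature space
print(features)   # approximately [0.5, 0.4]
```

Equivalently, in the column view, the output mixes the three columns of W with weights 0.8, 0.2, and 0.6.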

Example 4) Coordinate transformation

When a matrix acts on a point or displacement vector in 2D, it moves that object according to one consistent rule. So Ax is simultaneously:

  • one concrete output calculation, and
  • one example of how the whole space is transformed.
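A 90-degree rotation is a standard concrete case of this (a NumPy sketch): one matrix both computes individual outputs and describes how the whole plane turns.

```python
import numpy as np

# Rotation by 90 degrees counterclockwise. Its columns are exactly where
# the basis vectors [1; 0] and [0; 1] land.
R = np.array([[0, -1],
              [1,  0]])

p = np.array([2, 1])
print(R @ p)  # [-1  2]: one concrete output calculation...

# ...and the same rule applies uniformly to every point:
points = np.array([[1, 0], [0, 1], [2, 1]]).T  # columns are points
print(R @ points)  # every column rotated by the same matrix
```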

Math notes

  • Reading Ax as a column combination shows that every output lies in the column space of A, meaning the set of all linear combinations of its columns.
  • That is why the equation Ax = b is really asking whether b belongs to the column space.
  • If A is m x n, then it has m rows and n columns, so it maps an n-dimensional input to an m-dimensional output.

So matrix-vector multiplication is not just arithmetic. It is the gateway to understanding what a matrix can produce.
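The membership question for Ax = b can be tested numerically: solve the least-squares problem and check whether the residual is (numerically) zero. A sketch using the matrix from Example 2; the helper name `in_column_space` is ours, not a library function:

```python
import numpy as np

A = np.array([[1, 2],
              [2, 4]])

def in_column_space(A, b, tol=1e-9):
    """Check whether b is a linear combination of A's columns."""
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.linalg.norm(A @ x - b) < tol

print(in_column_space(A, np.array([3, 6])))  # True: [3; 6] = 3 * [1; 2]
print(in_column_space(A, np.array([1, 0])))  # False: [1; 0] is off the line
```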

Common mistakes

Memorizing the rule as pure mechanics

If you only remember “row times column,” you can compute numbers but miss why the calculation matters.

Assuming any output vector is possible

No. The outputs must lie inside the space generated by the matrix columns.

Thinking the row view and column view compete with each other

They do not. They answer different questions.

Mixing up input and output dimension

An m x n matrix takes n-dimensional inputs and produces m-dimensional outputs. This matters constantly in model code.
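A quick way to internalize the shape rule is to let it fail once (a NumPy sketch):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 x 3: n = 3 entries in, m = 2 entries out

print((A @ np.array([7, 8, 9])).shape)  # (2,): a valid 3-dim input

try:
    A @ np.array([7, 8])  # a 2-dim input does not fit a 2 x 3 matrix
except ValueError as err:
    print("shape mismatch:", err)
```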

Practice or extension

  1. If a 3 x 2 matrix has two linearly independent columns, what dimension can its output space have?
  2. What is the output dimension of a 2 x 3 matrix multiplied by a 3 x 1 vector?
  3. Why does the column view make Ax = b easier to interpret?
  4. Describe the same multiplication once through rows and once through columns.

A good exercise is to compute the same example in both ways and confirm the answers match.

Wrap-up

This post gave a structural reading of matrix-vector multiplication.

  • Ax is a weighted sum of the columns of A.
  • At the same time, each output coordinate is computed row by row.
  • Outputs live inside the column space of the matrix.
  • This idea leads directly to Ax = b, column space, and rank.

In the next post, we will move from Ax to matrix-matrix multiplication and interpret it as composition of transformations.
