What this post covers
This post introduces the dot product and cosine similarity.
- Why the dot product is more than “multiply and add”
- Why cosine similarity is common in embeddings, retrieval, and recommendation systems
- How directional similarity differs from distance
- Why the dot product becomes the foundation for orthogonality and projection
Key terms
- dot product: an operation that captures both component contribution and directional relation
- norm: the rule that gives vector length and appears in cosine similarity
- projection: the next concept that reveals the geometric meaning of the dot product
Core idea
Sometimes length and distance are not enough. In many problems, what matters is not absolute size but how closely two vectors point in the same direction. That is where the dot product enters.
In this post, we use the familiar Euclidean dot product. If
u = (u1, u2, ..., un)
v = (v1, v2, ..., vn)
then
u · v = u1v1 + u2v2 + ... + unvn
So the dot product is the sum of componentwise products.
Notation note: we write u · v for the dot product and ||v|| for the norm or length of v. In code, the same ideas often appear as u.dot(v) or np.linalg.norm(v).
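As a quick illustration of that notation, a minimal NumPy sketch:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])

# Dot product: the sum of componentwise products
print(u.dot(v))           # 1*3 + 2*4 = 11.0
print(np.dot(u, v))       # equivalent call

# Norm: the Euclidean length of v
print(np.linalg.norm(v))  # sqrt(3^2 + 4^2) = 5.0
```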
That formula looks purely algebraic, but it carries geometry too. The same quantity can also be written as
u · v = ||u|| ||v|| cos(theta)
where theta is the angle between the two vectors. This identity comes from the law of cosines. We will use it as a geometric fact here rather than derive it in detail.
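We can sanity-check the identity numerically (a quick sketch: the angle is recovered with arccos, then both sides are compared):

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])

dot = np.dot(u, v)
norms = np.linalg.norm(u) * np.linalg.norm(v)

# Recover theta from the identity, then confirm both sides agree
theta = np.arccos(dot / norms)
assert np.isclose(dot, norms * np.cos(theta))
print(np.degrees(theta))  # the angle between u and v, in degrees
```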
This tells us something important.
- The dot product grows when vectors point in similar directions.
- It becomes 0 when vectors are orthogonal.
- It can become negative when vectors point in opposite directions.
- It also depends on length, not only direction.
So the dot product is not “just direction” and not “just magnitude.” It combines both.
Reading the dot product through projection
A good geometric way to read the dot product is through projection. If u is a unit vector, then u · v tells us how much of v lies in the u direction.
For example, let u = (1, 0) and v = (3, 4). Then u · v = 3, so 3 units of v lie in the x-direction. If u were not a unit vector, the result would be a scaled version of that directional contribution.
That is why the dot product is so useful. It does not merely compare two vectors. It tells us how much one vector contributes along a chosen direction.
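The projection reading can be checked directly. A sketch using the u = (1, 0) and v = (3, 4) example from above:

```python
import numpy as np

u = np.array([1.0, 0.0])   # unit vector along the x-axis
v = np.array([3.0, 4.0])

# For a unit vector u, u · v is the component of v in the u direction
print(np.dot(u, v))  # 3.0: three units of v lie along x

# For a non-unit direction w, divide by ||w|| to recover the same quantity
w = np.array([2.0, 0.0])
scalar_proj = np.dot(w, v) / np.linalg.norm(w)
print(scalar_proj)   # 3.0 again
```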
Why cosine similarity exists
Because the dot product depends on length as well as angle, it can be misleading when you want direction only. That is why cosine similarity is often used.
cos(u, v) = (u · v) / (||u|| ||v||)
Its value always lies between -1 and 1 for real vectors when it is defined.
- 1: same direction
- 0: orthogonal
- -1: opposite direction
One important warning: if either vector is the zero vector, cosine similarity is undefined because the denominator becomes 0.
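Putting the formula and the zero-vector warning together, one possible implementation (a minimal sketch; raising an error on zero vectors is one convention among several):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; undefined for zero vectors."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine similarity is undefined for the zero vector")
    return np.dot(u, v) / (nu * nv)

print(cosine_similarity([1, 0], [0, 1]))   # 0.0: orthogonal
print(cosine_similarity([1, 2], [2, 4]))   # 1.0: same direction
```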
Cosine similarity is especially useful in embeddings and retrieval because the question is often “do these vectors point in similar semantic directions?” rather than “are their raw magnitudes similar?”
Step-by-step examples
Example 1) A simple dot product
Let u = (1, 2) and v = (3, 4). Then
u · v = 1*3 + 2*4 = 11
The value 11 matters less than the interpretation: the vectors have strong positive alignment.
Example 2) Orthogonal vectors
Let u = (1, 0) and v = (0, 1). Then
u · v = 0
So the vectors are orthogonal. In 2D this means perpendicular. In higher dimensions we still use the same condition: dot product 0 means orthogonal.
Example 3) A large dot product does not automatically mean “close”
Compare these vectors.
a = (100, 0)
b = (100, 1)
Their dot product is
a · b = 10000
which is very large. But that large value comes mostly from magnitude.
Now compare
c = (1, 0)
d = (0.8, 0.6)
Their dot product is only 0.8, but directionally they are still quite similar. In fact, c and d are both unit vectors, so 0.8 is exactly their cosine similarity.
So if the task is about direction rather than raw scale, cosine similarity is often the better tool.
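The contrast in this example can be made concrete (a sketch; `cos_sim` is a small helper defined inline, not a library function):

```python
import numpy as np

def cos_sim(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a, b = np.array([100.0, 0.0]), np.array([100.0, 1.0])
c, d = np.array([1.0, 0.0]), np.array([0.8, 0.6])

# Huge dot product, driven mostly by magnitude
print(np.dot(a, b), cos_sim(a, b))  # 10000.0 vs roughly 0.99995

# Tiny dot product, yet the directions are still quite similar
print(np.dot(c, d), cos_sim(c, d))  # 0.8 vs exactly 0.8 (both are unit vectors)
```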
Example 4) Retrieval with embeddings
In modern search and NLP systems, text is often mapped into high-dimensional vectors called embeddings. The idea is that nearby or similarly directed vectors should represent semantically related content.
Suppose a query becomes vector q, and documents become vectors d1, d2, d3. A retrieval system can compute cosine similarity between q and each document vector, then rank the documents by that score.
This works well because semantic retrieval often cares more about directional alignment than raw embedding length.
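A toy version of that ranking loop (a sketch: the 4-dimensional vectors below are invented for illustration, not real embeddings):

```python
import numpy as np

def cos_sim(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

q = np.array([0.9, 0.1, 0.0, 0.4])         # hypothetical query embedding
docs = {
    "d1": np.array([0.8, 0.2, 0.1, 0.5]),  # hypothetical document embeddings
    "d2": np.array([0.1, 0.9, 0.3, 0.0]),
    "d3": np.array([0.0, 0.1, 0.9, 0.2]),
}

# Rank documents by cosine similarity to the query, highest first
ranking = sorted(docs, key=lambda name: cos_sim(q, docs[name]), reverse=True)
print(ranking)  # the most directionally similar document comes first
```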
Example 5) Scoring in recommendation systems
A user-preference vector u and an item-feature vector i can be combined through
score = u · i
Each axis contributes to the final score, and the total score is the sum of those contributions.
In some systems the vectors are normalized first, making the dot product exactly equal to cosine similarity. In others, magnitude is intentionally kept because activity level or confidence matters.
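A minimal sketch of both variants (the preference and feature vectors are invented for illustration):

```python
import numpy as np

u = np.array([2.0, 0.0, 1.0])  # hypothetical user-preference vector
i = np.array([1.0, 1.0, 3.0])  # hypothetical item-feature vector

# Raw dot-product score: magnitude (e.g. activity level) still matters
score = np.dot(u, i)
print(score)  # 2*1 + 0*1 + 1*3 = 5.0

# Normalized variant: the dot product of unit vectors IS cosine similarity
u_hat = u / np.linalg.norm(u)
i_hat = i / np.linalg.norm(i)
print(np.dot(u_hat, i_hat))  # equals cos(u, i)
```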
Math notes
The dot product ties together length, angle, orthogonality, and projection.
- In this post, orthogonality is defined using the standard Euclidean dot product, so u · v = 0 means the vectors are orthogonal in that chosen geometry.
- Euclidean length can be recovered from the dot product by ||v|| = sqrt(v · v).
- Cosine similarity is a similarity measure, not a metric. For example, it does not satisfy the triangle-inequality-style behavior we expect from true distances.
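The "not a metric" point can be seen concretely: if we define a naive "cosine distance" as 1 - cos, the triangle inequality fails (a sketch with hand-picked 2D vectors):

```python
import numpy as np

def cos_dist(u, v):
    """Naive 'cosine distance' 1 - cos; looks like a distance but is not a metric."""
    return 1 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
z = np.array([0.0, 1.0])

# A metric would require d(x, z) <= d(x, y) + d(y, z); here it fails
print(cos_dist(x, z))                   # 1.0
print(cos_dist(x, y) + cos_dist(y, z))  # about 0.586, which is smaller
```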
So the dot product becomes a central bridge concept. Once you see that bridge, later topics like projection, least squares, and orthogonal bases feel much less disconnected.
A practical caveat: in high-dimensional embeddings, angles can concentrate, and vector magnitude may still contain useful information. So normalization is not automatically the right choice in every task.
Common mistakes
Assuming a large dot product always means two vectors are close
Not necessarily. The dot product can be large simply because the vectors are long.
Assuming cosine similarity is always the best similarity measure
If magnitude itself carries meaning, cosine similarity may throw away useful information.
Treating cosine similarity as if it were a distance metric
It measures directional similarity. It does not automatically satisfy the usual distance properties.
Thinking orthogonality only belongs to 2D right-angle pictures
In higher dimensions, orthogonality is still defined cleanly through the condition u · v = 0.
Practice or extension
Try these by hand.
- Compute the dot product of u = (2, 1) and v = (1, 2).
- Compute the dot product of u = (1, 1) and v = (1, -1).
- Compute cosine similarity after finding each vector's norm.
- Explain why cosine similarity is undefined for the zero vector.
Then think about these questions.
- Why can the dot product encode direction if it is only “multiply and add”?
- Why is cosine similarity often preferred in semantic search?
- When might normalization remove information you actually want to keep?
If you want to go further, you can also look up how centered cosine similarity connects to Pearson correlation.
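If you do explore that connection, a quick numerical check makes it tangible (a sketch; `np.corrcoef` returns the Pearson correlation matrix, and the sample vectors are made up):

```python
import numpy as np

def cos_sim(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([3.0, 1.0, 6.0, 7.0])

# Centered cosine: subtract each vector's mean, then take cosine similarity
centered = cos_sim(x - x.mean(), y - y.mean())

# Pearson correlation of the same data
pearson = np.corrcoef(x, y)[0, 1]

print(centered, pearson)  # the two values agree
```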
Wrap-up
This post introduced the dot product and cosine similarity.
- The dot product combines axis-wise contribution with directional alignment.
- Cosine similarity removes length and keeps directional comparison.
- Both ideas are common in search, recommendation, and embeddings.
- Cosine similarity is undefined for zero vectors and is not the same thing as distance.
- These ideas lead directly into orthogonality and projection.
In the next post, we will start reading a matrix not as a static table of numbers, but as a transformation that sends one vector to another.