DEV Community

Akhilesh

The Dot Product: How AI Measures Similarity

Two users on a music app.

User A listens to jazz, blues, and soul. User B listens to jazz, blues, and R&B. How similar are they?

You could eyeball it. But your recommendation system cannot eyeball anything. It needs a number. A precise, calculable number that says exactly how similar two things are.

The dot product produces that number.

It may be the single most important operation in AI. That is barely an exaggeration: attention mechanisms in transformers, similarity search in recommendation systems, the forward pass in neural networks, all of it traces back to this one operation.


The Calculation

Take two vectors of the same length. Multiply each pair of corresponding elements. Add all the results together. One number comes out.

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

dot = a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
print(dot)

Output:

32

Step by step: 1*4 = 4, 2*5 = 10, 3*6 = 18. Sum: 4 + 10 + 18 = 32.

NumPy does this in one line.

dot = np.dot(a, b)
print(dot)

dot = a @ b
print(dot)

Output:

32
32

Both np.dot and the @ operator do the same thing for vectors. You will see both in AI code. The @ operator is newer and cleaner. Use whichever feels natural.
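One thing worth knowing before moving on: the dot product is only defined for vectors of the same length. NumPy will not guess what you meant with mismatched shapes; it raises an error. A quick sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
c = np.array([4, 5])  # only two elements

try:
    a @ c
except ValueError as e:
    # NumPy refuses to pair up vectors of different lengths
    print(f"Shape mismatch: {e}")
```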


What That Number Actually Means

The raw number from the dot product depends on the magnitudes of your vectors. Vectors with bigger entries produce bigger dot products, even when they point in exactly the same direction. That makes comparison difficult.
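You can see that size dependence directly: scale one vector and the raw dot product scales with it, even though the direction has not changed at all.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Same direction, ten times the entries, ten times the score.
print(a @ b)         # 32
print(a @ (10 * b))  # 320
```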

Cosine similarity fixes this. Divide the dot product by the magnitudes of both vectors. The result is always between -1 and 1.

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

dot = np.dot(a, b)
magnitude_a = np.linalg.norm(a)
magnitude_b = np.linalg.norm(b)

cosine_similarity = dot / (magnitude_a * magnitude_b)
print(f"Cosine similarity: {cosine_similarity:.4f}")

Output:

Cosine similarity: 0.9746

0.9746. Very close to 1. These two vectors point in nearly the same direction. They are very similar.

Now what does the number mean across its full range?

same      = np.array([1, 2, 3])
also_same = np.array([2, 4, 6])        # exact same direction, just scaled
perpendicular = np.array([-2, 1, 0])   # 90 degrees away
opposite  = np.array([-1, -2, -3])     # pointing the other way

def cosine_sim(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

print(f"Same direction:    {cosine_sim(same, also_same):.4f}")
print(f"Perpendicular:     {cosine_sim(same, perpendicular):.4f}")
print(f"Opposite:          {cosine_sim(same, opposite):.4f}")

Output:

Same direction:    1.0000
Perpendicular:     0.0000
Opposite:         -1.0000

1.0 means identical direction. Same thing.
0.0 means completely unrelated. No overlap.
-1.0 means exact opposites.

Everything in between is a degree of similarity. That is what recommendation systems, search engines, and transformers are computing all day long.
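In practice these systems rarely compare just two vectors at a time. Stack the candidate vectors into a matrix, normalize everything to unit length, and one matrix-vector product scores a query against every candidate at once. A sketch, reusing the direction examples above:

```python
import numpy as np

query = np.array([1.0, 2.0, 3.0])
candidates = np.array([
    [2.0, 4.0, 6.0],     # same direction as the query
    [-2.0, 1.0, 0.0],    # perpendicular
    [-1.0, -2.0, -3.0],  # opposite
])

# Divide each vector by its magnitude, then one matrix-vector
# product yields every cosine similarity at once.
q = query / np.linalg.norm(query)
C = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
sims = C @ q
print(sims)  # approximately [1, 0, -1]
```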


Back to the Music App

genres = ["jazz", "blues", "soul", "RnB", "classical", "pop"]

user_a = np.array([9, 8, 7, 2, 1, 0])
user_b = np.array([8, 9, 4, 7, 0, 1])
user_c = np.array([0, 1, 0, 0, 9, 8])

print(f"A vs B: {cosine_sim(user_a, user_b):.4f}")
print(f"A vs C: {cosine_sim(user_a, user_c):.4f}")
print(f"B vs C: {cosine_sim(user_b, user_c):.4f}")

Output:

A vs B: 0.9077
A vs C: 0.0997
B vs C: 0.0969

A and B score about 0.91: both love jazz and blues. A and C are barely related; A loves jazz and soul, C loves classical and pop. B and C are just as distant.

If you are user A and looking for music recommendations, the system recommends what user B listens to. Not because it guessed. Because it computed a number.

This is collaborative filtering. It is how Netflix, Spotify, and YouTube recommendations work at their core.
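A minimal sketch of that idea, reusing the user vectors above: score user A against everyone else and recommend from the closest match's history.

```python
import numpy as np

def cosine_sim(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

users = {
    "A": np.array([9, 8, 7, 2, 1, 0]),
    "B": np.array([8, 9, 4, 7, 0, 1]),
    "C": np.array([0, 1, 0, 0, 9, 8]),
}

# Rank everyone else by similarity to user A.
scores = {name: cosine_sim(users["A"], vec)
          for name, vec in users.items() if name != "A"}
best = max(scores, key=scores.get)
print(f"Recommend from {best}'s history")  # B is the closest match
```

Real systems do this over millions of users with the vectorized matrix form, but the core comparison is exactly this.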


Inside Neural Networks

Every neuron in a neural network computes a dot product.

A neuron takes a vector of inputs, holds a vector of learned weights, computes the dot product (usually plus a bias term), and passes the result through an activation function.

inputs  = np.array([0.5, 0.3, 0.8, 0.1])   # data coming in
weights = np.array([0.4, 0.7, 0.2, 0.9])   # what the neuron has learned

activation = np.dot(inputs, weights)
print(f"Neuron activation: {activation:.4f}")

Output:

Neuron activation: 0.6600

One neuron. One dot product. One number.

A layer of 256 neurons computes 256 dot products simultaneously. A deep network with 50 layers computes millions of dot products in a single forward pass.

The dot product is not one part of neural networks. It is what neural networks are made of.
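A whole layer's worth of dot products is just a matrix-vector product: stack each neuron's weight vector as a row and multiply once. A sketch with a three-neuron layer (the weight values are made up):

```python
import numpy as np

inputs = np.array([0.5, 0.3, 0.8, 0.1])

# Each row is one neuron's weight vector.
W = np.array([
    [0.4, 0.7, 0.2, 0.9],
    [0.1, 0.5, 0.3, 0.2],
    [0.8, 0.1, 0.6, 0.4],
])

# One matrix-vector product = three dot products at once.
out = W @ inputs
print(out)  # one value per neuron
```

This is exactly the matrix multiplication the next post builds toward.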


Inside Attention Mechanisms

Transformers, the architecture behind GPT and Claude and every modern language model, are built on the dot product.

The attention mechanism works like this: for every word in a sentence, compute how much it should pay attention to every other word. Do that by taking the dot product between their vector representations.

High dot product between "bank" and "river" in the sentence "I sat by the river bank" means: these two words are relevant to each other, attend to each other. Low dot product between "bank" and "walked" means: less relevant, pay less attention.

bank  = np.array([0.8, 0.2, 0.6, 0.9])   # word vector for "bank"
river = np.array([0.7, 0.1, 0.7, 0.8])   # word vector for "river"
walked = np.array([0.1, 0.9, 0.2, 0.1])  # word vector for "walked"

print(f"bank . river:  {np.dot(bank, river):.4f}")
print(f"bank . walked: {np.dot(bank, walked):.4f}")

Output:

bank . river:  1.7200
bank . walked: 0.4700

Bank and river score 1.72. Bank and walked score 0.47. The model learns to attend more to river when processing bank. That is how context is built, and how "bank" gets read as a riverbank rather than a financial institution.

The dot product is at the heart of how transformers process language.
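The full mechanism, scaled dot-product attention, adds only a little more: the raw dot products are divided by the square root of the vector dimension and pushed through a softmax, turning them into attention weights that sum to 1. A minimal sketch with the toy word vectors above (real models first project words through separate learned query and key matrices):

```python
import numpy as np

bank   = np.array([0.8, 0.2, 0.6, 0.9])
river  = np.array([0.7, 0.1, 0.7, 0.8])
walked = np.array([0.1, 0.9, 0.2, 0.1])

keys = np.array([river, walked])

# Dot products, scaled by sqrt of the vector dimension.
scores = keys @ bank / np.sqrt(len(bank))

# Softmax turns scores into attention weights that sum to 1.
weights = np.exp(scores) / np.sum(np.exp(scores))
print(weights)  # "bank" attends far more to "river" than to "walked"
```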


Try This

Create dot_product_practice.py.

Part one: implement cosine similarity from scratch without using any similarity functions. Just np.dot and np.linalg.norm. Test it on these three pairs and print the results.

v1 = np.array([1, 0, 0, 0])
v2 = np.array([1, 0, 0, 0])   # identical

v3 = np.array([1, 2, 3, 4])
v4 = np.array([4, 3, 2, 1])   # reversed

v5 = np.array([1, 1, 1, 1])
v6 = np.array([-1, -1, -1, -1])  # opposite

Part two: build a tiny recommendation engine. You have four users. Each user rates how much they like four genres, action, comedy, drama, sci-fi, on a scale of 0 to 10.

users = {
    "Alan":   np.array([9, 2, 4, 8]),
    "Priya":  np.array([2, 9, 7, 1]),
    "Sam":    np.array([8, 3, 3, 9]),
    "Jordan": np.array([1, 8, 9, 2])
}

For each pair of users, compute their cosine similarity. Print a sorted list showing who is most similar to whom. Then answer: if Alan wants a recommendation, whose watch history should the system look at first?


What's Next

You know how to measure similarity between two vectors. The next step is doing this across entire matrices at once, transforming whole datasets in a single operation. That is matrix multiplication and it is coming up next.
