Here is something most AI tutorials hide from you.
A neural network layer is matrix multiplication.
Not kind of. Not metaphorically. When your network takes an input and transforms it through a hidden layer, what is actually happening is one matrix multiplied by another. Strip away the jargon about neurons and activation functions and layers, and underneath it all there is a matrix multiply.
Understanding this does not just satisfy curiosity. It tells you why layers work, why shapes matter so much in PyTorch, and why debugging deep learning errors almost always involves staring at shape mismatches.
Building From the Dot Product
You already know the dot product. Take two vectors of the same length, multiply element by element, add everything up, get one number.
Matrix multiplication is just many dot products computed at once.
Take every row from the left matrix. Take every column from the right matrix. Compute the dot product between each pair. The result of each dot product becomes one element in the output matrix.
That is the whole operation.
import numpy as np
A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])
B = np.array([
    [7, 8, 9],
    [10, 11, 12]
])
C = A @ B
print(C)
print(C.shape)
Output:
[[ 27  30  33]
 [ 61  68  75]
 [ 95 106 117]]
(3, 3)
A is (3, 2). B is (2, 3). Result is (3, 3).
Let's verify one element manually. Top left corner of C, which is 27.
Row 0 of A: [1, 2]
Column 0 of B: [7, 10]
Dot product: 1*7 + 2*10 = 7 + 20 = 27. Correct.
The Shape Rule You Cannot Break
Matrix multiplication has exactly one hard rule.
The number of columns in the left matrix must equal the number of rows in the right matrix.
(m x n) @ (n x p) = (m x p)
The inner dimensions must match. The outer dimensions become the result shape.
A = np.ones((3, 4))
B = np.ones((4, 5))
C = np.ones((3, 5))
print((A @ B).shape) # (3, 5) works: inner dims both 4
try:
    result = A @ C  # (3,4) @ (3,5) fails: 4 != 3
except ValueError as e:
    print(f"Error: {e}")
Output:
(3, 5)
Error: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 4)
This error message will be your companion throughout deep learning. Every time you see a shape mismatch, trace back to which matrix multiply has incompatible inner dimensions.
What the Multiplication Actually Does
Matrix multiplication transforms vectors.
Take any input vector. Multiply it by a matrix. You get a new vector. The transformation can scale it, rotate it, project it into a different number of dimensions, anything.
input_vector = np.array([1.0, 0.5, 0.8]) # 3 features
weight_matrix = np.array([
    [0.2, 0.8],
    [0.5, 0.3],
    [0.7, 0.1]
])  # (3 x 2): transforms 3 features into 2
output = input_vector @ weight_matrix
print(f"Input shape: {input_vector.shape}")
print(f"Weight shape: {weight_matrix.shape}")
print(f"Output shape: {output.shape}")
print(f"Output: {output}")
Output:
Input shape: (3,)
Weight shape: (3, 2)
Output shape: (2,)
Output: [1.01 1.03]
Three features went in. Two features came out. The weight matrix determined how the transformation happened. This is a neural network layer. Literally this. One input vector, one weight matrix, one output vector.
When you train a neural network, you are finding the weight matrix values that make this transformation useful for your task.
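To make "finding useful weight values" concrete, here is a toy illustration. Real networks find their weights by gradient descent, but for a simple linear mapping you can recover a hidden weight matrix in one step with least squares. Everything here (`true_W`, the random inputs) is made up for the demo; the point is just what "finding weights" means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: some unknown (3, 2) weight matrix produced the targets.
inputs = rng.standard_normal((50, 3))    # 50 samples, 3 features
true_W = np.array([[0.2, 0.8],
                   [0.5, 0.3],
                   [0.7, 0.1]])
targets = inputs @ true_W                # the mapping we want to recover

# Least squares finds the W that best maps inputs to targets.
# (Neural networks do this iteratively, for nonlinear mappings.)
W_found, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
print(np.allclose(W_found, true_W))      # the recovered weights match
```

With noiseless data and more samples than features, the recovery is exact up to floating-point tolerance.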
Multiple Samples at Once
In real training you do not transform one sample at a time. You process a whole batch simultaneously.
batch = np.array([
    [1.0, 0.5, 0.8],  # sample 1
    [0.3, 0.9, 0.2],  # sample 2
    [0.7, 0.4, 0.6],  # sample 3
    [0.1, 0.8, 0.9]   # sample 4
])  # shape: (4, 3)
weights = np.array([
    [0.2, 0.8],
    [0.5, 0.3],
    [0.7, 0.1]
])  # shape: (3, 2)
output = batch @ weights
print(f"Batch shape: {batch.shape}")
print(f"Output shape: {output.shape}")
print(output)
Output:
Batch shape: (4, 3)
Output shape: (4, 2)
[[1.01 1.03]
 [0.65 0.53]
 [0.76 0.74]
 [1.05 0.41]]
Four samples in. Four transformed samples out. Each row is one sample's transformed features. The entire batch processed in one operation, no loop required.
This is why GPUs exist. Matrix multiplication on large batches is the dominant computation in deep learning. Modern GPUs are essentially specialized matrix multiplication engines.
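You can verify that the batched version really is the same computation as a per-sample loop. This is a small sanity-check sketch, not anything GPU-specific: the loop and the single matmul produce identical numbers, the batched form just does it in one operation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 3))      # 4 samples, 3 features
weights = rng.standard_normal((3, 2))    # transform 3 features into 2

# One sample at a time: a Python loop of vector-matrix products
looped = np.stack([sample @ weights for sample in batch])

# The whole batch in one matrix multiply
batched = batch @ weights

print(np.allclose(looped, batched))  # True: same numbers, one operation
```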
Stacking Layers
One matrix multiply is one layer. Stack them and you have a deep network.
input_data = np.random.randn(8, 10) # 8 samples, 10 input features
W1 = np.random.randn(10, 6) # layer 1: 10 inputs -> 6 hidden
W2 = np.random.randn(6, 4) # layer 2: 6 hidden -> 4 hidden
W3 = np.random.randn(4, 2) # layer 3: 4 hidden -> 2 outputs
hidden1 = input_data @ W1
hidden2 = hidden1 @ W2
output = hidden2 @ W3
print(f"Input: {input_data.shape}")
print(f"Layer 1: {hidden1.shape}")
print(f"Layer 2: {hidden2.shape}")
print(f"Output: {output.shape}")
Output:
Input: (8, 10)
Layer 1: (8, 6)
Layer 2: (8, 4)
Output: (8, 2)
Eight samples travel through a three-layer network. Each layer transforms the representation: 10 features shrink to 6, then to 4, then to 2 final outputs.
This is a neural network. Not a metaphor. Not a diagram. This is the math.
In real code, you also add biases and pass through activation functions at each layer. But the matrix multiplication is the engine. Everything else is decoration.
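A minimal sketch of what "biases and activation functions" add on top of the matrix multiply. The layer sizes here are arbitrary, and ReLU is just one common activation choice; the structure (multiply, add bias, apply nonlinearity) is the standard pattern.

```python
import numpy as np

def layer(x, W, b):
    """One dense layer: matrix multiply, add bias, apply ReLU."""
    return np.maximum(x @ W + b, 0.0)  # ReLU zeroes out negatives

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 10))            # 8 samples, 10 features

W1, b1 = rng.standard_normal((10, 6)), np.zeros(6)
W2, b2 = rng.standard_normal((6, 2)), np.zeros(2)

out = layer(layer(x, W1, b1), W2, b2)
print(out.shape)  # (8, 2)
```

Without the nonlinearity between layers, stacked matrix multiplies collapse into a single matrix multiply, which is why activations matter.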
Why Shape Errors Are So Common
Look at the shape rule again.
(m x n) @ (n x p)
In real networks, n can be 768. Or 4096. Or 12288. One number off anywhere and the whole thing crashes. You forget to transpose something. You load pretrained weights with the wrong configuration. You change a hyperparameter that affects output dimensions without updating the next layer.
Shape errors are not a sign you do not understand deep learning. Every deep learning engineer deals with them constantly. The fix is always the same: print the shapes, trace through the chain of matrix multiplies, find where the inner dimensions stop matching.
def check_shapes(matrices, names):
    for name, mat in zip(names, matrices):
        print(f"{name}: {mat.shape}")
W1 = np.random.randn(10, 6)
W2 = np.random.randn(6, 4)
W3_wrong = np.random.randn(5, 2) # bug: should be (4, 2)
check_shapes([W1, W2, W3_wrong], ["W1", "W2", "W3"])
Output:
W1: (10, 6)
W2: (6, 4)
W3: (5, 2)
W2 outputs 4. W3 expects 5. That mismatch causes the error. Print shapes before multiplying. It saves hours.
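You can take this one step further and check an entire chain of weight shapes before any data touches the network. `validate_chain` is a hypothetical helper sketched for this post, not a library function; it just walks adjacent shape pairs and applies the inner-dimension rule.

```python
def validate_chain(shapes):
    """Check that a list of weight shapes can be multiplied in order.

    Raises ValueError at the first pair whose inner dimensions differ.
    """
    for (_, n_out), (n_in, _) in zip(shapes, shapes[1:]):
        if n_out != n_in:
            raise ValueError(f"layer outputs {n_out} but next layer expects {n_in}")

validate_chain([(10, 6), (6, 4), (4, 2)])      # passes silently
try:
    validate_chain([(10, 6), (6, 4), (5, 2)])  # the (5, 2) bug from above
except ValueError as e:
    print(f"Caught: {e}")
```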
Try This
Create matmul_practice.py.
Part one: build a three-layer network manually for a classification task.
np.random.seed(42)
X = np.random.randn(100, 8) # 100 samples, 8 features
Create weight matrices W1, W2, W3 so that the data flows from shape (100, 8) through (100, 16), then (100, 8), then finally (100, 3). Three output values per sample.
Print the shape at every step to confirm it works. Do not proceed until each shape is correct.
Part two: you are given two matrices but one needs to be transposed before they can be multiplied.
A = np.random.randn(5, 3)
B = np.random.randn(5, 4)
Figure out how to get a result with shape (3, 4). Hint: one of them needs .T. Write the multiplication and confirm the output shape.
Part three: compute matrix multiplication by hand for these two tiny matrices without using @ or np.dot. Use only loops and addition. Then verify your answer matches NumPy.
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
What's Next
Data organized. Similarity measured. Transformations applied.
Now comes the question the whole series has been building toward. How does a model actually learn? How does it look at its wrong predictions and figure out which direction to adjust its weights?
The answer is derivatives. Specifically, understanding the slope of the error surface and which way to walk to reduce it. That is the next post.