Most people use PyTorch without really knowing what's happening underneath. This series breaks the foundations down into the simplest possible explanations — one concept at a time, with code you can run and exactly what goes in and comes out.
This is Part 1 of 5. By the end you'll understand the five building blocks every neural network is made of: creating tensors, doing math on them, reshaping them, computing gradients, and bending them with activation functions.
No assumed knowledge. Let's go.
1. What a tensor actually is
Everything in deep learning is built from one object: the tensor. Don't let the name scare you — a tensor is just a box of numbers.
- 1 number → a scalar
- a row of numbers → a vector
- a grid → a matrix
- stacked grids → a tensor
An image is literally a 3D tensor: height × width × colour.
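To make the "box of numbers" idea concrete, here is a small sketch that prints how many dimensions each kind of box has. The variable names and the 32 × 32 image size are illustrative assumptions, not something from a real dataset.

```python
import torch

scalar = torch.tensor(3.0)               # a single number
vector = torch.tensor([1.0, 2.0, 3.0])   # a row of numbers
matrix = torch.zeros(2, 3)               # a grid: 2 rows, 3 columns
image = torch.zeros(32, 32, 3)           # height x width x colour (made-up size)

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image", image)]:
    # ndim is the number of dimensions, shape is the size along each one
    print(name, t.ndim, tuple(t.shape))
# scalar 0 ()
# vector 1 (3,)
# matrix 2 (2, 3)
# image 3 (32, 32, 3)
```

One thing to keep in mind: PyTorch's own image layers usually expect the colour dimension first (colour × height × width), so the order you see in practice may differ from the one above.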
The first skill is creating them — filled with zeros, ones, or any value you want. Then .tolist() reads the tensor back as a plain Python list.
import torch

def create_tensor(method, shape, value=0.0):
    if method == "zeros":
        t = torch.zeros(shape)
    elif method == "ones":
        t = torch.ones(shape)
    else:  # "full"
        t = torch.full(shape, value)
    return t.tolist()
What goes in and what comes out:
create_tensor("zeros", [2, 3]) -> [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
create_tensor("full", [2, 2], 7.0) -> [[7.0, 7.0], [7.0, 7.0]]
💡 Gotcha: a function only hands back a value if you write return. Forget it and your function silently returns None, one of the most common beginner bugs.
2. Doing math on tensors
There are two kinds of math you'll use constantly, and mixing them up is the #1 beginner mistake.
- Element-wise: same position meets same position. [1, 2, 3] + [4, 5, 6] = [5, 7, 9].
- Matrix multiplication (@): rows × columns. This one mixes values together, and it's the single most-used operation in deep learning. Nearly every layer of every model is built around a matmul.
import torch

def tensor_op(x, y, op):
    a = torch.tensor(x, dtype=torch.float32)
    b = torch.tensor(y, dtype=torch.float32)
    if op == "add":
        result = a + b
    elif op == "multiply":
        result = a * b
    elif op == "matmul":
        result = a @ b
    elif op == "power":
        result = a ** b
    else:  # "max"
        result = torch.maximum(a, b)
    return result.tolist()
Input → output:
tensor_op([1,2,3], [4,5,6], "add") -> [5.0, 7.0, 9.0]
tensor_op([[1,2],[3,4]], [[5,6],[7,8]], "matmul") -> [[19.0, 22.0], [43.0, 50.0]]
💡 Two traps: * is element-wise multiply, @ is matrix multiply; they are completely different operations. And for the element-wise maximum of two tensors, use torch.maximum(a, b), not Python's built-in max() (that one can't compare tensors position-by-position).
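To see the difference between * and @ side by side, here is a minimal sketch that runs both on the same pair of 2 × 2 matrices. The names a, b, x and w are just for illustration.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

elementwise = a * b   # same position times same position
matmul = a @ b        # rows of a combined with columns of b

print(elementwise.tolist())  # [[5.0, 12.0], [21.0, 32.0]]
print(matmul.tolist())       # [[19.0, 22.0], [43.0, 50.0]]

# Shapes follow different rules too: for @, the inner sizes must match
x = torch.ones(2, 3)
w = torch.ones(3, 4)
print(tuple((x @ w).shape))  # (2, 4)
```

Same inputs, very different outputs. If a result looks wrong, checking which of the two operators you typed is a good first step.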
3. Reshaping tensors
Reshaping means: same numbers, new shape. The data never changes — only how it's arranged. This matters because data arrives in one shape and the next layer expects another. Reshaping is the quiet glue holding a network together.
- flatten → squash a grid into a single line
- squeeze → drop useless size-1 dimensions
- transpose (.T) → flip rows and columns
import torch

def reshape_tensor(x, op):
    t = torch.tensor(x, dtype=torch.float32)
    if op == "flatten":
        result = torch.flatten(t)
    elif op == "squeeze":
        result = torch.squeeze(t)
    else:  # "transpose"
        result = t.T
    return result.tolist()
Input → output:
reshape_tensor([[1,2,3],[4,5,6]], "flatten") -> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
reshape_tensor([[1,2],[3,4]], "transpose") -> [[1.0, 3.0], [2.0, 4.0]]
💡 Gotcha: else is a catch-all; it never takes a condition. Writing else op == "transpose": is a syntax error. Just else:.
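The examples above cover flatten and transpose, so here is a short sketch of squeeze as well, with the shapes printed before and after. The tensors are made up for illustration.

```python
import torch

t = torch.tensor([[[1.0], [2.0], [3.0]]])   # shape (1, 3, 1)
squeezed = torch.squeeze(t)                 # size-1 dimensions removed

print(tuple(t.shape), "->", tuple(squeezed.shape))  # (1, 3, 1) -> (3,)
print(squeezed.tolist())                            # [1.0, 2.0, 3.0]

# Reshaping rearranges; the count of numbers stays the same
grid = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
flat = torch.flatten(grid)
print(tuple(grid.T.shape))           # (3, 2)
print(grid.numel() == flat.numel())  # True
```

Printing .shape before and after a reshape is a cheap habit that catches most shape bugs early.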
4. Autograd — the engine that trains everything
This is the most important idea in deep learning, and PyTorch does the hard part for you.
A gradient is just a slope. It answers one question: "if I nudge this input a little, does the output go up or down, and how steeply?" That slope is what tells a network which direction to adjust its weights to reduce its error.
The trick: mark a tensor with requires_grad=True, do your math, then call .backward(). PyTorch quietly records every operation and computes all the slopes automatically — no calculus by hand.
import torch

def compute_gradient(values):
    x = torch.tensor(values, dtype=torch.float32, requires_grad=True)
    y = (x**3 + 2*x).sum()  # collapse to a single number
    y.backward()            # walk backward, fill in the slopes
    return x.grad.tolist()
Input → output:
compute_gradient([1, 2, 3]) -> [5.0, 14.0, 29.0]
Why those numbers? The slope of x³ + 2x is 3x² + 2. At x = 1 that's 3(1) + 2 = 5. At x = 2 it's 3(4) + 2 = 14. At x = 3 it's 3(9) + 2 = 29. PyTorch produced the exact analytical answer automatically. That's the whole point: it works even for equations far too big to differentiate by hand.
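If you'd rather not trust the arithmetic, one way to check it is to compute the hand-derived slope 3x² + 2 yourself and compare it with what autograd filled in. This is a minimal sketch for the same function; the name analytical is my own.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x**3 + 2*x).sum()
y.backward()

# The derivative worked out by hand, evaluated at the same points.
# detach() gives a copy of the values that autograd does not track.
analytical = 3 * x.detach()**2 + 2

print(x.grad.tolist())                     # [5.0, 14.0, 29.0]
print(analytical.tolist())                 # [5.0, 14.0, 29.0]
print(torch.allclose(x.grad, analytical))  # True
```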
💡 Two gotchas: requires_grad needs floats (integer tensors can't track gradients), and .backward() must start from a single number, which is why we call .sum() first.
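Both gotchas are easy to trigger on purpose. The sketch below catches the errors instead of crashing, so you can see that each one raises a RuntimeError; the variable names int_error and shape_error are mine.

```python
import torch

# Gotcha 1: integer tensors cannot require gradients
try:
    torch.tensor([1, 2, 3], requires_grad=True)
    int_error = None
except RuntimeError as err:
    int_error = err
print(type(int_error).__name__)    # RuntimeError

# Gotcha 2: backward() needs a single number to start from
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x**3 + 2*x                     # three numbers, no .sum()
try:
    y.backward()
    shape_error = None
except RuntimeError as err:
    shape_error = err
print(type(shape_error).__name__)  # RuntimeError
```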
5. Activation functions — adding the bend
Here's a surprising fact: if you stack layers that each compute weight × input + bias, the whole stack collapses into the equivalent of one such layer, a single straight line, even if it's a hundred layers deep. A straight line can't model faces, language, or anything interesting.
Activation functions fix this by adding a bend (a "nonlinearity") after each layer. That bend is what lets a network learn curves and complex patterns.
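The collapse is easy to check numerically. The sketch below uses two tiny layers with hand-picked, hypothetical weights and shows that they equal one combined layer, and that putting a ReLU between them breaks that equality.

```python
import torch

# Two "layers" with made-up weights and biases
W1 = torch.tensor([[1.0, 2.0], [0.0, 1.0]])
b1 = torch.tensor([1.0, -1.0])
W2 = torch.tensor([[2.0, 0.0], [1.0, 1.0]])
b2 = torch.tensor([0.0, 3.0])

x = torch.tensor([1.0, -2.0])

# Layer 2 applied to layer 1, with no activation in between
stacked = W2 @ (W1 @ x + b1) + b2

# The same thing written as one layer: W = W2 W1, b = W2 b1 + b2
W = W2 @ W1
b = W2 @ b1 + b2
single = W @ x + b
print(torch.allclose(stacked, single))    # True

# With a ReLU between the layers, no single layer reproduces the result
with_relu = W2 @ torch.relu(W1 @ x + b1) + b2
print(torch.allclose(with_relu, single))  # False
```

For this input, stacked and single both come out as [-4.0, -2.0], while the ReLU version gives [0.0, 3.0].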
The common ones:
- ReLU → cut off negatives: max(0, x)
- Sigmoid → squash any number into 0…1
- Tanh → squash into −1…1
- LeakyReLU → like ReLU, but lets a tiny bit of negatives through so neurons don't "die"
import torch

def activation(x, method):
    t = torch.tensor(x, dtype=torch.float32)
    if method == "relu":
        result = torch.clamp(t, min=0)
    elif method == "sigmoid":
        result = 1 / (1 + torch.exp(-t))
    elif method == "tanh":
        result = torch.tanh(t)
    else:  # "leaky_relu"
        result = torch.where(t > 0, t, 0.01 * t)
    return result.tolist()
Input → output:
activation([-2,-1,0,1,2], "relu") -> [0.0, 0.0, 0.0, 1.0, 2.0]
activation([-1,0,1], "sigmoid") -> [0.269, 0.5, 0.731] (rounded to three decimals)
💡 Gotcha: in 1 / (1 + torch.exp(-t)), the parentheses matter. Without them, Python computes (1/1) + exp(-t) because division runs before addition. When a whole expression is the denominator, wrap it in brackets.
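PyTorch also ships these activations ready-made, so a quick sanity check is to compare the hand-written formulas against the built-ins. A minimal sketch, assuming the same 0.01 slope for LeakyReLU; the checks dictionary and the names right and wrong are mine.

```python
import torch
import torch.nn.functional as F

t = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

checks = {
    "relu": torch.allclose(torch.clamp(t, min=0), torch.relu(t)),
    "sigmoid": torch.allclose(1 / (1 + torch.exp(-t)), torch.sigmoid(t)),
    "leaky_relu": torch.allclose(
        torch.where(t > 0, t, 0.01 * t),
        F.leaky_relu(t, negative_slope=0.01),
    ),
}
print(checks)  # {'relu': True, 'sigmoid': True, 'leaky_relu': True}

# The parentheses gotcha, at t = 0 where sigmoid should be 0.5
zero = torch.tensor(0.0)
right = 1 / (1 + torch.exp(-zero))
wrong = 1 / 1 + torch.exp(-zero)    # parsed as (1 / 1) + exp(-t)
print(right.item(), wrong.item())   # 0.5 2.0
```

In real code you would normally reach for the built-ins; writing them by hand is mainly useful for understanding what they do.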
The big picture
Put it together and you have the entire core loop of deep learning:
numbers → make a guess → measure the error → compute the slopes → adjust → repeat
That's it. Tensors hold the numbers. Matrix multiply and activations make the guess. Autograd computes the slopes. Everything else — CNNs, transformers, LLMs — is a remix of these same five ideas.
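Here is the loop above at its smallest: one weight, three data points, and a made-up target rule of y = 2x. Treat it as a sketch under those assumptions, not a real network; the learning rate and step count are arbitrary choices.

```python
import torch

# Hypothetical toy data following y = 2 * x
x = torch.tensor([1.0, 2.0, 3.0])
y_true = torch.tensor([2.0, 4.0, 6.0])

w = torch.tensor(0.0, requires_grad=True)  # the one weight to learn
lr = 0.05                                  # step size (arbitrary)

for step in range(100):
    y_pred = w * x                          # make a guess
    loss = ((y_pred - y_true) ** 2).mean()  # measure the error
    loss.backward()                         # compute the slope
    with torch.no_grad():
        w -= lr * w.grad                    # adjust against the slope
    w.grad.zero_()                          # clear the slope, then repeat

print(round(w.item(), 3))  # lands very close to 2.0
```

The weight starts at 0 and ends up at roughly 2, the value that makes the guesses match the data.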
Coming in Part 2: we take these pieces and assemble them into a real, working neural network from scratch.
If this was useful, follow along — I'm building the whole thing in public, one part at a time.
🔗 I post the short version of each part on X: @Meclin_A_Francis