DEV Community: Meclin A Francis

PyTorch from Scratch — Part 1: Tensors, Gradients & Activations

Meclin A Francis — Sat, 06 Jun 2026 14:19:08 +0000

Most people use PyTorch without really knowing what's happening underneath. This series breaks the foundations down into the simplest possible explanations — one concept at a time, with code you can run and exactly what goes in and comes out.

This is Part 1 of 5. By the end you'll understand the five building blocks every neural network is made of: creating tensors, doing math on them, reshaping them, computing gradients, and bending them with activation functions.

No assumed knowledge. Let's go.

1. What a tensor actually is

Everything in deep learning is built from one object: the tensor. Don't let the name scare you — a tensor is just a box of numbers.

1 number → a scalar
a row of numbers → a vector
a grid → a matrix
stacked grids → a tensor

An image is literally a 3D tensor: height × width × colour.
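To make the scalar / vector / matrix / tensor ladder concrete, here is a small sketch that prints the number of dimensions and the shape of each. The values and the 4×4 image size are made up for illustration.

```python
import torch

scalar = torch.tensor(5.0)                # one number, 0 dimensions
vector = torch.tensor([1.0, 2.0, 3.0])    # a row of numbers, 1 dimension
matrix = torch.zeros(2, 3)                # a grid, 2 dimensions
image = torch.zeros(4, 4, 3)              # made-up 4x4 image with 3 colour channels

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image", image)]:
    print(name, t.ndim, tuple(t.shape))
```

The image here follows the height × width × colour order used in the text. Many PyTorch vision tools put the colour channels first instead, so check which layout your data uses.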

The first skill is creating them — filled with zeros, ones, or any value you want. Then .tolist() reads the tensor back as a plain Python list.

import torch

def create_tensor(method, shape, value=0.0):
    if method == "zeros":
        t = torch.zeros(shape)
    elif method == "ones":
        t = torch.ones(shape)
    else:                       # "full"
        t = torch.full(shape, value)
    return t.tolist()

What goes in and what comes out:

create_tensor("zeros", [2, 3])        -> [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
create_tensor("full", [2, 2], 7.0)    -> [[7.0, 7.0], [7.0, 7.0]]

💡 Gotcha: a function only hands back a value if you write return. Forget it and your function silently returns None — one of the most common beginner bugs.

2. Doing math on tensors

There are two kinds of math you'll use constantly, and mixing them up is the #1 beginner mistake.

Element-wise: same position meets same position. [1, 2, 3] + [4, 5, 6] = [5, 7, 9].
Matrix multiplication (@): rows × columns. This one mixes values together, and it's one of the most-used operations in deep learning. Most layers in most models are built around a matmul.

import torch

def tensor_op(x, y, op):
    a = torch.tensor(x, dtype=torch.float32)
    b = torch.tensor(y, dtype=torch.float32)
    if op == "add":
        result = a + b
    elif op == "multiply":
        result = a * b
    elif op == "matmul":
        result = a @ b
    elif op == "power":
        result = a ** b
    else:                       # "max"
        result = torch.maximum(a, b)
    return result.tolist()

Input → output:

tensor_op([1,2,3], [4,5,6], "add")              -> [5.0, 7.0, 9.0]
tensor_op([[1,2],[3,4]], [[5,6],[7,8]], "matmul") -> [[19.0, 22.0], [43.0, 50.0]]

💡 Two traps: * is element-wise multiply, @ is matrix multiply — completely different operations. And for the element-wise maximum of two tensors, use torch.maximum(a, b), not Python's built-in max() (that one can't compare tensors position-by-position).
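A quick way to see the difference is to run all three operations on the same pair of matrices. This is only a sketch, reusing the matrices from the matmul example above.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

elementwise = a * b            # same position times same position
matmul = a @ b                 # rows of a combined with columns of b
larger = torch.maximum(a, b)   # element-wise maximum of the two tensors

print(elementwise.tolist())    # [[5.0, 12.0], [21.0, 32.0]]
print(matmul.tolist())         # [[19.0, 22.0], [43.0, 50.0]]
print(larger.tolist())         # [[5.0, 6.0], [7.0, 8.0]]
```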

3. Reshaping tensors

Reshaping means: same numbers, new shape. The data never changes — only how it's arranged. This matters because data arrives in one shape and the next layer expects another. Reshaping is the quiet glue holding a network together.

flatten → squash a grid into a single line
squeeze → drop useless size-1 dimensions
transpose (.T) → flip rows and columns

import torch

def reshape_tensor(x, op):
    t = torch.tensor(x, dtype=torch.float32)
    if op == "flatten":
        result = torch.flatten(t)
    elif op == "squeeze":
        result = torch.squeeze(t)
    else:                       # "transpose"
        result = t.T
    return result.tolist()

Input → output:

reshape_tensor([[1,2,3],[4,5,6]], "flatten")   -> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
reshape_tensor([[1,2],[3,4]], "transpose")     -> [[1.0, 3.0], [2.0, 4.0]]

💡 Gotcha: else is a catch-all — it never takes a condition. Writing else op == "transpose": is a syntax error. Just else:.
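The examples above don't show squeeze, so here is a small sketch that tracks shapes instead of values. The tensor sizes are made up for illustration.

```python
import torch

t = torch.zeros(1, 3, 1)                  # 3 values wrapped in two size-1 dimensions
grid = torch.zeros(2, 3)                  # a 2x3 grid

print(tuple(t.shape))                     # (1, 3, 1)
print(tuple(torch.squeeze(t).shape))      # (3,)
print(tuple(torch.flatten(grid).shape))   # (6,)
print(tuple(grid.T.shape))                # (3, 2)
```

In every case the number of values stays the same. Only the shape changes.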

4. Autograd — the engine that trains everything

This is the most important idea in deep learning, and PyTorch does the hard part for you.

A gradient is just a slope. It answers one question: "if I nudge this input a little, does the output go up or down, and how steeply?" That slope is what tells a network which direction to adjust its weights to reduce its error.

The trick: mark a tensor with requires_grad=True, do your math, then call .backward(). PyTorch quietly records every operation and computes all the slopes automatically — no calculus by hand.

import torch

def compute_gradient(values):
    x = torch.tensor(values, dtype=torch.float32, requires_grad=True)
    y = (x**3 + 2*x).sum()      # collapse to a single number
    y.backward()                # walk backward, fill in the slopes
    return x.grad.tolist()

Input → output:

compute_gradient([1, 2, 3])    -> [5.0, 14.0, 29.0]

Why those numbers? The slope of x³ + 2x is 3x² + 2. At x = 1 that's 3(1) + 2 = 5. At x = 2 it's 3(4) + 2 = 14. PyTorch produced the exact analytical answer — automatically. That's the whole point: it works even for equations far too big to differentiate by hand.
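One way to convince yourself is to compute 3x² + 2 by hand in code and compare it with what autograd filled in. A minimal sketch, using the same inputs as above:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x**3 + 2*x).sum()
y.backward()

# The derivative 3x² + 2 written out manually (detach: no gradient tracking needed here)
by_hand = 3 * x.detach()**2 + 2

print(x.grad.tolist())
print(torch.allclose(x.grad, by_hand))
```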

💡 Two gotchas: requires_grad needs floats (integers can't track gradients), and .backward() must start from a single number — that's why we call .sum() first.

5. Activation functions — adding the bend

Here's a surprising fact: if you stack layers that each compute weight × input + bias, the whole stack collapses into a single straight line — even if it's a hundred layers deep. A straight line can't model faces, language, or anything interesting.

Activation functions fix this by adding a bend (a "nonlinearity") after each layer. That bend is what lets a network learn curves and complex patterns.
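The collapse can be checked numerically. The sketch below stacks two linear layers with random, made-up weights, then builds the single layer they are equivalent to. The names W1, b1, W2, b2 and the sizes are assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)                  # fixed seed so the run is repeatable
x = torch.randn(5, 3)                 # 5 made-up inputs with 3 features each

# Two stacked linear layers with no activation in between
W1, b1 = torch.randn(3, 4), torch.randn(4)
W2, b2 = torch.randn(4, 2), torch.randn(2)
stacked = (x @ W1 + b1) @ W2 + b2

# One linear layer built by multiplying the weights together
W, b = W1 @ W2, b1 @ W2 + b2
single = x @ W + b
print(torch.allclose(stacked, single, atol=1e-5))

# With a ReLU between the layers, the single layer no longer matches
bent = torch.relu(x @ W1 + b1) @ W2 + b2
print(torch.allclose(bent, single, atol=1e-5))
```

The first comparison shows that two layers did no more than one. The second shows that the bend is what breaks the equivalence.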

The common ones:

ReLU → cut off negatives: max(0, x)
Sigmoid → squash any number into 0…1
Tanh → squash into −1…1
LeakyReLU → like ReLU, but lets a tiny bit of negatives through so neurons don't "die"

import torch

def activation(x, method):
    t = torch.tensor(x, dtype=torch.float32)
    if method == "relu":
        result = torch.clamp(t, min=0)
    elif method == "sigmoid":
        result = 1 / (1 + torch.exp(-t))
    elif method == "tanh":
        result = torch.tanh(t)
    else:                       # "leaky_relu"
        result = torch.where(t > 0, t, 0.01 * t)
    return result.tolist()

Input → output:

activation([-2,-1,0,1,2], "relu")     -> [0.0, 0.0, 0.0, 1.0, 2.0]
activation([-1,0,1], "sigmoid")       -> [0.269, 0.5, 0.731]   (rounded to 3 decimals)

💡 Gotcha: in 1 / (1 + torch.exp(-t)), the parentheses matter. Without them, Python computes (1/1) + exp(-t) because division runs before addition. When a whole expression is the denominator, wrap it in brackets.
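The hand-written versions are useful for learning, but PyTorch also ships these activations. As a sanity check, this sketch compares the formulas above with torch.relu, torch.sigmoid and torch.nn.functional.leaky_relu on the same input.

```python
import torch
import torch.nn.functional as F

t = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

relu_ok = torch.allclose(torch.clamp(t, min=0), torch.relu(t))
sigmoid_ok = torch.allclose(1 / (1 + torch.exp(-t)), torch.sigmoid(t))
leaky_ok = torch.allclose(torch.where(t > 0, t, 0.01 * t),
                          F.leaky_relu(t, negative_slope=0.01))

print(relu_ok, sigmoid_ok, leaky_ok)
```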

The big picture

Put it together and you have the entire core loop of deep learning:

numbers → make a guess → measure the error → compute the slopes → adjust → repeat

That's it. Tensors hold the numbers. Matrix multiply and activations make the guess. Autograd computes the slopes. Everything else — CNNs, transformers, LLMs — is a remix of these same five ideas.
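Here is a rough sketch of that loop with a single weight instead of a full network. The data, the learning rate of 0.05 and the 20 steps are made-up values for illustration, not the network this series goes on to build.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])      # numbers
y = 2 * x                                    # made-up targets: the true rule is y = 2x
w = torch.tensor(0.0, requires_grad=True)    # the one weight we adjust

for step in range(20):                       # repeat
    guess = w * x                            # make a guess
    error = ((guess - y) ** 2).mean()        # measure the error
    error.backward()                         # compute the slope
    with torch.no_grad():
        w -= 0.05 * w.grad                   # adjust against the slope
    w.grad.zero_()                           # clear the slope before the next pass

print(round(w.item(), 3))
```

The weight starts at 0 and ends up very close to 2, the rule hidden in the data.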

Coming in Part 2: we take these pieces and assemble them into a real, working neural network from scratch.

If this was useful, follow along — I'm building the whole thing in public, one part at a time.

🔗 I post the short version of each part on X: @Meclin_A_Francis

What are vectors?

Meclin A Francis — Sat, 22 Mar 2025 09:08:06 +0000

Let's start with the basics

Vectors are mathematical tools for describing things that have both size and direction.

Imagine you're describing how to get somewhere. You wouldn't just say "go 25 kilometers," you'd also say which way to go, right?

That's what vectors do! They're mathematical objects that capture both direction and magnitude. So, if a bike travels 25km west from point A to point B, we can neatly represent that information as a vector.

The vector captures the bike's displacement. Its magnitude reveals the distance traveled (25km), and its direction specifies the direction (west).

The vector starts at point A (the tail) and ends at point B (the head). We can represent this vector using its endpoints, writing it as AB with an arrow drawn over the two letters.

Let's say we need to illustrate the displacement vector of a bike that travels 15km on a bearing of 60 degrees. What would that diagram look like?

Let's pick a starting point for our vector anywhere on the plane. We'll use a vertical line to show which way is north.

Then we go 60 degrees clockwise from north and draw the vector in that direction.
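If you'd rather have numbers than a diagram, a bearing can be turned into "how far east" and "how far north" with a little trigonometry. This is a sketch that assumes east and north as the two axes, which the text above has not introduced.

```python
import math

distance_km = 15.0
bearing_deg = 60.0                       # measured clockwise from north

theta = math.radians(bearing_deg)
east = distance_km * math.sin(theta)     # how far to the right of the start
north = distance_km * math.cos(theta)    # how far up from the start

print(round(east, 2), round(north, 2))   # 12.99 7.5
```

Because a bearing is measured from north rather than from the horizontal axis, sine gives the east part and cosine gives the north part.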

Equal vectors

For vectors to be considered equal, they must have the same length and point in the same direction. The example vectors in the image illustrate this concept.

From the picture above, we can say

Understanding how to add vectors using the triangle law.

A visual way to add two vectors is to connect them head-to-tail. More precisely, the sum of two vectors can be found graphically by positioning the starting point (tail) of one vector at the ending point (head) of the other.

Adding 'a' and 'b' involves a simple geometric construction. First, place the tail of vector 'b' at the head of vector 'a'. Then, the vector 'a+b' is the one that connects the tail of 'a' to the head of 'b'.

This method of adding vectors head-to-tail is known as the triangle law, and the resulting vector from this addition is often called the "resultant" vector.
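The head-to-tail construction can also be followed with coordinates. The sketch below assumes each vector is written as an (x, y) pair, with made-up values, and that 'a' starts at the origin.

```python
a = (3.0, 1.0)    # made-up vector a as (x, y)
b = (1.0, 2.0)    # made-up vector b

# Walk along a from the origin, so the head of a sits at these coordinates
head_of_a = a

# Place the tail of b at the head of a and walk along b
head_of_b = (head_of_a[0] + b[0], head_of_a[1] + b[1])

# The resultant connects the tail of a (the origin) to the head of b
resultant = head_of_b
print(resultant)  # (4.0, 3.0)
```

Walking along 'b' first and then 'a' ends at the same point, so the order of addition doesn't matter.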

Opposite vectors and the zero vector

Imagine a vector going from point A to point B (AB). The opposite vector (BA) travels the same distance but goes from point B back to point A.

We write BA = -AB.

To compute the sum of two opposite vectors, AB + BA, we can use the triangle law of addition.

The triangle law of addition tells us that when adding vectors, the result (the "resultant") is a vector drawn from the starting point of the first vector to the ending point of the last. If we end up back where we started (the tail and head meet), we get a special vector called the zero vector, denoted as 0. It has no direction and zero length, meaning its head and tail are at the same location.

Understanding how to subtract vectors using the triangle law.

We've learned how the triangle law helps us add vectors. But did you know we can also use it for subtraction?

The secret is to realize that subtracting a vector is the same as adding its opposite. So, the operation a - b is identical to a + (-b). Remember, -b is simply b pointing the other way, with the same length. Let's clarify this with an example.

Subtracting vector b from vector a (written as a - b) can be thought of as adding a to the opposite of b (written as a + (-b)). The vector -b has the same length as b but points in the exact opposite direction.

To visualize this, imagine drawing vector a anywhere on a plane. Then, starting from the tip (head) of vector a, draw vector -b. Now, the resulting vector (a - b) is the one that connects the starting point (tail) of a to the ending point (head) of -b. This follows the triangle law of addition, but with -b instead of b.
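The same steps can be checked with coordinates. This sketch assumes made-up (x, y) values for 'a' and 'b' and confirms that adding -b gives the same answer as subtracting b directly.

```python
a = (5.0, 2.0)    # made-up vector a as (x, y)
b = (1.0, 3.0)    # made-up vector b

neg_b = (-b[0], -b[1])                                 # same length as b, opposite direction
via_opposite = (a[0] + neg_b[0], a[1] + neg_b[1])      # a + (-b), head-to-tail
direct = (a[0] - b[0], a[1] - b[1])                    # a - b, component by component

print(via_opposite, direct)   # (4.0, -1.0) (4.0, -1.0)
```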

Figuring out the length or size of a vector.

Let's say we have a vector that goes from point A to point B, and we call it "a". The size or length of this vector – also known as its magnitude or modulus – is written as |AB| or |a|.

For instance, if a vector (let's call it AB) has a length of 8, we'd write that as |AB| = 8.

Now, imagine this: vector "a" points straight up and has a length of 5 (|a|=5). Vector "b" points directly to the right and has a length of 8 (|b|=8). How would we find the length of the vector we get when we add "a" and "b" together (i.e., what is |a+b|)?

The first step is to use the triangle law of vector addition to figure out what the resulting vector (a+b) looks like.

Looking at the diagram, we can see a right triangle is formed. This allows us to use the Pythagorean theorem to find relationships between the sides.

|a+b|² = |a|² + |b|²
= 5² + 8²
= 25 + 64
= 89

Therefore, |a+b| = √89 ≈ 9.43

Important: We just used the Pythagorean theorem, but be careful! This only holds true when the vectors 'a' and 'b' are perpendicular (at right angles). This isn't always the case! Next, we'll tackle how to work with vectors that are not at right angles.
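The perpendicular case is easy to verify with coordinates. In this sketch, "straight up" and "directly to the right" are written as (x, y) pairs, which is an assumption about how to place the vectors on a grid.

```python
import math

a = (0.0, 5.0)    # points straight up, length 5
b = (8.0, 0.0)    # points directly to the right, length 8

total = (a[0] + b[0], a[1] + b[1])        # a + b, component by component
length = math.hypot(total[0], total[1])   # square root of 8² + 5²

print(round(length, 2))                   # 9.43
```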

Calculating the magnitude of a vector using the law of cosines

If adding vectors using the triangle law doesn't result in a right triangle, we can't rely on the Pythagorean theorem to calculate the magnitude of the resulting vector.

For instance, suppose we add vectors 'a' and 'b' and the resulting triangle isn't a right triangle but instead has an obtuse angle θ (an angle greater than 90 degrees). The Pythagorean theorem won't work directly.

However, we can use the law of cosines.

If you have two vectors and you visualize their sum as forming a triangle, you can always calculate the length of the sum vector using the law of cosines.
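In its standard form, the law of cosines says |a+b|² = |a|² + |b|² − 2|a||b|cos θ, where θ is the angle inside the triangle between the sides 'a' and 'b' (the angle opposite the resultant). The sketch below uses made-up lengths of 5 and 8 with θ = 120 degrees, and cross-checks the result by adding the vectors as (x, y) pairs.

```python
import math

len_a, len_b = 5.0, 8.0
theta = math.radians(120.0)   # made-up obtuse angle inside the triangle, between sides a and b

# Law of cosines: the resultant is the side opposite theta
by_cosines = math.sqrt(len_a**2 + len_b**2 - 2 * len_a * len_b * math.cos(theta))

# Cross-check with coordinates: a lies along the x-axis,
# and b turns 180 - 120 = 60 degrees away from a's direction
turn = math.pi - theta
a = (len_a, 0.0)
b = (len_b * math.cos(turn), len_b * math.sin(turn))
by_components = math.hypot(a[0] + b[0], a[1] + b[1])

print(round(by_cosines, 2), round(by_components, 2))   # 11.36 11.36
```

When θ is exactly 90 degrees, cos θ is 0 and the formula reduces to the Pythagorean theorem.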