Piyush Yadav

There Are Not 3 but 5 Dimensions in Total

We all grew up learning the same thing.

1D: a line.
2D: a flat shape (x, y).
3D: the world we live in (x, y, z).

_That's it. 3 dimensions. End of story._

And honestly? I never questioned it. Why would I?

Then I started learning PyTorch and someone casually dropped the term "4D tensor" in a tutorial like it was nothing.
I panicked a little. I thought, okay, are we doing physics now? Is this quantum stuff? Do I need to understand space-time to do machine learning?

No.

Turns out I just didn't know what dimensions actually meant in data.
And once I got it, everything clicked.

Wait, What Even Is a Tensor?
Before I get to the 5 dimensions thing, let me just quickly explain what a tensor is because I didn't know either.

A tensor is just numbers, organized in a specific shape.
That's genuinely it.

One number alone → that's a scalar
A row of numbers → that's a 1D tensor
Rows stacked into a grid → that's a 2D tensor
Grids stacked on top of each other → 3D
And it keeps going...

In PyTorch you'd write something like:

import torch

# 1D - just a simple list of numbers
t = torch.tensor([10, 20, 30, 40])
print(t.shape)  # torch.Size([4])

# 2D - a grid, like a table
t2 = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(t2.shape)  # torch.Size([2, 3])
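
The same pattern covers the two ends of the list above that this snippet skips: a single number and a stack of grids. A minimal sketch, using only `torch.tensor`, `.shape`, and `.ndim` (the variable names and values are just for illustration):

```python
import torch

# 0D - a single number (a scalar); its shape is empty
s = torch.tensor(7)
print(s.ndim, s.shape)    # 0 torch.Size([])

# 3D - two 2x3 grids stacked on top of each other
t3 = torch.tensor([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
])
print(t3.ndim, t3.shape)  # 3 torch.Size([2, 2, 3])
```

`.ndim` is simply how many entries `.shape` has.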

Easy enough. But now here's where things got interesting for me.

The Part That Actually Surprised Me
In school, dimensions described space.
**In PyTorch, dimensions describe the structure of data.**

Completely different thing. Same word.
And when you apply this to images and videos, it goes all the way up to 5 dimensions. Here's how:

Dimension 1 — Just a List

t = torch.tensor([255, 120, 89, 200])
# shape: [4]

A simple row of numbers. Could be pixel brightness values, could be anything.

Dimension 2 — A Grayscale Image

img = torch.zeros(28, 28)
# shape: [28, 28]

A black and white photo is literally just a grid of numbers.
Every single pixel is one number: how bright it is.
0 means black,
255 means white,
and everything in between is a shade of grey.

That's a 2D tensor. Height × Width. That's it.
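
To make the "grid of numbers" idea concrete, here is a small sketch. It assumes pixels are stored as 8-bit integers (`torch.uint8`), which is the usual convention for the 0 to 255 range; note that `torch.zeros(28, 28)` on its own creates floats by default. The pixel position is arbitrary.

```python
import torch

# A 28x28 grayscale image, all black, stored as 8-bit integers (0-255)
img = torch.zeros(28, 28, dtype=torch.uint8)

# Paint one pixel white: row 10, column 5
img[10, 5] = 255

print(img.shape)          # torch.Size([28, 28])
print(img[10, 5].item())  # 255
print(img[0, 0].item())   # 0

# Models usually want floats, so a common step is scaling to the 0-1 range
img_float = img.float() / 255.0
print(img_float.max().item())  # 1.0
```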

Dimension 3 — A Color Image

color_img = torch.zeros(3, 224, 224)
# shape: [3, 224, 224]

Okay this one got me.

_A color image has three color channels: red, green, and blue._
Each channel is its own separate grid of numbers. Stack all three on top of each other and you get a 3D tensor.

So it's not "3D" like a cube. It's three flat grids, layered.

Red layer      224 × 224 numbers
Green layer    224 × 224 numbers
Blue layer     224 × 224 numbers
─────────────────────────────────
Stack them     shape: [3, 224, 224]

Every photo on your phone, every image on Instagram: once it's decoded and loaded into PyTorch, this is how it typically lives in memory.
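
The stacking step can be sketched with `torch.stack`. The three channel tensors here are made-up placeholders (all zeros, all ones, all 0.5), not real image data:

```python
import torch

# Three separate 224x224 grids, one per color channel (placeholder values)
red = torch.zeros(224, 224)
green = torch.ones(224, 224)
blue = torch.full((224, 224), 0.5)

# Stack them along a new first dimension -> [channels, height, width]
color_img = torch.stack([red, green, blue], dim=0)
print(color_img.shape)     # torch.Size([3, 224, 224])

# Indexing the first dimension gives one layer back (here, the green one)
print(color_img[1].shape)  # torch.Size([224, 224])
```

Many image libraries hand you the channels last, as [height, width, 3]. In that case `permute(2, 0, 1)` rearranges the tensor into the channels-first layout used here.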

Dimension 4 — A Batch of Color Images

batch = torch.zeros(32, 3, 224, 224)
# shape: [32, 3, 224, 224]


When you train an AI model, you don't show it one image at a time; that would take forever. You show it 32 images at once, or 64, or 128. That's called a batch.

So now we have one more dimension: **how many images are in this batch?**
_32 images, each with 3 color channels. Each channel is 224×224._

Four numbers. One line. An entire batch of photos.
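
As a rough sketch, a batch can be built by stacking individual images with `torch.stack`. In real training code a `DataLoader` usually does this for you; the list of blank images below is only a stand-in.

```python
import torch

# 32 placeholder color images, each [3, 224, 224]
images = [torch.zeros(3, 224, 224) for _ in range(32)]

# Stack along a new first dimension -> [batch, channels, height, width]
batch = torch.stack(images, dim=0)
print(batch.shape)        # torch.Size([32, 3, 224, 224])

# One image out of the batch is a 3D tensor again
print(batch[0].shape)     # torch.Size([3, 224, 224])

# One channel of one image is a 2D grid
print(batch[0, 2].shape)  # torch.Size([224, 224])
```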


Dimension 5 — A Batch of Videos

video_batch = torch.zeros(8, 16, 3, 224, 224)
# shape: [8, 16, 3, 224, 224]




And here's the one that genuinely stopped me for a second.

A video is just photos (frames) played one after another. So we add one more dimension for how many frames each video has.

_**This right here is 8 videos, each with 16 frames, each frame being a 224×224 color image.**_

Five numbers. That's a batch of videos encoded in a single tensor shape.
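
One way to keep five numbers straight is to unpack the shape into named variables. The sketch below does that, using the [batch, frames, channels, height, width] ordering from this post. Some PyTorch layers, such as `nn.Conv3d`, expect channels before frames, so the last lines show `permute` swapping those two dimensions. The variable names are only illustrative.

```python
import torch

video_batch = torch.zeros(8, 16, 3, 224, 224)

# Give each dimension a name
num_videos, num_frames, channels, height, width = video_batch.shape
print(num_videos, num_frames, channels, height, width)  # 8 16 3 224 224

# One video: 16 frames of color images
print(video_batch[0].shape)     # torch.Size([16, 3, 224, 224])

# One frame of one video: an ordinary color image
print(video_batch[0, 0].shape)  # torch.Size([3, 224, 224])

# Reorder to [batch, channels, frames, height, width]
channels_first = video_batch.permute(0, 2, 1, 3, 4)
print(channels_first.shape)     # torch.Size([8, 3, 16, 224, 224])
```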


So What's the Difference Between School Dimensions and PyTorch Dimensions?

Let me just put it plainly:

![Difference table](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5qhajqrqnkwo6d6f4deq.png)


In school, a "dimension" told you about *where* something exists.

In PyTorch, a "dimension" tells you *how* your data is organized.

Same word. Totally different meaning. And once you separate the two in your head, everything about tensors starts making sense.
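
A tiny example makes the split visible. A point in 3D space needs three coordinates, but as a tensor it is just one list of three numbers, so PyTorch calls it 1D. The values below are arbitrary.

```python
import torch

# A point in 3D space (x, y, z): "3D" in the school sense
point = torch.tensor([1.0, 2.0, 3.0])

# As data it is a single list of 3 numbers: 1 tensor dimension
print(point.ndim)   # 1
print(point.shape)  # torch.Size([3])

# 100 such points form a table: 2 tensor dimensions, still 3D space
points = torch.zeros(100, 3)
print(points.ndim)  # 2
```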


The Table I Wish Someone Showed Me on Day 1

![Tensor table](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z3zpigwhtsbuqqiemdgn.png)


If you made it this far, **honestly, thank you.** It means this connected with you in some way, and that's exactly why I write.
I'm just someone learning in public, sharing things as they connect for me. If you have any questions, want to discuss something, or just want to say hi, drop me a mail at _**piyutejyadav2004@gmail.com**_. I reply to everyone.
And if you want to follow along as I go deeper into PyTorch and machine learning, let's connect on [LinkedIn](https://www.linkedin.com/in/piyushai/). I'd love to have you on this journey.