Jimin Lee

Originally published at Medium

PyTorch Tensor Internals: Storage, Shape, and Stride Explained

If you’ve spent any time with PyTorch, you’ve almost certainly hit this wall:

RuntimeError: view size is not compatible with input tensor's size and stride...

It’s the moment you throw your hands up, think, "I just want to change the shape, why is this so hard?" and swap .view() for .reshape() just to make the error go away.

But have you ever wondered why that error happens? Or how PyTorch actually manages Tensors under the hood?

Today, we’re going to open up the hood and look at the mechanics of Storage, Shape, and Stride.

Note: This guide assumes you’re familiar with the basics of PyTorch and have used view, reshape, or transpose in your code.


1. The Double Life of a Tensor

A PyTorch Tensor is effectively "two-faced." It has an internal implementation where the data lives, and an external presentation that we interact with.

  • Storage: Where the actual data bytes sit in memory.

  • Metadata: How that data is presented to us (the API).

Storage: The Reality Check

Let's start with a simple 2x3 matrix.

import torch

# Create a 2x3 tensor
t = torch.tensor([[1, 2, 3], [4, 5, 6]])

print(t.shape)
# Output: torch.Size([2, 3])

As expected, t is a 2x3 tensor.

However, t isn't stored as a 2D grid in your RAM. Regardless of the shape you define, tensors are stored physically as a 1-dimensional array.

[1, 2, 3, 4, 5, 6]

This 1D array is the Storage. It doesn't care about your dimensions; it just holds the numbers.
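
If you want to see this for yourself, you can peek at the storage directly. Here is a minimal sketch, assuming a reasonably recent PyTorch where .storage() is still exposed (newer versions also offer .untyped_storage()):

import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6]])

# The storage is one flat, 1D buffer, regardless of the tensor's shape.
print(list(t.storage()))    # [1, 2, 3, 4, 5, 6]
print(t.storage().size())   # 6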

So, if the data is always a flat line, how does PyTorch know to display it as a matrix? That’s determined by two critical pieces of metadata: Shape and Stride.

2. The Blueprint: Shape

Shape is the blueprint that tells PyTorch, "Here is a flat line of data; here is how I want you to slice it up visually."

  • 1D (6,): Treat the data as one long strip of 6 items.

  • 2D (2, 3): Chop the data every 3 items to make a row. Stack 2 of these rows.

  • 3D (2, 1, 3): Take those 3-item rows, stack 1 of them to make a "plane," and stack 2 of those planes.
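
To make the list above concrete, here is the same strip of six numbers wearing those three shapes (a quick sketch, reusing the torch import from earlier):

data = torch.arange(1, 7)     # storage: [1, 2, 3, 4, 5, 6]

print(data.view(6))           # one long strip of 6 items
print(data.view(2, 3))        # 2 rows of 3
print(data.view(2, 1, 3))     # 2 planes, each holding 1 row of 3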

There is one golden rule here:

The product of all numbers in the Shape must equal the total length of the Storage.

If your Shape is (2, 3), then 2 × 3 = 6, which matches our 6 data points. If you try to make it (2, 4), then 2 × 4 = 8: you don't have enough data, so PyTorch throws an error.
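
You can watch that rule being enforced. A small sketch (the exact error wording may differ slightly between PyTorch versions):

t = torch.tensor([[1, 2, 3], [4, 5, 6]])   # 6 elements

t.view(3, 2)        # 3 * 2 = 6, fine

try:
    t.view(2, 4)    # 2 * 4 = 8, but we only have 6 elements
except RuntimeError as e:
    print(e)        # shape '[2, 4]' is invalid for input of size 6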

Once the Shape is set, we need to know how to navigate that data. That’s where Stride comes in.

3. The Art of Movement: Stride

Now that we have the skeleton (Shape), we need to know how to walk through it.

Stride answers the question: "How many steps do I need to skip in memory to get to the next index?"

Let's look at our (2, 3) tensor t again.

Logical View:

[[1, 2, 3],
 [4, 5, 6]]

Physical Storage:

[1, 2, 3, 4, 5, 6]

Question 1:

If I am at t[0, 0] and I want to move one step to the right to t[0, 1], how do I move in Storage?

  • I go from index 0 to index 1.

  • Move: 1 step.

Question 2:

If I am at t[0, 0] (value: 1) and I want to move one step down to t[1, 0] (value: 4), how do I move in Storage?

  • I have to skip the entire first row (3 items).

  • I go from index 0 to index 3.

  • Move: 3 steps.

Defining Stride

So, what is the Stride of t? It is (3, 1).

  • The first number (3) is the stride for the rows (dimension 0). To go to the next row, skip 3 items in memory.

  • The second number (1) is the stride for the columns (dimension 1). To go to the next column, skip 1 item in memory.

This gives us a navigation formula:

Storage Index = (Index_Dim0 × Stride_Dim0) + (Index_Dim1 × Stride_Dim1) + ...

Let's find the physical location of t[1, 2]:

index = (1 × 3) + (2 × 1) = 5

Checking our storage list... index 5 is indeed 6.
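
The same arithmetic in code, assuming the t from above (the storage()[...] lookup is just for illustration):

print(t.stride())      # (3, 1)

# Replay the navigation formula for t[1, 2].
i, j = 1, 2
storage_index = i * t.stride(0) + j * t.stride(1)   # (1 * 3) + (2 * 1) = 5

print(storage_index)                  # 5
print(t.storage()[storage_index])     # 6
print(t[1, 2].item())                 # 6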

How is Stride Calculated?

By default, Strides are calculated in reverse order based on the Shape.

Take a tensor with Shape (Channel: 2, Height: 3, Width: 4).

  1. Width (Innermost): To move 1 step in width, we move 1 step in memory.

  2. Height: To move 1 step in height, we must skip a whole width row. That's 4 steps.

  3. Channel (Outermost): To move 1 step in channel, we must skip a whole Height × Width plane. That's 3 × 4 = 12 steps.

The stride would be (12, 4, 1).
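
Here is a tiny helper that reproduces that default row-major calculation, checked against what PyTorch reports (default_strides is just an illustrative name, not a PyTorch function):

def default_strides(shape):
    # Row-major rule: each stride is the product of all sizes to its right.
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))

print(default_strides((2, 3, 4)))        # (12, 4, 1)
print(torch.empty(2, 3, 4).stride())     # (12, 4, 1)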

4. The Magic of View: Zero-Copy Operations

Now let's use .view(). This function changes the Shape of the tensor.

t = torch.tensor([[1, 2, 3], [4, 5, 6]])

# Flatten the 2x3 tensor into a 1D tensor of 6 elements
flat_t = t.view(6)

# --- [Internals of t] ---
# Storage: [1, 2, 3, 4, 5, 6]  <-- Address: @0x100
# Shape  : (2, 3)
# Stride : (3, 1)

# --- [Internals of flat_t] ---
# Storage: [1, 2, 3, 4, 5, 6]  <-- Address: @0x100 (SAME ADDRESS!)
# Shape  : (6,)
# Stride : (1,)

Key Takeaway:

flat_t is a new tensor object, but it points to the exact same memory address as t. No data was copied.

PyTorch simply created a new Metadata wrapper (new Shape (6,), new Stride (1,)) and slapped it onto the existing Storage. This is why view() is incredibly fast—it’s a lightweight metadata operation.
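
You can verify the shared memory yourself by comparing data pointers. A quick sketch:

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
flat_t = t.view(6)

# Same underlying buffer: both tensors report the same data pointer.
print(t.data_ptr() == flat_t.data_ptr())   # True

# Because it's shared, a write through one alias is visible through the other.
flat_t[0] = 99
print(t[0, 0].item())                      # 99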

5. Playing with Dimensions: Squeeze and Unsqueeze

The popular unsqueeze and squeeze methods work the same way.

Unsqueeze: "Add a wrapper"

vec = torch.tensor([1, 2, 3])
# Shape: (3,), Stride: (1,)

# Add a batch dimension at index 0
unsqueezed_vec = vec.unsqueeze(0)

# --- [Internals] ---
# Storage: [1, 2, 3]   <-- Shared
# Shape  : (1, 3)      <-- Dimension added
# Stride : (3, 1)      <-- Stride updated

Squeeze: "Remove the wrapper"

row_vec = torch.tensor([[1, 2, 3]])
# Shape: (1, 3), Stride: (3, 1)

squeezed_vec = row_vec.squeeze()

# --- [Internals] ---
# Storage: [1, 2, 3]   <-- Shared
# Shape  : (3,)
# Stride : (1,)

Again, zero memory copying. Just math on the metadata.
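
A quick check that all three tensors really do share one storage:

vec = torch.tensor([1, 2, 3])
batched = vec.unsqueeze(0)     # shape (1, 3), stride (3, 1)
flat = batched.squeeze()       # shape (3,),  stride (1,)

print(vec.data_ptr() == batched.data_ptr() == flat.data_ptr())   # True
print(batched.shape, batched.stride())   # torch.Size([1, 3]) (3, 1)
print(flat.shape, flat.stride())         # torch.Size([3]) (1,)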

6. The Troubleshooting Begins: Transpose

Everything is peaceful until transpose() enters the chat.

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
# Storage: [1, 2, 3, 4, 5, 6]
# Shape: (2, 3)
# Stride: (3, 1)

# Transpose the matrix
t_transposed = t.t()

# --- [Internals of t_transposed] ---
# Storage: [1, 2, 3, 4, 5, 6]  <-- Still shared!
# Shape  : (3, 2)              <-- Logically 3x2
# Stride : (1, 3)              <-- !!! LOOK HERE !!!

Wait a minute.

If we had created a fresh 3x2 tensor, the stride should be (2, 1).

But here, the stride is (1, 3).

Why? Because PyTorch didn't move the physical data. It just tricked the metadata.

Visualizing the mess:

Logically, t_transposed looks like this:

[[1, 4],
 [2, 5],
 [3, 6]]

But physically, the storage is still [1, 2, 3, 4, 5, 6].

To read the first row [1, 4]:

  1. Read index 0 (Value 1).

  2. To get 4, we have to skip 3 steps in storage (index 3).

This effectively breaks the standard "row-major" contiguous layout. PyTorch calls this state non-contiguous.
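
You can watch the stride doing this work, using the navigation formula from earlier (another small sketch):

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
tt = t.t()

print(tt.shape, tt.stride())     # torch.Size([3, 2]) (1, 3)

# Reading the first logical row [1, 4] means jumping around in storage:
for j in range(tt.shape[1]):
    idx = 0 * tt.stride(0) + j * tt.stride(1)
    print(idx, tt.storage()[idx])   # 0 -> 1, then 3 -> 4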

7. The Diagnostic Tool: .is_contiguous()

You don't need to calculate strides in your head. PyTorch provides a boolean check:

# 1. The original tensor
t = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(f"Is t contiguous? {t.is_contiguous()}")
# Output: True

# 2. The transposed tensor
t_transposed = t.t()
print(f"Is t_transposed contiguous? {t_transposed.is_contiguous()}")
# Output: False (Strides are messed up!)

8. How "Contiguous" is Checked

Does is_contiguous() check every single data point? No. It just validates the math.

The Rule:

Walking backwards from the innermost dimension, each dimension's Stride must equal:

(Stride of the next inner dimension) × (Size of that next inner dimension)

and the innermost dimension's stride must be 1.

Here is a Python implementation of that logic:

def check_contiguous_logic(tensor):
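    # Simplified on purpose: PyTorch's real check also skips dimensions of
    # size 1, whose stride can be arbitrary without breaking contiguity.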
    shape = tensor.shape
    stride = tensor.stride()

    expected_stride = 1

    # Loop backwards through dimensions
    for i in range(len(shape) - 1, -1, -1):
        # If the actual stride doesn't match the math...
        if stride[i] != expected_stride:
            return False

        expected_stride *= shape[i]

    return True

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(check_contiguous_logic(t)) # True

t_t = t.t()
print(check_contiguous_logic(t_t)) # False

9. Catching the Culprit

Now, let's look at why view() crashes.

If you try to view() a non-contiguous tensor:

try:
    # t_transposed is non-contiguous
    t_transposed.view(6)
except RuntimeError as e:
    print("Error:", e)

# Result: RuntimeError: view size is not compatible with input tensor's size and stride...

The translation of that error is:

"You asked me to view this as a flat 1D array of 6 items. That requires the data to be physically stored as 1, 4, 2, 5, 3, 6. But your storage is actually 1, 2, 3, 4, 5, 6. I cannot map this using just metadata changes. I give up."

10. The Fixes: .contiguous() vs .reshape()

To fix this, we have to bite the bullet and rearrange the physical storage.

Method 1: .contiguous() - The Manual Fix

# Force the data to be physically rearranged
t_contiguous = t_transposed.contiguous()

print(f"Is it contiguous now? {t_contiguous.is_contiguous()}") # True!

# --- [Internals of t_contiguous] ---
# Storage: [1, 4, 2, 5, 3, 6]  <-- NEW Address! Data Copied!
# Shape  : (3, 2)
# Stride : (2, 1)              <-- Standard stride restored

.contiguous() creates a new memory allocation and copies the data over in the correct order.
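
Again, data pointers make the copy visible. A quick sketch:

t_transposed = torch.tensor([[1, 2, 3], [4, 5, 6]]).t()
t_contiguous = t_transposed.contiguous()

# New allocation: different data pointer, re-laid-out storage, standard strides.
print(t_transposed.data_ptr() == t_contiguous.data_ptr())   # False
print(list(t_contiguous.storage()))                         # [1, 4, 2, 5, 3, 6]
print(t_contiguous.stride())                                # (2, 1)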

Method 2: .reshape() - The "Just Do It" Fix

# reshape handles the logic for you
t_reshaped = t_transposed.reshape(6)

# --- [Internals] ---
# Storage: [1, 4, 2, 5, 3, 6]  <-- It called .contiguous() internally
# Shape  : (6,)
# Stride : (1,)

.reshape() is a convenience wrapper. It checks if the tensor is contiguous.

  1. If it is, it returns a view (fast, zero-copy).

  2. If it is not, it performs a copy (equivalent to .contiguous().view()).
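
You can check which branch you got by comparing data pointers. A small sketch:

x = torch.arange(6).view(2, 3)

# Contiguous input: reshape returns a view (same data pointer, no copy).
print(x.reshape(-1).data_ptr() == x.data_ptr())       # True

# Non-contiguous input: reshape silently falls back to a copy.
print(x.t().reshape(-1).data_ptr() == x.data_ptr())   # False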

The Cost of Convenience

"Can't I just use reshape for everything?"

Technically, yes, you can. reshape() is incredibly smart. If your tensor is behaving itself (contiguous), reshape acts exactly like view: fast and efficient. If your tensor is "tangled" (non-contiguous), reshape automatically creates a copy to handle it safely.

However, knowing whether you are copying data or not is critical for performance engineering. Are you getting a free ride, or are you paying a hidden tax?

Let’s stop guessing and look at the numbers.

import torch
import time

# Create a massive 10,000 x 10,000 tensor (Approx. 380MB)
size = 10000
x = torch.randn(size, size)

print("=== 1. The Happy Path (Contiguous Tensor) ===")
start = time.time()
x.view(-1)
print(f"View (Zero-Copy): {time.time() - start:.6f} seconds")

start = time.time()
x.reshape(-1)
print(f"Reshape (Zero-Copy): {time.time() - start:.6f} seconds")
# Conclusion: Both are blazing fast because reshape defaults to view internally.

print("\n=== 2. The Twisted Path (Non-Contiguous Tensor) ===")
y = x.t() # Mess up the memory layout with a transpose

try:
    y.view(-1)
except RuntimeError:
    print("View: Failed! (Error raised as expected)")

start = time.time()
y.reshape(-1) # A physical copy happens here!
print(f"Reshape (Copy Occurred): {time.time() - start:.6f} seconds")

Results on a standard machine:

=== 1. The Happy Path (Contiguous Tensor) ===
View (Zero-Copy): 0.000238 seconds
Reshape (Zero-Copy): 0.000040 seconds

=== 2. The Twisted Path (Non-Contiguous Tensor) ===
View: Failed! (Error raised as expected)
Reshape (Copy Occurred): 0.157830 seconds

The Verdict

  1. When Contiguous (Normal): view and reshape are neck-and-neck. There is virtually no difference.

  2. When Non-Contiguous (Twisted): reshape handles the situation silently, but it is thousands of times slower than the zero-copy path because it has to physically move memory.

Here is the danger: If you use reshape inside a training loop (like a DataLoader or a custom layer), you might be paying a massive memory copy tax without ever realizing it.

This is why I recommend getting into the habit of using view. Think of view as a statement of intent: "I refuse to copy memory!"

If view throws an error, it forces you to acknowledge the issue. You can then consciously decide to fix it using .contiguous() or switch to .reshape() knowing exactly what the cost is.
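
One way to bake that habit into your code is a tiny helper like the hypothetical flatten_features below, where the copy has to be requested explicitly:

def flatten_features(x, allow_copy=False):
    # Hypothetical helper: view() fails loudly if flattening would need a copy;
    # passing allow_copy=True opts in to reshape()'s silent fallback.
    if allow_copy:
        return x.reshape(x.size(0), -1)
    return x.view(x.size(0), -1)

x = torch.randn(8, 3, 4)
print(flatten_features(x).shape)                      # torch.Size([8, 12])

y = x.transpose(1, 2)                                 # non-contiguous
print(flatten_features(y, allow_copy=True).shape)     # torch.Size([8, 12])
# flatten_features(y)  # would raise the RuntimeError from the intro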

Summary

  • Storage is a flat 1D array. PyTorch tries very hard not to move it.

  • Metadata (Shape & Stride) determines how that flat array is interpreted as dimensions.

  • View only changes metadata. It fails if the requested shape contradicts the physical memory layout.

  • Contiguous means the memory layout perfectly matches the shape (row-major).

  • Reshape will fix the error by copying data, but be aware of the performance cost.

Next time you see that RuntimeError, you'll know exactly what's happening under the hood. It’s not just a shape mismatch; it’s a stride conflict.
