Jimin Lee

Originally published at Medium

PyTorch Tensor Internals: Storage, Shape, and Stride Explained

If you’ve spent any time with PyTorch, you’ve almost certainly hit this wall:

RuntimeError: view size is not compatible with input tensor's size and stride...

It’s the moment you throw your hands up, think, "I just want to change the shape, why is this so hard?" and swap .view() for .reshape() just to make the error go away.

But have you ever wondered why that error happens? Or how PyTorch actually manages Tensors under the hood?

Today, we’re going to open up the hood and look at the mechanics of Storage, Shape, and Stride.

Note: This guide assumes you’re familiar with the basics of PyTorch and have used view, reshape, or transpose in your code.


1. The Double Life of a Tensor

A PyTorch Tensor is effectively "two-faced." It has an internal implementation where the data lives, and an external presentation that we interact with.

  • Storage: Where the actual data bytes sit in memory.

  • Metadata: How that data is presented to us (the API).

Storage: The Reality Check

Let's start with a simple 2x3 matrix.

import torch

# Create a 2x3 tensor
t = torch.tensor([[1, 2, 3], [4, 5, 6]])

print(t.shape)
# Output: torch.Size([2, 3])

As expected, t is a 2x3 tensor.

However, t isn't stored as a 2D grid in your RAM. Regardless of the shape you define, tensors are stored physically as a 1-dimensional array.

[1, 2, 3, 4, 5, 6]

This 1D array is the Storage. It doesn't care about your dimensions; it just holds the numbers.
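
If you want to see this for yourself, you can peek at the storage directly. Here is a minimal sketch, assuming a reasonably recent PyTorch where .storage() is still exposed (newer versions also offer .untyped_storage()):

import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6]])

# The storage is one flat, 1D buffer, regardless of the tensor's shape.
print(list(t.storage()))    # [1, 2, 3, 4, 5, 6]
print(t.storage().size())   # 6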

So, if the data is always a flat line, how does PyTorch know to display it as a matrix? That’s determined by two critical pieces of metadata: Shape and Stride.

2. The Blueprint: Shape

Shape is the blueprint that tells PyTorch, "Here is a flat line of data; here is how I want you to slice it up visually."

  • 1D (6,): Treat the data as one long strip of 6 items.

  • 2D (2, 3): Chop the data every 3 items to make a row. Stack 2 of these rows.

  • 3D (2, 1, 3): Take those 3-item rows, stack 1 of them to make a "plane," and stack 2 of those planes.
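
To make the list above concrete, here is the same strip of six numbers wearing those three shapes (a quick sketch, reusing the torch import from earlier):

data = torch.arange(1, 7)     # storage: [1, 2, 3, 4, 5, 6]

print(data.view(6))           # one long strip of 6 items
print(data.view(2, 3))        # 2 rows of 3
print(data.view(2, 1, 3))     # 2 planes, each holding 1 row of 3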

There is one golden rule here:

The product of all numbers in the Shape must equal the total length of the Storage.

If your Shape is (2, 3), then 2 × 3 = 6, which matches our 6 data points. If you try to make it (2, 4), then 2 × 4 = 8: you don't have enough data, so PyTorch throws an error.
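
You can watch that rule being enforced. A small sketch (the exact error wording may differ slightly between PyTorch versions):

t = torch.tensor([[1, 2, 3], [4, 5, 6]])   # 6 elements

t.view(3, 2)        # 3 * 2 = 6, fine

try:
    t.view(2, 4)    # 2 * 4 = 8, but we only have 6 elements
except RuntimeError as e:
    print(e)        # shape '[2, 4]' is invalid for input of size 6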

Once the Shape is set, we need to know how to navigate that data. That’s where Stride comes in.

3. The Art of Movement: Stride

Now that we have the skeleton (Shape), we need to know how to walk through it.

Stride answers the question: "How many steps do I need to skip in memory to get to the next index?"

Let's look at our (2, 3) tensor t again.

Logical View:

[[1, 2, 3],
 [4, 5, 6]]

Physical Storage:

[1, 2, 3, 4, 5, 6]

Question 1:

If I am at t[0, 0] and I want to move one step to the right to t[0, 1], how do I move in Storage?

  • I go from index 0 to index 1.

  • Move: 1 step.

Question 2:

If I am at t[0, 0] (value: 1) and I want to move one step down to t[1, 0] (value: 4), how do I move in Storage?

  • I have to skip the entire first row (3 items).

  • I go from index 0 to index 3.

  • Move: 3 steps.

Defining Stride

So, what is the Stride of t? It is (3, 1).

  • The first number (3) is the stride for the rows (dimension 0). To go to the next row, skip 3 items in memory.

  • The second number (1) is the stride for the columns (dimension 1). To go to the next column, skip 1 item in memory.

This gives us a navigation formula:

Storage Index = (Index_Dim0 × Stride_Dim0) + (Index_Dim1 × Stride_Dim1) + ...

Let's find the physical location of t[1, 2]:

index = (1 × 3) + (2 × 1) = 5

Checking our storage list... index 5 is indeed 6.
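
The same arithmetic in code, assuming the t from above (the storage()[...] lookup is just for illustration):

print(t.stride())      # (3, 1)

# Replay the navigation formula for t[1, 2].
i, j = 1, 2
storage_index = i * t.stride(0) + j * t.stride(1)   # (1 * 3) + (2 * 1) = 5

print(storage_index)                  # 5
print(t.storage()[storage_index])     # 6
print(t[1, 2].item())                 # 6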

How is Stride Calculated?

By default, Strides are calculated in reverse order based on the Shape.

Take a tensor with Shape (Channel: 2, Height: 3, Width: 4).

  1. Width (Innermost): To move 1 step in width, we move 1 step in memory.

  2. Height: To move 1 step in height, we must skip a whole width row. That's 4 steps.

  3. Channel (Outermost): To move 1 step in channel, we must skip a whole Height × Width plane. That's 3 × 4 = 12 steps.

The stride would be (12, 4, 1).
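
Here is a tiny helper that reproduces that default row-major calculation, checked against what PyTorch reports (default_strides is just an illustrative name, not a PyTorch function):

def default_strides(shape):
    # Row-major rule: each stride is the product of all sizes to its right.
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))

print(default_strides((2, 3, 4)))        # (12, 4, 1)
print(torch.empty(2, 3, 4).stride())     # (12, 4, 1)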

4. The Magic of View: Zero-Copy Operations

Now let's use .view(). This function changes the Shape of the tensor.

t = torch.tensor([[1, 2, 3], [4, 5, 6]])

# Flatten the 2x3 tensor into a 1D tensor of 6 elements
flat_t = t.view(6)

# --- [Internals of t] ---
# Storage: [1, 2, 3, 4, 5, 6]  <-- Address: @0x100
# Shape  : (2, 3)
# Stride : (3, 1)

# --- [Internals of flat_t] ---
# Storage: [1, 2, 3, 4, 5, 6]  <-- Address: @0x100 (SAME ADDRESS!)
# Shape  : (6,)
# Stride : (1,)

Key Takeaway:

flat_t is a new tensor object, but it points to the exact same memory address as t. No data was copied.

PyTorch simply created a new Metadata wrapper (new Shape (6,), new Stride (1,)) and slapped it onto the existing Storage. This is why view() is incredibly fast—it’s a lightweight metadata operation.
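
You can verify the shared memory yourself by comparing data pointers. A quick sketch:

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
flat_t = t.view(6)

# Same underlying buffer: both tensors report the same data pointer.
print(t.data_ptr() == flat_t.data_ptr())   # True

# Because it's shared, a write through one alias is visible through the other.
flat_t[0] = 99
print(t[0, 0].item())                      # 99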

5. Playing with Dimensions: Squeeze and Unsqueeze

The popular unsqueeze and squeeze methods work the same way.

Unsqueeze: "Add a wrapper"

vec = torch.tensor([1, 2, 3])
# Shape: (3,), Stride: (1,)

# Add a batch dimension at index 0
unsqueezed_vec = vec.unsqueeze(0)

# --- [Internals] ---
# Storage: [1, 2, 3]   <-- Shared
# Shape  : (1, 3)      <-- Dimension added
# Stride : (3, 1)      <-- Stride updated

Squeeze: "Remove the wrapper"

row_vec = torch.tensor([[1, 2, 3]])
# Shape: (1, 3), Stride: (3, 1)

squeezed_vec = row_vec.squeeze()

# --- [Internals] ---
# Storage: [1, 2, 3]   <-- Shared
# Shape  : (3,)
# Stride : (1,)

Again, zero memory copying. Just math on the metadata.
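
A quick check that all three tensors really do share one storage:

vec = torch.tensor([1, 2, 3])
batched = vec.unsqueeze(0)     # shape (1, 3), stride (3, 1)
flat = batched.squeeze()       # shape (3,),  stride (1,)

print(vec.data_ptr() == batched.data_ptr() == flat.data_ptr())   # True
print(batched.shape, batched.stride())   # torch.Size([1, 3]) (3, 1)
print(flat.shape, flat.stride())         # torch.Size([3]) (1,)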

6. The Troubleshooting Begins: Transpose

Everything is peaceful until transpose() enters the chat.

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
# Storage: [1, 2, 3, 4, 5, 6]
# Shape: (2, 3)
# Stride: (3, 1)

# Transpose the matrix
t_transposed = t.t()

# --- [Internals of t_transposed] ---
# Storage: [1, 2, 3, 4, 5, 6]  <-- Still shared!
# Shape  : (3, 2)              <-- Logically 3x2
# Stride : (1, 3)              <-- !!! LOOK HERE !!!

Wait a minute.

If we had created a fresh 3x2 tensor, the stride should be (2, 1).

But here, the stride is (1, 3).

Why? Because PyTorch didn't move the physical data. It just tricked the metadata.

Visualizing the mess:

Logically, t_transposed looks like this:

[[1, 4],
 [2, 5],
 [3, 6]]

But physically, the storage is still [1, 2, 3, 4, 5, 6].

To read the first row [1, 4]:

  1. Read index 0 (Value 1).

  2. To get 4, we have to skip 3 steps in storage (index 3).

This effectively breaks the standard "row-major" contiguous layout. PyTorch calls this state non-contiguous.
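
You can watch the stride doing this work, using the navigation formula from earlier (another small sketch):

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
tt = t.t()

print(tt.shape, tt.stride())     # torch.Size([3, 2]) (1, 3)

# Reading the first logical row [1, 4] means jumping around in storage:
for j in range(tt.shape[1]):
    idx = 0 * tt.stride(0) + j * tt.stride(1)
    print(idx, tt.storage()[idx])   # 0 -> 1, then 3 -> 4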

7. The Diagnostic Tool: .is_contiguous()

You don't need to calculate strides in your head. PyTorch provides a boolean check:

# 1. The original tensor
t = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(f"Is t contiguous? {t.is_contiguous()}")
# Output: True

# 2. The transposed tensor
t_transposed = t.t()
print(f"Is t_transposed contiguous? {t_transposed.is_contiguous()}")
# Output: False (Strides are messed up!)

8. How "Contiguous" is Checked

Does is_contiguous() check every single data point? No. It just validates the math.

The Rule:

Walking backwards from the innermost dimension, each dimension's Stride must equal:

(Stride of the next inner dimension) × (Size of that next inner dimension)

and the innermost dimension's stride must be 1.

Here is a Python implementation of that logic:

def check_contiguous_logic(tensor):
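    # Simplified on purpose: PyTorch's real check also skips dimensions of
    # size 1, whose stride can be arbitrary without breaking contiguity.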
    shape = tensor.shape
    stride = tensor.stride()

    expected_stride = 1

    # Loop backwards through dimensions
    for i in range(len(shape) - 1, -1, -1):
        # If the actual stride doesn't match the math...
        if stride[i] != expected_stride:
            return False

        expected_stride *= shape[i]

    return True

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(check_contiguous_logic(t)) # True

t_t = t.t()
print(check_contiguous_logic(t_t)) # False

9. Catching the Culprit

Now, let's look at why view() crashes.

If you try to view() a non-contiguous tensor:

try:
    # t_transposed is non-contiguous
    t_transposed.view(6)
except RuntimeError as e:
    print("Error:", e)

# Result: RuntimeError: view size is not compatible with input tensor's size and stride...

The translation of that error is:

"You asked me to view this as a flat 1D array of 6 items. That requires the data to be physically stored as 1, 4, 2, 5, 3, 6. But your storage is actually 1, 2, 3, 4, 5, 6. I cannot map this using just metadata changes. I give up."

10. The Fixes: .contiguous() vs .reshape()

To fix this, we have to bite the bullet and rearrange the physical storage.

Method 1: .contiguous() - The Manual Fix

# Force the data to be physically rearranged
t_contiguous = t_transposed.contiguous()

print(f"Is it contiguous now? {t_contiguous.is_contiguous()}") # True!

# --- [Internals of t_contiguous] ---
# Storage: [1, 4, 2, 5, 3, 6]  <-- NEW Address! Data Copied!
# Shape  : (3, 2)
# Stride : (2, 1)              <-- Standard stride restored

.contiguous() creates a new memory allocation and copies the data over in the correct order.
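
Again, data pointers make the copy visible. A quick sketch:

t_transposed = torch.tensor([[1, 2, 3], [4, 5, 6]]).t()
t_contiguous = t_transposed.contiguous()

# New allocation: different data pointer, re-laid-out storage, standard strides.
print(t_transposed.data_ptr() == t_contiguous.data_ptr())   # False
print(list(t_contiguous.storage()))                         # [1, 4, 2, 5, 3, 6]
print(t_contiguous.stride())                                # (2, 1)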

Method 2: .reshape() - The "Just Do It" Fix

# reshape handles the logic for you
t_reshaped = t_transposed.reshape(6)

# --- [Internals] ---
# Storage: [1, 4, 2, 5, 3, 6]  <-- It called .contiguous() internally
# Shape  : (6,)
# Stride : (1,)

.reshape() is a convenience wrapper. It checks if the tensor is contiguous.

  1. If it is, it returns a view (fast, zero-copy).

  2. If it is not, it performs a copy (equivalent to .contiguous().view()).
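
You can check which branch you got by comparing data pointers. A small sketch:

x = torch.arange(6).view(2, 3)

# Contiguous input: reshape returns a view (same data pointer, no copy).
print(x.reshape(-1).data_ptr() == x.data_ptr())       # True

# Non-contiguous input: reshape silently falls back to a copy.
print(x.t().reshape(-1).data_ptr() == x.data_ptr())   # False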

The Cost of Convenience

"Can't I just use reshape for everything?"

Technically, yes, you can. reshape() is incredibly smart. If your tensor is behaving itself (contiguous), reshape acts exactly like view: fast and efficient. If your tensor is "tangled" (non-contiguous), reshape automatically creates a copy to handle it safely.

However, knowing whether you are copying data or not is critical for performance engineering. Are you getting a free ride, or are you paying a hidden tax?

Let’s stop guessing and look at the numbers.

import torch
import time

# Create a massive 10,000 x 10,000 tensor (Approx. 380MB)
size = 10000
x = torch.randn(size, size)

print("=== 1. The Happy Path (Contiguous Tensor) ===")
start = time.time()
x.view(-1)
print(f"View (Zero-Copy): {time.time() - start:.6f} seconds")

start = time.time()
x.reshape(-1)
print(f"Reshape (Zero-Copy): {time.time() - start:.6f} seconds")
# Conclusion: Both are blazing fast because reshape defaults to view internally.

print("\n=== 2. The Twisted Path (Non-Contiguous Tensor) ===")
y = x.t() # Mess up the memory layout with a transpose

try:
    y.view(-1)
except RuntimeError:
    print("View: Failed! (Error raised as expected)")

start = time.time()
y.reshape(-1) # A physical copy happens here!
print(f"Reshape (Copy Occurred): {time.time() - start:.6f} seconds")

Results on a standard machine:

=== 1. The Happy Path (Contiguous Tensor) ===
View (Zero-Copy): 0.000238 seconds
Reshape (Zero-Copy): 0.000040 seconds

=== 2. The Twisted Path (Non-Contiguous Tensor) ===
View: Failed! (Error raised as expected)
Reshape (Copy Occurred): 0.157830 seconds

The Verdict

  1. When Contiguous (Normal): view and reshape are neck-and-neck. There is virtually no difference.

  2. When Non-Contiguous (Twisted): reshape handles the situation silently, but it is thousands of times slower than the zero-copy path because it has to physically move memory.

Here is the danger: If you use reshape inside a training loop (like a DataLoader or a custom layer), you might be paying a massive memory copy tax without ever realizing it.

This is why I recommend getting into the habit of using view. Think of view as a statement of intent: "I refuse to copy memory!"

If view throws an error, it forces you to acknowledge the issue. You can then consciously decide to fix it using .contiguous() or switch to .reshape() knowing exactly what the cost is.
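
One way to bake that habit into your code is a tiny helper like the hypothetical flatten_features below, where the copy has to be requested explicitly:

def flatten_features(x, allow_copy=False):
    # Hypothetical helper: view() fails loudly if flattening would need a copy;
    # passing allow_copy=True opts in to reshape()'s silent fallback.
    if allow_copy:
        return x.reshape(x.size(0), -1)
    return x.view(x.size(0), -1)

x = torch.randn(8, 3, 4)
print(flatten_features(x).shape)                      # torch.Size([8, 12])

y = x.transpose(1, 2)                                 # non-contiguous
print(flatten_features(y, allow_copy=True).shape)     # torch.Size([8, 12])
# flatten_features(y)  # would raise the RuntimeError from the intro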

Summary

  • Storage is a flat 1D array. PyTorch tries very hard not to move it.

  • Metadata (Shape & Stride) determines how that flat array is interpreted as dimensions.

  • View only changes metadata. It fails if the requested shape contradicts the physical memory layout.

  • Contiguous means the memory layout perfectly matches the shape (row-major).

  • Reshape will fix the error by copying data, but be aware of the performance cost.

Next time you see that RuntimeError, you'll know exactly what's happening under the hood. It’s not just a shape mismatch; it’s a stride conflict.
