This is part 5 of Awesome Curated: The Tools, where I do deep dives on the tools that pass the filter of our automatic curation system. If you landed here directly, I'd recommend starting from post #1 on Docker for Novices to understand how the process works. In the previous post we covered TensorFlow. Today it's its eternal rival — and, spoiler, the one that ended up winning the battle for researchers' hearts.
A couple years ago I was trying to reproduce an NLP paper. Completely normal thing in the academic world: the author publishes the code, you download it, you pray, and you try to get it to run. The paper was from 2019. The code was in TensorFlow 1.x. The absolute mess I got into with the versions, the static graphs, the tf.Session(), the placeholders... I lost half a day. Then I found an unofficial reimplementation in PyTorch. It worked in fifteen minutes. That difference — the feeling that the framework is working with you and not against you — is exactly what I'm going to try to explain in this post.
PyTorch doesn't need an introduction in 2025, but it deserves an honest explanation. Because there's a difference between knowing something exists and understanding why it won.
What it does
PyTorch is an open source machine learning library originally developed by Meta AI (formerly Facebook AI Research) and, since 2022, governed by the PyTorch Foundation under the Linux Foundation. It's based on Torch, a scientific computing library that came from the Lua world, and since 2016 it has lived in Python as a first-class citizen.
The technical differentiator that defines it is its define-by-run approach (also called dynamic graph or eager execution). Unlike the original TensorFlow, which built a static computation graph and then executed it, PyTorch builds the graph as it executes. That might sound like an implementation detail, but in practice it changes everything: you can use a normal debugger, you can throw a print() in the middle of your neural network and actually see what's happening, you can have real conditional logic with Python ifs and fors.
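To make the define-by-run point concrete, here's a minimal sketch of a forward pass whose depth is decided at run time by a plain Python while loop. The class name DynamicDepthNet, the layer sizes and the cap of 5 passes are all hypothetical choices of mine, not something from a real model.

```python
import torch
import torch.nn as nn


class DynamicDepthNet(nn.Module):
    """Hypothetical model whose depth depends on the input at run time."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 4)

    def forward(self, x):
        steps = 0
        # Ordinary Python loop: keep applying the layer while the
        # activations are large, up to an arbitrary cap of 5 passes.
        while x.norm() > 1.0 and steps < 5:
            x = torch.tanh(self.layer(x)) * 0.5
            steps += 1
            # A plain print (or breakpoint()) works in the middle of forward.
            print(f"step {steps}: norm = {x.norm().item():.4f}")
        return x, steps


torch.manual_seed(0)
model = DynamicDepthNet()

# Norm 0: the loop body never runs, so the graph contains no layers at all.
small_out, small_steps = model(torch.zeros(4))

# Norm 20: the loop runs at least once, so this call builds a different graph.
large_out, large_steps = model(torch.ones(4) * 10)

# Autograd follows whichever path was actually executed.
large_out.sum().backward()
```

The same module produced two different graphs for two different inputs, and backward() worked on the one that actually ran. That's the part that was painful to express with a static graph.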
import torch
import torch.nn as nn

# Simple neural network definition: pure Python, no magic
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        # One hidden layer with 128 neurons, one output layer with 10 classes
        self.layers = nn.Sequential(
            nn.Linear(784, 128),  # input: flattened 28x28 image
            nn.ReLU(),            # activation function
            nn.Linear(128, 10)    # output: 10 classes (e.g. MNIST digits)
        )

    def forward(self, x):
        return self.layers(x)

# Instantiate the network and send it to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = SimpleNet().to(device)

# Print the architecture; autograd will track gradients through it for free
print(net)
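Native GPU support via CUDA is close to transparent: you move a tensor or a model with .to(device) and that's it, as long as the model and its inputs end up on the same device. The autograd system automatically computes gradients for any differentiable operation you perform on tensors, which means you almost never write backpropagation by hand, and defining a custom backward pass when you do need one is surprisingly manageable.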
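Here's a small sketch of both halves of that claim: autograd differentiating a simple expression, and a hand-written backward pass via torch.autograd.Function. The Square class is a hypothetical toy op I made up for illustration. You'd never need it in practice, since x ** 2 is already differentiable.

```python
import torch

# Pick the GPU when one is present; everything below also runs on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# y = x^2 + 2x, so dy/dx = 2x + 2, which is 8 at x = 3.
x = torch.tensor(3.0, device=device, requires_grad=True)
y = x ** 2 + 2 * x
y.backward()                 # autograd walks the recorded graph backwards
grad_auto = x.grad.item()


class Square(torch.autograd.Function):
    """Hypothetical custom op: forward computes x^2, backward is hand-written."""

    @staticmethod
    def forward(ctx, inp):
        ctx.save_for_backward(inp)   # stash what backward will need
        return inp * inp

    @staticmethod
    def backward(ctx, grad_output):
        (inp,) = ctx.saved_tensors
        return grad_output * 2 * inp  # chain rule: upstream grad times d(x^2)/dx


z = torch.tensor(3.0, device=device, requires_grad=True)
Square.apply(z).backward()
grad_custom = z.grad.item()  # d(z^2)/dz = 2z, which is 6 at z = 3

print(grad_auto, grad_custom)
```

The custom Function only has to say what to save in forward and how to turn the upstream gradient into the input gradient in backward. Autograd wires it into the rest of the graph.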
The ecosystem that grew around it is monumental: torchvision for computer vision, torchaudio for audio processing, HuggingFace Transformers (which runs primarily on PyTorch), PyTorch Lightning for structuring the training loop without losing your mind. If you're looking for the official implementation of some paper from the last five years, odds are high it's in PyTorch.
# Basic training loop: this is what Lightning later abstracts away
# Assumes `net` and `device` from the previous snippet, and a `dataloader`
# (a torch.utils.data.DataLoader) that yields (images, labels) batches
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    for images, labels in dataloader:  # dataloader iterates the dataset
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()                  # clear gradients from previous step
        predictions = net(images)              # forward pass
        loss = criterion(predictions, labels)  # compute error
        loss.backward()                        # backward pass: autograd in action
        optimizer.step()                       # update weights

    # Loss of the last batch in this epoch
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
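The loop above takes a dataloader for granted. As a minimal sketch, this is one way to build it from synthetic tensors so the loop has something to iterate over. The random "images" and labels, the 256 samples and the batch size of 64 are placeholders of mine; a real run would load MNIST or your own dataset instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Fake data shaped like flattened 28x28 images with 10 possible classes.
fake_images = torch.randn(256, 784)
fake_labels = torch.randint(0, 10, (256,))

# TensorDataset pairs each image with its label; DataLoader batches and shuffles.
dataset = TensorDataset(fake_images, fake_labels)
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

# Peek at one batch to confirm the shapes the training loop will receive.
images, labels = next(iter(dataloader))
print(images.shape, labels.shape, len(dataloader))
```

Labels come out as int64 class indices, which is what nn.CrossEntropyLoss expects as targets.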
Why it's on the list
It showed up in 6 independent awesome lists. That's not a coincidence. The curation system we use in this series treats that consensus signal as a strong indicator: when different communities, with different criteria, all agree on recommending the same tool, something is going on.
What's going on with PyTorch is that it won the deep learning framework war, and it won it in the most convincing way possible: winning research first, then bleeding into production. Today most papers at NeurIPS, ICML and similar conferences that publish code do so in PyTorch. HuggingFace, arguably the most important model hub in the world, is built primarily around PyTorch. That creates a brutal flywheel: more researchers → more papers → more code → more adoption → more researchers.
Compared to TensorFlow (which we covered in the previous post), PyTorch has a more pythonic API and a significantly more human debugging experience. TensorFlow clawed back ground with Keras and eager execution, but the research community's perception was already set. For teams that build and experiment fast, PyTorch is the option with the least friction.
Meta's backing guarantees serious development resources. This isn't a hobby project at risk of being abandoned — it's critical infrastructure for one of the biggest players in the AI ecosystem.
When NOT to use it
First and foremost: if you're not doing deep learning, you probably don't need it. For classical classification, regression, decision trees or clustering on tabular data, scikit-learn will get you there with a tenth of the complexity. PyTorch is a cannon, and not every problem is an elephant.
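For a sense of scale, here's a rough sketch of a classical classifier in scikit-learn on its bundled digits dataset. The choice of RandomForestClassifier, the 75/25 split and the seeds are mine, picked only to keep the example short.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 8x8 digit images, already flattened into 64 features per sample.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# No training loop, no optimizer, no device handling: fit and score.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(f"Test accuracy: {accuracy:.3f}")
```

Compare that with the network definition and training loop earlier in the post. For this kind of problem the extra machinery buys you very little.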
Second: production deployment has historically been its Achilles' heel. TensorFlow with TFLite or TensorFlow Serving has a longer, more battle-tested track record for serving models at the edge or in high-scale APIs. PyTorch improved this with TorchScript (for serializing models) and ONNX (for exporting to other runtimes), but those tools add real friction — and if you came from the m2cgen post, you already know that sometimes the most elegant solution to deployment is to not bring the framework to production at all.
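To give an idea of what the TorchScript route looks like, here's a minimal sketch that traces a small model, serializes it to an in-memory buffer and loads it back. The model mirrors the layer sizes used earlier in this post. The buffer stands in for a file on disk, and the random example input is just a placeholder.

```python
import io

import torch
import torch.nn as nn

torch.manual_seed(0)

# Same shape as the SimpleNet from earlier, built inline to keep this self-contained.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

# Tracing runs the model once on an example input and records the operations.
example = torch.randn(1, 784)
traced = torch.jit.trace(model, example)

# Serialize to a buffer (a file path works too) and load it back.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
restored = torch.jit.load(buffer)

# The restored module should reproduce the original outputs.
with torch.no_grad():
    same = torch.allclose(model(example), restored(example))
print(same)
```

Part of the friction is visible even here: tracing only records the path taken for that one example input, so the data-dependent Python control flow that makes PyTorch pleasant to write is exactly what doesn't survive the export untouched.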
Third: GPU memory consumption for large models is a world of its own. Without knowledge of the internals — gradient checkpointing, mixed precision, data parallelism — it's easy to run out of VRAM and have no idea why.
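As a rough sketch of two of those techniques, this snippet applies gradient checkpointing and mixed precision to a toy block. I'm running it on CPU with bfloat16 so it works anywhere. On a GPU you'd typically pass device_type="cuda" instead, and the block and tensor sizes here are arbitrary.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
block = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))
x = torch.randn(8, 32, requires_grad=True)

# Baseline: intermediate activations are kept in memory for the backward pass.
block(x).sum().backward()
grad_plain = x.grad.clone()
x.grad = None
block.zero_grad()

# Gradient checkpointing: activations inside `block` are dropped after the
# forward pass and recomputed during backward, trading compute for memory.
checkpoint(block, x, use_reentrant=False).sum().backward()
grad_ckpt = x.grad.clone()

# Mixed precision: inside autocast, eligible ops such as Linear run in a
# lower-precision dtype, which shrinks the activations they produce.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    low_precision_out = block(x)

print(torch.allclose(grad_plain, grad_ckpt), low_precision_out.dtype)
```

The checkpointed gradients match the baseline, so the saving costs you time rather than correctness. On a toy model the memory difference is negligible; it starts to matter when the activations are what's filling your VRAM.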
Wrapping up
PyTorch is one of those tools that has community consensus not because of marketing but because it solved a real problem better than the competition. The dynamic graph, the pythonic API, the ecosystem that grew around it — everything points in the same direction. If you're getting into deep learning, it's the most reasonable starting point that exists today.
This was entry #5 of Awesome Curated: The Tools. The series continues — every tool that shows up here went through a curation process that combines signal from multiple awesome lists, AI analysis, and my own human verdict. If you want to see the full journey from Docker to here, start from the first post. The next tool is already in the pipeline.
This article was originally published on juanchi.dev