DEV Community: Devansh Bajaj

PyTorch vs. JAX: The Skier and the Highway (Understanding Execution Paradigms)

Devansh Bajaj — Sat, 20 Jun 2026 11:41:44 +0000

If you skip the standard video tutorials and dive straight into the source code to build neural networks from the ground up, you quickly realize a fundamental truth: PyTorch and JAX are not just different syntax libraries. They represent entirely opposite philosophies on how to talk to a GPU.

One is an object-oriented tape recorder. The other is a functional compiler.

To really understand Eager Execution versus XLA Compilation, forget the math for a second and look at the shape of the computation graph. Think of it as The Skier versus The Highway.

PyTorch: The Skier (Define-by-Run) (img 1)

PyTorch relies on dynamic eager execution. When you run a forward pass, PyTorch acts like a tape recorder, logging your mathematical operations instantly as Python executes them.

The Analogy: You are a skier carving down a mountain, but you are building the trail exactly where your skis touch the snow. If your data triggers a Python if statement or a for loop midway through the network, PyTorch doesn't care—it just dynamically draws a new path to the left. You have absolute freedom to change routes mid-flight based on real-time conditions.

The Bottleneck: Because you demand the freedom to alter the graph dynamically, you are forced to carry the Python interpreter on your back. It has to shout directions to the GPU at every single turn. That constant Python-to-GPU communication creates an overhead bottleneck.

JAX: The Highway (Define-and-Run) (img 2)

JAX strips away statefulness. It demands pure, functional Python code because it doesn't build graphs on the fly. It uses a tracer.

The Analogy: When you call jax.grad, JAX drives down your route once with a dummy "tracer" vehicle. It maps every tensor shape, memory allocation, and mathematical step into an intermediate static graph. Then, it hands that map to XLA (Accelerated Linear Algebra). XLA acts as a paving crew. It fuses separate operations together into a massive, straight autobahn.

When your actual data flows through, there are no traffic lights, no dynamic intersections, and absolutely zero Python interpreter overhead. The hardware just executes a single, highly optimized binary block.

The Bottleneck: The tracer absolutely hates standard Python control flow. If you put a standard if condition: inside a JAX function, the compiler crashes because it doesn't know which branch to bake into the static highway.

The Verdict: Which to Choose?

Framework selection at the highest engineering tiers isn't about API preference; it's about the shape of your computation.

Choose PyTorch if you are prototyping highly dynamic systems. If you are building Agentic AI routing, complex data-dependent loops, or architectures where the graph changes unpredictably, you need the flexibility of the skier.

Choose JAX if your architecture is static. If you are building a massive Transformer model where the data flows through the exact same matrix multiplications every single iteration, XLA compilation will maximize your hardware utilization.

Stop fighting the frameworks. Understand how they compile, and pick the right tool for the terrain.

From NumPy to JAX: My First "Aha!" Moments with Accelerated AI

Devansh Bajaj — Sat, 30 May 2026 07:39:12 +0000

Building open-source solutions for my 100 Days of AI Agents challenge meant I needed to start looking at frameworks that scale better than standard NumPy and PyTorch. That inevitably led me to JAX.

Transitioning to JAX requires a bit of a paradigm shift. If you are used to the standard Python data science stack, JAX forces you to rewire how you think about array operations, memory, and hardware execution.

I spent today digging into the core mechanics, and I want to share my top 3 takeaways and the exact code snippets that made it click for me.

1. Immutability is a Feature, Not a Bug

This was my first major roadblock. In standard NumPy, if you want to change an element in an array, you just reassign it in place.

Python

import numpy as np
x = np.arange(10)
x[0] = 10
print(x) # Output: [10 1 2 3 4 5 6 7 8 9]
If you try the exact same thing in JAX, it screams at you: TypeError: JAX arrays are immutable.

JAX arrays (jax.Array) cannot be changed once created. This is a core design principle that enables JAX's functional programming nature and automatic differentiation. To update an array, JAX provides an indexed update syntax that returns an updated copy:

Python

**import jax.numpy as jnp
x = jnp.arange(10)
y = x.at[0].set(10)

print(y) # Output: [10 1 2 3 4 5 6 7 8 9]
print(x) # Output: 0 1 2 3 4 5 6 7 8 9.**

The Catch: This does create memory overhead since you are creating copies, but it completely eliminates the side-effects that make distributed computing a nightmare.

2. Native Hardware Awareness & Sharding

JAX arrays inherently know where they live. You don't have to jump through hoops to figure out if your data is on the CPU, GPU, or TPU.

By default, JAX pushes operations to the fastest available accelerator. Running this locally on my MSI Raider, I can easily inspect exactly where my array is stored using .devices():

Python

x.devices()

Output: {CpuDevice(id=0)}

More importantly, JAX arrays can be sharded across multiple devices for parallel execution. You can inspect this via the .sharding attribute:

Python

x.sharding

** Output: SingleDeviceSharding(device=CpuDevice(id=0), memory_kind=device)**

It feels built from the ground up for modern hardware scaling.

3. The Magic of JIT Compilation

By default, JAX executes operations one at a time, in sequence (just like standard Python). But if you wrap a function with Just-In-Time (jax.jit) compilation, JAX optimizes the entire sequence of operations and runs them all at once.

I wrote a simple normalization function to test this:

Python

**from jax import jit
import jax.numpy as jnp
import numpy as np

def norm(X):
X = X - X.mean(0)
return X / X.std(0)

norm_compiled = jit(norm)
**

Generate some dummy data

**np.random.seed(22)
X = jnp.array(np.random.rand(100000, 10))
I benchmarked both functions using %timeit (adding .block_until_ready() to account for JAX's asynchronous dispatch). The results were immediate:

Standard Execution: 1.52 ms ± 16.3 μs per loop

JIT Execution: 1.16 ms ± 26.2 μs per loop**

Because the compiler knows the exact blueprint of the execution beforehand, it speeds things up significantly. The only limitation? Not all JAX code can be JIT compiled—it requires array shapes to be static and known at compile time.

What's Next?
This is just scratching the surface. My next deep dive is going to cover functional randomness (jax.random), automatic differentiation (jax.grad), and automatic vectorization (jax.vmap).

Has anyone else here made the jump to JAX recently? What was your biggest learning curve? Drop a comment below!

ShinigamiFlanker0208 / JAX

Defeating the Whiteboard Dead-End: How I Built FlowGenix

Devansh Bajaj — Fri, 29 May 2026 14:43:12 +0000

The "tax on creativity" is a real problem for developers and students alike. We all know the feeling: you finish a massive brainstorming session, the whiteboard is covered in brilliant system designs, and you snap a quick photo before erasing it.

The problem? That photo usually becomes a digital dead-end. It sits in your gallery—static, unstructured, and impossible to interact with.

That is exactly why I built FlowGenix—an AI-Powered Whiteboard to Digital Canvas Platform. I wanted to build a tool capable of transforming those ephemeral whiteboard photos into structured, completely digital assets.

The System Architecture & AI Foundation

Building FlowGenix required much more than just slapping an OCR wrapper onto an image.

As a Software Engineer focused on Generative AI and scalable system architecture, I needed to ensure the AI could actually understand the spatial relationships of what was drawn.

I broke the core development down into two main technical pillars:

1.The Intelligent Ingestion Pipeline:
I architected an end-to-end pipeline that leverages the Google Vision model. Instead of just reading text, this pipeline accurately identifies handwriting and extracts precise spatial bounding boxes from the raw image.

2.The Context-Aware Reasoning Engine:
Raw bounding boxes aren't useful without logic. To bridge this gap, I developed a context-aware reasoning engine. This engine takes the unstructured spatial inputs from the Vision model and maps them into a systematic, spatially arranged knowledge graph. It intelligently connects the nodes and edges so that the final digital output perfectly reflects the original paper or whiteboard layout.

The Outcome
Turning unstructured ideas into a clean, digital canvas was an intense technical grind, but delivering high-performance, user-centric solutions is what I am passionate about.

The architecture proved its worth when FlowGenix won 1st Place at IBM Day, taking top honors for both the project's presentation and its underlying technical architecture.