Shreyan Ghosh
When Time Became a Variable — Notes From My Journey With Numba ⚡

I wasn’t chasing performance at first.

I was deep inside some heavy computation — image processing, remote sensing, NumPy-heavy workflows — and things were taking too long.

While everyone’s sleeping, I’m out here crunching heat maps and chasing anomalies at 3 AM on Christmas. Santa didn’t bring gifts this year; he brought publication-worthy data. 🎅🔥

That’s when I stumbled upon Numba.

What began as a normal experimentation loop slowly turned into a waiting game. Iterations stretched, and the slow feedback dulled curiosity. Numba didn’t enter my workflow as a “speed hack”; it entered as a way to bring thinking and computation back into sync.

And that changed how I work with performance entirely.


🧠 Why Numba Feels Different To Use

NumPy is already powerful, but some workloads naturally gravitate toward loops:

  • pixel / cell-level transformations
  • iterative grid passes
  • rolling & stencil-style operations
  • custom kernels that don’t exist in libraries

These are mathematically honest — but painfully slow in Python.

Numba compiles those functions to optimized machine code through LLVM (via @njit), which means:

  • Python syntax stays
  • compiled execution takes over
  • the bottleneck disappears
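A minimal sketch of the idea (the 3×3 mean filter and the array size here are made up for illustration):

```python
import numpy as np
from numba import njit

@njit  # compiled to machine code through LLVM on first call
def local_mean(img):
    # 3x3 mean filter written as plain Python loops over pixels
    h, w = img.shape
    out = img.copy()
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            acc = 0.0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    acc += img[i + di, j + dj]
            out[i, j] = acc / 9.0
    return out

img = np.random.rand(1024, 1024)
smoothed = local_mean(img)  # first call compiles; later calls run the compiled version
```

The loop body stays ordinary Python; only the execution model changes.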

To make it happy, I had to:

  • keep data shapes predictable
  • avoid Python objects in hot paths
  • think about memory as something physical
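To make that contrast concrete, a hypothetical example: configuration and other Python objects stay outside, and the compiled hot path only ever sees arrays and scalars with stable dtypes.

```python
import numpy as np
from numba import njit

# Python objects (dicts, config, file handles) live outside the hot path...
config = {"threshold": 0.5, "scale": 2.0}

@njit
def rescale_and_clip(values, threshold, scale):
    # ...while the compiled function only sees arrays and scalars.
    out = np.empty_like(values)
    for i in range(values.size):
        v = values[i] * scale
        out[i] = v if v > threshold else 0.0
    return out

data = np.random.rand(1_000_000)
result = rescale_and_clip(data, config["threshold"], config["scale"])
```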

That discipline didn’t just make things faster.

It made the code clearer.


⚡ What The Numbers Look Like (From Numba Benchmarks)

From Numba’s documentation and example workloads, parallel compilation can deliver dramatic CPU-scale gains:

| Variant | Time | Notes |
| --- | --- | --- |
| NumPy implementation | ~5.8 s | Interpreter overhead + limited parallelism |
| @njit, single-threaded | ~700 ms | Big win already |
| @njit(parallel=True) | ~112 ms | Multithreaded + vectorized |

That’s roughly 50× faster than NumPy overall, and about 6× faster than the non-parallel JIT on CPU-bound loops.

But I wanted to see what this looked like in my own environment.

So I benchmarked it.


🧪 My Local Benchmark (20,000,000-element loop)

Same logic. Same data. Three execution models:
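In sketch form (placeholder kernel and timing harness, not my exact script, but the same three variants):

```python
import time
import numpy as np
from numba import njit, prange

N = 20_000_000
x = np.random.rand(N)

def python_loop(x):
    total = 0.0
    for i in range(x.size):
        total += x[i] * x[i]
    return total

@njit
def jit_loop(x):
    total = 0.0
    for i in range(x.size):
        total += x[i] * x[i]
    return total

@njit(parallel=True)
def parallel_loop(x):
    total = 0.0
    for i in prange(x.size):
        total += x[i] * x[i]
    return total

def bench(fn, reps=5):
    fn(x)  # warm-up (triggers compilation for the Numba variants)
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - t0)
    return np.median(times), min(times)

for name, fn in [("python loop", python_loop),
                 ("njit", jit_loop),
                 ("njit(parallel=True)", parallel_loop)]:
    med, lo = bench(fn)
    print(f"{name:22s} median={med:.4f}s  min={lo:.4f}s")
```

And the numbers on my machine: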

| Variant | Median Runtime | Min Runtime | Speedup vs Python |
| --- | --- | --- | --- |
| Python + NumPy loop (GIL-bound) | 2.5418 s | 2.5327 s | 1× (baseline) |
| Numba (@njit, single-threaded) | 0.0150 s | 0.0147 s | ~170× |
| Numba parallel (@njit(parallel=True)) | 0.0057 s | 0.0054 s | ~445× |

(Screenshot: the benchmark run on my local machine.)

I stared at that table for a second and just laughed — the difference is wild.

The pattern was impossible to ignore:

  • Python loop = fine for logic, terrible for math
  • Numba JIT = removes interpreter overhead
  • Parallel Numba = unleashes full CPU cores

And the biggest effect wasn’t just speed.

It was shortened feedback cycles.


🧵 Why Numba Beats Normal Python For CPU Workloads

Pure Python is limited by the GIL.

Even if you create threads, only one runs Python bytecode at a time. Multiprocessing helps, but adds IPC + serialization overhead.

Inside a compiled Numba function:

  • the GIL is released
  • operations run as native machine code
  • loops scale across CPU cores (when safe to parallelize)

Conceptually:

| Approach | Threads | Behavior |
| --- | --- | --- |
| Pure Python loop | 🚫 GIL-bound | Slow |
| NumPy ufuncs | ✅ Vectorized C (BLAS-backed ops multithreaded) | Fast enough |
| @njit | ❗ Single-threaded machine code | Much faster |
| @njit(parallel=True) | ✅ Multithreaded + SIMD | Fastest |

When your workload lives inside numeric loops, parallel=True feels like adding oxygen.


🧩 “Interactive” Comparison Block

🔍 Before: Pure Python Loop

Slow. Interpreter overhead. GIL-bound.

Best used for logic, not computation.

⚙️ After: Numba JIT-Compiled Loop

  • compiled via LLVM
  • CPU-native execution
  • predictable performance

Feels like Python, behaves like C.

🚀 Parallel Numba (prange + parallel=True)

  • spreads work across CPU cores
  • releases the GIL inside hot loops
  • ideal for pixel / grid workloads

Where Numba truly shines on CPUs.
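A sketch of the pattern on a made-up grid workload; each row is independent, so the outer prange loop can fan out across cores safely:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def normalize_rows(grid):
    # Rows are independent, so iterations of the outer loop run across CPU cores.
    out = np.empty_like(grid)
    for i in prange(grid.shape[0]):
        row_max = 0.0
        for j in range(grid.shape[1]):
            if grid[i, j] > row_max:
                row_max = grid[i, j]
        for j in range(grid.shape[1]):
            out[i, j] = grid[i, j] / row_max if row_max > 0.0 else 0.0
    return out

grid = np.random.rand(4096, 4096)
normalized = normalize_rows(grid)
```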


🎁 Underrated Numba Features I Learned To Appreciate

cache=True
Reuse compiled code across runs.

nopython=True (already the default with @njit)
Forces discipline. Reveals hidden Python objects.

parallel=True + prange
Turns heavy loops into multithreaded kernels.

fastmath=True
Lets the compiler vectorize aggressively (when numerics allow).
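Pulled together on a single decorator (hypothetical kernel; whether fastmath=True is acceptable depends on your numerics):

```python
import numpy as np
from numba import njit, prange

@njit(cache=True, parallel=True, fastmath=True)  # nopython mode is implied by @njit
def weighted_sum(values, weights):
    total = 0.0
    for i in prange(values.size):  # parallel reduction across cores
        total += values[i] * weights[i]
    return total

v = np.random.rand(10_000_000)
w = np.random.rand(10_000_000)
print(weighted_sum(v, w))
```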

But the biggest gift wasn’t raw performance.

It was momentum.

Research cycles shifted from:

write → run → wait → context-switch

into:

write → run → iterate

And curiosity stayed in motion.


⚖️ Real-World Caveats That Matter

Numba isn’t a silver bullet.

  • first call includes compile warm-up
  • debugging inside JIT code can sting
  • sometimes NumPy is already optimal
  • chaotic control-flow doesn’t JIT well

It works best when:

  • logic is numeric
  • loops are intentional
  • computation is meaningful

It isn’t glitter.

It’s a performance contract.


🧭 What Numba Changed In How I Write Code

It nudged me to:

  • separate meaningful loops from accidental ones
  • design transformations with purpose
  • treat performance as part of expression

Somewhere between algorithms and hardware, Numba didn’t just make my code faster.

It made exploration lighter.
