Thermal cameras are practically superpowers. But there's a catch: unless you have $20,000 to drop on military-grade hardware, thermal vision looks like a blurry, low-res mess.
A few months ago, I was working on a drone project. The goal was simple: strap a thermal camera to a drone and detect objects in real time. It sounds like something out of a sci-fi movie: flying at night, spotting heat signatures, perfect situational awareness.
However, the "objects" in question were just glowing blobs. A person looked like a smudge; a car looked like a slightly larger smudge. The resolution on affordable thermal sensors is horribly low. For a computer vision model trying to do object detection, this is a nightmare. You can't classify what you can't see.
I had two options:
- Buy a high-resolution thermal camera (and live off noodles for the rest of my life).
- Fix the hardware limitations with software.
I decided to build a Deep Learning model to upscale these low-res thermal images into crisp, high-definition video in real time.
The Problem: Why Standard AI Failed Me
If you've messed around with image upscaling, you've probably heard of ESRGAN or similar "Super-Resolution" (SR) models. They are fantastic at taking a tiny JPEG and turning it into a 4K wallpaper.
So, why not just use that?
They are too slow. Most state-of-the-art super-resolution models are heavy, packing millions of parameters. On a massive GPU, they might run at 5 or 7 FPS. That's fine for photos, but for a drone flying at 15 km/h? That latency is fatal. By the time the frame is processed, the drone has already crashed into the tree it didn't see.
Thermal is not RGB. Thermal images don't have "colors" in the traditional sense; they have temperature gradients. Standard models trained on ImageNet (cats, dogs, and cars) hallucinate textures that don't exist in heat maps. They try to add "fur" to a heat blob.
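To make that concrete, here's what a thermal frame actually looks like to the model: one channel, normalized per frame, because the useful signal is the relative temperature gradient rather than any fixed color value. This is a minimal sketch, not the project's exact preprocessing:

```python
import numpy as np

def normalize_thermal(frame: np.ndarray) -> np.ndarray:
    """Map a raw 16-bit radiometric frame to a single-channel float image in [0, 1].

    Unlike RGB, what matters is the *relative* temperature gradient, so each
    frame is rescaled by its own min/max rather than a fixed color range.
    """
    frame = frame.astype(np.float32)
    lo, hi = frame.min(), frame.max()
    return (frame - lo) / (hi - lo + 1e-8)  # epsilon guards against flat frames
```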
I needed an architecture that was lightweight, incredibly fast, and understood the physics of heat.
The Solution: Enter IMDN (and a lot of coffee)
I settled on an architecture called IMDN (Information Multi-Distillation Network).
Without getting too bogged down in the math (the code is on GitHub if you want the nerdy details), the brilliance of IMDN is that it doesn't try to reconstruct the entire image at every single layer.
Instead, it uses a "distillation" process. It extracts features, keeps what's useful, and passes the rest down the line. This drastically reduces the computational cost.
What's also interesting about this model is that you can train it to upscale to any integer scale you want (2x, 3x, 4x, 5x, etc.). You aren't locked into the usual fixed 2x or 4x, which gives you the flexibility to balance resolution and speed exactly how your project needs it.
Implementing the architecture was still tricky, specifically adapting the Information Distillation Blocks (IDB) to handle single-channel thermal data without losing the high-frequency details (the sharp edges where hot meets cold).
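To give a feel for the distillation idea, here's a heavily simplified PyTorch sketch. The class names and sizes are illustrative, and the real IMDN blocks add attention and more careful fusion, so treat this as a toy version of what's in the repo:

```python
import torch
import torch.nn as nn

class DistillationBlock(nn.Module):
    """Simplified information-distillation block (a sketch, not the full IDB).

    At each step, a slice of the channels is "distilled" (kept as-is) while
    only the remainder is refined further, which keeps the compute cost low.
    """
    def __init__(self, channels: int = 64, distill_ratio: float = 0.25):
        super().__init__()
        self.d = int(channels * distill_ratio)  # distilled (kept) channels
        self.r = channels - self.d              # remaining (refined) channels
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(self.r, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(self.r, self.d, 3, padding=1)
        self.fuse = nn.Conv2d(self.d * 3, channels, 1)
        self.act = nn.LeakyReLU(0.05, inplace=True)

    def forward(self, x):
        d1, r1 = torch.split(self.act(self.conv1(x)), [self.d, self.r], dim=1)
        d2, r2 = torch.split(self.act(self.conv2(r1)), [self.d, self.r], dim=1)
        d3 = self.act(self.conv3(r2))
        return self.fuse(torch.cat([d1, d2, d3], dim=1)) + x  # residual

class ThermalSR(nn.Module):
    """Single-channel head for thermal input; PixelShuffle tail sets the scale."""
    def __init__(self, scale: int = 2, channels: int = 64, n_blocks: int = 6):
        super().__init__()
        self.head = nn.Conv2d(1, channels, 3, padding=1)  # 1 channel, not 3
        self.body = nn.Sequential(*[DistillationBlock(channels) for _ in range(n_blocks)])
        self.tail = nn.Sequential(
            nn.Conv2d(channels, scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # any integer scale: 2x, 3x, 4x, ...
        )

    def forward(self, x):
        return self.tail(self.body(self.head(x)))
```

Because the upscaling lives entirely in the PixelShuffle tail, switching from 2x to 4x is just a constructor argument.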
But the architecture was only half the battle.
The Secret Struggle: The Data Nightmare
In Deep Learning, everyone talks about the model, but the real war is won in the dataset.
There is no "ImageNet for Thermal Super-Resolution" that you can just download and hit train. I had to get creative. I spent weeks pulling data from widely different sources, and manually curating a massive mixed dataset.
This was the hardest part of the project. Thermal data is noisy and the resolutions vary wildly. I had to clean, normalize, and align thousands of images to create a "Ground Truth" that the model could actually learn from.
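The pairing step itself follows standard SR practice: once a frame is cleaned and normalized, the low-res input is synthesized from the high-res "Ground Truth" by downsampling. A rough sketch (paths, patch size, and the bicubic choice are illustrative):

```python
import cv2
import numpy as np
from pathlib import Path

def make_pair(hr_path: Path, scale: int = 4, patch: int = 192):
    """Build one (low-res, high-res) training pair from a cleaned thermal frame."""
    hr = cv2.imread(str(hr_path), cv2.IMREAD_UNCHANGED).astype(np.float32)
    hr = (hr - hr.min()) / (hr.max() - hr.min() + 1e-8)  # per-frame normalization
    # Random crop whose size is divisible by the scale factor
    y = np.random.randint(0, hr.shape[0] - patch + 1)
    x = np.random.randint(0, hr.shape[1] - patch + 1)
    hr_patch = hr[y:y + patch, x:x + patch]
    # Synthesize the low-res input by bicubic downsampling
    lr_patch = cv2.resize(hr_patch, (patch // scale, patch // scale),
                          interpolation=cv2.INTER_CUBIC)
    return lr_patch, hr_patch
```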
I also used a transfer learning trick: leveraging weights from RGB domains and "teaching" them to interpret thermal gradients, which gave the model a head start on understanding edges and shapes.
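The usual way to pull that off is to collapse the pretrained first-layer RGB filters down to one channel, so the learned edge detectors survive the switch. A sketch of the idea (the "head.weight" key is hypothetical and depends on how the network names its layers):

```python
import torch

def adapt_rgb_weights(model, rgb_state_dict):
    """Load RGB-pretrained weights into a single-channel thermal model."""
    state = dict(rgb_state_dict)
    w = state["head.weight"]                            # shape: (out, 3, k, k)
    state["head.weight"] = w.mean(dim=1, keepdim=True)  # average RGB -> (out, 1, k, k)
    model.load_state_dict(state, strict=False)          # skip layers that don't match
    return model
```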
The Results: Breaking the Real-Time Barrier
After weeks of training and tweaking the loss functions to prioritize thermal contrast, the results were… honestly, better than I expected.
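"Prioritizing thermal contrast" roughly means rewarding sharp hot/cold boundaries, not just per-pixel accuracy. Here's a sketch of one way to do that with a gradient term (the formulation and weights are illustrative, not necessarily the exact loss used here):

```python
import torch
import torch.nn.functional as F

def thermal_sr_loss(sr, hr, edge_weight: float = 0.1):
    """L1 reconstruction loss plus a gradient term for sharp hot/cold edges."""
    l1 = F.l1_loss(sr, hr)
    # Finite-difference gradients along height and width
    grad = lambda t: (t[..., 1:, :] - t[..., :-1, :], t[..., :, 1:] - t[..., :, :-1])
    (sr_dy, sr_dx), (hr_dy, hr_dx) = grad(sr), grad(hr)
    edge = F.l1_loss(sr_dy, hr_dy) + F.l1_loss(sr_dx, hr_dx)
    return l1 + edge_weight * edge
```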
The Metrics:
- PSNR: 34.2 dB (Peak Signal-to-Noise Ratio; anything above 30 dB is generally considered excellent quality).
- SSIM: 0.840 (Structural Similarity, meaning the upscaled image actually looks like the original scene, not a hallucination). Both numbers are easy to reproduce; see the snippet after this list.
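If you want to reproduce these metrics on your own image pairs, scikit-image's standard implementations are enough. A minimal sketch with placeholder arrays:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# `gt` is the ground truth, `sr` the upscaled output: single-channel arrays in [0, 1].
# (Random placeholders here; substitute your own image pair.)
gt = np.random.rand(480, 640)
sr = np.clip(gt + np.random.normal(0, 0.02, gt.shape), 0, 1)

print(f"PSNR: {peak_signal_noise_ratio(gt, sr, data_range=1.0):.1f} dB")
print(f"SSIM: {structural_similarity(gt, sr, data_range=1.0):.3f}")
```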
You can see the difference immediately. The "blob" on the left becomes a distinct object with edges and shape on the right.
The Speed Test
This is where the IMDN architecture shines. On my laptop (RTX 3070), the model achieves:
- ~130 FPS at 2x scale
- ~60 FPS at 4x scale
That is absurdly fast. That's not just "real-time"; that's "faster than the camera can record."
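For reference, here's roughly how numbers like these are measured. A minimal timing sketch (input size and iteration counts are illustrative; ThermalSR is the toy class sketched earlier):

```python
import time
import torch

model = ThermalSR(scale=2).cuda().eval()
x = torch.randn(1, 1, 240, 320, device="cuda")  # hypothetical low-res frame size

with torch.no_grad():
    for _ in range(10):          # warm-up so one-time CUDA costs don't skew timing
        model(x)
    torch.cuda.synchronize()
    n, t0 = 200, time.perf_counter()
    for _ in range(n):
        model(x)
    torch.cuda.synchronize()     # wait for all kernels before stopping the clock
print(f"{n / (time.perf_counter() - t0):.0f} FPS")
```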
The "Whoa" Moment: Edge Deployment
Here's the thing: the drone can't carry my laptop :) It can, however, carry an Nvidia Jetson Orin.
Before delving into how it ran on the Jetson, it's important to note that "real-time" for thermal imagery means something different than for RGB. A thermal camera has at best a 20 FPS acquisition rate, so running the model at 20–30 FPS counts as real-time: you're already using all the bandwidth the camera can deliver.
To achieve 20–30 FPS on the Jetson, I made the following tweaks:
- The pipeline was implemented in C++
- The model was converted to TensorRT, with the engine retaining ~97% of the original model's accuracy (see the export sketch after this list)
- Inference was multithreaded, with some additional optimizations
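The TensorRT step is the most reproducible of the three. A sketch of the export path (names and input size are illustrative; the real pipeline wraps the resulting engine in C++):

```python
import torch

# Export the trained model to ONNX (ThermalSR is the toy class sketched earlier)
model = ThermalSR(scale=2).eval()
dummy = torch.randn(1, 1, 240, 320)  # one low-res thermal frame
torch.onnx.export(model, dummy, "thermal_sr.onnx",
                  opset_version=13,
                  input_names=["lr"], output_names=["sr"])
# Then, on the Jetson, build an FP16 engine from the ONNX file:
#   trtexec --onnx=thermal_sr.onnx --fp16 --saveEngine=thermal_sr.engine
```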
30 FPS on an edge device is the holy grail. It means you can run this super-resolution model inline with your object detection model. The drone sees the low-res thermal frame, upscales it to HD, and detects the object, all in less than 20 milliseconds.
Why This Matters
This isn't just about making cooler-looking images. This is about accessibility.
High-resolution thermal cameras cost a fortune. By using efficient AI, we can take a cheap, low-res sensor and simulate the performance of a sensor that costs 10x as much.
For search and rescue drones, autonomous vehicles driving at night, or industrial monitoring, this is a game changer. We can finally have high-fidelity thermal vision without the high-fidelity price tag.
Attribution
I have open-sourced the code for this; a bit of attribution would be nice, though :)
- Portfolio: Rami Kronbi
- LinkedIn: Rami Kronbi
- GitHub: Kronbii
- SRC: Thermal Super Resolution