Rahim Ranxx
From 80-Second APIs to Sub-Second: Rebuilding a Geospatial Backend with Async Pipelines


Introduction

At some point, every backend engineer hits this wall:

The API works perfectly… until it doesn’t.

I hit that wall with a farm analytics endpoint computing NDVI (Normalized Difference Vegetation Index) from satellite imagery. The system was correct, the logic was sound, and the results were accurate.

But the numbers told a different story:

P95 latency: 1.25 minutes

That’s not an API. That’s a blocking compute job pretending to be one.

This is the story of how I redesigned the system—from a synchronous request-driven model to an asynchronous data pipeline—and brought latency down to sub-second performance (P95 ≈ 725ms).


The Original Architecture (The Hidden Problem)

At first glance, the system looked clean:

[Client]
   ↓
[Django API]
   ↓
[STAC API → Satellite Data]
   ↓
[Raster Processing (NDVI)]
   ↓
[Response]

What happened on each request?

  1. Query satellite imagery via STAC
  2. Fetch raster bands (Red & NIR) from remote storage
  3. Process NDVI using rasterio
  4. Aggregate coverage
  5. Return result
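The NDVI math in step 3 is simple; the cost was everything around it. Here is a minimal NumPy sketch of that step (the real pipeline reads the bands with rasterio; this just shows the per-pixel formula):

```python
import numpy as np

def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), guarding against zero denominators."""
    red = red.astype("float32")
    nir = nir.astype("float32")
    denom = nir + red
    safe = np.where(denom == 0, 1.0, denom)  # avoid divide-by-zero warnings
    return np.where(denom == 0, 0.0, (nir - red) / safe)
```

Values land in [-1, 1]; healthy vegetation typically reads above ~0.3.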

Why this seemed fine

  • It worked locally
  • It returned correct data
  • It followed a “pure API” mindset

But under the hood:

  • Remote I/O (S3-backed satellite data)
  • Heavy raster decoding (JPEG2000)
  • Sequential band reads
  • Full computation per request
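Some of that cost was self-inflicted: the two band reads ran back-to-back. Even within the old design they could have overlapped (a sketch with a thread pool; `read_band` here is a hypothetical stand-in for the remote raster fetch, not a helper from the actual project):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_bands(read_band, item_url: str):
    """Fetch Red (B04) and NIR (B08) concurrently instead of sequentially."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        red_future = pool.submit(read_band, item_url, "B04")
        nir_future = pool.submit(read_band, item_url, "B08")
        return red_future.result(), nir_future.result()
```

That would roughly halve the I/O wait, but it still leaves a multi-second compute job in the request path.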

The Breaking Point

Logs told the truth.

Each request looked like:

STAC request → ~5s
Raster read (B04) → ~5–10s
Raster read (B08) → ~5–10s
Processing → ~5s+
Total → ~80+ seconds

And the key realization:

I wasn’t building an API—I was executing a geospatial compute pipeline on every request.


The Core Insight

This is the shift that changes everything:

APIs should serve data, not compute it on demand.

The problem wasn’t Python.
The problem wasn’t Django.
The problem was architecture.


The New Architecture (Async Pipeline)

I redesigned the system around asynchronous computation + caching:

             (Scheduled / Triggered)
                    ↓
             [Celery Worker]
                    ↓
         [NDVI Computation Pipeline]
                    ↓
             [Redis / Database]
                    ↓
[Client] → [Django API] → [Cache Lookup]

Key changes

  • NDVI computation moved out of the request path
  • Results cached in Redis
  • Background jobs compute and refresh data
  • API returns instantly (no heavy compute)

Diagram 1 — Before vs After

Before (Request-driven)

Request
   ↓
STAC API
   ↓
Raster I/O
   ↓
NDVI Compute
   ↓
Response (80s)

After (Pipeline-driven)

Request → Cache → Response (~725ms P95)
              ↓ (miss)
         Async Task
              ↓
       Compute + Store

Implementation

1. Fast API Path (Non-blocking)

from django.core.cache import cache
from ndvi.tasks import compute_farm_state_coverage

def get_farm_state(farm_id: int) -> dict:
    cache_key = f"farm_state:{farm_id}"

    # Hot path: serve the precomputed result straight from the cache
    data = cache.get(cache_key)
    if data is not None:
        return data

    # Cold path: kick off the compute in the background and
    # return a "processing" placeholder instead of blocking
    compute_farm_state_coverage.delay(farm_id=farm_id)

    return {
        "coverage_pct": None,
        "status": "processing"
    }
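One caveat with this fast path: if several requests miss the cache at once, each one enqueues the same task. A best-effort guard can dedupe them (a sketch; `enqueue_once` is a hypothetical helper, but `cache.add` is Django's real atomic add-if-absent):

```python
def enqueue_once(cache, task, farm_id: int, ttl: int = 300) -> bool:
    """Enqueue the compute task only if nobody else has within `ttl` seconds."""
    lock_key = f"farm_state_lock:{farm_id}"
    # cache.add is atomic: it returns False if the key already exists,
    # so only one concurrent caller wins the race and enqueues the task.
    if cache.add(lock_key, True, timeout=ttl):
        task.delay(farm_id=farm_id)
        return True
    return False
```

The lock expires on its own, so a crashed worker can't wedge a farm in "processing" forever.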

2. Celery Task (Async Compute)

from celery import shared_task
from django.core.cache import cache

@shared_task(bind=True, autoretry_for=(Exception,), retry_backoff=True)
def compute_farm_state_coverage(self, farm_id: int) -> None:
    # All the heavy work (STAC query, raster reads, NDVI aggregation)
    # now happens here, off the request path
    coverage = compute_ndvi_coverage(farm_id)

    cache.set(
        f"farm_state:{farm_id}",
        {
            "coverage_pct": coverage,
            "status": "ready"
        },
        timeout=60 * 60 * 6,  # keep results for 6 hours
    )

3. Daily Backfill (Critical)

from celery import shared_task

@shared_task
def enqueue_daily_farm_state_coverage():
    farm_ids = get_active_farm_ids()

    for farm_id in farm_ids:
        compute_farm_state_coverage.delay(farm_id=farm_id)
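Something has to trigger that backfill on a schedule. With Celery Beat it might look like this (a sketch; the app name and 02:00 UTC run time are my assumptions, not from the project):

```python
from celery import Celery
from celery.schedules import crontab

app = Celery("farm_analytics")

# Run the backfill daily so caches are warm before anyone asks
app.conf.beat_schedule = {
    "daily-farm-state-coverage": {
        "task": "ndvi.tasks.enqueue_daily_farm_state_coverage",
        "schedule": crontab(hour=2, minute=0),
    },
}
```

With the backfill in place, most requests never see a cache miss at all; the "processing" response becomes the rare case, not the norm.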

Observability (The Real Upgrade)

Metrics added:

  • Task duration
  • Task success/failure
  • Queue depth
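Task duration and success/failure can be captured with a small decorator around each task body (a sketch; in production these numbers would feed StatsD or Prometheus for Grafana, not an in-memory dict):

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory stand-in for a real metrics backend (StatsD, Prometheus, ...)
METRICS = {
    "duration_s": defaultdict(list),
    "success": defaultdict(int),
    "failure": defaultdict(int),
}

def timed_task(fn):
    """Record per-task duration and outcome around each invocation."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            result = fn(*args, **kwargs)
            METRICS["success"][fn.__name__] += 1
            return result
        except Exception:
            METRICS["failure"][fn.__name__] += 1
            raise
        finally:
            METRICS["duration_s"][fn.__name__].append(time.monotonic() - start)
    return wrapper
```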

Metrics (Grafana Observations)

[Grafana screenshot: latency graph showing ~725ms P95 on the farm GET endpoint]


Before

  • P95 latency: ~1.25 minutes

After

  • API latency: ~725ms (P95)
  • Background tasks: 60–90s

Before vs After Summary

Metric           Before            After
API latency      ~1.25 min (P95)   ~725 ms (P95)
System type      Request-driven    Pipeline-driven
Scalability      Poor              Strong
Observability    Minimal           Improved

Final Thought

I stopped treating my API like a calculator and started treating my system like a data pipeline.

That’s when everything changed.

