Rahim Ranxx
Building a Resilient NDVI Pipeline with Redis Streams (Event-Driven Architecture)

A practical breakdown of moving an NDVI processing pipeline from a synchronous design to an event-driven architecture using Redis Streams — including concurrency challenges, distributed locking pitfalls, and production-safe patterns.


Introduction

Most pipelines work — until concurrency and failure expose their limits.

At first, processing NDVI (Normalized Difference Vegetation Index) data seems straightforward:

receive a request

process imagery

return results

But once you introduce:

concurrent jobs

long-running processing

distributed components

you’re no longer building a simple pipeline.

You’re designing a distributed system.

This article walks through how I transformed an NDVI processing pipeline from a synchronous model into an event-driven architecture using Redis Streams, and the real-world engineering challenges that came with it.


System Overview

The system is built using:

Django REST Framework (backend API)

Nextcloud (client-facing integration layer)

Celery (asynchronous task processing)

Redis Streams (event ingestion and coordination)


The Initial Architecture (Synchronous Design)

Client → API → Celery Task → NDVI Processing → Result

This design works well at small scale, but it introduces hidden risks when the system grows.
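As a minimal illustration of that coupling (the names here are illustrative, not the actual project code), the synchronous handler blocks on processing, so any processing failure surfaces directly as a request failure:

```python
# Sketch of the synchronous design: the HTTP handler computes the result
# inline, so the request lifecycle is tied to the processing lifecycle.

def compute_ndvi(red_band, nir_band):
    """NDVI = (NIR - Red) / (NIR + Red), guarding against division by zero."""
    denom = nir_band + red_band
    return (nir_band - red_band) / denom if denom else 0.0

def handle_request(red_band, nir_band):
    # Any exception raised here propagates straight back to the client
    # as an error response — there is no buffer between the two.
    result = compute_ndvi(red_band, nir_band)
    return {"status": 200, "ndvi": result}
```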


The Core Problems

  1. Tight Coupling

The request lifecycle is directly tied to processing.

If processing fails:

the request fails

the user experiences errors

retries become difficult


  2. Concurrency Issues

When multiple requests target the same job:

Request A ─┐
           ├──> Same Job → Duplicate Processing
Request B ─┘

This leads to:

duplicated work

inconsistent outputs

race conditions


  3. Fragile Execution Model

Without coordination:

jobs execute immediately

no buffering exists

failure handling is reactive, not controlled


The Shift to Event-Driven Architecture

To solve these issues, I introduced Redis Streams and redesigned the system into an event-driven model.


New Architecture (Event-Driven Pipeline)

Client → API → Redis Stream → Consumer → Celery → Processing


Why Redis Streams?

Redis Streams provide:

Event buffering (decouples ingestion from execution)

At-least-once delivery (ensures reliability)

Ordered processing

Scalability for distributed systems


What Changed

Instead of executing tasks immediately:

The API publishes events to a Redis Stream

A stream consumer controls task execution

Celery workers process jobs asynchronously

This separates:

ingestion

scheduling

execution
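A sketch of the ingestion side, assuming a hypothetical stream name and field layout: instead of executing the task, the API appends an event. Redis Stream entries are flat field-to-string maps, so nested data goes into a serialized payload field.

```python
import json

STREAM_KEY = "ndvi:events"  # hypothetical stream name

def build_event(job_id, image_path):
    # Streams store flat string fields, so structured data is packed
    # into a single JSON payload.
    return {
        "job_id": str(job_id),
        "payload": json.dumps({"image": image_path}),
    }

# With a live connection (redis-py), the API publishes with:
#   import redis
#   r = redis.Redis()
#   r.xadd(STREAM_KEY, build_event(42, "scene_042.tif"))
# A separate stream consumer — not the request handler — then decides
# when the Celery task is actually enqueued.
```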


Distributed Locking: The Critical Bug

To prevent duplicate processing, a locking mechanism was introduced.

The naive approach:

cache.delete(lock_key)

This looks harmless — but in distributed systems, it’s dangerous.


Why This Fails

Consider this sequence:

  1. Process A acquires a lock

  2. The lock expires

  3. Process B acquires the same lock

  4. Process A deletes the lock

Now:

Process B is running without protection

This creates a race condition — one of the hardest problems in distributed systems.
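The four-step sequence can be replayed with a plain dict standing in for Redis; the point is that A's unconditional delete removes a key that now belongs to B:

```python
# A dict stands in for Redis. cache.delete() maps to an unconditional
# key removal, which is exactly what makes step 4 dangerous.

store = {}

store["lock:job1"] = "held_by_A"   # 1. Process A acquires the lock
store.pop("lock:job1")             # 2. TTL expires (Redis evicts the key)
store["lock:job1"] = "held_by_B"   # 3. Process B acquires the same lock
store.pop("lock:job1", None)       # 4. Process A's blind delete fires

# B still believes it holds the lock, but the key is gone — a third
# process can now acquire it while B is mid-flight.
assert "lock:job1" not in store
```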


The Fix: Token-Based Distributed Locking

To solve this, each lock is assigned a unique token.

SET lock_key token_A NX EX <ttl_seconds>

Release only if:
stored_token == token_A

Key Principles

Only the owner of the lock can release it

If ownership does not match → do nothing

TTL ensures eventual cleanup

This ensures:

safe concurrency

no accidental unlocks

predictable system behavior
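A minimal sketch of the pattern, again with an in-memory dict standing in for Redis. Note that in production the release must be atomic — the usual approach is a compare-and-delete Lua script run server-side (e.g. via redis-py's `eval`), because a GET followed by a DEL from the client can itself race:

```python
import uuid

def acquire(store, key):
    """Stand-in for: SET key token NX EX <ttl>.
    Returns the token on success, None if the lock is already held."""
    if key in store:
        return None
    token = uuid.uuid4().hex
    store[key] = token
    return token

def release(store, key, token):
    """Delete the lock only if we still own it; otherwise do nothing."""
    if store.get(key) == token:
        del store[key]
        return True
    return False

# The atomic production release, as a Redis Lua script:
#   if redis.call("GET", KEYS[1]) == ARGV[1] then
#       return redis.call("DEL", KEYS[1])
#   else
#       return 0
#   end
```

Replaying the failure sequence against this version: after A's lock expires and B acquires, A's `release` call compares tokens, finds a mismatch, and does nothing — B stays protected.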


Stream Consumer Design

Redis Streams operate with:

At-least-once delivery semantics

This means:

messages can be delivered more than once

consumers must be idempotent


Consumer Processing Flow

Read → Validate → Enqueue → Acknowledge

Critical Rule

Never acknowledge a message before it is safely enqueued.
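The rule can be encoded directly in the consumer's entry handler. This sketch injects `enqueue` and `ack` as callables standing in for `celery_task.delay(...)` and `r.xack(stream, group, entry_id)`; the real read loop would use redis-py's `r.xreadgroup(group, consumer, {STREAM_KEY: ">"}, count=10, block=5000)`:

```python
def handle_entry(entry, enqueue, ack):
    """Read → Validate → Enqueue → Acknowledge, in that order."""
    entry_id, fields = entry

    if "job_id" not in fields:
        # Malformed entry: acknowledge so it isn't redelivered forever.
        # (A dead-letter stream would be the more thorough option.)
        ack(entry_id)
        return False

    enqueue(fields["job_id"])   # enqueue first...
    ack(entry_id)               # ...acknowledge only afterwards
    return True
```

If `enqueue` raises, the ack never happens, so the entry stays pending and is redelivered — which is exactly the safety property the ordering buys you.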


Idempotency and Reliability

To handle duplicate events:

processing must be idempotent

tasks must tolerate retries

state transitions must be safe

This is essential in any event-driven system.
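One common shape for this, sketched with an injected `done` set standing in for durable state (a unique-keyed DB row, or a Redis SETNX key, would play this role in production):

```python
def process_job(job_id, done, run):
    """Process a job at most once, even if the event is delivered twice."""
    if job_id in done:
        return "skipped"      # duplicate delivery → safe no-op
    run(job_id)               # the actual NDVI processing
    done.add(job_id)          # record completion only after success
    return "processed"
```

Note the remaining caveat: a crash between `run` and the completion record means the job reruns on redelivery — at-least-once delivery means the work itself must tolerate re-execution.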


Final Architecture (Layered System Design)

The system now operates in clear layers:

  1. Ingestion Layer

receives requests

publishes events

  2. Stream Layer

buffers and orders events

decouples system components

  3. Consumer Layer

controls execution

validates and dispatches tasks

  4. Execution Layer

Celery workers process NDVI jobs

  5. Coordination Layer

distributed locking

idempotency

concurrency control


Key Lessons from Building an Event-Driven System

  1. Event-Driven Architecture Does Not Reduce Complexity

It shifts complexity into:

coordination

state management

failure handling


  2. Concurrency Is the Real Challenge

Not performance.
Not frameworks.

Concurrency.


  3. Safety Must Be Designed Explicitly

Small shortcuts (like naive lock deletion) can lead to major production issues.


  4. Idempotency Is Non-Negotiable

In systems with retries and event delivery:

duplicate execution is expected

safe handling is required


  5. Observability Becomes Critical

In asynchronous systems, you must answer:

“What happened to this job?”

This requires:

structured logging

tracing across components

visibility into system flow
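A small sketch of what structured, correlated logging looks like in practice: every log line carries the `job_id`, so a single job can be traced across the API, the stream consumer, and the Celery worker.

```python
import json
import logging

def log_event(stage, job_id, **fields):
    """Emit one JSON log line per lifecycle stage, keyed by job_id."""
    record = {"stage": stage, "job_id": job_id, **fields}
    logging.getLogger("ndvi").info(json.dumps(record))
    return record
```

Grepping the combined logs for one `job_id` then reconstructs the full event lifecycle: published → consumed → enqueued → processed.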


Conclusion

This shift changed the system from:

"Run this task now"

to:

"This event will be processed safely"

That difference is fundamental.

Because in distributed systems:

You don’t design for success.
You design for failure.


What’s Next

The next phase is observability-driven engineering:

tracing event lifecycles

monitoring stream lag

correlating logs across services
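Stream lag, the second item above, can be estimated directly from entry IDs: Redis Stream IDs have the form `<milliseconds>-<sequence>`, so the gap between the newest generated ID and a group's last-delivered ID gives lag in milliseconds. The live values come from redis-py's `r.xinfo_stream(key)["last-generated-id"]` and the `last-delivered-id` field of `r.xinfo_groups(key)`.

```python
def lag_ms(last_generated_id, last_delivered_id):
    """Approximate consumer-group lag from two Redis Stream IDs ("<ms>-<seq>")."""
    newest = int(last_generated_id.split("-")[0])
    delivered = int(last_delivered_id.split("-")[0])
    return max(newest - delivered, 0)
```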

Because once a system becomes event-driven:

Visibility is what makes it understandable.

