A practical breakdown of moving an NDVI processing pipeline from a synchronous design to an event-driven architecture using Redis Streams — including concurrency challenges, distributed locking pitfalls, and production-safe patterns.
Introduction
Most pipelines work — until concurrency and failure expose their limits.
At first, processing NDVI (Normalized Difference Vegetation Index) data seems straightforward:
- receive a request
- process imagery
- return results
But once you introduce:
- concurrent jobs
- long-running processing
- distributed components
you’re no longer building a simple pipeline.
You’re designing a distributed system.
This article walks through how I transformed an NDVI processing pipeline from a synchronous model into an event-driven architecture using Redis Streams, and the real-world engineering challenges that came with it.
System Overview
The system is built using:
Django REST Framework (backend API)
Nextcloud (client-facing integration layer)
Celery (asynchronous task processing)
Redis Streams (event ingestion and coordination)
The Initial Architecture (Synchronous Design)
Client → API → Celery Task → NDVI Processing → Result
This design works well at small scale, but it introduces hidden risks when the system grows.
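To make the coupling concrete, here is a minimal pure-Python sketch of the synchronous model (function and field names are illustrative, not from the actual codebase; only the NDVI formula itself, (NIR − Red) / (NIR + Red), is standard):

```python
def compute_ndvi(red: float, nir: float) -> float:
    """Standard NDVI formula: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def handle_request(job: dict) -> dict:
    """Synchronous model: the request handler calls processing directly,
    so the request lifecycle is tied to processing success or failure."""
    try:
        result = compute_ndvi(job["red"], job["nir"])
    except Exception:
        # Any processing failure surfaces directly as a failed request;
        # there is no buffer, no retry, no separate job state.
        return {"status": "error"}
    return {"status": "ok", "ndvi": result}
```

If `compute_ndvi` raises, the caller sees the error immediately, which is exactly the tight coupling described below.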
The Core Problems
- Tight Coupling
The request lifecycle is directly tied to processing.
If processing fails:
- the request fails
- the user experiences errors
- retries become difficult
- Concurrency Issues
When multiple requests target the same job:
Request A ─┐
├──> Same Job → Duplicate Processing
Request B ─┘
This leads to:
- duplicated work
- inconsistent outputs
- race conditions
- Fragile Execution Model
Without coordination:
- jobs execute immediately
- no buffering exists
- failure handling is reactive, not controlled
The Shift to Event-Driven Architecture
To solve these issues, I introduced Redis Streams and redesigned the system into an event-driven model.
New Architecture (Event-Driven Pipeline)
Client → API → Redis Stream → Consumer → Celery → Processing
Why Redis Streams?
Redis Streams provide:
- Event buffering (decouples ingestion from execution)
- At-least-once delivery (no event is lost, though duplicates are possible)
- Ordered processing within a stream
- Consumer groups for scaling across distributed workers
What Changed
Instead of executing tasks immediately:
1. The API publishes events to a Redis Stream
2. A stream consumer controls task execution
3. Celery workers process jobs asynchronously
This separates:
- ingestion
- scheduling
- execution
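The publishing side can be sketched in a few lines with redis-py's `XADD`. The stream name and field layout here are hypothetical; `client` stands in for a `redis.Redis` instance (anything with the same `xadd` signature works):

```python
import json

STREAM = "ndvi:jobs"  # hypothetical stream name

def publish_job(client, job_id: str, payload: dict) -> str:
    """Publish a processing request as a stream event instead of
    executing it. Redis assigns and returns the entry ID, so the
    API returns immediately without waiting for processing."""
    fields = {"job_id": job_id, "payload": json.dumps(payload)}
    return client.xadd(STREAM, fields)
```

The API's only job is now to record that work exists; when and where it runs is decided downstream.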
Distributed Locking: The Critical Bug
To prevent duplicate processing, a locking mechanism was introduced.
The naive approach:
cache.delete(lock_key)
This looks harmless — but in distributed systems, it’s dangerous.
Why This Fails
Consider this sequence:
1. Process A acquires a lock
2. The lock expires (processing took longer than the TTL)
3. Process B acquires the same lock
4. Process A finishes and deletes the lock, which now belongs to B
Now:
Process B is running without protection
This creates a race condition — one of the hardest problems in distributed systems.
The Fix: Token-Based Distributed Locking
To solve this, each lock is assigned a unique token.
SET lock_key = token_A (TTL)
Release only if:
stored_token == token_A
Key Principles
- Only the owner of the lock can release it
- If ownership does not match → do nothing
- TTL ensures eventual cleanup
This ensures:
- safe concurrency
- no accidental unlocks
- predictable system behavior
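The pattern above is commonly implemented with `SET NX PX` plus a small Lua script, so the compare-and-delete happens atomically inside Redis. This is a sketch against the redis-py API; key names are illustrative:

```python
import uuid

# Lua script: delete the lock only if the stored token matches.
# Running it server-side makes compare-and-delete a single atomic step.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def acquire(client, key: str, ttl_ms: int = 30_000):
    """Try to take the lock. Returns the owner token, or None if held."""
    token = uuid.uuid4().hex
    # SET key token NX PX ttl: succeeds only if the key does not exist.
    if client.set(key, token, nx=True, px=ttl_ms):
        return token
    return None

def release(client, key: str, token: str) -> bool:
    """Release only if we still own the lock. A stale owner whose lock
    expired (and was re-acquired by someone else) deletes nothing."""
    return client.eval(RELEASE_SCRIPT, 1, key, token) == 1
```

A plain `GET` followed by a client-side comparison and `DELETE` would reintroduce the race; the atomicity lives in the script.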
Stream Consumer Design
Redis Streams operate with:
At-least-once delivery semantics
This means:
- messages can be delivered more than once
- consumers must be idempotent
Consumer Processing Flow
Read → Validate → Enqueue → Acknowledge
Critical Rule
Never acknowledge a message before it is safely enqueued.
Idempotency and Reliability
To handle duplicate events:
- processing must be idempotent
- tasks must tolerate retries
- state transitions must be safe
This is essential in any event-driven system.
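One common way to get idempotency is a dedupe marker claimed with `SET NX`, so check-and-claim is a single atomic step. This sketch uses a hypothetical key scheme and treats `do_work` as the actual NDVI task:

```python
def process_event(client, fields, do_work, dedupe_ttl_ms=86_400_000):
    """Process an event at most once per job_id. SET NX atomically
    claims the marker key, so a redelivered duplicate sees the marker
    and is skipped instead of re-running the job."""
    marker = f"ndvi:done:{fields['job_id']}"  # hypothetical key scheme
    if not client.set(marker, "1", nx=True, px=dedupe_ttl_ms):
        return "duplicate"
    do_work(fields)
    return "processed"
```

Claiming the marker before the work trades one failure mode for another: a crash mid-work leaves a claimed-but-unfinished job until the TTL expires, so the TTL doubles as the recovery window. Variants that mark completion afterwards need a different guard against the duplicate slipping in between.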
Final Architecture (Layered System Design)
The system now operates in clear layers:
- Ingestion Layer
  - receives requests
  - publishes events
- Stream Layer
  - buffers and orders events
  - decouples system components
- Consumer Layer
  - controls execution
  - validates and dispatches tasks
- Execution Layer
  - Celery workers process NDVI jobs
- Coordination Layer
  - distributed locking
  - idempotency
  - concurrency control
Key Lessons from Building an Event-Driven System
- Event-Driven Architecture Does Not Reduce Complexity
It shifts complexity into:
- coordination
- state management
- failure handling
- Concurrency Is the Real Challenge
Not performance.
Not frameworks.
Concurrency.
- Safety Must Be Designed Explicitly
Small shortcuts (like naive lock deletion) can lead to major production issues.
- Idempotency Is Non-Negotiable
In systems with retries and event delivery:
- duplicate execution is expected
- safe handling is required
- Observability Becomes Critical
In asynchronous systems, you must answer:
“What happened to this job?”
This requires:
- structured logging
- tracing across components
- visibility into system flow
Conclusion
This shift changed the system from:
"Run this task now"
to:
"This event will be processed safely"
That difference is fundamental.
Because in distributed systems:
You don’t design for success.
You design for failure.
What’s Next
The next phase is observability-driven engineering:
- tracing event lifecycles
- monitoring stream lag
- correlating logs across services
Because once a system becomes event-driven:
Visibility is what makes it understandable.