When handling asynchronous tasks in distributed systems, the combination of Celery and Redis is often the go-to choice. I also chose Celery for the initial design of my KYC (Know Your Customer) orchestrator because I was already familiar with it. However, as the service grew in complexity, I hit a wall on two fronts: guaranteeing idempotency and managing complex state.
In this post, I want to share why I moved from Celery to Temporal and how I ensured idempotency along the way.
1. Limitations of Celery/Redis: Why a Change Was Necessary
Difficulties in Idempotency Management
While Celery is excellent for "fire and forget" tasks, a task can run more than once when it is retried or redelivered after a network failure or a worker crash. For face recognition tasks that consume significant GPU resources, duplicate execution was costly in both money and throughput.
Fragmentation of State
The KYC process follows this sequence:
- User uploads an ID card image.
- User uploads a selfie video.
- Compare face similarity once both files exist.
In a Celery environment, I could not know when the image and the video would arrive, so I needed extra logic to query the DB on every upload or to store intermediate state in Redis. The "Are all files collected?" check ended up scattered across multiple places, which made maintenance difficult.
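To make the fragmentation concrete, here is a hypothetical sketch of the pattern described above. It is not the original Celery code: a plain dict stands in for Redis, and the handler names are invented for illustration.

```python
# Hypothetical illustration: `store` stands in for Redis, `triggered` stands in
# for "enqueue the face comparison task". All names are invented for this sketch.
store = {}
triggered = []


def _record(session_id, kind):
    """Remember which kinds of files have arrived for a session."""
    kinds = store.setdefault(session_id, set())
    kinds.add(kind)
    return kinds


def on_image_uploaded(session_id):
    kinds = _record(session_id, "image")
    # Completeness check, copy 1
    if {"image", "video"} <= kinds:
        triggered.append(session_id)


def on_video_uploaded(session_id):
    kinds = _record(session_id, "video")
    # Completeness check, copy 2 (has to stay in sync with copy 1)
    if {"image", "video"} <= kinds:
        triggered.append(session_id)
```

Note that in this sketch a redelivered upload event for a session that is already complete triggers the comparison a second time, which is exactly the kind of duplicate execution that hurts on GPU workloads.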
2. Introducing Temporal: A Paradigm Shift in Orchestration
Temporal is not just a message queue; it is a stateful workflow engine.
Workflow Logic Must Be "Deterministic"
Because Temporal workflow code is built on the premise of "replay," it must always produce the same sequence of workflow API calls for the same input and history. Therefore, you should not directly perform "external-world-dependent operations" such as network I/O, file I/O, reading the system clock (e.g., `datetime.now()`), randomness, or threading within a workflow. These side effects should be pushed into activities, while the workflow focuses solely on orchestration. For time and randomness, the Python SDK provides replay-safe substitutes such as `workflow.now()` and `workflow.random()`.
Official Docs: https://docs.temporal.io/develop/python/core-application#workflow-logic-requirements
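The replay requirement is easier to see with a toy model. The sketch below is not how Temporal is implemented; it only assumes that results recorded during the first execution are returned again on replay. `ToyHistory`, `read_clock`, and both workflow functions are invented names.

```python
import itertools

# Fake wall clock that advances on every read, standing in for datetime.now().
_ticks = itertools.count(start=1000)


def read_clock():
    return next(_ticks)


class ToyHistory:
    """Toy event history: records values on first execution, returns them on replay."""

    def __init__(self):
        self.events = []
        self._cursor = 0

    def record_or_replay(self, produce):
        if self._cursor < len(self.events):
            value = self.events[self._cursor]  # replay: reuse the recorded value
        else:
            value = produce()  # first execution: run and record
            self.events.append(value)
        self._cursor += 1
        return value

    def rewind(self):
        """Simulate a worker restart: start again from the top of the history."""
        self._cursor = 0


def nondeterministic_workflow(history):
    # Reads the clock directly, so a replay computes a different deadline.
    return read_clock() + 60


def deterministic_workflow(history):
    # Reads the clock through the history, so a replay sees the recorded value.
    return history.record_or_replay(read_clock) + 60
```

Running `deterministic_workflow`, calling `rewind()`, and running it again yields the same deadline, while `nondeterministic_workflow` yields a different one each time. That divergence is what the determinism rule is meant to prevent.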
Workflow-Centric Design
The first thing that changed after introducing Temporal was the visibility of business logic. FaceSimilarityWorkflow now gracefully waits until files are ready.
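```python
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy


# Core logic of FaceSimilarityWorkflow
@workflow.defn
class FaceSimilarityWorkflow:
    @workflow.run
    async def run(self, data: SimilarityData) -> SimilarityResult:
        # Wait up to 1 hour until both image and video are collected
        # (self._files is workflow state, populated outside this snippet)
        await workflow.wait_condition(
            lambda: any(f["type"] == "image" for f in self._files)
            and any(f["type"] == "video" for f in self._files),
            timeout=timedelta(hours=1),
        )
        # Execute GPU activity once all files are ready
        # (activity_data is built from the collected files; omitted here)
        result = await workflow.execute_activity(
            check_face_similarity_activity,
            activity_data,
            # Temporal requires an activity timeout; 10 minutes is an assumed value
            start_to_close_timeout=timedelta(minutes=10),
            retry_policy=RetryPolicy(maximum_attempts=3),
        )
        return result
```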
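This code uses `workflow.wait_condition` to suspend the workflow until the condition is met, without blocking the event loop. If the timeout elapses first, `wait_condition` raises `asyncio.TimeoutError`, so the workflow should decide what an expired session means (for example, failing the KYC attempt). In Celery, this would have required complex polling or webhook logic.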
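For readers who have not used this style of waiting, here is a plain `asyncio` analogue of "wait for a predicate, with a timeout." It is only a sketch of the semantics, not Temporal code: it has no durability, and `FileCollector` and its methods are invented names.

```python
import asyncio


class FileCollector:
    """Toy stand-in for workflow state: files arrive over time, one waiter checks completeness."""

    def __init__(self):
        self._files = []
        self._changed = asyncio.Event()

    def add_file(self, file_type, path):
        self._files.append({"type": file_type, "path": path})
        self._changed.set()  # wake the waiter so it re-checks the predicate

    def _ready(self):
        # The single place where "are all files collected?" is decided.
        return {"image", "video"} <= {f["type"] for f in self._files}

    async def wait_until_ready(self, timeout):
        async def _wait():
            while not self._ready():
                self._changed.clear()
                await self._changed.wait()

        try:
            await asyncio.wait_for(_wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False  # caller decides what an expired session means
```

The design point is that the completeness check lives in one predicate and the waiter is simply re-evaluated whenever state changes. Create the collector inside a running coroutine so the event is bound to the right loop on older Python versions.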
3. Idempotency Strategy: Building a Double Defense
Even with Temporal, idempotency at the activity level remains crucial. I established a double defense strategy as follows.
Step 1: Temporal's Basic Guarantee
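Temporal records the progress of a workflow as an event history. If a worker restarts, the history is replayed to rebuild the workflow's state, and execution continues from the last recorded event; activities that already completed are not executed again.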
Step 2: Explicit Checks within Activities
Temporal activities follow an "at-least-once" execution model: if a worker crashes after finishing an activity but before reporting the result to the server, the activity is retried. For this reason, the official documentation strongly recommends making activities idempotent.
Official Docs: https://docs.temporal.io/develop/python/error-handling#make-activities-idempotent
In practice, I use the following two together:
- For external system calls, pass an idempotency key built from the workflow run ID and the activity ID.
- Internally, use unique keys (or check for existing results) in the DB to prevent duplicate storage/processing.
```python
from temporalio import activity


@activity.defn
async def check_face_similarity_activity(data: SimilarityData) -> SimilarityResult:
    info = activity.info()
    # Stable across retries of this activity within the same workflow run
    idempotency_key = f"{info.workflow_run_id}-{info.activity_id}"
    session_id = data["session_id"]

    with get_db_context() as db:
        # Return early if a previous attempt already stored a result
        existing = (
            db.query(FaceSimilarity)
            .filter(FaceSimilarity.idempotency_key == idempotency_key)
            .first()
        )
        if existing:
            return SimilarityResult(success=True, message="Already processed.")

    # Perform actual GPU-intensive work...
```
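The second bullet (a unique key in the DB) can be sketched in a self-contained way with `sqlite3`. This is an illustration under assumptions, not the production code: the table layout, `fake_gpu_compare`, and the placeholder score are all invented. The unique key is what closes the gap a plain "check, then insert" leaves when two attempts overlap.

```python
import sqlite3

gpu_calls = []  # records every time the (fake) expensive work runs


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE face_similarity ("
        " idempotency_key TEXT PRIMARY KEY,"  # uniqueness enforced by the DB
        " score REAL)"
    )
    return conn


def fake_gpu_compare(session_id):
    gpu_calls.append(session_id)
    return 0.93  # placeholder score, not a real result


def _stored_score(conn, idempotency_key):
    row = conn.execute(
        "SELECT score FROM face_similarity WHERE idempotency_key = ?",
        (idempotency_key,),
    ).fetchone()
    return None if row is None else row[0]


def check_similarity_once(conn, idempotency_key, session_id):
    # Fast path: an earlier attempt already stored a result.
    stored = _stored_score(conn, idempotency_key)
    if stored is not None:
        return stored

    score = fake_gpu_compare(session_id)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO face_similarity (idempotency_key, score) VALUES (?, ?)",
                (idempotency_key, score),
            )
    except sqlite3.IntegrityError:
        # Another attempt stored its result first; return that one instead.
        return _stored_score(conn, idempotency_key)
    return score
```

Calling `check_similarity_once` twice with the same key runs the fake GPU work once and returns the stored score the second time. One design choice worth considering: returning the stored result, rather than only an "already processed" message, lets a retried activity hand the workflow the same value as the first attempt.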
4. Results: What Has Changed?
| Comparison Item | Celery/Redis Based | Temporal Based |
|---|---|---|
| State Management | Manual storage in DB/Redis | Automatically managed by engine |
| Retry Strategy | Manual exponential backoff | Declarative Retry Policy |
| Visibility | Must dig through logs | Check history in Temporal UI |
| Idempotency | Very difficult to guarantee | Structurally achievable |
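To show what the "Declarative Retry Policy" row buys you, the sketch below computes the wait before each retry from the policy parameters. It is plain arithmetic, not the Temporal SDK, and the defaults in the signature (1 second initial interval, coefficient 2.0, cap at 100 times the initial interval) reflect my understanding of Temporal's defaults, so treat them as assumptions. `retry_delays` is an invented helper.

```python
from datetime import timedelta


def retry_delays(
    maximum_attempts,
    initial_interval=timedelta(seconds=1),
    backoff_coefficient=2.0,
    maximum_interval=None,
):
    """Return the wait before each retry for a policy with a finite attempt limit."""
    if maximum_interval is None:
        maximum_interval = initial_interval * 100  # assumed default cap

    delays = []
    # The first attempt runs immediately, so there are (maximum_attempts - 1) retries.
    for retry in range(maximum_attempts - 1):
        delay = initial_interval * (backoff_coefficient ** retry)
        delays.append(min(delay, maximum_interval))
    return delays
```

With `maximum_attempts=3`, as in the workflow code earlier, this gives two retries: one after 1 second and one after 2 seconds. With Celery I had to carry this kind of schedule in task code; with Temporal it is a few fields on the policy object.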
Conclusion
The transition from Celery to Temporal was not just about changing tools; it was about changing how I define business processes in code. In financial and authentication systems, where idempotency is paramount, Temporal gave me a level of stability I could not reach with my Celery setup.
If you are losing sleep over complex asynchronous logic and idempotency issues, I strongly recommend migrating to Temporal.


