When I first set up the AI pipeline that generates content for my own site, I assumed everything would work perfectly in a linear flow. One morning, I woke up to find three different drafts generated for the same title in the backend logs, with Gemini Flash tokens spent separately for each. The problem was simple, but the solution touched one of the cornerstones of system architecture: Idempotency.
In the software world, this concept—defined as "the property of certain operations whereby they can be applied multiple times without changing the result beyond the initial application"—is vital, especially in costly and long-running AI operations. If an API request is interrupted due to a network error and your system treats it as "never happened" and retries, you lose both money and data integrity.
The Problem: Why Are We Paying Twice?
AI content generation has a very different nature from a typical CRUD operation. When you send a prompt to an LLM (Large Language Model), the response can take anywhere from 10 seconds to 2 minutes to arrive. During that window, the HTTP connection might drop, the gunicorn worker might time out, or a ConnectionResetError might occur on your end even though the provider you use (OpenRouter, Groq, etc.) has already sent the response.
That’s exactly what happened to me last month. I started a content generation task, the system sent the request to Gemini Flash, the model generated the content, but a packet loss occurred at the network layer while the response was returning to me. My retry mechanism kicked in and sent the request again for the same content. The result? Duplicate records in the database and wasted processing power.
⚠️ Cost Warning
Especially in RAG (Retrieval-Augmented Generation) operations using high context windows, a lack of idempotency means thousands of tokens are re-read and billed with every retry.
Idempotency Key Strategy
To solve this problem, the first thing I did was assign a unique idempotency_key to every content generation request. I usually obtain this key by creating a hash from the content's core parameters (such as title, target language, and date). If the system receives a new request with the same hash, it checks the status of the existing operation instead of starting a new one.
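A minimal sketch of how such a key can be derived; the function and parameter names here are my illustration rather than the exact code in my pipeline:

```python
import hashlib


def generate_idempotency_key(title: str, language: str, date: str) -> str:
    # Normalize the core parameters so the same logical request
    # always hashes to the same key
    raw = f"{title.strip().lower()}|{language}|{date}"
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    # A short, prefixed form keeps the key readable in logs
    return f"hsh_{digest[:16]}"
```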
When implementing this strategy, using a UNIQUE constraint at the database level is the most robust way. While working on PostgreSQL, you might even need an EXCLUSION constraint or a PARTIAL INDEX to protect specific logic rather than just a single column. But the simplest way is to write the key generated specifically for the operation to the database, effectively saying "I got here first."
Database Level Protection
The ON CONFLICT statement in PostgreSQL is a lifesaver. Before I start generating content, I create a "task record." If this record already exists, the database returns an error, and I terminate the process, saying "this is already being done or has been completed."
INSERT INTO content_tasks (idempotency_key, status, title)
VALUES ('hsh_827364812', 'processing', 'On System Architecture')
ON CONFLICT (idempotency_key)
DO NOTHING;
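To know whether my worker was actually the one that claimed the key, I check how many rows the insert affected. A rough sketch with SQLAlchemy; the claim_task helper and the session handling are assumptions for illustration:

```python
from sqlalchemy import text
from sqlalchemy.orm import Session


def claim_task(db: Session, idempotency_key: str, title: str) -> bool:
    # ON CONFLICT DO NOTHING turns the insert into a no-op
    # if another worker already registered this key
    result = db.execute(
        text(
            "INSERT INTO content_tasks (idempotency_key, status, title) "
            "VALUES (:key, 'processing', :title) "
            "ON CONFLICT (idempotency_key) DO NOTHING"
        ),
        {"key": idempotency_key, "title": title},
    )
    db.commit()
    # rowcount is 1 if we inserted the row, 0 if someone beat us to it
    return result.rowcount == 1
```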
This simple statement prevents two workers triggered at the same time from doing the same job. If the first worker's row was inserted, the second worker's insert becomes a no-op thanks to DO NOTHING, and it can back off without raising an error. However, there is a point to be careful about: what happens if the process is interrupted?
State Management and "Zombie" Tasks
The process starts (processing), but the server restarts or a kernel panic occurs. In this case, that idempotency_key remains as "processing" in the database forever. I call these "zombie tasks." To overcome this problem, I use an updated_at column and a timeout mechanism.
If a task has been in the "processing" state for more than 10 minutes, the system accepts it as "failed" and makes it retryable. When doing this, using a distributed lock on Redis is essential to prevent conflicts in scenarios where there are multiple servers (nodes).
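A rough sketch of that reclamation step, using the 10-minute threshold described above; the table and column names are assumed to match the earlier example, and the Redis lock around it is omitted:

```python
from sqlalchemy import text
from sqlalchemy.orm import Session


def reclaim_zombie_tasks(db: Session, timeout_minutes: int = 10) -> int:
    # Any task stuck in 'processing' longer than the timeout is treated
    # as dead and flipped to 'failed' so it becomes retryable again
    result = db.execute(
        text(
            "UPDATE content_tasks SET status = 'failed' "
            "WHERE status = 'processing' "
            "AND updated_at < NOW() - (:minutes * INTERVAL '1 minute')"
        ),
        {"minutes": timeout_minutes},
    )
    db.commit()
    # Number of zombie tasks that were reclaimed
    return result.rowcount
```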
ℹ️ Practical Tip
If you use systemd timers in your systems, you can automate these cleanup tasks with a script that runs every 5 minutes. Monitoring how often these cleanups are performed via journalctl tells you a lot about the health of the system.
Multi-Provider Fallback and State Confusion
In my own pipeline, I don't stick to just a single AI provider. If Gemini Flash doesn't respond, I fallback to another model via OpenRouter, and if that fails, I go to Groq. This makes idempotency management even more complex because every provider has different timeout periods and error codes.
The most important lesson I learned while working on a production ERP was this: "Software architecture is often organizational flow, not just code." This is also true for the AI side. If you don't track which model failed at which stage, your system might produce inconsistent data even if it is "idempotent." Therefore, I log every provider attempt as a separate attempt.
| Stage | Provider | Status | Duration (sec) |
|---|---|---|---|
| Attempt 1 | Gemini Flash | Timeout | 30 |
| Attempt 2 | Groq (Llama 3) | Success | 4.2 |
| Attempt 3 | - | Skipped | - |
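In code, this attempt-by-attempt tracking boils down to an ordered loop over the providers. A simplified sketch; the provider client objects and the log_attempt callback are assumptions, not my actual classes:

```python
import time


async def call_with_fallback(providers, prompt, log_attempt):
    # providers: ordered list of (name, client) pairs,
    # e.g. [("gemini-flash", gemini), ("openrouter", openrouter), ("groq", groq)]
    last_error = None
    for stage, (name, client) in enumerate(providers, start=1):
        started = time.monotonic()
        try:
            content = await client.call(prompt)
            log_attempt(stage, name, "success", time.monotonic() - started)
            return content
        except Exception as exc:
            # Record the failed attempt with its duration, then try the next provider
            log_attempt(stage, name, "failed", time.monotonic() - started)
            last_error = exc
    # Every provider failed; surface the last error to the caller
    if last_error is not None:
        raise last_error
    raise RuntimeError("No providers configured")
```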
A Practical Application: FastAPI and SQLAlchemy Example
Let me share a code snippet of how I manage this on the Python (FastAPI) side. The key here is to manage the database operation within a transaction and mark the status as "started" before going to the external service (AI API).
from fastapi import HTTPException
from sqlalchemy.orm import Session


async def generate_ai_content(db: Session, request: ContentRequest):
    # Create the idempotency key from the request's core parameters
    id_key = generate_hash(request.title, request.context)

    # Check for an existing task with the same key
    task = db.query(ContentTask).filter_by(idempotency_key=id_key).first()
    if task:
        if task.status == "completed":
            # The work was already done; return the stored result instead of paying again
            return task.result
        if task.status == "processing" and not is_timed_out(task):
            # Another worker is still on it; refuse to start a duplicate
            raise HTTPException(status_code=409, detail="Operation in progress")

    # Create a new task or reclaim a failed / timed-out one
    task = upsert_task(db, id_key, status="processing")
    try:
        # The actual AI call is made here
        content = await ai_provider.call(request.prompt)
        task.status = "completed"
        task.result = content
        db.commit()
        return content
    except Exception as e:
        task.status = "failed"
        task.error_log = str(e)
        db.commit()
        raise
The upsert_task function here should pull an existing task back to "processing" if it has timed out. This gives you an at-least-once processing guarantee: the content will be generated at least once, but thanks to the idempotency key it will never be stored twice.
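The upsert_task helper is not shown above; roughly, this is what I mean by it, using the SQLAlchemy ORM. The ContentTask columns are assumed to match the earlier snippets, and the UNIQUE constraint on idempotency_key still backstops any race between two workers creating the record at the same moment:

```python
from datetime import datetime, timezone

from sqlalchemy.orm import Session


def upsert_task(db: Session, idempotency_key: str, status: str) -> ContentTask:
    task = db.query(ContentTask).filter_by(idempotency_key=idempotency_key).first()
    if task is None:
        # First time we see this key: create the task record
        task = ContentTask(idempotency_key=idempotency_key, status=status)
        db.add(task)
    else:
        # The task exists but failed or timed out: pull it back to 'processing'
        task.status = status
        task.error_log = None
    task.updated_at = datetime.now(timezone.utc)
    db.commit()
    return task
```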
Monitoring and Observability
Setting up the system and leaving it isn't enough. Issues like PostgreSQL WAL (Write-Ahead Logging) bloat can sometimes stem from this rapid insert-update traffic. Especially if you have a high-volume content production line, you need to monitor vacuum operations closely. While I monitor file integrity with auditd in my own system, I also push every failed idempotency check as a metric to Prometheus at the application level.
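For that application-level metric, a plain counter is enough. A minimal sketch with the prometheus_client library; the metric name is my own choice here:

```python
from prometheus_client import Counter

# Incremented every time a request is rejected because its idempotency key
# already points to an in-flight or conflicting task
IDEMPOTENCY_CONFLICTS = Counter(
    "content_pipeline_idempotency_conflicts_total",
    "Requests rejected due to an existing task for the same idempotency key",
)

# Inside the request handler, right before raising the HTTP 409:
# IDEMPOTENCY_CONFLICTS.inc()
```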
If "Conflict" errors (HTTP 409) suddenly increase, this usually indicates that a retry mechanism somewhere is running too aggressively or a worker has fallen into a "deadlock" state. The biggest mistake I've seen in my 20 years of field experience is ignoring these types of metrics by saying "it's already working."
Conclusion
Building an AI content pipeline is not just about sending a prompt and saving the response. In real-world conditions; networks drop, APIs crash, disks fill up. Idempotency is the seatbelt that keeps your data clean amidst this chaos. Whenever I've skipped this principle in my side products or corporate projects, I've inevitably had to clean up the database with a midnight operation.
In the next post, I will talk about my vector database choices in the RAG architecture I built on top of this pipeline and why I avoided some popular solutions.