I'm probably not the only one who's been told "always release your locks in a finally block." It's one of those conventions we follow without much thought, and for most situations it's completely right. But I recently ran into a case where doing the opposite was actually the better call.
The Problem I Didn't See Coming: How Releasing Locks Cost Me Money
My job queue was simple: user submits a document → AI evaluates it → result gets stored.
The issue was that AI calls can fail. Rarely, but they do: the model might ignore the expected output format, or a rate limit kicks in. So I'd catch the exception, log it, mark the job as failed, and very responsibly release the lock in the finally block.
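For context, here's roughly what that original handler did, boiled down to a runnable sketch. The in-memory `lock` dict and the always-failing `call_llm_api` are stand-ins for the real Redis client and LLM SDK:

```python
lock = {}   # in-memory stand-in for the Redis lock key
calls = 0   # counts (paid) LLM calls

def call_llm_api(data):
    global calls
    calls += 1
    raise RuntimeError("rate limited")  # hypothetical: the API is failing

def process_job(job_id, data):
    key = f"ai_job_lock:{job_id}"
    if key in lock:              # stands in for redis.set(..., nx=True)
        return
    lock[key] = "1"
    try:
        call_llm_api(data)
    except Exception:
        pass                     # log + mark the job failed in the real code
    finally:
        lock.pop(key, None)      # "responsibly" released -> next retry pays again

for _ in range(3):               # user mashing the retry button
    process_job("42", {"doc": "..."})
print(calls)  # 3 LLM calls, 3 bills
```

The `finally` block does exactly what we're all taught to do, and that's the problem: every press of the retry button finds the lock free.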
Then the user would hit retry.
And the AI would fail again.
And they'd hit retry again.
Each retry triggered another LLM call, and each one cost real money.
What I had was essentially a retry storm hitting my API at exactly the moment my system was already struggling.
Using Lock Expiration as a Cooldown Mechanism
Here's what I ended up doing:
```python
lock_key = f"ai_job_lock:{job_id}"

# Try to acquire lock with TTL
acquired = redis.set(lock_key, "1", nx=True, ex=300)

# If the lock exists, the job is either running or cooling down
if not acquired:
    return

try:
    result = await call_llm_api(data)
    save_result(result)
    # Release lock only on success
    redis.delete(lock_key)
except Exception as e:
    log.error(f"Failed: {e}")
    # Do NOT release the lock
    # The TTL becomes the cooldown window
```
The lock just stays there. For five minutes. Then Redis evicts it automatically.
I know what you're thinking: "That's a memory leak!" When you see a lock that doesn't get released, the mental model most developers reach for is "something that accumulates indefinitely."
But here, every lock is created with a 5-minute TTL. The worst case is one orphaned key per job, and it self-destructs within 5 minutes. Not a leak.
Without TTL: fail → retry → retry → retry → 20 calls in 60s
With TTL: fail → blocked → blocked → retry at t=5min
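That second timeline can be simulated with a fake clock. The in-memory `store` dict stands in for Redis key expiry; the numbers are illustrative:

```python
store = {}   # key -> expiry timestamp (in-memory stand-in for Redis)
calls = 0

def try_acquire(key, ttl, now):
    if key in store and store[key] > now:
        return False          # lock alive: job is running or cooling down
    store[key] = now + ttl
    return True

def process_job(job_id, now):
    global calls
    if not try_acquire(f"ai_job_lock:{job_id}", ttl=300, now=now):
        return "blocked"
    calls += 1                # one (failed) LLM call; lock is NOT released
    return "called"

# Retries at t=0s, 10s, 20s, 299s -- only the first one pays
results = [process_job("42", now=t) for t in (0, 10, 20, 299)]
print(results, calls)  # ['called', 'blocked', 'blocked', 'blocked'] 1

late = process_job("42", now=301)  # TTL elapsed -> retry allowed again
print(late, calls)                 # called 2
```

Four button mashes, one LLM call, and the retry at t=5min goes through on its own.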
Why This Works
Think about the common reasons AI calls fail. Rate limiting means you'll fail again immediately if you retry. Network issues are often temporary but not instant to resolve. Prompt problems won't fix themselves no matter how many times you retry.
When your system is already struggling, piling on more load is the last thing you want, especially when each LLM call is very expensive. The TTL acts as a forced cooldown. Five minutes sounds like a lot, but in practice it can be right for some use cases: long enough to recover from rate limits, short enough that for most async workflows users barely notice.
The Part I Actually Like
It felt wrong at first, but there's no retry logic. No exponential backoff. No complex state machine. Just time.
```python
# This is literally all the code
acquired = redis.set(lock_key, "1", nx=True, ex=300)

# On failure, just let it ride
# Lock expires naturally
```
And the system still recovers cleanly. If a worker crashes mid-job, the lock expires and the recovery service picks it up. The TTL handles both failure cooldown and crash recovery with zero extra code. (When a retry request arrives during the cooldown window, the worker simply fails to acquire the lock and exits early without calling the LLM.)
When This Doesn't Apply
Of course, this pattern isn't universal.
It's a bad fit for:
- Cheap operations where retries are basically free.
- Jobs that legitimately need immediate retry.
- Situations where users expect a synchronous, instant response.
But for expensive, slow AI calls where failure usually means "try again later, not right now," this approach can quietly save you from retry storms you didn't know you were building.
Risks worth mentioning
Lock duration mismatch
If a job runs longer than the TTL, the lock expires while the job is still running and another worker picks it up. Make sure TTL > worst-case runtime, or implement a heartbeat that refreshes the lock while the job is active.
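A heartbeat can be as small as a background thread that calls `expire` on an interval. This is a sketch, not production code, and the `ttl`/`interval` defaults are placeholders:

```python
import threading

def run_with_heartbeat(redis, lock_key, job, ttl=300, interval=60):
    """Run job() while refreshing the lock's TTL every `interval` seconds."""
    stop = threading.Event()

    def heartbeat():
        # wait() returns False on timeout, True once stop is set
        while not stop.wait(interval):
            redis.expire(lock_key, ttl)  # push the expiry forward

    t = threading.Thread(target=heartbeat, daemon=True)
    t.start()
    try:
        return job()
    finally:
        # Stop refreshing. Note: still no redis.delete() on failure,
        # so the TTL-as-cooldown behavior from above is preserved.
        stop.set()
        t.join()
```

Once the job ends (success, failure, or crash), the refreshes stop and the normal TTL countdown takes over.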
Deterministic failures
The cooldown works well for transient failures like rate limits or network hiccups. It doesn't help when the failure is caused by bad input or a broken prompt, because those will just fail again every 5 minutes until someone notices. It's worth classifying your failures and marking deterministic ones as permanently failed rather than letting them loop.
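Classification can be a simple exception-type check. The exception classes below are illustrative stand-ins, not from any real SDK:

```python
# Hypothetical failure taxonomy -- map your SDK's real exceptions onto it.
class RateLimitError(Exception): pass   # transient: cool down, retry later
class NetworkError(Exception): pass     # transient
class BadPromptError(Exception): pass   # deterministic: retrying won't help

TRANSIENT = (RateLimitError, NetworkError)

def handle_failure(redis, lock_key, job_id, exc, mark_permanently_failed):
    if isinstance(exc, TRANSIENT):
        # Leave the lock alone: its TTL is the cooldown window.
        return "cooling_down"
    # Deterministic failure: no cooldown will fix it. Mark it dead and
    # free the lock so a *corrected* resubmission isn't blocked.
    mark_permanently_failed(job_id)
    redis.delete(lock_key)
    return "failed_permanently"
```

Deterministic failures skip the cooldown entirely: the user fixes the input and resubmits, rather than waiting out a TTL that was never going to help.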
User-facing feedback
If a user hits retry and nothing happens, it feels broken. Surface the state somewhere: a "retry available in X minutes" message, a job status indicator, anything that tells them the system is aware and waiting rather than silently stuck.
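Surfacing the cooldown is cheap because Redis already knows it. A sketch using redis-py's `ttl()` (the message format is just one option):

```python
def retry_status(redis, job_id):
    """Human-readable retry availability for the UI.

    redis-py's ttl() returns seconds remaining, or a negative
    value when the key is missing or has no expiry set.
    """
    remaining = redis.ttl(f"ai_job_lock:{job_id}")
    if remaining > 0:
        minutes = -(-remaining // 60)  # ceiling division
        return f"Retry available in {minutes} min"
    return "Retry available now"
```

One `TTL` call per status check, and the user sees a countdown instead of a dead button.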
Wrapping Up
We spend a lot of time building complex retry mechanisms, circuit breakers, and fallback systems. Sometimes the right answer is just: let it fail, wait a bit, try again later.
The Redis TTL is the "wait a bit" part, except it's automatic, requires no extra code, and can't be accidentally bypassed.
Curious if anyone else has hit this or found a better way to handle it.