I'm probably not the only one who's been told "always release your locks in a finally block." It's one of those conventions we follow without much thought, and for most situations it's completely right. But I recently ran into a case where doing the opposite was actually the better call.
The Problem I Didn't See Coming: How Releasing Locks Cost Me Money
My job queue was simple: user submits a document → AI evaluates it → result gets stored.
The issue was that AI calls can fail. Rarely, but they do: the model might ignore the expected output format, or a rate limit kicks in. So I'd catch the exception, log it, mark the job as failed, and very responsibly release the lock in the finally block.
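For context, here's roughly what that original handler did, boiled down to a runnable sketch. The in-memory `lock` dict and the always-failing `call_llm_api` are stand-ins for the real Redis client and LLM SDK:

```python
lock = {}   # in-memory stand-in for the Redis lock key
calls = 0   # counts (paid) LLM calls

def call_llm_api(data):
    global calls
    calls += 1
    raise RuntimeError("rate limited")  # hypothetical: the API is failing

def process_job(job_id, data):
    key = f"ai_job_lock:{job_id}"
    if key in lock:              # stands in for redis.set(..., nx=True)
        return
    lock[key] = "1"
    try:
        call_llm_api(data)
    except Exception:
        pass                     # log + mark the job failed in the real code
    finally:
        lock.pop(key, None)      # "responsibly" released -> next retry pays again

for _ in range(3):               # user mashing the retry button
    process_job("42", {"doc": "..."})
print(calls)  # 3 LLM calls, 3 bills
```

The `finally` block does exactly what we're all taught to do, and that's the problem: every press of the retry button finds the lock free.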
Then the user would hit retry.
And the AI would fail again.
And they'd hit retry again.
Each retry triggered another LLM call, and each one cost real money.
What I had was essentially a retry storm hitting my API at exactly the moment my system was already struggling.
Using Lock Expiration as a Cooldown Mechanism
Here's what I ended up doing:
```python
lock_key = f"ai_job_lock:{job_id}"

# Try to acquire lock with TTL
acquired = redis.set(lock_key, "1", nx=True, ex=300)

# If the lock exists, the job is either running or cooling down
if not acquired:
    return

try:
    result = await call_llm_api(data)
    save_result(result)
    # Release lock only on success
    redis.delete(lock_key)
except Exception as e:
    log.error(f"Failed: {e}")
    # Do NOT release the lock
    # The TTL becomes the cooldown window
```
The lock just stays there. For five minutes. Then Redis evicts it automatically.
I know what you're thinking: "That's a memory leak!" When you see a lock that doesn't get released, the mental model most developers reach for is "something that accumulates indefinitely."
But here, every lock is created with a 5-minute TTL. The worst case is one orphaned key per job, and it self-destructs within 5 minutes. Not a leak.
Without TTL: fail → retry → retry → retry → 20 calls in 60s
With TTL: fail → blocked → blocked → retry at t=5min
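That second timeline can be simulated with a fake clock. The in-memory `store` dict stands in for Redis key expiry; the numbers are illustrative:

```python
store = {}   # key -> expiry timestamp (in-memory stand-in for Redis)
calls = 0

def try_acquire(key, ttl, now):
    if key in store and store[key] > now:
        return False          # lock alive: job is running or cooling down
    store[key] = now + ttl
    return True

def process_job(job_id, now):
    global calls
    if not try_acquire(f"ai_job_lock:{job_id}", ttl=300, now=now):
        return "blocked"
    calls += 1                # one (failed) LLM call; lock is NOT released
    return "called"

# Retries at t=0s, 10s, 20s, 299s -- only the first one pays
results = [process_job("42", now=t) for t in (0, 10, 20, 299)]
print(results, calls)  # ['called', 'blocked', 'blocked', 'blocked'] 1

late = process_job("42", now=301)  # TTL elapsed -> retry allowed again
print(late, calls)                 # called 2
```

Four button mashes, one LLM call, and the retry at t=5min goes through on its own.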
Why This Works
Think about the common reasons AI calls fail. Rate limiting means you'll fail again immediately if you retry. Network issues are often temporary but not instant to resolve. Prompt problems won't fix themselves no matter how many times you retry.
When your system is already struggling, piling on more load is the last thing you want, especially when each LLM call is very expensive. The TTL acts as a forced cooldown. Five minutes sounds like a lot, but in practice it can be right for some use cases: long enough to recover from rate limits, short enough that for most async workflows users barely notice.
The Part I Actually Like
It felt wrong at first, but there's no retry logic. No exponential backoff. No complex state machine. Just time.
```python
# This is literally all the code
acquired = redis.set(lock_key, "1", nx=True, ex=300)

# On failure, just let it ride
# Lock expires naturally
```
And the system still recovers cleanly. If a worker crashes mid-job, the lock expires and the recovery service picks it up. The TTL handles both failure cooldown and crash recovery with zero extra code. (When a retry request arrives during the cooldown window, the worker simply fails to acquire the lock and exits early without calling the LLM.)
When This Doesn't Apply
Of course, this pattern isn't universal.
It's a bad fit for:
- Cheap operations where retries are basically free.
- Jobs that legitimately need immediate retry.
- Situations where users expect a synchronous, instant response.
But for expensive, slow AI calls where failure usually means "try again later, not right now," this approach can quietly save you from retry storms you didn't know you were building.
Risks worth mentioning
Lock duration mismatch
If a job runs longer than the TTL, the lock expires while the job is still running and another worker picks it up. Make sure TTL > worst-case runtime, or implement a heartbeat that refreshes the lock while the job is active.
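A heartbeat can be as small as a background thread that calls `expire` on an interval. This is a sketch, not production code, and the `ttl`/`interval` defaults are placeholders:

```python
import threading

def run_with_heartbeat(redis, lock_key, job, ttl=300, interval=60):
    """Run job() while refreshing the lock's TTL every `interval` seconds."""
    stop = threading.Event()

    def heartbeat():
        # wait() returns False on timeout, True once stop is set
        while not stop.wait(interval):
            redis.expire(lock_key, ttl)  # push the expiry forward

    t = threading.Thread(target=heartbeat, daemon=True)
    t.start()
    try:
        return job()
    finally:
        # Stop refreshing. Note: still no redis.delete() on failure,
        # so the TTL-as-cooldown behavior from above is preserved.
        stop.set()
        t.join()
```

Once the job ends (success, failure, or crash), the refreshes stop and the normal TTL countdown takes over.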
Deterministic failures
The cooldown works well for transient failures like rate limits or network hiccups. It doesn't help when the failure is caused by bad input or a broken prompt, because those will just fail again every 5 minutes until someone notices. It's worth classifying your failures and marking deterministic ones as permanently failed rather than letting them loop.
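Classification can be a simple exception-type check. The exception classes below are illustrative stand-ins, not from any real SDK:

```python
# Hypothetical failure taxonomy -- map your SDK's real exceptions onto it.
class RateLimitError(Exception): pass   # transient: cool down, retry later
class NetworkError(Exception): pass     # transient
class BadPromptError(Exception): pass   # deterministic: retrying won't help

TRANSIENT = (RateLimitError, NetworkError)

def handle_failure(redis, lock_key, job_id, exc, mark_permanently_failed):
    if isinstance(exc, TRANSIENT):
        # Leave the lock alone: its TTL is the cooldown window.
        return "cooling_down"
    # Deterministic failure: no cooldown will fix it. Mark it dead and
    # free the lock so a *corrected* resubmission isn't blocked.
    mark_permanently_failed(job_id)
    redis.delete(lock_key)
    return "failed_permanently"
```

Deterministic failures skip the cooldown entirely: the user fixes the input and resubmits, rather than waiting out a TTL that was never going to help.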
User-facing feedback
If a user hits retry and nothing happens, it feels broken. Surface the state somewhere: a "retry available in X minutes" message, a job status indicator, anything that tells them the system is aware and waiting rather than silently stuck.
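Surfacing the cooldown is cheap because Redis already knows it. A sketch using redis-py's `ttl()` (the message format is just one option):

```python
def retry_status(redis, job_id):
    """Human-readable retry availability for the UI.

    redis-py's ttl() returns seconds remaining, or a negative
    value when the key is missing or has no expiry set.
    """
    remaining = redis.ttl(f"ai_job_lock:{job_id}")
    if remaining > 0:
        minutes = -(-remaining // 60)  # ceiling division
        return f"Retry available in {minutes} min"
    return "Retry available now"
```

One `TTL` call per status check, and the user sees a countdown instead of a dead button.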
Wrapping Up
We spend a lot of time building complex retry mechanisms, circuit breakers, and fallback systems. Sometimes the right answer is just: let it fail, wait a bit, try again later.
The Redis TTL is the "wait a bit" part, except it's automatic, requires no extra code, and can't be accidentally bypassed.
Curious if anyone else has hit this or found a better way to handle it.