Ilya Masliev

Building a Resilience Engine in Python: Internals of LimitPal (Part 2)

How the executor pipeline, clock abstraction, and circuit breaker architecture actually work.

If you haven’t read Part 1, the short version:

Resilience shouldn’t be a pile of decorators.
It should be a system.

Part 1 explained the motivation.

This post is about how the system is built.


The core design constraint

I started with one rule:

Every resilience feature must compose cleanly with others.

Most libraries solve a single concern well.

But composition is where systems break.

Retry + rate limiting + circuit breaker is not additive.
It’s architectural.

So LimitPal is built around one idea:

👉 A single execution pipeline

Everything plugs into it.


The executor pipeline

Every call flows through the same stages:

Circuit breaker → Rate limiter → Retry loop → Result recording

The order is not arbitrary.

It's deliberate.

Step 1: Circuit breaker first

Fail fast.

If the upstream service is already down,
don’t waste tokens,
don’t trigger retries,
don’t create load.

This protects your own system.

Step 2: Rate limiter

Only after we know execution is allowed
do we consume capacity.

This ensures:

  • breaker failures don’t eat quota
  • retries still respect rate limits
  • burst behavior stays predictable

Step 3: Retry loop

Retry lives inside the limiter window.

Not outside.

This is important.

If retry lived outside the limiter,
one logical call could consume unbounded capacity.

Inside the window:

A call is a budgeted operation.

That constraint keeps systems stable under stress.

Step 4: Result recording

Success/failure feedback feeds the breaker.

This closes the loop.

The executor isn’t just running code —
it’s adapting to system health.
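
To make the ordering concrete, here's a minimal sketch of the four stages in plain Python. The names (run, breaker, limiter, retry_policy, CircuitOpenError) are illustrative assumptions, not LimitPal's actual internals:

# Minimal sketch of the pipeline ordering, not LimitPal's implementation.
# `breaker`, `limiter`, and `retry_policy` are illustrative duck-typed objects.

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without executing it."""


def run(call, *, breaker, limiter, retry_policy, key):
    # Step 1: circuit breaker first -- fail fast, consume nothing.
    if not breaker.allow():
        raise CircuitOpenError("upstream marked unhealthy")

    # Step 2: rate limiter -- only now do we consume capacity for this key.
    limiter.acquire(key)

    # Step 3: retry loop lives inside the limiter window (a budgeted operation).
    last_error = None
    for delay in (0.0, *retry_policy.delays()):
        if delay:
            retry_policy.sleep(delay)
        try:
            result = call()
        except Exception as exc:
            last_error = exc
            continue
        # Step 4: result recording -- success feedback closes the loop.
        breaker.record_success()
        return result

    # Retries exhausted: record the failure so the breaker can adapt.
    breaker.record_failure()
    raise last_error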


Why decorators fail here

Decorators look composable.

They aren’t.

Each decorator:

  • owns its own time model
  • owns its own retry logic
  • owns its own failure semantics

Stack them and you get:

emergent behavior you didn’t design

The executor forces:

  • a shared clock
  • a shared failure model
  • a shared execution lifecycle

That’s what makes the system predictable.


The clock abstraction (the hidden hero)

Time is the hardest dependency in resilience systems.

Retries depend on time.
Rate limiting depends on time.
Circuit breakers depend on time.

If every component calls time.time() directly:

You lose control.

LimitPal introduces a pluggable clock:

from typing import Protocol


class Clock(Protocol):
    def now(self) -> float: ...
    def sleep(self, seconds: float) -> None: ...
    async def sleep_async(self, seconds: float) -> None: ...

Everything uses this.

Not system time.

Production clock

Uses monotonic time:

  • immune to system clock jumps
  • safe under NTP sync
  • stable under container migrations
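
As a sketch (assuming only the Clock protocol above, not necessarily the actual LimitPal class), a production clock can simply wrap time.monotonic():

import asyncio
import time


class MonotonicClock:
    """Illustrative production clock: monotonic time, real sleeps."""

    def now(self) -> float:
        # time.monotonic() never jumps backwards, unlike time.time().
        return time.monotonic()

    def sleep(self, seconds: float) -> None:
        time.sleep(seconds)

    async def sleep_async(self, seconds: float) -> None:
        await asyncio.sleep(seconds)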

MockClock

Tests become deterministic:

clock.advance(5.0)

No waiting.
No flakiness.
No race conditions.

You can simulate minutes of retry behavior instantly.
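
A test clock along these lines makes that concrete. Again, a sketch, not necessarily LimitPal's own MockClock:

class MockClock:
    """Illustrative test clock: time only moves when the test says so."""

    def __init__(self) -> None:
        self._now = 0.0

    def now(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds

    def sleep(self, seconds: float) -> None:
        # Sleeping just advances virtual time -- no real waiting.
        self._now += seconds

    async def sleep_async(self, seconds: float) -> None:
        self._now += seconds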

This isn’t a testing trick.

It’s architectural control over time.


Circuit breaker architecture

The breaker is a state machine:

CLOSED → OPEN → HALF_OPEN → CLOSED

But the tricky part isn’t the states.

It’s transition discipline.

CLOSED

Normal operation.

Failures increment a counter.
Success resets it.

When the failure threshold is reached → OPEN.

OPEN

All calls fail immediately.

No retry.
No limiter usage.

Just fast rejection.

After recovery timeout → HALF_OPEN.

HALF_OPEN

Limited probing phase.

We allow a small number of calls.

If they succeed → CLOSED.
If they fail → back to OPEN.

This prevents retry storms after recovery.
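
The whole transition discipline fits in a few dozen lines. Here's a hedged sketch of the state machine, built on the pluggable clock; the parameter names (failure_threshold, recovery_timeout, half_open_max_calls) are mine, not necessarily LimitPal's:

import enum


class BreakerState(enum.Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Illustrative state-machine sketch, not LimitPal's actual breaker."""

    def __init__(self, clock, failure_threshold=5, recovery_timeout=30.0,
                 half_open_max_calls=1):
        self._clock = clock
        self._failure_threshold = failure_threshold
        self._recovery_timeout = recovery_timeout
        self._half_open_max_calls = half_open_max_calls
        self._state = BreakerState.CLOSED
        self._failures = 0
        self._opened_at = 0.0
        self._half_open_calls = 0

    def allow(self) -> bool:
        if self._state is BreakerState.OPEN:
            # After the recovery timeout, move to HALF_OPEN and start probing.
            if self._clock.now() - self._opened_at >= self._recovery_timeout:
                self._state = BreakerState.HALF_OPEN
                self._half_open_calls = 0
            else:
                return False  # fast rejection: no retry, no limiter usage
        if self._state is BreakerState.HALF_OPEN:
            # Only a bounded number of probe calls are allowed through.
            if self._half_open_calls >= self._half_open_max_calls:
                return False
            self._half_open_calls += 1
        return True

    def record_success(self) -> None:
        # Success in HALF_OPEN (or CLOSED) resets everything back to CLOSED.
        self._state = BreakerState.CLOSED
        self._failures = 0

    def record_failure(self) -> None:
        if self._state is BreakerState.HALF_OPEN:
            self._trip()  # a failed probe sends us straight back to OPEN
            return
        self._failures += 1
        if self._failures >= self._failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self._state = BreakerState.OPEN
        self._failures = 0
        self._opened_at = self._clock.now()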

The breaker is not just protection.

It’s a stability regulator.


Why retry must be jittered

Exponential backoff without jitter is dangerous.

If 1,000 clients retry at the same time:

You get a synchronized spike.

You kill the service again.

Jitter spreads retries across time.

Instead of:

all retry at t=1s

You get:

retry in [0.9s, 1.1s]

Small randomness → large stability gain.
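
A sketch of jittered exponential backoff, with a ±10% band matching the example above (one common choice; "full jitter" and "equal jitter" are other well-known variants):

import random


def backoff_delays(base=1.0, factor=2.0, max_delay=30.0, jitter=0.1, attempts=5):
    """Illustrative generator of jittered exponential backoff delays."""
    delay = base
    for _ in range(attempts):
        # Spread each retry across [delay * (1 - jitter), delay * (1 + jitter)]
        # so 1,000 clients don't all wake up at exactly the same instant.
        yield min(delay, max_delay) * random.uniform(1 - jitter, 1 + jitter)
        delay *= factor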

This is one of those details that separates toy resilience
from production resilience.


Key-based isolation

Limiters operate per key:

user:123
tenant:acme
ip:10.0.0.1

Each key gets its own bucket.

This prevents one bad actor
from starving everyone else.

Internally this means:

  • dynamic bucket allocation
  • TTL eviction
  • bounded memory
  • optional LRU trimming

Without this,
rate limiting becomes a memory leak.
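
A minimal sketch of per-key bucket storage with TTL eviction and an LRU-style size cap. The class and method names here are illustrative, not LimitPal's internals:

from collections import OrderedDict


class KeyedBuckets:
    """Illustrative per-key store: TTL eviction plus an LRU-style size cap."""

    def __init__(self, clock, make_bucket, ttl=300.0, max_keys=10_000):
        self._clock = clock
        self._make_bucket = make_bucket   # factory producing a fresh bucket
        self._ttl = ttl
        self._max_keys = max_keys
        self._buckets = OrderedDict()     # key -> (bucket, last_used), LRU order

    def get(self, key):
        now = self._clock.now()
        self._evict_expired(now)
        if key in self._buckets:
            bucket, _ = self._buckets.pop(key)
        else:
            bucket = self._make_bucket()
        self._buckets[key] = (bucket, now)       # reinsert at MRU position
        while len(self._buckets) > self._max_keys:
            self._buckets.popitem(last=False)    # trim least recently used
        return bucket

    def _evict_expired(self, now):
        # Entries are kept in least-recently-used order, so we can stop
        # at the first one that is still fresh.
        while self._buckets:
            oldest_key = next(iter(self._buckets))
            _, last_used = self._buckets[oldest_key]
            if now - last_used <= self._ttl:
                break
            del self._buckets[oldest_key]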


Sync + async parity

Most Python libraries choose one: sync or async.

LimitPal enforces parity.

Same API.
Different executor.

executor.run(...)
await executor.run(...)

No hidden behavior differences.

This matters when codebases mix:

  • background workers
  • HTTP servers
  • CLI tools

One mental model everywhere.
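
One way to picture the parity, as a pattern sketch rather than LimitPal's internals (pipeline.execute / pipeline.execute_async are names I'm assuming here):

# Illustrative pattern only: one API shape (`run`), two executors sharing
# the same clock and the same pipeline object. `execute` / `execute_async`
# are assumed names, not LimitPal's actual API.


class SyncExecutor:
    def __init__(self, clock, pipeline):
        self._clock = clock
        self._pipeline = pipeline

    def run(self, call, key):
        # Blocking sleeps; same breaker -> limiter -> retry semantics.
        return self._pipeline.execute(call, key, sleep=self._clock.sleep)


class AsyncExecutor:
    def __init__(self, clock, pipeline):
        self._clock = clock
        self._pipeline = pipeline

    async def run(self, call, key):
        # Awaitable sleeps; otherwise identical semantics.
        return await self._pipeline.execute_async(
            call, key, sleep=self._clock.sleep_async
        )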


The real goal

LimitPal isn’t about rate limiting.

Or retry.

Or circuit breakers.

It’s about:

making failure behavior explicit and composable

Resilience stops being ad-hoc glue
and becomes architecture.

That’s the difference between:

“I added retry”

and

“I designed a failure strategy.”


What’s next

Planned work:

  • observability hooks
  • adaptive rate limiting
  • Redis backend
  • bulkhead pattern
  • framework integrations

Because resilience doesn’t end at execution.
It extends into operations.


Closing thought

Distributed systems fail.

That’s not optional.

What’s optional is whether failure behavior is:

  • accidental
  • or engineered

LimitPal is an attempt to engineer it.

Docs:
https://limitpal.readthedocs.io/

Repo:
https://github.com/Guli-vali/limitpal

If you like deep infrastructure tools — feedback welcome.
