Ilya Masliev

Building a Resilience Engine in Python: Internals of LimitPal (Part 2)

How the executor pipeline, clock abstraction, and circuit breaker architecture actually work.

If you haven’t read Part 1, the short version:

Resilience shouldn’t be a pile of decorators.
It should be a system.

Part 1 explained the motivation.

This post is about how the system is built.


The core design constraint

I started with one rule:

Every resilience feature must compose cleanly with others.

Most libraries solve a single concern well.

But composition is where systems break.

Retry + rate limiting + circuit breaker is not additive.
It’s architectural.

So LimitPal is built around one idea:

👉 A single execution pipeline

Everything plugs into it.


The executor pipeline

Every call flows through the same stages:

Circuit breaker → Rate limiter → Retry loop → Result recording

The order is not arbitrary.

It's deliberate.

Step 1: Circuit breaker first

Fail fast.

If the upstream service is already down,
don’t waste tokens,
don’t trigger retries,
don’t create load.

This protects your own system.

Step 2: Rate limiter

Only after we know execution is allowed
do we consume capacity.

This ensures:

  • breaker failures don’t eat quota
  • retries still respect rate limits
  • burst behavior stays predictable

Step 3: Retry loop

Retry lives inside the limiter window.

Not outside.

This is important.

If retry lived outside the limiter,
one logical call could consume unbounded capacity.

Inside the window:

A call is a budgeted operation.

That constraint keeps systems stable under stress.

Step 4: Result recording

Success/failure feedback feeds the breaker.

This closes the loop.

The executor isn’t just running code —
it’s adapting to system health.
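
To make the ordering concrete, here's a minimal sketch of the four stages in plain Python. The names (run, breaker, limiter, retry_policy, CircuitOpenError) are illustrative assumptions, not LimitPal's actual internals:

# Minimal sketch of the pipeline ordering, not LimitPal's implementation.
# `breaker`, `limiter`, and `retry_policy` are illustrative duck-typed objects.

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without executing it."""


def run(call, *, breaker, limiter, retry_policy, key):
    # Step 1: circuit breaker first -- fail fast, consume nothing.
    if not breaker.allow():
        raise CircuitOpenError("upstream marked unhealthy")

    # Step 2: rate limiter -- only now do we consume capacity for this key.
    limiter.acquire(key)

    # Step 3: retry loop lives inside the limiter window (a budgeted operation).
    last_error = None
    for delay in (0.0, *retry_policy.delays()):
        if delay:
            retry_policy.sleep(delay)
        try:
            result = call()
        except Exception as exc:
            last_error = exc
            continue
        # Step 4: result recording -- success feedback closes the loop.
        breaker.record_success()
        return result

    # Retries exhausted: record the failure so the breaker can adapt.
    breaker.record_failure()
    raise last_error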


Why decorators fail here

Decorators look composable.

They aren’t.

Each decorator:

  • owns its own time model
  • owns its own retry logic
  • owns its own failure semantics

Stack them and you get:

emergent behavior you didn’t design

The executor forces:

  • a shared clock
  • a shared failure model
  • a shared execution lifecycle

That’s what makes the system predictable.


The clock abstraction (the hidden hero)

Time is the hardest dependency in resilience systems.

Retries depend on time.
Rate limiting depends on time.
Circuit breakers depend on time.

If every component calls time.time() directly:

You lose control.

LimitPal introduces a pluggable clock:

from typing import Protocol


class Clock(Protocol):
    def now(self) -> float: ...
    def sleep(self, seconds: float) -> None: ...
    async def sleep_async(self, seconds: float) -> None: ...

Everything uses this.

Not system time.

Production clock

Uses monotonic time:

  • immune to system clock jumps
  • safe under NTP sync
  • stable under container migrations
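
As a sketch (assuming only the Clock protocol above, not necessarily the actual LimitPal class), a production clock can simply wrap time.monotonic():

import asyncio
import time


class MonotonicClock:
    """Illustrative production clock: monotonic time, real sleeps."""

    def now(self) -> float:
        # time.monotonic() never jumps backwards, unlike time.time().
        return time.monotonic()

    def sleep(self, seconds: float) -> None:
        time.sleep(seconds)

    async def sleep_async(self, seconds: float) -> None:
        await asyncio.sleep(seconds)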

MockClock

Tests become deterministic:

clock.advance(5.0)

No waiting.
No flakiness.
No race conditions.

You can simulate minutes of retry behavior instantly.
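
A test clock along these lines makes that concrete. Again, a sketch, not necessarily LimitPal's own MockClock:

class MockClock:
    """Illustrative test clock: time only moves when the test says so."""

    def __init__(self) -> None:
        self._now = 0.0

    def now(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds

    def sleep(self, seconds: float) -> None:
        # Sleeping just advances virtual time -- no real waiting.
        self._now += seconds

    async def sleep_async(self, seconds: float) -> None:
        self._now += seconds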

This isn’t a testing trick.

It’s architectural control over time.


Circuit breaker architecture

The breaker is a state machine:

CLOSED → OPEN → HALF_OPEN → CLOSED

But the tricky part isn’t the states.

It’s transition discipline.

CLOSED

Normal operation.

Failures increment a counter.
Success resets it.

When the failure threshold is reached → OPEN.

OPEN

All calls fail immediately.

No retry.
No limiter usage.

Just fast rejection.

After recovery timeout → HALF_OPEN.

HALF_OPEN

Limited probing phase.

We allow a small number of calls.

If they succeed → CLOSED.
If they fail → back to OPEN.

This prevents retry storms after recovery.
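
The whole transition discipline fits in a few dozen lines. Here's a hedged sketch of the state machine, built on the pluggable clock; the parameter names (failure_threshold, recovery_timeout, half_open_max_calls) are mine, not necessarily LimitPal's:

import enum


class BreakerState(enum.Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Illustrative state-machine sketch, not LimitPal's actual breaker."""

    def __init__(self, clock, failure_threshold=5, recovery_timeout=30.0,
                 half_open_max_calls=1):
        self._clock = clock
        self._failure_threshold = failure_threshold
        self._recovery_timeout = recovery_timeout
        self._half_open_max_calls = half_open_max_calls
        self._state = BreakerState.CLOSED
        self._failures = 0
        self._opened_at = 0.0
        self._half_open_calls = 0

    def allow(self) -> bool:
        if self._state is BreakerState.OPEN:
            # After the recovery timeout, move to HALF_OPEN and start probing.
            if self._clock.now() - self._opened_at >= self._recovery_timeout:
                self._state = BreakerState.HALF_OPEN
                self._half_open_calls = 0
            else:
                return False  # fast rejection: no retry, no limiter usage
        if self._state is BreakerState.HALF_OPEN:
            # Only a bounded number of probe calls are allowed through.
            if self._half_open_calls >= self._half_open_max_calls:
                return False
            self._half_open_calls += 1
        return True

    def record_success(self) -> None:
        # Success in HALF_OPEN (or CLOSED) resets everything back to CLOSED.
        self._state = BreakerState.CLOSED
        self._failures = 0

    def record_failure(self) -> None:
        if self._state is BreakerState.HALF_OPEN:
            self._trip()  # a failed probe sends us straight back to OPEN
            return
        self._failures += 1
        if self._failures >= self._failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self._state = BreakerState.OPEN
        self._failures = 0
        self._opened_at = self._clock.now()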

The breaker is not just protection.

It’s a stability regulator.


Why retry must be jittered

Exponential backoff without jitter is dangerous.

If 1,000 clients retry at the same time:

You get a synchronized spike.

You kill the service again.

Jitter spreads retries across time.

Instead of:

all retry at t=1s

You get:

retry in [0.9s, 1.1s]

Small randomness → large stability gain.
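
A sketch of jittered exponential backoff, with a ±10% band matching the example above (one common choice; "full jitter" and "equal jitter" are other well-known variants):

import random


def backoff_delays(base=1.0, factor=2.0, max_delay=30.0, jitter=0.1, attempts=5):
    """Illustrative generator of jittered exponential backoff delays."""
    delay = base
    for _ in range(attempts):
        # Spread each retry across [delay * (1 - jitter), delay * (1 + jitter)]
        # so 1,000 clients don't all wake up at exactly the same instant.
        yield min(delay, max_delay) * random.uniform(1 - jitter, 1 + jitter)
        delay *= factor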

This is one of those details that separates toy resilience
from production resilience.


Key-based isolation

Limiters operate per key:

user:123
tenant:acme
ip:10.0.0.1

Each key gets its own bucket.

This prevents one bad actor
from starving everyone else.

Internally this means:

  • dynamic bucket allocation
  • TTL eviction
  • bounded memory
  • optional LRU trimming

Without this,
rate limiting becomes a memory leak.
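
A minimal sketch of per-key bucket storage with TTL eviction and an LRU-style size cap. The class and method names here are illustrative, not LimitPal's internals:

from collections import OrderedDict


class KeyedBuckets:
    """Illustrative per-key store: TTL eviction plus an LRU-style size cap."""

    def __init__(self, clock, make_bucket, ttl=300.0, max_keys=10_000):
        self._clock = clock
        self._make_bucket = make_bucket   # factory producing a fresh bucket
        self._ttl = ttl
        self._max_keys = max_keys
        self._buckets = OrderedDict()     # key -> (bucket, last_used), LRU order

    def get(self, key):
        now = self._clock.now()
        self._evict_expired(now)
        if key in self._buckets:
            bucket, _ = self._buckets.pop(key)
        else:
            bucket = self._make_bucket()
        self._buckets[key] = (bucket, now)       # reinsert at MRU position
        while len(self._buckets) > self._max_keys:
            self._buckets.popitem(last=False)    # trim least recently used
        return bucket

    def _evict_expired(self, now):
        # Entries are kept in least-recently-used order, so we can stop
        # at the first one that is still fresh.
        while self._buckets:
            oldest_key = next(iter(self._buckets))
            _, last_used = self._buckets[oldest_key]
            if now - last_used <= self._ttl:
                break
            del self._buckets[oldest_key]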


Sync + async parity

Most Python libraries choose one: sync or async.

LimitPal enforces parity.

Same API.
Different executor.

executor.run(...)
await executor.run(...)

No hidden behavior differences.

This matters when codebases mix:

  • background workers
  • HTTP servers
  • CLI tools

One mental model everywhere.
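
One way to picture the parity, as a pattern sketch rather than LimitPal's internals (pipeline.execute / pipeline.execute_async are names I'm assuming here):

# Illustrative pattern only: one API shape (`run`), two executors sharing
# the same clock and the same pipeline object. `execute` / `execute_async`
# are assumed names, not LimitPal's actual API.


class SyncExecutor:
    def __init__(self, clock, pipeline):
        self._clock = clock
        self._pipeline = pipeline

    def run(self, call, key):
        # Blocking sleeps; same breaker -> limiter -> retry semantics.
        return self._pipeline.execute(call, key, sleep=self._clock.sleep)


class AsyncExecutor:
    def __init__(self, clock, pipeline):
        self._clock = clock
        self._pipeline = pipeline

    async def run(self, call, key):
        # Awaitable sleeps; otherwise identical semantics.
        return await self._pipeline.execute_async(
            call, key, sleep=self._clock.sleep_async
        )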


The real goal

LimitPal isn’t about rate limiting.

Or retry.

Or circuit breakers.

It’s about:

making failure behavior explicit and composable

Resilience stops being ad-hoc glue
and becomes architecture.

That’s the difference between:

“I added retry”

and

“I designed a failure strategy.”


What’s next

Planned work:

  • observability hooks
  • adaptive rate limiting
  • Redis backend
  • bulkhead pattern
  • framework integrations

Because resilience doesn’t end at execution.
It extends into operations.


Closing thought

Distributed systems fail.

That’s not optional.

What’s optional is whether failure behavior is:

  • accidental
  • or engineered

LimitPal is an attempt to engineer it.

Docs:
https://limitpal.readthedocs.io/

Repo:
https://github.com/Guli-vali/limitpal

If you like deep infrastructure tools — feedback welcome.
