Full resiliency guide for Spring Boot microservices — using all Resilience4j annotations

#java #microservices #springboot #tutorial

Let’s build a practical, copy-pasteable guide you can use right away: patterns, pom.xml/build.gradle snippets, application.yml examples, complete code samples (CircuitBreaker, Semaphore and ThreadPool Bulkhead, Retry, TimeLimiter, RateLimiter), combining annotations, testing tips, monitoring, tuning, and deployment notes.

Assumptions: you’re using Spring Boot with the resilience4j-spring-boot2 or resilience4j-spring-boot3 integration (Resilience4j 1.x and 2.x work similarly). I'll show plain Java + Spring examples (non-reactive). If you want reactive examples later, I can add them.

Choose versions compatible with your Spring Boot version. (I avoided locking to a single Boot version.)

Resilience4j configs are either per instance or global defaults. Here are examples for each annotation/feature:

Also enable actuator endpoints:

Circuit Breaker: stop calling a failing downstream service so callers fail fast and the service gets time to recover.
Bulkhead (Semaphore): protect CPU/memory by limiting concurrent calls within the same process.
Bulkhead (ThreadPool): isolate blocking calls by running them on a dedicated thread pool.
Retry: retry transient errors, with backoff and a limited number of attempts.
TimeLimiter: bound latency for async calls by failing them with a timeout when they run too long.
RateLimiter: limit throughput to a downstream service (or limit your own outgoing calls).
Combine: a common pattern is RateLimiter → Bulkhead → CircuitBreaker → TimeLimiter → Retry. The order depends on the semantics you want; retries usually wrap network calls, but take care not to retry in ways that worsen load.
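To make the circuit breaker idea concrete, here is a minimal sketch of the CLOSED → OPEN → HALF_OPEN cycle in plain Java. It is not how Resilience4j implements it: Resilience4j evaluates a failure rate over a sliding window, while this sketch simply counts consecutive failures. The class name `SimpleCircuitBreaker` and its two constructor parameters are made up for illustration, and the clock is passed in as an argument so the behaviour is easy to follow.

```java
import java.util.function.Supplier;

/** Illustrative consecutive-failure circuit breaker; NOT Resilience4j's implementation. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // hypothetical knob: consecutive failures before opening
    private final long openMillis;      // hypothetical knob: how long to stay OPEN before probing
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized State state() {
        return state;
    }

    /** Kept simple: the whole call is synchronized, which a production breaker would avoid. */
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < openMillis) {
                return fallback.get();       // fail fast without touching the downstream
            }
            state = State.HALF_OPEN;         // wait elapsed: let one probe call through
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;            // success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;          // trip (or re-trip after a failed probe)
                openedAt = nowMillis;
            }
            return fallback.get();
        }
    }
}
```

With a threshold of 2, two failing calls open the circuit, later calls get the fallback immediately, and the first call after the wait period acts as the half-open probe.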

This example demonstrates combining annotations plus fallback.

ExternalClient.java - a thin HTTP client (using RestTemplate)

ResilientService.java - apply Resilience4j annotations

Important notes about fallback signatures:

Method name must match fallbackMethod (case-sensitive).
Fallback method parameters must be the original method parameters followed by one trailing exception parameter (Throwable or a more specific exception type).
Return type must match.
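The three rules above can be checked mechanically. The following sketch uses plain reflection to test whether a candidate fallback mirrors the original method; it illustrates the rules and is not Resilience4j's own lookup code. `DemoService` and its methods are hypothetical names invented for the example.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

final class FallbackSignatureCheck {
    /** Hypothetical service used only for this demonstration. */
    static class DemoService {
        public String fetch(String id) { return "live-" + id; }
        public String fetchFallback(String id, Throwable t) { return "cached-" + id; }
        public int wrongReturn(String id, Throwable t) { return -1; }
        public String wrongParams(int id, Throwable t) { return "x"; }
    }

    /** True if a method named fallbackName has the original params plus a trailing Throwable and a compatible return type. */
    static boolean isValidFallback(Class<?> type, Method original, String fallbackName) {
        Class<?>[] params = original.getParameterTypes();
        for (Method candidate : type.getDeclaredMethods()) {
            if (!candidate.getName().equals(fallbackName)) {
                continue; // name comparison is case-sensitive
            }
            Class<?>[] fb = candidate.getParameterTypes();
            if (fb.length != params.length + 1) {
                continue;
            }
            boolean sameLeading = Arrays.equals(Arrays.copyOf(fb, params.length), params);
            boolean trailingThrowable = Throwable.class.isAssignableFrom(fb[fb.length - 1]);
            boolean returnOk = original.getReturnType().isAssignableFrom(candidate.getReturnType());
            if (sameLeading && trailingThrowable && returnOk) {
                return true;
            }
        }
        return false;
    }

    /** Convenience wrapper that checks a fallback name against DemoService.fetch(String). */
    static boolean check(String fallbackName) {
        try {
            Method original = DemoService.class.getMethod("fetch", String.class);
            return isValidFallback(DemoService.class, original, fallbackName);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here `check("fetchFallback")` passes, while `wrongReturn`, `wrongParams`, and a differently cased name such as `FetchFallback` are all rejected.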

AsyncResilientService.java - asynchronous pattern (TimeLimiter + ThreadPool Bulkhead)

Notes:

@TimeLimiter works on CompletionStage / CompletableFuture.
@Bulkhead with Type.THREADPOOL expects an async return type (CompletionStage / CompletableFuture).
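The reason both annotations want an async return type is easier to see without the annotations. The sketch below uses only the JDK (Java 9+ for `orTimeout`): a dedicated executor plays the role of the thread pool bulkhead, and `orTimeout` plays the role of the time limit. It is an analogy under those assumptions, not the code Resilience4j runs, and the names `AsyncTimeLimitSketch`, `callWithTimeout`, and `demo` are invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class AsyncTimeLimitSketch {
    /** Runs task on the dedicated pool, waits at most timeoutMillis, returns fallback on timeout or failure. */
    static String callWithTimeout(ExecutorService pool, Supplier<String> task,
                                  long timeoutMillis, String fallback) {
        return CompletableFuture.supplyAsync(task, pool)          // isolation: runs off the caller's thread
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)  // Java 9+: fails the future on timeout
                .exceptionally(t -> fallback)                     // degrade instead of propagating
                .join();
    }

    /** Simulates a blocking downstream call. */
    static String slow(long millis) {
        try {
            Thread.sleep(millis);
            return "slow-result";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    static String demo(long taskMillis, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(2);   // small dedicated pool
        try {
            return callWithTimeout(pool, () -> slow(taskMillis), timeoutMillis, "fallback");
        } finally {
            pool.shutdownNow(); // orTimeout does not stop the worker; we interrupt it here
        }
    }
}
```

Note that the timeout only completes the future; the worker thread keeps running until the task finishes or is interrupted, which is why the pool size still matters.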

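Order can matter. With annotations, the nesting is decided by the Resilience4j aspect order (by default Retry is outermost and Bulkhead innermost, adjustable through the aspect-order properties); when you decorate calls programmatically you choose it yourself. A typical ordering for outbound calls:

RateLimiter: avoid hitting downstream too frequently.
Bulkhead: limit concurrency so your service doesn’t exhaust resources.
CircuitBreaker: prevent repeated calls to a failing service.
TimeLimiter: bound call latency (for async calls).
Retry: apply retries only when appropriate (often after circuit/bulkhead/timeouts depending on the semantics you want).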
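One way to see why the order matters is to count what a circuit breaker observes when the retry sits outside it versus inside it. The sketch below uses hand-written decorators in plain Java; `CountingLayer` is a stand-in for the breaker's bookkeeping and `retry` is a bare loop without backoff, both invented for this example.

```java
import java.util.function.Supplier;

final class RetryPlacementSketch {
    /** Counts how many calls pass through it; stands in for a circuit breaker's bookkeeping. */
    static final class CountingLayer {
        int seen = 0;

        <T> Supplier<T> wrap(Supplier<T> inner) {
            return () -> {
                seen++;
                return inner.get();
            };
        }
    }

    /** Retries up to maxAttempts times on RuntimeException and rethrows the last failure. */
    static <T> Supplier<T> retry(Supplier<T> inner, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        return () -> {
            RuntimeException last = null;
            for (int i = 0; i < maxAttempts; i++) {
                try {
                    return inner.get();
                } catch (RuntimeException e) {
                    last = e;
                }
            }
            throw last;
        };
    }

    /** Returns how many calls the breaker stand-in saw for one always-failing request. */
    static int callsSeenByBreaker(boolean retryOutside, int maxAttempts) {
        CountingLayer breaker = new CountingLayer();
        Supplier<String> failing = () -> { throw new IllegalStateException("down"); };
        Supplier<String> chain;
        if (retryOutside) {
            chain = retry(breaker.wrap(failing), maxAttempts);   // breaker sees every attempt
        } else {
            chain = breaker.wrap(retry(failing, maxAttempts));   // breaker sees one logical call
        }
        try {
            chain.get();
        } catch (RuntimeException expected) {
            // the request fails either way; only the count differs
        }
        return breaker.seen;
    }
}
```

With three attempts, the breaker records three failures when the retry wraps it and a single failure when it wraps the retry, so the same thresholds trip at very different speeds.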

But reality is nuanced. Example:

Be careful: retries can amplify load — combine with circuit breakers and backoffs.
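A quick bit of arithmetic shows how fast the amplification grows. If every layer in a call chain independently makes up to the same number of attempts, the worst-case number of calls reaching the bottom service is attempts raised to the number of layers. The helper below is a hypothetical calculator for that worst case, not part of any library.

```java
final class RetryAmplification {
    /** Worst-case calls hitting the last service when each layer makes up to attemptsPerLayer attempts. */
    static long worstCaseCalls(int attemptsPerLayer, int layers) {
        long calls = 1;
        for (int i = 0; i < layers; i++) {
            calls = Math.multiplyExact(calls, attemptsPerLayer); // throws on overflow instead of wrapping
        }
        return calls;
    }
}
```

For example, three services in a chain that each allow 3 attempts can turn one user request into 27 calls against the deepest dependency while it is already struggling.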

You can configure default circuit breaker settings via a @Configuration:

(Resilience4j also supports configuring defaults via application.yml which is simpler for most teams.)

Add micrometer-registry-prometheus and spring-boot-starter-actuator.
Resilience4j exposes meters that Micrometer picks up. Prometheus can scrape /actuator/prometheus.
Metrics worth watching:
CircuitBreaker state (OPEN/HALF_OPEN/CLOSED)
Failure rate, slow-call rate
Bulkhead queue sizes and rejected calls
Retry calls count and successes/failures
Timeouts

Unit test: mock the external client and simulate failures.

Integration test: use WireMock to simulate downstream behavior (timeouts, slow responses, 500s) and test circuit transitions and metrics.

Load test: use Gatling/JMeter with fault injection and measure how the circuit breaker and bulkhead behave under load.
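For the unit-test level, a mocking library is the usual choice, but the idea can be shown with a hand-rolled test double. The sketch below assumes a client that can be modelled as a `Supplier<String>`; `FlakyClientStub` and `GreetingService` are hypothetical names, and the service's try/catch stands in for whatever fallback your real method has.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hand-rolled test double: fails a fixed number of times, then succeeds. */
final class FlakyClientStub implements Supplier<String> {
    private final int failuresBeforeSuccess;
    private final AtomicInteger calls = new AtomicInteger();

    FlakyClientStub(int failuresBeforeSuccess) {
        this.failuresBeforeSuccess = failuresBeforeSuccess;
    }

    @Override
    public String get() {
        if (calls.incrementAndGet() <= failuresBeforeSuccess) {
            throw new IllegalStateException("simulated downstream failure");
        }
        return "payload";
    }

    int callCount() {
        return calls.get();
    }
}

/** Hypothetical code under test: one call, degrading to a default on failure. */
final class GreetingService {
    private final Supplier<String> client;

    GreetingService(Supplier<String> client) {
        this.client = client;
    }

    String greet() {
        try {
            return "hello " + client.get();
        } catch (RuntimeException e) {
            return "hello (degraded)"; // fallback path exercised by the failing stub
        }
    }
}
```

A test can then construct `new GreetingService(new FlakyClientStub(1))`, assert that the first `greet()` returns the degraded value and the second returns the live one, and check `callCount()` to confirm how often the client was hit.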

Start conservative: permit enough calls for early testing, then tighten thresholds with real telemetry.
minimumNumberOfCalls: set it high enough that the circuit doesn't open on a tiny sample.
Half-open trials: allow a small number of calls to probe the downstream service (permittedNumberOfCallsInHalfOpenState).
Retries: use exponential backoff (tune the initial wait and multiplier) and never retry non-idempotent operations by accident.
Bulkheads: prefer semaphore for low-latency operations and threadpool for blocking calls (DB, legacy blocking HTTP).
TimeLimiter: don’t rely solely on TimeLimiter; combine with proper threadpool management to avoid exhaustion.
Fallbacks: return cached values or degrade gracefully. Avoid heavy logic in fallback methods.
Metrics: instrument the system and use alerts (e.g., circuit open > X minutes, failure rate > Y%).
Observability: trace distributed calls with OpenTelemetry/Zipkin and tag traces with circuit/bulkhead outcomes.
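To get a feel for what an exponential backoff setting means in wall-clock time, it helps to print the schedule before deploying it. The helper below is a hypothetical calculator, not a Resilience4j API: it assumes the first attempt happens immediately (so `maxAttempts` includes the initial call), each later wait is the previous one times the multiplier, and waits are capped at a maximum.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

final class BackoffSchedule {
    /** Waits before retry 1, 2, ...: initial * multiplier^(n-1), capped at max. */
    static List<Duration> waits(Duration initial, double multiplier, Duration max, int maxAttempts) {
        List<Duration> result = new ArrayList<>();
        double millis = initial.toMillis();
        // maxAttempts counts the initial call, so there are maxAttempts - 1 waits
        for (int retry = 1; retry < maxAttempts; retry++) {
            long capped = Math.min((long) millis, max.toMillis());
            result.add(Duration.ofMillis(capped));
            millis *= multiplier;
        }
        return result;
    }
}
```

With an initial wait of 500 ms, a multiplier of 2, and 4 attempts, the waits are 500 ms, 1 s, and 2 s, so a request that fails every time holds its caller for at least 3.5 s plus the call durations. That total is worth comparing against your upstream timeouts.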
Retry + Bulkhead: retries inside the same process can exhaust concurrency, so be careful when combining retry with a semaphore bulkhead.
Retrying non-idempotent operations: can cause side effects (e.g., duplicate payments).
Fallback signature mismatch: causes runtime exceptions; ensure parameter order and types are correct.
Blocking calls on main server threads: if you use a threadpool bulkhead but your fallback or calling code blocks the calling thread, you may still exhaust the server's connector threads.
Overly aggressive thresholds: opening circuits too early causes unnecessary failures.
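The first pitfall in the list is easy to reproduce with a bare `java.util.concurrent.Semaphore`. The sketch below is a simplified stand-in for a semaphore bulkhead (no wait time, immediate rejection); the class name and the `onRejected` parameter are invented for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class SemaphoreBulkheadSketch {
    private final Semaphore permits;

    SemaphoreBulkheadSketch(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs action if a permit is free; otherwise returns onRejected immediately. */
    <T> T call(Supplier<T> action, Supplier<T> onRejected) {
        if (!permits.tryAcquire()) {
            return onRejected.get();     // bulkhead full: reject instead of queueing
        }
        try {
            return action.get();
        } finally {
            permits.release();           // always hand the permit back
        }
    }
}
```

With a limit of 1, a call that still holds its permit while a second guarded call is attempted (for example, a retry or nested call issued from inside the protected section) gets the rejection path, even though only one logical request is in flight. Once the outer call finishes, the permit is released and the next call succeeds.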

src/main/java/com/example/resilience/
  config/
    Resilience4jConfig.java
  client/
    ExternalClient.java
  service/
    ResilientService.java
    AsyncResilientService.java
  web/
    DemoController.java
src/test/java/...
application.yml
pom.xml

Cache fallback: keep a small cache (Caffeine) of last-known-good responses and return them in fallbacks.
Bulkhead metrics exporter: create a scheduled job to emit bulkhead queue metrics if you want fine-grained alerts.
Circuit breaker event listener: subscribe to events for logging/alerts.
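The cache fallback idea can be sketched without any library. The version below swaps Caffeine for a plain `ConcurrentHashMap`, so it has no eviction or size limit and is only suitable as an illustration; `LastKnownGoodCache` and its `fetch` signature are hypothetical, and the remote call is assumed to return a non-null value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class LastKnownGoodCache {
    // Unbounded map for illustration only; a real cache needs size and expiry limits.
    private final Map<String, String> lastGood = new ConcurrentHashMap<>();

    /** Calls remote; on success remembers the value, on failure serves the last good one or defaultValue. */
    String fetch(String key, Function<String, String> remote, String defaultValue) {
        try {
            String fresh = remote.apply(key);   // assumed to return non-null
            lastGood.put(key, fresh);
            return fresh;
        } catch (RuntimeException e) {
            return lastGood.getOrDefault(key, defaultValue);
        }
    }
}
```

In an annotated service, the `catch` branch would live in the fallback method and the `put` in the happy path of the protected method, which keeps the fallback itself cheap.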
@CircuitBreaker(name = "myCb", fallbackMethod = "fallback")
@Bulkhead(name = "b1", type = Bulkhead.Type.SEMAPHORE, fallbackMethod = "fb")
@Bulkhead(name = "tpb", type = Bulkhead.Type.THREADPOOL, fallbackMethod = "fb") - method should be async (CompletableFuture)
@Retry(name = "r1", fallbackMethod = "fb")
@TimeLimiter(name = "tl1", fallbackMethod = "fb") - for async methods returning CompletionStage
@RateLimiter(name = "rl1", fallbackMethod = "fb")