
nk sk


Full resiliency guide for Spring Boot microservices — using all Resilience4j annotations

This is a practical, copy-pasteable guide: patterns, pom.xml/build.gradle snippets, application.yml examples, complete code samples (CircuitBreaker, Semaphore and ThreadPool Bulkhead, Retry, TimeLimiter, RateLimiter), combining annotations, testing tips, monitoring, tuning, and deployment notes.


Assumptions: you're using Spring Boot with the Resilience4j starter: resilience4j-spring-boot2 (Resilience4j 1.7.x) for Spring Boot 2, or resilience4j-spring-boot3 (Resilience4j 2.x) for Spring Boot 3. The annotations work the same way in both. The examples are plain Java + Spring (non-reactive).


1) Dependencies

Maven (pom.xml)

<!-- core Spring Boot dependencies omitted for brevity -->
<dependencies>
  <!-- Spring Boot starter web -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>

  <!-- Resilience4j starter: resilience4j-spring-boot3 exists only in 2.x;
       on Spring Boot 2 use resilience4j-spring-boot2 with a 1.7.x version -->
  <dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.2.0</version> <!-- pick a compatible version for your Spring Boot -->
  </dependency>
  <!-- Required: the annotations are implemented with Spring AOP.
       The starter already brings in the core modules, so resilience4j-all is not needed. -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
  </dependency>

  <!-- Optionally for metrics -->
  <dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
  </dependency>

  <!-- Spring Boot actuator to expose metrics -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
  </dependency>

  <!-- For tests -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
  </dependency>

  <!-- Optional: WireMock for integration tests (not managed by the Boot BOM, so a version is needed) -->
  <dependency>
    <groupId>com.github.tomakehurst</groupId>
    <artifactId>wiremock-jre8</artifactId>
    <version>2.27.2</version>
    <scope>test</scope>
  </dependency>
</dependencies>

Gradle (Kotlin DSL)

dependencies {
  implementation("org.springframework.boot:spring-boot-starter-web")
  implementation("io.github.resilience4j:resilience4j-spring-boot3:2.2.0") // 2.x is required for Spring Boot 3
  implementation("org.springframework.boot:spring-boot-starter-aop")      // the annotations rely on Spring AOP
  implementation("io.micrometer:micrometer-registry-prometheus")
  implementation("org.springframework.boot:spring-boot-starter-actuator")
  testImplementation("org.springframework.boot:spring-boot-starter-test")
  testImplementation("com.github.tomakehurst:wiremock-jre8:2.27.2")
}

Choose versions compatible with your Spring Boot. (I avoided locking to a single Boot version.)


2) application.yml — configuration examples

Settings can be defined per instance, or in shared configs that instances inherit through baseConfig. Here are examples for each annotation/feature:

resilience4j:
  circuitbreaker:
    configs:
      default:
        registerHealthIndicator: true
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 20
        minimumNumberOfCalls: 10
        permittedNumberOfCallsInHalfOpenState: 5
        waitDurationInOpenState: 30s
        failureRateThreshold: 50
        automaticTransitionFromOpenToHalfOpenEnabled: false
    instances:
      externalServiceCB:
        baseConfig: default
        waitDurationInOpenState: 10s
        failureRateThreshold: 40

  retry:
    instances:
      externalServiceRetry:
        maxAttempts: 3
        waitDuration: 500ms
        retryExceptions:
          - java.io.IOException
          - java.util.concurrent.TimeoutException

  timelimiter:
    instances:
      externalServiceTL:
        timeoutDuration: 2s
        cancelRunningFuture: true

  bulkhead:               # semaphore bulkhead
    configs:
      default:
        maxConcurrentCalls: 10
        maxWaitDuration: 0ms
    instances:
      semaphoreBulkhead:
        baseConfig: default
        maxConcurrentCalls: 20

  thread-pool-bulkhead:   # the thread pool bulkhead is configured under its own key
    configs:
      default:
        maxThreadPoolSize: 10
        coreThreadPoolSize: 5
        queueCapacity: 50
        keepAliveDuration: 30s
    instances:
      threadPoolBulkhead:
        baseConfig: default

  ratelimiter:
    instances:
      externalServiceRateLimiter:
        limitForPeriod: 10
        limitRefreshPeriod: 1s
        timeoutDuration: 0s

Also enable actuator endpoints:

management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus
  endpoint:
    health:
      show-details: always

3) Pattern explanation — when to use what (short)

  • Circuit Breaker: stop calling a failing downstream service to fail fast and let it recover.
  • Bulkhead (Semaphore): protect CPU/memory by limiting concurrent calls within the same process.
  • Bulkhead (ThreadPool): isolate blocking calls by running them on a dedicated thread pool.
  • Retry: retry transient errors but with backoff and limited attempts.
  • TimeLimiter: bound latency for async calls (complements, but does not replace, the HTTP client's own connect/read timeouts).
  • RateLimiter: limit throughput to a downstream service (or limit your own outgoing).
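  • Combine: a common conceptual stack is RateLimiter → Bulkhead → CircuitBreaker → TimeLimiter → Retry. With the annotations, the actual nesting is decided by Resilience4j's aspect order rather than by this list. Retries usually wrap network calls, but take care not to retry in ways that worsen load.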
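
To make the circuit breaker bullet concrete, here is a minimal plain-JDK sketch of a count-based breaker. It is not Resilience4j's implementation: the class name `CountBasedBreakerSketch` is hypothetical, it is not thread-safe, and it leaves out the HALF_OPEN state. It only shows how a sliding window, a minimum call count, and a failure-rate threshold might interact.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative count-based circuit breaker; not thread-safe and without a HALF_OPEN state. */
final class CountBasedBreakerSketch {
    enum State { CLOSED, OPEN }

    private final int slidingWindowSize;
    private final int minimumNumberOfCalls;
    private final double failureRateThreshold;                  // percent, e.g. 50.0
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private int failures;
    private State state = State.CLOSED;

    CountBasedBreakerSketch(int slidingWindowSize, int minimumNumberOfCalls, double failureRateThreshold) {
        this.slidingWindowSize = slidingWindowSize;
        this.minimumNumberOfCalls = minimumNumberOfCalls;
        this.failureRateThreshold = failureRateThreshold;
    }

    /** A caller checks this first and fails fast (or uses a fallback) when it returns false. */
    boolean isCallPermitted() {
        return state == State.CLOSED;
    }

    /** Records one call outcome and opens the breaker once both thresholds are crossed. */
    void record(boolean failed) {
        if (state == State.OPEN) {
            return;
        }
        outcomes.addLast(failed);
        if (failed) {
            failures++;
        }
        if (outcomes.size() > slidingWindowSize) {
            boolean evictedWasFailure = outcomes.removeFirst(); // drop the oldest outcome
            if (evictedWasFailure) {
                failures--;
            }
        }
        boolean enoughCalls = outcomes.size() >= minimumNumberOfCalls;
        if (enoughCalls && failureRate() >= failureRateThreshold) {
            state = State.OPEN;
        }
    }

    double failureRate() {
        return outcomes.isEmpty() ? 0.0 : 100.0 * failures / outcomes.size();
    }

    State state() {
        return state;
    }
}
```

With a window of 20, a minimum of 10 calls, and a 50% threshold, nine calls with five failures leave the sketch CLOSED because the sample is still too small; a tenth failing call opens it.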

4) Example: service calling external HTTP API (synchronous)

This example demonstrates combining annotations plus fallback.

ExternalClient.java — a thin HTTP client (using RestTemplate)

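import java.time.Duration;
import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service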
public class ExternalClient {

    private final RestTemplate restTemplate;

    public ExternalClient(RestTemplateBuilder builder) {
        this.restTemplate = builder
            .setReadTimeout(Duration.ofSeconds(5))
            .setConnectTimeout(Duration.ofSeconds(2))
            .build();
    }

    public String getRemoteData(String id) {
        // URI template variable: RestTemplate expands and encodes the id
        return restTemplate.getForObject(
            "https://external.service/api/resource/{id}", String.class, id);
    }
}

ResilientService.java — apply Resilience4j annotations

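import io.github.resilience4j.bulkhead.BulkheadFullException;
import io.github.resilience4j.bulkhead.annotation.Bulkhead;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.ratelimiter.RequestNotPermitted;
import io.github.resilience4j.ratelimiter.annotation.RateLimiter;
import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.stereotype.Service;

@Service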
public class ResilientService {

    private final ExternalClient externalClient;

    public ResilientService(ExternalClient externalClient) {
        this.externalClient = externalClient;
    }

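    // Combines RateLimiter + semaphore Bulkhead + Retry + CircuitBreaker.
    // Caveat: with the default aspect order Retry wraps CircuitBreaker, and circuitFallback
    // accepts any Throwable, so that fallback absorbs failures before Retry can see them.
    // If you need retries to happen, keep a single catch-all fallback on the outermost layer.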
    @RateLimiter(name = "externalServiceRateLimiter", fallbackMethod = "rateLimiterFallback")
    @Bulkhead(name = "semaphoreBulkhead", type = Bulkhead.Type.SEMAPHORE, fallbackMethod = "bulkheadFallback")
    @Retry(name = "externalServiceRetry", fallbackMethod = "retryFallback")
    @CircuitBreaker(name = "externalServiceCB", fallbackMethod = "circuitFallback")
    public String getData(String id) {
        return externalClient.getRemoteData(id);
    }

    // Fallback signatures: same return type and either same args + Throwable or same args + Exception
    public String circuitFallback(String id, Throwable t) {
        // fallback behavior for circuit breaker
        return "circuit-fallback: cached-or-default";
    }

    public String retryFallback(String id, Throwable t) {
        return "retry-fallback: sorry";
    }

    public String bulkheadFallback(String id, BulkheadFullException ex) {
        return "bulkhead-fallback: overloaded";
    }

    public String rateLimiterFallback(String id, RequestNotPermitted ex) {
        return "rate-limited-fallback: try-later";
    }
}

Important notes about fallback signatures:

  • Method name must match fallbackMethod (case-sensitive).
  • Fallback method parameters must be the original method parameters plus optionally a final Throwable/Exception parameter.
  • Return type must match.

5) TimeLimiter (async) + ThreadPoolBulkhead example

AsyncResilientService.java — asynchronous pattern (TimeLimiter + ThreadPool Bulkhead)

import io.github.resilience4j.bulkhead.BulkheadFullException;
import io.github.resilience4j.bulkhead.annotation.Bulkhead;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.timelimiter.annotation.TimeLimiter;
import java.util.concurrent.CompletableFuture;
import org.springframework.stereotype.Service;

@Service
public class AsyncResilientService {

    private final ExternalClient externalClient;

    public AsyncResilientService(ExternalClient externalClient) {
        this.externalClient = externalClient;
    }

    // TimeLimiter expects a CompletionStage / CompletableFuture return type
    @Bulkhead(name = "threadPoolBulkhead", type = Bulkhead.Type.THREADPOOL, fallbackMethod = "tpbFallback")
    @TimeLimiter(name = "externalServiceTL", fallbackMethod = "timeLimiterFallback")
    @CircuitBreaker(name = "externalServiceCB", fallbackMethod = "circuitFallback")
    public CompletableFuture<String> getDataAsync(String id) {
        // The THREADPOOL bulkhead already runs this method on its own pool, so do the
        // blocking call here instead of handing it to a second, unmanaged executor.
        return CompletableFuture.completedFuture(externalClient.getRemoteData(id));
    }

    public CompletableFuture<String> tpbFallback(String id, BulkheadFullException ex) {
        return CompletableFuture.completedFuture("threadpool-bulkhead-fallback");
    }

    public CompletableFuture<String> timeLimiterFallback(String id, Throwable t) {
        return CompletableFuture.completedFuture("time-limiter-fallback");
    }

    public CompletableFuture<String> circuitFallback(String id, Throwable t) {
        return CompletableFuture.completedFuture("circuit-fallback-async");
    }
}

Notes:

  • @TimeLimiter works on CompletionStage / CompletableFuture.
  • @Bulkhead with Type.THREADPOOL requires a CompletionStage / CompletableFuture return type; a plain Future is not supported.
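
The idea behind bounding the latency of an async call can be sketched with the JDK alone. This is not how Resilience4j's TimeLimiter is implemented; `TimeoutSketch` and `withTimeout` are hypothetical names, and `completeOnTimeout` needs Java 9 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Plain-JDK sketch of a time limit on an async call; names are hypothetical. */
final class TimeoutSketch {

    /**
     * Runs the blocking call on the common pool and completes with the fallback value
     * if no result has arrived within timeoutMillis.
     * Note: the slow task is not interrupted; it keeps occupying its thread until it ends.
     */
    static <T> CompletableFuture<T> withTimeout(Supplier<T> blockingCall, long timeoutMillis, T fallback) {
        return CompletableFuture.supplyAsync(blockingCall)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

Because the timed-out task keeps running, a time limit on its own does not free the worker thread, which is one reason to pair it with a bounded thread pool.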

6) Combining annotations — recommended order & rationale

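The order in which you stack the annotations on a method does not control the order in which they run. Resilience4j applies its aspects in a fixed default order, listed here from outermost to innermost:

  1. Retry: the outermost layer, so every attempt passes through all the layers below.
  2. CircuitBreaker: records the outcome of each attempt and prevents repeated calls to a failing service.
  3. RateLimiter: avoid hitting downstream too frequently.
  4. TimeLimiter: bound call latency (for async calls).
  5. Bulkhead: limit concurrency so your service doesn't exhaust resources.

In other words: Retry ( CircuitBreaker ( RateLimiter ( TimeLimiter ( Bulkhead ( method ) ) ) ) ). According to the Resilience4j documentation you can change this with the aspect-order properties, for example resilience4j.retry.retryAspectOrder and resilience4j.circuitbreaker.circuitBreakerAspectOrder (a higher value means higher priority). The stack below therefore behaves the same however its annotations are arranged: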

@RateLimiter(...)
@Bulkhead(...)
@CircuitBreaker(...)
@TimeLimiter(...)
@Retry(...)
public CompletableFuture<String> call() { ... }

Be careful: retries can amplify load — combine with circuit breakers and backoffs.
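
Why the nesting matters can be shown with two tiny hand-written decorators. This is a plain-JDK sketch, not Resilience4j code; `DecoratorOrderSketch`, `withRetry`, and `withFallback` are hypothetical helpers. It illustrates that a catch-all fallback placed inside a retry hides the failure, so the retry never triggers.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Plain-JDK sketch of decorator nesting; helper names are hypothetical. */
final class DecoratorOrderSketch {

    /** Calls the supplier up to maxAttempts times, rethrowing the last failure. */
    static <T> Supplier<T> withRetry(Supplier<T> inner, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return inner.get();
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last;
        };
    }

    /** Turns any runtime failure of the supplier into the fallback value. */
    static <T> Supplier<T> withFallback(Supplier<T> inner, T fallback) {
        return () -> {
            try {
                return inner.get();
            } catch (RuntimeException e) {
                return fallback;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> failing = () -> {
            calls.incrementAndGet();
            throw new IllegalStateException("down");
        };

        // Retry outside, fallback inside: the fallback swallows the error, so 1 call only.
        withRetry(withFallback(failing, "fallback"), 3).get();
        System.out.println("retry(fallback(fn)) calls: " + calls.get());

        calls.set(0);
        // Fallback outside, retry inside: 3 attempts, then the fallback value.
        withFallback(withRetry(failing, 3), "fallback").get();
        System.out.println("fallback(retry(fn)) calls: " + calls.get());
    }
}
```

Both variants return the fallback value, but the first makes one downstream call and the second makes three, so where the fallback sits decides how much load a failure generates.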


7) Example controller wiring it all together

@RestController
@RequestMapping("/api")
public class DemoController {

    private final ResilientService resilientService;
    private final AsyncResilientService asyncResilientService;

    public DemoController(ResilientService r, AsyncResilientService ar) {
        this.resilientService = r;
        this.asyncResilientService = ar;
    }

    @GetMapping("/sync/{id}")
    public ResponseEntity<String> sync(@PathVariable String id) {
        return ResponseEntity.ok(resilientService.getData(id));
    }

    @GetMapping("/async/{id}")
    public CompletableFuture<ResponseEntity<String>> async(@PathVariable String id) {
        return asyncResilientService.getDataAsync(id)
                .thenApply(ResponseEntity::ok);
    }
}

8) Customizing defaults programmatically

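You can also adjust the settings of a named circuit breaker in code by registering a CircuitBreakerConfigCustomizer bean:

import io.github.resilience4j.common.circuitbreaker.configuration.CircuitBreakerConfigCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class Resilience4jConfig {

    @Bean
    public CircuitBreakerConfigCustomizer externalServiceCbCustomizer() {
        // Applied on top of the application.yml settings for "externalServiceCB";
        // the window size here is only an example value.
        return CircuitBreakerConfigCustomizer.of("externalServiceCB",
            builder -> builder.slidingWindowSize(50));
    }
}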

(Resilience4j also supports configuring defaults via application.yml which is simpler for most teams.)


9) Metrics and observability

  1. Expose metrics:
     • Add micrometer-registry-prometheus and spring-boot-starter-actuator.
     • Resilience4j registers its meters with Micrometer; Prometheus can then scrape /actuator/prometheus.
  2. Key metrics to monitor:
     • CircuitBreaker state (OPEN/HALF_OPEN/CLOSED)
     • Failure rate, slow-call rate
     • Bulkhead queue sizes and rejected calls
     • Retry calls count and successes/failures
     • Timeouts
  3. Health indicators:
     • Set registerHealthIndicator: true on the circuit breaker (per instance or in a shared config) and management.health.circuitbreakers.enabled=true to include circuit breaker details in /actuator/health.
  4. Dashboards:
     • Use Grafana + Prometheus. Visualize CB states, failure rate trends, latency percentiles.

10) Tests

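Spring context test: mock the external client and simulate failures. The annotations are applied through AOP proxies, so the service must come from the Spring context; a test that creates it with new will not exercise them.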

@SpringBootTest
class ResilientServiceTest {

    @MockBean
    ExternalClient externalClient;

    @Autowired
    ResilientService service;

    @Test
    void whenExternalFailsCircuitOpensAndFallbackUsed() {
        when(externalClient.getRemoteData(anyString()))
            .thenThrow(new RuntimeException("down"));

        String result = service.getData("1");
        assertTrue(result.contains("fallback"));
    }
}

Integration test: use WireMock to simulate downstream behavior (timeouts, slow responses, 500s) and test circuit transitions and metrics.

Load test: use Gatling/JMeter together with fault injection and measure how the circuit breaker and bulkhead behave under load.


11) Tuning tips & best practices

  • Start conservative: permit enough calls for early testing, then tighten thresholds with real telemetry.
  • MinimumNumberOfCalls: set minimumNumberOfCalls so the circuit doesn't open on a tiny sample.
  • Half-open trials: allow a small number of calls to probe downstream (permittedNumberOfCallsInHalfOpenState).
  • Retries: use exponential backoff (customize it per instance) and make sure you only retry idempotent operations.
  • Bulkheads: prefer semaphore for low-latency operations and threadpool for blocking calls (DB, legacy blocking HTTP).
  • TimeLimiter: don't rely solely on TimeLimiter; combine with proper threadpool management to avoid exhaustion.
  • Fallbacks: return cached values or degrade gracefully. Avoid heavy logic in fallback methods.
  • Metrics: instrument the system and use alerts (e.g., circuit open > X minutes, failure rate > Y%).
  • Observability: trace distributed calls with OpenTelemetry/Zipkin and tag traces with circuit/bulkhead outcomes.
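
The exponential backoff tip can be made concrete with a few lines of arithmetic. This is a plain-JDK sketch under assumed parameters; `BackoffSketch` and `delay` are hypothetical names, and the cap is an addition for illustration rather than something the guide's configuration defines.

```java
import java.time.Duration;

/** Plain-JDK sketch of an exponential backoff schedule; names are hypothetical. */
final class BackoffSketch {

    /**
     * Wait before retry number `retry` (1-based): initial * multiplier^(retry - 1),
     * capped at max so the wait cannot grow without bound.
     */
    static Duration delay(Duration initial, double multiplier, int retry, Duration max) {
        if (retry < 1) {
            throw new IllegalArgumentException("retry must be >= 1");
        }
        double millis = initial.toMillis() * Math.pow(multiplier, retry - 1);
        return millis >= max.toMillis() ? max : Duration.ofMillis((long) millis);
    }
}
```

Starting from the 500ms wait used in the retry configuration with a multiplier of 2 and a 5s cap, the waits would be 500ms, 1s, 2s, 4s, and then 5s for every later retry.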

12) Common pitfalls

  • Retry + Bulkhead: retries inside the same process can exhaust concurrency — be careful with combining retry and semaphore bulkhead.
  • Retrying non-idempotent operations: can cause side effects (e.g., duplicate payments).
  • Fallback signature mismatch: causes runtime exceptions; ensure parameter order and types are correct.
  • Blocking calls on main server threads: even with a threadpool bulkhead, if your fallback or calling code blocks the request thread (for example by calling join() on the future), you can still exhaust the server's request threads.
  • Overly aggressive thresholds: opening circuits too early causes unnecessary failures.
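
For reasoning about the Retry + Bulkhead pitfall, it helps to see what a semaphore bulkhead amounts to. The following is a plain-JDK sketch, not Resilience4j's implementation; `SemaphoreBulkheadSketch` is a hypothetical name, and it models only the maxWaitDuration = 0 case where a full bulkhead rejects immediately.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Plain-JDK sketch of a semaphore bulkhead; class and method names are hypothetical. */
final class SemaphoreBulkheadSketch {
    private final Semaphore permits;

    SemaphoreBulkheadSketch(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the call if a permit is free; otherwise returns the fallback without waiting. */
    <T> T execute(Supplier<T> call, T fallback) {
        if (!permits.tryAcquire()) { // like maxWaitDuration = 0: reject at once
            return fallback;
        }
        try {
            return call.get();
        } finally {
            permits.release();       // always hand the permit back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Every attempt, including each retry, has to win a permit, and a caller that keeps retrying stays busy for longer, so retries and a small maxConcurrentCalls value should be sized together.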

13) Example project layout (suggested)

src/main/java/com/example/resilience/
  - config/
    - Resilience4jConfig.java
  - client/
    - ExternalClient.java
  - service/
    - ResilientService.java
    - AsyncResilientService.java
  - web/
    - DemoController.java
src/test/java/...
application.yml
pom.xml

14) Handy utilities & patterns

  • Cache Fallback: keep a small cache (Caffeine) for last-known-good responses and return in fallbacks.
  • Bulkhead metrics exporter: create a scheduled job to emit bulkhead queue metrics if you want fine-grained alerts.
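  • Circuit breaker event listener: subscribe to events for logging/alerts:

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import org.springframework.stereotype.Component;

@Component
public class CircuitBreakerEventListener {

    public CircuitBreakerEventListener(CircuitBreakerRegistry registry) {
        // Breakers that already exist when this bean is created
        registry.getAllCircuitBreakers().forEach(this::logTransitions);
        // Breakers that are created later, e.g. on the first annotated call
        registry.getEventPublisher()
            .onEntryAdded(event -> logTransitions(event.getAddedEntry()));
    }

    private void logTransitions(CircuitBreaker cb) {
        cb.getEventPublisher().onStateTransition(event ->
            // replace with your logger or alerting hook
            System.out.println("CB " + cb.getName() + " -> " + event.getStateTransition()));
    }
}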
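
The cache fallback idea from the list above can be sketched without any library. This is a minimal plain-JDK illustration, assuming an unbounded map is acceptable; `LastKnownGoodCache` is a hypothetical class, and a real service would more likely use a bounded cache such as Caffeine with expiry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Plain-JDK sketch of a last-known-good fallback cache; unbounded, for illustration only. */
final class LastKnownGoodCache<K, V> {
    private final ConcurrentMap<K, V> lastGood = new ConcurrentHashMap<>();

    /**
     * Calls the loader and remembers a successful result. If the loader throws,
     * returns the last remembered value for the key, or defaultValue if there is none.
     */
    V getOrFallback(K key, Function<K, V> loader, V defaultValue) {
        try {
            V fresh = loader.apply(key);
            if (fresh != null) {
                lastGood.put(key, fresh); // ConcurrentHashMap does not accept null values
            }
            return fresh;
        } catch (RuntimeException e) {
            return lastGood.getOrDefault(key, defaultValue);
        }
    }
}
```

In the annotation-based services, the same map could be written on success and read inside a fallback method, which keeps the fallback itself cheap.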

15) Quick reference: annotation usage examples

  • @CircuitBreaker(name = "myCb", fallbackMethod = "fallback")
  • @Bulkhead(name = "b1", type = Bulkhead.Type.SEMAPHORE, fallbackMethod = "fb")
  • @Bulkhead(name = "tpb", type = Bulkhead.Type.THREADPOOL, fallbackMethod = "fb") — method should be async (CompletableFuture)
  • @Retry(name = "r1", fallbackMethod = "fb")
  • @TimeLimiter(name = "tl1", fallbackMethod = "fb") — for async methods returning CompletionStage
  • @RateLimiter(name = "rl1", fallbackMethod = "fb")
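
As a mental model for what the rate limiter's limitForPeriod and limitRefreshPeriod settings mean, here is a small fixed-window sketch in plain Java. It is not Resilience4j's algorithm; `FixedWindowRateLimiterSketch` is a hypothetical class, and the clock is injected only so the behaviour can be checked deterministically.

```java
import java.util.function.LongSupplier;

/** Plain-JDK sketch of a fixed-window rate limiter that never waits; names are hypothetical. */
final class FixedWindowRateLimiterSketch {
    private final int limitForPeriod;
    private final long periodNanos;
    private final LongSupplier clock; // nanosecond clock, e.g. System::nanoTime
    private long windowStart;
    private int used;

    FixedWindowRateLimiterSketch(int limitForPeriod, long periodNanos, LongSupplier clock) {
        this.limitForPeriod = limitForPeriod;
        this.periodNanos = periodNanos;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Returns true if a permit is left in the current period (like timeoutDuration = 0). */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= periodNanos) {
            windowStart = now; // simplification: the new window starts at this call
            used = 0;
        }
        if (used < limitForPeriod) {
            used++;
            return true;
        }
        return false;
    }
}
```

In production code the clock argument would typically be `System::nanoTime`; a rejected call corresponds to the RequestNotPermitted case handled by the rate limiter fallback.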

16) Example: common tags for Prometheus metrics (snippet)

@Bean
public MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
    return registry -> registry.config().commonTags("application", "resilience-demo");
}

Add micrometer-registry-prometheus dependency and ensure /actuator/prometheus is exposed.


17) Final checklist before shipping to production

  • ✅ Feature flags for toggling aggressive resilience settings
  • ✅ End-to-end tests with injected downstream failures
  • ✅ Metrics & dashboards set up (Prometheus + Grafana)
  • ✅ Alerts on circuit open duration and failure rate thresholds
  • ✅ Observability (tracing) to correlate client and server traces
  • ✅ Documented fallback behaviors (what the system returns when degraded)
  • ✅ Load testing to validate bulkhead and threadpool sizing
