<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: srinivas reddy gouru</title>
    <description>The latest articles on DEV Community by srinivas reddy gouru (@srinivas_gouru_d26dc31f21).</description>
    <link>https://dev.to/srinivas_gouru_d26dc31f21</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3952982%2F742e0569-8b20-4ec8-8f6c-acb226355f7f.png</url>
      <title>DEV Community: srinivas reddy gouru</title>
      <link>https://dev.to/srinivas_gouru_d26dc31f21</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/srinivas_gouru_d26dc31f21"/>
    <language>en</language>
    <item>
      <title>How Kafka Handles Backpressure: Producer Buffers, Broker Quotas, and Consumer Flow Control</title>
      <dc:creator>srinivas reddy gouru</dc:creator>
      <pubDate>Mon, 08 Jun 2026 02:33:05 +0000</pubDate>
      <link>https://dev.to/srinivas_gouru_d26dc31f21/how-kafka-handles-backpressure-producer-buffers-broker-quotas-and-consumer-flow-control-39oh</link>
      <guid>https://dev.to/srinivas_gouru_d26dc31f21/how-kafka-handles-backpressure-producer-buffers-broker-quotas-and-consumer-flow-control-39oh</guid>
      <description>&lt;p&gt;Backpressure is the pressure that builds in a pipeline when the slow end cannot absorb what the fast end produces. In a reactive stream, the solution is explicit: the subscriber signals its demand upstream, and the publisher sends only as many items as it requested. Kafka takes a different approach. There is no protocol-level demand signal flowing from consumers back to producers. Instead, Kafka distributes the problem across three separate layers, each with its own configuration surface, and leaves the integration of those layers to you.&lt;/p&gt;

&lt;p&gt;Understanding which layer to reach for, and when, is the practical skill this article develops.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Producer Side: A Bounded Buffer That Blocks
&lt;/h2&gt;

&lt;p&gt;When your application calls &lt;code&gt;producer.send()&lt;/code&gt;, the record does not go directly to the broker. It lands in the &lt;code&gt;RecordAccumulator&lt;/code&gt;, an in-memory buffer the producer client maintains internally. The accumulator organizes records into per-partition deques of &lt;code&gt;ProducerBatch&lt;/code&gt; objects. A background &lt;code&gt;Sender&lt;/code&gt; thread drains ready batches to the appropriate brokers over the network.&lt;/p&gt;

&lt;p&gt;This separation is where producer-side backpressure lives. The accumulator has a finite size, controlled by &lt;code&gt;buffer.memory&lt;/code&gt; (default: 32 MB). As long as the &lt;code&gt;Sender&lt;/code&gt; drains batches faster than your application produces records, everything stays well under that limit. When it does not, because the broker is slow, the network is saturated, or you are simply writing more than the brokers can accept, the buffer fills up.&lt;/p&gt;

&lt;p&gt;At that point, &lt;code&gt;send()&lt;/code&gt; blocks. The calling thread waits inside &lt;code&gt;BufferPool.allocate()&lt;/code&gt; until the &lt;code&gt;Sender&lt;/code&gt; frees enough memory or &lt;code&gt;max.block.ms&lt;/code&gt; (default: 60 seconds) expires. If the timeout fires first, the client throws a &lt;code&gt;BufferExhaustedException&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The blocking behavior is the implicit backpressure signal on the producer side. Your application slows down because &lt;code&gt;send()&lt;/code&gt; will not return until there is room. The 60-second default means a slow broker can cause your producers to stall quietly for up to a minute before surfacing an error. This is worth knowing when diagnosing latency spikes that seem to appear without warning.&lt;/p&gt;

&lt;p&gt;In practice, lowering &lt;code&gt;max.block.ms&lt;/code&gt; to 5-10 seconds and handling the resulting exception explicitly gives you faster feedback and a clear place to shed load, rather than waiting for Kafka to silently unblock.&lt;/p&gt;
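&lt;p&gt;The block-then-time-out behaviour can be illustrated with a toy model built only on the JDK. This is a sketch of the idea, not Kafka's &lt;code&gt;BufferPool&lt;/code&gt;: the class name &lt;code&gt;BoundedBufferSketch&lt;/code&gt; and its methods are hypothetical, and a semaphore stands in for the byte accounting the real client does.&lt;/p&gt;

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of a bounded producer buffer (hypothetical, not Kafka's BufferPool).
class BoundedBufferSketch {
    private final Semaphore freeBytes; // bytes still available in the buffer
    private final long maxBlockMs;     // plays the role of max.block.ms

    BoundedBufferSketch(int capacityBytes, long maxBlockMs) {
        this.freeBytes = new Semaphore(capacityBytes, true);
        this.maxBlockMs = maxBlockMs;
    }

    // Blocks until `size` bytes are free or the timeout expires.
    // A false return is the point where a caller would shed load.
    boolean allocate(int size) {
        try {
            return freeBytes.tryAcquire(size, maxBlockMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Called by the draining side once a batch has left the buffer.
    void release(int size) {
        freeBytes.release(size);
    }

    public static void main(String[] args) {
        BoundedBufferSketch buffer = new BoundedBufferSketch(1024, 50);
        System.out.println(buffer.allocate(1024)); // buffer had room
        System.out.println(buffer.allocate(1));    // buffer full: waits, then gives up
        buffer.release(1024);
        System.out.println(buffer.allocate(512));  // room again after the release
    }
}
```

&lt;p&gt;A short timeout turns a silent stall into an explicit result the caller can act on, which is the argument for lowering &lt;code&gt;max.block.ms&lt;/code&gt;.&lt;/p&gt;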




&lt;h2&gt;
  
  
  How Batching Softens Burst Pressure
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;RecordAccumulator&lt;/code&gt; does not forward records to the &lt;code&gt;Sender&lt;/code&gt; one at a time. It accumulates them into batches, flushing a batch when either it reaches &lt;code&gt;batch.size&lt;/code&gt; bytes or &lt;code&gt;linger.ms&lt;/code&gt; milliseconds have passed since the first record arrived in that batch, whichever comes first.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;batch.size&lt;/code&gt; defaults to 16 KB and &lt;code&gt;linger.ms&lt;/code&gt; defaults to 5 ms as of Kafka 4.0 (it was 0 in earlier versions, which caused the producer to send records immediately). A &lt;code&gt;linger.ms&lt;/code&gt; of 0 is right for low-latency use cases but leaves throughput on the table for anything bulk-oriented. A &lt;code&gt;linger.ms&lt;/code&gt; of 5-50 ms allows the accumulator to fill batches more completely, reducing the number of network round trips the &lt;code&gt;Sender&lt;/code&gt; has to make and easing the per-request load on brokers during traffic spikes.&lt;/p&gt;
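&lt;p&gt;The "size or time, whichever comes first" rule can be written as a single predicate. The sketch below is a simplification under that assumption; the real client takes other conditions into account, and the class and method names are hypothetical.&lt;/p&gt;

```java
// Simplified model of the batch flush rule described above (hypothetical names).
class BatchReadinessSketch {
    // A batch is ready once it is full or has lingered long enough.
    static boolean isReady(int batchBytes, int batchSizeBytes, long waitedMs, long lingerMs) {
        boolean full = batchBytes >= batchSizeBytes;
        boolean lingerExpired = waitedMs >= lingerMs;
        return full || lingerExpired;
    }

    public static void main(String[] args) {
        int batchSize = 16 * 1024; // batch.size default quoted in the text
        long lingerMs = 5;         // linger.ms default in Kafka 4.0, per the text
        System.out.println(isReady(2_000, batchSize, 1, lingerMs));  // small and young
        System.out.println(isReady(16_384, batchSize, 1, lingerMs)); // full
        System.out.println(isReady(2_000, batchSize, 5, lingerMs));  // linger elapsed
    }
}
```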

&lt;p&gt;Compression works at the batch level, so the fuller the batch, the better the ratio. Enabling &lt;code&gt;compression.type&lt;/code&gt; (&lt;code&gt;lz4&lt;/code&gt; or &lt;code&gt;zstd&lt;/code&gt; being the practical choices) on top of sensible batching settings can meaningfully reduce the bytes-per-second your producers push to the network, leaving more headroom before the buffer fills.&lt;/p&gt;




&lt;h2&gt;
  
  
  Broker Quota Throttling
&lt;/h2&gt;

&lt;p&gt;Producer-side buffering protects individual producers from overwhelming themselves. Broker quotas protect the cluster from being overwhelmed by any single client.&lt;/p&gt;

&lt;p&gt;Kafka supports three quota types, each applied per user or client ID on a per-broker basis:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Quota type&lt;/th&gt;
&lt;th&gt;What it limits&lt;/th&gt;
&lt;th&gt;Available since&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Produce quota&lt;/td&gt;
&lt;td&gt;Bytes per second written&lt;/td&gt;
&lt;td&gt;Kafka 0.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fetch quota&lt;/td&gt;
&lt;td&gt;Bytes per second read&lt;/td&gt;
&lt;td&gt;Kafka 0.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request quota&lt;/td&gt;
&lt;td&gt;Percentage of broker request-handler CPU time&lt;/td&gt;
&lt;td&gt;Kafka 0.11&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

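&lt;p&gt;When a client exceeds its quota, the broker does not reject the request. Instead, it computes a throttle delay from how far the client overshot its limit, with the rate measured across a sliding window of short samples (the Kafka documentation uses 30 one-second windows as its example; the broker settings &lt;code&gt;quota.window.num&lt;/code&gt; and &lt;code&gt;quota.window.size.seconds&lt;/code&gt; control the actual values). For a produce quota of 10 MB/s, a client whose measured rate is 15 MB/s over a one-second window receives a throttle delay of roughly 500 ms.&lt;/p&gt;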
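&lt;p&gt;The 500 ms figure follows if the delay is taken to be the relative overshoot multiplied by the measurement window. The sketch below works through that arithmetic under that assumption; it is not the broker's code, and the names are hypothetical.&lt;/p&gt;

```java
// Worked arithmetic for the throttle example above (hypothetical helper).
class ThrottleDelaySketch {
    // delay = (observedRate - quota) / quota * window
    static long throttleMs(double observedBytesPerSec, double quotaBytesPerSec, long windowMs) {
        if (observedBytesPerSec > quotaBytesPerSec) {
            double overshoot = (observedBytesPerSec - quotaBytesPerSec) / quotaBytesPerSec;
            return Math.round(overshoot * windowMs);
        }
        return 0L; // at or under quota: no delay
    }

    public static void main(String[] args) {
        double quota = 10.0 * 1024 * 1024;    // 10 MB/s
        double observed = 15.0 * 1024 * 1024; // 15 MB/s
        System.out.println(throttleMs(observed, quota, 1_000) + " ms");
    }
}
```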

&lt;p&gt;For produce requests, the broker returns the response immediately but includes a non-zero &lt;code&gt;ThrottleTimeMs&lt;/code&gt; field. The client SDK reads this value and pauses outgoing requests for that duration. For fetch requests, the response contains no data and the client backs off before its next poll. The broker also mutes the client's network channel during the delay window, so clients that ignore &lt;code&gt;ThrottleTimeMs&lt;/code&gt; still cannot push more requests through.&lt;/p&gt;

&lt;p&gt;You configure quotas at runtime with no broker restart required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set a produce quota of 10 MB/s for a specific client ID&lt;/span&gt;
kafka-configs.sh &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; localhost:9092 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--alter&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--add-config&lt;/span&gt; &lt;span class="s1"&gt;'producer_byte_rate=10485760'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--entity-type&lt;/span&gt; clients &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--entity-name&lt;/span&gt; my-producer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thing to account for in quota math: quotas are per broker, not per cluster. A quota of 50 MB/s on a 6-broker cluster allows up to 300 MB/s of aggregate throughput for that client if its writes are spread evenly across brokers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Consumer Flow Control: Pause, Resume, and Poll Sizing
&lt;/h2&gt;

&lt;p&gt;On the consumption side, Kafka's pull-based model is itself a form of flow control. Your consumer calls &lt;code&gt;poll()&lt;/code&gt; at its own pace; the broker sends back available records up to &lt;code&gt;fetch.max.bytes&lt;/code&gt; (default: 50 MB) and &lt;code&gt;max.partition.fetch.bytes&lt;/code&gt; (default: 1 MB per partition). You control throughput entirely by controlling when and how often you call &lt;code&gt;poll()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Two configs bound how much work each poll cycle returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;max.poll.records&lt;/code&gt; (default: 500): the maximum number of records returned per &lt;code&gt;poll()&lt;/code&gt; call&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max.poll.interval.ms&lt;/code&gt; (default: 5 minutes): the maximum time allowed between consecutive polls before the consumer is considered failed and leaves the group, triggering a rebalance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your downstream processing is slow, &lt;code&gt;max.poll.records&lt;/code&gt; is the first lever to reach for. Dropping it from 500 to 50 means each poll cycle hands you less work, you return to calling &lt;code&gt;poll()&lt;/code&gt; faster, and you stay well inside the &lt;code&gt;max.poll.interval.ms&lt;/code&gt; window. This alone prevents a large class of rebalance storms that masquerade as "Kafka being slow."&lt;/p&gt;
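&lt;p&gt;One way to pick a value is to work backwards from measured processing time. The helper below is a rough sizing sketch, not a Kafka API: the &lt;code&gt;headroom&lt;/code&gt; fraction and the 200 ms per-record figure are assumptions chosen for illustration.&lt;/p&gt;

```java
// Rough sizing helper for max.poll.records (hypothetical, not part of Kafka).
class PollSizingSketch {
    // Largest record count whose processing fits in a fraction of the poll interval.
    static int maxPollRecords(long maxPollIntervalMs, long perRecordMs, double headroom) {
        long budgetMs = (long) (maxPollIntervalMs * headroom);
        long records = budgetMs / Math.max(1L, perRecordMs);
        // Never return less than 1, and stay within int range.
        return (int) Math.max(1L, Math.min(records, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long fiveMinutesMs = 300_000; // max.poll.interval.ms default from the text
        // Assumed: 200 ms per record, and only half the interval spent processing.
        System.out.println(maxPollRecords(fiveMinutesMs, 200, 0.5));
    }
}
```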

&lt;p&gt;For more dynamic control, &lt;code&gt;KafkaConsumer&lt;/code&gt; exposes &lt;code&gt;pause(Collection&amp;lt;TopicPartition&amp;gt;)&lt;/code&gt; and &lt;code&gt;resume(Collection&amp;lt;TopicPartition&amp;gt;)&lt;/code&gt;. When you pause a partition, subsequent &lt;code&gt;poll()&lt;/code&gt; calls return no records from that partition. The consumer remains in the group and continues sending heartbeats; it just stops fetching. This is the right tool when your processing is async (thread pools, HTTP calls, database writes) and you need to prevent a downstream queue from growing without bound.&lt;/p&gt;

&lt;p&gt;A practical pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;ConsumerRecords&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ConsumerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;processingQueue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// blocks naturally when the queue is full&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processingQueue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;HIGH_WATERMARK&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;assignment&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processingQueue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;LOW_WATERMARK&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;resume&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;assignment&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



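&lt;p&gt;The one thing to watch: pausing does not exempt the consumer from polling. &lt;code&gt;max.poll.interval.ms&lt;/code&gt; still applies, so the loop must keep calling &lt;code&gt;poll()&lt;/code&gt; while partitions are paused. In the loop above, the risk is the blocking &lt;code&gt;put()&lt;/code&gt;: if the queue stays full for longer than &lt;code&gt;max.poll.interval.ms&lt;/code&gt;, the loop never gets back to &lt;code&gt;poll()&lt;/code&gt; and the consumer is removed from the group. Keep &lt;code&gt;HIGH_WATERMARK&lt;/code&gt; far enough below the queue's capacity that the pause takes effect before &lt;code&gt;put()&lt;/code&gt; ever blocks for long.&lt;/p&gt;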
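&lt;p&gt;The pause/resume decision in the loop above is a hysteresis rule, and it can be isolated and tested without a broker. The class below is a hypothetical sketch of that rule only; the comments mark where the real &lt;code&gt;consumer.pause()&lt;/code&gt; and &lt;code&gt;consumer.resume()&lt;/code&gt; calls would go.&lt;/p&gt;

```java
// Hysteresis gate for pause/resume decisions (hypothetical helper).
class WatermarkGateSketch {
    private final int lowWatermark;
    private final int highWatermark;
    private boolean paused = false;

    WatermarkGateSketch(int lowWatermark, int highWatermark) {
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
    }

    // Returns true while fetching should stay paused.
    boolean update(int queueSize) {
        if (queueSize > highWatermark) {
            paused = true;  // where consumer.pause(consumer.assignment()) would go
        } else if (lowWatermark > queueSize) {
            paused = false; // where consumer.resume(consumer.assignment()) would go
        }
        return paused;      // between the watermarks the previous state is kept
    }

    public static void main(String[] args) {
        WatermarkGateSketch gate = new WatermarkGateSketch(10, 100);
        int[] queueSizes = {50, 101, 50, 9};
        for (int size : queueSizes) {
            System.out.println(size + " paused=" + gate.update(size));
        }
    }
}
```

&lt;p&gt;Keeping the state between the two watermarks is what stops the consumer from flapping between paused and resumed on every poll.&lt;/p&gt;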




&lt;h2&gt;
  
  
  Kafka Streams: Backpressure for Free
&lt;/h2&gt;

&lt;p&gt;Kafka Streams applications are simultaneously consumers and producers: they read from input topics, transform records, and write results to output topics. Because they use the standard consumer client under the hood, backpressure emerges naturally from the pull-based poll rhythm.&lt;/p&gt;

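&lt;p&gt;If processing slows, the Streams thread takes longer per poll cycle. This means fewer records flow through per unit of time, without any explicit pause/resume logic on your part. In-memory accumulation stays bounded as well: Streams buffers a limited number of records per partition (&lt;code&gt;buffered.records.per.partition&lt;/code&gt;, default 1000) and pauses fetching for partitions whose buffer is full.&lt;/p&gt;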

&lt;p&gt;You still care about &lt;code&gt;max.poll.records&lt;/code&gt; and &lt;code&gt;commit.interval.ms&lt;/code&gt;. A slow processing stage increases the uncommitted offset window; if a Streams application restarts, it replays all uncommitted records. Shorter commit intervals reduce that replay window, at the cost of more offset commits to the &lt;code&gt;__consumer_offsets&lt;/code&gt; topic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Techniques at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;Where it lives&lt;/th&gt;
&lt;th&gt;What it protects&lt;/th&gt;
&lt;th&gt;Key config&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Producer buffer blocking&lt;/td&gt;
&lt;td&gt;Producer client&lt;/td&gt;
&lt;td&gt;Slows the caller when broker is slow&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;buffer.memory&lt;/code&gt;, &lt;code&gt;max.block.ms&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch accumulation&lt;/td&gt;
&lt;td&gt;Producer client&lt;/td&gt;
&lt;td&gt;Smooths throughput spikes&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;batch.size&lt;/code&gt;, &lt;code&gt;linger.ms&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression&lt;/td&gt;
&lt;td&gt;Producer client&lt;/td&gt;
&lt;td&gt;Reduces wire bytes, extends buffer headroom&lt;/td&gt;
&lt;td&gt;&lt;code&gt;compression.type&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Broker quotas&lt;/td&gt;
&lt;td&gt;Broker&lt;/td&gt;
&lt;td&gt;Prevents noisy clients from monopolizing resources&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;producer_byte_rate&lt;/code&gt;, &lt;code&gt;consumer_byte_rate&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Poll sizing&lt;/td&gt;
&lt;td&gt;Consumer client&lt;/td&gt;
&lt;td&gt;Bounds records per processing cycle&lt;/td&gt;
&lt;td&gt;&lt;code&gt;max.poll.records&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pause/Resume&lt;/td&gt;
&lt;td&gt;Consumer client&lt;/td&gt;
&lt;td&gt;Dynamic per-partition flow control&lt;/td&gt;
&lt;td&gt;&lt;code&gt;KafkaConsumer.pause()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka Streams pull model&lt;/td&gt;
&lt;td&gt;Streams runtime&lt;/td&gt;
&lt;td&gt;Implicit throttling via poll cadence&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;max.poll.records&lt;/code&gt;, &lt;code&gt;commit.interval.ms&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Where to Start
&lt;/h2&gt;

&lt;p&gt;Begin on the consumer side. Measure your per-message processing time, then set &lt;code&gt;max.poll.records&lt;/code&gt; so one poll cycle's work fits inside &lt;code&gt;max.poll.interval.ms&lt;/code&gt; with room to spare. This one change removes a common cause of rebalance-driven consumer lag before you need to touch anything else.&lt;/p&gt;

&lt;p&gt;Add &lt;code&gt;pause()&lt;/code&gt;/&lt;code&gt;resume()&lt;/code&gt; if your processing is async and you need to protect a downstream queue from growing without bound. Keep your watermarks far enough apart that the processing thread can drain between poll cycles.&lt;/p&gt;

&lt;p&gt;On the producer side, decide how you want failures to surface. The 60-second &lt;code&gt;max.block.ms&lt;/code&gt; default means a stalled producer is invisible for up to a minute. Lowering it to 5-10 seconds and treating &lt;code&gt;BufferExhaustedException&lt;/code&gt; as a load signal gives you faster, actionable feedback. Pair this with appropriate &lt;code&gt;linger.ms&lt;/code&gt; and a compression codec to increase throughput before the buffer becomes the constraint.&lt;/p&gt;

&lt;p&gt;Broker quotas earn their configuration cost in multi-tenant clusters or anywhere you have a mix of batch and interactive workloads sharing the same brokers. A runaway batch ingest job should not be able to saturate the cluster and introduce latency spikes for latency-sensitive consumers.&lt;/p&gt;

&lt;p&gt;Consumer lag, visible through &lt;code&gt;kafka-consumer-groups.sh --describe&lt;/code&gt; or the JMX metric &lt;code&gt;records-lag-max&lt;/code&gt;, is the single health signal that ties all these mechanisms together. Sustained, growing lag tells you the consumers are falling behind before the system enters a failure mode. The root cause might be anywhere in this stack, but the lag metric is where you start the investigation.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>backend</category>
      <category>distributedsystems</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Five Spring Boot Patterns That Prevent Production Failures</title>
      <dc:creator>srinivas reddy gouru</dc:creator>
      <pubDate>Sat, 06 Jun 2026 00:33:18 +0000</pubDate>
      <link>https://dev.to/srinivas_gouru_d26dc31f21/five-spring-boot-patterns-that-prevent-production-failures-configuration-shutdown-async-circuit-50e6</link>
      <guid>https://dev.to/srinivas_gouru_d26dc31f21/five-spring-boot-patterns-that-prevent-production-failures-configuration-shutdown-async-circuit-50e6</guid>
      <description>&lt;h2&gt;
  
  
  The Gap Between Tutorial Code and Production Reality
&lt;/h2&gt;

&lt;p&gt;Spring Boot is an opinionated framework for building Java services quickly. It auto-configures your web server, data source, and application context so you can focus on writing business logic rather than plumbing. This article is for engineers who have a Spring Boot service running and want to know which additional patterns separate a service that survives real traffic from one that quietly misbehaves under it.&lt;/p&gt;

&lt;p&gt;Consider a concrete example. A downstream API starts responding slowly. Tomcat threads pile up waiting for a response that never comes, because Spring Boot sets no default read timeout. [1] With 60 threads hung on the slow dependency, only 140 of the default 200 remain to serve real traffic, and throughput drops accordingly. [1] In a busier environment, the pool exhausts entirely, and the service stops accepting new requests without throwing a single exception.&lt;/p&gt;

&lt;p&gt;This failure mode does not appear in any beginner Spring Boot tutorial, and that's not an accident of curriculum design. Tutorials optimise for getting something running. Production optimises for keeping it running when the environment misbehaves: when a downstream service hangs, when a deployment interrupts in-flight requests, when configuration differs between staging and prod, when a dependency starts failing intermittently overnight.&lt;/p&gt;

&lt;p&gt;Five patterns address five distinct categories of that production reality. Externalised configuration with profiles keeps environment-specific values out of your artefact, making changes auditable and safe. Graceful shutdown with lifecycle hooks ensures that a rolling deployment doesn't drop mid-flight requests when the process receives a termination signal. [2] Structured async processing with &lt;code&gt;@Async&lt;/code&gt; and explicit thread pool configuration moves slow work off the request thread, so a sluggish job can't starve the rest of the application. Circuit breaking with Resilience4j gives the service a way to stop calling a failing dependency, rather than letting failures cascade. Health and readiness probes with Actuator tell your orchestration layer whether the service is actually ready to receive traffic, not just whether the process started.&lt;/p&gt;

&lt;p&gt;Each of these patterns is already available in the standard Spring Boot dependency tree. None requires a major architectural change. What they require is knowing they exist, understanding what failure they prevent, and wiring them in before traffic is real enough to expose the gap. The rest of this article covers each pattern in turn: what it does, how to configure it, and the specific failure scenario it prevents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 1, Externalised Configuration with Profiles: Eliminating Environment Mismatch
&lt;/h2&gt;

&lt;p&gt;A quieter but common failure happens before the service ever receives real traffic: the application runs fine locally, ships to production, and immediately misbehaves because it's still pointed at the dev database or using in-memory H2 instead of Postgres. The fix isn't discipline alone; it's understanding how Spring Boot resolves configuration values and building your config structure around that resolution order.&lt;/p&gt;

&lt;p&gt;Both &lt;code&gt;.properties&lt;/code&gt; and &lt;code&gt;.yml&lt;/code&gt; formats are supported for Spring Boot configuration files, and the two are functionally equivalent. The examples below use &lt;code&gt;.properties&lt;/code&gt; throughout; if your project uses &lt;code&gt;.yml&lt;/code&gt;, the same keys and values apply with YAML's indentation syntax instead.&lt;/p&gt;

&lt;p&gt;Spring Boot 2 evaluates property sources in a well-defined order of priority. Command-line arguments sit at the top, overriding everything below them. OS environment variables come next. Below those sit the profile-specific property files, and at the bottom sits the base &lt;code&gt;application.properties&lt;/code&gt; bundled inside your jar. This layering is the whole point: the jar is immutable, and every environment customises it by injecting values above the jar's baseline rather than by modifying what's baked in.&lt;/p&gt;

&lt;p&gt;Profile-specific files named &lt;code&gt;application-{profile}.properties&lt;/code&gt; placed &lt;em&gt;outside&lt;/em&gt; the jar take precedence over their counterparts &lt;em&gt;inside&lt;/em&gt; the jar. A Kubernetes deployment can mount an environment-specific &lt;code&gt;application-prod.properties&lt;/code&gt; as a ConfigMap volume, and Spring Boot will pick it up without any code change or rebuild. This is what allows a single deployable artefact to run correctly across dev, staging, and production. &lt;/p&gt;
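&lt;p&gt;The layering idea can be modelled with the JDK's own &lt;code&gt;java.util.Properties&lt;/code&gt;, which supports a chain of fallback defaults. This is only an analogy for the precedence order described above, not how Spring Boot implements it; the class name &lt;code&gt;LayeredConfigSketch&lt;/code&gt; is hypothetical and the four layers are a simplification.&lt;/p&gt;

```java
import java.util.Properties;

// Analogy for property-source precedence using chained defaults (hypothetical).
class LayeredConfigSketch {
    // Returns a new layer that falls back to the given lower-priority layer.
    static Properties layer(Properties lowerPriority) {
        return new Properties(lowerPriority);
    }

    public static void main(String[] args) {
        Properties jarDefaults = new Properties(); // application.properties in the jar
        jarDefaults.setProperty("spring.application.name", "order-service");
        jarDefaults.setProperty("spring.jpa.show-sql", "true");

        Properties profileFile = layer(jarDefaults); // application-prod.properties
        profileFile.setProperty("spring.jpa.show-sql", "false");

        Properties envVars = layer(profileFile);     // OS environment variables
        Properties commandLine = layer(envVars);     // command-line arguments
        commandLine.setProperty("logging.level.com.example", "WARN");

        // Lookups start at the top layer and fall through to the jar baseline.
        System.out.println(commandLine.getProperty("spring.jpa.show-sql"));
        System.out.println(commandLine.getProperty("spring.application.name"));
    }
}
```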

&lt;p&gt;A typical split looks like this. The base &lt;code&gt;application.properties&lt;/code&gt; holds values that genuinely don't change across environments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# application.properties
&lt;/span&gt;&lt;span class="py"&gt;spring.application.name&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;order-service&lt;/span&gt;
&lt;span class="py"&gt;spring.jpa.open-in-view&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;management.endpoints.web.exposure.include&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;health,info,prometheus&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dev profile keeps things frictionless locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# application-dev.properties
&lt;/span&gt;&lt;span class="py"&gt;spring.datasource.url&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;jdbc:h2:mem:orderdb&lt;/span&gt;
&lt;span class="py"&gt;spring.datasource.driver-class-name&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;org.h2.Driver&lt;/span&gt;
&lt;span class="py"&gt;spring.jpa.show-sql&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;logging.level.com.example&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;DEBUG&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Production tightens everything down:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# application-prod.properties
&lt;/span&gt;&lt;span class="py"&gt;spring.datasource.url&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;jdbc:postgresql://${DB_HOST}:5432/orderdb&lt;/span&gt;
&lt;span class="py"&gt;spring.datasource.username&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;${DB_USER}&lt;/span&gt;
&lt;span class="py"&gt;spring.datasource.password&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;${DB_PASSWORD}&lt;/span&gt;
&lt;span class="py"&gt;spring.jpa.show-sql&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;logging.level.com.example&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;WARN&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The prod file deliberately delegates secrets to environment variables using &lt;code&gt;${...}&lt;/code&gt; placeholders. If &lt;code&gt;DB_PASSWORD&lt;/code&gt; isn't set at startup, Spring Boot fails fast with a clear error rather than starting with a null password that only fails on the first query.&lt;/p&gt;

&lt;p&gt;Profile activation is straightforward. Locally, you add &lt;code&gt;spring.profiles.active=dev&lt;/code&gt; to your IDE run configuration or to a &lt;code&gt;.env&lt;/code&gt; file. In a container, you set the environment variable &lt;code&gt;SPRING_PROFILES_ACTIVE=prod&lt;/code&gt;. A command-line override, &lt;code&gt;--spring.profiles.active=staging&lt;/code&gt;, works for one-off deployments without touching any file.&lt;/p&gt;

&lt;p&gt;One thing worth being explicit about: the profile mechanism is not a secret store. Credentials should come through environment variables or a secrets manager like Vault, with the profile file holding only the placeholder reference. Teams that skip this step often end up with database passwords in version control, which is a much worse problem than a misconfigured datasource URL.&lt;/p&gt;

&lt;p&gt;With the environment mismatch handled, the next failure category is what happens when a pod is killed mid-request, and that requires a different approach entirely.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08i0hdz7u8gzlatwqi3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08i0hdz7u8gzlatwqi3u.png" alt=" " width="800" height="1726"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 2, Graceful Shutdown with Lifecycle Hooks: Preventing Dirty Mid-Request Kills
&lt;/h2&gt;

&lt;p&gt;When Kubernetes rolls out a new version of your service, it sends SIGTERM to the old pod. Without graceful shutdown enabled, the application shuts down on that signal without waiting for in-flight requests. Any request that was mid-flight (a database write half-committed, a payment being authorised, a file being streamed) gets cut off at the socket level. The client receives a connection-reset error and has no way to know whether the operation succeeded.&lt;/p&gt;

&lt;p&gt;Spring Boot 2.3 introduced a built-in graceful shutdown to address exactly this. [2] Enabling it takes two lines in &lt;code&gt;application.properties&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;server.shutdown&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;graceful&lt;/span&gt;
&lt;span class="py"&gt;spring.lifecycle.timeout-per-shutdown-phase&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;30s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;server.shutdown=graceful&lt;/code&gt;, the embedded server (Tomcat, Jetty, or Undertow) stops accepting new connections the moment SIGTERM arrives, then waits for active requests to complete before allowing the JVM to exit. [2] The &lt;code&gt;timeout-per-shutdown-phase&lt;/code&gt; property sets the ceiling: if in-flight requests haven't finished within 30 seconds, Spring proceeds with the shutdown anyway rather than hanging indefinitely. Tune that timeout to match your slowest plausible request, not your average one.&lt;/p&gt;
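&lt;p&gt;The same "stop accepting, drain, then give up at a ceiling" sequence can be seen in miniature with a plain JDK executor. This sketch is an analogy for the behaviour described above, not Spring Boot's implementation; &lt;code&gt;DrainWithTimeoutSketch&lt;/code&gt; and its methods are hypothetical names.&lt;/p&gt;

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Drain-with-ceiling pattern on a plain executor (hypothetical analogy).
class DrainWithTimeoutSketch {
    // Stops accepting new work, waits up to timeoutMs for in-flight tasks,
    // then forces shutdown. Returns true if everything finished in time.
    static boolean drain(ExecutorService pool, long timeoutMs) {
        pool.shutdown(); // no new tasks are accepted from here on
        try {
            if (pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow(); // ceiling reached: interrupt whatever is left
        return false;
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> sleepQuietly(100)); // stands in for an in-flight request
        System.out.println("drained cleanly: " + drain(pool, 2_000));
    }
}
```

&lt;p&gt;The timeout argument plays the role of &lt;code&gt;timeout-per-shutdown-phase&lt;/code&gt;: sized for the slowest plausible task, it lets normal work finish while still bounding how long shutdown can take.&lt;/p&gt;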

&lt;p&gt;Graceful shutdown handles the HTTP layer, but your application likely has other resources that need orderly cleanup: database connection pools, scheduled jobs mid-execution, Kafka consumer loops, open file handles. This is where &lt;code&gt;@PreDestroy&lt;/code&gt; and &lt;code&gt;ContextClosedEvent&lt;/code&gt; earn their place. [3] Annotate a method with &lt;code&gt;@PreDestroy&lt;/code&gt;, and Spring calls it during the bean's destruction phase, after the server has drained. For cross-cutting shutdown logic, such as flushing a metrics buffer or signalling worker threads to stop, implementing an &lt;code&gt;ApplicationListener&amp;lt;ContextClosedEvent&amp;gt;&lt;/code&gt; gives you a single, centralised place to coordinate cleanup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MetricsFlushListener&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ApplicationListener&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ContextClosedEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

 &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;MetricsBuffer&lt;/span&gt; &lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;MetricsFlushListener&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MetricsBuffer&lt;/span&gt; &lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;

 &lt;span class="nd"&gt;@Override&lt;/span&gt;
 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onApplicationEvent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ContextClosedEvent&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is one gap that Spring's built-in graceful shutdown cannot close on its own. Kubernetes removes a pod from the service endpoints list asynchronously, in parallel with sending SIGTERM rather than before it. There is a brief window, typically a few seconds, during which the pod is already refusing new connections, but the load balancer is still routing traffic to it. Requests that arrive in that window hit a closed socket.&lt;/p&gt;

&lt;p&gt;The standard mitigation is a Kubernetes &lt;code&gt;preStop&lt;/code&gt; hook that introduces a deliberate delay before the JVM begins shutting down. In distroless images where &lt;code&gt;sleep&lt;/code&gt; is unavailable, an Actuator-backed endpoint solves the same problem: a &lt;code&gt;GET /actuator/preStopHook/{delayInMillis}&lt;/code&gt; endpoint holds the preStop hook open for the specified duration, giving the control plane time to drain the pod from all endpoint slices before SIGTERM actually reaches the application. [4] A delay of 10 to 15 seconds covers most Kubernetes environments; the right value depends on your cluster's endpoint propagation latency.&lt;/p&gt;

&lt;p&gt;Once your service shuts down cleanly, the next failure category is what happens during normal operation when a slow downstream dependency starts consuming all your threads.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98r5h8v4on5zy4ek6vd1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98r5h8v4on5zy4ek6vd1.png" alt=" " width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 3, Structured Async Processing with &lt;code&gt;@Async&lt;/code&gt; and Thread Pool Tuning: Defeating Thread Starvation
&lt;/h2&gt;

&lt;p&gt;The thread-exhaustion scenario from the introduction had a specific cause: every slow downstream call was blocking a Tomcat worker thread, and the application had no way to offload that work elsewhere. Spring's &lt;code&gt;@Async&lt;/code&gt; mechanism exists precisely for this situation, but enabling it naively introduces a different failure mode that is just as damaging.&lt;/p&gt;

&lt;p&gt;When you add &lt;code&gt;@EnableAsync&lt;/code&gt; to a configuration class and annotate a service method with &lt;code&gt;@Async&lt;/code&gt;, Spring wraps the bean in a proxy. Calls to that method from outside the bean are intercepted and submitted to an executor, freeing the calling thread immediately. [5] The Tomcat request thread returns to the pool and can accept the next incoming request while the async work runs separately.&lt;/p&gt;

&lt;p&gt;The danger is in which executor Spring uses by default. Without an explicit executor bean, plain Spring Framework's &lt;code&gt;@Async&lt;/code&gt; falls back to &lt;code&gt;SimpleAsyncTaskExecutor&lt;/code&gt;, which spawns a brand-new OS thread for every invocation and imposes no upper bound on how many can exist simultaneously. [6] Under load, this trades Tomcat thread exhaustion for JVM thread exhaustion; you get &lt;code&gt;OutOfMemoryError: unable to create new native thread&lt;/code&gt; instead of a timeout, which is harder to diagnose and just as fatal. [6] Spring Boot's task-execution auto-configuration normally supplies a pooled &lt;code&gt;applicationTaskExecutor&lt;/code&gt; instead, but its queue is unbounded by default, so explicit bounds are still worth setting.&lt;/p&gt;

&lt;p&gt;The fix is a named &lt;code&gt;ThreadPoolTaskExecutor&lt;/code&gt; bean with explicit bounds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Configuration&lt;/span&gt;
&lt;span class="nd"&gt;@EnableAsync&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AsyncConfig&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

 &lt;span class="nd"&gt;@Bean&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"taskExecutor"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Executor&lt;/span&gt; &lt;span class="nf"&gt;taskExecutor&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;ThreadPoolTaskExecutor&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolTaskExecutor&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
 &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setCorePoolSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setMaxPoolSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setQueueCapacity&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setThreadNamePrefix&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"async-worker-"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setRejectedExecutionHandler&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;CallerRunsPolicy&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
 &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;initialize&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The three sizing parameters interact in a way that surprises many engineers. [7] The pool creates threads on demand up to &lt;code&gt;corePoolSize&lt;/code&gt; and keeps them alive even when idle. New tasks go to the queue once all core threads are busy. Only after the queue fills up does the pool grow toward &lt;code&gt;maxPoolSize&lt;/code&gt;. Setting &lt;code&gt;queueCapacity&lt;/code&gt; to a generous but finite value (200 in the example above) means excess work is buffered rather than immediately spawning new threads. The &lt;code&gt;CallerRunsPolicy&lt;/code&gt; rejection handler applies natural back-pressure by running rejected tasks on the submitting thread instead of throwing an exception, so when the pool and queue are both full, the caller slows down rather than failing outright. [8]&lt;/p&gt;
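&lt;p&gt;The admission order described above can be written down as a small decision function. This is a simplified model for illustration, assuming every live thread is busy; &lt;code&gt;PoolAdmissionSketch&lt;/code&gt;, &lt;code&gt;Decision&lt;/code&gt;, and &lt;code&gt;decide&lt;/code&gt; are hypothetical names, and the real logic lives inside &lt;code&gt;java.util.concurrent.ThreadPoolExecutor&lt;/code&gt;.&lt;/p&gt;

```java
// Models the order in which a bounded pool handles a newly submitted task.
// All names here are hypothetical; this is a sketch, not the JDK implementation.
final class PoolAdmissionSketch {

    enum Decision { START_CORE_THREAD, ENQUEUE, START_EXTRA_THREAD, REJECT }

    // threads = live worker threads (all assumed busy), queued = tasks waiting in the queue.
    static Decision decide(int corePoolSize, int maxPoolSize, int queueCapacity,
                           int threads, int queued) {
        if (corePoolSize > threads) {
            return Decision.START_CORE_THREAD;   // below core: start a thread
        }
        if (queueCapacity > queued) {
            return Decision.ENQUEUE;             // core busy: buffer before growing
        }
        if (maxPoolSize > threads) {
            return Decision.START_EXTRA_THREAD;  // queue full: grow toward max
        }
        return Decision.REJECT;                  // pool and queue full: rejection handler runs
    }

    public static void main(String[] args) {
        // Same bounds as the configuration above: core 10, max 25, queue 200.
        System.out.println(decide(10, 25, 200, 3, 0));
        System.out.println(decide(10, 25, 200, 10, 50));
        System.out.println(decide(10, 25, 200, 10, 200));
        System.out.println(decide(10, 25, 200, 25, 200));
    }
}
```

&lt;p&gt;The third case is the one that tends to surprise: the pool only grows past ten threads once all 200 queue slots are taken.&lt;/p&gt;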

&lt;p&gt;One subtlety worth understanding: &lt;code&gt;@Async&lt;/code&gt; works via Spring's proxy mechanism, which means the proxy is only engaged when the call crosses a bean boundary. If a method inside &lt;code&gt;NotificationService&lt;/code&gt; calls another &lt;code&gt;@Async&lt;/code&gt; method on the same &lt;code&gt;NotificationService&lt;/code&gt; instance via &lt;code&gt;this.sendEmail(...)&lt;/code&gt;, the proxy is bypassed entirely, and the method runs synchronously on the caller's thread. To get async dispatch from within the same class, inject a reference to &lt;code&gt;self&lt;/code&gt; via &lt;code&gt;@Autowired&lt;/code&gt; and call through that reference, or move the async method to a separate bean.&lt;/p&gt;
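&lt;p&gt;A hand-written stand-in for the proxy makes the bypass easy to see. The types below (&lt;code&gt;Notifier&lt;/code&gt;, &lt;code&gt;PlainNotifier&lt;/code&gt;, &lt;code&gt;CountingProxy&lt;/code&gt;) are hypothetical and only mimic the shape of what Spring generates at runtime; the counter marks the point where Spring would hand the call to the executor.&lt;/p&gt;

```java
// Hand-written stand-in for a Spring proxy, showing why this.sendEmail(...) is not intercepted.
// Every type here is hypothetical and exists only for illustration.
interface Notifier {
    void sendEmail(String recipient);
    void sendWelcome(String recipient);
}

final class PlainNotifier implements Notifier {
    public void sendEmail(String recipient) {
        // the real work would happen here
    }

    public void sendWelcome(String recipient) {
        this.sendEmail(recipient); // direct call on the target object: no proxy involved
    }
}

final class CountingProxy implements Notifier {
    private final Notifier target;
    int interceptedSendEmail; // how often the interception point was reached

    CountingProxy(Notifier target) {
        this.target = target;
    }

    public void sendEmail(String recipient) {
        interceptedSendEmail++;        // Spring would submit to the executor here
        target.sendEmail(recipient);
    }

    public void sendWelcome(String recipient) {
        target.sendWelcome(recipient); // not an async method: plain delegation
    }
}

final class SelfInvocationSketch {

    // Returns how many sendEmail calls the proxy saw for the chosen entry point.
    static int interceptions(boolean callSendEmailThroughProxy) {
        CountingProxy proxy = new CountingProxy(new PlainNotifier());
        if (callSendEmailThroughProxy) {
            proxy.sendEmail("a@example.com");   // crosses the bean boundary
        } else {
            proxy.sendWelcome("a@example.com"); // inner this.sendEmail bypasses the proxy
        }
        return proxy.interceptedSendEmail;
    }

    public static void main(String[] args) {
        System.out.println("external call interceptions: " + interceptions(true));
        System.out.println("self-invocation interceptions: " + interceptions(false));
    }
}
```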

&lt;p&gt;With a bounded executor in place, your async service methods look straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NotificationService&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

 &lt;span class="nd"&gt;@Async&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"taskExecutor"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;CompletableFuture&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;sendEmail&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;recipient&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="c1"&gt;// I/O-bound work happens here, off the Tomcat thread&lt;/span&gt;
 &lt;span class="n"&gt;emailClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recipient&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;CompletableFuture&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;completedFuture&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Naming the executor explicitly in the annotation (&lt;code&gt;"taskExecutor"&lt;/code&gt;) avoids any ambiguity if you later define multiple executors for different workloads, such as a separate pool for CPU-intensive tasks versus I/O-bound ones.&lt;/p&gt;

&lt;p&gt;Thread pool sizing is not a one-size-fits-all choice. For I/O-bound async work, a larger &lt;code&gt;corePoolSize&lt;/code&gt; relative to your CPU count makes sense because threads spend most of their time waiting on network or disk. For CPU-bound work, setting &lt;code&gt;corePoolSize&lt;/code&gt; to roughly the number of available processors prevents context-switching overhead from eating the gains. Exposing &lt;code&gt;corePoolSize&lt;/code&gt; and &lt;code&gt;maxPoolSize&lt;/code&gt; through &lt;code&gt;application.properties&lt;/code&gt; lets you tune these values per environment without redeploying.&lt;/p&gt;
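&lt;p&gt;One common rule of thumb for turning that intuition into a starting number is threads = cores x (1 + wait time / compute time). The heuristic is an assumption brought in here for illustration, not something Spring prescribes, and &lt;code&gt;PoolSizingSketch&lt;/code&gt; and &lt;code&gt;suggestedPoolSize&lt;/code&gt; are hypothetical names; treat the result as a first guess to validate under load.&lt;/p&gt;

```java
// Rule-of-thumb pool sizing: threads = cores * (1 + waitTime / computeTime).
// The heuristic and all names here are illustrative assumptions, not a Spring API.
final class PoolSizingSketch {

    // waitMillis = time a task spends blocked on I/O, computeMillis = time it spends on CPU.
    static int suggestedPoolSize(int cores, double waitMillis, double computeMillis) {
        if (!(computeMillis > 0.0)) {
            throw new IllegalArgumentException("computeMillis must be positive");
        }
        double ratio = waitMillis / computeMillis;
        return (int) Math.max(1, Math.round(cores * (1.0 + ratio)));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // CPU-bound: almost no waiting, so the suggestion stays near the core count.
        System.out.println("cpu-bound: " + suggestedPoolSize(cores, 0, 10));
        // I/O-bound: 90 ms waiting per 10 ms of CPU, so far more threads are useful.
        System.out.println("io-bound:  " + suggestedPoolSize(cores, 90, 10));
    }
}
```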

&lt;p&gt;Bounded async processing resolves the thread starvation problem, but it does not help when the downstream service you're calling starts returning errors rather than slow responses. That failure mode, a dependency that is up but broken, is where circuit breaking becomes essential.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 4, Circuit Breaking with Resilience4j: Stopping Cascading Failures at the Boundary
&lt;/h2&gt;

&lt;p&gt;The previous section showed how unbounded thread pools can cause thread starvation when a downstream call is slow. Circuit breaking is what stops that slow call from becoming everyone's problem. Instead of letting threads pile up waiting for a timeout that may never come, a circuit breaker tracks the error rate of outbound calls and, once failures exceed a threshold, stops making the call at all and returns a fast failure immediately. &lt;/p&gt;

&lt;p&gt;Resilience4j models this with three states. In the &lt;strong&gt;CLOSED&lt;/strong&gt; state, calls pass through normally and outcomes are recorded in a sliding window. Once the failure rate inside that window crosses the configured threshold, the breaker transitions to &lt;strong&gt;OPEN&lt;/strong&gt;, where every call is rejected immediately with a &lt;code&gt;CallNotPermittedException&lt;/code&gt;: no network round-trip, no waiting. After &lt;code&gt;waitDurationInOpenState&lt;/code&gt; elapses, the breaker moves to &lt;strong&gt;HALF_OPEN&lt;/strong&gt; and admits a small number of probe requests. If those succeed, the circuit closes again. If they fail, it opens again immediately, and the timer resets. [9]&lt;/p&gt;
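&lt;p&gt;To make the three-state cycle concrete, here is a deliberately small count-based breaker. It is a teaching sketch under stated assumptions, not Resilience4j's implementation: &lt;code&gt;TinyBreaker&lt;/code&gt; is a hypothetical class, it is not thread-safe, it only evaluates the failure rate once the window is full, and time is passed in explicitly so the behaviour is deterministic.&lt;/p&gt;

```java
import java.util.Arrays;

// Minimal count-based circuit breaker illustrating CLOSED, OPEN and HALF_OPEN.
// Hypothetical teaching class: single-threaded, simplified, not Resilience4j code.
final class TinyBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final boolean[] window;          // ring buffer, true = failed call
    private final int failureThresholdPercent;
    private final long waitMillis;           // how long to stay OPEN before probing
    private final int probes;                // successful probes needed to close again

    private State state = State.CLOSED;
    private int next;                        // next ring-buffer slot to overwrite
    private int filled;                      // outcomes currently held in the window
    private int failures;                    // failed outcomes currently in the window
    private int probeSuccesses;
    private long openedAt;

    TinyBreaker(int windowSize, int failureThresholdPercent, long waitMillis, int probes) {
        this.window = new boolean[windowSize];
        this.failureThresholdPercent = failureThresholdPercent;
        this.waitMillis = waitMillis;
        this.probes = probes;
    }

    State state() {
        return state;
    }

    // Ask before each outbound call; false means "fail fast, do not call".
    boolean allowCall(long nowMillis) {
        if (state == State.OPEN) {
            if (waitMillis > nowMillis - openedAt) {
                return false;                // still inside the wait duration
            }
            state = State.HALF_OPEN;         // wait elapsed: admit probe calls
            probeSuccesses = 0;
        }
        return true;
    }

    // Report the outcome of a call that was allowed.
    void record(boolean failed, long nowMillis) {
        if (state == State.OPEN) {
            return;                          // late result from before the trip
        }
        if (state == State.HALF_OPEN) {
            if (failed) {
                open(nowMillis);             // probe failed: reopen, timer restarts
            } else if (++probeSuccesses >= probes) {
                close();
            }
            return;
        }
        if (window[next]) {
            failures--;                      // evict the oldest outcome
        }
        window[next] = failed;
        if (failed) {
            failures++;
        }
        next = (next + 1) % window.length;
        if (window.length > filled) {
            filled++;
        }
        if (filled == window.length) {
            if (failures * 100 >= failureThresholdPercent * window.length) {
                open(nowMillis);
            }
        }
    }

    private void open(long nowMillis) {
        state = State.OPEN;
        openedAt = nowMillis;
    }

    private void close() {
        state = State.CLOSED;
        Arrays.fill(window, false);
        next = 0;
        filled = 0;
        failures = 0;
    }

    public static void main(String[] args) {
        TinyBreaker breaker = new TinyBreaker(4, 50, 1000L, 2);
        breaker.record(true, 0L);
        breaker.record(false, 0L);
        breaker.record(true, 0L);
        breaker.record(false, 0L);
        System.out.println("after 2 failures in 4 calls: " + breaker.state());
        System.out.println("call allowed at 500 ms: " + breaker.allowCall(500L));
        System.out.println("call allowed at 1000 ms: " + breaker.allowCall(1000L));
        breaker.record(false, 1000L);
        breaker.record(false, 1001L);
        System.out.println("after two successful probes: " + breaker.state());
    }
}
```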

&lt;p&gt;To add this to a Spring Boot 3 service, you need three dependencies: the Resilience4j starter, AOP support for the annotations to work, and Actuator for observability. [10]&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.github.resilience4j&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;resilience4j-spring-boot3&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;2.2.0&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.boot&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-boot-starter-aop&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.boot&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-boot-starter-actuator&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure the breaker's behavior in &lt;code&gt;application.yml&lt;/code&gt;. The two parameters that matter most in production are &lt;code&gt;slidingWindowSize&lt;/code&gt; (how many recent calls are evaluated) and &lt;code&gt;waitDurationInOpenState&lt;/code&gt; (how long the breaker stays open before probing for recovery). [10]&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resilience4j&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;circuitbreaker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;instances&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;paymentService&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;slidingWindowType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;COUNT_BASED&lt;/span&gt;
 &lt;span class="na"&gt;slidingWindowSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
 &lt;span class="na"&gt;minimumNumberOfCalls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
 &lt;span class="na"&gt;failureRateThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
 &lt;span class="na"&gt;waitDurationInOpenState&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
 &lt;span class="na"&gt;permittedNumberOfCallsInHalfOpenState&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
 &lt;span class="na"&gt;automaticTransitionFromOpenToHalfOpenEnabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the Java side, annotate the method that crosses the service boundary (the connector or repository layer), not the controller. [11] Pair it with a fallback method that returns a degraded-but-valid response rather than propagating an exception to the caller.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PaymentConnector&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

 &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;RestTemplate&lt;/span&gt; &lt;span class="n"&gt;restTemplate&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;PaymentConnector&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RestTemplate&lt;/span&gt; &lt;span class="n"&gt;restTemplate&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;restTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;restTemplate&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;

 &lt;span class="nd"&gt;@CircuitBreaker&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"paymentService"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallbackMethod&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"paymentFallback"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;PaymentResponse&lt;/span&gt; &lt;span class="nf"&gt;charge&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PaymentRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;restTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;postForObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="s"&gt;"https://fraud-api.example.com/check"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;PaymentResponse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;

 &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;PaymentResponse&lt;/span&gt; &lt;span class="nf"&gt;paymentFallback&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PaymentRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;PaymentResponse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;orderId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getOrderId&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PENDING_REVIEW"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Payment queued for manual review"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fallback method must have the same return type and parameter list as the guarded method, with an &lt;code&gt;Exception&lt;/code&gt; or &lt;code&gt;Throwable&lt;/code&gt; parameter appended. Get this wrong and Resilience4j will silently fail to wire the fallback, propagating the original exception instead. [11]&lt;/p&gt;

&lt;p&gt;One configuration detail worth getting right early: &lt;code&gt;minimumNumberOfCalls&lt;/code&gt;. If it is set too low, a single failed request at startup can trip a breaker that has only seen one call. Setting it to at least 5 ensures the failure rate is calculated from a meaningful sample before the circuit opens. [12] Similarly, &lt;code&gt;slowCallDurationThreshold&lt;/code&gt; lets you count calls that hang for more than, say, two seconds as slow, and &lt;code&gt;slowCallRateThreshold&lt;/code&gt; opens the circuit once too many calls are slow, catching a degraded dependency before it starts actively returning errors. [12]&lt;/p&gt;

&lt;p&gt;Resilience4j publishes the circuit breaker state via the Actuator automatically when you enable it. In &lt;code&gt;application.yml&lt;/code&gt;, expose the &lt;code&gt;circuitbreakers&lt;/code&gt; and &lt;code&gt;health&lt;/code&gt; endpoints and set &lt;code&gt;management.health.circuitbreakers.enabled: true&lt;/code&gt;. A GET to &lt;code&gt;/actuator/circuitbreakers&lt;/code&gt; will show the current state, failure rate, and buffered call count for each named instance, useful for confirming that a circuit has opened in production before you start chasing logs. [10]&lt;/p&gt;

&lt;p&gt;Circuit breaking solves the cascading-failure problem, but it only helps if you can see when it fires. The next section covers how Actuator health and readiness probes make that degradation visible to both your orchestration layer and your on-call team.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb5e08j9ema0ohqs3fk9o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb5e08j9ema0ohqs3fk9o.png" alt=" " width="800" height="657"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 5, Health and Readiness Probes with Actuator: Making Degradation Visible
&lt;/h2&gt;

&lt;p&gt;A Spring Boot service can be simultaneously "running" and completely broken. The JVM is up, the process responds to pings, Kubernetes sees a healthy pod, and every request returns a 500 because the database connection pool was exhausted twenty minutes ago. Without health probes, this failure is invisible to the platform, and all it can do is keep routing traffic into the problem.&lt;/p&gt;

&lt;p&gt;Spring Boot Actuator addresses this by exposing &lt;code&gt;/actuator/health&lt;/code&gt;, &lt;code&gt;/actuator/health/liveness&lt;/code&gt;, and &lt;code&gt;/actuator/health/readiness&lt;/code&gt; as structured endpoints that aggregate the status of every registered health indicator (database, message broker, disk space, Redis, and more) into a single machine-readable response. The overall status follows a logical AND: if any one component reports DOWN, the aggregate reports DOWN and returns HTTP 503.&lt;/p&gt;
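&lt;p&gt;The logical-AND rule is simple enough to sketch in a few lines. This is a reduced model for illustration only: &lt;code&gt;HealthAggregationSketch&lt;/code&gt; is a hypothetical class, and Actuator's real aggregation is configurable and knows additional statuses beyond UP and DOWN.&lt;/p&gt;

```java
// Sketch of "any DOWN makes the aggregate DOWN" and the HTTP status that follows.
// Hypothetical class; it models only the UP and DOWN statuses.
final class HealthAggregationSketch {

    enum Status { UP, DOWN }

    // Logical AND over component statuses: one DOWN is enough to fail the aggregate.
    static Status aggregate(Status... components) {
        for (Status component : components) {
            if (component == Status.DOWN) {
                return Status.DOWN;
            }
        }
        return Status.UP;
    }

    // HTTP status a probe would see for the aggregate.
    static int httpStatus(Status aggregate) {
        return aggregate == Status.UP ? 200 : 503;
    }

    public static void main(String[] args) {
        Status allGood = aggregate(Status.UP, Status.UP, Status.UP);
        Status dbDown = aggregate(Status.UP, Status.DOWN, Status.UP);
        System.out.println(allGood + " maps to HTTP " + httpStatus(allGood));
        System.out.println(dbDown + " maps to HTTP " + httpStatus(dbDown));
    }
}
```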

&lt;p&gt;The liveness and readiness distinction matters in practice. A liveness probe should be narrow, checking only that the JVM is responsive, not that downstream dependencies are healthy. If your liveness probe checks the database and the database has a thirty-second hiccup, Kubernetes will restart the pod, which does nothing to fix the database and adds unnecessary churn. [13] The readiness probe is where you add dependency checks, because a NOT READY pod is simply removed from the service's endpoint list rather than restarted. Traffic stops arriving; the pod stays alive and recovers when the dependency comes back. [13]&lt;/p&gt;

&lt;p&gt;Enable both probe endpoints explicitly in &lt;code&gt;application.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;management&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;health&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;show-details&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
 &lt;span class="na"&gt;probes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
 &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;liveness&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;include&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;livenessState&lt;/span&gt;
 &lt;span class="na"&gt;readiness&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;include&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;readinessState,db,diskSpace&lt;/span&gt;
 &lt;span class="na"&gt;health&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;livenessstate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
 &lt;span class="na"&gt;readinessstate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then point your Kubernetes deployment at the right paths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/actuator/health/liveness&lt;/span&gt;
 &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
 &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
 &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/actuator/health/readiness&lt;/span&gt;
 &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
 &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
 &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/actuator/health/liveness&lt;/span&gt;
 &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
 &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
 &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
 &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A startup probe is worth treating as required for Spring Boot on Kubernetes. Java applications routinely need 30-60 seconds to initialise, and without one, Kubernetes may declare the pod broken before the application context has even finished loading. The startup probe above allows roughly 150 seconds of probing (30 attempts at 5-second intervals) on top of the 10-second initial delay; after it passes, the liveness probe takes over.&lt;/p&gt;
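&lt;p&gt;The time budget implied by the probe settings in the manifest above is plain arithmetic, sketched here so the numbers are easy to re-derive when you change them. &lt;code&gt;ProbeBudgetSketch&lt;/code&gt; is a hypothetical helper, and the results are approximations: real timing also depends on probe timeouts and scheduling jitter.&lt;/p&gt;

```java
// Approximate time budgets implied by Kubernetes probe settings.
// Hypothetical helper for illustration; treat the results as estimates.
final class ProbeBudgetSketch {

    // Seconds of repeated probing before the failure threshold is exhausted.
    static int probingWindowSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }

    // Probing window plus the initial delay before the first probe.
    static int approximateBudgetSeconds(int initialDelaySeconds, int periodSeconds,
                                        int failureThreshold) {
        return initialDelaySeconds + probingWindowSeconds(periodSeconds, failureThreshold);
    }

    public static void main(String[] args) {
        // startupProbe: initialDelaySeconds 10, periodSeconds 5, failureThreshold 30
        System.out.println("startup budget (s): " + approximateBudgetSeconds(10, 5, 30));
        // livenessProbe: periodSeconds 10, failureThreshold 3
        System.out.println("liveness window (s): " + probingWindowSeconds(10, 3));
    }
}
```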

&lt;p&gt;Actuator also lets you expose your own health logic. Implement &lt;code&gt;HealthIndicator&lt;/code&gt;, annotate it with &lt;code&gt;@Component&lt;/code&gt;, and Spring will automatically include it in the aggregated response. The bean name minus the "HealthIndicator" suffix becomes the key in the JSON output, so &lt;code&gt;DatabaseHealthIndicator&lt;/code&gt; appears as &lt;code&gt;"database"&lt;/code&gt; in the response.&lt;/p&gt;




&lt;h3&gt;
  
  
  Go-Live Checklist: Which Patterns to Add First
&lt;/h3&gt;

&lt;p&gt;The order depends on what kind of service you are shipping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synchronous HTTP service (REST API, BFF, gateway)&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Externalised configuration with profiles: environment parity prevents the entire class of "works locally, broken in prod" incidents.&lt;/li&gt;
&lt;li&gt;Graceful shutdown prevents in-flight requests from being killed on every deploy.&lt;/li&gt;
&lt;li&gt;Circuit breaking with Resilience4j protects your thread pool from slow downstream APIs.&lt;/li&gt;
&lt;li&gt;Actuator health and readiness probes make degradation visible to load balancers and on-call engineers.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Async worker or event-driven service (Kafka consumer, job processor)&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Externalised configuration with profiles.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@Async&lt;/code&gt; with a bounded &lt;code&gt;ThreadPoolTaskExecutor&lt;/code&gt;: an unbounded executor can exhaust memory under sustained load.&lt;/li&gt;
&lt;li&gt;Actuator health probes: at minimum, expose readiness so the pod can signal when it has fallen behind or lost its broker connection.&lt;/li&gt;
&lt;li&gt;Graceful shutdown: give in-flight messages time to complete before the consumer stops.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Any service deploying to Kubernetes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All five patterns apply, but start with readiness probes and graceful shutdown together. A pod that starts receiving traffic before its connection pool is ready, or that gets killed mid-request on every rolling deploy, will produce incidents that are genuinely hard to debug, because the failure window is short and the logs look normal. &lt;/p&gt;

&lt;p&gt;The fastest path forward: add &lt;code&gt;spring-boot-starter-actuator&lt;/code&gt; to your dependencies today, enable the probe endpoints, and wire the Kubernetes manifest to &lt;code&gt;/actuator/health/liveness&lt;/code&gt; and &lt;code&gt;/actuator/health/readiness&lt;/code&gt;. That single change makes your service's internal state legible to the platform, and legibility is the prerequisite for everything else.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How Spring AI Works: LLMs, Embeddings, and RAG for Java Engineers</title>
      <dc:creator>srinivas reddy gouru</dc:creator>
      <pubDate>Fri, 05 Jun 2026 16:39:37 +0000</pubDate>
      <link>https://dev.to/srinivas_gouru_d26dc31f21/how-spring-ai-works-llms-embeddings-and-rag-for-java-engineers-4ck5</link>
      <guid>https://dev.to/srinivas_gouru_d26dc31f21/how-spring-ai-works-llms-embeddings-and-rag-for-java-engineers-4ck5</guid>
      <description>&lt;h2&gt;
  
  
  The Java Engineer's AI Problem, and Why Spring AI Exists
&lt;/h2&gt;

&lt;p&gt;Spring AI is a Spring Boot framework that connects Java applications to LLMs and vector stores using the same dependency-injection patterns Spring developers already know. The problem it addresses looks like this: your team picks up a ticket to add an AI-powered feature to an existing Spring Boot service, perhaps a document Q&amp;amp;A tool or a chat assistant backed by company-internal data. You open a browser and start searching for tutorials. Nearly every result assumes you are working in Python, reaching for LangChain, and standing up a virtualenv. The code samples use FastAPI. The LLM client libraries are pip packages. The entire mental model is Python-first, and the message, whether intentional or not, is that AI development happens somewhere other than where you work.&lt;/p&gt;

&lt;p&gt;This is the friction Spring AI was built to remove. Java is a powerful, deeply established language running enormous amounts of production workloads, and there is no good reason Java engineers should have to detour through a different language just to call a language model. The question "how do I build an AI solution in Java?" deserves a real answer, not "port the Python tutorial yourself."&lt;/p&gt;

&lt;p&gt;Spring AI is that answer. It is an application framework for AI engineering whose explicit goal is to carry the Spring ecosystem's design principles (portability, modular design, and POJO-centric construction) into the AI domain. At its core, it solves a concrete integration problem: connecting your enterprise data and APIs with AI models, using the same patterns you already reach for when connecting to a database or a message broker.&lt;/p&gt;

&lt;p&gt;The project drew inspiration from Python's LangChain and LlamaIndex, but it is not a port of either. The underlying conviction is that generative AI applications will not stay confined to one language. Spring AI was founded on the belief that the next wave of AI-powered software will be ubiquitous across programming languages, and that Java engineers deserve first-class tooling rather than a translation layer bolted on afterwards. &lt;/p&gt;

&lt;p&gt;What makes that practical is the mapping layer Spring AI provides between the AI world and the Spring world. Concepts that feel foreign in isolation (chat models, embeddings, vector stores, prompt templates, tool calls) each get a Spring-native abstraction: an auto-configured bean, a declarative client, a repository interface. You wire them with dependency injection the same way you wire a &lt;code&gt;JdbcTemplate&lt;/code&gt; or a &lt;code&gt;RestClient&lt;/code&gt;. The rest of this article walks through how each of those mappings works, how a request travels from your application code through Spring AI to an LLM and back, and how the framework connects to vector databases for retrieval-augmented generation. By the end, you should be able to read a Spring AI codebase, understand the request flow it describes, and know which dependency to add for each piece of the puzzle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Spring AI's Core Architecture: Familiar Abstractions, New Capabilities
&lt;/h2&gt;

&lt;p&gt;Spring AI is built on three interlocking design principles, and understanding how they stack together explains why the framework feels so natural to a Spring developer. The principles are not independent: each one depends on the one before it, and together they produce something more useful than any of them would alone.&lt;/p&gt;

&lt;p&gt;The foundation is &lt;strong&gt;portability&lt;/strong&gt;, expressed through a set of provider-agnostic interfaces: &lt;code&gt;ChatModel&lt;/code&gt;, &lt;code&gt;EmbeddingModel&lt;/code&gt;, &lt;code&gt;VectorStore&lt;/code&gt;, &lt;code&gt;ImageModel&lt;/code&gt;, and so on. Every supported AI provider (OpenAI, Anthropic, Google, Amazon Bedrock, Azure, Ollama) implements the same interfaces. Your application code refers only to these abstractions, never to a provider-specific class. Switching from Claude to GPT-4 means updating a dependency and two lines in &lt;code&gt;application.properties&lt;/code&gt;; the &lt;code&gt;ChatModel&lt;/code&gt; you injected into your service doesn't change. That portability only becomes practical, though, if something wires those implementations for you automatically.&lt;/p&gt;

&lt;p&gt;That something is &lt;strong&gt;auto-configuration&lt;/strong&gt;. Spring AI ships a Spring Boot starter for each supported provider, and adding one to your build is enough for the framework to register a fully wired AI client as a bean. You set &lt;code&gt;spring.ai.openai.api-key&lt;/code&gt; in &lt;code&gt;application.properties&lt;/code&gt;, and Spring Boot's auto-configuration creates a &lt;code&gt;ChatModel&lt;/code&gt;, an &lt;code&gt;EmbeddingModel&lt;/code&gt;, and a &lt;code&gt;ChatClient.Builder&lt;/code&gt;, all ready for injection with no manual SDK initialisation. The same mechanism configures vector stores: add &lt;code&gt;spring-ai-starter-vector-store-pgvector&lt;/code&gt; (named &lt;code&gt;spring-ai-pgvector-store-spring-boot-starter&lt;/code&gt; in pre-1.0 milestone releases) and the auto-configured &lt;code&gt;EmbeddingModel&lt;/code&gt; is wired into a &lt;code&gt;VectorStore&lt;/code&gt; bean automatically. Without the portable interfaces beneath it, this auto-configuration would produce a tangle of provider-specific beans. Because the interfaces are stable contracts, auto-configuration can produce beans that the rest of your application consumes without caring which provider sits behind them.&lt;/p&gt;

&lt;p&gt;Those two layers together create the conditions for the third principle: &lt;strong&gt;POJO-centric design&lt;/strong&gt;. An LLM response is free-form text, but your service needs typed data. Spring AI's Structured Outputs feature handles the conversion: the &lt;code&gt;ChatClient&lt;/code&gt; adds format instructions to the prompt asking the model to respond in JSON that matches the target type, then deserialises the result directly into whatever Java type you specify. The call reads much as it would if you were deserialising an HTTP response with &lt;code&gt;RestClient&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;City&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;zipcode&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;

&lt;span class="nc"&gt;City&lt;/span&gt; &lt;span class="n"&gt;capital&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"What is the capital of Germany?"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;call&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;entity&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;City&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without portability, you couldn't reuse this pattern across providers. Without auto-configuration, you'd need to manually construct the converter pipeline. The three principles stack, and the stack pays off in code that treats an LLM as just another external data source, one that returns typed objects, gets injected like any other bean, and can be swapped out through configuration.&lt;/p&gt;

&lt;p&gt;This layered architecture is the base for every subsequent Spring AI concept: &lt;code&gt;ChatClient&lt;/code&gt; prompts, Advisors, RAG pipelines, and tool calling. The diagram below shows how these layers relate at runtime, from your Spring beans down through the abstraction interfaces to the provider implementations and external services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feajhpk5w4wo2wskcjynz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feajhpk5w4wo2wskcjynz.jpg" alt=" " width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Talking to LLMs: ChatClient, Models, and Prompt Templates
&lt;/h2&gt;

&lt;p&gt;The first thing most Spring developers want to do when they pick up Spring AI is send a message to an LLM and get a response back. The entry point for that is &lt;code&gt;ChatClient&lt;/code&gt;, a fluent API that feels instantly familiar if you've ever used &lt;code&gt;WebClient&lt;/code&gt; or &lt;code&gt;RestClient&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Getting there requires three things: the right starter on the classpath, a key in &lt;code&gt;application.properties&lt;/code&gt;, and an injected &lt;code&gt;ChatClient&lt;/code&gt;. For OpenAI, that looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- pom.xml --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.ai&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-ai-starter-model-openai&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# application.properties
&lt;/span&gt;&lt;span class="py"&gt;spring.ai.openai.api-key&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;${OPENAI_API_KEY}&lt;/span&gt;
&lt;span class="py"&gt;spring.ai.openai.chat.options.model&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SupportService&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;SupportService&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;call&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auto-configuration registers the &lt;code&gt;ChatClient.Builder&lt;/code&gt; bean based on whatever provider starter is on the classpath. You inject the builder, call &lt;code&gt;.build()&lt;/code&gt;, and you're done. Swapping from OpenAI to Azure OpenAI or Mistral is a matter of changing the starter dependency and the corresponding properties; the &lt;code&gt;ChatClient&lt;/code&gt; call site above stays unchanged. &lt;/p&gt;

&lt;p&gt;That portability comes from the &lt;code&gt;ChatModel&lt;/code&gt; interface sitting underneath &lt;code&gt;ChatClient&lt;/code&gt;. Every provider adapter implements &lt;code&gt;ChatModel&lt;/code&gt;, so the same &lt;code&gt;call(Prompt)&lt;/code&gt; method signature works regardless of which backend you've configured. You can also inject &lt;code&gt;ChatModel&lt;/code&gt; directly if you need lower-level access, but for most application code, &lt;code&gt;ChatClient&lt;/code&gt; is the better choice because it layers on convenience methods for streaming, system messages, and chat memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building structured prompts with PromptTemplate
&lt;/h3&gt;

&lt;p&gt;Simple string questions work fine for exploration, but real features need structure. A customer support bot might need to inject the user's account tier and their most recent order number into every prompt. Hardcoding that with string concatenation gets messy fast. &lt;code&gt;PromptTemplate&lt;/code&gt; solves this the same way Thymeleaf solves HTML templating: you define a string with named placeholders in &lt;code&gt;{name}&lt;/code&gt; syntax, then fill them at runtime with a &lt;code&gt;Map&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;PromptTemplate&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PromptTemplate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"You are helping a {tier} customer. Their last order was {orderId}. Answer: {question}"&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;Prompt&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"tier"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"premium"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"orderId"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ORD-9182"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"question"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userInput&lt;/span&gt;
&lt;span class="o"&gt;));&lt;/span&gt;

&lt;span class="nc"&gt;Generation&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;call&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;getResult&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, Spring AI uses the StringTemplate engine by Terence Parr to perform the substitution. If curly braces clash with your template content, you can configure alternate delimiters via &lt;code&gt;StTemplateRenderer&lt;/code&gt;. You can also store prompt text in classpath resources and inject it with &lt;code&gt;@Value("classpath:/prompts/support.st")&lt;/code&gt;, which keeps long instructions out of your Java source files entirely.&lt;/p&gt;
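
&lt;p&gt;To make the substitution step concrete, here is a minimal plain-Java sketch of named-placeholder replacement. It is not how Spring AI or StringTemplate implements rendering; &lt;code&gt;PlaceholderSketch&lt;/code&gt; and its &lt;code&gt;render&lt;/code&gt; method are hypothetical names used only for illustration, and it handles none of StringTemplate's escaping or expression features.&lt;/p&gt;

```java
import java.util.Properties;

// Illustrative only: Spring AI delegates to StringTemplate; this just mimics {name} substitution.
final class PlaceholderSketch {

    // Replaces every {key} in the template with the matching value from the lookup.
    static String render(String template, Properties values) {
        String result = template;
        for (String key : values.stringPropertyNames()) {
            result = result.replace("{" + key + "}", values.getProperty(key));
        }
        return result;
    }

    public static void main(String[] args) {
        Properties values = new Properties(); // a plain string-to-string lookup
        values.setProperty("tier", "premium");
        values.setProperty("orderId", "ORD-9182");
        // prints "You are helping a premium customer. Their last order was ORD-9182."
        System.out.println(render("You are helping a {tier} customer. Their last order was {orderId}.", values));
    }
}
```

&lt;p&gt;A placeholder with no matching value is left in the text untouched here, which makes a missing variable easy to spot in the rendered prompt.&lt;/p&gt;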

&lt;p&gt;A &lt;code&gt;Prompt&lt;/code&gt; can carry multiple &lt;code&gt;Message&lt;/code&gt; objects, each tagged with a role. The system role gives the model its persona and constraints before the conversation begins; the user role carries the actual question; the assistant role records prior model responses for multi-turn exchanges. Spring AI provides &lt;code&gt;SystemPromptTemplate&lt;/code&gt; and &lt;code&gt;UserMessage&lt;/code&gt; as typed helpers for each role, so you rarely work with the raw enum directly. For stateful conversations where the model needs to remember earlier turns, &lt;code&gt;ChatClient&lt;/code&gt; supports chat memory out of the box; prior messages are automatically included in subsequent calls without manual prompt concatenation.&lt;/p&gt;

&lt;p&gt;Once you can send structured prompts to an LLM, the natural next question is how to give it access to your own data. That's where embeddings and vector stores come in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Embeddings and Vector Stores: Giving the LLM Long-Term Memory
&lt;/h2&gt;

&lt;p&gt;An embedding is a fixed-length array of floating-point numbers that represents the meaning of a piece of text. When you ask an embedding model to process the sentence "What is the refund policy?", it returns something like a 1536-element vector of decimal values. That vector encodes the semantic content of the sentence, not its exact words. Two sentences that mean the same thing will produce vectors that sit close together in that high-dimensional space, even if they share no vocabulary. That proximity is what makes similarity search possible.&lt;/p&gt;
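
&lt;p&gt;"Close together" is usually measured with cosine similarity. The sketch below computes it for three made-up, three-element vectors standing in for real embeddings; the values and the &lt;code&gt;CosineSketch&lt;/code&gt; class are assumptions for illustration, not output from any model.&lt;/p&gt;

```java
import java.util.stream.IntStream;

// Toy example: real embedding vectors have hundreds or thousands of elements.
final class CosineSketch {

    static double dot(float[] a, float[] b) {
        return IntStream.range(0, a.length).mapToDouble(i -> a[i] * b[i]).sum();
    }

    // Cosine similarity: values near 1.0 mean the vectors point the same way, values near 0.0 mean unrelated.
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
    }

    public static void main(String[] args) {
        // Made-up vectors standing in for the embeddings of three sentences.
        float[] refundPolicy = {0.9f, 0.1f, 0.0f};
        float[] moneyBack = {0.8f, 0.2f, 0.1f};
        float[] weather = {0.0f, 0.1f, 0.9f};
        System.out.println("refund vs money back: " + cosine(refundPolicy, moneyBack));
        System.out.println("refund vs weather:    " + cosine(refundPolicy, weather));
    }
}
```

&lt;p&gt;The first score comes out far higher than the second, which is the property similarity search relies on.&lt;/p&gt;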

&lt;p&gt;You do not need to understand the mathematics behind this to use it in a Spring application. What matters is the operational picture: you have text, you convert it to a vector, and you store or compare that vector.&lt;/p&gt;

&lt;h3&gt;
  
  
  EmbeddingModel: The Spring Abstraction
&lt;/h3&gt;

&lt;p&gt;Spring AI wraps embedding providers behind the &lt;code&gt;EmbeddingModel&lt;/code&gt; interface, which follows the same design philosophy as &lt;code&gt;ChatModel&lt;/code&gt;. Auto-configuration wires a provider-specific implementation into your &lt;code&gt;ApplicationContext&lt;/code&gt;, and your code depends only on the interface. Calling &lt;code&gt;embeddingModel.embed("some text")&lt;/code&gt; returns a &lt;code&gt;float[]&lt;/code&gt; at whatever dimensionality the underlying model uses. For batch workloads, &lt;code&gt;embedForResponse(List&amp;lt;String&amp;gt; texts)&lt;/code&gt; returns all vectors in one round trip, which matters when you are processing thousands of documents at startup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DocumentIndexer&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingModel&lt;/span&gt; &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;DocumentIndexer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;EmbeddingModel&lt;/span&gt; &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;embeddingModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="nf"&gt;vectorize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;embed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Swapping from OpenAI's &lt;code&gt;text-embedding-ada-002&lt;/code&gt; to a locally hosted Ollama model means changing the starter dependency and a few lines in &lt;code&gt;application.properties&lt;/code&gt;. The dimensionality of the output will differ, so vectors already stored with the old model need to be re-embedded, but the interface stays identical.&lt;/p&gt;

&lt;h3&gt;
  
  
  VectorStore: One API, Many Databases
&lt;/h3&gt;

&lt;p&gt;Storing and searching vectors requires a database that understands cosine similarity or dot-product distance. Spring AI introduces &lt;code&gt;VectorStore&lt;/code&gt;, a single portable interface over a wide fleet of supported backends: PostgreSQL via PGVector, Redis, MongoDB Atlas, Chroma, Milvus, Pinecone, Weaviate, Neo4j, Oracle, Qdrant, Apache Cassandra, and Azure Vector Search.&lt;/p&gt;

&lt;p&gt;The API has two primary operations. &lt;code&gt;vectorStore.add(List&amp;lt;Document&amp;gt; documents)&lt;/code&gt; takes Spring AI &lt;code&gt;Document&lt;/code&gt; objects, embeds them automatically, and persists the resulting vectors alongside the original text and any metadata you attach. &lt;code&gt;vectorStore.similaritySearch(SearchRequest.builder().query("refund policy").topK(5).build())&lt;/code&gt; retrieves the five documents whose vectors sit closest to the query vector, returning plain &lt;code&gt;Document&lt;/code&gt; objects ready to inject into a prompt. (Pre-1.0 milestone releases spelled the same request &lt;code&gt;SearchRequest.query("refund policy").withTopK(5)&lt;/code&gt;.)&lt;/p&gt;
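
&lt;p&gt;Conceptually, a top-K similarity search ranks stored vectors by their closeness to the query vector and keeps the best few. The brute-force, in-memory sketch below shows that idea with made-up vectors; &lt;code&gt;TopKSketch&lt;/code&gt; and &lt;code&gt;Entry&lt;/code&gt; are hypothetical names rather than Spring AI types, and production vector databases typically use approximate indexes instead of scanning every entry.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// In-memory stand-in for a vector store. All names are hypothetical, not Spring AI types.
final class TopKSketch {

    record Entry(String text, float[] vector) {}

    static double dot(float[] a, float[] b) {
        return IntStream.range(0, a.length).mapToDouble(i -> a[i] * b[i]).sum();
    }

    static double cosine(float[] a, float[] b) {
        return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
    }

    // Ranks every entry by similarity to the query and keeps the k best, closest first.
    static String[] topK(Entry[] store, float[] query, int k) {
        return Arrays.stream(store)
                .sorted(Comparator.comparingDouble((Entry e) -> cosine(e.vector(), query)).reversed())
                .limit(k)
                .map(Entry::text)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        Entry[] store = {
            new Entry("Refunds are issued within 14 days.", new float[] {0.8f, 0.2f, 0.1f}),
            new Entry("Shipping takes 3 to 5 days.", new float[] {0.3f, 0.9f, 0.1f}),
            new Entry("The office is closed on Sundays.", new float[] {0.0f, 0.1f, 0.9f})
        };
        float[] query = {0.9f, 0.1f, 0.0f}; // made-up embedding of "refund policy"
        // prints "[Refunds are issued within 14 days., Shipping takes 3 to 5 days.]"
        System.out.println(Arrays.toString(topK(store, query, 2)));
    }
}
```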

&lt;p&gt;Beyond basic search, Spring AI adds a SQL-like metadata filter syntax that works portably across the supported providers. You can write &lt;code&gt;SearchRequest.builder().query("refund policy").filterExpression("category == 'billing' &amp;amp;&amp;amp; year &amp;gt;= 2024").build()&lt;/code&gt; (&lt;code&gt;withFilterExpression(...)&lt;/code&gt; in pre-1.0 milestone releases), and the library translates that expression into whatever native filter the underlying store supports, whether that is a Postgres &lt;code&gt;WHERE&lt;/code&gt; clause or a Pinecone metadata filter object. That portability means you can develop against a local Chroma instance and deploy against PGVector in production without changing application code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why These Two Primitives Matter
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;EmbeddingModel&lt;/code&gt; converts meaning into numbers; &lt;code&gt;VectorStore&lt;/code&gt; stores those numbers and finds the closest ones on demand. Together, they give an LLM access to knowledge that was never in its training data: your documentation, your customer records, and your internal policies. In the next section, those two primitives become the first two steps in a Retrieval-Augmented Generation pipeline that feeds retrieved context directly into the model's prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94cq9cnvja812p9css6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94cq9cnvja812p9css6a.png" alt=" " width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG Pipelines: Wiring Documents into LLM Answers
&lt;/h2&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is the pattern that closes the biggest gap in stock LLM behaviour: a model trained on public data knows nothing about your internal documents, product catalogue, or support tickets. RAG solves this by retrieving the relevant text at request time and handing it to the model as context, so the answer is grounded in your data rather than in whatever the model happened to learn during pre-training. Spring AI makes this pattern a first-class citizen, and it maps naturally onto the embedding and vector-store primitives covered in the previous section.&lt;/p&gt;

&lt;p&gt;The pipeline splits into two distinct phases. In the ingestion phase, you load raw documents, split them into chunks, embed each chunk, and write the resulting vectors into a vector store. In the query phase, you embed the user's question, run a similarity search against those vectors to retrieve the top-K most relevant chunks, inject those chunks into the prompt as additional context, and send the augmented prompt to the LLM. Everything between "user typed a question" and "LLM produced a grounded answer" runs inside Spring-managed beans, which means you configure it exactly the way you'd configure a data source or a REST client.&lt;/p&gt;
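
&lt;p&gt;The augmentation step of the query phase is, at heart, string assembly: retrieved chunks are placed in front of the user's question. A minimal sketch follows; the &lt;code&gt;AugmentSketch&lt;/code&gt; helper and the exact wording of the instruction are assumptions for illustration, not the prompt Spring AI generates.&lt;/p&gt;

```java
// Hypothetical helper showing the augmentation step only; the prompt wording is an assumption.
final class AugmentSketch {

    // Places the retrieved chunks in a context block ahead of the user's question.
    static String augment(String question, String... retrievedChunks) {
        return "Answer using only the context below.\n\nContext:\n"
                + String.join("\n---\n", retrievedChunks)
                + "\n\nQuestion: " + question;
    }

    public static void main(String[] args) {
        System.out.println(augment(
                "What is the refund policy?",
                "Refunds are issued within 14 days of purchase.",
                "Refund requests go through the billing portal."));
    }
}
```

&lt;p&gt;Everything the model needs to ground its answer travels inside this one prompt, which is why chunk size and top-K directly affect both answer quality and token cost.&lt;/p&gt;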

&lt;h3&gt;
  
  
  Ingestion: Loading and Splitting Documents
&lt;/h3&gt;

&lt;p&gt;Spring AI ships a &lt;code&gt;DocumentReader&lt;/code&gt; abstraction with built-in implementations for PDF, plain text, Markdown, HTML, and JSON. You load documents, pipe them through a &lt;code&gt;TokenTextSplitter&lt;/code&gt; (or a custom splitter), and call &lt;code&gt;vectorStore.add(documents)&lt;/code&gt;. The &lt;code&gt;VectorStore&lt;/code&gt; implementation calls your configured &lt;code&gt;EmbeddingModel&lt;/code&gt; automatically; you don't wire the embedding call by hand.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="nc"&gt;ApplicationRunner&lt;/span&gt; &lt;span class="nf"&gt;ingest&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DocumentReader&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                         &lt;span class="nc"&gt;TokenTextSplitter&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                         &lt;span class="nc"&gt;VectorStore&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apply&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Ingested "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" chunks."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;};&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this once at startup (or on a schedule), and the vector store is ready to serve queries.&lt;/p&gt;
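
&lt;p&gt;For intuition about the splitting step, the sketch below cuts text into fixed-size chunks. It is a word-based approximation under stated assumptions: &lt;code&gt;TokenTextSplitter&lt;/code&gt; counts model tokens rather than whitespace-separated words, and &lt;code&gt;ChunkSketch&lt;/code&gt; is a hypothetical class, not part of Spring AI.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Word-based approximation; TokenTextSplitter counts model tokens, not whitespace-separated words.
final class ChunkSketch {

    // Splits text into consecutive chunks of at most wordsPerChunk words each.
    static String[] split(String text, int wordsPerChunk) {
        String[] words = text.trim().split("\\s+");
        int chunkCount = (words.length + wordsPerChunk - 1) / wordsPerChunk; // ceiling division
        return IntStream.range(0, chunkCount)
                .mapToObj(i -> String.join(" ", Arrays.copyOfRange(
                        words, i * wordsPerChunk, Math.min(words.length, (i + 1) * wordsPerChunk))))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // prints "[Refunds are issued, within 14 days, of purchase]"
        System.out.println(Arrays.toString(split("Refunds are issued within 14 days of purchase", 3)));
    }
}
```

&lt;p&gt;Smaller chunks make retrieval more precise but carry less surrounding context, so the chunk size is worth tuning against your own documents.&lt;/p&gt;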

&lt;h3&gt;
  
  
  Query: Retrieval and Prompt Augmentation
&lt;/h3&gt;

&lt;p&gt;On the query side, the &lt;code&gt;QuestionAnswerAdvisor&lt;/code&gt; is the highest-level abstraction Spring AI provides. You attach it to a &lt;code&gt;ChatClient&lt;/code&gt;, pass in the &lt;code&gt;VectorStore&lt;/code&gt;, and it handles the retrieve-then-augment flow automatically. The advisor embeds the incoming question, queries the vector store for similar chunks, and rewrites the prompt to include those chunks before the call reaches the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="nc"&gt;ChatClient&lt;/span&gt; &lt;span class="nf"&gt;ragClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;VectorStore&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;defaultAdvisors&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;QuestionAnswerAdvisor&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// At request time:&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ragClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userQuestion&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;call&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The advisor's default behaviour retrieves the top four chunks and appends them as context to the user message. You can tune &lt;code&gt;k&lt;/code&gt;, apply metadata filters (useful when you've segmented documents by tenant or product line), or replace the default prompt template entirely by supplying your own &lt;code&gt;PromptTemplate&lt;/code&gt; to the advisor constructor.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Embeddings Are Actually Doing
&lt;/h3&gt;

&lt;p&gt;When the advisor receives a question, it calls &lt;code&gt;vectorStore.similaritySearch(SearchRequest.builder().query(question).topK(4).build())&lt;/code&gt;. The vector store uses its configured &lt;code&gt;EmbeddingModel&lt;/code&gt; to turn the question into a vector, computes cosine similarity (or an equivalent metric, depending on the backend) between that query vector and the stored chunk vectors, and returns the closest matches. Because the embeddings encode semantic meaning rather than literal keyword overlap, a question about "cancellation policy" will match a chunk that uses the phrase "how to end your subscription", something a keyword search would miss entirely.&lt;/p&gt;
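
&lt;p&gt;To make the ranking step concrete, here is a minimal, framework-free sketch of cosine-similarity retrieval. It is not Spring AI's implementation: &lt;code&gt;CosineTopK&lt;/code&gt;, &lt;code&gt;cosine&lt;/code&gt;, and &lt;code&gt;topK&lt;/code&gt; are hypothetical names, vectors are plain &lt;code&gt;double[]&lt;/code&gt; arrays assumed to be non-zero and of equal length, and a real backend would normally use an index rather than scoring every stored vector.&lt;/p&gt;

```java
import java.util.Arrays;

/** Hypothetical stand-in for a vector store: ranks stored chunk vectors by cosine similarity. */
final class CosineTopK {

    /** Cosine similarity of two equal-length, non-zero vectors: dot(a, b) / (|a| * |b|). */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i != a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indices of the k stored vectors most similar to the query, best match first. */
    static int[] topK(double[] query, double[][] stored, int k) {
        Integer[] order = new Integer[stored.length];
        for (int i = 0; i != order.length; i++) {
            order[i] = i;
        }
        // Sort indices by descending similarity to the query vector.
        Arrays.sort(order, (x, y) -> Double.compare(cosine(query, stored[y]), cosine(query, stored[x])));

        int[] best = new int[Math.min(k, order.length)];
        for (int i = 0; i != best.length; i++) {
            best[i] = order[i];
        }
        return best;
    }
}
```

&lt;p&gt;The indices returned by &lt;code&gt;topK&lt;/code&gt; would map back to the stored chunks, which is the role the advisor's top-four retrieval plays before the prompt is augmented.&lt;/p&gt;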

&lt;p&gt;The result is a system where your LLM's answers are constrained by evidence drawn from your own documents, which reduces hallucination and keeps the model from confidently citing things it simply doesn't know. For most production Java applications, this is the pattern that makes the difference between a demo and something you'd trust at scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96o6a6uh4stz7pjny9po.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96o6a6uh4stz7pjny9po.png" alt=" " width="799" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Calling: Letting the LLM Invoke Your Spring Beans
&lt;/h2&gt;

&lt;p&gt;RAG pipelines give the LLM access to stored documents, but documents are static snapshots. When a user asks "what's the current balance on account 4821?" or "book a meeting for tomorrow at 2 pm," the model needs to reach into live systems. Tool calling is how that happens.&lt;/p&gt;

&lt;p&gt;The core idea is straightforward: instead of answering a question itself, the LLM can pause mid-response and say, "I need to call a function to answer this." Spring AI intercepts that signal, executes the corresponding Java method, and sends the result back to the model so it can complete its response. From the user's perspective, the answer just arrives. From the developer's perspective, the model is calling a Spring bean. &lt;/p&gt;

&lt;h3&gt;
  
  
  How the Round-Trip Works
&lt;/h3&gt;

&lt;p&gt;The sequence goes like this: your application sends a prompt along with a list of available tools, each described by name, purpose, and parameter schema. The model evaluates whether any tool is relevant to the request. If it decides to use one, it returns a structured tool-call request rather than a natural-language answer. Spring AI catches that response, looks up the registered Java method by name, deserialises the arguments, invokes the method, and then sends the result back to the model as a follow-up message. The model reads the result and continues generating the final answer. The whole loop is invisible to the caller.&lt;/p&gt;
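
&lt;p&gt;The round trip described above can be sketched without any framework. Everything in this block is hypothetical and simplified: &lt;code&gt;Model&lt;/code&gt; stands in for the LLM, &lt;code&gt;Tool&lt;/code&gt; takes a single string argument instead of a JSON payload, and the conversation is a plain string rather than a list of messages. It only illustrates the control flow, not Spring AI's internals.&lt;/p&gt;

```java
/** Hypothetical, dependency-free simulation of the tool-calling round trip. */
final class ToolLoopSketch {

    /** A callable tool; a single string argument is a simplification of a JSON payload. */
    interface Tool {
        String name();
        String invoke(String argument);
    }

    /** Stand-in for the LLM: given the conversation so far, produce the next reply. */
    interface Model {
        ModelReply next(String conversation);
    }

    /** Either a request to call a tool or a final natural-language answer. */
    static final class ModelReply {
        final String toolName;     // null when the model is done
        final String toolArgument;
        final String finalAnswer;  // null while the model still wants a tool

        private ModelReply(String toolName, String toolArgument, String finalAnswer) {
            this.toolName = toolName;
            this.toolArgument = toolArgument;
            this.finalAnswer = finalAnswer;
        }

        static ModelReply tool(String name, String argument) {
            return new ModelReply(name, argument, null);
        }

        static ModelReply answer(String text) {
            return new ModelReply(null, null, text);
        }
    }

    /** Calls the model, executes any requested tool, feeds the result back, and repeats. */
    static String run(Model model, Tool[] tools, String userPrompt, int maxRounds) {
        StringBuilder conversation = new StringBuilder("user: " + userPrompt);
        for (int round = 0; round != maxRounds; round++) {
            ModelReply reply = model.next(conversation.toString());
            if (reply.toolName == null) {
                return reply.finalAnswer;
            }
            Tool tool = find(tools, reply.toolName);
            String result = tool.invoke(reply.toolArgument);
            conversation.append("\ntool[").append(tool.name()).append("]: ").append(result);
        }
        throw new IllegalStateException("Model did not finish within " + maxRounds + " rounds");
    }

    /** Looks up a registered tool by the name the model asked for. */
    static Tool find(Tool[] tools, String name) {
        for (Tool tool : tools) {
            if (tool.name().equals(name)) {
                return tool;
            }
        }
        throw new IllegalArgumentException("Unknown tool: " + name);
    }
}
```

&lt;p&gt;The &lt;code&gt;maxRounds&lt;/code&gt; cap is a design choice of this sketch: it keeps a model that never produces a final answer from looping indefinitely.&lt;/p&gt;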

&lt;h3&gt;
  
  
  Registering a Tool
&lt;/h3&gt;

&lt;p&gt;Spring AI lets you register tools in two ways: as annotated beans or as inline lambdas passed directly to the &lt;code&gt;ChatClient&lt;/code&gt;. The annotation approach maps cleanly to existing service classes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AccountTools&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

 &lt;span class="c1"&gt;// Injected by Spring; AccountRepository is assumed to expose findById(String)&lt;/span&gt;
 &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;AccountRepository&lt;/span&gt; &lt;span class="n"&gt;accountRepository&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;AccountTools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AccountRepository&lt;/span&gt; &lt;span class="n"&gt;accountRepository&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;accountRepository&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;accountRepository&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;

 &lt;span class="nd"&gt;@Tool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Returns the current balance for a given account ID"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;BigDecimal&lt;/span&gt; &lt;span class="nf"&gt;getAccountBalance&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;accountId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;accountRepository&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findById&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accountId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;Account:&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;getBalance&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;orElseThrow&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;IllegalArgumentException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Unknown account: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;accountId&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You then attach it when building the client call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;ChatResponse&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"What is the balance on account 4821?"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accountTools&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;call&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatResponse&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Spring AI reads the &lt;code&gt;@Tool&lt;/code&gt; annotation to generate the JSON schema that the model receives, so the model knows exactly what parameters to pass and what the function does. You do not write any schema by hand.&lt;/p&gt;

&lt;h3&gt;
  
  
  Guardrails Around Tool Calls
&lt;/h3&gt;

&lt;p&gt;Because the model is now driving real method invocations, safety controls matter. Spring AI supports input and output guardrails at the model-provider level, which means you can validate what the model is asking to call before the method executes, and you can filter or transform the result before it goes back to the model. In practice, this looks like middleware: you intercept the tool-call request, check it against your business rules, and either allow it, reject it with a safe fallback, or modify the parameters. That keeps a misbehaving model from passing malformed inputs straight into your database layer.&lt;/p&gt;

&lt;p&gt;Tool calling is also where Spring AI's model portability pays off most noticeably. The same &lt;code&gt;@Tool&lt;/code&gt;-annotated bean works against OpenAI, Anthropic, Google Gemini, or a local Ollama model. The framework translates the tool schema into whatever wire format each provider expects, so you are not locked into one vendor's function-calling API. Once your tools are registered, switching models is a configuration change, not a rewrite.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy447ra28gfir7zw9uwzn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy447ra28gfir7zw9uwzn.png" alt=" " width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Patterns: Workflows vs. Autonomous Agents
&lt;/h2&gt;

&lt;p&gt;Tool calling, covered in the previous section, is the mechanism that enables agents. The next question is how much control you hand to the LLM once those tools are available.&lt;/p&gt;

&lt;p&gt;Spring AI draws a clear line between two types of agentic systems. In a &lt;strong&gt;workflow&lt;/strong&gt;, LLMs and tools are orchestrated through predefined code paths. Your Java code decides when to call the model, which tools are eligible, and what happens with the result. In an &lt;strong&gt;agent&lt;/strong&gt;, the LLM itself dynamically decides which tools to invoke and in what sequence, with your code acting more as an execution host than a director.&lt;/p&gt;

&lt;p&gt;The distinction matters practically. A workflow is deterministic by construction: given the same inputs, you follow the same execution path, which makes it easy to test, audit, and explain to stakeholders. An autonomous agent trades that predictability for flexibility. You describe a goal and let the model reason about how to reach it, which is powerful but introduces non-determinism that can be hard to manage in a production system with SLA requirements.&lt;/p&gt;

&lt;p&gt;Spring AI's approach here is deliberately aligned with Anthropic's research on building effective agents, which emphasises simplicity and composability over complex orchestration frameworks. The practical takeaway from that research is that a well-structured workflow better serves most production use cases than a fully autonomous agent, even when autonomous agents feel like the more exciting option. Spring AI's &lt;code&gt;agentic-patterns&lt;/code&gt; reference project implements both styles, giving you concrete starting points for each. &lt;/p&gt;

&lt;p&gt;For a backend Java engineer, this framing maps onto patterns you already know. A workflow resembles a service method with a clear call graph: call the LLM to classify intent, branch on the result, call a tool, and return a structured response. An agent resembles an event loop: the LLM produces a tool call, you execute it, you feed the result back to the model, and you repeat until the model signals it is done. Spring AI handles that loop through its &lt;code&gt;ChatClient&lt;/code&gt; and tool registration infrastructure, the same APIs you use for single-shot requests, just called iteratively inside a loop that the model controls.&lt;/p&gt;
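
&lt;p&gt;As a rough illustration of the workflow side of that comparison, the sketch below keeps the branching in Java and uses the model only to classify intent. The interfaces &lt;code&gt;IntentClassifier&lt;/code&gt; and &lt;code&gt;BalanceLookup&lt;/code&gt;, and the intent labels, are hypothetical stand-ins for an LLM call and a tool; none of them are Spring AI types.&lt;/p&gt;

```java
/** Hypothetical workflow: the code, not the model, decides which step runs next. */
final class WorkflowSketch {

    /** Stand-in for a single LLM call that maps free text to an intent label. */
    interface IntentClassifier {
        String classify(String userMessage);
    }

    /** Stand-in for a tool such as an account-balance lookup. */
    interface BalanceLookup {
        String balanceFor(String accountId);
    }

    static String handle(String userMessage, String accountId,
                         IntentClassifier llm, BalanceLookup tool) {
        String intent = llm.classify(userMessage);   // step 1: the model only classifies
        switch (intent) {                            // step 2: Java code picks the branch
            case "BALANCE":
                return "Balance: " + tool.balanceFor(accountId);
            case "SMALL_TALK":
                return "Happy to help with account questions.";
            default:
                return "Sorry, I cannot help with that request.";
        }
    }
}
```

&lt;p&gt;Because every branch is visible in the &lt;code&gt;switch&lt;/code&gt;, each path can be unit-tested with a stubbed classifier, which is the testability argument made above.&lt;/p&gt;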

&lt;p&gt;In practice, most teams start with workflow patterns because the control flow is visible in code and failures are easy to reproduce. The agentic loop becomes worthwhile when the task genuinely requires open-ended reasoning: for example, a research assistant that decides which APIs to query based on partial results, or a code-review agent that chooses which static-analysis tools to run based on the languages it detects in a PR. For everything else, the workflow keeps the model in a supporting role and your Java code in the driver's seat, which is usually where you want it in an enterprise context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Model Configuration and Provider Portability
&lt;/h2&gt;

&lt;p&gt;Real enterprise applications rarely settle on a single LLM. You might use a fast, inexpensive model for short classification tasks and a more capable model for complex summarisation, or you might want a fallback path so that when one provider's API is unavailable, the application degrades gracefully rather than returning an error. Spring AI handles both scenarios through the same auto-configuration and dependency injection patterns you already use everywhere else in your Spring Boot application.&lt;/p&gt;
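
&lt;p&gt;The fallback idea can be sketched as a small wrapper. This is only an illustration under stated assumptions: &lt;code&gt;Model&lt;/code&gt; is a hypothetical one-method interface, not Spring AI's &lt;code&gt;ChatModel&lt;/code&gt;, and any &lt;code&gt;RuntimeException&lt;/code&gt; from the primary is treated as "provider unavailable", which a production setup would likely narrow.&lt;/p&gt;

```java
/** Hypothetical fallback wrapper; Model is a stand-in interface, not a Spring AI type. */
final class FallbackSketch {

    /** Minimal model abstraction: prompt in, text out. */
    interface Model {
        String call(String prompt);
    }

    /** Returns a model that tries the primary first and uses the secondary if the primary fails. */
    static Model withFallback(Model primary, Model secondary) {
        return prompt -> {
            try {
                return primary.call(prompt);
            } catch (RuntimeException primaryFailure) {
                // Assumption: any runtime failure means the primary provider is unavailable.
                return secondary.call(prompt);
            }
        };
    }
}
```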

&lt;h3&gt;
  
  
  Switching Providers With Configuration Alone
&lt;/h3&gt;

&lt;p&gt;The most direct form of portability is the ability to change providers without touching your application code. Because &lt;code&gt;ChatClient&lt;/code&gt; and &lt;code&gt;ChatModel&lt;/code&gt; are interfaces, the concrete implementation is determined at startup by whichever starter is on the classpath and how &lt;code&gt;application.properties&lt;/code&gt; is set. Swapping from OpenAI to Anthropic's Claude is primarily a two-part change: replace the starter dependency in your build file, and update your properties.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# OpenAI configuration (active)
&lt;/span&gt;&lt;span class="py"&gt;spring.ai.openai.api-key&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;${OPENAI_API_KEY}&lt;/span&gt;
&lt;span class="py"&gt;spring.ai.openai.chat.options.model&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;

&lt;span class="c"&gt;# Anthropic Claude configuration (alternative, comment out the OpenAI block above
# and uncomment these lines to switch providers; do not use both at once)
# spring.ai.anthropic.api-key=${ANTHROPIC_API_KEY}
# spring.ai.anthropic.chat.options.model=claude-3-5-sonnet-20241022
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your &lt;code&gt;@Service&lt;/code&gt; classes that inject &lt;code&gt;ChatModel&lt;/code&gt; or &lt;code&gt;ChatClient&lt;/code&gt; remain untouched. This is the same bet Spring has always made with its abstraction layers: write to the interface, configure the implementation externally.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How Spring Security works with JWT, OAuth2, and SSO</title>
      <dc:creator>srinivas reddy gouru</dc:creator>
      <pubDate>Tue, 26 May 2026 17:43:05 +0000</pubDate>
      <link>https://dev.to/srinivas_gouru_d26dc31f21/how-spring-security-works-with-jwt-oauth2-and-sso-5fil</link>
      <guid>https://dev.to/srinivas_gouru_d26dc31f21/how-spring-security-works-with-jwt-oauth2-and-sso-5fil</guid>
      <description>&lt;h2&gt;
  
  
  The 401 You Never Wrote: What Spring Security Actually Does Out of the Box
&lt;/h2&gt;

&lt;p&gt;Spring Security is a framework that intercepts every HTTP request entering your application and enforces authentication and authorisation rules before that request ever reaches a controller. You do not write the interception logic yourself; adding the &lt;code&gt;spring-boot-starter-security&lt;/code&gt; dependency to a Spring Boot project is enough to activate a default configuration that locks down every endpoint and returns a 401 (or redirects to &lt;code&gt;/login&lt;/code&gt;) for any unauthenticated caller. No annotations, no filters, no &lt;code&gt;@PreAuthorize&lt;/code&gt;: a single dependency, and suddenly your previously open API is asking for credentials.&lt;/p&gt;

&lt;p&gt;That out-of-the-box behaviour surprises a lot of engineers the first time they see it, but it reflects a deliberate design choice: security should be the default, not something you opt into. The alternative, checking credentials inside each controller method, means one forgotten check equals one unprotected route. Spring Security removes that category of mistake entirely by applying rules uniformly at the HTTP layer, before a single line of your business logic runs. &lt;/p&gt;

&lt;p&gt;What makes this possible is the &lt;strong&gt;security filter chain&lt;/strong&gt;. Spring Security is not a single interceptor sitting in front of your application; it is a carefully ordered sequence of servlet filters, each responsible for one concern. One filter handles CSRF token validation. Another manages session creation and lookup. Another reads the &lt;code&gt;Authorization&lt;/code&gt; header and tries to resolve a credential. Another checks whether the resolved identity actually has permission to reach the requested URL. These filters run in a fixed order on every incoming request, and each one can either pass the request downstream or short-circuit the chain with an error response.&lt;/p&gt;

&lt;p&gt;At the centre of this pipeline sits the &lt;code&gt;SecurityContext&lt;/code&gt;. By the time the authorisation filters run, the chain expects someone to have placed an &lt;code&gt;Authentication&lt;/code&gt; object into the context, a wrapper that holds the verified identity (the &lt;em&gt;principal&lt;/em&gt;) and the roles or authorities it carries. If the context is empty when the authorisation filter runs, the chain treats the caller as anonymous. If the requested URL requires an authenticated user, the request is rejected before your controller is ever invoked.&lt;/p&gt;

&lt;p&gt;This is the mental model that makes all of Spring Security coherent: a pipeline of filters, a shared context object, and one job delegated to whichever filter is responsible for your chosen credential mechanism. Every token mechanism you'll configure (JWT, OAuth2, SSO) is simply a different strategy for putting that principal into the context. Get that strategy wrong, or register it at the wrong point in the chain, and you'll keep seeing 401s that your controllers never threw.&lt;/p&gt;

&lt;h2&gt;
  
  
  The FilterChain: Spring Security's Request Processing Pipeline
&lt;/h2&gt;

&lt;p&gt;Every HTTP request that reaches a Spring application passes through a structure called the &lt;code&gt;SecurityFilterChain&lt;/code&gt; before it ever touches a controller. Understanding this chain is what separates confident Spring Security configuration from trial-and-error annotation hunting.&lt;/p&gt;

&lt;p&gt;Spring Security enters the picture through a &lt;code&gt;DelegatingFilterProxy&lt;/code&gt;, a thin Servlet filter that the framework registers with the container at startup. [1] Its only job is to hand the request off to Spring's application context, where the real work happens inside a &lt;code&gt;SecurityFilterChain&lt;/code&gt;: an ordered list of Spring-managed filters, each responsible for one specific concern. [1] [2] Think of it as a pipeline where each stage either passes the request forward, modifies it, or short-circuits the whole chain by writing a response directly, returning a &lt;code&gt;401&lt;/code&gt;, for example, before the request ever reaches your business logic.&lt;/p&gt;

&lt;p&gt;The order of filters in that chain is not arbitrary. Spring Security ships with defaults that place CSRF protection early, session management in the middle, and the authorisation check (&lt;code&gt;AuthorizationFilter&lt;/code&gt;) near the end. Authentication filters, wherever you insert them, must run before that authorisation check, because authorisation reads from something the authentication filter writes: the &lt;code&gt;SecurityContextHolder&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;SecurityContextHolder&lt;/code&gt; is a thread-local container that holds the &lt;code&gt;SecurityContext&lt;/code&gt;, and the &lt;code&gt;SecurityContext&lt;/code&gt; holds the &lt;code&gt;Authentication&lt;/code&gt; object for the current request. [1] When an authentication filter validates a credential and calls &lt;code&gt;SecurityContextHolder.getContext().setAuthentication(...)&lt;/code&gt;, it is essentially leaving a note for every filter that comes after it: "this request belongs to a principal with these roles." The authorisation filter at the end of the chain reads that note and decides whether the principal's roles satisfy the rules you configured for that endpoint. If no authentication filter populates the context, the authorisation filter finds an anonymous token and rejects the request accordingly.&lt;/p&gt;

&lt;p&gt;This design has a practical consequence for configuration. When you call &lt;code&gt;http.addFilterBefore(myFilter, UsernamePasswordAuthenticationFilter.class)&lt;/code&gt; or &lt;code&gt;addFilterAfter&lt;/code&gt;, you are specifying exactly where in this ordered pipeline your logic runs. [2] Getting that position wrong is one of the most common sources of mysterious &lt;code&gt;403&lt;/code&gt; responses, because a filter that runs after the authorisation check can no longer influence the outcome of that check.&lt;/p&gt;
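
&lt;p&gt;Why ordering matters can be shown with a toy pipeline. The sketch below is not Spring Security code: &lt;code&gt;Request&lt;/code&gt;, &lt;code&gt;Filter&lt;/code&gt;, &lt;code&gt;AUTHENTICATE&lt;/code&gt;, and &lt;code&gt;AUTHORIZE&lt;/code&gt; are hypothetical, the principal field stands in for the &lt;code&gt;SecurityContext&lt;/code&gt;, and the bearer token is assumed to be the user name so that no real validation is needed.&lt;/p&gt;

```java
/** Hypothetical, framework-free model of an ordered security filter chain. */
final class FilterChainSketch {

    /** Minimal request: the header value plus a slot for the resolved principal. */
    static final class Request {
        final String authorizationHeader;
        String principal; // plays the role of the Authentication held in the SecurityContext

        Request(String authorizationHeader) {
            this.authorizationHeader = authorizationHeader;
        }
    }

    /** A filter returns an HTTP status to short-circuit the chain, or 0 to pass the request on. */
    interface Filter {
        int apply(Request request);
    }

    /** Authentication: resolves a principal when a bearer token is present. */
    static final Filter AUTHENTICATE = request -> {
        String header = request.authorizationHeader == null ? "" : request.authorizationHeader;
        if (header.startsWith("Bearer ")) {
            request.principal = header.substring(7); // assumption: the token is the user name
        }
        return 0;
    };

    /** Authorisation: rejects the request when no principal has been resolved yet. */
    static final Filter AUTHORIZE = request -> request.principal == null ? 401 : 0;

    /** Runs filters in order; the first non-zero status wins, otherwise the controller answers 200. */
    static int handle(Request request, Filter... filters) {
        for (Filter filter : filters) {
            int status = filter.apply(request);
            if (status != 0) {
                return status;
            }
        }
        return 200;
    }
}
```

&lt;p&gt;Passing &lt;code&gt;AUTHENTICATE&lt;/code&gt; before &lt;code&gt;AUTHORIZE&lt;/code&gt; lets a request with a bearer token through; swapping the two rejects the same request, because the check runs before anything has populated the principal.&lt;/p&gt;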

&lt;p&gt;Each of the three token mechanisms covered in the next sections plugs into this chain at a different point and populates the &lt;code&gt;SecurityContextHolder&lt;/code&gt; in a different way. A JWT filter reads the &lt;code&gt;Authorization&lt;/code&gt; header, validates the token cryptographically, and sets the authentication directly, all within a single filter. OAuth2's resource server support does something similar but delegates signature verification to a remote or locally cached JWK set. SSO flows are different again: they redirect the browser entirely, establish a session on return, and then rely on a session-reading filter to restore the &lt;code&gt;SecurityContext&lt;/code&gt; on subsequent requests. Same chain, same &lt;code&gt;SecurityContextHolder&lt;/code&gt; contract, three different plug-in points. That consistency is what makes Spring Security composable once you internalise the pipeline model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy40c7oaucc42r3fn2vqr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy40c7oaucc42r3fn2vqr.jpg" alt=" " width="800" height="1417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  JWT Authentication: Stateless Token Validation in the Filter Chain
&lt;/h2&gt;

&lt;p&gt;With the filter chain's structure in mind, it's worth seeing exactly how a real token mechanism plugs into it. JWT is the most common starting point, partly because it requires no session storage and no round-trip to an authorisation server; the token itself carries everything the application needs to make an access decision.&lt;/p&gt;

&lt;p&gt;When a client sends a request to a secured endpoint, it includes a bearer token in the &lt;code&gt;Authorization&lt;/code&gt; header. Spring Security doesn't know what to do with that header on its own; nothing in the default chain extracts a JWT. That gap is your extension point. You create a filter that extends &lt;code&gt;OncePerRequestFilter&lt;/code&gt;, extract the token from the header, validate its signature and expiry, and then construct an &lt;code&gt;Authentication&lt;/code&gt; object that you write into the &lt;code&gt;SecurityContextHolder&lt;/code&gt;. [1] From that moment forward in the chain, the request looks just like any other authenticated request; downstream filters and your controllers see a populated security context and never need to know a JWT was involved.&lt;/p&gt;

&lt;p&gt;Position matters here. Your custom filter must be registered &lt;em&gt;before&lt;/em&gt; &lt;code&gt;UsernamePasswordAuthenticationFilter&lt;/code&gt; in the chain, which you do with &lt;code&gt;addFilterBefore&lt;/code&gt; in your &lt;code&gt;SecurityFilterChain&lt;/code&gt; configuration. If it runs after the authorisation filters have already evaluated the request, the &lt;code&gt;SecurityContext&lt;/code&gt; will be empty at the moment it counts, and the request will be rejected as unauthenticated regardless of whether the token was valid.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="nd"&gt;@RequiredArgsConstructor&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;JwtAuthenticationFilter&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;OncePerRequestFilter&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

 &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;JwtUtility&lt;/span&gt; &lt;span class="n"&gt;jwtUtility&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
 &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;UserDetailsService&lt;/span&gt; &lt;span class="n"&gt;userDetailsService&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

 &lt;span class="nd"&gt;@Override&lt;/span&gt;
 &lt;span class="kd"&gt;protected&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;doFilterInternal&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HttpServletRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="nc"&gt;HttpServletResponse&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="nc"&gt;FilterChain&lt;/span&gt; &lt;span class="n"&gt;filterChain&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;ServletException&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;IOException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;authHeader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getHeader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HttpHeaders&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;AUTHORIZATION&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;authHeader&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;authHeader&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;startsWith&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Bearer "&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="n"&gt;filterChain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;doFilter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;

 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;authHeader&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;substring&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jwtUtility&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;extractUsername&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nc"&gt;SecurityContextHolder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getContext&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;getAuthentication&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;UserDetails&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;userDetailsService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;loadUserByUsername&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jwtUtility&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isTokenValid&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;UsernamePasswordAuthenticationToken&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
 &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;UsernamePasswordAuthenticationToken&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getAuthorities&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
 &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setDetails&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WebAuthenticationDetailsSource&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;buildDetails&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
 &lt;span class="nc"&gt;SecurityContextHolder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getContext&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;setAuthentication&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
 &lt;span class="n"&gt;filterChain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;doFilter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the server stores no session state, every request must be validated independently. The filter re-verifies the signature and the expiry claim on each call. This is what makes JWT genuinely stateless: the token is self-contained, and the server-side footprint is just the public key or shared secret used for verification.&lt;/p&gt;

&lt;p&gt;The filter is also the right place to handle more advanced concerns. Token revocation, for example, can be layered in by checking a blocklist inside the same &lt;code&gt;doFilterInternal&lt;/code&gt; method before the &lt;code&gt;Authentication&lt;/code&gt; object is written to the context. Compromised password detection follows the same pattern. The filter becomes the single choke point for every token lifecycle decision, which keeps your authorisation logic clean and your business controllers unaware of authentication mechanics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh58vblu3jzskubntj34i.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh58vblu3jzskubntj34i.jpg" alt=" " width="799" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  OAuth2 Login &amp;amp; Resource Server: Delegating Trust to an Authorisation Server
&lt;/h2&gt;

&lt;p&gt;Where JWT authentication is a two-party handshake between your application and the token, OAuth2 introduces a third party: an authorisation server that your application has to coordinate with in real time. Spring Security handles all of that coordination, but you need to tell it which role your application is playing, because the configuration and the filter chain behaviour are meaningfully different depending on the answer.&lt;/p&gt;

&lt;p&gt;Spring Security's OAuth2 support draws a clean line between two roles. &lt;code&gt;oauth2Login()&lt;/code&gt; configures your application as an &lt;strong&gt;OAuth2 client&lt;/strong&gt;: it performs the redirect to the authorisation server, receives the callback, exchanges the authorisation code for tokens, and logs the user in. &lt;code&gt;oauth2ResourceServer()&lt;/code&gt; configures your application as a &lt;strong&gt;resource server&lt;/strong&gt;: an API backend that accepts and validates bearer tokens issued by an external authorisation server. These two modes are not interchangeable. A browser-facing web app typically uses &lt;code&gt;oauth2Login()&lt;/code&gt;; a REST API consumed by other services typically uses &lt;code&gt;oauth2ResourceServer()&lt;/code&gt;. Some architectures need both, each in its own security filter chain.&lt;/p&gt;

&lt;p&gt;In the Authorisation Code flow, &lt;code&gt;OAuth2LoginAuthenticationFilter&lt;/code&gt; does the heavy lifting on the client side. When the authorisation server redirects the browser back to your app with a &lt;code&gt;?code=...&lt;/code&gt; parameter, this filter intercepts the request, calls the authorisation server's token endpoint to exchange the code for an access token and ID token, loads the user's identity from the user-info endpoint or from the ID token directly, and populates the &lt;code&gt;SecurityContext&lt;/code&gt;. You do not write any of that exchange logic yourself. The minimal configuration to enable it looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Configuration&lt;/span&gt;
&lt;span class="nd"&gt;@EnableWebSecurity&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SecurityConfig&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Bean&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;SecurityFilterChain&lt;/span&gt; &lt;span class="nf"&gt;filterChain&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HttpSecurity&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;http&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;authorizeHttpRequests&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;anyRequest&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;authenticated&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;oauth2Login&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Customizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withDefaults&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Spring Boot auto-configuration picks up the client registration details (client ID, client secret, authorisation URI, and token URI) from &lt;code&gt;application.yml&lt;/code&gt;, so the Java config stays short.&lt;/p&gt;

&lt;p&gt;On the resource server side, &lt;code&gt;oauth2ResourceServer(OAuth2ResourceServerConfigurer::jwt)&lt;/code&gt; registers a &lt;code&gt;BearerTokenAuthenticationFilter&lt;/code&gt; that reads the &lt;code&gt;Authorization: Bearer ...&lt;/code&gt; header, validates the JWT's signature against the keys published at the authorisation server's JWKS endpoint, and populates the &lt;code&gt;SecurityContext&lt;/code&gt; with the decoded claims. Introspection (calling the authorisation server's introspection endpoint on every request instead of validating locally) is also supported through &lt;code&gt;opaqueToken()&lt;/code&gt;, and is the right choice when you need real-time token revocation checks rather than relying on a short expiry window.&lt;/p&gt;
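
&lt;p&gt;As a rough sketch of the resource-server side, the configuration below mirrors the login example but enables JWT bearer-token validation instead. The class and method names (&lt;code&gt;ResourceServerConfig&lt;/code&gt;, &lt;code&gt;apiFilterChain&lt;/code&gt;) are illustrative assumptions, the lambda form &lt;code&gt;oauth2.jwt(Customizer.withDefaults())&lt;/code&gt; stands in for the method reference mentioned above, and the issuer or JWKS location is assumed to come from Spring Boot properties:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.http.SessionCreationPolicy;
import org.springframework.security.web.SecurityFilterChain;

// Hypothetical configuration class; names are for illustration only.
@Configuration
@EnableWebSecurity
public class ResourceServerConfig {

    @Bean
    public SecurityFilterChain apiFilterChain(HttpSecurity http) throws Exception {
        http
            .authorizeHttpRequests(auth -&amp;gt; auth.anyRequest().authenticated())
            // Bearer tokens carry the identity, so no HTTP session is created.
            .sessionManagement(session -&amp;gt; session
                .sessionCreationPolicy(SessionCreationPolicy.STATELESS))
            // Enables bearer-token extraction and JWT validation.
            .oauth2ResourceServer(oauth2 -&amp;gt; oauth2.jwt(Customizer.withDefaults()));
        return http.build();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;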

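&lt;p&gt;Token lifecycle is another area where Spring Security saves work. When your application calls a downstream API through an &lt;code&gt;OAuth2AuthorizedClientManager&lt;/code&gt; and the stored access token has expired, the OAuth2 client support attempts a refresh token grant before the outbound request is sent, provided a refresh token was issued. Your application layer never sees the expired token.&lt;/p&gt;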

&lt;p&gt;One configuration pitfall is worth calling out explicitly. If you want to run custom logic after the token endpoint responds (setting a cookie from the token, for example), that logic must be placed inside the &lt;code&gt;SecurityFilterChain&lt;/code&gt; at the correct position using &lt;code&gt;addFilterAfter()&lt;/code&gt;, not in a plain servlet filter sitting outside Spring Security. Filters registered outside the chain execute at a different point in the request lifecycle and cannot reliably observe the outcome of Spring Security's token processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvq1a6i0cousezeo9reii.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvq1a6i0cousezeo9reii.jpg" alt=" " width="800" height="671"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  SSO with Spring Security: Session-Backed Identity Federation
&lt;/h2&gt;

&lt;p&gt;The sharpest difference between SSO and the JWT path covered earlier is not in the configuration; it is in what survives a request boundary. JWT authentication is stateless: every request carries its own proof of identity in the &lt;code&gt;Authorization&lt;/code&gt; header, and the filter validates that proof from scratch each time. SSO via &lt;code&gt;oauth2Login()&lt;/code&gt; works the opposite way. Once a user completes the redirect dance with the identity provider, Spring Security stores the resulting &lt;code&gt;Authentication&lt;/code&gt; object in the HTTP session, and every subsequent request is resolved against that session rather than against a fresh token.&lt;/p&gt;

&lt;p&gt;That session-backed model is what makes SSO feel seamless to the end user: they authenticate once with Google or GitHub, get redirected back to your application, and from that point on the browser's session cookie is the credential. The identity provider is not consulted again until the session expires or the user explicitly signs out.&lt;/p&gt;

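&lt;p&gt;The &lt;code&gt;application.yml&lt;/code&gt; registration for a Google-backed SSO looks like this (the &lt;code&gt;provider&lt;/code&gt; block is spelled out for illustration; Spring Security ships built-in defaults for Google, so it can usually be omitted):&lt;br&gt;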
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;security&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;oauth2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;registration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;google&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;client-id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;YOUR_CLIENT_ID&lt;/span&gt;
            &lt;span class="na"&gt;client-secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;YOUR_CLIENT_SECRET&lt;/span&gt;
            &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openid, profile, email&lt;/span&gt;
        &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;google&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;authorization-uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://accounts.google.com/o/oauth2/v2/auth&lt;/span&gt;
            &lt;span class="na"&gt;token-uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://oauth2.googleapis.com/token&lt;/span&gt;
            &lt;span class="na"&gt;user-info-uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://www.googleapis.com/oauth2/v3/userinfo&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Java security configuration that activates this is identical to the OAuth2 Login example in the previous section: &lt;code&gt;http.oauth2Login(Customizer.withDefaults())&lt;/code&gt; is all that's needed, with no additional filter registration. What changes is what happens inside the filter chain after the callback completes.&lt;/p&gt;

&lt;p&gt;In the JWT path, nothing is persisted between requests: &lt;code&gt;SecurityContextHolderFilter&lt;/code&gt; starts each request with an empty &lt;code&gt;SecurityContext&lt;/code&gt;, and the JWT filter fills it after validating the bearer token. In the SSO path, &lt;code&gt;SecurityContextHolderFilter&lt;/code&gt; restores the &lt;code&gt;SecurityContext&lt;/code&gt; by reading it from the &lt;code&gt;HttpSession&lt;/code&gt;; no token validation occurs because the session already holds a fully populated &lt;code&gt;OAuth2AuthenticationToken&lt;/code&gt;. That is the meaningful delta: the persistence strategy changes from token-per-request to session-per-user, and &lt;code&gt;SecurityContextHolderFilter&lt;/code&gt; adapts accordingly through &lt;code&gt;HttpSessionSecurityContextRepository&lt;/code&gt; rather than the stateless repository wired in the JWT configuration.&lt;/p&gt;

&lt;p&gt;The older &lt;code&gt;@EnableOAuth2Sso&lt;/code&gt; annotation from the Spring Security OAuth2 legacy project achieved the same redirect-and-session behaviour, but required you to manually wire an &lt;code&gt;OAuth2ClientContext&lt;/code&gt; and build an &lt;code&gt;OAuth2ClientAuthenticationProcessingFilter&lt;/code&gt; yourself. Spring Security 5's native &lt;code&gt;oauth2Login()&lt;/code&gt; DSL absorbed all of that plumbing, which is why a single YAML block and one DSL method call are now sufficient for a fully working SSO integration against Google, GitHub, or any standards-compliant identity provider. &lt;/p&gt;

&lt;p&gt;One practical implication worth naming: because the session holds the &lt;code&gt;Authentication&lt;/code&gt;, horizontal scaling requires either sticky sessions or a shared session store like Redis. This is the trade-off you accept relative to the stateless JWT model, where any node can validate a request independently.&lt;/p&gt;
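
&lt;p&gt;For the shared-store option, a minimal sketch might look like the following. It assumes Spring Boot 3 with &lt;code&gt;spring-session-data-redis&lt;/code&gt; and a Redis client starter on the classpath; the host name and timeout are placeholder values, and older Boot versions use different property names (&lt;code&gt;spring.redis.*&lt;/code&gt; plus &lt;code&gt;spring.session.store-type&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;spring:
  data:
    redis:
      host: redis.internal.example   # hypothetical host name
      port: 6379
  session:
    timeout: 30m                     # illustrative value
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the session stored in Redis, any node behind the load balancer can resolve the session cookie to the same &lt;code&gt;Authentication&lt;/code&gt;.&lt;/p&gt;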

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjz9dh10cal2uil6lf0k1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjz9dh10cal2uil6lf0k1.jpg" alt=" " width="800" height="768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing the Three Mechanisms: Which Hook, Which Flow, Which Config
&lt;/h2&gt;

&lt;p&gt;The three mechanisms are not strict alternatives to one another; they solve slightly different problems, and they plug into the filter chain at different points. Knowing which slot each one occupies is the fastest way to choose between them and to diagnose failures when they occur.&lt;/p&gt;

&lt;p&gt;JWT authentication is the right choice for stateless APIs and microservices where no session should be kept server-side. Your application is the sole verifier of the token: it checks the signature, extracts claims, and populates the &lt;code&gt;SecurityContext&lt;/code&gt;, all within a single &lt;code&gt;OncePerRequestFilter&lt;/code&gt; placed before &lt;code&gt;UsernamePasswordAuthenticationFilter&lt;/code&gt;. Nothing about the user's identity is stored between requests.&lt;/p&gt;

&lt;p&gt;OAuth2 resource server configuration is appropriate when a separate authorisation server issues tokens, and your application should validate them against a well-known JWKS endpoint. Spring's built-in &lt;code&gt;BearerTokenAuthenticationFilter&lt;/code&gt; handles the extraction and delegates to the &lt;code&gt;JwtDecoder&lt;/code&gt; you configure, so you write almost no filter code yourself. The trust anchor is the authorisation server's public key, not anything your application generates.&lt;/p&gt;
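
&lt;p&gt;In a Spring Boot application, the trust anchor is usually declared as a single property. The fragment below is a sketch with a made-up issuer URL; given an &lt;code&gt;issuer-uri&lt;/code&gt;, Spring Security discovers the JWKS endpoint from the issuer's published metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;spring:
  security:
    oauth2:
      resourceserver:
        jwt:
          # Hypothetical issuer; replace with your authorisation server's URL.
          issuer-uri: https://idp.example.com/realms/demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;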

&lt;p&gt;SSO via &lt;code&gt;oauth2Login&lt;/code&gt; applies when a browser-based user needs to authenticate once and have that identity recognised across multiple applications. It reuses the same &lt;code&gt;OAuth2LoginAuthenticationFilter&lt;/code&gt; that handles a standard OAuth2 login callback, but the session becomes the persistence mechanism rather than a token the client presents on each request. Your application's session timeout decides when the local session ends; the identity provider manages its own session, which determines whether the next redirect asks the user to sign in again.&lt;/p&gt;

&lt;p&gt;When a 401 appears, and you need to find out why, the filter chain position is almost always the first thing worth examining. In a JWT setup, a silent 401 with no accompanying error body usually means one of two things: the custom filter ran but did not write to the &lt;code&gt;SecurityContext&lt;/code&gt;, or the filter was registered at a position that places it after the authorisation check rather than before it. In an OAuth2 resource server setup, the failure is more often a misconfigured &lt;code&gt;issuer-uri&lt;/code&gt; or an expired token that the decoder rejects. In an SSO setup, a 302 redirect rather than a 401 is the more common symptom; the session wasn't found, so the user is sent back to the identity provider.&lt;/p&gt;

&lt;p&gt;The filter chain is the right mental model to carry into all three of these situations. Once you see JWT validation, OAuth2 bearer-token validation, and SSO session lookup as separate plug-ins wired into the same ordered pipeline, the framework stops behaving like an unpredictable wall and starts behaving like a predictable sequence you can reason about step by step. Every 401 is answerable by asking: which filter was responsible for this request, did it run, and if it ran, did it write to the &lt;code&gt;SecurityContext&lt;/code&gt;? You can make that question concrete immediately by enabling &lt;code&gt;TRACE&lt;/code&gt; logging on &lt;code&gt;org.springframework.security&lt;/code&gt;: the output prints each filter in the chain as it executes, which turns a confusing status code into a visible gap in the sequence. That visibility is a direct consequence of the model: when you understand the pipeline, you know exactly where to look.&lt;/p&gt;
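
&lt;p&gt;In a Spring Boot application, that logging level can be set with a standard &lt;code&gt;logging.level&lt;/code&gt; entry in &lt;code&gt;application.yml&lt;/code&gt;, sketched here for local debugging rather than production use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;logging:
  level:
    # Very verbose; intended for local troubleshooting only.
    org.springframework.security: TRACE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;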

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://github.com/hardikSinghBehl/jwt-auth-flow-spring-security" rel="noopener noreferrer"&gt;GitHub - hardikSinghBehl/jwt-auth-flow-spring-security: Java backend application using Spring-security to implement JWT based Authentication and Authorization · GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/questions/58331184/spring-security-oauth2-add-filter-after-oauth-token-call" rel="noopener noreferrer"&gt;java - Spring security oauth2 - add filter after oauth/token call - Stack Overflow&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>backend</category>
      <category>java</category>
      <category>security</category>
      <category>springboot</category>
    </item>
    <item>
      <title>How Spring Data JPA, JPA, and Hibernate work together</title>
      <dc:creator>srinivas reddy gouru</dc:creator>
      <pubDate>Tue, 26 May 2026 17:24:05 +0000</pubDate>
      <link>https://dev.to/srinivas_gouru_d26dc31f21/how-spring-data-jpa-jpa-and-hibernate-work-together-55cn</link>
      <guid>https://dev.to/srinivas_gouru_d26dc31f21/how-spring-data-jpa-jpa-and-hibernate-work-together-55cn</guid>
      <description>&lt;h2&gt;
  
  
  The Magic Line That Raises the Right Question
&lt;/h2&gt;

&lt;p&gt;Spring Data JPA is a library that lets you query a relational database by writing a Java interface method and nothing else: no SQL, no &lt;code&gt;ResultSet&lt;/code&gt; parsing, no &lt;code&gt;PreparedStatement&lt;/code&gt; boilerplate. You declare what you want, and the framework figures out how to fetch it.&lt;/p&gt;

&lt;p&gt;Here is the moment that stops most backend engineers in their tracks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;UserRepository&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;JpaRepository&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;Optional&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;findByEmail&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the entire implementation. You inject &lt;code&gt;UserRepository&lt;/code&gt; into a service, call &lt;code&gt;userRepository.findByEmail("alice@example.com")&lt;/code&gt;, and get back a populated &lt;code&gt;User&lt;/code&gt; object from the database. [1] The method works on the first try, which is satisfying until something breaks, or until you need to understand why a query is slow, why a &lt;code&gt;LazyInitializationException&lt;/code&gt; keeps appearing, or why your transaction did not roll back the way you expected.&lt;/p&gt;

&lt;p&gt;At that point, the magic stops feeling helpful and starts feeling like a wall.&lt;/p&gt;

&lt;p&gt;The wall exists because &lt;code&gt;findByEmail&lt;/code&gt; is not a single thing. It is the visible surface of three separate layers, each with its own responsibilities, failure modes, and configuration surface. Spring Data JPA translated your method name into a query. JPA, the Java Persistence API, provided the standard contract that queries had to follow. Hibernate actually executed it against the database. Without knowing which layer owns which job, you cannot know which layer to look at when something goes wrong.&lt;/p&gt;

&lt;p&gt;To answer why that one line works, you need a clear picture of all three layers. That is what the rest of this article builds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Layers, Three Jobs: The Stack in One Mental Model
&lt;/h2&gt;

&lt;p&gt;When you call &lt;code&gt;findByEmail("alice@example.com")&lt;/code&gt; on a repository interface you never implemented, what actually runs? The answer involves three distinct layers, each with a single clear job, and conflating any two of them is the source of most of the confusion engineers run into.&lt;/p&gt;

&lt;p&gt;At the bottom sits &lt;strong&gt;Hibernate&lt;/strong&gt;. Hibernate is a full ORM framework. It owns a &lt;code&gt;SessionFactory&lt;/code&gt;, manages an in-memory representation of your entities, generates SQL, and fires that SQL over JDBC to the database. [2] When things go wrong at the database level (a bad join, an unexpected number of queries, or a lazy-loading exception), Hibernate is the layer to examine.&lt;/p&gt;

&lt;p&gt;In the middle sits &lt;strong&gt;JPA&lt;/strong&gt; (the Java Persistence API, since renamed Jakarta Persistence). JPA is not a library you can download and run; it is a specification: a set of interfaces, annotations, and contracts that any compliant ORM must honour. [3] It defines what &lt;code&gt;@Entity&lt;/code&gt;, &lt;code&gt;@OneToMany&lt;/code&gt;, and &lt;code&gt;EntityManager&lt;/code&gt; mean, but it ships no implementation of its own. Hibernate is Spring Boot's default JPA provider, meaning Hibernate is the concrete code that fulfils those contracts. [2] This distinction matters because your annotations belong to JPA, while the SQL they produce belongs to Hibernate.&lt;/p&gt;

&lt;p&gt;At the top sits &lt;strong&gt;Spring Data JPA&lt;/strong&gt;. Its sole job is to eliminate boilerplate in the data-access layer. [1] You write a repository interface; Spring generates a proxy implementation at startup that translates method names and annotations into JPA operations. [1] Spring Data JPA does not talk to your database directly, and it is not itself an ORM. Every call it receives flows downward through JPA's &lt;code&gt;EntityManager&lt;/code&gt; and on into Hibernate, which finally issues the JDBC call. &lt;/p&gt;

&lt;p&gt;Laid out as a delegation chain, the flow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your code
 → Spring Data JPA (repository proxy)
 → JPA EntityManager (standard contract)
 → Hibernate (SQL generation + JDBC)
 → Database
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each arrow in that chain crosses a responsibility boundary. Spring Data JPA knows about Spring conventions and method-name patterns. JPA knows about entities and the persistence context lifecycle. Hibernate knows about SQL dialects and JDBC; connection pooling belongs to the &lt;code&gt;DataSource&lt;/code&gt; underneath it (HikariCP by default in Spring Boot). None of the three layers was designed to do another's job, which is exactly why each one is replaceable in theory. You could swap Hibernate for EclipseLink and keep all your JPA annotations untouched, or swap Spring Data JPA for plain &lt;code&gt;EntityManager&lt;/code&gt; calls and keep Hibernate doing exactly what it already does.&lt;/p&gt;

&lt;p&gt;Keeping this map in your head pays off the moment something goes wrong. A &lt;code&gt;LazyInitializationException&lt;/code&gt; is a Hibernate story. A &lt;code&gt;@Transactional&lt;/code&gt; annotation that seems to do nothing is a Spring story. A repository method that generates a query you didn't expect is a Spring Data JPA story, though Hibernate ultimately executes it. The next three sections go bottom-up through the stack, starting with the layer that actually touches your database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghrdrxsi72k8we0xo79j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghrdrxsi72k8we0xo79j.png" alt=" " width="800" height="2443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Hibernate: The Engine That Actually Talks to the Database
&lt;/h2&gt;

&lt;p&gt;When your Spring application saves an entity and an SQL &lt;code&gt;INSERT&lt;/code&gt; appears in the logs, Hibernate wrote that statement. It is an Object-Relational Mapping framework whose single job is bridging the gap between Java objects and relational tables, translating method calls and object state into the SQL your database actually understands, so you don't have to hand-write every query for routine CRUD operations. [2]&lt;/p&gt;

&lt;p&gt;At the centre of Hibernate's runtime is the &lt;code&gt;SessionFactory&lt;/code&gt;, a heavyweight, thread-safe object built once at application startup. From that factory, Hibernate creates short-lived &lt;code&gt;Session&lt;/code&gt; instances (or, in the JPA vocabulary you'll see more often in Spring code, &lt;code&gt;EntityManager&lt;/code&gt; instances), typically one per transaction or request. Those short-lived objects are not thread-safe, so Spring handles their lifecycle carefully. The &lt;code&gt;EntityManager&lt;/code&gt; you inject with &lt;code&gt;@PersistenceContext&lt;/code&gt; is actually a shared proxy that delegates to whichever transactional &lt;code&gt;EntityManager&lt;/code&gt; is bound to the current thread, falling back to a freshly created one if no transaction is in progress. [4] That indirection is invisible in normal usage but matters when you start debugging concurrency issues.&lt;/p&gt;

&lt;p&gt;Several behaviours that surface as runtime surprises are owned entirely by Hibernate, not by the layers above it. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dirty checking&lt;/strong&gt; is one of the most useful and least understood. Inside an active transaction, Hibernate holds a snapshot of every entity it has loaded. When the persistence context is flushed, typically just before the transaction commits, it compares current field values against those snapshots and issues &lt;code&gt;UPDATE&lt;/code&gt; statements automatically for anything that changed, even if you never called &lt;code&gt;save()&lt;/code&gt;. This is a feature, but it confuses engineers who expect no SQL to run unless they explicitly persist something.&lt;/p&gt;
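
&lt;p&gt;A minimal sketch of dirty checking in a Spring service follows. It assumes the &lt;code&gt;User&lt;/code&gt; entity and &lt;code&gt;UserRepository&lt;/code&gt; from the opening example, plus a hypothetical &lt;code&gt;setEmail&lt;/code&gt; setter on the entity; &lt;code&gt;UserService&lt;/code&gt; and &lt;code&gt;changeEmail&lt;/code&gt; are illustrative names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical service; User and UserRepository are assumed to exist.
@Service
public class UserService {

    private final UserRepository userRepository;

    public UserService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    @Transactional
    public void changeEmail(Long userId, String newEmail) {
        // The loaded entity is managed by the current persistence context.
        User user = userRepository.findById(userId).orElseThrow();
        // No save() call: Hibernate detects the changed field at flush time
        // and issues the UPDATE when the transaction commits.
        user.setEmail(newEmail);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Without &lt;code&gt;@Transactional&lt;/code&gt;, the entity would typically be detached by the time the setter runs, and no &lt;code&gt;UPDATE&lt;/code&gt; would be issued.&lt;/p&gt;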

&lt;p&gt;&lt;strong&gt;First-level cache&lt;/strong&gt; (the session cache) means that within a single &lt;code&gt;EntityManager&lt;/code&gt; lifetime, calling &lt;code&gt;find(User.class, 1L)&lt;/code&gt; twice returns the same Java object from memory on the second call, with no second database round-trip. This cache is scoped to the session, not the application, so it disappears when the session closes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lazy loading&lt;/strong&gt; lets Hibernate defer fetching associated collections until your code actually accesses them. A &lt;code&gt;User&lt;/code&gt; with a &lt;code&gt;@OneToMany&lt;/code&gt; list of &lt;code&gt;Orders&lt;/code&gt; won't load those orders until you call &lt;code&gt;user.getOrders()&lt;/code&gt;. The downside is the N+1 problem: if you load 50 users in a list and then touch &lt;code&gt;.getOrders()&lt;/code&gt; for each one inside a loop, Hibernate fires one query to fetch the users and then 50 separate queries to fetch each user's orders. The fix usually involves a &lt;code&gt;JOIN FETCH&lt;/code&gt; in your query or switching the fetch type, but the first step is recognising that Hibernate is what's generating those 50 extra statements.&lt;/p&gt;
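
&lt;p&gt;One way to apply the &lt;code&gt;JOIN FETCH&lt;/code&gt; fix is a custom repository query, sketched below. It assumes the &lt;code&gt;User&lt;/code&gt; entity exposes an &lt;code&gt;orders&lt;/code&gt; association; &lt;code&gt;findAllWithOrders&lt;/code&gt; is a hypothetical method name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;

public interface UserRepository extends JpaRepository&amp;lt;User, Long&amp;gt; {

    // Loads users and their orders in a single SELECT instead of 1 + N queries.
    // LEFT JOIN keeps users that have no orders; DISTINCT collapses the
    // duplicate User rows that the join would otherwise produce.
    @Query("SELECT DISTINCT u FROM User u LEFT JOIN FETCH u.orders")
    List&amp;lt;User&amp;gt; findAllWithOrders();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Iterating over the result and calling &lt;code&gt;getOrders()&lt;/code&gt; then reads from already-loaded collections rather than triggering a query per user.&lt;/p&gt;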

&lt;p&gt;&lt;strong&gt;SQL dialect translation&lt;/strong&gt; means Hibernate handles the differences between database engines. Recent Hibernate versions usually detect the dialect from the JDBC connection; you can also set &lt;code&gt;spring.jpa.database-platform&lt;/code&gt; to point at a dialect class explicitly. Either way, Hibernate adapts its generated SQL to match the syntax of PostgreSQL, MySQL, Oracle, or whatever you're running.&lt;/p&gt;
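
&lt;p&gt;As an illustration, pinning the dialect for PostgreSQL and echoing the generated SQL could look like this; the PostgreSQL choice is an assumption made for the example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;spring:
  jpa:
    # Assumes PostgreSQL; pick the dialect class that matches your database.
    database-platform: org.hibernate.dialect.PostgreSQLDialect
    # Prints generated SQL to the console; handy for spotting N+1 queries.
    show-sql: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;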

&lt;p&gt;All of these behaviours (dirty checking, caching, lazy loading, dialect generation) live at the Hibernate layer. When something breaks, the explanation usually lives there too. The layer above it, JPA, doesn't implement any of this; it just defines the contract that Hibernate must honour.&lt;/p&gt;

&lt;h2&gt;
  
  
  JPA: The Standard Contract Every ORM Must Honour
&lt;/h2&gt;

&lt;p&gt;The previous section covered what Hibernate actually does at runtime. But if you look at a typical Spring Boot entity class, you'll notice the imports don't say &lt;code&gt;org.hibernate.*&lt;/code&gt;. They say &lt;code&gt;jakarta.persistence.*&lt;/code&gt; (or &lt;code&gt;javax.persistence.*&lt;/code&gt; on older versions). That's JPA, and understanding why it's there clarifies the whole stack.&lt;/p&gt;

&lt;p&gt;JPA, the Jakarta Persistence API, is a specification, not a library you run. It defines a set of interfaces, annotations, and rules that any compliant ORM must implement. [2] The annotations you use every day, &lt;code&gt;@Entity&lt;/code&gt;, &lt;code&gt;@Id&lt;/code&gt;, &lt;code&gt;@OneToMany&lt;/code&gt;, &lt;code&gt;@Column&lt;/code&gt;, are all defined by JPA. The central API for interacting with the persistence context, &lt;code&gt;EntityManager&lt;/code&gt;, is also defined by JPA, with methods like &lt;code&gt;persist&lt;/code&gt;, &lt;code&gt;find&lt;/code&gt;, &lt;code&gt;merge&lt;/code&gt;, and &lt;code&gt;remove&lt;/code&gt;. Hibernate is simply the most popular implementation of that contract.&lt;/p&gt;

&lt;p&gt;This separation matters practically. Because your application code targets JPA interfaces rather than Hibernate-specific classes, you could in principle swap Hibernate for EclipseLink or OpenJPA without touching your entity or service code, provided you have stayed away from Hibernate-only features. In practice, most teams never make that swap, but portability isn't the only benefit. The bigger win is that JPA gives the whole Java ecosystem a shared vocabulary. Documentation, Stack Overflow answers, and framework libraries all speak JPA, even when each is backed by a different provider.&lt;/p&gt;

&lt;p&gt;JPA also introduced JPQL, the Java Persistence Query Language, which lets you query against entity class names and field names rather than table and column names. A JPQL query looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;TypedQuery&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Order&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;entityManager&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;createQuery&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="s"&gt;"SELECT o FROM Order o WHERE o.customer.email = :email"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Order&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setParameter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"email"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"alice@example.com"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Order&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getResultList&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compare that to raw JDBC, where you would write a SQL string against &lt;code&gt;orders.customer_email&lt;/code&gt;, iterate over a &lt;code&gt;ResultSet&lt;/code&gt; yourself, and map each column back to a field by hand. [5] JPQL keeps queries at the object level, so renaming a column means updating one mapping annotation on the entity rather than hunting through SQL strings scattered across the codebase.&lt;/p&gt;

&lt;p&gt;The honest limitation of plain JPA is that it still requires you to wire up an &lt;code&gt;EntityManagerFactory&lt;/code&gt;, manage &lt;code&gt;EntityManager&lt;/code&gt; lifecycles, write those &lt;code&gt;createQuery&lt;/code&gt; calls, and handle transactions explicitly. For a small application that's manageable. For a service with thirty entity types and hundreds of queries, it becomes a lot of repeated structural code. That gap is exactly what Spring Data JPA was designed to close.&lt;/p&gt;

&lt;h2&gt;
  
  
  Spring Data JPA: The Boilerplate Eliminator on Top
&lt;/h2&gt;

&lt;p&gt;With the JPA specification and Hibernate's implementation both accounted for, the question that opened this article is still hanging: where exactly does &lt;code&gt;findByEmail()&lt;/code&gt; come from? You never wrote a SQL query, never touched an &lt;code&gt;EntityManager&lt;/code&gt;, never defined a concrete class. The answer lives entirely in Spring Data JPA's repository abstraction.&lt;/p&gt;

&lt;p&gt;When your application starts up, Spring Data JPA scans for interfaces that extend &lt;code&gt;JpaRepository&lt;/code&gt; (or one of its parent interfaces) and generates concrete implementations on the fly. [1] No DAO class needs to be written, because Spring creates a proxy object that wires in the actual behaviour at startup, before any request arrives. That proxy is what gets injected into your service layer when you use &lt;code&gt;@Autowired&lt;/code&gt; or constructor injection.&lt;/p&gt;
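
&lt;p&gt;The proxy idea can be sketched with the JDK's own &lt;code&gt;java.lang.reflect.Proxy&lt;/code&gt;. This illustrates the mechanism only, on the assumption that a plain dynamic proxy is close enough to convey it; Spring's real proxy infrastructure does far more (query lookup, transactions, exception translation). &lt;code&gt;GreetingRepository&lt;/code&gt; and &lt;code&gt;ProxyDemo&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical repository interface. Note that no class implements it anywhere.
interface GreetingRepository {
    String findByName(String name);
}

class ProxyDemo {
    // Builds an implementation of the interface at runtime. Every call on the
    // returned object is routed through the handler, which sees the method
    // being invoked and its arguments.
    static GreetingRepository create() {
        InvocationHandler handler = (proxy, method, args) ->
            "handled " + method.getName() + "(" + args[0] + ")";
        return (GreetingRepository) Proxy.newProxyInstance(
            GreetingRepository.class.getClassLoader(),
            new Class[] { GreetingRepository.class },
            handler);
    }

    public static void main(String[] args) {
        System.out.println(create().findByName("alice")); // handled findByName(alice)
    }
}
```

&lt;p&gt;The handler receives the &lt;code&gt;Method&lt;/code&gt; object for each call, which is what makes it possible to decide what to do based purely on a method's name and parameters.&lt;/p&gt;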

&lt;p&gt;The method-name translation is the part that feels most like magic. Spring Data JPA parses the method signature using reflection, breaks it down by the keywords it recognises (&lt;code&gt;findBy&lt;/code&gt;, &lt;code&gt;And&lt;/code&gt;, &lt;code&gt;Or&lt;/code&gt;, &lt;code&gt;OrderBy&lt;/code&gt;, &lt;code&gt;Between&lt;/code&gt;, and so on), and constructs a JPQL query to match. A method named &lt;code&gt;findByLastNameAndAgeGreaterThan&lt;/code&gt; becomes something like &lt;code&gt;SELECT u FROM User u WHERE u.lastName = :lastName AND u.age &amp;gt; :age&lt;/code&gt;, assembled at startup rather than at call time. If you make a typo in the method name that produces an unresolvable expression, the application will fail to start, not fail at 2 AM when that code path is first hit in production.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;UserRepository&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;JpaRepository&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;findByLastName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;lastName&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="nc"&gt;Optional&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;findByEmail&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;findByAgeGreaterThanOrderByLastNameAsc&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the entire file. Spring Data JPA does the rest.&lt;/p&gt;
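
&lt;p&gt;To make the name-to-query translation less mysterious, here is a deliberately tiny sketch of the idea. It is not Spring Data's parser: it assumes only the &lt;code&gt;findBy&lt;/code&gt; prefix, the &lt;code&gt;And&lt;/code&gt; keyword, and the &lt;code&gt;GreaterThan&lt;/code&gt; suffix, and &lt;code&gt;DerivedQuerySketch&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Toy derivation of a JPQL string from a method name. Handles only
// findBy, And, and GreaterThan; anything else is rejected up front.
class DerivedQuerySketch {
    static String derive(String entity, String methodName) {
        if (!methodName.startsWith("findBy")) {
            throw new IllegalArgumentException("unsupported method name: " + methodName);
        }
        String alias = entity.substring(0, 1).toLowerCase();
        String predicate = methodName.substring("findBy".length());
        // Split on "And" only when it starts a new capitalised property name.
        String where = Arrays.stream(predicate.split("And(?=[A-Z])"))
            .map(part -> condition(alias, part))
            .collect(Collectors.joining(" AND "));
        return "SELECT " + alias + " FROM " + entity + " " + alias + " WHERE " + where;
    }

    // Turns one name fragment such as "AgeGreaterThan" into "u.age > :age".
    static String condition(String alias, String part) {
        String operator = "=";
        if (part.endsWith("GreaterThan")) {
            operator = ">";
            part = part.substring(0, part.length() - "GreaterThan".length());
        }
        if (part.isEmpty()) {
            throw new IllegalArgumentException("missing property name");
        }
        String property = Character.toLowerCase(part.charAt(0)) + part.substring(1);
        return alias + "." + property + " " + operator + " :" + property;
    }

    public static void main(String[] args) {
        // SELECT u FROM User u WHERE u.lastName = :lastName AND u.age > :age
        System.out.println(derive("User", "findByLastNameAndAgeGreaterThan"));
    }
}
```

&lt;p&gt;Rejecting an unparseable name with an exception mirrors the fail-at-startup behaviour described above: the error surfaces when the query is derived, not when it is first executed.&lt;/p&gt;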

&lt;p&gt;When one of these methods is called at runtime, the proxy routes the call. Inherited CRUD methods such as &lt;code&gt;save&lt;/code&gt; and &lt;code&gt;findById&lt;/code&gt; are delegated to a &lt;code&gt;SimpleJpaRepository&lt;/code&gt; instance under the hood, while derived query methods run the query that was built from the method name at startup. Both paths use a JPA &lt;code&gt;EntityManager&lt;/code&gt;, and that &lt;code&gt;EntityManager&lt;/code&gt; call flows down to Hibernate, which generates the SQL and sends it to the database. The call chain traverses all three layers: your repository interface, the JPA &lt;code&gt;EntityManager&lt;/code&gt;, and Hibernate's SQL engine.&lt;/p&gt;

&lt;p&gt;Spring Data JPA is part of the broader Spring Data umbrella project, which applies the same repository abstraction pattern to MongoDB, Redis, Cassandra, and other stores. Switching from a relational database to MongoDB doesn't require learning an entirely different programming model. You extend a MongoDB-specific repository interface, and query derivation works in much the same way. That consistency is the deeper goal of Spring Data's design.&lt;/p&gt;

&lt;p&gt;Derived query methods cover a large surface area, but they are not the only option. For anything complex, &lt;code&gt;@Query&lt;/code&gt; lets you write JPQL (or even native SQL with &lt;code&gt;nativeQuery = true&lt;/code&gt;) directly on the method. The proxy mechanism stays in place; only the query source changes. This means you can stay in the repository interface without writing any implementation code, even for queries that method-name derivation cannot express cleanly.&lt;/p&gt;

&lt;p&gt;Knowing which layer owns which behaviour is immediately useful when things go wrong.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzs8v067fp46vvjd1dj20.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzs8v067fp46vvjd1dj20.jpg" alt=" " width="799" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When the Magic Breaks: Debugging Across the Three Layers
&lt;/h2&gt;

&lt;p&gt;Each failure mode has a home layer, and pointing your debugger at the wrong one wastes time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;LazyInitializationException&lt;/code&gt;, Hibernate's session is closed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This exception means Hibernate tried to load a lazy association, looked for an open persistence context, and found none. [4] It is a Hibernate-layer error about session lifecycle, not a Spring Data JPA bug. The usual cause is accessing a &lt;code&gt;@OneToMany&lt;/code&gt; collection after the repository method has returned and the &lt;code&gt;EntityManager&lt;/code&gt; has closed. [2] Fix it at the layer that owns sessions: either annotate the calling service method with &lt;code&gt;@Transactional&lt;/code&gt; so the persistence context stays open while you traverse the association, or switch the fetch type to eager for relationships you always need. A third option is a JPQL &lt;code&gt;JOIN FETCH&lt;/code&gt; or &lt;code&gt;@EntityGraph&lt;/code&gt; on the repository method itself, which tells Hibernate to load the association in the initial query rather than deferring it.&lt;/p&gt;
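
&lt;p&gt;The failure is easier to reason about with a toy model. The sketch below is not Hibernate code: &lt;code&gt;ToySession&lt;/code&gt;, &lt;code&gt;LazyOrders&lt;/code&gt;, and &lt;code&gt;LazyDemo&lt;/code&gt; are hypothetical classes, and an &lt;code&gt;IllegalStateException&lt;/code&gt; stands in for &lt;code&gt;LazyInitializationException&lt;/code&gt;.&lt;/p&gt;

```java
// Toy model, not Hibernate: a lazy collection that can only load while
// the session it was created in is still open.
class ToySession {
    private boolean open = true;
    boolean isOpen() { return open; }
    void close() { open = false; }
}

class LazyOrders {
    private final ToySession session;
    private String[] loaded; // stays null until the first access

    LazyOrders(ToySession session) { this.session = session; }

    String[] get() {
        if (loaded == null) {
            if (!session.isOpen()) {
                // Hibernate raises LazyInitializationException at the equivalent point.
                throw new IllegalStateException("could not initialize: no session");
            }
            loaded = new String[] { "order-1", "order-2" }; // stands in for the SELECT
        }
        return loaded;
    }
}

class LazyDemo {
    // First access happens while the session is open, so the load succeeds.
    static int touchInsideSession() {
        ToySession session = new ToySession();
        LazyOrders orders = new LazyOrders(session);
        int count = orders.get().length;
        session.close();
        return count;
    }

    // First access happens after close, so the load fails.
    static boolean failsAfterClose() {
        ToySession session = new ToySession();
        LazyOrders orders = new LazyOrders(session);
        session.close();
        try {
            orders.get();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```

&lt;p&gt;Seen this way, &lt;code&gt;@Transactional&lt;/code&gt; on the service method moves the first access inside the open session, while &lt;code&gt;JOIN FETCH&lt;/code&gt; and &lt;code&gt;@EntityGraph&lt;/code&gt; make sure the collection is already loaded before the session closes.&lt;/p&gt;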

&lt;p&gt;&lt;strong&gt;Unexpected query count, Hibernate's lazy-loading strategy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your application issues far more SQL statements than you expect, the N+1 problem is the likely culprit. The JPA default for &lt;code&gt;@OneToMany&lt;/code&gt; is &lt;code&gt;FetchType.LAZY&lt;/code&gt;, so fetching ten authors and iterating their posts triggers one query for the author list and ten more for the post collections. [6] You won't see this in your repository interface; you have to look at what Hibernate is generating. Either of the following lines in &lt;code&gt;application.properties&lt;/code&gt; turns on SQL logging: the first prints statements to standard output, the second routes them through your logging framework. Setting both logs every statement twice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;spring.jpa.show-sql&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;logging.level.org.hibernate.SQL&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;DEBUG&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you can see the queries, the fix is at the Hibernate or JPA layer, not the Spring Data layer. A &lt;code&gt;JOIN FETCH&lt;/code&gt; in a &lt;code&gt;@Query&lt;/code&gt; annotation collapses the N+1 into a single join. [6] &lt;code&gt;@EntityGraph&lt;/code&gt; on the repository method achieves the same result declaratively. [6] For cases where you prefer to keep lazy loading but still want fewer round trips, setting &lt;code&gt;spring.jpa.properties.hibernate.default_batch_fetch_size=10&lt;/code&gt; tells Hibernate to load related collections in batches with an &lt;code&gt;IN&lt;/code&gt; clause rather than one query per row. [6]&lt;/p&gt;
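
&lt;p&gt;As a back-of-the-envelope check on what batch fetching buys you, the estimate below assumes every batch is filled completely. Real statement counts depend on the Hibernate version and on which collections are already initialised, and &lt;code&gt;BatchFetchEstimate&lt;/code&gt; is a hypothetical helper, not a Hibernate API.&lt;/p&gt;

```java
// Rough estimate of SQL statements issued with batch fetching enabled.
class BatchFetchEstimate {
    // One statement for the parent rows, plus one IN-clause statement per batch
    // of lazy collections (the last batch may be only partly filled).
    static int statements(int parents, int batchSize) {
        if (1 > batchSize) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int batches = (parents + batchSize - 1) / batchSize; // ceiling division
        return 1 + batches;
    }

    public static void main(String[] args) {
        System.out.println(statements(10, 10)); // 2
        System.out.println(statements(50, 10)); // 6
        System.out.println(statements(50, 1));  // 51, same as unbatched lazy loading
    }
}
```

&lt;p&gt;For the ten-author example, a batch size of 10 brings the count from 11 statements down to 2 while keeping the association lazy.&lt;/p&gt;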

&lt;p&gt;&lt;strong&gt;Transaction rollback surprises, Spring's &lt;code&gt;@Transactional&lt;/code&gt; infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a transaction does not behave the way you expect (changes not persisting, rollbacks firing on the wrong exception, or operations on a second data source not participating in the transaction), the problem usually lives at the Spring layer. &lt;code&gt;@Transactional&lt;/code&gt; is managed entirely by Spring's AOP proxy, and its default behaviour is to roll back only on unchecked exceptions (&lt;code&gt;RuntimeException&lt;/code&gt; and &lt;code&gt;Error&lt;/code&gt;); a checked exception lets the transaction commit. If your application has multiple data sources, &lt;code&gt;@Transactional&lt;/code&gt; uses the primary transaction manager by default. Work against a secondary data source is covered only if you name its transaction manager in the annotation, or configure a &lt;code&gt;JtaTransactionManager&lt;/code&gt; when a single transaction must span both. [7] The fix is a Spring configuration change, not a Hibernate tuning exercise.&lt;/p&gt;
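
&lt;p&gt;The default rollback rule is small enough to write down. The sketch below restates it in plain Java so it can be run without Spring; &lt;code&gt;RollbackRule&lt;/code&gt; is a hypothetical class for illustration, not part of Spring's API.&lt;/p&gt;

```java
import java.io.IOException;

// Restates the default @Transactional rollback rule in plain Java.
class RollbackRule {
    // Unchecked exceptions (RuntimeException and Error) trigger a rollback;
    // checked exceptions do not, so the transaction commits.
    static boolean rollsBackByDefault(Throwable t) {
        return t instanceof RuntimeException || t instanceof Error;
    }

    public static void main(String[] args) {
        System.out.println(rollsBackByDefault(new IllegalStateException())); // true
        System.out.println(rollsBackByDefault(new IOException()));           // false
    }
}
```

&lt;p&gt;If a checked exception should also undo the work, declare it explicitly, for example &lt;code&gt;@Transactional(rollbackFor = IOException.class)&lt;/code&gt;.&lt;/p&gt;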

&lt;p&gt;The mental model from earlier sections makes all three of these straightforward to triage: session lifecycle is Hibernate, query shape is Hibernate, transaction boundaries are Spring. When you see a stack trace or unexpected behaviour, ask which layer owns that concept, then inspect or configure exactly that layer. Turning on SQL logging in development is a good habit regardless; the raw queries Hibernate sends are the single most informative signal available, and most performance surprises reveal themselves within a few minutes of reading them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://spring.io/projects/spring-data-jpa" rel="noopener noreferrer"&gt;Spring Data JPA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@burakkocakeu/jpa-hibernate-and-spring-data-jpa-efa71feb82ac" rel="noopener noreferrer"&gt;JPA, Hibernate And Spring Data JPA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.baeldung.com/spring-data-jpa-vs-jpa" rel="noopener noreferrer"&gt;Difference Between JPA and Spring Data JPA | Baeldung&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.spring.io/spring-framework/reference/data-access/orm/jpa.html" rel="noopener noreferrer"&gt;JPA :: Spring Framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.baeldung.com/jpa-vs-jdbc" rel="noopener noreferrer"&gt;A Comparison Between JPA and JDBC | Baeldung&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/sadiul_hakim/understanding-and-solving-the-n1-problem-in-spring-data-jpa-2b6f"&gt;Understanding and Solving the N+1 Problem in Spring Data JPA - DEV Community&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/questions/23862994/whats-the-difference-between-hibernate-and-spring-data-jpa" rel="noopener noreferrer"&gt;java - What's the difference between Hibernate and Spring Data JPA - Stack Overflow&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>backend</category>
      <category>database</category>
      <category>java</category>
      <category>springboot</category>
    </item>
  </channel>
</rss>
