DEV Community: Piyush Kumar Singh

Method-Level Security in Spring Security — How @PreAuthorize Actually Works Inside

Piyush Kumar Singh — Mon, 20 Jul 2026 08:19:11 +0000

In the first three parts of this series, every security decision happened before your controller ran.

The filter chain intercepted the HTTP request. JWT validation proved who you were. OAuth2 handed off your identity from Google. All of it — every authentication check, every URL-level access rule — happened at the HTTP boundary, before a single line of your business logic executed.

That’s a solid foundation. But it has a gap that URL patterns can’t fill.

Consider this: .requestMatchers("/invoices/**").hasRole("USER") lets any authenticated user with the USER role hit any invoice endpoint. But what if User A should only see their own invoices, not User B's? The URL pattern /invoices/4521 doesn't know who created invoice 4521. You need to be inside the method, with access to the arguments or the return value, to make that ownership decision.

That’s the problem method-level security solves. And it’s why it exists as a completely separate layer from the filter chain.

Enabling it — and why it’s off by default

Method-level security is disabled by default. You have to opt in:

@Configuration
@EnableMethodSecurity
public class SecurityConfig {
    // rest of your config
}

Why off by default? Because enabling it causes Spring to create AOP proxies around every bean with a security annotation. That adds startup overhead and a small per-call interception cost. If you don’t need it, you don’t pay for it. Spring’s design philosophy is explicit opt-in for features that have a cost.

One thing worth knowing if you’re reading older code: @EnableMethodSecurity (introduced in Spring Security 5.6) replaces @EnableGlobalMethodSecurity(prePostEnabled = true), which is deprecated in Spring Security 6. If you see the old annotation in a codebase, it still works. Don't use it for new projects: the behaviour is slightly different (the new annotation turns on @PreAuthorize and @PostAuthorize by default), and the new API is cleaner.

What Spring actually does when it sees @PreAuthorize

This is where most tutorials stop saying anything useful. They show you the annotation and assume you understand what fires it.

Here’s what actually happens.

When your Spring application starts and finds a bean with @PreAuthorize on a method, Spring doesn't modify your class. It creates a proxy object that wraps your bean. From that point on, any other bean in your application that injects InvoiceService actually receives the proxy, not the real service.

When something calls invoiceService.getInvoice(id), the call hits the proxy first. The proxy hands it to AuthorizationManagerBeforeMethodInterceptor, an AOP advice registered specifically to handle @PreAuthorize. This interceptor is the component that actually does the security check.

The interceptor goes to SecurityContextHolder and retrieves the current Authentication object. If you read Part 2 of this series, this is the same Authentication that your JWT filter stored there. If you read Part 3, it's the same one the OAuth2 login flow stored there. The filter chain populates it. Method security reads it. Same object, same store, different consumers.

The interceptor then evaluates the SpEL expression — the string inside the annotation — against that authentication. If it returns true, control passes to your real method. If it returns false, AccessDeniedException is thrown, caught by ExceptionTranslationFilter (from Part 1), and the client receives a 403.

Your actual service code only runs if all of that passes cleanly.
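The interception described above can be imitated with a plain JDK dynamic proxy. This is only a sketch of the idea, not Spring's implementation: InvoiceLookup, PreCheckProxyDemo, the thread-local authority set (a stand-in for SecurityContextHolder), and the use of SecurityException in place of AccessDeniedException are all assumptions made for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical service interface used only for this sketch.
interface InvoiceLookup {
    String getInvoice(long id);
}

final class RealInvoiceLookup implements InvoiceLookup {
    @Override
    public String getInvoice(long id) {
        return "invoice-" + id;
    }
}

final class PreCheckProxyDemo {

    // Stand-in for SecurityContextHolder: the caller's authorities on this thread.
    static final ThreadLocal<Set<String>> CURRENT_AUTHORITIES =
            ThreadLocal.withInitial(HashSet::new);

    // Wraps the target in a proxy that checks an authority before delegating.
    static InvoiceLookup secure(InvoiceLookup target, String requiredAuthority) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (!CURRENT_AUTHORITIES.get().contains(requiredAuthority)) {
                // Stand-in for the exception Spring Security would throw.
                throw new SecurityException("Access denied: missing " + requiredAuthority);
            }
            try {
                return method.invoke(target, args); // only now does the real method run
            } catch (InvocationTargetException e) {
                throw e.getCause();
            }
        };
        return (InvoiceLookup) Proxy.newProxyInstance(
                InvoiceLookup.class.getClassLoader(),
                new Class<?>[] {InvoiceLookup.class},
                handler);
    }

    public static void main(String[] args) {
        InvoiceLookup secured = secure(new RealInvoiceLookup(), "ROLE_ADMIN");
        CURRENT_AUTHORITIES.set(new HashSet<>(Arrays.asList("ROLE_ADMIN")));
        System.out.println(secured.getInvoice(42L)); // prints invoice-42
    }
}
```

Callers hold the proxy, never RealInvoiceLookup, which is why the check cannot be skipped from outside the bean.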

SpEL expressions — what you can actually write

The string inside @PreAuthorize is a Spring Expression Language (SpEL) expression evaluated at runtime. It has access to the current authentication, method arguments, and Spring beans.

The expressions you’ll use most often in production:


// Check for a role — Spring prepends "ROLE_" automatically
// This checks for ROLE_ADMIN, not ADMIN
@PreAuthorize("hasRole('ADMIN')")
public void deleteUser(Long id) { ... }

// Check for a specific authority - no prefix added
// Use this for fine-grained permissions
@PreAuthorize("hasAuthority('PAYMENT_WRITE')")
public void processPayment(Payment p) { ... }

// Access method arguments - the # prefix refers to parameter names
// This is the ownership check URL patterns can't do
@PreAuthorize("#userId == authentication.principal.id")
public Invoice getInvoice(Long userId, Long invoiceId) { ... }

// Call a Spring bean from inside the expression
// @ prefix followed by bean name
@PreAuthorize("@invoiceService.isOwner(#id, authentication.name)")
public Invoice getInvoice(Long id) { ... }

The difference between hasRole and hasAuthority trips up almost every developer at least once. hasRole('ADMIN') checks for the authority ROLE_ADMIN — Spring prepends the prefix automatically. hasAuthority('PAYMENT_WRITE') checks for the exact string. If you store custom permissions in your authorities list without the ROLE_ prefix, use hasAuthority. If you're working with standard roles granted via ROLE_ conventions, use hasRole.

The #userId == authentication.principal.id pattern is the most valuable thing method security enables. You can't express "the requested userId matches the logged-in user's id" at the URL-matcher level — but you can express it right here, next to the method that enforces it. That's both more secure (the check is right next to the code it protects) and more readable.
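The hasRole versus hasAuthority difference comes down to one string concatenation. The class below is a hypothetical model of that rule, assuming the default "ROLE_" prefix; it is not Spring Security code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the two checks; assumes the default "ROLE_" prefix.
final class RoleVsAuthority {
    private final Set<String> authorities;

    RoleVsAuthority(String... granted) {
        this.authorities = new HashSet<>(Arrays.asList(granted));
    }

    // Exact string match, no prefix added.
    boolean hasAuthority(String authority) {
        return authorities.contains(authority);
    }

    // Prepends "ROLE_" before matching.
    boolean hasRole(String role) {
        return authorities.contains("ROLE_" + role);
    }
}
```

With authorities ROLE_ADMIN and PAYMENT_WRITE, hasRole("ADMIN") and hasAuthority("PAYMENT_WRITE") pass, while hasAuthority("ADMIN") and hasRole("PAYMENT_WRITE") fail.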

@PostAuthorize — when you need the return value

@PostAuthorize runs after the method executes. It has access to the return value through the returnObject keyword:

// Only return the invoice if the logged-in user owns it
@PostAuthorize("returnObject.ownerId == authentication.principal.id")
public Invoice getInvoice(Long id) {
    return invoiceRepository.findById(id).orElseThrow();
}

The use case: you need to fetch a record from the database to know who owns it. You don’t want to run a separate ownership query before the fetch — that’s two queries instead of one. So you fetch it, then check.

The gotcha — and this one matters: the method always runs. If it has side effects — writing to a database, sending an email, calling an external service — those side effects happen even if @PostAuthorize denies access afterward. Only use @PostAuthorize on read-only operations where running the method and then rejecting the result is safe.
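To make the ordering concrete, here is a minimal sketch of "run first, check afterwards". PostCheckDemo and its postAuthorize helper are hypothetical names, and SecurityException stands in for Spring Security's own exception.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class PostCheckDemo {

    // Runs the method first, then applies the check to its return value.
    static <T> T postAuthorize(Supplier<T> method, Predicate<T> check) {
        T result = method.get(); // side effects happen here, unconditionally
        if (!check.test(result)) {
            throw new SecurityException("Access denied after execution");
        }
        return result;
    }

    public static void main(String[] args) {
        AtomicInteger emailsSent = new AtomicInteger();
        try {
            PostCheckDemo.<String>postAuthorize(
                    () -> { emailsSent.incrementAndGet(); return "owner-b"; },
                    owner -> owner.equals("owner-a"));
        } catch (SecurityException e) {
            // Access was denied, but the side effect already happened.
            System.out.println("denied, emails sent: " + emailsSent.get());
        }
    }
}
```

The counter ends at 1 even though the caller never receives a result, which is the reason to keep this style of check on read-only methods.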

Where method security fits — the full picture

Parts 1 through 4 have built a three-layer security system. Here is how the layers connect.

Layer 1 is the filter chain. It operates at the HTTP boundary, before your controller. FilterChainProxy picks the right SecurityFilterChain. AuthorizationFilter applies URL-level access rules. Requests that fail here never reach your code.

Layer 2 is authentication. JWT validation via OncePerRequestFilter (Part 2) or OAuth2 login via OAuth2LoginAuthenticationFilter (Part 3) — both populate SecurityContextHolder with the verified identity. This is the layer that answers "who is this person."

Layer 3 is method security. The AOP proxy intercepts service-layer method calls. @PreAuthorize reads from the same SecurityContextHolder that Layer 2 populated. Fine-grained, data-aware decisions happen here: decisions that Layer 1 couldn't make because they depend on method arguments or return values, not just URL patterns.

All three layers share one SecurityContextHolder. Layer 2 writes to it. Layers 1 and 3 read from it. The filter chain and method security are not two separate security systems bolted together. They're two consumers of the same authentication store, operating at different points in the request lifecycle.

@Secured and @RolesAllowed — what they are and why @PreAuthorize wins

Two older annotations you’ll encounter in legacy code:

// Spring's older annotation — no SpEL, role checks only
@Secured({"ROLE_ADMIN"})
public void deleteUser(Long id) { ... }

// JSR-250 standard - not Spring-specific, same limitations
@RolesAllowed("ADMIN")
public void deleteUser(Long id) { ... }

@Secured is Spring-specific, no SpEL. @RolesAllowed is the JSR-250 standard, also no SpEL. Both work for simple role checks, and each needs its own opt-in flag on @EnableMethodSecurity (securedEnabled = true and jsr250Enabled = true respectively). Neither lets you access method arguments, call Spring beans, or check return values.

@PreAuthorize, with full SpEL support, does everything both of those do and more. For new code, there's no reason to use either of the older annotations. Add @EnableMethodSecurity and use @PreAuthorize.

When NOT to use method-level security — and the proxy trap

The self-invocation proxy trap is the most common @PreAuthorize bug in production.

If a method inside a bean calls another annotated method in the same bean, the proxy is bypassed. The call goes directly to the real object, skipping the AOP interceptor entirely. The @PreAuthorize annotation silently does nothing.

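@Service
public class InvoiceService {

    // InvoiceRepository is the application's own repository (assumed here)
    private final InvoiceRepository invoiceRepository;

    public InvoiceService(InvoiceRepository invoiceRepository) {
        this.invoiceRepository = invoiceRepository;
    }

    public void processAll() {
        // This calls the REAL method directly - not the proxy
        // @PreAuthorize on getInvoice will NOT fire
        this.getInvoice(1L);
    }

    @PreAuthorize("hasRole('ADMIN')")
    public Invoice getInvoice(Long id) {
        return invoiceRepository.findById(id).orElseThrow();
    }
}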

The fix: inject the service into itself via Spring (which gives you the proxy), or move the annotated method to a different bean.
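The bypass is easy to reproduce outside Spring. The sketch below uses a JDK dynamic proxy that merely counts intercepted getInvoice calls; Invoices, PlainInvoices and SelfInvocationDemo are hypothetical names, and the counter stands in for the security interceptor.

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface for this sketch.
interface Invoices {
    void processAll();
    String getInvoice(long id);
}

final class PlainInvoices implements Invoices {
    @Override
    public void processAll() {
        this.getInvoice(1L); // goes straight to this object, never through the proxy
    }

    @Override
    public String getInvoice(long id) {
        return "invoice-" + id;
    }
}

final class SelfInvocationDemo {

    // Returns a proxy that counts how many getInvoice calls it intercepts.
    static Invoices counting(Invoices target, AtomicInteger intercepted) {
        return (Invoices) Proxy.newProxyInstance(
                Invoices.class.getClassLoader(),
                new Class<?>[] {Invoices.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("getInvoice")) {
                        intercepted.incrementAndGet();
                    }
                    return method.invoke(target, args);
                });
    }

    public static void main(String[] args) {
        AtomicInteger intercepted = new AtomicInteger();
        Invoices proxied = counting(new PlainInvoices(), intercepted);
        proxied.processAll();                  // inner getInvoice call is not intercepted
        System.out.println(intercepted.get()); // prints 0
        proxied.getInvoice(1L);                // external call goes through the proxy
        System.out.println(intercepted.get()); // prints 1
    }
}
```

Moving getInvoice to a second bean turns the inner call into an external one, so it passes through that bean's proxy.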

Don’t put complex ownership logic in SpEL. Calling @userService.isOwner(#id, authentication.name) from inside an annotation is hard to test, hard to debug, and invisible during code reviews. If the expression is more than one condition, extract it into a properly tested method.

Don’t use @PostAuthorize on methods with side effects. The method runs regardless. If you can’t afford the side effect when access is denied, use @PreAuthorize with a pre-fetch ownership check instead.

Method security doesn’t replace the filter chain. Both layers are needed. The filter chain handles unauthenticated requests — turning them away before they reach any code at all. Method security handles authorisation decisions that require data context the filter chain doesn’t have. They’re complementary layers of the same system, not alternatives to each other.

The series in three sentences

The filter chain decides if you can enter the building. JWT or OAuth2 proves who you are at the door. @PreAuthorize decides what rooms you can open once you're inside. Same security context. Three different places it applies.

How Kafka Actually Works — Logs, Partitions, and the Design Decisions Nobody Explains

Piyush Kumar Singh — Tue, 07 Jul 2026 09:58:16 +0000

The first time I used Kafka in production, I treated it like RabbitMQ: a message queue. Produce a message, consumer picks it up, done. The message disappears. That’s how queues work.

That mental model worked until a downstream payment processor went down for 3 hours, and we had to replay everything it missed. I went to look for the messages.

They were still there.

That moment is when Kafka actually clicked for me. It is not a queue. The messages do not disappear when consumed. And once that distinction lands, every other design decision Kafka makes becomes logical rather than arbitrary.

The commit log — Kafka’s actual storage model

Kafka stores messages in an append-only commit log — a file on disk that only ever grows. Every message gets an offset: a sequential position number starting at zero. When a consumer reads a message, nothing is deleted. The message stays. The consumer’s position pointer simply advances.

This is why two completely independent systems can read the same Kafka topic without interfering with each other. Consumer Group A might be at offset 7. Consumer Group B might be at offset 4. They share the same underlying log but track their own positions independently. If you add a third consumer group next week (a new analytics pipeline, a new audit service), it can start at the earliest retained offset and replay everything still in the log. That requires auto.offset.reset=earliest; the default, latest, starts a brand-new group at the end of the log.

In a traditional message queue, that’s impossible. The message was already consumed and discarded. In Kafka, the log is the source of truth, and consumers are just readers moving through it at their own pace.
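A toy model makes the "log plus independent positions" idea tangible. CommitLogSketch is a hypothetical in-memory class, not Kafka's storage engine: it keeps one list of records and one next-offset pointer per consumer group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of an append-only log with per-group offsets.
final class CommitLogSketch {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> nextOffsetByGroup = new HashMap<>();

    // Appends a record and returns its offset.
    int append(String record) {
        log.add(record);
        return log.size() - 1;
    }

    // Returns the next unread record for the group, or null when it is caught up.
    // Reading only advances the group's pointer; nothing is deleted.
    String poll(String group) {
        int offset = nextOffsetByGroup.getOrDefault(group, 0);
        if (offset >= log.size()) {
            return null;
        }
        nextOffsetByGroup.put(group, offset + 1);
        return log.get(offset);
    }

    int size() {
        return log.size();
    }
}
```

Two groups polling the same instance each see every record from the start, and size() never shrinks.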

Now here’s the counterintuitive part. Kafka writes to disk and is still faster than many in-memory message brokers for high throughput. The reason is sequential I/O. Random disk writes are slow — the write head has to seek across the platter. Sequential writes, appending to the end of a file, are fast. Kafka only ever appends. Combined with the OS page cache — which keeps recently written data in RAM automatically — and a zero-copy technique called sendfile() that transfers data directly from the page cache to the network socket without passing through application memory, Kafka achieves throughput that can reach hundreds of thousands of messages per second on commodity hardware.

The disk is not Kafka’s compromise. It is the design.

Topics, partitions, and the order guarantee everyone gets wrong

A topic is a named log — think of it like a database table, but for events. A partition is a shard of that topic stored on a specific broker. More partitions means more parallelism, which means higher throughput.

Here is the part that trips up almost every developer building on Kafka for the first time: Kafka guarantees message order within a partition. It does not guarantee order across partitions.

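If you have a topic with three partitions and a producer sends ten messages for the same user without setting a key, those messages can land on any partition: the default partitioner spreads keyless records across partitions (round-robin in older clients, sticky per batch since Kafka 2.4). If they scatter across partitions, a consumer might process them out of order.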

The fix is the partition key. You set the message key to something stable — a user ID, an order ID, a transaction ID. Kafka hashes that key and routes all messages with the same key to the same partition, every time. Every message for user-1001 lands on partition 0. Every message for user-2002 lands on partition 1. Order is guaranteed per user, not globally — which is usually exactly what you need.
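The routing rule is "hash the key, take it modulo the partition count". Kafka's default partitioner uses a murmur2 hash of the serialized key; the sketch below swaps in Arrays.hashCode purely to stay dependency-free, so the partition numbers it produces will not match a real cluster. KeyPartitioner is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class KeyPartitioner {

    // Maps a key to a partition. The same key and the same partition count
    // always give the same partition. (Stand-in hash, not Kafka's murmur2.)
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("user-1001", 3));
        System.out.println(partitionFor("user-1001", 3)); // same value as the line above
    }
}
```

Math.floorMod keeps the result non-negative even when the hash is negative, which a plain % would not.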

This is one of the most important production decisions you make when designing a Kafka-based system, and most tutorials mention it in a single sentence. Getting it wrong means debugging ordering issues at 2 am that only manifest under load, when multiple partitions are being consumed concurrently.

Consumer groups: how Kafka scales consumption

A consumer group is a set of consumers that share the work of reading a topic. Kafka assigns each partition to exactly one consumer in the group at any given time. With three partitions and three consumers, each consumer handles one partition. With three partitions and five consumers, two consumers sit idle — you cannot have more active consumers than partitions.
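The "one owner per partition" rule can be sketched with a simple round-robin assignment. Real Kafka assignors (range, sticky, cooperative sticky) use more involved strategies; AssignmentSketch is a hypothetical simplification that only shows why extra consumers end up idle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AssignmentSketch {

    // Spreads partitions 0..n-1 over the consumers; each partition gets exactly one owner.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three partitions, five consumers: c4 and c5 receive empty lists.
        System.out.println(assign(Arrays.asList("c1", "c2", "c3", "c4", "c5"), 3));
    }
}
```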

Each consumer group tracks its own offset for each partition, stored in an internal Kafka topic called __consumer_offsets. When a consumer restarts after a crash, it reads its last committed offset from that topic and resumes exactly where it left off. This is what makes Kafka fault-tolerant — not replication alone, but the combination of durable storage and tracked offsets.

The production pain point is rebalancing. When a consumer joins or leaves a group — whether through a deployment, a crash, or a scale event — Kafka reassigns partitions across the remaining consumers. During a rebalance, consumption pauses across the entire group. For most workloads, this pause lasts a few seconds. For high-throughput systems with many consumers and frequent deployments, rebalancing can become a significant source of latency spikes — sometimes called a “rebalancing storm.”

Kafka 4.0’s new consumer group protocol (KIP-848) dramatically reduces this. Instead of stopping all consumers and reassigning all partitions, the group coordinator moves partitions incrementally while other consumers continue reading. Older clusters can get part of this benefit from incremental cooperative rebalancing (KIP-429, available since Kafka 2.4 through the CooperativeStickyAssignor). If you’re seeing rebalancing issues in an older Kafka version, upgrading is one of the most effective fixes available.

Replication — how Kafka survives broker failures

Each partition has one leader broker and zero or more follower brokers. By default, all reads and writes go to the leader. Followers replicate the leader’s log passively, staying as close to current as possible. If the leader broker fails, Kafka elects a new leader from the in-sync replica set (ISR), the brokers that are fully caught up with the leader.

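The replication factor controls how many copies exist. A factor of three means you can lose two brokers without losing the data. Whether you can still accept writes depends on min.insync.replicas: with acks=all and min.insync.replicas=2, writes are rejected once only one replica remains in sync. Most production setups use three.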

The acks producer setting is your durability dial:

// Fire and forget — fastest, but you can lose messages
props.put("acks", "0");

// Wait for leader only — fast, safer
props.put("acks", "1");

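// Wait for all in-sync replicas: slowest, safest
props.put("acks", "all");
// min.insync.replicas=2 is a topic/broker setting, not a producer property:
// set it on the topic (or as a broker default), not in these props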

In fintech systems, acks=all on the producer with min.insync.replicas=2 on the topic is the standard. The producer is acknowledged only after every in-sync replica has the write, and the write is rejected outright if fewer than two replicas are in sync. The added latency (typically 10–30ms) is a fair price for knowing that a single broker failure cannot silently drop a payment event.
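How acks and min.insync.replicas interact can be written as one small decision function. AckSketch is a hypothetical model of the rule, not broker code; inSyncReplicas counts the leader itself.

```java
final class AckSketch {

    // Returns true if the producer would treat the send as successful.
    // inSyncReplicas includes the leader.
    static boolean succeeds(String acks, boolean leaderAlive,
                            int inSyncReplicas, int minInSyncReplicas) {
        switch (acks) {
            case "0":
                return true; // producer does not wait for any response
            case "1":
                return leaderAlive; // leader has written to its own log
            case "all":
                // min.insync.replicas is only enforced for acks=all
                return leaderAlive && inSyncReplicas >= minInSyncReplicas;
            default:
                throw new IllegalArgumentException("unknown acks: " + acks);
        }
    }
}
```

With one in-sync replica and min.insync.replicas=2, succeeds("1", true, 1, 2) is true while succeeds("all", true, 1, 2) is false: the stricter setting refuses the write instead of accepting it with a single copy.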

KRaft — ZooKeeper is finally gone

For most of Kafka's history, clusters depended on ZooKeeper, a separate distributed coordination service, to manage broker metadata, leader elections, and cluster configuration. This meant running and maintaining a separate system alongside Kafka, with its own failure modes and operational complexity. KRaft mode has been production-ready since Kafka 3.3, but ZooKeeper mode remained supported throughout the 3.x line.

In March 2025, Apache Kafka 4.0 shipped. ZooKeeper is gone. KRaft — Kafka’s built-in Raft-based consensus protocol — now manages all cluster metadata directly on a quorum of controller nodes. Simpler deployments, faster failover, fewer moving parts.

If you are setting up a new Kafka cluster today, there is no ZooKeeper to configure. If you are still running an older version with ZooKeeper, the migration path to KRaft is well-documented, and the upgrade is worth the effort. Any tutorial or architecture diagram that still shows ZooKeeper as a required component is outdated.

When NOT to use Kafka

This is the part that saves you from over-engineering.

Don’t use Kafka as a simple job queue. If you need tasks processed once by one worker — send an email, resize an image, process a webhook — RabbitMQ or Amazon SQS is simpler and operationally lighter. Kafka’s complexity is only justified when you need replay, multiple independent consumer groups, or throughput above what a traditional queue can handle.

Don’t use Kafka for request-reply patterns. Kafka is designed for one-way event streaming. If you need a response to a specific message, you are working against the grain. Use a proper RPC mechanism.

Don’t get your partition count wrong at the start. You can increase partitions later, but you cannot decrease them without recreating the topic. And increasing partitions changes the hash-to-partition mapping, which breaks the ordering guarantee for any keys that shift partitions. Plan your partition count upfront based on your expected throughput and consumer count.

Don’t skip idempotent producers. Without enable.idempotence=true, network retries can produce duplicate messages. Kafka acknowledges a write, the network drops the response, your producer retries, and the message lands twice. In a payment system, that means double charges. Clients from Kafka 3.0 onward enable idempotence by default, but older clients and overridden configs do not, so set it explicitly from day one:

props.put("enable.idempotence", "true");
props.put("acks", "all");
props.put("retries", Integer.MAX_VALUE);
props.put("max.in.flight.requests.per.connection", "5");

This is not optional in fintech. It is the baseline.

Kafka is durable not despite writing to disk, but because of it. The commit log is not a compromise; it is the design. Once that clicks, every other Kafka decision makes sense.

How OAuth2 Login Works in Spring Security — The Authorization Code Flow Explained

Piyush Kumar Singh — Thu, 18 Jun 2026 15:55:32 +0000

You were redirected to Google. You clicked Allow. You came back logged in. In those three seconds, six things happened, and most backend developers who have built that button have never traced all of them.

I know because I was one of them. When I first wired up OAuth2 login, it worked on the first attempt. I was relieved. Then a teammate asked me to explain what the state parameter was for, and I didn't have an answer. The feature worked. The understanding wasn’t there.

This article is that explanation — not the config, but the actual flow. What fires, in what order, and where it sits inside Spring Security’s filter chain. If you have read Part 1 of this series on Spring Security internals and Part 2 on JWT with OncePerRequestFilter, this picks up exactly where those left off.

OAuth2 is not what most developers think it is

Before tracing the flow, one distinction is worth getting right.

OAuth2 is an authorization framework, not an authentication protocol. It was designed to let applications access resources on behalf of users — not to verify who a user actually is. When you click “Login with Google,” OAuth2 handles the delegation part: your app is asking Google for permission to act on your behalf. But OAuth2 alone cannot tell your app who you are.

That is what OpenID Connect (OIDC) does. OIDC is a thin identity layer built on top of OAuth2. It adds the id_token — a JWT that carries the user's identity information. When you use "Login with Google" in a Spring Boot app, you are using both: OAuth2 for the delegation mechanism, OIDC for the identity proof.
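Because the id_token is a compact JWT, its payload is just Base64URL-encoded JSON sitting between two dots. The sketch below only decodes that middle segment so you can look at the claims; it performs no signature or expiry validation and must not be used to trust a token. JwtPeek is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class JwtPeek {

    // Returns the decoded JSON payload (second segment) of a compact JWT.
    // Inspection only: the signature is NOT verified here.
    static String payloadJson(String jwt) {
        String[] parts = jwt.split("\\.", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected header.payload.signature");
        }
        byte[] decoded = Base64.getUrlDecoder().decode(parts[1]);
        return new String(decoded, StandardCharsets.UTF_8);
    }
}
```

In a Spring application this decoding and the validation around it are done by the framework; the helper is only useful for debugging what a provider actually sent.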

Most developers conflate these two because the Spring Security config handles both transparently. Understanding the distinction matters when things go wrong — because the errors from each layer are very different.

The 4 roles — mapped to something you already know

Every OAuth2 explanation starts with four roles. Most explanations define them abstractly, which is why they don’t stick. Map them to Login with Google, and they’re immediately obvious.

Resource Owner — you, the user. You own the Google account and the data inside it.

Client — your Spring Boot application. It wants access to the user’s basic profile on your behalf.

Authorization Server — Google’s accounts infrastructure (accounts.google.com). It authenticates the user, shows the consent screen, and issues tokens.

Resource Server — Google’s userinfo API (googleapis.com). It holds the protected data — the email, name, and profile picture your app actually wants.

One thing worth noting: in the Login with Google scenario, Google plays both the Authorization Server and the Resource Server. That is common with large identity providers. In your own systems, if you are building an API protected by OAuth2, your Spring Boot application becomes the Resource Server, and a separate authorization server (Keycloak, Okta, Auth0) handles the tokens.

The Authorization Code Flow—What Happened During That Redirect

There are several OAuth2 flows. The one you use for web applications, and the only user-facing one worth knowing in 2026, is the Authorization Code Flow. Of the others, Implicit and Resource Owner Password are deprecated, and Client Credentials is meant for machine-to-machine scenarios.

Here is each step:

Step 1 — User clicks Login with Google.
Your Spring Boot app constructs a URL pointing to Google’s authorization endpoint, including your client_id, the redirect_uri it wants Google to return to, the scope (what access you're requesting — typically openid profile email), a state parameter (more on this shortly), and response_type=code. The user's browser is redirected to that URL.

Step 2 — Google shows the consent screen.
Google authenticates the user (if not already logged in) and shows them what your app is requesting access to. The user clicks Allow.

Step 3 — Google redirects back with a code.
Google redirects the user’s browser to your redirect_uri with an authorization code appended to the URL. This code is short-lived — typically valid for about 60 seconds — and can only be used once.

Notice what Google did not send: an access token. The code travels through the browser, which means it passes through browser history, HTTP logs, and referrer headers. Sending the actual token here would be a security hole. The code is intentionally useless on its own.

Step 4 — Your app exchanges the code for tokens.
This step happens entirely server-to-server. Your Spring Boot backend takes the code and calls Google’s token endpoint directly — not through the browser. It sends the code along with your client_secret. Google verifies both and issues the tokens.

This server-to-server exchange is the entire security model of the Authorization Code Flow. The client_secret never touches the browser. The actual token never travels through a URL. The browser only ever saw a short-lived code.

Step 5 — Google returns an access token and an ID token.
The access_token is what you use to call Google's APIs on the user's behalf. The id_token is an OIDC JWT — it contains the user's identity: their Google ID, email, name, and profile picture. If you read Part 2 of this series, this JWT follows the same structure — header, payload, signature — that OncePerRequestFilter was validating.

Step 6 — Your app reads the id token, loads the user, and populates the SecurityContext.
Spring Security decodes the id_token, extracts the user's details, and stores an authentication object in SecurityContextHolder. From here, the rest of your application sees a fully authenticated user through @AuthenticationPrincipal, SecurityContextHolder.getContext(), or a Principal parameter in your controllers, the same as with username-password login.
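The redirect built in Step 1 is an ordinary URL with form-encoded query parameters. Here is a minimal sketch of assembling one by hand. AuthorizationUrlSketch is a hypothetical helper, and the endpoint, client id and redirect URI in the usage are placeholder values; in a Spring application the framework builds this URL for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class AuthorizationUrlSketch {

    // Builds the Step 1 redirect URL from the parameters named in the flow.
    static String build(String authorizationEndpoint, String clientId,
                        String redirectUri, String scope, String state) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("response_type", "code");
        params.put("client_id", clientId);
        params.put("redirect_uri", redirectUri);
        params.put("scope", scope);
        params.put("state", state);

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            query.add(entry.getKey() + "=" + encode(entry.getValue()));
        }
        return authorizationEndpoint + "?" + query;
    }

    // Form-encodes a value (spaces become '+').
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(build(
                "https://auth.example.com/authorize",          // placeholder endpoint
                "my-client-id",                                // placeholder client id
                "https://app.example.com/login/oauth2/code/google",
                "openid profile email",
                "random-state-value"));
    }
}
```

Note what is absent: the client_secret never appears in this URL, because it only travels in the server-to-server exchange of Step 4.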

Where OAuth2 plugs into Spring Security’s filter chain

This is the part no other OAuth2 tutorial can show you — because they did not write Part 1 first.

In Part 1, you traced how FilterChainProxy selects a SecurityFilterChain, how authentication filters extract credentials, how ProviderManager delegates to an AuthenticationProvider, and how everything ends up in SecurityContextHolder.

OAuth2 does not replace that chain. It extends it with two new filters.

OAuth2AuthorizationRequestRedirectFilter: this fires when a user hits /oauth2/authorization/google (or whichever provider you configured). It builds the full authorization URL, including generating and storing the state parameter, and redirects the user's browser to Google. This is what runs in Step 1 of the flow above.

OAuth2LoginAuthenticationFilter: this fires when Google redirects back to your /login/oauth2/code/google callback URL. It reads the authorization code and state from the request, looks up the authorization request it stored earlier, and wraps both in an unauthenticated OAuth2LoginAuthenticationToken.

That token then goes to ProviderManager, the same ProviderManager you already know from Part 1. ProviderManager loops through its registered providers, finds OidcAuthorizationCodeAuthenticationProvider, and calls it.

OidcAuthorizationCodeAuthenticationProvider verifies the state parameter, calls Google's token endpoint to exchange the code (Step 4), receives the tokens (Step 5), validates the id_token JWT signature, checks its expiry and claims, and calls the configured user service to load or create your application's user. Then it returns a fully authenticated Authentication object.

That object goes into SecurityContextHolder. Same destination as username-password login. Different path to get there.

OAuth2UserService — where your app takes over

Spring Security handles everything up to this point automatically. OAuth2UserService is where your application steps in.

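After validating the id_token, Spring Security calls the user service's loadUser() to turn Google's user attributes into something your application understands. For plain OAuth2 providers the default implementation is DefaultOAuth2UserService, which fetches the user's profile from the provider's userinfo endpoint and returns an OAuth2User with those attributes. When the openid scope is requested, as it is with Google, the OIDC variant OidcUserService is used instead, and it is overridden through the separate oidcUserService hook.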

In most production systems, you need more than that. Google gives you a name and email. Your database has a User entity with an internal ID, roles, preferences, and account status. You override DefaultOAuth2UserService to bridge that gap:

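import java.util.Collections;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.security.core.authority.SimpleGrantedAuthority;
import org.springframework.security.oauth2.client.userinfo.DefaultOAuth2UserService;
import org.springframework.security.oauth2.client.userinfo.OAuth2UserRequest;
import org.springframework.security.oauth2.core.OAuth2AuthenticationException;
import org.springframework.security.oauth2.core.user.DefaultOAuth2User;
import org.springframework.security.oauth2.core.user.OAuth2User;
import org.springframework.stereotype.Service;

// User, UserRepository and createNewUser(...) are application code, not shown here
@Service
public class CustomOAuth2UserService extends DefaultOAuth2UserService {

    @Autowired
    private UserRepository userRepository;

    @Override
    public OAuth2User loadUser(OAuth2UserRequest userRequest) throws OAuth2AuthenticationException {
        // Let the default implementation call the userinfo endpoint first
        OAuth2User oAuth2User = super.loadUser(userRequest);
        String email = oAuth2User.getAttribute("email");
        // load or create the user in your own DB
        User user = userRepository.findByEmail(email)
            .orElseGet(() -> createNewUser(oAuth2User));
        // user.getRole() is expected to return an authority string such as "ROLE_USER"
        return new DefaultOAuth2User(
            Collections.singleton(new SimpleGrantedAuthority(user.getRole())),
            oAuth2User.getAttributes(),
            "email" // attribute used as the principal name
        );
    }
}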

This is not an OAuth2 concept — it is a Spring Security hook. But it is where most “Login with Google” implementations actually live. The config wires up the flow. This class decides what your application does with the identity Google hands back.

The state parameter — the detail almost everyone skips

Every authorization request your app sends to Google includes a state parameter — a randomly generated value that Spring Security creates and stores in the session. When Google redirects back to your app, it returns that same state value unchanged.

Your app verifies that the returned state matches the one it stored before doing anything else.

If you skip this check, an attacker can craft a malicious link that starts a login flow and redirects to your callback with their own authorization code. The victim clicks the link, your app completes the exchange, and the victim is now logged into the attacker’s account — with full access to whatever your app shows an authenticated user.

This is a CSRF attack against OAuth2. Spring Security handles state generation and validation automatically. You do not need to write this logic. But if you ever build an OAuth2 flow manually — or review one built without a framework — missing state validation is the first thing to check.
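If you do end up reviewing or writing this check by hand, it has two halves: generate an unguessable value, and compare it on the way back. The sketch below is one way to do that with the JDK alone. StateParam is a hypothetical helper, and where the stored value lives (typically the HTTP session) is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

final class StateParam {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a URL-safe random state value from 32 random bytes.
    static String generate() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Compares the stored and returned values; a missing value never matches.
    static boolean matches(String stored, String returned) {
        if (stored == null || returned == null) {
            return false;
        }
        return MessageDigest.isEqual(
                stored.getBytes(StandardCharsets.UTF_8),
                returned.getBytes(StandardCharsets.UTF_8));
    }
}
```

The null checks matter as much as the comparison: a callback that arrives with no stored state at all is exactly the attacker-initiated case and must be rejected.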

When NOT to use OAuth2

Do not use the Implicit Flow. It is omitted from the OAuth 2.1 draft. It sent tokens directly in the URL, which means tokens in browser history, server logs, and referrer headers. There is no legitimate reason to use it today.

Do not store access tokens in localStorage. XSS vulnerabilities can read localStorage trivially. Use HttpOnly cookies, or keep tokens server-side in the session, where JavaScript cannot touch them.

Do not use OAuth2 when you do not need it. If your application has its own user database and no external identity provider, JWT authentication — which you traced in Part 2 — is simpler. You own the entire flow, you are not dependent on a third party’s uptime, and there is no redirect dance to debug. OAuth2 earns its complexity only when the third-party identity or the delegated access to third-party resources is genuinely valuable.

Do not forget token expiry. Access tokens are short-lived by design. If you store a token and reuse it without checking expiry, your users will get silent authentication failures at the worst possible moment. Build refresh token logic from the start, not as an afterthought.
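The expiry check itself is small. A common approach is to refresh slightly early so a token does not expire mid-request. TokenExpiry and the skew parameter below are illustrative assumptions, not part of any library.

```java
import java.time.Duration;
import java.time.Instant;

final class TokenExpiry {

    // True when the token has expired or will expire within the skew window.
    static boolean needsRefresh(Instant expiresAt, Instant now, Duration skew) {
        return !now.plus(skew).isBefore(expiresAt);
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Instant expiresAt = now.plusSeconds(30);
        // With a 60-second skew, a token that expires in 30 seconds is refreshed now.
        System.out.println(needsRefresh(expiresAt, now, Duration.ofSeconds(60))); // prints true
    }
}
```

Passing now in as a parameter, instead of calling Instant.now() inside, keeps the check easy to unit test.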

https://dev.to/piyushsingh_dev/spring-ai-tool-calling-explained-how-to-give-your-llm-real-superpowers-3g04

Piyush Kumar Singh — Sat, 06 Jun 2026 14:53:18 +0000

Spring AI Tool Calling Explained | How to Give Your LLM Real Superpowers

Piyush Kumar Singh — Sat, 06 Jun 2026 14:51:01 +0000

Ask ChatGPT what Apple’s stock price is right now. It’ll either tell you it doesn’t have real-time data or confidently give you a number that’s months old. That’s not a bug in the model. It’s a fundamental limitation of how LLMs work. They’re trained on data up to a certain date, sealed, and shipped. They can’t call your database. They can’t check the weather. They can’t look up a customer’s order status. They know nothing about the world after their training cutoff — and nothing about your world at all.

Tool calling is how you fix that. And it’s one of the core building blocks of what people are calling AI agents — systems where an LLM can take action, not just generate text.

What tool calling actually is

Before diving into Spring AI, let’s get the concept straight in plain language.

Imagine you hire an extremely smart consultant. They know a lot — strategy, frameworks, history, writing. But they don’t have access to your company’s internal systems. They can’t log into your CRM and check whether a customer paid last month.

So you make an arrangement: whenever they need internal data to answer a question, they hand you a sticky note saying, “I need order #4521’s status.” You go look it up. You bring them the answer. They use it to finish their response.

That sticky note exchange is tool calling. You may also see it called function calling — same concept, different name depending on which LLM provider’s docs you’re reading.

The LLM doesn’t execute code. It doesn’t call APIs. It doesn’t touch your database directly. It just says, “I need this specific piece of information from this specific function.” Your application handles the actual execution and returns the result. The LLM then uses that result to write its final response.

That distinction — the model asks, your app executes — is the most important thing to understand about tool calling. And most tutorials skip right past it.

The 5-step loop: what happens inside every tool call

Spring AI Tool call

The diagram above shows the full cycle. Here’s each step in plain terms:

Step 1 — User asks a question. “What is Apple’s stock price right now?”

Step 2 — The LLM reads the question and your tool list. When you register tools with Spring AI, each one has a name and a description. The LLM reads those descriptions and makes a decision: can I answer this from my training data, or do I need to call a tool?

Step 3 — The LLM returns a tool request, not an answer. This is the part people don’t expect. Instead of answering your question, the model sends back something like: “Call the getStockPrice function with argument AAPL.” Just a tool name and arguments. No answer yet.

Step 4 — Your Spring app executes the method. Spring AI picks up the tool request, finds the matching @Tool method in your code, calls it, and gets the real data — from your API, your database, whatever you wired in.

Step 5 — The result goes back to the LLM. Spring AI sends the function’s return value back into the conversation context. The LLM reads it and now writes its actual response: “Apple is trading at $213.45 as of this morning.”

The user sees a real, grounded answer. No hallucination. No outdated training data. The model provided the intelligence, your application provided the data.
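The five steps above can be sketched in plain Java with no framework at all. Everything in this sketch is hypothetical scaffolding: `FakeModel` stands in for the LLM, the `TOOLS` map stands in for Spring AI's tool registry, and the price is a hard-coded demo value. It is only meant to show who does what, not how Spring AI is implemented.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins: none of these types come from Spring AI.
record ToolRequest(String toolName, String argument) {}

class FakeModel {
    // Steps 2-3: the "model" decides it needs a tool and returns a request, not an answer.
    static ToolRequest decide(String question) {
        return question.contains("stock price") ? new ToolRequest("getStockPrice", "AAPL") : null;
    }

    // Step 5: the "model" writes the final answer, using the tool result if there is one.
    static String answer(String question, String toolResult) {
        return toolResult == null
                ? "I can answer that from training data."
                : "Apple is trading at $" + toolResult + ".";
    }
}

class ToolLoop {
    // Application-side registry: tool name -> code that actually runs.
    static final Map<String, Function<String, String>> TOOLS =
            Map.of("getStockPrice", ticker -> "213.45"); // hard-coded demo value

    static String handle(String question) {
        ToolRequest request = FakeModel.decide(question);       // steps 1-3
        if (request == null) {
            return FakeModel.answer(question, null);            // no tool needed
        }
        Function<String, String> tool = TOOLS.get(request.toolName());
        if (tool == null) {
            throw new IllegalStateException("Unknown tool: " + request.toolName());
        }
        String result = tool.apply(request.argument());         // step 4: the app executes
        return FakeModel.answer(question, result);              // step 5
    }

    public static void main(String[] args) {
        System.out.println(handle("What is Apple's stock price right now?"));
    }
}
```

Note that the model side never touches `TOOLS`. It only names a tool and an argument, and the application decides whether and how to run it.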

The @Tool annotation: registering tools in Spring AI

In Spring AI, turning a regular Java method into a tool the LLM can call takes one annotation:

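import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.stereotype.Component;

@Component
public class StockPriceTool {

    // FinancialApiClient is a placeholder for whatever client wraps your stock API
    private final FinancialApiClient financialApiClient;

    public StockPriceTool(FinancialApiClient financialApiClient) {
        this.financialApiClient = financialApiClient;
    }

    @Tool(description = "Get the current stock price for a given ticker symbol")
    public String getStockPrice(
            @ToolParam(description = "The stock ticker symbol, e.g. AAPL, TSLA")
            String ticker) {
        // call a real stock API here
        return financialApiClient.getCurrentPrice(ticker);
    }
}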

Two things here that most tutorials don’t explain:

The description on @Tool is not documentation for you — it’s instructions for the LLM. The model reads this text to decide when to call your function. A vague description like “gets stock data” will confuse the model. A precise one like “Get the current stock price for a given ticker symbol” tells the model exactly when this tool applies. Treat it like a prompt, not a comment.

The same goes for @ToolParam. The LLM reads your parameter descriptions to understand what values it should pass. If your description is unclear, the model will pass the wrong arguments. Bad input = bad output.

Registering your tools with ChatClient

Once your tool class is ready, you wire it into ChatClient through your configuration:

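import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AiConfig {

    // Spring injects the StockPriceTool bean as a method parameter
    @Bean
    public ChatClient chatClient(ChatClient.Builder builder, StockPriceTool stockPriceTool) {
        return builder
            .defaultTools(stockPriceTool)
            .build();
    }
}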

Spring AI scans the bean, finds every method annotated with @Tool, and registers them automatically. No verbose schema definitions, no manual type mappings — just your annotated method and a single line in your configuration.

What the LLM reads and why description quality is everything

Think about what happens from the model’s perspective. Before answering any question, it sees something like this:

Available tools:
- getStockPrice: Get the current stock price for a given ticker symbol
- getWeatherForecast: Get 3-day weather forecast for a city
- getUserOrderHistory: Get the last 10 orders for a given customer ID

That list of descriptions is the only information the model has about what your tools can do. It uses those descriptions to reason about which tool — if any — applies to the current question.

If someone asks, “Is it going to rain in Mumbai tomorrow?” — the model reads the descriptions, matches them to getWeatherForecast, and calls it.

If someone asks, “Should I carry an umbrella to the client meeting?” — the model has to infer that “client meeting” implies weather concern, decide Mumbai is a reasonable assumption if that’s the user’s location, and call the right tool accordingly.

All of that reasoning happens against your description strings. Write them well, and the model becomes genuinely useful. Write them carelessly, and the model either calls the wrong tool, passes wrong arguments, or skips tools it should use.

When the LLM chains tools, and the security angle

Once you register multiple tools, something interesting happens: the LLM can call them in sequence. It might call getCustomerProfile first, get the customer’s account tier, then call getOrderHistory with that tier to filter the results, then call calculateDiscount based on the history.

This is the foundation of what people call AI agents — LLMs that don’t just answer questions but complete multi-step tasks by composing tools. Spring AI’s approach to this connects naturally with MCP (Model Context Protocol), a standard for exposing tools to LLMs in a consistent, discoverable way — but that’s a deeper topic for another article.

Here’s the security consideration that almost every tutorial skips: the LLM cannot call your database directly. It cannot make HTTP requests. It cannot run arbitrary code. Every tool execution goes through your Spring code — which means every tool execution can be authenticated, rate-limited, logged, and validated exactly like any other function call in your system.

That’s not a limitation. That’s the design. You can safely expose powerful capabilities to the LLM without handing over the keys to your infrastructure.
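Because every tool call passes through your own code, you can treat the model's arguments like any other untrusted input. Here is a minimal sketch of that idea, assuming a hypothetical allow-list of tickers and a placeholder price. A real tool would delegate to your API client and a proper audit logger.

```java
import java.util.Set;
import java.util.regex.Pattern;

class GuardedStockTool {
    // Assumption for this sketch: tickers are 1-5 uppercase letters.
    private static final Pattern TICKER = Pattern.compile("[A-Z]{1,5}");
    // Hypothetical allow-list of symbols this tool is permitted to look up.
    private static final Set<String> ALLOWED = Set.of("AAPL", "TSLA");

    static String getStockPrice(String ticker) {
        // Validate the model-supplied argument before doing any real work.
        if (ticker == null || !TICKER.matcher(ticker).matches()) {
            throw new IllegalArgumentException("Malformed ticker");
        }
        if (!ALLOWED.contains(ticker)) {
            throw new IllegalArgumentException("Ticker not permitted: " + ticker);
        }
        // Log the call so tool usage can be audited later.
        System.out.println("audit: getStockPrice(" + ticker + ")");
        return "213.45"; // placeholder instead of a real API call
    }

    public static void main(String[] args) {
        System.out.println(getStockPrice("AAPL"));
    }
}
```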

When NOT to use tool calling

Don’t use tool calling for questions the LLM can answer from its training. Every tool call adds a round trip: your app waits for the LLM to respond, executes the function, and sends a second request back. That’s real latency. If someone asks, “explain what a P/E ratio is”, there’s no need to call a financial API. Reserve tool calls for data the model genuinely can’t have.

Watch out for circular tool chains. If Tool A can trigger Tool B, which can trigger Tool A again under certain conditions, you can end up in a loop. Spring AI doesn’t automatically break cycles. Define clear, narrow tool responsibilities to avoid this.

Description drift is a real maintenance problem. When your underlying function changes — new parameters, changed logic, different return format — you need to update your @Tool description to match. The LLM reasons from the description, not the code. Stale descriptions lead to bad arguments and broken calls.

Latency stacks with every hop. One tool call adds roughly 500ms–1s. Three tool calls in sequence can add 1.5–3 seconds. For user-facing chatbots, that’s noticeable. Design your tools to be composable but not deeply chained unless the use case genuinely requires it.

Spring AI 2.x — a note if you’re upgrading

If you’ve followed a tool-calling tutorial written before mid-2025 and the code isn’t working, this is likely why.

The old approach used FunctionCallback and verbose bean registration, long configuration chains, manual input/output type definitions, and FunctionCallbackWrapper setup. Spring AI has moved away from all of that.

The old .tools("toolName") syntax no longer works reliably in Spring AI 1.0 GA. If your tools aren’t being picked up, check that you’re passing the bean directly to defaultTools(myToolBean), not a string name.

The new annotation-based approach shown in this article is the current recommended path. It’s cleaner, requires no manual schema definitions, and fits naturally into how Spring Boot beans already work. If you’re migrating from an older version, the change is mostly mechanical: replace FunctionCallback registrations with @Tool annotated methods and update your ChatClient configuration.

Spring AI Explained — ChatClient, RAG, Advisors, and Every Core Component

Piyush Kumar Singh — Mon, 18 May 2026 04:30:23 +0000

Most Spring AI tutorials jump straight to code. You copy the dependency, paste the config, call ChatClient, and something works. But when you need to actually build something (a chatbot that remembers conversations, an API that answers questions from your own documents), you hit a wall. Because you don't know what's actually doing what.

What Spring AI actually is — in one sentence

Spring AI is an abstraction layer that lets you wire LLMs into your Spring Boot app without hardcoding any particular AI provider.

That last part matters. OpenAI, Google Gemini, Anthropic Claude, or Ollama running locally on your machine: Spring AI talks to all of them through the same API. Swap providers without touching your business logic. That’s the entire value proposition, and everything else is built on top of it.

Spring AI Components

ChatClient — the front door

ChatClient is the component you'll interact with the most. It's the fluent API that sits at the top of the stack and handles the actual request-response cycle with the LLM.

Think of it like a RestTemplate or a WebClient, but instead of calling a REST endpoint, you're sending a prompt and getting a response back. It handles all the low-level connection details, request formatting, and response parsing so you don't have to.

What makes ChatClient genuinely well-designed is its fluent builder style. You don't configure it once globally and hope for the best. Each call is composable — you can set the system prompt, attach advisors, pass user input, and control the output format all in one readable chain.

It also separates two things that often get conflated: the default configuration you set at startup (your system prompt, default advisors, model parameters) and the per-request configuration you apply at call time. That separation matters in production, where different endpoints need different behaviours from the same underlying client.

PromptTemplate — how you talk to the LLM properly

A raw string shoved into an LLM is not a prompt. A prompt is a structured piece of text with placeholders, context, and instructions, and PromptTemplate is how Spring AI handles that.

The idea is simple: you define a template with variables, and at runtime, you fill those variables in. Instead of building prompt strings with Java string concatenation — which gets messy fast — you define the shape of the prompt separately from the data that goes into it.

This matters for three reasons. First, it keeps prompts readable and maintainable. Second, it separates the “what to ask” from the “what data to inject”, which is the same separation of concerns you apply everywhere else in your codebase. Third, it makes prompt versioning possible. When your prompt needs tweaking, you’re editing a template, not hunting through business logic.

PromptTemplate also gives you a proper Prompt object that carries both the human message and the system message. That distinction — system prompt (the instructions) vs user prompt (the question) — is one of the most important things to understand when working with LLMs, and Spring AI models it explicitly.
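To make the template-versus-data split concrete, here is a tiny plain-Java sketch of placeholder substitution. `SimpleTemplate` is a hypothetical helper written for this illustration, not Spring AI's PromptTemplate class, and it assumes `{name}`-style placeholders.

```java
import java.util.Map;

class SimpleTemplate {
    // Replaces each {name} placeholder with its value from the map.
    static String render(String template, Map<String, String> variables) {
        String result = template;
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The shape of the prompt lives in one place...
        String template = "Summarise the {document} policy for a {audience} reader.";
        // ...and the data is supplied at call time.
        System.out.println(render(template, Map.of("document", "refund", "audience", "non-technical")));
    }
}
```

Editing the wording of the prompt now means changing one string, while the calling code that supplies `document` and `audience` stays untouched.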

EmbeddingModel — the piece that makes search smart

An EmbeddingModel takes text and converts it into a vector — a list of floating point numbers that represents the meaning of that text in multi-dimensional space.

That sounds abstract. Here’s the concrete thing to grasp: two pieces of text that mean similar things will produce vectors that are close to each other mathematically. “What’s your return policy?” and “How do I get a refund?” are different strings, but their vectors will be very close — because semantically, they’re the same question.

This is what makes semantic search possible. Traditional search matches keywords. Embedding-based search matches meaning. A user asking “how do I cancel my order” will find a document titled “Order cancellation policy” even if the words don’t overlap, because the meanings are geometrically close in vector space.
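"Close to each other mathematically" usually means a high cosine similarity. The sketch below computes it for toy three-dimensional vectors. The numbers are invented for illustration, and real embeddings have hundreds or thousands of dimensions.

```java
class Cosine {
    // Cosine similarity: dot product divided by the product of the vector lengths.
    static double similarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up vectors standing in for real embeddings.
        double[] returnPolicy = {0.9, 0.1, 0.0};
        double[] refundQuestion = {0.8, 0.2, 0.1};
        double[] weatherQuestion = {0.0, 0.1, 0.9};
        System.out.println(similarity(returnPolicy, refundQuestion));  // close to 1
        System.out.println(similarity(returnPolicy, weatherQuestion)); // close to 0
    }
}
```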

In Spring AI, EmbeddingModel is the interface that abstracts over whatever embedding service you're using — OpenAI's text-embedding-ada-002, Gemini's embedding API, or a local model via Ollama. The abstraction is consistent regardless of provider, which means your RAG pipeline doesn't break if you switch models.

VectorStore — where embeddings live

VectorStore is the database for embeddings. You put vectors in, and you query them by similarity — "give me the top 5 stored vectors that are closest to this query vector."

It’s worth understanding that this is a fundamentally different kind of database from what you’re used to. You don’t query it with SQL. You don’t look things up by ID. You ask: which stored content is most semantically similar to this input? And it returns the matches ranked by similarity score.

Spring AI’s VectorStore interface abstracts over the actual storage engine underneath. In development, you might use SimpleVectorStore, an in-memory implementation. In production, you'd swap to Pinecone, Weaviate, pgvector on top of Postgres, or Elasticsearch. The interface stays identical. Your code doesn't change.

The VectorStore is also responsible for handling the metadata that travels alongside each vector — the document title, page number, source URL, whatever you stored at ingestion time. When it returns matching chunks, that metadata comes with it, so your prompt builder knows where the information came from.
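The "top 5 closest vectors, with their metadata" query can be sketched as a brute-force in-memory search. `TinyVectorStore` is a hypothetical class for illustration only. It is not Spring AI's VectorStore interface, and production stores use approximate indexes rather than scanning every entry.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class TinyVectorStore {
    record Entry(String text, double[] vector, Map<String, String> metadata) {}
    record Match(Entry entry, double score) {}

    private final List<Entry> entries = new ArrayList<>();

    void add(String text, double[] vector, Map<String, String> metadata) {
        entries.add(new Entry(text, vector, metadata));
    }

    // Scores every stored entry against the query and returns the topK best matches.
    List<Match> search(double[] query, int topK) {
        return entries.stream()
                .map(e -> new Match(e, cosine(query, e.vector())))
                .sorted(Comparator.comparingDouble((Match m) -> m.score()).reversed())
                .limit(topK)
                .toList();
    }

    private static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        TinyVectorStore store = new TinyVectorStore();
        // Toy 2-d vectors; a real pipeline would get these from an EmbeddingModel.
        store.add("Order cancellation policy", new double[]{1.0, 0.0}, Map.of("source", "faq.md"));
        store.add("Shipping times by region", new double[]{0.0, 1.0}, Map.of("source", "shipping.md"));
        for (Match match : store.search(new double[]{0.9, 0.1}, 1)) {
            System.out.println(match.entry().text() + " from " + match.entry().metadata().get("source"));
        }
    }
}
```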

Advisors — the middleware nobody talks about enough

This is the component most tutorials skip, and it’s arguably the most powerful part of the whole framework.

An Advisor in Spring AI is a piece of middleware that wraps around every ChatClient request. Before the request goes to the LLM, advisors can intercept it and modify it — add context, inject memory, apply safety rules, log the conversation, filter the input. After the response comes back, they can post-process it too.

The important thing to understand is that advisors form a chain. Each one wraps the next, like servlet filters in a web application. You configure which advisors run in which order, and each one has a defined responsibility.

QuestionAnswerAdvisor is the one you'll use for RAG. Before your question reaches the LLM, this advisor takes that question, queries VectorStore for the most relevant chunks, and injects them into the prompt automatically. From ChatClient's perspective, you just asked a question. Internally, your question has been enriched with your own data before the LLM ever sees it.

MessageChatMemoryAdvisor is what makes conversations persistent. Without it, every call to ChatClient starts fresh — no memory of what was said before. With it, previous turns from ChatMemory are injected into each new request so the LLM has context.

You can write your own advisors too. Any cross-cutting concern that applies to every LLM call — rate limiting, PII detection, response caching, A/B testing between prompts — belongs in an advisor, not in your business logic.
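The "each one wraps the next" structure is the classic chain-of-responsibility pattern. The sketch below shows it in plain Java, with strings standing in for requests and responses. `SketchAdvisor` and `AdvisorChainSketch` are hypothetical names for this illustration and do not mirror Spring AI's actual Advisor interfaces.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical advisor shape: receive the prompt, optionally change it, call the next link.
interface SketchAdvisor {
    String around(String prompt, Function<String, String> next);
}

class AdvisorChainSketch {
    // Builds the chain back to front, so the first advisor in the list runs first.
    static Function<String, String> build(List<SketchAdvisor> advisors, Function<String, String> model) {
        Function<String, String> chain = model;
        for (int i = advisors.size() - 1; i >= 0; i--) {
            SketchAdvisor advisor = advisors.get(i);
            Function<String, String> next = chain;
            chain = prompt -> advisor.around(prompt, next);
        }
        return chain;
    }

    public static void main(String[] args) {
        // Enriches the request before it reaches the model.
        SketchAdvisor addContext = (prompt, next) -> next.apply("[context: refund policy]\n" + prompt);
        // Observes both the outgoing request and the incoming response.
        SketchAdvisor logging = (prompt, next) -> {
            System.out.println("-> " + prompt);
            String response = next.apply(prompt);
            System.out.println("<- " + response);
            return response;
        };
        Function<String, String> fakeModel = prompt -> "echo: " + prompt;
        System.out.println(build(List.of(logging, addContext), fakeModel).apply("Can I return this?"));
    }
}
```

Ordering matters here: with `logging` listed first, it sees the prompt before `addContext` enriches it. Swap the two and the log would show the enriched prompt instead.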

ChatMemory — giving the LLM a memory

LLMs are stateless. Every API call is completely independent. Ask an LLM “what’s the capital of France,” then ask “what did I just ask you,” and it has no idea — because, from its perspective, that second request is the first thing you’ve ever said.

ChatMemory is how Spring AI solves this. It's a storage abstraction for conversation history. After each exchange, both the user's question and the LLM's response get saved. On the next request, that history gets loaded and injected into the prompt so the LLM has context.

An in-memory store is the default: history lives in your application's heap and disappears on restart. (Early Spring AI milestones shipped this as InMemoryChatMemory; in 1.0 GA the equivalent is MessageWindowChatMemory backed by InMemoryChatMemoryRepository.) That's fine for development and short-lived sessions. For production chatbots that need to remember users across sessions, you'd back ChatMemory with Redis or a database.

There’s a real constraint here worth knowing upfront: every message you inject into the conversation history costs tokens. LLMs have a context window limit — usually somewhere between 8K and 128K tokens, depending on the model. If a conversation goes on long enough, the accumulated history will either exceed the limit and fail, or you’ll need to implement a summarisation strategy to compress older messages.

This is not a Spring AI problem — it’s a fundamental LLM constraint. But ChatMemory is where you manage it.
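One simple way to stay under the context window is to drop the oldest messages once a token budget is exceeded. The sketch below does that with a deliberately crude token estimate of one token per word. `WindowedMemory` is a hypothetical class, and real tokenizers count quite differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class WindowedMemory {
    private final Deque<String> messages = new ArrayDeque<>();
    private final int maxTokens;

    WindowedMemory(int maxTokens) {
        this.maxTokens = maxTokens;
    }

    // Crude assumption for this sketch: one token per whitespace-separated word.
    static int estimateTokens(String message) {
        return message.isBlank() ? 0 : message.trim().split("\\s+").length;
    }

    // Appends a message, then evicts the oldest ones until the budget fits again.
    void add(String message) {
        messages.addLast(message);
        while (totalTokens() > maxTokens && messages.size() > 1) {
            messages.removeFirst();
        }
    }

    int totalTokens() {
        return messages.stream().mapToInt(WindowedMemory::estimateTokens).sum();
    }

    List<String> history() {
        return new ArrayList<>(messages);
    }

    public static void main(String[] args) {
        WindowedMemory memory = new WindowedMemory(5);
        memory.add("one two three");
        memory.add("four five");
        memory.add("six seven"); // pushes the total to 7, so the oldest message is dropped
        System.out.println(memory.history());
    }
}
```

A summarisation strategy would replace the `removeFirst()` call with something that compresses the evicted messages instead of discarding them.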

RAG Flow

How RAG brings it all together

RAG (Retrieval-Augmented Generation) is the pattern that makes Spring AI genuinely production-useful. The diagram above shows both phases. Here’s the thinking behind it.

The core problem: your LLM knows nothing about your company. It doesn’t know your product documentation, your internal policies, your customer data. Fine-tuning a model on your data is expensive, slow, and goes stale every time the data changes.

RAG is the pragmatic answer. Instead of teaching the model your data, you just hand it the relevant pages at the moment it needs them. Like giving a contractor a specific clause from the contract rather than asking them to memorise the whole thing.

The ingestion phase runs once, or whenever your data changes. Your documents are loaded, split into manageable chunks, embedded into vectors, and stored in a VectorStore. This is how your data gets indexed for semantic retrieval.

The query phase runs on every request. The user’s question is embedded into a vector. That vector is used to query the VectorStore for the closest matching chunks. Those chunks — plus the original question — get injected into the prompt. The LLM reads them as context and answers based on what it finds there.

The LLM never “learned” your data. It reads it fresh on each request, like an open-book exam. That framing matters because it sets the right expectations: if the relevant information isn’t in the retrieved chunks, the model will still try to answer — and that’s when hallucinations happen. RAG reduces hallucinations by providing grounding. It doesn’t eliminate them.

The part that controls retrieval quality isn’t the LLM and isn’t the vector database — it’s the chunking strategy. How you split your documents determines what gets retrieved. A chunk that’s too large buries the relevant detail in noise. A chunk too small loses the surrounding context that makes it meaningful. Getting chunking right is usually where the real tuning work happens.
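A common starting point is fixed-size chunks with a small overlap, so a sentence cut at a boundary still appears whole in one of the neighbouring chunks. This sketch splits by characters to keep it short. `Chunker` is a hypothetical helper, and real splitters usually work on tokens, sentences, or document structure.

```java
import java.util.ArrayList;
import java.util.List;

class Chunker {
    // Splits text into chunks of at most `size` characters, each sharing `overlap` characters with the previous one.
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("Need size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last chunk reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefghij", 4, 1)); // prints [abcd, defg, ghij]
    }
}
```

Both `size` and `overlap` are the knobs you end up tuning: larger chunks carry more context per hit, while more overlap reduces the chance of splitting a key sentence.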

The one-line mental model for each component

ChatClient: you talk to the LLM through this.
PromptTemplate: you structure what you say.
EmbeddingModel: converts meaning into math.
VectorStore: stores and searches that math.
Advisors: middleware that enriches every request automatically.
ChatMemory: gives the conversation a past.

Together, they’re the full stack for building LLM features that actually behave like software: predictable, configurable, and debuggable.

https://dev.to/piyush_kumarsingh_da3833/how-redis-actually-works-ram-single-thread-and-the-expiry-behavior-nobody-explains-2j4n

Piyush Kumar Singh — Sun, 03 May 2026 16:48:10 +0000

How Redis Actually Works — RAM, Single Thread, and the Expiry Behavior Nobody Explains

Piyush Kumar Singh — Sun, 03 May 2026 16:45:01 +0000

A RAM read takes about 100 nanoseconds. A disk read, even on a modern SSD, takes around 100,000 nanoseconds. That single gap explains most of Redis’s speed, before it does a single thing clever.

But RAM alone isn’t the full story. The other half is a design decision that looks like a limitation on paper — and turns out to be one of the smartest choices in the codebase. More on that in a moment. Here’s what’s actually happening inside Redis when your app talks to it.

Why is Redis so fast?

The first reason is obvious once you hear it: Redis keeps everything in RAM. Your PostgreSQL instance, however well-tuned, writes to disk. Redis doesn’t. Every key lives in memory, which is why a GET on a Redis key can return in under a millisecond even under load. There’s no disk seek, no page cache miss, no I/O wait. But here’s where most explanations stop — and they shouldn’t.

Single-threaded — and that’s the point

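Redis executes commands one at a time, on a single thread. No two commands ever run in parallel. (Since Redis 6, optional I/O threads can handle socket reads and writes, but command execution itself stays single-threaded.) That sounds like a bottleneck. It’s actually a feature.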

In a multi-threaded system, shared state requires locks. Locks mean threads waiting on each other. Waiting introduces latency spikes that are hard to reproduce and harder to debug. Redis avoids the entire problem by never having two threads compete for the same data.

# These three clients connect simultaneously
Client 1: SET counter 100     ← executes fully first
Client 2: INCR counter        ← executes next, sees 100, returns 101
Client 3: GET counter         ← executes last, returns 101

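Each command runs to completion before the next one starts, so no client ever sees a half-applied write. The outcome depends only on the order in which commands arrive, and you can reason about that. With threads and locks, you get the same guarantee only through careful synchronization, which adds complexity and latency.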

Redis is fast, not just because of RAM, but because it never waits on itself.
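The same idea can be imitated in Java: many client threads submit commands, but one thread applies them, so the shared counter needs no lock. This is only an analogy under that assumption. `SingleThreadedCounter` is a hypothetical class and has nothing to do with how Redis is actually written in C.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SingleThreadedCounter {
    private long counter = 0; // only ever touched by the single command thread
    private final ExecutorService commandThread = Executors.newSingleThreadExecutor();

    // Clients enqueue a command; the command thread runs them one at a time, in arrival order.
    void incr() {
        commandThread.execute(() -> counter++);
    }

    // Reads go through the same queue, so they see every command queued before them.
    long get() {
        try {
            return commandThread.submit(() -> counter).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    void close() {
        commandThread.shutdown();
    }

    static long run(int clients, int incrementsPerClient) {
        SingleThreadedCounter store = new SingleThreadedCounter();
        try {
            Thread[] threads = new Thread[clients];
            for (int i = 0; i < clients; i++) {
                threads[i] = new Thread(() -> {
                    for (int j = 0; j < incrementsPerClient; j++) {
                        store.incr();
                    }
                });
                threads[i].start();
            }
            for (Thread thread : threads) {
                thread.join();
            }
            return store.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            store.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 4000
    }
}
```

If the four client threads incremented `counter` directly instead of queueing commands, the unsynchronized `counter++` could lose updates.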

The six data structures — with the tradeoffs that actually matter

Redis isn’t just strings. Each data structure solves a specific problem, and knowing when to pick one over another is more useful than knowing the commands.

String

SET user:1001:name "John"
GET user:1001:name        # "John"
SET page:views 0
INCR page:views           # atomic - safe under concurrent load
GET page:views            # "1"

The default choice for simple values, flags, and counters. INCR is atomic — a thousand clients calling it simultaneously will never produce a wrong count.

Hash

HSET user:1001 name "John" email "j@example.com" role "admin"
HGET user:1001 name       # "John"
HGETALL user:1001         # all fields

A Hash is better than a String when you have a structured object with multiple fields you’ll update independently. If you stored this as a JSON string, updating a single field means deserializing the whole blob, changing one value, and reserializing. A Hash lets you update one field with one command.

List

RPUSH notifications:1001 "Order shipped"
RPUSH notifications:1001 "Payment received"
LRANGE notifications:1001 0 -1   # all items in order
LPOP notifications:1001           # "Order shipped"

Lists maintain insertion order. Use them for notification feeds, activity timelines, and simple job queues where you’re okay with at-most-once delivery. If you need guaranteed delivery, a List isn’t enough — use Kafka or RabbitMQ.

Sorted Set

ZADD leaderboard 9500 "alice"
ZADD leaderboard 11200 "John"
ZREVRANGE leaderboard 0 2 WITHSCORES
# John    11200
# alice   9500

Every member has a score. Redis keeps them sorted automatically. Real-time leaderboards, priority queues, and rate limiting windows — sorted sets handle all three. The reason to reach for this over a List is when rank or score matters, not just insertion order.

Set

SADD online:users "user:1001" "user:1002"
SISMEMBER online:users "user:1002"   # 1 (true)
SCARD online:users                   # 2

No duplicates, O(1) membership check. Good for tracking online users, visited pages, or anything where “is X in this group” is the question. Use a Set over a List when you need uniqueness and don’t care about order.

HyperLogLog

PFADD page:views:home "ip1" "ip2" "ip3" "ip1"
PFCOUNT page:views:home   # 3 (deduplicated)

HyperLogLog gives you approximate unique counts using a fixed 12KB of memory — regardless of whether you have 1,000 or 100 million unique values. A plain Set would work too, but each unique member consumes memory. For a site with 50 million daily visitors, the Set version could eat gigabytes. HyperLogLog stays at 12KB with a ~0.81% error margin. That tradeoff is almost always worth it for analytics.

TTL and the expiry behavior nobody explains

SET session:abc123 "user_data" EX 3600   # expires in 1 hour
TTL session:abc123                        # 3597
# one hour later
GET session:abc123                        # (nil)

Most developers assume Redis runs a background job that scans for expired keys and deletes them at the exact moment of expiry. It doesn’t. That would be expensive — imagine scanning millions of keys every second. Instead, Redis uses two strategies in parallel:

Lazy deletion: When you read a key, Redis checks its expiry first. If it’s expired, Redis deletes it right then and returns nil. Memory is reclaimed at access time, not expiry time.

Active sampling: By default, ten times per second (every 100ms), Redis samples 20 random keys from those that have a TTL and deletes the ones that have expired. If more than 25% of the sample was expired, it runs the loop again immediately. It keeps looping until the expired ratio drops below 25%.

The consequence: if you have 10 million keys expiring at 3 am and nothing reads them, the active sampler will gradually clean them up over the following minutes. Your memory won’t drop instantly. If you’re sizing Redis memory around key expiry, that lag is real, and you need to account for it.
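The lazy half of this behaviour is easy to model. In the sketch below, an expired key keeps occupying memory until something reads it. `ExpiringStore` is a hypothetical class, and the clock is passed in explicitly so the behaviour is deterministic. The active sampler is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

class ExpiringStore {
    private record Entry(String value, long expiresAtMillis) {}

    private final Map<String, Entry> data = new HashMap<>();

    void set(String key, String value, long ttlMillis, long nowMillis) {
        data.put(key, new Entry(value, nowMillis + ttlMillis));
    }

    // Lazy deletion: the expiry check (and the memory reclaim) happens only on access.
    String get(String key, long nowMillis) {
        Entry entry = data.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis()) {
            data.remove(key);
            return null;
        }
        return entry.value();
    }

    // Number of entries physically held in memory, expired or not.
    int physicalSize() {
        return data.size();
    }

    public static void main(String[] args) {
        ExpiringStore store = new ExpiringStore();
        store.set("session:a", "user_data", 1000, 0);
        store.set("session:b", "user_data", 1000, 0);
        System.out.println(store.physicalSize());        // 2: both expired by t=2000 but still in memory
        System.out.println(store.get("session:a", 2000)); // null, and the entry is removed now
        System.out.println(store.physicalSize());        // 1: session:b was never read
    }
}
```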

Persistence — what survives a restart

Redis lives in RAM. Restart the process, lose everything — unless you’ve configured persistence.

RDB snapshots

# redis.conf
save 900 1       # snapshot if 1+ keys changed in 15 minutes
save 300 10      # snapshot if 10+ keys changed in 5 minutes
save 60 10000    # snapshot if 10000+ keys changed in 1 minute

Redis forks a child process and writes everything to dump.rdb. Fast to recover from. The risk: if your server crashes between snapshots, you lose whatever happened in that window. Fine for cache. Not fine for anything where losing recent writes matters.

AOF — Append Only File

# redis.conf
appendonly yes
appendfsync everysec   # flush to disk every second

Every write command gets appended to a log. On restart, Redis replays the log. With everysec, you lose at most about one second of data. With always, you lose nothing, but your write throughput drops noticeably.

Production setup — use both

save 900 1
appendonly yes
appendfsync everysec

RDB handles fast restarts. AOF handles durability. Together, they cover both failure modes without adding much overhead.

Spring Boot integration — with the why behind the config

Dependencies and connection

<!-- pom.xml -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

# application.yml (Spring Boot 3.x; on Boot 2.x the prefix is spring.redis)
spring:
  data:
    redis:
      host: localhost
      port: 6379
      timeout: 2000ms

RedisTemplate — why you need a custom serializer

By default, RedisTemplate uses Java serialization for both keys and values. That works, but it stores class metadata alongside the data, making keys and values unreadable in redis-cli and tying you to your class structure. Switch to String keys and JSON values so your data is readable outside Spring too:

@Bean
public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) {
    RedisTemplate<String, Object> template = new RedisTemplate<>();
    template.setConnectionFactory(factory);
    template.setKeySerializer(new StringRedisSerializer());
    // Jackson JSON instead of Java serialization
    template.setValueSerializer(new GenericJackson2JsonRedisSerializer());
    return template;
}

@Cacheable: the two-line cache layer

Spring’s caching abstraction lets you add Redis caching without touching repository logic. The first call hits the database; every subsequent call with the same ID returns from Redis:

@Cacheable(value = "users", key = "#id")
public User getUserById(Long id) {
    return userRepository.findById(id)
        .orElseThrow(() -> new RuntimeException("User not found"));
}

@CacheEvict(value = "users", key = "#user.id")
public User updateUser(User user) {
    return userRepository.save(user);   // evicts stale cache on update
}

@SpringBootApplication
@EnableCaching   // don't forget this
public class Application { ... }

Rate limiting with sorted sets

A sliding window rate limiter is one of Redis’s cleanest use cases. The sorted set score is the timestamp, so you can count requests within a time window with a range query:

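// Assumes redisTemplate is a StringRedisTemplate field; needs java.util.UUID and java.util.concurrent.TimeUnit.
// These calls are not atomic as a group: under heavy contention, wrap them in a Lua script or MULTI/EXEC.
public boolean isAllowed(String userId, int maxRequests, long windowSeconds) {
    String key = "ratelimit:" + userId;
    long now = System.currentTimeMillis();
    long windowStart = now - (windowSeconds * 1000);
    ZSetOperations<String, String> ops = redisTemplate.opsForZSet();
    ops.removeRangeByScore(key, 0, windowStart);   // drop requests older than the window
    Long count = ops.zCard(key);
    if (count != null && count >= maxRequests) return false;
    // unique member, so two requests in the same millisecond are both counted
    ops.add(key, now + "-" + UUID.randomUUID(), now);
    redisTemplate.expire(key, windowSeconds, TimeUnit.SECONDS);
    return true;
}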

When your tech lead says “add Redis,” ask this first

There’s a version of this story everyone knows: the tech lead says “add Redis,” you add Redis, and something gets faster. Nobody questions it. But Redis has real constraints, and using it wrong is a common way to create problems that look like infrastructure issues.

Don’t use it as your primary database. Redis has no foreign keys, no joins, no complex queries. If your data has relationships, use a relational database. Redis is the layer on top, not the foundation.

Don’t store large values. Redis works well with small, hot data. A 5MB JSON blob in Redis is possible but wasteful: you’re burning expensive RAM, blocking the event loop for every other client, and making serialization your bottleneck.

Don’t use pub/sub for anything you can’t afford to lose. Redis pub/sub has no persistence. If a subscriber goes offline for 30 seconds, those messages are gone. Use Kafka or RabbitMQ when reliability matters.

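Set a memory limit and eviction policy, always. Without a limit, Redis keeps growing until the operating system kills the process. With a limit but the default noeviction policy, Redis rejects writes once it is full, and that failure mode is jarring in production: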

# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru   # evict least recently used keys when full

The one line that ties it all together
Redis is fast because it stays in RAM and never waits for itself. Use it for caching, sessions, leaderboards, rate limiting, and lightweight pub/sub, where dropped messages are acceptable. Don’t ask it to be your source of truth. Understand those two constraints, and Redis stops being magic — it becomes a predictable tool that does exactly what you’d expect. That’s a good thing.

How Spring Security Works Internally (Filters, Authentication & Authorization Explained)

Piyush Kumar Singh — Sat, 02 May 2026 16:57:49 +0000

If you have worked with Spring Boot for a while, you have used Spring Security without fully tracing what happens inside it.

You add a dependency, configure a SecurityFilterChain, and wire a UserDetailsService, and your APIs are suddenly protected. It works. But under the hood, there is a very disciplined flow that decides who the user is, whether the password is valid, and whether the request should even reach your controller.

Once that internal flow clicks, Spring Security stops feeling magical and starts feeling predictable.

The Big Picture

Every incoming HTTP request does not go straight to your controller. Before that request reaches DispatcherServlet, it passes through the servlet filter chain. Spring Security plugs itself into that chain and intercepts the request early.

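That matters because security decisions should happen before business logic runs. The flow looks like this:

Client request -> servlet filter chain -> FilterChainProxy -> SecurityFilterChain -> authentication -> authorization -> DispatcherServlet -> controller

That is the full security journey in one line: intercept, authenticate, authorize, then continue.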

Step 1: The Request Enters the Servlet Filter Chain
When a client sends a request, Tomcat receives it first. Tomcat then passes it through a chain of servlet filters.

These filters are not specific to Spring Security. They are part of the servlet infrastructure. Any framework can register filters here. Spring Security registers one important filter called FilterChainProxy.

You can think of FilterChainProxy as the front desk for all Spring Security logic. It does not do all the security work itself. Instead, it decides which internal security filters should handle the request.

Step 2: FilterChainProxy Picks the Right SecurityFilterChain
This is a key part that many developers miss. Spring Security does not always use one universal chain for every request. It can maintain multiple SecurityFilterChain configurations, each tied to different URL patterns or request matchers.

For example:

/api/** may use JWT authentication
/admin/** may require stricter role checks
/login may use a form login

FilterChainProxy checks the request and selects the correct chain using RequestMatcher. That means Spring Security is not just a collection of filters. It is a smart router for security filters.
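
The selection step can be pictured with a small plain-Java model. This is a sketch, not Spring's implementation: `Chain`, `ChainRouter`, and the chain names are hypothetical stand-ins, and a simple path-prefix predicate takes the place of RequestMatcher. What it illustrates is that chains are checked in order and the first match wins, so specific chains should be registered before a catch-all.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a SecurityFilterChain: a name plus a matcher.
record Chain(String name, Predicate<String> matcher) {}

// Simplified model of how a front filter picks one chain per request.
class ChainRouter {
    private final List<Chain> chains;

    ChainRouter(List<Chain> chains) {
        this.chains = chains;
    }

    // Chains are checked in order; the first one whose matcher accepts the path wins.
    String select(String path) {
        for (Chain chain : chains) {
            if (chain.matcher().test(path)) {
                return chain.name();
            }
        }
        return "none"; // no chain matched in this model
    }

    // Example setup mirroring the three URL groups listed above.
    static ChainRouter example() {
        return new ChainRouter(List.of(
            new Chain("jwt-chain", p -> p.startsWith("/api/")),
            new Chain("admin-chain", p -> p.startsWith("/admin/")),
            new Chain("form-login-chain", p -> true) // catch-all goes last
        ));
    }
}
```

If the catch-all chain were listed first in this model, it would swallow every request and the other two chains would never be consulted.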

Step 3: Authentication Filter Extracts Credentials
Once the correct security chain is selected, one of the authentication filters takes over. In the classic username-password login flow, that filter is usually UsernamePasswordAuthenticationFilter.

Its job is simple:

Read the username and password from the request
Create an unauthenticated Authentication object
Pass that object to AuthenticationManager

At this point, the user is not yet trusted. The filter has only collected credentials. Verification still has to happen. This distinction is important. Extracting credentials and validating credentials are two separate responsibilities.
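
The difference between "collected" and "verified" can be sketched as two states of one token. `LoginToken` below is a hypothetical, simplified type written for illustration; in Spring Security the corresponding class is UsernamePasswordAuthenticationToken, which carries more than this model does.

```java
import java.util.List;

// Hypothetical, simplified token used only to illustrate the two states.
record LoginToken(String username, String password, List<String> authorities, boolean authenticated) {

    // What the filter builds: credentials collected, nothing verified yet.
    static LoginToken unauthenticated(String username, String password) {
        return new LoginToken(username, password, List.of(), false);
    }

    // What comes back after verification: no password kept, authorities attached.
    static LoginToken authenticated(String username, List<String> authorities) {
        return new LoginToken(username, null, List.copyOf(authorities), true);
    }
}
```

The filter only ever creates the first form. Producing the second form is the job of the components described in the next steps.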

Step 4: AuthenticationManager Coordinates Authentication
AuthenticationManager is the entry point for authentication logic. In most applications, the default implementation is ProviderManager.

ProviderManager does not usually authenticate the user directly. Instead, it delegates to one of the configured AuthenticationProvider implementations. That design makes Spring Security flexible. Different providers can handle different authentication mechanisms:

username and password
JWT token
OAuth2 login
LDAP
custom authentication rules

When ProviderManager receives an authentication object, it loops through the registered providers and calls supports() on each one.

A provider that says, “Yes, I know how to handle this type of authentication,” gets authenticate() called on it. The first provider to return an authenticated result ends the loop. If none succeeds, authentication fails with an exception.
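
The delegation loop can be sketched in plain Java. Everything here is a hypothetical model: `Provider`, `DelegatingManager`, the two token records, and the hard-coded checks are assumptions for illustration and are much simpler than ProviderManager and AuthenticationProvider.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical minimal provider contract, modelled on supports()/authenticate().
interface Provider {
    boolean supports(Class<?> tokenType);
    Optional<String> authenticate(Object token); // principal name on success
}

record PasswordLogin(String username, String password) {}
record BearerToken(String value) {}

class PasswordProvider implements Provider {
    public boolean supports(Class<?> tokenType) { return tokenType == PasswordLogin.class; }

    public Optional<String> authenticate(Object token) {
        PasswordLogin login = (PasswordLogin) token;
        // Demo-only check; a real provider consults a user store and a password encoder.
        return "s3cret".equals(login.password()) ? Optional.of(login.username()) : Optional.empty();
    }
}

class BearerProvider implements Provider {
    private static final String PREFIX = "valid-";

    public boolean supports(Class<?> tokenType) { return tokenType == BearerToken.class; }

    public Optional<String> authenticate(Object token) {
        String value = ((BearerToken) token).value();
        // Demo-only rule: "valid-<name>" authenticates as <name>.
        return value.startsWith(PREFIX) ? Optional.of(value.substring(PREFIX.length())) : Optional.empty();
    }
}

class DelegatingManager {
    private final List<Provider> providers;

    DelegatingManager(List<Provider> providers) { this.providers = providers; }

    // Ask each provider whether it supports the token type; the first success wins.
    String authenticate(Object token) {
        for (Provider provider : providers) {
            if (!provider.supports(token.getClass())) {
                continue;
            }
            Optional<String> result = provider.authenticate(token);
            if (result.isPresent()) {
                return result.get();
            }
        }
        throw new IllegalStateException("No provider authenticated " + token.getClass().getSimpleName());
    }
}
```

Adding a new authentication mechanism in this model means adding one more `Provider` to the list; the manager itself does not change.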

Step 5: AuthenticationProvider Verifies the User
This is where the real authentication work happens. For username-password login, the provider is often DaoAuthenticationProvider.

Its job usually includes two things:

Load the user from a data source
Validate the submitted password

To load the user, it calls UserDetailsService.

To validate the password, it uses PasswordEncoder.

This split is one of the reasons Spring Security is so clean internally. Fetching user data and checking password hashing are handled by dedicated components, not mixed into one giant class.
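
A minimal sketch of that split, assuming two injected collaborators: `ComposedProvider` is a hypothetical class, with a `Function` standing in for the user lookup and a `BiPredicate` standing in for the password check. It is not how DaoAuthenticationProvider is written, only an illustration of the same division of labour.

```java
import java.util.function.BiPredicate;
import java.util.function.Function;

// Hypothetical provider that composes two collaborators instead of doing both jobs itself.
class ComposedProvider {
    private final Function<String, String> storedHashByUsername; // stand-in for the user lookup
    private final BiPredicate<String, String> passwordMatcher;   // stand-in for the password check

    ComposedProvider(Function<String, String> storedHashByUsername,
                     BiPredicate<String, String> passwordMatcher) {
        this.storedHashByUsername = storedHashByUsername;
        this.passwordMatcher = passwordMatcher;
    }

    boolean authenticate(String username, String rawPassword) {
        // Step 1: load the stored record for this username.
        String storedHash = storedHashByUsername.apply(username);
        if (storedHash == null) {
            return false; // unknown user
        }
        // Step 2: let the matcher compare the raw password with the stored hash.
        return passwordMatcher.test(rawPassword, storedHash);
    }
}
```

Because each collaborator is passed in, either one can be swapped (a different user store, a different hashing scheme) without touching the other.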

Step 6: UserDetailsService Loads the User From the Database
UserDetailsService is a very small but important contract.

Its core method is:

loadUserByUsername(String username)

This method is responsible for fetching the user from your database, external system, or custom source. It returns a UserDetails object that contains:

username
password
roles or authorities
account status flags, such as locked or disabled

If the user is not found, the method throws UsernameNotFoundException, and authentication fails. Otherwise, the system now knows what the stored user record looks like. Next, it needs to compare the password.
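
The shape of that contract can be sketched with a map in place of a database. `StoredUser` and `InMemoryUserStore` are hypothetical names, and the model throws a plain `NoSuchElementException` where a real implementation would throw UsernameNotFoundException.

```java
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for UserDetails: the fields listed above.
record StoredUser(String username, String passwordHash, List<String> authorities,
                  boolean locked, boolean enabled) {}

// Hypothetical stand-in for a UserDetailsService backed by a map instead of a database.
class InMemoryUserStore {
    private final Map<String, StoredUser> users;

    InMemoryUserStore(Map<String, StoredUser> users) {
        this.users = Map.copyOf(users);
    }

    // Mirrors the contract: return the stored user or throw, never return null.
    StoredUser loadUserByUsername(String username) {
        StoredUser user = users.get(username);
        if (user == null) {
            throw new NoSuchElementException("User not found: " + username);
        }
        return user;
    }
}
```

Note that the lookup does not check the password at all. It only reports what is stored, including the status flags, and leaves the decision to the caller.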

Step 7: PasswordEncoder Validates the Password
Spring Security does not compare raw passwords directly. Instead, it uses a PasswordEncoder such as BCryptPasswordEncoder.

Here is the idea:

The password submitted by the client is plain text
The stored password in the database is hashed
PasswordEncoder.matches(rawPassword, encodedPassword) checks if they match safely

If the password is wrong, authentication fails.

If it matches, Spring Security creates a fully authenticated Authentication object containing the user’s identity and authorities. That object is now trusted.
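
To show why matching is more than a string comparison, here is a sketch of a salted encode/matches pair using only the JDK. It uses PBKDF2 rather than BCrypt because PBKDF2 ships with the standard library; `DemoPasswordEncoder`, the iteration count, and the "salt:hash" storage format are assumptions for illustration, not Spring's implementation and not a production recommendation.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical encoder: salted PBKDF2, stored as "base64(salt):base64(hash)".
class DemoPasswordEncoder {
    private static final int ITERATIONS = 100_000; // illustrative value
    private static final int KEY_BITS = 256;

    // A fresh random salt per call means the same password encodes differently each time.
    static String encode(String rawPassword) {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        Base64.Encoder b64 = Base64.getEncoder();
        return b64.encodeToString(salt) + ":" + b64.encodeToString(derive(rawPassword, salt));
    }

    // Re-derive the hash with the stored salt and compare in constant time.
    static boolean matches(String rawPassword, String encoded) {
        String[] parts = encoded.split(":");
        byte[] salt = Base64.getDecoder().decode(parts[0]);
        byte[] expected = Base64.getDecoder().decode(parts[1]);
        return MessageDigest.isEqual(expected, derive(rawPassword, salt));
    }

    private static byte[] derive(String rawPassword, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(rawPassword.toCharArray(), salt, ITERATIONS, KEY_BITS);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available on this JVM", e);
        }
    }
}
```

Because of the per-password salt, two encodings of the same password differ, which is why the check has to go through matches() rather than encoding the input again and comparing strings.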

Step 8: SecurityContextHolder Stores the Authenticated User
Once authentication succeeds, Spring Security stores the authenticated user in SecurityContextHolder.

This is what makes the user available for the rest of the request lifecycle.

From here, other parts of the application can access the logged-in user through:

SecurityContextHolder.getContext().getAuthentication()
@AuthenticationPrincipal
Principal in controllers

In a regular servlet application, this security context is held in a ThreadLocal by default, so it is bound to the thread handling the request. That is why the controller can later know who the current user is without manually passing user details around.
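
The thread-bound behaviour is easy to see in a stripped-down model. `DemoContextHolder` is a hypothetical class that stores only a username; it illustrates the thread-local idea, not the real SecurityContextHolder API.

```java
// Hypothetical, stripped-down holder that keeps the current user per thread.
class DemoContextHolder {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    static void set(String username) { CURRENT_USER.set(username); }

    static String get() { return CURRENT_USER.get(); }

    // Clearing matters on pooled threads, which are reused for later requests.
    static void clear() { CURRENT_USER.remove(); }

    // Reads the holder from a brand-new thread and returns what that thread sees.
    static String readFromOtherThread() {
        String[] seen = new String[1];
        Thread other = new Thread(() -> seen[0] = get());
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}
```

In this model a value set on the request thread is invisible to any other thread. The same property is why work handed off to another thread does not see the current user unless the context is passed along explicitly.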

Step 9: Authorization Happens Before the Controller
Authentication answers this question: Who are you?

Authorization answers this one: What are you allowed to do?

After the user is authenticated, Spring Security moves to authorization filters such as AuthorizationFilter.

This stage checks whether the current user has the required role, authority, or permission for the requested resource.

Examples:

hasRole("ADMIN")
hasAuthority("PAYMENT_READ")
Request matchers that restrict endpoints

If authorization fails, Spring Security stops the request and returns an error such as 403 Forbidden. If authorization succeeds, the request continues.
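
One detail behind those two checks is worth a sketch: with default settings, hasRole("ADMIN") looks for the authority "ROLE_ADMIN", while hasAuthority compares the string exactly as given. `AccessRules` below is a hypothetical helper that models only that difference.

```java
import java.util.Set;

// Hypothetical helper modelling the difference between role and authority checks.
class AccessRules {
    private static final String ROLE_PREFIX = "ROLE_"; // default prefix assumed

    // Compares the authority string exactly as given.
    static boolean hasAuthority(Set<String> granted, String authority) {
        return granted.contains(authority);
    }

    // Adds the role prefix before looking the value up.
    static boolean hasRole(Set<String> granted, String role) {
        return granted.contains(ROLE_PREFIX + role);
    }
}
```

A user whose granted authorities are "ROLE_USER" and "PAYMENT_READ" passes hasRole("USER") and hasAuthority("PAYMENT_READ") in this model, but fails hasAuthority("USER") because no authority with that exact name exists.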

Step 10: The Request Reaches DispatcherServlet and Then the Controller
Only after authentication and authorization are complete does the request proceed to Spring MVC. Now, DispatcherServlet can route the request to the correct controller.

At this point, your controller can safely assume one of two things:

The endpoint is public, or
The user has already been authenticated and authorized

That separation is why controllers stay cleaner. Security is handled earlier in the pipeline instead of being scattered across business logic.

What Happens If Authentication Fails?
If authentication fails anywhere in the chain, Spring Security throws an authentication-related exception.

Common outcomes include:

401 Unauthorized for unauthenticated access
Redirect to the login page in form-based login
Custom error response in REST APIs

The controller is never called.

This is a useful mental model: failed authentication stops the request before business logic begins.
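
How a failure becomes a status code can be modelled in a few lines. `FailureTranslator` and its two exception types are hypothetical; the sketch assumes the common REST-style mapping in which an unauthenticated caller gets 401 and an authenticated but unauthorized caller gets 403.

```java
// Hypothetical model of translating security failures into HTTP status codes.
class FailureTranslator {
    static class NotAuthenticated extends RuntimeException {}
    static class AccessDenied extends RuntimeException {}

    // Runs the protected action; returns 200 on success, 401 or 403 when security rejects it.
    static int handle(Runnable protectedAction, boolean anonymous) {
        try {
            protectedAction.run();
            return 200;
        } catch (NotAuthenticated e) {
            return 401; // the caller never proved who they are
        } catch (AccessDenied e) {
            // An anonymous caller is asked to authenticate; a known caller is refused.
            return anonymous ? 401 : 403;
        }
    }
}
```

In both failure branches the protected action stops at the point where the exception is thrown, which matches the mental model above: business logic does not continue after a security failure.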

What Makes Spring Security Feel Complex?
Usually, it is not the concepts. It is the number of moving parts.

There are many classes involved:

FilterChainProxy
SecurityFilterChain
UsernamePasswordAuthenticationFilter
AuthenticationManager
ProviderManager
AuthenticationProvider
UserDetailsService
PasswordEncoder
SecurityContextHolder
AuthorizationFilter

At first glance, that looks like a lot.

But if you group them by responsibility, it becomes manageable:

Filters handle request interception
Manager and providers handle authentication delegation
UserDetailsService and PasswordEncoder validate identity
SecurityContextHolder stores the authenticated user
Authorization filters enforce access rules

That is really the whole story.

A Simple Way to Remember the Flow

Use this line:

Request comes in -> filter intercepts -> credentials extracted -> manager delegates -> provider authenticates -> context stores user -> authorization checks access -> controller runs

If you remember that sentence, you already understand the internals better than many developers who use Spring Security every day.

Final Takeaway

Spring Security works like a layered checkpoint system. It intercepts the request before your application code sees it, verifies identity using providers and encoders, stores the authenticated user in a security context, checks permissions, and only then allows the request to hit the controller. Once you understand that flow, the framework feels a lot less intimidating.