Six months ago, I was the guy manually stitching together RestTemplate calls to OpenAI's API, writing my own retry logic, parsing raw JSON responses, and wondering why nobody had made this less painful.
Then Spring AI 2.0 dropped and I felt genuinely stupid for not waiting.
This is the exact chatbot I built — from a blank Spring Boot project to a deployed, production-ready AI assistant. No fluff, no "exercise left to the reader." Real code, real mistakes, real gotchas.
What is Spring AI and Why Does It Matter in 2026?
Spring AI is Spring's official abstraction layer for working with AI models. Think of it the way Spring Data abstracted away JDBC boilerplate — Spring AI does the same for LLM integrations.
Version 2.0 (released early 2026) brought:
- Unified ChatClient API across OpenAI, Anthropic, Ollama, Azure OpenAI, and more
- Built-in streaming support
- First-class RAG (Retrieval-Augmented Generation) support with vector store integrations
- Advisors API for intercepting and augmenting prompts
For Java developers, this is huge. You no longer need to learn a Python framework or maintain fragile HTTP wrapper classes. This is proper, idiomatic Spring.
Project Setup
I used Spring Boot 3.4.x with Java 21. Here's the core pom.xml section you need:
<properties>
    <java.version>21</java.version>
    <spring-ai.version>2.0.0</spring-ai.version>
</properties>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <!-- Spring Boot Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Spring AI - OpenAI (the starter has been named spring-ai-starter-model-openai
         since the 1.0 line; the older spring-ai-openai-spring-boot-starter ID no longer resolves) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
    <!-- Optional: for streaming responses -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
</dependencies>
Add your API key to application.properties:
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o
spring.ai.openai.chat.options.temperature=0.7
spring.ai.openai.chat.options.max-tokens=1024
Never hardcode the API key. I know that sounds obvious but I've seen it in 3 different GitHub repos this year.
Architecture Overview
Before jumping to code, here's the shape of what we're building:
┌─────────────────────────────────────────────────┐
│              CLIENT (Browser/App)               │
└───────────────────┬─────────────────────────────┘
                    │ HTTP POST /api/chat
                    ▼
┌─────────────────────────────────────────────────┐
│           ChatController (REST Layer)           │
│         Validates input, calls service          │
└───────────────────┬─────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│          ChatService (Business Logic)           │
│     Builds prompt, injects system context,      │
│           handles errors, logs usage            │
└───────────────────┬─────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│              Spring AI ChatClient               │
│    Abstracts the LLM provider (OpenAI here)     │
└───────────────────┬─────────────────────────────┘
                    │
                    ▼
           OpenAI API (gpt-4o)
Simple. Clean. Swappable — if tomorrow you want to run Ollama locally, you change one dependency and one properties line.
Building the Chatbot — Step by Step
The Service Layer
This is where the real logic lives. I kept the controller thin and put everything meaningful here:
@Service
@Slf4j
public class ChatService {

    private final ChatClient chatClient;

    private static final String SYSTEM_PROMPT = """
            You are a helpful assistant for a Java developer support portal.
            Be concise, technical, and accurate.
            If you don't know something, say so. Don't hallucinate.
            Always format code snippets using markdown.
            """;

    public ChatService(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultSystem(SYSTEM_PROMPT)
                .build();
    }

    public String chat(String userMessage) {
        log.info("Sending message to AI. Length: {} chars", userMessage.length());
        try {
            String response = chatClient.prompt()
                    .user(userMessage)
                    .call()
                    .content();
            log.info("Response received. Length: {} chars", response.length());
            return response;
        } catch (Exception ex) {
            log.error("AI call failed: {}", ex.getMessage(), ex);
            throw new ChatServiceException("Unable to process your request. Please try again.", ex);
        }
    }
}
The ChatClient.Builder is auto-configured by Spring AI — you just inject it. The defaultSystem() call sets a persistent system prompt for every conversation. This is your personality config for the bot.
The Controller
@RestController
@RequestMapping("/api/chat")
@Validated
public class ChatController {

    private final ChatService chatService;

    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }

    @PostMapping
    public ResponseEntity<ChatResponse> chat(@RequestBody @Valid ChatRequest request) {
        String reply = chatService.chat(request.message());
        return ResponseEntity.ok(new ChatResponse(reply));
    }

    public record ChatRequest(
            @NotBlank(message = "Message cannot be blank")
            @Size(max = 2000, message = "Message too long")
            String message
    ) {}

    public record ChatResponse(String reply) {}
}
Records for DTOs. Validation annotations on input. No boilerplate. This is what Java 21 + Spring Boot 3 feels like when it's working well.
Adding Conversation Memory
The above is stateless — every call is a fresh context. For a real chatbot, you need memory. Spring AI handles this with the MessageChatMemoryAdvisor:
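import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemoryRepository;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;

@Service
@Slf4j
public class StatefulChatService {

    private final ChatClient chatClient;

    public StatefulChatService(ChatClient.Builder builder) {
        // Since the 1.0 line, chat memory is a windowed ChatMemory on top of a
        // repository; the older InMemoryChatMemory class is gone.
        ChatMemory chatMemory = MessageWindowChatMemory.builder()
                .chatMemoryRepository(new InMemoryChatMemoryRepository())
                .maxMessages(20) // keep only the most recent messages per conversation
                .build();
        this.chatClient = builder
                .defaultSystem("You are a helpful Java developer assistant.")
                .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
                .build();
    }

    public String chat(String conversationId, String userMessage) {
        return chatClient.prompt()
                .user(userMessage)
                // Tells the memory advisor which conversation this message belongs to
                .advisors(advisor -> advisor.param(ChatMemory.CONVERSATION_ID, conversationId))
                .call()
                .content();
    }
}
Pass a conversationId (like a session ID or user ID) and Spring AI automatically maintains the message history for that conversation. The in-memory repository is fine for a demo; for production, swap InMemoryChatMemoryRepository for a Redis- or database-backed ChatMemoryRepository.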
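If you're wondering what "maintains the message history" boils down to, here's a minimal plain-Java sketch of the idea: messages keyed by conversation ID and trimmed to a fixed window. It's an illustration only, not Spring AI's implementation; the class name WindowedConversationMemory and the window size are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of windowed, per-conversation memory.
// Not Spring AI's ChatMemory; it only mirrors the general idea.
final class WindowedConversationMemory {
    private final int maxMessages;
    private final Map<String, Deque<String>> history = new HashMap<>();

    WindowedConversationMemory(int maxMessages) {
        if (maxMessages <= 0) {
            throw new IllegalArgumentException("maxMessages must be positive");
        }
        this.maxMessages = maxMessages;
    }

    // Appends a message and evicts the oldest ones once the window is full.
    synchronized void add(String conversationId, String message) {
        Deque<String> messages = history.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    // Returns an immutable snapshot of the retained messages, oldest first.
    synchronized List<String> get(String conversationId) {
        Deque<String> messages = history.get(conversationId);
        return (messages == null) ? List.of() : List.copyOf(messages);
    }
}
```

The window is what keeps long conversations from blowing past the model's context size (and your token budget): everything older than the last maxMessages entries simply never gets sent.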
Production Considerations
This is the part most tutorials skip. Here's what bit me in staging:
Rate Limiting
OpenAI has TPM (tokens per minute) limits. Under load, you will hit them. Add a simple token bucket or use Resilience4j:
@Bean
public RateLimiter aiRateLimiter(RateLimiterRegistry registry) {
    return registry.rateLimiter("openai", RateLimiterConfig.custom()
            .limitForPeriod(50)
            .limitRefreshPeriod(Duration.ofMinutes(1))
            .timeoutDuration(Duration.ofSeconds(5))
            .build());
}
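Wrap your chatService.chat() call with RateLimiter.decorateSupplier(...). Note that this config caps requests per minute, not tokens, so pick a request budget that keeps you comfortably under your TPM quota.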
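If you'd rather go the simple token bucket route, here's a rough sketch that budgets tokens instead of requests, which maps more directly onto a TPM limit. TokenBucket is a hypothetical helper (not part of Resilience4j or Spring AI), and it assumes the caller can supply a reasonable estimate of the tokens a request will consume.

```java
import java.util.function.LongSupplier;

// Hypothetical token bucket sized in LLM tokens per minute.
// The clock is injected so the refill logic can be tested deterministically.
final class TokenBucket {
    private static final double NANOS_PER_MINUTE = 60_000_000_000.0;

    private final long capacity;
    private final double refillPerNano;
    private final LongSupplier nanoClock;
    private double available;
    private long lastRefillNanos;

    TokenBucket(long capacity, long refillPerMinute, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerNano = refillPerMinute / NANOS_PER_MINUTE;
        this.nanoClock = nanoClock;
        this.available = capacity; // start with a full bucket
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    // Reserves the estimated tokens for one request; false means the caller should back off.
    synchronized boolean tryAcquire(long tokens) {
        refill();
        if (tokens > available) {
            return false;
        }
        available -= tokens;
        return true;
    }

    // Adds tokens in proportion to elapsed time, never exceeding the capacity.
    private void refill() {
        long now = nanoClock.getAsLong();
        long elapsed = now - lastRefillNanos;
        if (elapsed > 0) {
            available = Math.min(capacity, available + elapsed * refillPerNano);
            lastRefillNanos = now;
        }
    }
}
```

In the service you'd construct it with System::nanoTime as the clock, call tryAcquire(estimatedTokens) before chatService.chat(...), and return a 429 when it comes back false.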
Timeouts
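OpenAI's API can be slow, especially under load, so set explicit timeouts. This is the property I used; property names have shifted between Spring AI releases, so confirm it against the configuration reference for your version, and if it isn't supported there, set the read timeout on the HTTP client that Spring AI uses instead: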
spring.ai.openai.chat.options.timeout=30s
And handle TimeoutException separately from other errors so you can return a user-friendly "the AI is taking too long, try again" message instead of a generic 500.
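Here's one way the "handle timeouts separately" part could look, as a plain-Java sketch. HTTP clients commonly wrap the underlying timeout in their own exception type, so this walks the cause chain rather than checking only the top-level exception. AiErrorMessages and both message strings are hypothetical names for this example.

```java
import java.net.SocketTimeoutException;
import java.net.http.HttpTimeoutException;
import java.util.concurrent.TimeoutException;

// Hypothetical helper that maps a failed AI call to a user-facing message.
final class AiErrorMessages {
    static final String TIMEOUT_MESSAGE = "The AI is taking too long. Please try again.";
    static final String GENERIC_MESSAGE = "Unable to process your request. Please try again.";

    private AiErrorMessages() {}

    // True if the error, or anything in its cause chain, is a known timeout type.
    static boolean isTimeout(Throwable error) {
        int depth = 0;
        // The depth cap guards against pathological, self-referencing cause chains.
        for (Throwable t = error; t != null && depth < 20; t = t.getCause(), depth++) {
            if (t instanceof TimeoutException
                    || t instanceof SocketTimeoutException
                    || t instanceof HttpTimeoutException) {
                return true;
            }
        }
        return false;
    }

    static String userMessageFor(Throwable error) {
        return isTimeout(error) ? TIMEOUT_MESSAGE : GENERIC_MESSAGE;
    }
}
```

In an exception handler you'd typically pair the timeout branch with a 504 and keep everything else as a 500.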
Logging Token Usage
You'll want to track costs. Use Spring AI's response metadata:
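// This is Spring AI's ChatResponse (org.springframework.ai.chat.model),
// not the ChatResponse record declared in the controller.
ChatResponse response = chatClient.prompt()
        .user(userMessage)
        .call()
        .chatResponse();
Usage usage = response.getMetadata().getUsage();
log.info("Tokens used: prompt={}, completion={}, total={}",
        usage.getPromptTokens(),
        usage.getCompletionTokens(), // named getGenerationTokens() in pre-1.0 milestones
        usage.getTotalTokens());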
Ship these metrics to your observability stack. Untracked AI costs will surprise you at billing time.
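To turn those token counts into a number finance will recognize, something like the sketch below works. TokenCostEstimator is a hypothetical helper, and the per-million-token prices are constructor parameters on purpose: plug in your provider's current rates rather than trusting any figure hardcoded in a blog post.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical cost estimator; prices are supplied by the caller, per one million tokens.
final class TokenCostEstimator {
    private static final BigDecimal ONE_MILLION = BigDecimal.valueOf(1_000_000);

    private final BigDecimal promptPricePerMillion;
    private final BigDecimal completionPricePerMillion;

    TokenCostEstimator(BigDecimal promptPricePerMillion, BigDecimal completionPricePerMillion) {
        this.promptPricePerMillion = promptPricePerMillion;
        this.completionPricePerMillion = completionPricePerMillion;
    }

    // Returns the estimated cost of one call, rounded to six decimal places.
    BigDecimal estimate(long promptTokens, long completionTokens) {
        BigDecimal promptCost = promptPricePerMillion.multiply(BigDecimal.valueOf(promptTokens));
        BigDecimal completionCost = completionPricePerMillion.multiply(BigDecimal.valueOf(completionTokens));
        return promptCost.add(completionCost).divide(ONE_MILLION, 6, RoundingMode.HALF_UP);
    }
}
```

Feed it usage.getPromptTokens() and usage.getCompletionTokens() from the snippet above and record the result as a metric alongside the raw counts.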
💡 Pro Tip: Use Spring AI's Advisor pattern to implement cross-cutting concerns like content filtering, PII detection, and audit logging, without polluting your service layer. Think of Advisors like AOP for your AI calls.
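As a taste of the kind of logic you might hang off an advisor, here's a deliberately tiny PII scrubber in plain Java. It only shows the redaction step; wiring it into the Advisors API is left out, and PiiRedactor, the placeholder text, and the simplistic email regex are all assumptions for this example rather than anything Spring AI ships.

```java
import java.util.regex.Pattern;

// Hypothetical redaction step that an advisor could apply to the user's text
// before it reaches the model. Only handles email addresses.
final class PiiRedactor {
    // Intentionally simple pattern; real PII detection needs far more than this.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    private PiiRedactor() {}

    static String redact(String text) {
        return EMAIL.matcher(text).replaceAll("[REDACTED_EMAIL]");
    }
}
```

Keeping the scrubbing logic in its own class means you can unit test it without touching the model at all.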
Deployment Tips
I deployed this on a basic OpenShift container (which is our standard at work). A few things worth noting:
Environment variables over properties files. Set OPENAI_API_KEY as a secret, reference it with ${OPENAI_API_KEY} in properties. Never bake the key into your image.
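Health checks matter. Add a custom health indicator that pings the AI provider. Keep in mind that it runs every time the health endpoint is polled, not just on startup, and each ping is a billable API call:
@Component
public class AiHealthIndicator implements HealthIndicator {

    private final ChatClient chatClient;

    // Build a client from the auto-configured builder, same as in the services
    public AiHealthIndicator(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @Override
    public Health health() {
        try {
            chatClient.prompt().user("ping").call().content();
            return Health.up().withDetail("provider", "OpenAI").build();
        } catch (Exception e) {
            return Health.down().withDetail("error", e.getMessage()).build();
        }
    }
}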
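Because every health probe that reaches the provider costs tokens, it can be worth caching the result for a short while. Here's a minimal sketch of that idea in plain Java. CachedProbe is a hypothetical helper, the TTL is whatever you choose, and the probe itself is passed in as a BooleanSupplier so nothing here depends on Spring.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical wrapper that re-runs an expensive probe at most once per TTL.
final class CachedProbe {
    private final BooleanSupplier probe;
    private final long ttlNanos;
    private final LongSupplier nanoClock;
    private boolean hasResult;
    private boolean lastResult;
    private long lastCheckedNanos;

    CachedProbe(BooleanSupplier probe, Duration ttl, LongSupplier nanoClock) {
        this.probe = probe;
        this.ttlNanos = ttl.toNanos();
        this.nanoClock = nanoClock;
    }

    // Returns the cached result while it is fresh; otherwise runs the probe again.
    synchronized boolean isHealthy() {
        long now = nanoClock.getAsLong();
        if (!hasResult || now - lastCheckedNanos >= ttlNanos) {
            boolean result;
            try {
                result = probe.getAsBoolean();
            } catch (RuntimeException e) {
                result = false; // a failing probe counts as unhealthy
            }
            lastResult = result;
            lastCheckedNanos = now;
            hasResult = true;
        }
        return lastResult;
    }
}
```

Inside health() you'd wrap the ping call in a CachedProbe (with System::nanoTime as the clock) and translate the boolean into Health.up() or Health.down().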
Horizontal scaling is fine — since AI state lives in the LLM provider, your app is stateless (unless you're using in-memory chat history, in which case: pin sessions or use Redis).
Mistakes I Made (So You Don't Have To)
1. Forgetting to set max-tokens.
Without a limit, the model will generate until it's done. Long responses = high cost = slow responses. Always set max-tokens.
2. Putting too much in the system prompt.
I crammed business logic into a 500-word system prompt. It made outputs inconsistent. Keep system prompts focused and short — under 200 words ideally.
3. Not handling empty/null responses.
Spring AI's .content() can return null if the model returns a stop reason other than "stop" (like content filtering). Add a null check.
4. Skipping integration tests.
Unit tests with mocked ChatClient are fine, but you need at least a few integration tests that hit the real API. Model behavior changes between versions. I found a regression this way before it hit production.
5. Using blocking calls in a WebFlux context.
If you're mixing Spring AI's blocking call() API with reactive code, you'll get thread starvation. Either go fully reactive (use .stream() and return Flux<String>) or keep it on a dedicated thread pool.
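Mistakes 3 and 5 are both easy to guard against in one small wrapper. The sketch below uses only the JDK: it runs the blocking call on a dedicated pool (so it never occupies an event-loop thread), applies a deadline, and swaps a null or blank result for a fallback message. BlockingAiCalls, the pool size, and the fallback text are assumptions for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical wrapper for running blocking AI calls off the caller's thread.
final class BlockingAiCalls implements AutoCloseable {
    static final String FALLBACK = "Sorry, I couldn't generate a response. Please try again.";

    private final ExecutorService pool;
    private final long timeoutSeconds;

    BlockingAiCalls(int threads, long timeoutSeconds) {
        // Daemon threads so a forgotten close() cannot keep the JVM alive.
        this.pool = Executors.newFixedThreadPool(threads, runnable -> {
            Thread thread = new Thread(runnable, "ai-blocking");
            thread.setDaemon(true);
            return thread;
        });
        this.timeoutSeconds = timeoutSeconds;
    }

    // Runs the blocking call on the dedicated pool and fails the future after the deadline.
    // Note: orTimeout completes the future exceptionally but does not interrupt the worker thread.
    CompletableFuture<String> submit(Supplier<String> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, pool)
                .orTimeout(timeoutSeconds, TimeUnit.SECONDS)
                .thenApply(BlockingAiCalls::orFallback);
    }

    // Replaces a null or blank model response with a user-friendly fallback.
    static String orFallback(String content) {
        return (content == null || content.isBlank()) ? FALLBACK : content;
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

You'd call it as calls.submit(() -> chatService.chat(message)); in a WebFlux controller the returned future can be adapted with Mono.fromFuture(...).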
Key Takeaways
- Spring AI 2.0 makes LLM integration feel like writing any other Spring service — idiomatic, testable, and provider-agnostic
- The ChatClient builder pattern is your main API surface, so learn it well
- Memory management (MessageChatMemoryAdvisor) is built in, so don't hand-roll it
- Production readiness means: rate limiting, timeouts, token tracking, and proper health checks
- The Advisors API is underrated — use it for cross-cutting AI concerns
What's Next
I'm currently adding RAG to this project — embedding our internal documentation into a vector store (PGVector on PostgreSQL) and letting the chatbot answer questions grounded in actual docs rather than hallucinating.
After that: streaming responses with SSE so the UI feels instant instead of waiting for the full completion.
If this was useful, follow me here on DEV — I'll be posting the RAG follow-up next week.
Have you integrated AI into a Spring Boot project? What was the hardest part — the setup, the prompting, or convincing your team it was production-safe? Drop it in the comments. 👇