🧩 Introduction: Why Spring Boot + LLM = Game Changer
Imagine having your own ChatGPT-like AI — but running completely on your laptop, no API keys, no internet, and no cost.
Sounds futuristic? With Ollama and Spring Boot, it’s reality.
In today’s AI-driven world, developers are racing to embed LLM (Large Language Model) capabilities directly into their backend systems. Whether it’s generating code, summarizing text, or building chatbots — you can now do all this locally using Ollama and Spring Boot’s WebClient.
This guide walks you through building a production-grade REST API that connects Spring Boot to an Ollama LLM using WebClient, the modern replacement for RestTemplate.
⚙️ What You’ll Build
We’ll create a Spring Boot app with:
- A REST endpoint /api/ask that accepts a POST request and sends your prompt to Ollama (running locally)
- WebClient-based, non-blocking reactive communication
- JSON response from the AI model
🧠 Why Use WebClient Instead of RestTemplate?
Many older tutorials use RestTemplate, but it has been in maintenance mode since Spring 5, and the Spring team recommends the reactive WebClient for new code.
Here’s why WebClient is the smarter choice for LLM integrations:
| Feature | RestTemplate | WebClient |
|---|---|---|
| Nature | Blocking | Non-blocking (Reactive) |
| Performance | Thread per request | Reactive I/O, lightweight |
| LLM Streaming | Hard to implement | Native Flux/Mono streaming |
| Modern Support | Maintenance mode only | Actively maintained |
| Reactive Integration | ❌ | ✅ Perfect for LLMs |
Since LLM responses (like from Ollama or GPT) can take several seconds, you want a non-blocking client that doesn’t freeze your app — and that’s exactly what WebClient provides.
🛠️ Step 1: Prerequisites
Before we dive in, ensure you have:
- ✅ Ollama installed locally → Download Ollama
- ✅ A model pulled and the Ollama server running (example: Llama 3); there's a quick check right after this list

  ```bash
  ollama pull llama3
  ollama serve
  ```
- ✅ Java 17+
- ✅ Spring Boot 3.3+
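Once ollama serve is up, an optional but handy sanity check is to list the models the local server knows about; a JSON reply confirms Ollama is listening on port 11434 before you touch any Java:

```bash
# Lists locally pulled models from the running Ollama server
curl http://localhost:11434/api/tags
```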
📦 Step 2: Create a New Spring Boot Project
If you’re using Spring Initializr:
👉 https://start.spring.io
Select:
- Group: com.example
- Artifact: ollama-integration
- Dependencies: Spring WebFlux, Lombok
Click Generate, unzip, and open in IntelliJ or VS Code.
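If you prefer the terminal, Spring Initializr also exposes an HTTP API that generates the same project. This is just a sketch of that call; webflux and lombok are the Initializr dependency IDs for the options above:

```bash
# Downloads a pre-configured project zip from start.spring.io
curl https://start.spring.io/starter.zip \
  -d groupId=com.example \
  -d artifactId=ollama-integration \
  -d dependencies=webflux,lombok \
  -d javaVersion=17 \
  -o ollama-integration.zip
```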
🧩 Step 3: Add Dependencies
In pom.xml:
```xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>
```
🧱 Step 4: Create Model Classes
AskRequest.java
```java
package com.example.ollama.model;

import lombok.Data;

@Data
public class AskRequest {
    private String prompt;
}
```
AskResponse.java
```java
package com.example.ollama.model;

import lombok.AllArgsConstructor;
import lombok.Data;

@Data
@AllArgsConstructor
public class AskResponse {
    private String response;
}
```
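For completeness, here is the application entry point (it appears in the project structure at the end of this guide). Nothing Ollama-specific lives here; it's the standard Spring Boot main class:

```java
package com.example.ollama;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Standard Spring Boot bootstrap; component scanning picks up the config, service, and controller packages
@SpringBootApplication
public class OllamaIntegrationApplication {

    public static void main(String[] args) {
        SpringApplication.run(OllamaIntegrationApplication.class, args);
    }
}
```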
🔧 Step 5: Configure WebClient Bean
WebClientConfig.java
```java
package com.example.ollama.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class WebClientConfig {

    @Bean
    public WebClient webClient(WebClient.Builder builder) {
        return builder
                .baseUrl("http://localhost:11434/api")
                .build();
    }
}
```
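If you'd rather not hard-code the URL, one possible variation is to read it from configuration with a sensible default. The property name ollama.base-url below is my own choice, not a Spring or Ollama convention:

```java
package com.example.ollama.config;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class WebClientConfig {

    @Bean
    public WebClient webClient(WebClient.Builder builder,
                               // "ollama.base-url" is an illustrative property; falls back to the local default
                               @Value("${ollama.base-url:http://localhost:11434/api}") String baseUrl) {
        return builder
                .baseUrl(baseUrl)
                .build();
    }
}
```

Set ollama.base-url in application.properties (or as an environment variable) when Ollama runs on another host or port.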
🧠 Step 6: Service Layer to Connect with Ollama
OllamaService.java
```java
package com.example.ollama.service;

import com.example.ollama.model.AskRequest;
import com.example.ollama.model.AskResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

import java.time.Duration;
import java.util.Map;

@Service
@RequiredArgsConstructor
public class OllamaService {

    private final WebClient webClient;

    public Mono<AskResponse> askModel(AskRequest request) {
        Map<String, Object> body = Map.of(
                "model", "llama3",
                "prompt", request.getPrompt(),
                "stream", false
        );

        return webClient.post()
                .uri("/generate")
                .bodyValue(body)
                .retrieve()
                .bodyToMono(Map.class)
                .timeout(Duration.ofSeconds(30))
                .retryWhen(Retry.fixedDelay(2, Duration.ofSeconds(3)))
                .map(resp -> new AskResponse((String) resp.get("response")))
                .onErrorResume(e -> Mono.just(new AskResponse("Error: " + e.getMessage())));
    }
}
```
What’s happening here:
- Sends a JSON request to http://localhost:11434/api/generate
- Makes a non-blocking, asynchronous call
- Handles timeouts and retries gracefully
- Maps Ollama's response into a custom AskResponse class
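If you'd like to avoid the raw Map, one optional refinement is a small DTO for the fields we actually use from /api/generate. The record name OllamaGenerateResponse is mine, not part of Ollama's API:

```java
package com.example.ollama.model;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

// Maps just the fields we need from Ollama's /api/generate reply; all other fields are ignored
@JsonIgnoreProperties(ignoreUnknown = true)
public record OllamaGenerateResponse(String response, boolean done) {
}
```

In the service you could then call .bodyToMono(OllamaGenerateResponse.class) and map resp.response() into AskResponse, which keeps the unchecked Map cast out of the reactive pipeline.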
🧭 Step 7: Controller Layer
OllamaController.java
```java
package com.example.ollama.controller;

import com.example.ollama.model.AskRequest;
import com.example.ollama.model.AskResponse;
import com.example.ollama.service.OllamaService;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Mono;

@RestController
@RequestMapping("/api/ask")
@RequiredArgsConstructor
public class OllamaController {

    private final OllamaService ollamaService;

    @PostMapping
    public Mono<AskResponse> ask(@RequestBody AskRequest request) {
        return ollamaService.askModel(request);
    }
}
```
⚡ Step 8: Run and Test
Start your app:
```bash
mvn spring-boot:run
```
Make sure Ollama is running:
```bash
ollama serve
```
Then open another terminal and run:
```bash
curl -X POST http://localhost:8080/api/ask \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain polymorphism in Java"}'
```
✅ You’ll get a response like:
```json
{
  "response": "Polymorphism in Java allows objects to take multiple forms..."
}
```
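If you'd rather automate that check, a minimal integration-test sketch could look like the following. It assumes spring-boot-starter-test is on the classpath and that a local Ollama with llama3 is running; class and method names are my own:

```java
package com.example.ollama;

import java.util.Map;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.reactive.AutoConfigureWebTestClient;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.web.reactive.server.WebTestClient;

// Illustrative end-to-end test: it calls a real local Ollama, so it is slow and
// requires "ollama serve" plus the llama3 model to be available.
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@AutoConfigureWebTestClient(timeout = "PT60S") // LLM answers can take well over the 5s default
class OllamaControllerTest {

    @Autowired
    private WebTestClient webTestClient;

    @Test
    void askReturnsANonEmptyAnswer() {
        webTestClient.post()
                .uri("/api/ask")
                .bodyValue(Map.of("prompt", "Explain polymorphism in Java"))
                .exchange()
                .expectStatus().isOk()
                .expectBody()
                .jsonPath("$.response").isNotEmpty();
    }
}
```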
🧠 Step 9 (Optional): Handling Streaming Responses
If you want real-time token streaming like ChatGPT:
```java
// The request body must set "stream": true so Ollama returns newline-delimited JSON chunks
Map<String, Object> body = Map.of("model", "llama3", "prompt", request.getPrompt(), "stream", true);

webClient.post()
        .uri("/generate")
        .bodyValue(body)
        .retrieve()
        .bodyToFlux(Map.class)                         // one Map per streamed chunk
        .map(chunk -> (String) chunk.get("response"))  // each chunk carries the next token(s)
        .subscribe(System.out::print);
```
This allows you to stream tokens as Ollama generates them — perfect for chat UIs.
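To feed a chat UI rather than the console, one option (a sketch under my own naming, not part of the code above) is to expose the same call as a Server-Sent Events endpoint. It assumes a streamModel method added to OllamaService and the extra imports reactor.core.publisher.Flux and org.springframework.http.MediaType:

```java
// In OllamaService: same call as above, but returning the Flux instead of subscribing to it
public Flux<String> streamModel(AskRequest request) {
    Map<String, Object> body = Map.of(
            "model", "llama3",
            "prompt", request.getPrompt(),
            "stream", true
    );
    return webClient.post()
            .uri("/generate")
            .bodyValue(body)
            .retrieve()
            .bodyToFlux(Map.class)
            .map(chunk -> (String) chunk.get("response"));
}

// In OllamaController: tokens are pushed to the client as they arrive (Server-Sent Events)
@PostMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestBody AskRequest request) {
    return ollamaService.streamModel(request);
}
```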
🧾 Final Project Structure
```text
src/main/java/com/example/ollama/
├─ config/
│  └─ WebClientConfig.java
├─ controller/
│  └─ OllamaController.java
├─ model/
│  ├─ AskRequest.java
│  └─ AskResponse.java
├─ service/
│  └─ OllamaService.java
└─ OllamaIntegrationApplication.java
```
💡 Real-World Use Cases
You can extend this project to:
- 🧑💻 Build a developer assistant that explains code.
- 🧾 Create a document summarizer for PDFs.
- 💬 Power a customer support chatbot — 100% offline.
- 🧠 Integrate AI-based reasoning into enterprise APIs.
🏁 Conclusion
With Spring Boot’s WebClient and Ollama, you’ve just built a local AI microservice that’s:
- ✅ Fast and reactive
- ✅ Secure and private
- ✅ Zero-cost (no API keys or tokens!)
This modern approach keeps your data inside your system while leveraging the power of local LLMs like Llama 3, Mistral, or Phi-3.
💬 “If Spring Boot is the heart of modern backend development, LLMs are the new brain.”
Now you have both — running right on your laptop.