🧩 Introduction: Why Spring Boot + LLM = Game Changer
Imagine having your own ChatGPT-like AI — but running completely on your laptop: no API keys, no internet, and no cost.
Sounds futuristic? With Ollama and Spring Boot, it's reality.
In today's AI-driven world, developers are racing to embed LLM (Large Language Model) capabilities directly into their backend systems. Whether it's generating code, summarizing text, or building chatbots — you can now do all of this locally using Ollama and Spring Boot's WebClient.
This guide walks you through building a production-grade REST API that connects Spring Boot to an Ollama LLM using WebClient, the modern replacement for RestTemplate.
βοΈ What Youβll Build
We'll create a Spring Boot app with:
- A REST endpoint, /api/ask, that accepts a POST request and forwards the prompt to Ollama (running locally)
- WebClient-based, non-blocking reactive communication
- A JSON response from the AI model
🧠 Why Use WebClient Instead of RestTemplate?
Many older tutorials use RestTemplate, but as of Spring 5+ it is in maintenance mode: the Spring team recommends WebClient for new code.
Here's why WebClient is the smarter choice for LLM integrations:
| Feature | RestTemplate | WebClient |
|---|---|---|
| Nature | Blocking | Non-blocking (Reactive) |
| Performance | Thread per request | Reactive I/O, lightweight |
| LLM Streaming | Hard to implement | Native Flux/Mono streaming |
| Modern Support | Maintenance mode | Actively maintained |
| Reactive Integration | ❌ | ✅ Perfect for LLMs |
Since LLM responses (like from Ollama or GPT) can take several seconds, you want a non-blocking client that doesn't freeze your app — and that's exactly what WebClient provides.
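To make the blocking-vs-non-blocking point concrete, here is a tiny plain-Java sketch. It uses the standard library's CompletableFuture rather than Reactor (purely for illustration; the actual app below uses Mono) to show a slow "model" call that does not tie up the calling thread:

```java
import java.util.concurrent.CompletableFuture;

public class NonBlockingDemo {

    // Simulates a slow LLM call; a real Ollama request can take seconds.
    static CompletableFuture<String> slowAnswer(String prompt) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "Answer to: " + prompt;
        });
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = slowAnswer("Explain polymorphism");
        // The calling thread is free to do other work while the "model" thinks.
        System.out.println("Request sent, thread not blocked");
        System.out.println(future.join()); // wait only when the result is needed
    }
}
```

With a blocking client, the thread would sit idle for the whole 200 ms; here it keeps working and only synchronizes at `join()`. WebClient gives you the same property, plus backpressure and streaming, via Mono and Flux.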
🛠️ Step 1: Prerequisites
Before we dive in, ensure you have:
- ✅ Ollama installed locally — Download Ollama
- ✅ A model pulled (example: Llama 3):
ollama pull llama3
ollama serve
- ✅ Java 17+
- ✅ Spring Boot 3.3+
📦 Step 2: Create a New Spring Boot Project
If you're using Spring Initializr:
👉 https://start.spring.io
Select:
- Group: com.example
- Artifact: ollama-integration
- Dependencies: Spring WebFlux, Lombok
Click Generate, unzip, and open in IntelliJ or VS Code.
🧩 Step 3: Add Dependencies
In pom.xml:
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>
🧱 Step 4: Create Model Classes
AskRequest.java
package com.example.ollama.model;
import lombok.Data;
@Data
public class AskRequest {
    private String prompt;
}
AskResponse.java
package com.example.ollama.model;
import lombok.AllArgsConstructor;
import lombok.Data;
@Data
@AllArgsConstructor
public class AskResponse {
    private String response;
}
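Lombok's @Data generates the getters, setters, equals/hashCode, and toString for these classes at compile time. If you'd rather not depend on Lombok, the same request model can be written by hand; here is a plain-Java equivalent of AskRequest (package declaration omitted, and equals/hashCode/toString left out for brevity):

```java
// Hand-written version of AskRequest: what Lombok's @Data would otherwise
// generate (getter and setter shown; equals/hashCode/toString omitted).
public class AskRequest {

    private String prompt;

    public String getPrompt() {
        return prompt;
    }

    public void setPrompt(String prompt) {
        this.prompt = prompt;
    }
}
```

Jackson needs exactly this shape — a no-args constructor plus getters/setters — to bind the incoming JSON body, which is why @Data works seamlessly here.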
🔧 Step 5: Configure a WebClient Bean
WebClientConfig.java
package com.example.ollama.config;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;
@Configuration
public class WebClientConfig {

    @Bean
    public WebClient webClient(WebClient.Builder builder) {
        return builder
                .baseUrl("http://localhost:11434/api")
                .build();
    }
}
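One practical note: LLM completions can be long, and WebClient's default in-memory buffer for response bodies is 256 KB. If a reply exceeds that, you'll get a DataBufferLimitException. A variant of the bean that raises the limit looks like this (the 16 MB figure is an arbitrary example, not a recommendation):

```java
@Bean
public WebClient webClient(WebClient.Builder builder) {
    return builder
            .baseUrl("http://localhost:11434/api")
            // Raise the codec buffer so long completions don't fail with
            // DataBufferLimitException (the default limit is 256 KB).
            .codecs(c -> c.defaultCodecs().maxInMemorySize(16 * 1024 * 1024))
            .build();
}
```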
🧠 Step 6: Service Layer to Connect with Ollama
OllamaService.java
package com.example.ollama.service;
import com.example.ollama.model.AskRequest;
import com.example.ollama.model.AskResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;
import java.time.Duration;
import java.util.Map;
@Service
@RequiredArgsConstructor
public class OllamaService {

    private final WebClient webClient;

    public Mono<AskResponse> askModel(AskRequest request) {
        Map<String, Object> body = Map.of(
                "model", "llama3",
                "prompt", request.getPrompt(),
                "stream", false
        );

        return webClient.post()
                .uri("/generate")
                .bodyValue(body)
                .retrieve()
                .bodyToMono(Map.class)
                .timeout(Duration.ofSeconds(30))
                .retryWhen(Retry.fixedDelay(2, Duration.ofSeconds(3)))
                .map(resp -> new AskResponse((String) resp.get("response")))
                .onErrorResume(e -> Mono.just(new AskResponse("Error: " + e.getMessage())));
    }
}
What's happening here:
- Sends a JSON request to http://localhost:11434/api/generate
- Makes a non-blocking, asynchronous call
- Handles timeouts and retries gracefully
- Maps Ollama's response into a custom AskResponse class
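For reference, the request body the service sends, and the shape of Ollama's non-streaming reply that the `.map(...)` step picks the "response" field out of, look roughly like this (the reply is abridged; Ollama also returns timing and token-count fields):

Request:

```json
{"model": "llama3", "prompt": "Explain polymorphism in Java", "stream": false}
```

Reply (abridged):

```json
{"model": "llama3", "response": "Polymorphism in Java allows...", "done": true}
```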
🔧 Step 7: Controller Layer
OllamaController.java
package com.example.ollama.controller;
import com.example.ollama.model.AskRequest;
import com.example.ollama.model.AskResponse;
import com.example.ollama.service.OllamaService;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Mono;
@RestController
@RequestMapping("/api/ask")
@RequiredArgsConstructor
public class OllamaController {

    private final OllamaService ollamaService;

    @PostMapping
    public Mono<AskResponse> ask(@RequestBody AskRequest request) {
        return ollamaService.askModel(request);
    }
}
⚡ Step 8: Run and Test
Start your app:
mvn spring-boot:run
Make sure Ollama is running:
ollama serve
Then open another terminal and run:
curl -X POST http://localhost:8080/api/ask \
-H "Content-Type: application/json" \
-d '{"prompt":"Explain polymorphism in Java"}'
✅ You'll get a response like:
{
"response": "Polymorphism in Java allows objects to take multiple forms..."
}
🔧 Step 9 (Optional): Handling Streaming Responses
If you want real-time token streaming like ChatGPT, set "stream": true in the request body and consume the reply as a Flux:
webClient.post()
        .uri("/generate")
        .bodyValue(body) // same body as before, but with "stream", true
        .retrieve()
        .bodyToFlux(String.class) // one JSON chunk per generated token
        .subscribe(System.out::print);
This allows you to stream tokens as Ollama generates them — perfect for chat UIs.
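With "stream": true, Ollama emits one JSON object per line, each carrying a partial "response" token and a "done" flag. In real code you would parse these chunks with Jackson; the hand-rolled sketch below (assuming the simple shape shown, with no escaped quotes in the token) is only meant to illustrate what actually arrives on the wire:

```java
public class StreamChunkParser {

    // Extracts the "response" value from one streamed chunk such as:
    //   {"model":"llama3","response":"Poly","done":false}
    // Naive string scan for illustration only; use Jackson in real code.
    static String token(String jsonLine) {
        String key = "\"response\":\"";
        int start = jsonLine.indexOf(key);
        if (start < 0) return "";
        start += key.length();
        int end = jsonLine.indexOf('"', start);
        return jsonLine.substring(start, end);
    }

    public static void main(String[] args) {
        String chunk = "{\"model\":\"llama3\",\"response\":\"Poly\",\"done\":false}";
        System.out.println(token(chunk)); // prints Poly
    }
}
```

Concatenating the tokens from each chunk, in order, reconstructs the full completion — which is exactly what a chat UI does as the Flux emits elements.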
🧾 Final Project Structure
src/main/java/com/example/ollama/
├── config/
│   └── WebClientConfig.java
├── controller/
│   └── OllamaController.java
├── model/
│   ├── AskRequest.java
│   └── AskResponse.java
├── service/
│   └── OllamaService.java
└── OllamaIntegrationApplication.java
💡 Real-World Use Cases
You can extend this project to:
- 🧑‍💻 Build a developer assistant that explains code.
- 🧾 Create a document summarizer for PDFs.
- 💬 Power a customer support chatbot — 100% offline.
- 🧠 Integrate AI-based reasoning into enterprise APIs.
🚀 Conclusion
With Spring Boot's WebClient and Ollama, you've just built a local AI microservice that's:
- ✅ Fast and reactive
- ✅ Secure and private
- ✅ Zero-cost (no API keys or tokens!)
This modern approach keeps your data inside your system while leveraging the power of local LLMs like Llama 3, Mistral, or Phi-3.
💬 "If Spring Boot is the heart of modern backend development, LLMs are the new brain."
Now you have both — running right on your laptop.