# 🚀 Building a Local AI App with Spring Boot and Ollama Using WebClient

🧩 Introduction: Why Spring Boot + LLM = Game Changer

Imagine having your own ChatGPT-like AI — but running completely on your laptop, no API keys, no internet, and no cost.
Sounds futuristic? With Ollama and Spring Boot, it’s reality.

In today’s AI-driven world, developers are racing to embed LLM (Large Language Model) capabilities directly into their backend systems. Whether it’s generating code, summarizing text, or building chatbots — you can now do all this locally using Ollama and Spring Boot’s WebClient.

This guide walks you through building a production-grade REST API that connects Spring Boot to an Ollama LLM using WebClient, the modern replacement for RestTemplate.


⚙️ What You’ll Build

We’ll create a Spring Boot app with:

  • A REST endpoint /api/ask
  • A POST request that sends a prompt to Ollama (running locally)
  • WebClient-based, non-blocking reactive communication
  • JSON response from the AI model

🧠 Why Use WebClient Instead of RestTemplate?

Many older tutorials use RestTemplate, but since Spring 5 it has been in maintenance mode: it still works, but new features go into WebClient.
Here’s why WebClient is the smarter choice for LLM integrations:

| Feature | RestTemplate | WebClient |
| --- | --- | --- |
| Nature | Blocking | Non-blocking (reactive) |
| Performance | Thread per request | Reactive I/O, lightweight |
| LLM streaming | Hard to implement | Native Flux/Mono streaming |
| Modern support | Maintenance mode | Actively maintained |
| Reactive integration | ❌ | ✅ Perfect for LLMs |

Since LLM responses (like from Ollama or GPT) can take several seconds, you want a non-blocking client that doesn’t freeze your app — and that’s exactly what WebClient provides.
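
To see what non-blocking means in practice, here is a tiny standalone sketch (assuming only spring-boot-starter-webflux on the classpath; the model name and prompt are placeholders). The call returns a Mono immediately, and the main thread stays free while Ollama is still generating:

import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;
import java.util.Map;

public class NonBlockingDemo {
    public static void main(String[] args) throws InterruptedException {
        WebClient client = WebClient.create("http://localhost:11434/api");

        // Building the call returns a Mono right away; nothing is sent until we subscribe,
        // and even then the request runs asynchronously on Reactor Netty threads.
        Mono<String> reply = client.post()
                .uri("/generate")
                .bodyValue(Map.of("model", "llama3", "prompt", "Say hi", "stream", false))
                .retrieve()
                .bodyToMono(String.class);

        reply.subscribe(raw -> System.out.println("Got answer: " + raw));
        System.out.println("Request sent, main thread is free to do other work...");

        Thread.sleep(30_000); // keep the demo JVM alive long enough for the reply
    }
}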


🛠️ Step 1: Prerequisites

Before we dive in, ensure you have:

  • ✅ Ollama installed locally (download from https://ollama.com)
  • ✅ A model pulled (example: Llama 3)
  • ✅ Java 17+
  • ✅ Spring Boot 3.3+

Pull the model and start the Ollama server:

ollama pull llama3
ollama serve
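
Before moving on, you can sanity-check that Ollama's local API is reachable (it listens on port 11434 by default):

curl http://localhost:11434/api/tags

This should return a small JSON document listing the models you have pulled.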

📦 Step 2: Create a New Spring Boot Project

If you’re using Spring Initializr:
👉 https://start.spring.io

Select:

  • Dependencies: Spring WebFlux, Lombok
  • Group: com.example
  • Artifact: ollama-integration

Click Generate, unzip, and open in IntelliJ or VS Code.
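
If you prefer the terminal, the same project can be generated from the Initializr API; the parameters below simply mirror the choices above (adjust them to taste):

curl https://start.spring.io/starter.zip \
     -d type=maven-project \
     -d dependencies=webflux,lombok \
     -d groupId=com.example \
     -d artifactId=ollama-integration \
     -d javaVersion=17 \
     -o ollama-integration.zip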


🧩 Step 3: Add Dependencies

In pom.xml:

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>

🧱 Step 4: Create Model Classes

AskRequest.java

package com.example.ollama.model;

import lombok.Data;

@Data
public class AskRequest {
    private String prompt;
}

AskResponse.java

package com.example.ollama.model;

import lombok.AllArgsConstructor;
import lombok.Data;

@Data
@AllArgsConstructor
public class AskResponse {
    private String response;
}
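
If you would rather skip Lombok, plain Java records do the same job here (a sketch; note that the accessor then becomes request.prompt() instead of request.getPrompt() in the service layer):

// AskRequest.java
package com.example.ollama.model;

public record AskRequest(String prompt) {}

// AskResponse.java
package com.example.ollama.model;

public record AskResponse(String response) {}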

🔧 Step 5: Configure WebClient Bean

WebClientConfig.java

package com.example.ollama.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class WebClientConfig {

    @Bean
    public WebClient webClient(WebClient.Builder builder) {
        return builder
                .baseUrl("http://localhost:11434/api")
                .build();
    }
}
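
Local models can be slow on the first request (the model has to load into memory), so it is often worth setting connect and response timeouts on the underlying Reactor Netty client as well. A sketch of the same bean with explicit timeouts (the values are arbitrary; tune them for your hardware):

package com.example.ollama.config;

import io.netty.channel.ChannelOption;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;

import java.time.Duration;

@Configuration
public class WebClientConfig {

    @Bean
    public WebClient webClient(WebClient.Builder builder) {
        // 5 s to establish the TCP connection, 60 s to wait for the full response.
        HttpClient httpClient = HttpClient.create()
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5_000)
                .responseTimeout(Duration.ofSeconds(60));

        return builder
                .baseUrl("http://localhost:11434/api")
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .build();
    }
}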

🧠 Step 6: Service Layer to Connect with Ollama

OllamaService.java

package com.example.ollama.service;

import com.example.ollama.model.AskRequest;
import com.example.ollama.model.AskResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

import java.time.Duration;
import java.util.Map;

@Service
@RequiredArgsConstructor
public class OllamaService {

    private final WebClient webClient;

    public Mono<AskResponse> askModel(AskRequest request) {
        Map<String, Object> body = Map.of(
                "model", "llama3",
                "prompt", request.getPrompt(),
                "stream", false
        );

        return webClient.post()
                .uri("/generate")
                .bodyValue(body)
                .retrieve()
                .bodyToMono(Map.class)
                .timeout(Duration.ofSeconds(30))
                .retryWhen(Retry.fixedDelay(2, Duration.ofSeconds(3)))
                .map(resp -> new AskResponse((String) resp.get("response")))
                .onErrorResume(e -> Mono.just(new AskResponse("Error: " + e.getMessage())));
    }
}

What’s happening here:

  • Sends JSON request to http://localhost:11434/api/generate
  • Non-blocking async call
  • Handles timeouts and retries gracefully
  • Maps Ollama’s response into a custom AskResponse class
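
For reference, the raw JSON Ollama returns for a non-streaming /api/generate call looks roughly like this (fields abridged); the service above keeps only the response field:

{
  "model": "llama3",
  "created_at": "2024-05-01T12:00:00Z",
  "response": "Polymorphism in Java allows objects to take multiple forms...",
  "done": true,
  "total_duration": 4200000000
}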

🧭 Step 7: Controller Layer

OllamaController.java

package com.example.ollama.controller;

import com.example.ollama.model.AskRequest;
import com.example.ollama.model.AskResponse;
import com.example.ollama.service.OllamaService;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Mono;

@RestController
@RequestMapping("/api/ask")
@RequiredArgsConstructor
public class OllamaController {

    private final OllamaService ollamaService;

    @PostMapping
    public Mono<AskResponse> ask(@RequestBody AskRequest request) {
        return ollamaService.askModel(request);
    }
}
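
If you want to verify the endpoint without a running Ollama instance, a WebFlux slice test with a mocked service is enough. A minimal sketch (assuming spring-boot-starter-test is added to the pom as a test dependency):

package com.example.ollama.controller;

import com.example.ollama.model.AskRequest;
import com.example.ollama.model.AskResponse;
import com.example.ollama.service.OllamaService;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.reactive.WebFluxTest;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.http.MediaType;
import org.springframework.test.web.reactive.server.WebTestClient;
import reactor.core.publisher.Mono;

import java.util.Map;

import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.when;

@WebFluxTest(OllamaController.class)
class OllamaControllerTest {

    @Autowired
    private WebTestClient webTestClient;

    @MockBean
    private OllamaService ollamaService;

    @Test
    void askReturnsModelAnswer() {
        // Stub the service so the test never touches a real Ollama instance.
        when(ollamaService.askModel(any(AskRequest.class)))
                .thenReturn(Mono.just(new AskResponse("stubbed answer")));

        webTestClient.post().uri("/api/ask")
                .contentType(MediaType.APPLICATION_JSON)
                .bodyValue(Map.of("prompt", "Explain polymorphism in Java"))
                .exchange()
                .expectStatus().isOk()
                .expectBody()
                .jsonPath("$.response").isEqualTo("stubbed answer");
    }
}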

⚡ Step 8: Run and Test

Start your app:

mvn spring-boot:run

Make sure Ollama is running:

ollama serve

Then open another terminal and run:

curl -X POST http://localhost:8080/api/ask \
     -H "Content-Type: application/json" \
     -d '{"prompt":"Explain polymorphism in Java"}'

✅ You’ll get a response like:

{
  "response": "Polymorphism in Java allows objects to take multiple forms..."
}

🧠 Step 9 (Optional): Handling Streaming Responses

If you want real-time token streaming like ChatGPT, set "stream": true in the request body and read the response as a Flux:

webClient.post()
    .uri("/generate")
    .bodyValue(body)
    .retrieve()
    .bodyToFlux(String.class)
    .subscribe(System.out::print);

This allows you to stream tokens as Ollama generates them — perfect for chat UIs.
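
Here is a slightly fuller sketch of how that could look end to end: a hypothetical streamAnswer method in OllamaService plus a Server-Sent Events endpoint in the controller. Each NDJSON line Ollama emits is parsed with Jackson and only the response token is forwarded:

// OllamaService.java – additional streaming method (a sketch).
// Extra imports needed: reactor.core.publisher.Flux, org.springframework.http.MediaType,
// com.fasterxml.jackson.databind.ObjectMapper, com.fasterxml.jackson.core.JsonProcessingException

private final ObjectMapper mapper = new ObjectMapper();

public Flux<String> streamAnswer(AskRequest request) {
    Map<String, Object> body = Map.of(
            "model", "llama3",
            "prompt", request.getPrompt(),
            "stream", true
    );

    return webClient.post()
            .uri("/generate")
            .bodyValue(body)
            .retrieve()
            .bodyToFlux(String.class)          // one NDJSON line per emitted element
            .map(line -> {
                try {
                    // each line looks like {"response":"token","done":false,...}
                    return mapper.readTree(line).path("response").asText();
                } catch (JsonProcessingException e) {
                    return "";                 // skip lines we cannot parse
                }
            });
}

// OllamaController.java – expose the stream as Server-Sent Events at POST /api/ask/stream.
@PostMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestBody AskRequest request) {
    return ollamaService.streamAnswer(request);
}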


🧾 Final Project Structure

src/main/java/com/example/ollama/
 ├─ config/
 │   └─ WebClientConfig.java
 ├─ controller/
 │   └─ OllamaController.java
 ├─ model/
 │   ├─ AskRequest.java
 │   └─ AskResponse.java
 ├─ service/
 │   └─ OllamaService.java
 └─ OllamaIntegrationApplication.java

💡 Real-World Use Cases

You can extend this project to:

  • 🧑‍💻 Build a developer assistant that explains code.
  • 🧾 Create a document summarizer for PDFs.
  • 💬 Power a customer support chatbot — 100% offline.
  • 🧠 Integrate AI-based reasoning into enterprise APIs.

🏁 Conclusion

With Spring Boot’s WebClient and Ollama, you’ve just built a local AI microservice that’s:

  • ✅ Fast and reactive
  • ✅ Secure and private
  • ✅ Zero-cost (no API keys or tokens!)

This modern approach keeps your data inside your system while leveraging the power of local LLMs like Llama 3, Mistral, or Phi-3.

💬 “If Spring Boot is the heart of modern backend development, LLMs are the new brain.”

Now you have both — running right on your laptop.
