Sujan Lamichhane

Building a Local-First AI Assistant with Spring Boot 4 and Spring AI 2.0

Your AI. Your Data. Your Machine.

For the last few years, AI development has been dominated by Python.

When developers talk about AI frameworks, the conversation usually revolves around LangChain, LlamaIndex, AutoGPT, CrewAI, and other Python-first ecosystems.

As a Java developer, I kept asking myself:

Where is the equivalent ecosystem for Java?

The answer is that it already exists.

With Spring AI, Spring Boot 4, WebFlux, PostgreSQL, and Ollama, it is now possible to build serious AI applications entirely in Java.

That realization led me to build Jarvis AI Platform.

GitHub Repository:

https://github.com/sujankim/jarvis-ai-platform

The Problem With Most AI Assistants

Most AI assistants follow the same architecture:

Your Message
      ↓
 Cloud Service
      ↓
  AI Model
      ↓
  Response

Your conversations travel through someone else's infrastructure.

You depend on their uptime.

You depend on their pricing.

You depend on their privacy policies.

If the service changes tomorrow, you're affected immediately.

That model works for many people.

But I wanted something different.


A Local-First Alternative

Jarvis follows a completely different approach:

Your Message
      ↓
 Your Machine
      ↓
    Ollama
      ↓
  AI Model
      ↓
  Response

Everything stays on your computer.

As long as Ollama serves the request, no data leaves your machine.

No monthly subscription.

No external dependency for core functionality.

That's why the project's philosophy is simple:

Your AI. Your Data. Your Machine.


What Is Jarvis AI Platform?

Jarvis is not just a chatbot.

It is a modular AI orchestration platform designed around the Java ecosystem.

At a high level, the architecture looks like this:

Spring Shell CLI / REST API
              │
      Spring Boot 4
              │
      AI Orchestration
              │
    +---------+---------+
    │                   │
OllamaProvider   GeminiProvider
 (Primary)        (Fallback)
    │
 PostgreSQL
(Sessions & Messages)

The goal is to make AI providers interchangeable while keeping the application architecture clean and maintainable.

Current features in v0.1.0 include:

  • Interactive AI chat with token streaming
  • JWT authentication
  • Argon2id password hashing
  • Session persistence
  • PostgreSQL storage
  • Ollama local AI support
  • Gemini fallback support
  • Provider abstraction layer
  • Working memory system
  • Swagger/OpenAPI integration
  • Health monitoring and diagnostics

Tech Stack

Layer              Technology
Language           Java 21
Framework          Spring Boot 4.0.6
AI                 Spring AI 2.0
Web                Spring WebFlux
Security           Spring Security 7
Authentication     JWT
Password Hashing   Argon2id
Database           PostgreSQL 16
Database Access    R2DBC
Migrations         Flyway
CLI                Spring Shell 4
Local AI           Ollama
Cloud AI           Gemini
Mapping            MapStruct 1.6

Why I Chose Java Instead of Python

One question I hear often is:

"Why didn't you build this in Python?"

The short answer:

Because I enjoy building systems in Java.

The longer answer is that Java provides several advantages for long-term AI applications:

  • Strong type safety
  • Excellent tooling
  • Mature ecosystem
  • Production-ready frameworks
  • Reactive programming support
  • Enterprise-grade security

Spring AI is making AI development feel like a natural extension of the Spring ecosystem.

Instead of learning an entirely new stack, Java developers can use tools they already know.

That was one of the biggest motivations behind Jarvis.


Architecture Deep Dive

The most interesting part of Jarvis isn't the CLI.

It isn't PostgreSQL.

It isn't even the AI model.

The most important design decision was the architecture that sits between users and AI providers.

The goal from day one was simple:

Never lock Jarvis to a single AI provider.

That requirement shaped the entire system.


1. Provider Abstraction Layer

Every AI provider in Jarvis implements the same interface.

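import org.springframework.ai.chat.prompt.Prompt;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public interface AiProvider {

    // Streams the response piece by piece as the model generates it.
    Flux<String> streamChat(Prompt prompt);

    // Emits true when the provider can currently accept requests.
    Mono<Boolean> isAvailable();

    // Name of the provider.
    String getName();

    // Name of the model the provider is configured to use.
    String getModelName();
}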

Both OllamaProvider and GeminiProvider implement this contract.

That means the rest of the application never needs to know which provider is currently being used.

The provider router handles that responsibility.

return ollamaProvider.isAvailable()
    .flatMap(ollamaUp -> {

        if (ollamaUp) {
            return Mono.just((AiProvider) ollamaProvider);
        }

        return geminiProvider.isAvailable()
            .flatMap(geminiUp -> {

                if (geminiUp) {
                    return Mono.just((AiProvider) geminiProvider);
                }

                return Mono.error(
                    new RuntimeException(
                        "No provider available"));
            });
    });

This creates a provider-agnostic architecture.

If Ollama is running, Jarvis uses Ollama.

If Ollama becomes unavailable, Jarvis automatically falls back to Gemini.

Users don't need to change anything.

The architecture stays the same.

Adding a new provider becomes straightforward:

public class ClaudeProvider
        implements AiProvider {
    // implement streamChat, isAvailable, getName, and getModelName here
}

Implement the interface.

Register the provider.

Done.

No orchestrator changes.

No controller changes.

No CLI changes.
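
The router shown earlier names Ollama and Gemini directly. One way to keep routing untouched when a provider is added is to walk an ordered list instead. Here is a minimal sketch of that idea, not the code from the repository. It uses a blocking stand-in for AiProvider so it runs on the plain JDK, and every name in it (SimpleProvider, select, provider, selectedName) is hypothetical.

```java
import java.util.List;
import java.util.Optional;

final class ProviderRouterSketch {

    /** Simplified, blocking stand-in for the article's reactive AiProvider. */
    interface SimpleProvider {
        String getName();
        boolean isAvailable();
    }

    /** Returns the first available provider in priority order, if any. */
    static Optional<SimpleProvider> select(List<SimpleProvider> providersInPriorityOrder) {
        // Streams are lazy: providers after the first available one are never probed.
        return providersInPriorityOrder.stream()
                .filter(SimpleProvider::isAvailable)
                .findFirst();
    }

    /** Demo helper: a provider whose availability is fixed. */
    static SimpleProvider provider(String name, boolean available) {
        return new SimpleProvider() {
            @Override public String getName() { return name; }
            @Override public boolean isAvailable() { return available; }
        };
    }

    /** Demo helper: name of the chosen provider, or "none". */
    static String selectedName(List<SimpleProvider> providers) {
        return select(providers).map(SimpleProvider::getName).orElse("none");
    }
}
```

With this shape, the priority order lives in the list, so a new provider only changes how the list is built.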


2. Reactive Streaming

One feature I absolutely wanted was real-time token streaming.

I didn't want users waiting ten seconds for an entire response.

I wanted responses to appear immediately.

That requirement pushed the project toward a fully reactive architecture.

The flow looks like this:

Ollama
   ↓
Spring AI
   ↓
Flux<String>
   ↓
AiOrchestrator
   ↓
SSE Endpoint
   ↓
CLI Client
   ↓
Terminal Output

Each token moves through the pipeline independently.

The user starts seeing output almost immediately.

The controller endpoint looks like this:

@PostMapping(
    value = "/stream",
    produces = MediaType.TEXT_EVENT_STREAM_VALUE
)
public Flux<ServerSentEvent<String>> stream(
        @Valid @RequestBody ChatRequest request) {

    return orchestrator.chat(...)
            .map(token ->
                    ServerSentEvent
                            .<String>builder()
                            .event("token")
                            .data(token)
                            .build());
}

The result feels significantly faster than waiting for a complete response.

Even when generation takes several seconds, users immediately know something is happening.

That small improvement dramatically improves user experience.
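
To make the client side of this pipeline concrete, here is a minimal sketch of how a CLI could pull "token" events out of an SSE stream. It is not the Jarvis client. The class and method names are hypothetical, and it only handles the event and data fields, which is all the endpoint above sends.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class SseTokenReaderSketch {

    /** Collects the data payload of every "token" event found in an SSE stream. */
    static List<String> readTokens(BufferedReader reader) throws IOException {
        List<String> tokens = new ArrayList<>();
        String eventName = "message";            // default event type in SSE
        StringBuilder data = new StringBuilder();
        boolean hasData = false;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {                // a blank line dispatches the event
                if (hasData && eventName.equals("token")) {
                    tokens.add(data.toString());
                }
                eventName = "message";
                data.setLength(0);
                hasData = false;
            } else if (line.startsWith("event:")) {
                eventName = stripOneLeadingSpace(line.substring(6));
            } else if (line.startsWith("data:")) {
                if (hasData) {
                    data.append('\n');           // several data lines join with newlines
                }
                data.append(stripOneLeadingSpace(line.substring(5)));
                hasData = true;
            }
            // Comment lines and other fields are ignored in this sketch.
        }
        return tokens;
    }

    /** SSE treats one space after the colon as an optional separator. */
    private static String stripOneLeadingSpace(String value) {
        return value.startsWith(" ") ? value.substring(1) : value;
    }

    /** Demo helper: parse a stream held in memory. */
    static List<String> readTokens(String stream) {
        try {
            return readTokens(new BufferedReader(new StringReader(stream)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A real client would print each token as soon as its event is dispatched instead of collecting them into a list.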


3. The Whitespace Bug

One of the strangest bugs I encountered involved spaces.

Responses looked like this:

Hellohowareyoutoday?

Instead of:

Hello how are you today?

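The cause turned out to be Server-Sent Events.

The SSE format treats one space after the data: field name as an optional separator, so the parser strips it.

A token that begins with a space, such as " how", therefore arrived as "how".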

The fix was surprisingly simple.

Instead of sending raw text, I wrapped every token in JSON.

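private String jsonToken(String token) {

    // Escape the backslash first so the escapes added below are not escaped again.
    // Other control characters would still need \ u escapes or a JSON library.
    return "{\"t\":\""
            + token
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t")
            + "\"}";
}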

The client then extracts the value from the JSON payload.

Problem solved.
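
Here is a small sketch of why the JSON wrapper protects the space. It is an illustration under assumptions, not the Jarvis code. The names are hypothetical, and decode only understands the fixed {"t":"..."} shape. A real client would more likely use a JSON library such as Jackson.

```java
final class TokenEnvelopeSketch {

    /** Wraps a token as {"t":"..."} and escapes what JSON strings require. */
    static String encode(String token) {
        StringBuilder out = new StringBuilder("{\"t\":\"");
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            switch (c) {
                case '\\' -> out.append("\\\\");
                case '"' -> out.append("\\\"");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                default -> {
                    if (c < 0x20) {
                        // Remaining control characters become 4-digit hex escapes.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
                }
            }
        }
        return out.append("\"}").toString();
    }

    /** Reverses encode(); rejects anything that is not the fixed envelope. */
    static String decode(String json) {
        String prefix = "{\"t\":\"";
        String suffix = "\"}";
        if (!json.startsWith(prefix) || !json.endsWith(suffix)) {
            throw new IllegalArgumentException("unexpected envelope: " + json);
        }
        String body = json.substring(prefix.length(), json.length() - suffix.length());
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char next = body.charAt(++i);
            switch (next) {
                case 'n' -> out.append('\n');
                case 'r' -> out.append('\r');
                case 't' -> out.append('\t');
                case 'u' -> {
                    out.append((char) Integer.parseInt(body.substring(i + 1, i + 5), 16));
                    i += 4;
                }
                default -> out.append(next); // escaped backslash or quote
            }
        }
        return out.toString();
    }

    /** What an SSE parser does to a data line: drop "data:" and one optional space. */
    static String sseDataValue(String line) {
        String value = line.substring("data:".length());
        return value.startsWith(" ") ? value.substring(1) : value;
    }
}
```

Sent raw, the line "data:" + " how" parses to "how". Sent wrapped, the payload starts with a brace, so the parser has no leading space to remove and decode returns " how" intact.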

Sometimes the hardest bugs are not AI-related at all.

They're just spaces.


4. Working Memory

One of the most common questions I receive is:

How does Jarvis know today's date?

The answer is simple.

We provide that information.

Before every request, Jarvis generates a small working-memory block.

@Component
public class WorkingMemoryBuilder {

    public String build(
            String username,
            String role,
            String sessionId,
            String modelName) {

        String currentTime =
                ZonedDateTime.now()
                        .format(...);

        return """
                Date: %s
                User: %s
                Role: %s
                Session: %s
                Model: %s
                """
                .formatted(
                        currentTime,
                        username,
                        role,
                        sessionId,
                        modelName);
    }
}

This memory is injected into every prompt.

The AI isn't magically aware of the current date.

The application simply tells it.

Understanding that distinction helped me better understand how modern LLM applications actually work.

Much of what appears intelligent is often carefully engineered context.
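
The builder above elides its date format. Here is a runnable sketch with one filled in. The pattern is my assumption, not the one Jarvis uses, and the injected Clock is an addition that makes the output reproducible in tests. The "USER" role and "session-1" id are placeholder values.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class WorkingMemorySketch {

    // Assumed format; the article does not show the real one.
    private static final DateTimeFormatter DATE_FORMAT =
            DateTimeFormatter.ofPattern("EEEE, MMMM d, yyyy HH:mm", Locale.ENGLISH);

    private final Clock clock;

    WorkingMemorySketch(Clock clock) {
        this.clock = clock;
    }

    /** Builds the context block that is placed in front of the conversation. */
    String build(String username, String role, String sessionId, String modelName) {
        String currentTime = ZonedDateTime.now(clock).format(DATE_FORMAT);
        return """
                Date: %s
                User: %s
                Role: %s
                Session: %s
                Model: %s
                """.formatted(currentTime, username, role, sessionId, modelName);
    }

    /** Demo helper: a fixed clock makes the "current" date deterministic. */
    static String demo() {
        Clock fixed = Clock.fixed(Instant.parse("2026-06-03T09:30:00Z"), ZoneOffset.UTC);
        return new WorkingMemorySketch(fixed)
                .build("dravin", "USER", "session-1", "llama3.1:8b");
    }
}
```

In production the clock would be Clock.systemDefaultZone(). In a test, a fixed clock lets you assert on the exact text the model receives.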


5. Prompt Assembly

Every user request passes through a component called PromptAssembler.

Its job is to construct the final prompt.

The assembled prompt contains four pieces:

  1. System instructions
  2. Working memory
  3. Session history
  4. Current user message

Simplified version:

messages.add(systemPrompt);

messages.add(workingMemory);

messages.addAll(history);

messages.add(
    new UserMessage(userMessage));

return new Prompt(messages);

This process gives the AI everything it needs to generate contextual responses.

Without prompt assembly, the AI would only see the current message.

With prompt assembly, it understands:

  • who the user is
  • previous conversation history
  • current date and time
  • session context
  • assistant instructions

This is where much of the "assistant" behavior actually comes from.
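
The ordering described above can be sketched without any framework types. Message and Role here are hypothetical stand-ins for the Spring AI message classes, and sending working memory as a system message is my assumption rather than something the article states.

```java
import java.util.ArrayList;
import java.util.List;

final class PromptAssemblySketch {

    enum Role { SYSTEM, USER, ASSISTANT }

    /** Hypothetical message type standing in for the framework's message classes. */
    record Message(Role role, String text) {}

    /** Assembles the four pieces in the order the model should read them. */
    static List<Message> assemble(String systemPrompt, String workingMemory,
                                  List<Message> history, String userMessage) {
        List<Message> messages = new ArrayList<>();
        messages.add(new Message(Role.SYSTEM, systemPrompt));   // 1. system instructions
        messages.add(new Message(Role.SYSTEM, workingMemory));  // 2. working memory
        messages.addAll(history);                               // 3. session history, oldest first
        messages.add(new Message(Role.USER, userMessage));      // 4. current user message
        return List.copyOf(messages);
    }

    /** Demo helper: one earlier exchange plus a new question. */
    static List<Message> demo() {
        List<Message> history = List.of(
                new Message(Role.USER, "Hello Jarvis!"),
                new Message(Role.ASSISTANT, "Hello! How can I help?"));
        return assemble("You are Jarvis.", "Date: ...", history, "What day is it today?");
    }
}
```

The current message goes last so the model reads it with all of the earlier context already in view.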


6. Spring Shell 4.0

Jarvis uses Spring Shell as its primary interface.

One challenge was adapting to the changes introduced in Spring Shell 4.

Previous versions used annotations such as:

@ShellComponent
@ShellMethod

Those annotations were removed.

The new approach uses:

@Component
public class AuthCommands {

    @Command(
        name = "login",
        description = "Login to Jarvis")
    public String login() {
        return "OK";
    }
}

The migration wasn't difficult.

The real challenge came from JLine integration.

I encountered a circular dependency involving LineReader.

The solution was lazy injection.

public AuthCommands(
        CliStateManager state,
        CliHttpClient http,
        @Lazy LineReader lineReader) {

    this.state = state;
    this.http = http;
    this.lineReader = lineReader;
}

That single annotation solved hours of debugging.
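
With @Lazy on an injection point, Spring injects a proxy and resolves the real bean on first use, not at construction time. Here is a plain-JDK sketch of that idea, with a Supplier playing the role of the proxy. The cycle shown is hypothetical, since I don't know exactly which beans formed the one in Jarvis, and LineSource is only a stand-in for LineReader.

```java
import java.util.function.Supplier;

final class LazyInjectionSketch {

    /** Stand-in for JLine's LineReader; here it needs the commands at construction. */
    static final class LineSource {
        final Commands commands;

        LineSource(Commands commands) {
            this.commands = commands;
        }

        String readLine() {
            return "input";
        }
    }

    /** Depends on the line source, but only looks it up when a command runs. */
    static final class Commands {
        private final Supplier<LineSource> lineSource;

        Commands(Supplier<LineSource> lineSource) {
            this.lineSource = lineSource;
        }

        String login() {
            return "read: " + lineSource.get().readLine();
        }
    }

    /** Wires both objects even though each one refers to the other. */
    static String wire() {
        LineSource[] holder = new LineSource[1];
        Commands commands = new Commands(() -> holder[0]); // nothing is resolved yet
        holder[0] = new LineSource(commands);              // the cycle closes here
        return commands.login();                           // first real use of the line source
    }
}
```

Neither constructor needs a fully built instance of the other, which is what breaks the cycle.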


7. Reactive Security

Spring Security behaves differently in reactive applications.

Traditional applications rely heavily on ThreadLocal.

Reactive applications cannot.

Requests may move across multiple threads.

Instead, WebFlux uses Reactor Context.

return chain.filter(exchange)
    .contextWrite(
        ReactiveSecurityContextHolder
            .withAuthentication(auth));

Authentication information travels with the reactive stream itself.

Once I understood that concept, many WebFlux security patterns suddenly made much more sense.
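
The ThreadLocal limitation is easy to reproduce with plain JDK threads. This sketch is not Spring Security code, and the names are hypothetical. It stores a value on one thread and then reads it from another, which is the situation a reactive request can end up in.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalHandoffSketch {

    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /** Sets and reads the value on the same thread: the value is visible. */
    static String readOnSameThread(String user) {
        CURRENT_USER.set(user);
        try {
            return CURRENT_USER.get();
        } finally {
            CURRENT_USER.remove();
        }
    }

    /** Sets the value on the caller's thread, then reads it on a worker thread. */
    static String readOnOtherThread(String user) {
        CURRENT_USER.set(user);
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<String> read = CURRENT_USER::get;
            return worker.submit(read).get(); // the worker has its own, empty slot
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
            CURRENT_USER.remove();
        }
    }
}
```

The second method returns null because the worker thread never saw the set call. Reactor Context avoids this by attaching the data to the subscription and not to a thread.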


Quick Start

Getting Jarvis running locally takes only a few minutes.

Prerequisites

  • Java 21+
  • Docker
  • Ollama

1. Clone the Repository

git clone https://github.com/sujankim/jarvis-ai-platform.git

cd jarvis-ai-platform

2. Download a Local Model

ollama pull llama3.1:8b

This is a one-time download of approximately 5 GB.

3. Configure Environment Variables

cp .env.example .env

Update the .env file and set a secure JWT secret.

JARVIS_JWT_SECRET=your-secret-key
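
If you need a value for that variable, here is one way to generate a random secret with the JDK. Treat it as a sketch: I am assuming Jarvis accepts an arbitrary string here, so check .env.example for any format or length requirement. The class name is hypothetical.

```java
import java.security.SecureRandom;
import java.util.Base64;

final class JwtSecretGenerator {

    /** Returns numBytes of cryptographically strong randomness, Base64-encoded. */
    static String generate(int numBytes) {
        byte[] secret = new byte[numBytes];
        new SecureRandom().nextBytes(secret);
        return Base64.getEncoder().encodeToString(secret);
    }

    public static void main(String[] args) {
        // 32 bytes = 256 bits, which matches the output size of HMAC-SHA256.
        System.out.println("JARVIS_JWT_SECRET=" + generate(32));
    }
}
```

Save it as JwtSecretGenerator.java, run it with java JwtSecretGenerator.java, and paste the printed line into .env.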

4. Start PostgreSQL

docker-compose up -d

5. Run Jarvis

cd server

./mvnw spring-boot:run

Example Session

jarvis:> login

Username: dravin
Password: ******

Welcome back, Dravin!

jarvis:> chat

You: Hello Jarvis! What day is it today?

Jarvis: Today is Wednesday, June 3, 2026.

You: exit

At this point, everything is running locally on your machine.

No cloud dependency is required.


What I Learned

Building Jarvis taught me far more than I expected.

Some lessons came from AI.

Most came from software engineering.


Reactive Programming Is Harder Than Traditional MVC

There is no point pretending otherwise.

A traditional Spring MVC application is easier to build.

A traditional JPA repository is easier to understand.

A blocking HTTP client is easier to debug.

But AI applications are fundamentally streaming applications.

Responses often take several seconds to generate.

Blocking threads while waiting for tokens simply doesn't make sense.

The reactive stack allowed me to:

  • Stream responses in real time
  • Handle multiple conversations efficiently
  • Avoid thread starvation
  • Build a true end-to-end streaming pipeline

The learning curve was steep.

However, AI workloads differ from typical CRUD applications in one important way: requests are long-lived.

When a language model spends 10–30 seconds generating a response, a blocking design ties up one thread per in-flight request for that entire time.

Reactive streaming avoids that cost.

Instead of waiting for the entire response to finish, tokens flow through the system as they are generated.

Ollama
   ↓
Spring AI
   ↓
Flux<String>
   ↓
Server-Sent Events
   ↓
CLI Client
   ↓
Terminal Output

The result is a much more responsive experience.

Users begin receiving output immediately instead of waiting for a complete response.

For AI applications, that difference feels enormous.

The payoff was worth it.


Spring AI Feels Like Spring

One thing I appreciate about Spring AI is that it doesn't feel like a separate ecosystem.

It feels like Spring.

Builders.

Dependency injection.

Configuration properties.

Auto-configuration.

The same conventions Java developers already know.

Creating an Ollama client feels familiar.

Creating a Gemini client feels familiar.

Switching between providers feels familiar.

That consistency significantly reduces friction.


Local AI Is Better Than Most People Think

Before building Jarvis, I assumed local models would be too slow or too limited.

I was wrong.

Running llama3.1:8b locally produces surprisingly useful results.

For:

  • General questions
  • Brainstorming
  • Coding assistance
  • Documentation help
  • Learning

it performs remarkably well.

Is it as capable as the largest cloud models?

No.

Does it need to be?

Also no.

For many personal workflows, local models are already good enough.

And the privacy benefits are enormous.


Architecture Matters More Than Models

This was probably the biggest lesson.

People often focus entirely on the model.

GPT.

Claude.

Gemini.

Llama.

Mistral.

But real AI applications are mostly architecture.

Prompt management.

Memory.

Security.

Persistence.

Streaming.

Observability.

Provider routing.

Error handling.

The model is only one piece of the system.

Building Jarvis reinforced that idea repeatedly.


What's Next?

Jarvis is still early.

Version 0.1.0 focuses on the foundation.

Future releases will add significantly more capabilities.


Phase 2 — Memory System

Current conversations are session-based.

Future versions will introduce persistent memory.

Planned features include:

  • Long-term memory
  • User preferences
  • Redis caching
  • Semantic retrieval
  • pgvector integration

The goal is simple:

Jarvis should remember useful information across sessions.


Phase 3 — RAG Engine

Retrieval-Augmented Generation is one of the most requested features.

Planned capabilities:

  • PDF ingestion
  • Knowledge bases
  • Semantic search
  • Document chat
  • Context-aware answers

Instead of asking only the model, users will be able to ask their own documents.


Phase 4 — Tool Engine

The next major step is action-taking.

Examples:

  • Weather tools
  • Search tools
  • Calculators
  • External integrations
  • MCP support

At that point Jarvis becomes more than a conversational assistant.

It becomes an assistant that can actually do things.


Phase 5 — Voice

Eventually Jarvis will gain voice capabilities.

The long-term vision is a genuinely useful local AI assistant that remains private and self-hosted.



Phase 6 — Agent System

Longer-term plans include:

  • Agent planning
  • Multi-step execution
  • Workflow automation
  • Tool orchestration

The ultimate goal is to move beyond chat and build a true personal AI assistant.


Phase 7 - Web UI

The final planned phase is a web interface powered by the same backend.

Planned features:

  • Real-time streaming chat
  • Session sidebar
  • Document upload UI
  • Memory management
  • Settings panel
  • Agent dashboard
  • Voice interface

Contributing

Jarvis is open source and actively looking for contributors.

Whether you're experienced with Java or just learning Spring Boot, contributions are welcome.

Some beginner-friendly areas include:

  • Documentation improvements
  • Unit tests
  • CLI enhancements
  • New provider integrations
  • Bug fixes
  • Architecture diagrams

Repository:

https://github.com/sujankim/jarvis-ai-platform

If you'd like to contribute, start with:

CONTRIBUTING.md

and look for issues labeled:

good first issue

Conclusion

When I started this project, I wasn't trying to build the next ChatGPT.

I was trying to answer a simple question:

Can modern AI applications be built effectively in Java?

After building Jarvis, my answer is absolutely yes.

The Java ecosystem has matured rapidly.

Spring Boot 4 provides an excellent foundation.

Spring AI removes much of the complexity involved in provider integrations.

WebFlux enables real-time streaming.

Ollama makes local AI practical.

Most importantly, the ecosystem finally feels ready.

If you're a Java developer who has been watching the AI space from the sidelines, there has never been a better time to start building.

The tools exist.

The frameworks exist.

The community is growing.

Now it's time to build.

If you found this article useful, I'd love to hear your thoughts.

Questions, suggestions, architecture feedback, and contributions are always welcome.

⭐ If you'd like to support the project, consider starring the repository:

https://github.com/sujankim/jarvis-ai-platform

Your AI. Your Data. Your Machine.
