Sujan Lamichhane

Posted on Jun 3 • Edited on Jul 7

Building a Local-First AI Assistant with Spring Boot 4 and Spring AI 2.0

#ai #java #springboot #opensource

Your AI. Your Data. Your Machine.

For the last few years, AI development has been dominated by Python.

When developers talk about AI frameworks, the conversation usually revolves around LangChain, LlamaIndex, AutoGPT, CrewAI, and other Python-first ecosystems.

As a Java developer, I kept asking myself:

Where is the equivalent ecosystem for Java?

The answer is that it already exists.

With Spring AI, Spring Boot 4, WebFlux, PostgreSQL, and Ollama, it is now possible to build serious AI applications entirely in Java.

That realization led me to build Jarvis AI Platform.

GitHub Repository:

https://github.com/sujankim/jarvis-ai-platform

The Problem With Most AI Assistants

Most AI assistants follow the same architecture:

Your Message
      ↓
 Cloud Service
      ↓
  AI Model
      ↓
  Response

Your conversations travel through someone else's infrastructure.

You depend on their uptime.

You depend on their pricing.

You depend on their privacy policies.

If the service changes tomorrow, you're affected immediately.

That model works for many people.

But I wanted something different.

A Local-First Alternative

Jarvis follows a completely different approach:

Your Message
      ↓
 Your Machine
      ↓
    Ollama
      ↓
  AI Model
      ↓
  Response

Everything stays on your computer.

No data leaves your machine.

No monthly subscription.

No external dependency for core functionality.

That's why the project's philosophy is simple:

Your AI. Your Data. Your Machine.

What Is Jarvis AI Platform?

Jarvis is not just a chatbot.

It is a modular AI orchestration platform designed around the Java ecosystem.

At a high level, the architecture looks like this:

Spring Shell CLI / REST API
              │
      Spring Boot 4
              │
      AI Orchestration
              │
    +---------+---------+
    │                   │
OllamaProvider   GeminiProvider
 (Primary)        (Fallback)
    │
 PostgreSQL
(Sessions & Messages)

The goal is to make AI providers interchangeable while keeping the application architecture clean and maintainable.

Current features in v0.1.0 include:

Interactive AI chat with token streaming
JWT authentication
Argon2id password hashing
Session persistence
PostgreSQL storage
Ollama local AI support
Gemini fallback support
Provider abstraction layer
Working memory system
Swagger/OpenAPI integration
Health monitoring and diagnostics

Tech Stack

Layer	Technology
Language	Java 21
Framework	Spring Boot 4.0.6
AI	Spring AI 2.0
Web	Spring WebFlux
Security	Spring Security 7
Authentication	JWT
Password Hashing	Argon2id
Database	PostgreSQL 16
Database Access	R2DBC
Migrations	Flyway
CLI	Spring Shell 4
Local AI	Ollama
Cloud AI	Gemini
Mapping	MapStruct 1.6

Why I Chose Java Instead of Python

One question I hear often is:

"Why didn't you build this in Python?"

The short answer:

Because I enjoy building systems in Java.

The longer answer is that Java provides several advantages for long-term AI applications:

Strong type safety
Excellent tooling
Mature ecosystem
Production-ready frameworks
Reactive programming support
Enterprise-grade security

Spring AI is making AI development feel like a natural extension of the Spring ecosystem.

Instead of learning an entirely new stack, Java developers can use tools they already know.

That was one of the biggest motivations behind Jarvis.

Architecture Deep Dive

The most interesting part of Jarvis isn't the CLI.

It isn't PostgreSQL.

It isn't even the AI model.

The most important design decision was the architecture that sits between users and AI providers.

The goal from day one was simple:

Never lock Jarvis to a single AI provider.

That requirement shaped the entire system.

1. Provider Abstraction Layer

Every AI provider in Jarvis implements the same interface.

public interface AiProvider {

    Flux<String> streamChat(Prompt prompt);

    Mono<Boolean> isAvailable();

    String getName();

    String getModelName();
}

Both OllamaProvider and GeminiProvider implement this contract.

That means the rest of the application never needs to know which provider is currently being used.

The provider router handles that responsibility.

return ollamaProvider.isAvailable()
    .flatMap(ollamaUp -> {

        if (ollamaUp) {
            return Mono.just((AiProvider) ollamaProvider);
        }

        return geminiProvider.isAvailable()
            .flatMap(geminiUp -> {

                if (geminiUp) {
                    return Mono.just((AiProvider) geminiProvider);
                }

                return Mono.error(
                    new RuntimeException(
                        "No provider available"));
            });
    });

This creates a provider-agnostic architecture.

If Ollama is running, Jarvis uses Ollama.

If Ollama becomes unavailable, Jarvis automatically falls back to Gemini.

Users don't need to change anything.

The architecture stays the same.

Adding a new provider becomes straightforward:

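public class ClaudeProvider
        implements AiProvider {
    // implement streamChat, isAvailable, getName, getModelName
}

Implement the interface and add the provider to the router's fallback chain.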

Done.

No orchestrator changes.

No controller changes.

No CLI changes.
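The two-provider chain shown earlier can be generalized so that the routing logic itself never changes when a provider is added. The following is a minimal sketch, not the project's code: `SimpleProvider` and `ProviderRouterSketch` are hypothetical names, and availability is modeled as a blocking boolean rather than the `Mono<Boolean>` that `AiProvider` returns. In a reactive version the same idea could probably be expressed with Reactor's `Flux.filterWhen(...)` followed by `next()`.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical, simplified stand-in for the article's AiProvider:
// availability is a blocking boolean here instead of Mono<Boolean>.
record SimpleProvider(String name, BooleanSupplier available) {}

class ProviderRouterSketch {

    // Returns the first provider in the list that reports itself available.
    // The list order is the fallback priority.
    static SimpleProvider select(List<SimpleProvider> providersInPriorityOrder) {
        for (SimpleProvider provider : providersInPriorityOrder) {
            if (provider.available().getAsBoolean()) {
                return provider;
            }
        }
        throw new IllegalStateException("No provider available");
    }

    public static void main(String[] args) {
        List<SimpleProvider> providers = List.of(
                new SimpleProvider("ollama", () -> false), // simulate Ollama being down
                new SimpleProvider("gemini", () -> true));
        System.out.println(select(providers).name()); // prints "gemini"
    }
}
```

With this shape, a new provider is one more entry in the list, and the priority order becomes data rather than nested callbacks.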

2. Reactive Streaming

One feature I absolutely wanted was real-time token streaming.

I didn't want users waiting ten seconds for an entire response.

I wanted responses to appear immediately.

That requirement pushed the project toward a fully reactive architecture.

The flow looks like this:

Ollama
   ↓
Spring AI
   ↓
Flux<String>
   ↓
AiOrchestrator
   ↓
SSE Endpoint
   ↓
CLI Client
   ↓
Terminal Output

Each token moves through the pipeline independently.

The user starts seeing output almost immediately.

The controller endpoint looks like this:

@PostMapping(
    value = "/stream",
    produces = MediaType.TEXT_EVENT_STREAM_VALUE
)
public Flux<ServerSentEvent<String>> stream(
        @Valid @RequestBody ChatRequest request) {

    return orchestrator.chat(...)
            .map(token ->
                    ServerSentEvent
                            .<String>builder()
                            .event("token")
                            .data(token)
                            .build());
}

The result feels significantly faster than waiting for a complete response.

Even when generation takes several seconds, users immediately know something is happening.

That small change dramatically improves the user experience.

3. The Whitespace Bug

One of the strangest bugs I encountered involved spaces.

Responses looked like this:

Hellohowareyoutoday?

Instead of:

Hello how are you today?

The cause turned out to be Server-Sent Events.

An SSE client strips one space after the colon of a data: line, so a token that began with a space could lose it during transmission.
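That behavior is easy to reproduce outside Jarvis. The sketch below applies the field-parsing rule from the SSE specification (drop one space after the colon) to a single data: line. `SseFieldSketch` and `parseData` are hypothetical names, and whether the server writes a separator space after the colon is an assumption worth verifying for your Spring version.

```java
class SseFieldSketch {

    // Applies the SSE rule for a single "data" line: the value is everything
    // after the colon, minus ONE leading space if present.
    static String parseData(String line) {
        if (!line.startsWith("data:")) {
            throw new IllegalArgumentException("not a data line: " + line);
        }
        String value = line.substring("data:".length());
        return value.startsWith(" ") ? value.substring(1) : value;
    }

    public static void main(String[] args) {
        // Raw token written directly after the colon: its leading space is consumed.
        System.out.println("[" + parseData("data:" + " how") + "]"); // prints "[how]"

        // JSON-wrapped token: the payload starts with '{', so the inner space survives.
        System.out.println("[" + parseData("data:" + "{\"t\":\" how\"}") + "]");
    }
}
```

Any wrapper whose first character is not a space would protect the token in the same way; JSON is simply a convenient one.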

The fix was surprisingly simple.

Instead of sending raw text, I wrapped every token in JSON.

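private String jsonToken(String token) {

    // Escape backslashes first so the escapes added below are not escaped again.
    return "{\"t\":\""
            + token
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t")
            + "\"}";
}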

The client then extracts the value from the JSON payload.
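A minimal sketch of that client-side step, assuming the payload always has the exact `{"t":"..."}` shape produced by `jsonToken`. It handles the backslash, quote, and newline escapes, and also `\r` and `\t`. `TokenPayloadSketch` and `extractToken` are hypothetical names; a real client would more likely hand the payload to a JSON library.

```java
class TokenPayloadSketch {

    private static final String PREFIX = "{\"t\":\"";
    private static final String SUFFIX = "\"}";

    // Strips the {"t":"..."} wrapper and reverses the escaping.
    static String extractToken(String payload) {
        if (!payload.startsWith(PREFIX) || !payload.endsWith(SUFFIX)
                || payload.length() < PREFIX.length() + SUFFIX.length()) {
            throw new IllegalArgumentException("unexpected payload: " + payload);
        }
        String escaped = payload.substring(
                PREFIX.length(), payload.length() - SUFFIX.length());

        StringBuilder token = new StringBuilder(escaped.length());
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c != '\\') {
                token.append(c);
                continue;
            }
            // A backslash must be followed by the character it escapes.
            if (++i == escaped.length()) {
                throw new IllegalArgumentException("dangling backslash in payload");
            }
            char next = escaped.charAt(i);
            switch (next) {
                case 'n' -> token.append('\n');
                case 'r' -> token.append('\r');
                case 't' -> token.append('\t');
                case '"', '\\' -> token.append(next);
                default -> throw new IllegalArgumentException(
                        "unsupported escape: \\" + next);
            }
        }
        return token.toString();
    }
}
```

Calling `extractToken("{\"t\":\" how\"}")` returns the token with its leading space intact.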

Problem solved.

Sometimes the hardest bugs are not AI-related at all.

They're just spaces.

4. Working Memory

One of the most common questions I receive is:

How does Jarvis know today's date?

The answer is simple.

We provide that information.

Before every request, Jarvis generates a small working-memory block.

@Component
public class WorkingMemoryBuilder {

    public String build(
            String username,
            String role,
            String sessionId,
            String modelName) {

        String currentTime =
                ZonedDateTime.now()
                        .format(...);

        return """
                Date: %s
                User: %s
                Role: %s
                Session: %s
                Model: %s
                """
                .formatted(
                        currentTime,
                        username,
                        role,
                        sessionId,
                        modelName);
    }
}

This memory is injected into every prompt.

The AI isn't magically aware of the current date.

The application simply tells it.

Seeing that distinction helped me better understand how modern LLM applications actually work.

Much of what appears intelligent is often carefully engineered context.
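The listing above leaves the date format elided. The following runnable sketch fills that gap with choices of my own: the date pattern, the `Clock` parameter, and the name `WorkingMemorySketch` are all assumptions, not the project's code. Passing a `Clock` makes the otherwise time-dependent output testable.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class WorkingMemorySketch {

    // Assumed pattern; the project's real format is not shown in the article.
    private static final DateTimeFormatter DATE_FORMAT =
            DateTimeFormatter.ofPattern("EEEE, MMMM d, yyyy HH:mm", Locale.ENGLISH);

    // Taking a Clock (instead of calling ZonedDateTime.now() with no argument)
    // lets a test pin the current time.
    static String build(Clock clock, String username, String role,
                        String sessionId, String modelName) {
        String currentTime = ZonedDateTime.now(clock).format(DATE_FORMAT);
        return """
                Date: %s
                User: %s
                Role: %s
                Session: %s
                Model: %s
                """.formatted(currentTime, username, role, sessionId, modelName);
    }

    public static void main(String[] args) {
        Clock fixed = Clock.fixed(Instant.parse("2026-01-01T09:30:00Z"), ZoneOffset.UTC);
        // First line printed: "Date: Thursday, January 1, 2026 09:30"
        System.out.print(build(fixed, "dravin", "USER", "session-1", "llama3.1:8b"));
    }
}
```

In production the caller would pass `Clock.systemDefaultZone()`, so the model sees the user's local date.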

5. Prompt Assembly

Every user request passes through a component called PromptAssembler.

Its job is to construct the final prompt.

The assembled prompt contains four pieces:

System instructions
Working memory
Session history
Current user message

Simplified version:

messages.add(systemPrompt);

messages.add(workingMemory);

messages.addAll(history);

messages.add(
    new UserMessage(userMessage));

return new Prompt(messages);

This process gives the AI everything it needs to generate contextual responses.

Without prompt assembly, the AI would only see the current message.

With prompt assembly, it understands:

who the user is
previous conversation history
current date and time
session context
assistant instructions

This is where much of the "assistant" behavior actually comes from.
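The assembly order can be sketched without Spring AI on the classpath. In the sketch below, `ChatMessage` is a hypothetical stand-in for Spring AI's message types, and the `maxHistory` cap is an assumed safeguard to keep the prompt bounded; the article does not say whether Jarvis trims history.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Spring AI's message types.
record ChatMessage(String role, String content) {}

class PromptAssemblySketch {

    // Builds the message list in the order described above:
    // system instructions, working memory, recent history, current user message.
    static List<ChatMessage> assemble(String systemPrompt, String workingMemory,
                                      List<ChatMessage> history, String userMessage,
                                      int maxHistory) {
        List<ChatMessage> messages = new ArrayList<>();
        messages.add(new ChatMessage("system", systemPrompt));
        messages.add(new ChatMessage("system", workingMemory));

        // Keep only the most recent maxHistory messages (assumed policy).
        int from = Math.max(0, history.size() - maxHistory);
        messages.addAll(history.subList(from, history.size()));

        messages.add(new ChatMessage("user", userMessage));
        return List.copyOf(messages);
    }
}
```

Because the current user message is always appended last, the model reads the instructions and context first and the question it must answer at the end.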

6. Spring Shell 4.0

Jarvis uses Spring Shell as its primary interface.

One challenge was adapting to the changes introduced in Spring Shell 4.

Previous versions used annotations such as:

@ShellComponent
@ShellMethod

Those annotations were removed.

The new approach uses:

@Component
public class AuthCommands {

    @Command(
        name = "login",
        description = "Login to Jarvis")
    public String login() {
        return "OK";
    }
}

The migration wasn't difficult.

The real challenge came from JLine integration.

I encountered a circular dependency involving LineReader.

The solution was lazy injection.

public AuthCommands(
        CliStateManager state,
        CliHttpClient http,
        @Lazy LineReader lineReader) {

    this.state = state;
    this.http = http;
    this.lineReader = lineReader;
}

That single annotation ended hours of debugging.

7. Reactive Security

Spring Security behaves differently in reactive applications.

Traditional servlet applications keep the SecurityContext in a ThreadLocal.

Reactive applications cannot.

Requests may move across multiple threads.

Instead, WebFlux uses Reactor Context.

return chain.filter(exchange)
    .contextWrite(
        ReactiveSecurityContextHolder
            .withAuthentication(auth));

Authentication information travels with the reactive stream itself.

Once I understood that concept, many WebFlux security patterns suddenly made much more sense.
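The underlying problem can be shown with plain JDK classes. This is a sketch of the pitfall only, not of Spring Security's internals, and the class and method names are hypothetical. A value stored in a `ThreadLocal` is invisible once the work hops to another thread, whereas a value carried with the task itself arrives intact, which is the idea behind Reactor Context.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalPitfallSketch {

    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Sets the ThreadLocal on the calling thread, then reads it on a worker
    // thread. The worker has its own (empty) slot, so this returns null.
    static String readFromWorkerThread() {
        CURRENT_USER.set("dravin");
        try {
            Callable<String> read = CURRENT_USER::get;
            return runOnWorker(read);
        } finally {
            CURRENT_USER.remove();
        }
    }

    // Carries the value with the task instead of with the thread.
    static String passExplicitly(String user) {
        return runOnWorker(() -> "user=" + user);
    }

    private static String runOnWorker(Callable<String> task) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return worker.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(readFromWorkerThread());   // prints "null"
        System.out.println(passExplicitly("dravin")); // prints "user=dravin"
    }
}
```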

Quick Start

Getting Jarvis running locally takes only a few minutes.

Prerequisites

Java 21+
Docker
Ollama

1. Clone the Repository

git clone https://github.com/sujankim/jarvis-ai-platform.git

cd jarvis-ai-platform

2. Download a Local Model

ollama pull llama3.1:8b

This is a one-time download of approximately 5 GB.

3. Configure Environment Variables

cp .env.example .env

Update the .env file and set a secure JWT secret.

JARVIS_JWT_SECRET=your-secret-key
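The placeholder above should be replaced with a long random value. One way to produce one is sketched here in Java to match the rest of the stack. The class name is hypothetical, and the sketch assumes Jarvis accepts an arbitrary Base64 string as its secret, which the article does not confirm.

```java
import java.security.SecureRandom;
import java.util.Base64;

class JwtSecretGeneratorSketch {

    // Returns numBytes of cryptographically strong randomness, Base64-encoded.
    static String generate(int numBytes) {
        byte[] secret = new byte[numBytes];
        new SecureRandom().nextBytes(secret);
        return Base64.getEncoder().encodeToString(secret);
    }

    public static void main(String[] args) {
        // 64 random bytes = 512 bits, long enough for HMAC-SHA-256/384/512 keys.
        System.out.println("JARVIS_JWT_SECRET=" + generate(64));
    }
}
```

The printed line can be pasted into the .env file in place of the placeholder.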

4. Start PostgreSQL

docker-compose up -d

5. Run Jarvis

cd server

./mvnw spring-boot:run

Example Session

jarvis:> login

Username: dravin
Password: ******

Welcome back, Dravin!

jarvis:> chat

You: Hello Jarvis! What day is it today?

Jarvis: Today is Wednesday, June 3, 2026.

You: exit

At this point, everything is running locally on your machine.

No cloud dependency is required.

What I Learned

Building Jarvis taught me far more than I expected.

Some lessons came from AI.

Most came from software engineering.

Reactive Programming Is Harder Than Traditional MVC

There is no point pretending otherwise.

A traditional Spring MVC application is easier to build.

A traditional JPA repository is easier to understand.

A blocking HTTP client is easier to debug.

But AI applications are fundamentally streaming applications.

Responses often take several seconds to generate.

Blocking threads while waiting for tokens simply doesn't make sense.

The reactive stack allowed me to:

Stream responses in real time
Handle multiple conversations efficiently
Avoid thread starvation
Build a true end-to-end streaming pipeline

The learning curve was steep.

However, AI workloads are fundamentally different from typical CRUD applications.

When a language model spends 10–30 seconds generating a response, blocking threads becomes expensive.

Reactive streaming solves that problem elegantly.

Instead of waiting for the entire response to finish, tokens flow through the system as they are generated.

Ollama
   ↓
Spring AI
   ↓
Flux<String>
   ↓
Server-Sent Events
   ↓
CLI Client
   ↓
Terminal Output

The result is a much more responsive experience.

Users begin receiving output immediately instead of waiting for a complete response.

For AI applications, that difference feels enormous.

The payoff was worth it.

Spring AI Feels Like Spring

One thing I appreciate about Spring AI is that it doesn't feel like a separate ecosystem.

It feels like Spring.

Builders.

Dependency injection.

Configuration properties.

Auto-configuration.

The same conventions Java developers already know.

Creating an Ollama client feels familiar.

Creating a Gemini client feels familiar.

Switching between providers feels familiar.

That consistency significantly reduces friction.

Local AI Is Better Than Most People Think

Before building Jarvis, I assumed local models would be too slow or too limited.

I was wrong.

Running llama3.1:8b locally produces surprisingly useful results.

For:

General questions
Brainstorming
Coding assistance
Documentation help
Learning

it performs remarkably well.

Is it as capable as the largest cloud models?

No.

Does it need to be?

Also no.

For many personal workflows, local models are already good enough.

And the privacy benefits are enormous.

Architecture Matters More Than Models

This was probably the biggest lesson.

People often focus entirely on the model.

GPT.

Claude.

Gemini.

Llama.

Mistral.

But real AI applications are mostly architecture.

Prompt management.

Memory.

Security.

Persistence.

Streaming.

Observability.

Provider routing.

Error handling.

The model is only one piece of the system.

Building Jarvis reinforced that idea repeatedly.

What's Next?

Jarvis is still early.

Version 0.1.0 focuses on the foundation.

Future releases will add significantly more capabilities.

Phase 2 — Memory System

Current conversations are session-based.

Future versions will introduce persistent memory.

Planned features include:

Long-term memory
User preferences
Redis caching
Semantic retrieval
pgvector integration

The goal is simple:

Jarvis should remember useful information across sessions.

Phase 3 — RAG Engine

Retrieval-Augmented Generation is one of the most requested features.

Planned capabilities:

PDF ingestion
Knowledge bases
Semantic search
Document chat
Context-aware answers

Instead of asking only the model, users will be able to ask their own documents.

Phase 4 — Tool Engine

The next major step is action-taking.

Examples:

Weather tools
Search tools
Calculators
External integrations
MCP support

At that point Jarvis becomes more than a conversational assistant.

It becomes an assistant that can actually do things.

Phase 5 — Voice

Eventually Jarvis will gain voice capabilities.

The long-term vision is a genuinely useful local AI assistant that remains private and self-hosted.

Phase 6 — Agent System

Longer-term plans include:

Agent planning
Multi-step execution
Workflow automation
Tool orchestration

The ultimate goal is to move beyond chat and build a true personal AI assistant.

Phase 7 - Web UI

A web interface powered by the same backend.

Features:

Real-time streaming chat
Session sidebar
Document upload UI
Memory management
Settings panel
Agent dashboard
Voice interface

Contributing

Jarvis is open source and actively looking for contributors.

Whether you're experienced with Java or just learning Spring Boot, contributions are welcome.

Some beginner-friendly areas include:

Documentation improvements
Unit tests
CLI enhancements
New provider integrations
Bug fixes
Architecture diagrams

Repository:

https://github.com/sujankim/jarvis-ai-platform

If you'd like to contribute, start with:

CONTRIBUTING.md

and look for issues labeled:

good first issue

Conclusion

When I started this project, I wasn't trying to build the next ChatGPT.

I was trying to answer a simple question:

Can modern AI applications be built effectively in Java?

After building Jarvis, my answer is absolutely yes.

The Java ecosystem has matured rapidly.

Spring Boot 4 provides an excellent foundation.

Spring AI removes much of the complexity involved in provider integrations.

WebFlux enables real-time streaming.

Ollama makes local AI practical.

Most importantly, the ecosystem finally feels ready.

If you're a Java developer who has been watching the AI space from the sidelines, there has never been a better time to start building.

The tools exist.

The frameworks exist.

The community is growing.

Now it's time to build.

If you found this article useful, I'd love to hear your thoughts.

Questions, suggestions, architecture feedback, and contributions are always welcome.

⭐ If you'd like to support the project, consider starring the repository:

https://github.com/sujankim/jarvis-ai-platform

Your AI. Your Data. Your Machine.

Part 2: Jarvis AI Platform: Building Long-Term Memory with pgvector and Spring AI
