Your AI. Your Data. Your Machine.
For the last few years, AI development has been dominated by Python.
When developers talk about AI frameworks, the conversation usually revolves around LangChain, LlamaIndex, AutoGPT, CrewAI, and other Python-first ecosystems.
As a Java developer, I kept asking myself:
Where is the equivalent ecosystem for Java?
The answer is that it already exists.
With Spring AI, Spring Boot 4, WebFlux, PostgreSQL, and Ollama, it is now possible to build serious AI applications entirely in Java.
That realization led me to build Jarvis AI Platform.
GitHub Repository:
https://github.com/sujankim/jarvis-ai-platform
The Problem With Most AI Assistants
Most AI assistants follow the same architecture:
Your Message
↓
Cloud Service
↓
AI Model
↓
Response
Your conversations travel through someone else's infrastructure.
You depend on their uptime.
You depend on their pricing.
You depend on their privacy policies.
If the service changes tomorrow, you're affected immediately.
That model works for many people.
But I wanted something different.
A Local-First Alternative
Jarvis follows a completely different approach:
Your Message
↓
Your Machine
↓
Ollama
↓
AI Model
↓
Response
With Ollama serving the model, everything stays on your computer.
No data leaves your machine unless a request falls back to Gemini.
No monthly subscription.
No external dependency for core functionality.
That's why the project's philosophy is simple:
Your AI. Your Data. Your Machine.
What Is Jarvis AI Platform?
Jarvis is not just a chatbot.
It is a modular AI orchestration platform designed around the Java ecosystem.
At a high level, the architecture looks like this:
    Spring Shell CLI / REST API
                 │
           Spring Boot 4
                 │
         AI Orchestration
                 │
       +---------+---------+
       │                   │
OllamaProvider      GeminiProvider
   (Primary)          (Fallback)
                 │
            PostgreSQL
       (Sessions & Messages)
The goal is to make AI providers interchangeable while keeping the application architecture clean and maintainable.
Current features in v0.1.0 include:
- Interactive AI chat with token streaming
- JWT authentication
- Argon2id password hashing
- Session persistence
- PostgreSQL storage
- Ollama local AI support
- Gemini fallback support
- Provider abstraction layer
- Working memory system
- Swagger/OpenAPI integration
- Health monitoring and diagnostics
Tech Stack
| Layer | Technology |
|---|---|
| Language | Java 21 |
| Framework | Spring Boot 4.0.6 |
| AI | Spring AI 2.0 |
| Web | Spring WebFlux |
| Security | Spring Security 7 |
| Authentication | JWT |
| Password Hashing | Argon2id |
| Database | PostgreSQL 16 |
| Database Access | R2DBC |
| Migrations | Flyway |
| CLI | Spring Shell 4 |
| Local AI | Ollama |
| Cloud AI | Gemini |
| Mapping | MapStruct 1.6 |
Why I Chose Java Instead of Python
One question I hear often is:
"Why didn't you build this in Python?"
The short answer:
Because I enjoy building systems in Java.
The longer answer is that Java provides several advantages for long-term AI applications:
- Strong type safety
- Excellent tooling
- Mature ecosystem
- Production-ready frameworks
- Reactive programming support
- Enterprise-grade security
Spring AI is making AI development feel like a natural extension of the Spring ecosystem.
Instead of learning an entirely new stack, Java developers can use tools they already know.
That was one of the biggest motivations behind Jarvis.
Architecture Deep Dive
The most interesting part of Jarvis isn't the CLI.
It isn't PostgreSQL.
It isn't even the AI model.
The most important design decision was the architecture that sits between users and AI providers.
The goal from day one was simple:
Never lock Jarvis to a single AI provider.
That requirement shaped the entire system.
1. Provider Abstraction Layer
Every AI provider in Jarvis implements the same interface.
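public interface AiProvider {
    // Streams the model's reply token by token.
    Flux<String> streamChat(Prompt prompt);
    // Emits true when the provider can currently accept requests.
    Mono<Boolean> isAvailable();
    String getName();
    String getModelName();
}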
Both OllamaProvider and GeminiProvider implement this contract.
That means the rest of the application never needs to know which provider is currently being used.
The provider router handles that responsibility.
// Prefer the local provider; fall back to Gemini only if Ollama is down.
return ollamaProvider.isAvailable()
    .flatMap(ollamaUp -> {
        if (ollamaUp) {
            return Mono.just((AiProvider) ollamaProvider);
        }
        return geminiProvider.isAvailable()
            .flatMap(geminiUp -> {
                if (geminiUp) {
                    return Mono.just((AiProvider) geminiProvider);
                }
                return Mono.error(
                    new RuntimeException("No provider available"));
            });
    });
This creates a provider-agnostic architecture.
If Ollama is running, Jarvis uses Ollama.
If Ollama becomes unavailable, Jarvis automatically falls back to Gemini.
Users don't need to change anything.
The architecture stays the same.
Adding a new provider becomes straightforward:
public class ClaudeProvider
        implements AiProvider {
    // Implement streamChat, isAvailable, getName, and getModelName here.
}
Implement the interface.
Register the provider.
Done.
No orchestrator changes.
No controller changes.
No CLI changes.
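One way to keep routing independent of how many providers exist is to walk a priority-ordered list and pick the first provider that reports itself available. The sketch below is a simplified, synchronous illustration and not code from the Jarvis repository: SimpleProvider, StubProvider, and ProviderRouterSketch are hypothetical names, and the real AiProvider returns Mono<Boolean> rather than boolean.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, synchronous stand-in for the article's reactive AiProvider.
interface SimpleProvider {
    boolean isAvailable();
    String getName();
}

// Fixed-state provider used only for this illustration.
record StubProvider(String name, boolean up) implements SimpleProvider {
    @Override public boolean isAvailable() { return up; }
    @Override public String getName() { return name; }
}

final class ProviderRouterSketch {
    // Returns the first available provider, honoring the order of the list.
    static Optional<SimpleProvider> select(List<SimpleProvider> providersInPriorityOrder) {
        return providersInPriorityOrder.stream()
                .filter(SimpleProvider::isAvailable)
                .findFirst();
    }

    public static void main(String[] args) {
        List<SimpleProvider> providers = List.of(
                new StubProvider("ollama", false),   // local provider is down
                new StubProvider("gemini", true));   // fallback is up
        // Prints "gemini"
        System.out.println(select(providers).map(SimpleProvider::getName).orElse("none"));
    }
}
```

With this shape, adding a provider means adding one entry to the list; the selection logic itself does not change.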
2. Reactive Streaming
One feature I absolutely wanted was real-time token streaming.
I didn't want users waiting ten seconds for an entire response.
I wanted responses to appear immediately.
That requirement pushed the project toward a fully reactive architecture.
The flow looks like this:
Ollama
↓
Spring AI
↓
Flux<String>
↓
AiOrchestrator
↓
SSE Endpoint
↓
CLI Client
↓
Terminal Output
Each token moves through the pipeline independently.
The user starts seeing output almost immediately.
The controller endpoint looks like this:
@PostMapping(
    value = "/stream",
    produces = MediaType.TEXT_EVENT_STREAM_VALUE
)
public Flux<ServerSentEvent<String>> stream(
        @Valid @RequestBody ChatRequest request) {
    return orchestrator.chat(...)
        .map(token ->
            ServerSentEvent
                .<String>builder()
                .event("token")
                .data(token)
                .build());
}
The result feels significantly faster than waiting for a complete response.
Even when generation takes several seconds, users immediately know something is happening.
That small change dramatically improves the user experience.
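To make the streaming step more concrete, here is a rough sketch of what a single token event looks like on the wire, following the field syntax of the Server-Sent Events specification. It is an illustration only: SseFrameSketch and frame are hypothetical names, and in Jarvis the serialization is handled by the framework rather than by hand-written code like this.

```java
final class SseFrameSketch {
    // Serializes one event as "field:value" lines terminated by a blank line,
    // which is how the SSE specification delimits events.
    static String frame(String event, String data) {
        StringBuilder sb = new StringBuilder();
        sb.append("event:").append(event).append('\n');
        // A payload containing newlines is sent as several data: lines.
        for (String line : data.split("\n", -1)) {
            sb.append("data:").append(line).append('\n');
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // Writes an "event:token" line, a "data:Hello" line, then a blank line.
        System.out.print(frame("token", "Hello"));
    }
}
```

Because every token is a complete event, the client can render it as soon as the blank line arrives instead of waiting for the whole response.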
3. The Whitespace Bug
One of the strangest bugs I encountered involved spaces.
Responses looked like this:
Hellohowareyoutoday?
Instead of:
Hello how are you today?
The cause turned out to be how Server-Sent Events are parsed.
The SSE format allows one space after the data: field name, and clients strip it, so a token such as " how" arrived as "how".
The fix was surprisingly simple.
Instead of sending raw text, I wrapped every token in JSON.
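// Wraps a token in a minimal JSON object so leading spaces survive SSE parsing.
// Other control characters would need \uXXXX escapes; a JSON library covers every case.
private String jsonToken(String token) {
    return "{\"t\":\""
        + token
            .replace("\\", "\\\\")
            .replace("\"", "\\\"")
            .replace("\n", "\\n")
            .replace("\r", "\\r")
            .replace("\t", "\\t")
        + "\"}";
}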
The client then extracts the value from the JSON payload.
Problem solved.
Sometimes the hardest bugs are not AI-related at all.
They're just spaces.
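The effect is easy to reproduce without a server. The SSE specification tells parsers to drop one leading space from a field value, so a raw token that begins with a space loses it, while a JSON-wrapped token begins with a brace and passes through intact. The sketch below assumes the server writes the token directly after data: and uses hypothetical names (SseWhitespaceSketch, dataValue, extractToken); the decoder only understands the {"t":"..."} shape shown above and is not a general JSON parser.

```java
final class SseWhitespaceSketch {
    // Field parsing per the SSE specification: the value is everything after
    // the first colon, minus ONE leading space if present.
    static String dataValue(String line) {
        String value = line.substring(line.indexOf(':') + 1);
        return value.startsWith(" ") ? value.substring(1) : value;
    }

    // Hypothetical client-side counterpart to jsonToken: extracts the string
    // from {"t":"..."} and reverses the backslash escapes.
    static String extractToken(String json) {
        String body = json.substring("{\"t\":\"".length(), json.length() - "\"}".length());
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                char next = body.charAt(++i);
                if (next == 'n') out.append('\n');
                else if (next == 'r') out.append('\r');
                else if (next == 't') out.append('\t');
                else out.append(next); // covers \\ and \"
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Raw token " how" directly after "data:" loses its space: prints "[how]"
        System.out.println("[" + dataValue("data: how") + "]");
        // The JSON-wrapped token keeps it: prints "[ how]"
        System.out.println("[" + extractToken(dataValue("data:{\"t\":\" how\"}")) + "]");
    }
}
```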
4. Working Memory
One of the most common questions I receive is:
How does Jarvis know today's date?
The answer is simple.
We provide that information.
Before every request, Jarvis generates a small working-memory block.
@Component
public class WorkingMemoryBuilder {
    public String build(
            String username,
            String role,
            String sessionId,
            String modelName) {
        String currentTime =
            ZonedDateTime.now()
                .format(...);
        return """
            Date: %s
            User: %s
            Role: %s
            Session: %s
            Model: %s
            """
            .formatted(
                currentTime,
                username,
                role,
                sessionId,
                modelName);
    }
}
This memory is injected into every prompt.
The AI isn't magically aware of the current date.
The application simply tells it.
Understanding that distinction helped me better understand how modern LLM applications actually work.
Much of what appears intelligent is often carefully engineered context.
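The formatter call is elided in the snippet above, so here is one possible way to produce the date line. This is a sketch under assumptions: the pattern string, the class name WorkingMemoryDateSketch, and the dateLine method are hypothetical, and the project may format the timestamp differently.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class WorkingMemoryDateSketch {
    // Assumed pattern; the article does not show the real one.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEEE, MMMM d, yyyy HH:mm", Locale.ENGLISH);

    // Taking the timestamp as a parameter keeps the method deterministic in tests.
    static String dateLine(ZonedDateTime now) {
        return "Date: " + now.format(FORMAT);
    }

    public static void main(String[] args) {
        // Output depends on the current time and zone of the machine.
        System.out.println(dateLine(ZonedDateTime.now()));
    }
}
```

Spelling out the weekday in the context saves the model from having to derive it from the date, which language models do not do reliably.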
5. Prompt Assembly
Every user request passes through a component called PromptAssembler.
Its job is to construct the final prompt.
The assembled prompt contains four pieces:
- System instructions
- Working memory
- Session history
- Current user message
Simplified version:
messages.add(systemPrompt);
messages.add(workingMemory);
messages.addAll(history);
messages.add(new UserMessage(userMessage));
return new Prompt(messages);
This process gives the AI everything it needs to generate contextual responses.
Without prompt assembly, the AI would only see the current message.
With prompt assembly, it understands:
- who the user is
- previous conversation history
- current date and time
- session context
- assistant instructions
This is where much of the "assistant" behavior actually comes from.
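The ordering described above can be sketched in plain Java without Spring AI on the classpath. Everything here is illustrative: ChatMessage and PromptAssemblySketch are hypothetical stand-ins for the real message types, and the maxHistory cap is an added assumption (a common way to keep prompts inside the model's context window), not something the article says Jarvis does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Spring AI's message types.
record ChatMessage(String role, String content) {}

final class PromptAssemblySketch {
    // Builds the message list in the order: system instructions, working
    // memory, the most recent history entries, then the new user message.
    static List<ChatMessage> assemble(String systemPrompt, String workingMemory,
                                      List<ChatMessage> history, String userMessage,
                                      int maxHistory) {
        List<ChatMessage> messages = new ArrayList<>();
        messages.add(new ChatMessage("system", systemPrompt));
        messages.add(new ChatMessage("system", workingMemory));
        // Keep only the last maxHistory entries (assumed policy).
        int from = Math.max(0, history.size() - maxHistory);
        messages.addAll(history.subList(from, history.size()));
        messages.add(new ChatMessage("user", userMessage));
        return messages;
    }

    public static void main(String[] args) {
        List<ChatMessage> history = List.of(
                new ChatMessage("user", "Hi"),
                new ChatMessage("assistant", "Hello!"));
        assemble("You are Jarvis.", "Date: ...", history, "What day is it?", 10)
                .forEach(m -> System.out.println(m.role() + ": " + m.content()));
    }
}
```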
6. Spring Shell 4.0
Jarvis uses Spring Shell as its primary interface.
One challenge was adapting to the changes introduced in Spring Shell 4.
Previous versions used annotations such as:
@ShellComponent
@ShellMethod
Those annotations were removed.
The new approach uses:
@Component
public class AuthCommands {
    @Command(
        name = "login",
        description = "Login to Jarvis")
    public String login() {
        return "OK";
    }
}
The migration wasn't difficult.
The real challenge came from JLine integration.
I encountered a circular dependency involving LineReader.
The solution was lazy injection.
public AuthCommands(
        CliStateManager state,
        CliHttpClient http,
        @Lazy LineReader lineReader) {
    this.state = state;
    this.http = http;
    this.lineReader = lineReader;
}
That single annotation solved hours of debugging.
7. Reactive Security
Spring Security behaves differently in reactive applications.
In a traditional servlet application, Spring Security keeps the security context in a ThreadLocal.
Reactive applications cannot.
Requests may move across multiple threads.
Instead, WebFlux uses Reactor Context.
return chain.filter(exchange)
    .contextWrite(
        ReactiveSecurityContextHolder
            .withAuthentication(auth));
Authentication information travels with the reactive stream itself.
Once I understood that concept, many WebFlux security patterns suddenly made much more sense.
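The reason ThreadLocal breaks down can be shown with nothing but the JDK. In this sketch a value is stored on one thread and read from another, which is roughly what happens when a reactive pipeline hops between threads. ThreadLocalHandoffSketch and its members are hypothetical names used only for this illustration.

```java
final class ThreadLocalHandoffSketch {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Stores a value on the calling thread, then reads it from a second thread.
    static String readFromOtherThread(String user) {
        CURRENT_USER.set(user);
        String[] seen = new String[1];
        Thread worker = new Thread(() -> seen[0] = CURRENT_USER.get());
        worker.start();
        try {
            worker.join(); // join() also makes the worker's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            CURRENT_USER.remove();
        }
        return seen[0]; // null: the worker thread has its own, empty slot
    }

    public static void main(String[] args) {
        System.out.println(readFromOtherThread("dravin")); // prints "null"
    }
}
```

Reactor Context avoids this by attaching the data to the subscription rather than to whichever thread happens to run a given step.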
Quick Start
Getting Jarvis running locally takes only a few minutes.
Prerequisites
- Java 21+
- Docker
- Ollama
1. Clone the Repository
git clone https://github.com/sujankim/jarvis-ai-platform.git
cd jarvis-ai-platform
2. Download a Local Model
ollama pull llama3.1:8b
This is a one-time download of approximately 5 GB.
3. Configure Environment Variables
cp .env.example .env
Update the .env file and set a secure JWT secret.
JARVIS_JWT_SECRET=your-secret-key
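If you need a value for that variable, a long random string is a reasonable choice. The snippet below is one hedged way to generate it with the JDK; JwtSecretSketch and randomSecret are hypothetical names, and whether Jarvis expects a Base64 string or arbitrary text is an assumption, so check .env.example before relying on it.

```java
import java.security.SecureRandom;
import java.util.Base64;

final class JwtSecretSketch {
    // Returns numBytes of cryptographically strong randomness, Base64-encoded.
    static String randomSecret(int numBytes) {
        byte[] bytes = new byte[numBytes];
        new SecureRandom().nextBytes(bytes);
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Prints a line that can be pasted into .env; the value differs on every run.
        System.out.println("JARVIS_JWT_SECRET=" + randomSecret(64));
    }
}
```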
4. Start PostgreSQL
docker-compose up -d
5. Run Jarvis
cd server
./mvnw spring-boot:run
Example Session
jarvis:> login
Username: dravin
Password: ******
Welcome back, Dravin!
jarvis:> chat
You: Hello Jarvis! What day is it today?
Jarvis: Today is Wednesday, June 3, 2026.
You: exit
At this point, everything is running locally on your machine.
No cloud dependency is required.
What I Learned
Building Jarvis taught me far more than I expected.
Some lessons came from AI.
Most came from software engineering.
Reactive Programming Is Harder Than Traditional MVC
There is no point pretending otherwise.
A traditional Spring MVC application is easier to build.
A traditional JPA repository is easier to understand.
A blocking HTTP client is easier to debug.
But AI applications are fundamentally streaming applications.
Responses often take several seconds to generate.
Blocking threads while waiting for tokens simply doesn't make sense.
The reactive stack allowed me to:
- Stream responses in real time
- Handle multiple conversations efficiently
- Avoid thread starvation
- Build a true end-to-end streaming pipeline
The learning curve was steep.
But when a language model spends 10–30 seconds generating a response, every blocked thread sits idle for that entire time.
Reactive streaming avoids that: tokens flow through the system as they are generated instead of waiting for the entire response to finish.
Ollama
↓
Spring AI
↓
Flux<String>
↓
Server-Sent Events
↓
CLI Client
↓
Terminal Output
The result is a much more responsive experience.
Users begin receiving output immediately instead of waiting for a complete response.
For AI applications, that difference feels enormous.
The payoff was worth it.
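The cost of blocking can be estimated with Little's law: the average number of requests in flight equals the arrival rate multiplied by the average response time. In a thread-per-request model, each in-flight request holds one thread. The numbers in this sketch are made up for illustration, and BlockingCostSketch is a hypothetical name.

```java
final class BlockingCostSketch {
    // Little's law: requests in flight = arrival rate x average duration.
    // With one blocked thread per in-flight request, that is also the thread count.
    static long threadsNeeded(double requestsPerSecond, double secondsPerResponse) {
        return (long) Math.ceil(requestsPerSecond * secondsPerResponse);
    }

    public static void main(String[] args) {
        // 5 chats per second at 20 seconds each keeps 100 threads blocked.
        System.out.println(threadsNeeded(5, 20)); // prints 100
    }
}
```

A non-blocking pipeline serves the same load from a small, fixed pool of event-loop threads, because no thread waits while the model is generating.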
Spring AI Feels Like Spring
One thing I appreciate about Spring AI is that it doesn't feel like a separate ecosystem.
It feels like Spring.
Builders.
Dependency injection.
Configuration properties.
Auto-configuration.
The same conventions Java developers already know.
Creating an Ollama client feels familiar.
Creating a Gemini client feels familiar.
Switching between providers feels familiar.
That consistency significantly reduces friction.
Local AI Is Better Than Most People Think
Before building Jarvis, I assumed local models would be too slow or too limited.
I was wrong.
Running llama3.1:8b locally produces surprisingly useful results.
It performs remarkably well for:
- General questions
- Brainstorming
- Coding assistance
- Documentation help
- Learning
Is it as capable as the largest cloud models?
No.
Does it need to be?
Also no.
For many personal workflows, local models are already good enough.
And the privacy benefits are enormous.
Architecture Matters More Than Models
This was probably the biggest lesson.
People often focus entirely on the model.
GPT.
Claude.
Gemini.
Llama.
Mistral.
But real AI applications are mostly architecture.
Prompt management.
Memory.
Security.
Persistence.
Streaming.
Observability.
Provider routing.
Error handling.
The model is only one piece of the system.
Building Jarvis reinforced that idea repeatedly.
What's Next?
Jarvis is still early.
Version 0.1.0 focuses on the foundation.
Future releases will add significantly more capabilities.
Phase 2 — Memory System
Current conversations are session-based.
Future versions will introduce persistent memory.
Planned features include:
- Long-term memory
- User preferences
- Redis caching
- Semantic retrieval
- pgvector integration
The goal is simple:
Jarvis should remember useful information across sessions.
Phase 3 — RAG Engine
Retrieval-Augmented Generation is one of the most requested features.
Planned capabilities:
- PDF ingestion
- Knowledge bases
- Semantic search
- Document chat
- Context-aware answers
Instead of asking only the model, users will be able to ask their own documents.
Phase 4 — Tool Engine
The next major step is action-taking.
Examples:
- Weather tools
- Search tools
- Calculators
- External integrations
- MCP support
At that point Jarvis becomes more than a conversational assistant.
It becomes an assistant that can actually do things.
Phase 5 — Voice
Eventually Jarvis will gain voice capabilities.
The long-term vision is a genuinely useful local AI assistant that remains private and self-hosted.
Phase 6 — Agent System
Longer-term plans include:
- Agent planning
- Multi-step execution
- Workflow automation
- Tool orchestration
The ultimate goal is to move beyond chat and build a true personal AI assistant.
Phase 7 - Web UI
A web interface powered by the same backend.
Features:
- Real-time streaming chat
- Session sidebar
- Document upload UI
- Memory management
- Settings panel
- Agent dashboard
- Voice interface
Contributing
Jarvis is open source and actively looking for contributors.
Whether you're experienced with Java or just learning Spring Boot, contributions are welcome.
Some beginner-friendly areas include:
- Documentation improvements
- Unit tests
- CLI enhancements
- New provider integrations
- Bug fixes
- Architecture diagrams
Repository:
https://github.com/sujankim/jarvis-ai-platform
If you'd like to contribute, start with:
CONTRIBUTING.md
and look for issues labeled:
good first issue
Conclusion
When I started this project, I wasn't trying to build the next ChatGPT.
I was trying to answer a simple question:
Can modern AI applications be built effectively in Java?
After building Jarvis, my answer is absolutely yes.
The Java ecosystem has matured rapidly.
Spring Boot 4 provides an excellent foundation.
Spring AI removes much of the complexity involved in provider integrations.
WebFlux enables real-time streaming.
Ollama makes local AI practical.
Most importantly, the ecosystem finally feels ready.
If you're a Java developer who has been watching the AI space from the sidelines, there has never been a better time to start building.
The tools exist.
The frameworks exist.
The community is growing.
Now it's time to build.
If you found this article useful, I'd love to hear your thoughts.
Questions, suggestions, architecture feedback, and contributions are always welcome.
⭐ If you'd like to support the project, consider starring the repository:
https://github.com/sujankim/jarvis-ai-platform
Your AI. Your Data. Your Machine.