<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sujan Lamichhane</title>
    <description>The latest articles on DEV Community by Sujan Lamichhane (@sujankim).</description>
    <link>https://dev.to/sujankim</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F391089%2Fe8eae186-b2f2-40af-9050-6a8a3c5c8661.png</url>
      <title>DEV Community: Sujan Lamichhane</title>
      <link>https://dev.to/sujankim</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sujankim"/>
    <language>en</language>
    <item>
      <title>Building a Local-First AI Assistant with Spring Boot 4 and Spring AI 2.0</title>
      <dc:creator>Sujan Lamichhane</dc:creator>
      <pubDate>Wed, 03 Jun 2026 13:30:58 +0000</pubDate>
      <link>https://dev.to/sujankim/building-a-local-first-ai-assistant-with-spring-boot-4-and-spring-ai-20-6ci</link>
      <guid>https://dev.to/sujankim/building-a-local-first-ai-assistant-with-spring-boot-4-and-spring-ai-20-6ci</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Your AI. Your Data. Your Machine.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For the last few years, AI development has been dominated by Python.&lt;/p&gt;

&lt;p&gt;When developers talk about AI frameworks, the conversation usually revolves around LangChain, LlamaIndex, AutoGPT, CrewAI, and other Python-first ecosystems.&lt;/p&gt;

&lt;p&gt;As a Java developer, I kept asking myself:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where is the equivalent ecosystem for Java?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer is that it already exists.&lt;/p&gt;

&lt;p&gt;With Spring AI, Spring Boot 4, WebFlux, PostgreSQL, and Ollama, it is now possible to build serious AI applications entirely in Java.&lt;/p&gt;

&lt;p&gt;That realization led me to build &lt;strong&gt;Jarvis AI Platform&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;GitHub Repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/sujankim/jarvis-ai-platform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Problem With Most AI Assistants
&lt;/h2&gt;

&lt;p&gt;Most AI assistants follow the same architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Message
      ↓
 Cloud Service
      ↓
  AI Model
      ↓
  Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your conversations travel through someone else's infrastructure.&lt;/p&gt;

&lt;p&gt;You depend on their uptime.&lt;/p&gt;

&lt;p&gt;You depend on their pricing.&lt;/p&gt;

&lt;p&gt;You depend on their privacy policies.&lt;/p&gt;

&lt;p&gt;If the service changes tomorrow, you're affected immediately.&lt;/p&gt;

&lt;p&gt;That model works for many people.&lt;/p&gt;

&lt;p&gt;But I wanted something different.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Local-First Alternative
&lt;/h2&gt;

&lt;p&gt;Jarvis follows a completely different approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Message
      ↓
 Your Machine
      ↓
    Ollama
      ↓
  AI Model
      ↓
  Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything stays on your computer.&lt;/p&gt;

&lt;p&gt;As long as the local Ollama provider is serving requests, no data leaves your machine.&lt;/p&gt;

&lt;p&gt;No monthly subscription.&lt;/p&gt;

&lt;p&gt;No external dependency for core functionality.&lt;/p&gt;

&lt;p&gt;That's why the project's philosophy is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Your AI. Your Data. Your Machine.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Is Jarvis AI Platform?
&lt;/h2&gt;

&lt;p&gt;Jarvis is not just a chatbot.&lt;/p&gt;

&lt;p&gt;It is a modular AI orchestration platform designed around the Java ecosystem.&lt;/p&gt;

&lt;p&gt;At a high level, the architecture looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Spring Shell CLI / REST API
              │
      Spring Boot 4
              │
      AI Orchestration
              │
    +---------+---------+
    │                   │
OllamaProvider   GeminiProvider
 (Primary)        (Fallback)
    │
 PostgreSQL
(Sessions &amp;amp; Messages)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal is to make AI providers interchangeable while keeping the application architecture clean and maintainable.&lt;/p&gt;

&lt;p&gt;Current features in &lt;strong&gt;v0.1.0&lt;/strong&gt; include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interactive AI chat with token streaming&lt;/li&gt;
&lt;li&gt;JWT authentication&lt;/li&gt;
&lt;li&gt;Argon2id password hashing&lt;/li&gt;
&lt;li&gt;Session persistence&lt;/li&gt;
&lt;li&gt;PostgreSQL storage&lt;/li&gt;
&lt;li&gt;Ollama local AI support&lt;/li&gt;
&lt;li&gt;Gemini fallback support&lt;/li&gt;
&lt;li&gt;Provider abstraction layer&lt;/li&gt;
&lt;li&gt;Working memory system&lt;/li&gt;
&lt;li&gt;Swagger/OpenAPI integration&lt;/li&gt;
&lt;li&gt;Health monitoring and diagnostics&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Java 21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Framework&lt;/td&gt;
&lt;td&gt;Spring Boot 4.0.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;Spring AI 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web&lt;/td&gt;
&lt;td&gt;Spring WebFlux&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Spring Security 7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authentication&lt;/td&gt;
&lt;td&gt;JWT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Password Hashing&lt;/td&gt;
&lt;td&gt;Argon2id&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL 16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database Access&lt;/td&gt;
&lt;td&gt;R2DBC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Migrations&lt;/td&gt;
&lt;td&gt;Flyway&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI&lt;/td&gt;
&lt;td&gt;Spring Shell 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local AI&lt;/td&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud AI&lt;/td&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mapping&lt;/td&gt;
&lt;td&gt;MapStruct 1.6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Why I Chose Java Instead of Python
&lt;/h2&gt;

&lt;p&gt;One question I hear often is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Why didn't you build this in Python?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The short answer:&lt;/p&gt;

&lt;p&gt;Because I enjoy building systems in Java.&lt;/p&gt;

&lt;p&gt;The longer answer is that Java provides several advantages for long-term AI applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strong type safety&lt;/li&gt;
&lt;li&gt;Excellent tooling&lt;/li&gt;
&lt;li&gt;Mature ecosystem&lt;/li&gt;
&lt;li&gt;Production-ready frameworks&lt;/li&gt;
&lt;li&gt;Reactive programming support&lt;/li&gt;
&lt;li&gt;Enterprise-grade security&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Spring AI is making AI development feel like a natural extension of the Spring ecosystem.&lt;/p&gt;

&lt;p&gt;Instead of learning an entirely new stack, Java developers can use tools they already know.&lt;/p&gt;

&lt;p&gt;That was one of the biggest motivations behind Jarvis.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Deep Dive
&lt;/h2&gt;

&lt;p&gt;The most interesting part of Jarvis isn't the CLI.&lt;/p&gt;

&lt;p&gt;It isn't PostgreSQL.&lt;/p&gt;

&lt;p&gt;It isn't even the AI model.&lt;/p&gt;

&lt;p&gt;The most important design decision was the architecture that sits between users and AI providers.&lt;/p&gt;

&lt;p&gt;The goal from day one was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Never lock Jarvis to a single AI provider.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That requirement shaped the entire system.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Provider Abstraction Layer
&lt;/h2&gt;

&lt;p&gt;Every AI provider in Jarvis implements the same interface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;AiProvider&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nc"&gt;Flux&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;streamChat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Prompt&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="nc"&gt;Mono&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Boolean&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;isAvailable&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getModelName&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both &lt;code&gt;OllamaProvider&lt;/code&gt; and &lt;code&gt;GeminiProvider&lt;/code&gt; implement this contract.&lt;/p&gt;

&lt;p&gt;That means the rest of the application never needs to know which provider is currently being used.&lt;/p&gt;

&lt;p&gt;The provider router handles that responsibility.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ollamaProvider&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isAvailable&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;flatMap&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ollamaUp&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ollamaUp&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Mono&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;just&lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;AiProvider&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ollamaProvider&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;geminiProvider&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isAvailable&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;flatMap&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;geminiUp&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;geminiUp&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Mono&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;just&lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;AiProvider&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;geminiProvider&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
                &lt;span class="o"&gt;}&lt;/span&gt;

                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Mono&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;RuntimeException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                        &lt;span class="s"&gt;"No provider available"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
            &lt;span class="o"&gt;});&lt;/span&gt;
    &lt;span class="o"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a provider-agnostic architecture.&lt;/p&gt;

&lt;p&gt;If Ollama is running, Jarvis uses Ollama.&lt;/p&gt;

&lt;p&gt;If Ollama becomes unavailable, Jarvis automatically falls back to Gemini.&lt;/p&gt;

&lt;p&gt;Users don't need to change anything.&lt;/p&gt;

&lt;p&gt;The architecture stays the same.&lt;/p&gt;

&lt;p&gt;Adding a new provider becomes straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ClaudeProvider&lt;/span&gt;
        &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;AiProvider&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Implement streamChat, isAvailable, getName, and getModelName here.&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implement the interface.&lt;/p&gt;

&lt;p&gt;Register the provider.&lt;/p&gt;

&lt;p&gt;Done.&lt;/p&gt;

&lt;p&gt;No orchestrator changes.&lt;/p&gt;

&lt;p&gt;No controller changes.&lt;/p&gt;

&lt;p&gt;No CLI changes.&lt;/p&gt;
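
&lt;p&gt;The router snippet shown earlier names Ollama and Gemini explicitly, so a third provider would need one more branch there. One way to keep the "no orchestrator changes" promise is to route over an ordered list instead. The following is a minimal sketch of that idea, not the code from the Jarvis repository: it is synchronous, and &lt;code&gt;SimpleProvider&lt;/code&gt;, &lt;code&gt;ProviderRouter&lt;/code&gt;, and &lt;code&gt;StubProvider&lt;/code&gt; are hypothetical names standing in for the reactive &lt;code&gt;AiProvider&lt;/code&gt; contract.&lt;/p&gt;

```java
// Hypothetical, simplified stand-in for the article's AiProvider: synchronous, no Reactor.
interface SimpleProvider {
    boolean isAvailable();
    String getName();
}

// Walks an ordered array of providers and returns the first one that reports itself available.
class ProviderRouter {
    private final SimpleProvider[] providersInPriorityOrder;

    ProviderRouter(SimpleProvider... providersInPriorityOrder) {
        this.providersInPriorityOrder = providersInPriorityOrder;
    }

    SimpleProvider select() {
        for (SimpleProvider provider : providersInPriorityOrder) {
            if (provider.isAvailable()) {
                return provider;
            }
        }
        throw new IllegalStateException("No provider available");
    }
}

// Tiny fixed-state provider used only to exercise the router.
class StubProvider implements SimpleProvider {
    private final String name;
    private final boolean available;

    StubProvider(String name, boolean available) {
        this.name = name;
        this.available = available;
    }

    public boolean isAvailable() { return available; }
    public String getName() { return name; }
}
```

&lt;p&gt;With this shape, adding a provider only changes the list handed to the router; the selection loop itself stays untouched.&lt;/p&gt;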




&lt;h2&gt;
  
  
  2. Reactive Streaming
&lt;/h2&gt;

&lt;p&gt;One feature I absolutely wanted was real-time token streaming.&lt;/p&gt;

&lt;p&gt;I didn't want users waiting ten seconds for an entire response.&lt;/p&gt;

&lt;p&gt;I wanted responses to appear immediately.&lt;/p&gt;

&lt;p&gt;That requirement pushed the project toward a fully reactive architecture.&lt;/p&gt;

&lt;p&gt;The flow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ollama
   ↓
Spring AI
   ↓
Flux&amp;lt;String&amp;gt;
   ↓
AiOrchestrator
   ↓
SSE Endpoint
   ↓
CLI Client
   ↓
Terminal Output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each token moves through the pipeline independently.&lt;/p&gt;

&lt;p&gt;The user starts seeing output almost immediately.&lt;/p&gt;

&lt;p&gt;The controller endpoint looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@PostMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/stream"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;produces&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MediaType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;TEXT_EVENT_STREAM_VALUE&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Flux&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ServerSentEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nd"&gt;@Valid&lt;/span&gt; &lt;span class="nd"&gt;@RequestBody&lt;/span&gt; &lt;span class="nc"&gt;ChatRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orchestrator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;(...)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
                    &lt;span class="nc"&gt;ServerSentEvent&lt;/span&gt;
                            &lt;span class="o"&gt;.&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;event&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"token"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result feels significantly faster than waiting for a complete response.&lt;/p&gt;

&lt;p&gt;Even when generation takes several seconds, users immediately know something is happening.&lt;/p&gt;

&lt;p&gt;That small change makes a noticeable difference to the user experience.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. The Whitespace Bug
&lt;/h2&gt;

&lt;p&gt;One of the strangest bugs I encountered involved spaces.&lt;/p&gt;

&lt;p&gt;Responses looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hellohowareyoutoday?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hello how are you today?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



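&lt;p&gt;The cause turned out to be Server-Sent Events.&lt;/p&gt;

&lt;p&gt;Leading whitespace inside tokens was being lost during transmission, most likely because SSE clients strip a single space that directly follows the colon in a &lt;code&gt;data:&lt;/code&gt; line.&lt;/p&gt;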

&lt;p&gt;The fix was surprisingly simple.&lt;/p&gt;

&lt;p&gt;Instead of sending raw text, I wrapped every token in JSON.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;jsonToken&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"{\"t\":\""&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\\"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"\\\\"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\""&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"\\\""&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
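                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\n"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"\\n"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\r"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"\\r"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\t"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"\\t"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;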
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"\"}"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The client then extracts the value from the JSON payload.&lt;/p&gt;

&lt;p&gt;Problem solved.&lt;/p&gt;

&lt;p&gt;Sometimes the hardest bugs are not AI-related at all.&lt;/p&gt;

&lt;p&gt;They're just spaces.&lt;/p&gt;
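
&lt;p&gt;To make the failure mode concrete, here is a small sketch under two assumptions: the server writes the token directly after &lt;code&gt;data:&lt;/code&gt;, and the client follows the usual SSE rule of dropping one leading space from the field value. &lt;code&gt;SseWhitespaceDemo&lt;/code&gt; and &lt;code&gt;parseDataLine&lt;/code&gt; are hypothetical names, and the parser is deliberately reduced to that single rule. The &lt;code&gt;jsonToken&lt;/code&gt; variant here also escapes other control characters, which JSON strings require.&lt;/p&gt;

```java
class SseWhitespaceDemo {

    // Mimics how an SSE client reads one "data" line: the text after the colon is the value,
    // and a single leading space in that value is dropped.
    static String parseDataLine(String line) {
        String value = line.substring(line.indexOf(':') + 1);
        return value.startsWith(" ") ? value.substring(1) : value;
    }

    // Wraps a token as {"t":"..."} and escapes the characters JSON strings do not allow raw.
    static String jsonToken(String token) {
        StringBuilder out = new StringBuilder("{\"t\":\"");
        for (char c : token.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (0x20 > c) {
                        // Remaining control characters become \\uXXXX escapes.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append("\"}").toString();
    }
}
```

&lt;p&gt;Sending the raw token " how" yields the line &lt;code&gt;data: how&lt;/code&gt;, which parses back as "how" with the space gone. Sending &lt;code&gt;jsonToken(" how")&lt;/code&gt; puts the brace first, so the space survives inside the JSON string.&lt;/p&gt;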




&lt;h2&gt;
  
  
  4. Working Memory
&lt;/h2&gt;

&lt;p&gt;One of the most common questions I receive is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How does Jarvis know today's date?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The answer is simple.&lt;/p&gt;

&lt;p&gt;We provide that information.&lt;/p&gt;

&lt;p&gt;Before every request, Jarvis generates a small working-memory block.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WorkingMemoryBuilder&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;sessionId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;currentTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
                &lt;span class="nc"&gt;ZonedDateTime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;now&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(...);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"""
                Date: %s
                User: %s
                Role: %s
                Session: %s
                Model: %s
                """&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;formatted&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;currentTime&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;sessionId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This memory is injected into every prompt.&lt;/p&gt;

&lt;p&gt;The AI isn't magically aware of the current date.&lt;/p&gt;

&lt;p&gt;The application simply tells it.&lt;/p&gt;

&lt;p&gt;Understanding that distinction helped me better understand how modern LLM applications actually work.&lt;/p&gt;

&lt;p&gt;Much of what appears intelligent is often carefully engineered context.&lt;/p&gt;
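
&lt;p&gt;The builder above elides the date format and reads the system clock directly. As a sketch of one way to fill that in, the version below takes a &lt;code&gt;java.time.Clock&lt;/code&gt; so the output can be pinned in tests. The class name &lt;code&gt;WorkingMemorySketch&lt;/code&gt; and the date pattern are assumptions; the article does not show which pattern Jarvis actually uses.&lt;/p&gt;

```java
import java.time.Clock;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class WorkingMemorySketch {

    // Assumed pattern: date, minute-level time, then the zone id.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm VV");

    private final Clock clock;

    WorkingMemorySketch(Clock clock) {
        this.clock = clock;
    }

    // Produces the same five fields as the article's builder, one per line.
    String build(String username, String role, String sessionId, String modelName) {
        String currentTime = ZonedDateTime.now(clock).format(FORMAT);
        return String.join("\n",
                "Date: " + currentTime,
                "User: " + username,
                "Role: " + role,
                "Session: " + sessionId,
                "Model: " + modelName) + "\n";
    }
}
```

&lt;p&gt;In production the clock would typically be &lt;code&gt;Clock.systemDefaultZone()&lt;/code&gt;; a test can pass &lt;code&gt;Clock.fixed(...)&lt;/code&gt; instead and assert on the exact date line.&lt;/p&gt;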




&lt;h2&gt;
  
  
  5. Prompt Assembly
&lt;/h2&gt;

&lt;p&gt;Every user request passes through a component called &lt;code&gt;PromptAssembler&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Its job is to construct the final prompt.&lt;/p&gt;

&lt;p&gt;The assembled prompt contains four pieces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;System instructions&lt;/li&gt;
&lt;li&gt;Working memory&lt;/li&gt;
&lt;li&gt;Session history&lt;/li&gt;
&lt;li&gt;Current user message&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Simplified version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;systemPrompt&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workingMemory&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addAll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;UserMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userMessage&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Prompt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This process gives the AI everything it needs to generate contextual responses.&lt;/p&gt;

&lt;p&gt;Without prompt assembly, the AI would only see the current message.&lt;/p&gt;

&lt;p&gt;With prompt assembly, it understands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who the user is&lt;/li&gt;
&lt;li&gt;previous conversation history&lt;/li&gt;
&lt;li&gt;current date and time&lt;/li&gt;
&lt;li&gt;session context&lt;/li&gt;
&lt;li&gt;assistant instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where much of the "assistant" behavior actually comes from.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Spring Shell 4.0
&lt;/h2&gt;

&lt;p&gt;Jarvis uses Spring Shell as its primary interface.&lt;/p&gt;

&lt;p&gt;One challenge was adapting to the changes introduced in Spring Shell 4.&lt;/p&gt;

&lt;p&gt;Previous versions used annotations such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@ShellComponent&lt;/span&gt;
&lt;span class="nd"&gt;@ShellMethod&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those annotations were removed.&lt;/p&gt;

&lt;p&gt;The new approach uses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AuthCommands&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Command&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"login"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Login to Jarvis"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;login&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"OK"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The migration wasn't difficult.&lt;/p&gt;

&lt;p&gt;The real challenge came from JLine integration.&lt;/p&gt;

&lt;p&gt;I encountered a circular dependency involving &lt;code&gt;LineReader&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The solution was lazy injection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;AuthCommands&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;CliStateManager&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;CliHttpClient&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nd"&gt;@Lazy&lt;/span&gt; &lt;span class="nc"&gt;LineReader&lt;/span&gt; &lt;span class="n"&gt;lineReader&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;http&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;lineReader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lineReader&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single annotation solved hours of debugging.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Reactive Security
&lt;/h2&gt;

&lt;p&gt;Spring Security behaves differently in reactive applications.&lt;/p&gt;

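&lt;p&gt;Traditional servlet-based applications keep the security context in a &lt;code&gt;ThreadLocal&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Reactive applications cannot do that.&lt;/p&gt;

&lt;p&gt;A single request may move across multiple threads, so anything bound to one thread can go missing along the way.&lt;/p&gt;

&lt;p&gt;Instead, Spring Security on WebFlux stores the security context in the Reactor Context.&lt;br&gt;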
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exchange&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;contextWrite&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;ReactiveSecurityContextHolder&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withAuthentication&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Authentication information travels with the reactive stream itself.&lt;/p&gt;

&lt;p&gt;Once I understood that concept, many WebFlux security patterns suddenly made much more sense.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;p&gt;Getting Jarvis running locally takes only a few minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Java 21+&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;Ollama&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1. Clone the Repository
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/sujankim/jarvis-ai-platform.git

&lt;span class="nb"&gt;cd &lt;/span&gt;jarvis-ai-platform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Download a Local Model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull llama3.1:8b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a one-time download of approximately 5 GB.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Configure Environment Variables
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Update the &lt;code&gt;.env&lt;/code&gt; file and set a secure JWT secret.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;JARVIS_JWT_SECRET=your-secret-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
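
&lt;p&gt;If you need a value for &lt;code&gt;JARVIS_JWT_SECRET&lt;/code&gt;, one option is to generate it from a cryptographically secure random source. The sketch below is a standalone helper, not part of the Jarvis repository. The class name &lt;code&gt;JwtSecretGenerator&lt;/code&gt; and the 32-byte length are assumptions: 256 bits is the minimum key size RFC 7518 requires for HS256, but check which signing algorithm and secret format the project actually expects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.security.SecureRandom;
import java.util.Base64;

public class JwtSecretGenerator {

    /** Returns numBytes of secure random data, Base64-encoded on a single line. */
    static String generate(int numBytes) {
        byte[] secret = new byte[numBytes];
        new SecureRandom().nextBytes(secret);
        return Base64.getEncoder().encodeToString(secret);
    }

    public static void main(String[] args) {
        // Assumption: a 32-byte (256-bit) secret; adjust to the algorithm the project uses.
        System.out.println("JARVIS_JWT_SECRET=" + generate(32));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Paste the printed line into &lt;code&gt;.env&lt;/code&gt; in place of the placeholder value.&lt;/p&gt;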



&lt;h3&gt;
  
  
  4. Start PostgreSQL
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Run Jarvis
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;server

./mvnw spring-boot:run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example Session
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jarvis:&amp;gt; login

Username: dravin
Password: ******

Welcome back, Dravin!

jarvis:&amp;gt; chat

You: Hello Jarvis! What day is it today?

Jarvis: Today is Wednesday, June 3, 2026.

You: exit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, everything is running locally on your machine.&lt;/p&gt;

&lt;p&gt;No cloud dependency is required.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Building Jarvis taught me far more than I expected.&lt;/p&gt;

&lt;p&gt;Some lessons came from AI.&lt;/p&gt;

&lt;p&gt;Most came from software engineering.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reactive Programming Is Harder Than Traditional MVC
&lt;/h2&gt;

&lt;p&gt;There is no point pretending otherwise.&lt;/p&gt;

&lt;p&gt;A traditional Spring MVC application is easier to build.&lt;/p&gt;

&lt;p&gt;A traditional JPA repository is easier to understand.&lt;/p&gt;

&lt;p&gt;A blocking HTTP client is easier to debug.&lt;/p&gt;

&lt;p&gt;But AI applications are fundamentally streaming applications.&lt;/p&gt;

&lt;p&gt;Responses often take several seconds to generate.&lt;/p&gt;

&lt;p&gt;Blocking threads while waiting for tokens simply doesn't make sense.&lt;/p&gt;

&lt;p&gt;The reactive stack allowed me to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stream responses in real time&lt;/li&gt;
&lt;li&gt;Handle multiple conversations efficiently&lt;/li&gt;
&lt;li&gt;Avoid thread starvation&lt;/li&gt;
&lt;li&gt;Build a true end-to-end streaming pipeline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The learning curve was steep.&lt;/p&gt;

&lt;p&gt;However, AI workloads are fundamentally different from typical CRUD applications.&lt;/p&gt;

&lt;p&gt;When a language model spends 10–30 seconds generating a response, blocking threads becomes expensive.&lt;/p&gt;

&lt;p&gt;Reactive streaming solves that problem elegantly.&lt;/p&gt;

&lt;p&gt;Instead of waiting for the entire response to finish, tokens flow through the system as they are generated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ollama
   ↓
Spring AI
   ↓
Flux&amp;lt;String&amp;gt;
   ↓
Server-Sent Events
   ↓
CLI Client
   ↓
Terminal Output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
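
&lt;p&gt;On the client end of this pipeline, each chunk arrives as a Server-Sent Events &lt;code&gt;data:&lt;/code&gt; line. The following is a simplified, JDK-only sketch of that last step and not the actual Jarvis CLI code. The class name &lt;code&gt;SseTokenParser&lt;/code&gt; and the sample lines are hypothetical, and the parser handles only single-line &lt;code&gt;data:&lt;/code&gt; fields, ignoring comments, event names, and multi-line events.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SseTokenParser {

    private static final String DATA_PREFIX = "data:";

    /** Extracts the payload of a single "data:" line, or empty for any other line. */
    static Optional&amp;lt;String&amp;gt; extractData(String line) {
        if (!line.startsWith(DATA_PREFIX)) {
            return Optional.empty(); // blank separators, comments, other fields
        }
        String value = line.substring(DATA_PREFIX.length());
        // In the SSE format, one leading space after the colon is not part of the value.
        return Optional.of(value.startsWith(" ") ? value.substring(1) : value);
    }

    /** Turns raw SSE lines into the ordered list of token payloads. */
    static List&amp;lt;String&amp;gt; tokens(Stream&amp;lt;String&amp;gt; lines) {
        return lines.map(SseTokenParser::extractData)
                .flatMap(Optional::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream&amp;lt;String&amp;gt; lines = Stream.of("data: Hello", "", "data:,", "", ": keep-alive", "data: Jarvis");
        // Print each parsed token without a newline, as a terminal client might.
        tokens(lines).forEach(System.out::print);
        System.out.println();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A real client would feed this from the HTTP response body line by line instead of a fixed list, so that output appears while the model is still generating.&lt;/p&gt;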



&lt;p&gt;The result is a much more responsive experience.&lt;/p&gt;

&lt;p&gt;Users begin receiving output immediately instead of waiting for a complete response.&lt;/p&gt;

&lt;p&gt;For AI applications, that difference feels enormous.&lt;/p&gt;

&lt;p&gt;The payoff was worth it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Spring AI Feels Like Spring
&lt;/h2&gt;

&lt;p&gt;One thing I appreciate about Spring AI is that it doesn't feel like a separate ecosystem.&lt;/p&gt;

&lt;p&gt;It feels like Spring.&lt;/p&gt;

&lt;p&gt;Builders.&lt;/p&gt;

&lt;p&gt;Dependency injection.&lt;/p&gt;

&lt;p&gt;Configuration properties.&lt;/p&gt;

&lt;p&gt;Auto-configuration.&lt;/p&gt;

&lt;p&gt;The same conventions Java developers already know.&lt;/p&gt;

&lt;p&gt;Creating an Ollama client feels familiar.&lt;/p&gt;

&lt;p&gt;Creating a Gemini client feels familiar.&lt;/p&gt;

&lt;p&gt;Switching between providers feels familiar.&lt;/p&gt;

&lt;p&gt;That consistency significantly reduces friction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Local AI Is Better Than Most People Think
&lt;/h2&gt;

&lt;p&gt;Before building Jarvis, I assumed local models would be too slow or too limited.&lt;/p&gt;

&lt;p&gt;I was wrong.&lt;/p&gt;

&lt;p&gt;Running &lt;code&gt;llama3.1:8b&lt;/code&gt; locally produces surprisingly useful results.&lt;/p&gt;

&lt;p&gt;It performs remarkably well for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;General questions&lt;/li&gt;
&lt;li&gt;Brainstorming&lt;/li&gt;
&lt;li&gt;Coding assistance&lt;/li&gt;
&lt;li&gt;Documentation help&lt;/li&gt;
&lt;li&gt;Learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Is it as capable as the largest cloud models?&lt;/p&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;Does it need to be?&lt;/p&gt;

&lt;p&gt;Also no.&lt;/p&gt;

&lt;p&gt;For many personal workflows, local models are already good enough.&lt;/p&gt;

&lt;p&gt;And the privacy benefits are enormous.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Matters More Than Models
&lt;/h2&gt;

&lt;p&gt;This was probably the biggest lesson.&lt;/p&gt;

&lt;p&gt;People often focus entirely on the model.&lt;/p&gt;

&lt;p&gt;GPT.&lt;/p&gt;

&lt;p&gt;Claude.&lt;/p&gt;

&lt;p&gt;Gemini.&lt;/p&gt;

&lt;p&gt;Llama.&lt;/p&gt;

&lt;p&gt;Mistral.&lt;/p&gt;

&lt;p&gt;But real AI applications are mostly architecture.&lt;/p&gt;

&lt;p&gt;Prompt management.&lt;/p&gt;

&lt;p&gt;Memory.&lt;/p&gt;

&lt;p&gt;Security.&lt;/p&gt;

&lt;p&gt;Persistence.&lt;/p&gt;

&lt;p&gt;Streaming.&lt;/p&gt;

&lt;p&gt;Observability.&lt;/p&gt;

&lt;p&gt;Provider routing.&lt;/p&gt;

&lt;p&gt;Error handling.&lt;/p&gt;

&lt;p&gt;The model is only one piece of the system.&lt;/p&gt;

&lt;p&gt;Building Jarvis reinforced that idea repeatedly.&lt;/p&gt;
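
&lt;p&gt;As one illustration of how much of this is plain architecture, here is a hedged sketch of provider routing with a fallback. Everything in it is hypothetical: the &lt;code&gt;ChatProvider&lt;/code&gt; interface, the &lt;code&gt;ProviderRouter&lt;/code&gt; class, and the stub providers are invented for this example and do not reflect the Jarvis codebase or the Spring AI API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.List;

public class ProviderRouter {

    /** Hypothetical minimal abstraction over a chat model provider. */
    interface ChatProvider {
        String name();
        String complete(String prompt);
    }

    private final List&amp;lt;ChatProvider&amp;gt; providersInPriorityOrder;

    ProviderRouter(List&amp;lt;ChatProvider&amp;gt; providersInPriorityOrder) {
        this.providersInPriorityOrder = List.copyOf(providersInPriorityOrder);
    }

    /** Tries each provider in order and returns the first successful answer, prefixed with the provider name. */
    String complete(String prompt) {
        RuntimeException lastFailure = null;
        for (ChatProvider provider : providersInPriorityOrder) {
            try {
                return provider.name() + ": " + provider.complete(prompt);
            } catch (RuntimeException e) {
                lastFailure = e; // remember the error and fall through to the next provider
            }
        }
        throw new IllegalStateException("No provider could answer", lastFailure);
    }

    /** Builds a stub provider that either echoes the prompt or fails. */
    static ChatProvider stub(String name, boolean available) {
        return new ChatProvider() {
            public String name() { return name; }
            public String complete(String prompt) {
                if (!available) throw new IllegalStateException(name + " is unavailable");
                return "echo " + prompt;
            }
        };
    }

    public static void main(String[] args) {
        ProviderRouter router = new ProviderRouter(List.of(stub("local", false), stub("cloud", true)));
        System.out.println(router.complete("hello"));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;None of this logic depends on which model sits behind a provider, which is the point: routing and error handling are ordinary software design.&lt;/p&gt;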




&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Jarvis is still early.&lt;/p&gt;

&lt;p&gt;Version 0.1.0 focuses on the foundation.&lt;/p&gt;

&lt;p&gt;Future releases will add significantly more capabilities.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 2 — Memory System
&lt;/h2&gt;

&lt;p&gt;Currently, conversations are session-based.&lt;/p&gt;

&lt;p&gt;Future versions will introduce persistent memory.&lt;/p&gt;

&lt;p&gt;Planned features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long-term memory&lt;/li&gt;
&lt;li&gt;User preferences&lt;/li&gt;
&lt;li&gt;Redis caching&lt;/li&gt;
&lt;li&gt;Semantic retrieval&lt;/li&gt;
&lt;li&gt;pgvector integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is simple:&lt;/p&gt;

&lt;p&gt;Jarvis should remember useful information across sessions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 3 — RAG Engine
&lt;/h2&gt;

&lt;p&gt;Retrieval-Augmented Generation is one of the most requested features.&lt;/p&gt;

&lt;p&gt;Planned capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF ingestion&lt;/li&gt;
&lt;li&gt;Knowledge bases&lt;/li&gt;
&lt;li&gt;Semantic search&lt;/li&gt;
&lt;li&gt;Document chat&lt;/li&gt;
&lt;li&gt;Context-aware answers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of asking only the model, users will be able to ask their own documents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 4 — Tool Engine
&lt;/h2&gt;

&lt;p&gt;The next major step is action-taking.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Weather tools&lt;/li&gt;
&lt;li&gt;Search tools&lt;/li&gt;
&lt;li&gt;Calculators&lt;/li&gt;
&lt;li&gt;External integrations&lt;/li&gt;
&lt;li&gt;MCP support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point Jarvis becomes more than a conversational assistant.&lt;/p&gt;

&lt;p&gt;It becomes an assistant that can actually do things.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 5 — Voice
&lt;/h2&gt;

&lt;p&gt;Eventually Jarvis will gain voice capabilities.&lt;/p&gt;

&lt;p&gt;The long-term vision is a genuinely useful local AI assistant that remains private and self-hosted.&lt;/p&gt;







&lt;h2&gt;
  
  
  Phase 6 — Agent System
&lt;/h2&gt;

&lt;p&gt;Longer-term plans include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent planning&lt;/li&gt;
&lt;li&gt;Multi-step execution&lt;/li&gt;
&lt;li&gt;Workflow automation&lt;/li&gt;
&lt;li&gt;Tool orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ultimate goal is to move beyond chat and build a true personal AI assistant.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 7 - Web UI
&lt;/h2&gt;

&lt;p&gt;Phase 7 adds a web interface powered by the same backend.&lt;/p&gt;

&lt;p&gt;Planned features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time streaming chat&lt;/li&gt;
&lt;li&gt;Session sidebar&lt;/li&gt;
&lt;li&gt;Document upload UI&lt;/li&gt;
&lt;li&gt;Memory management&lt;/li&gt;
&lt;li&gt;Settings panel&lt;/li&gt;
&lt;li&gt;Agent dashboard&lt;/li&gt;
&lt;li&gt;Voice interface&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Contributing
&lt;/h2&gt;

&lt;p&gt;Jarvis is open source and actively looking for contributors.&lt;/p&gt;

&lt;p&gt;Whether you're experienced with Java or just learning Spring Boot, contributions are welcome.&lt;/p&gt;

&lt;p&gt;Some beginner-friendly areas include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documentation improvements&lt;/li&gt;
&lt;li&gt;Unit tests&lt;/li&gt;
&lt;li&gt;CLI enhancements&lt;/li&gt;
&lt;li&gt;New provider integrations&lt;/li&gt;
&lt;li&gt;Bug fixes&lt;/li&gt;
&lt;li&gt;Architecture diagrams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/sujankim/jarvis-ai-platform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you'd like to contribute, start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CONTRIBUTING.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and look for issues labeled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;good first issue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;When I started this project, I wasn't trying to build the next ChatGPT.&lt;/p&gt;

&lt;p&gt;I was trying to answer a simple question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can modern AI applications be built effectively in Java?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After building Jarvis, my answer is absolutely yes.&lt;/p&gt;

&lt;p&gt;The Java ecosystem has matured rapidly.&lt;/p&gt;

&lt;p&gt;Spring Boot 4 provides an excellent foundation.&lt;/p&gt;

&lt;p&gt;Spring AI removes much of the complexity involved in provider integrations.&lt;/p&gt;

&lt;p&gt;WebFlux enables real-time streaming.&lt;/p&gt;

&lt;p&gt;Ollama makes local AI practical.&lt;/p&gt;

&lt;p&gt;Most importantly, the ecosystem finally feels ready.&lt;/p&gt;

&lt;p&gt;If you're a Java developer who has been watching the AI space from the sidelines, there has never been a better time to start building.&lt;/p&gt;

&lt;p&gt;The tools exist.&lt;/p&gt;

&lt;p&gt;The frameworks exist.&lt;/p&gt;

&lt;p&gt;The community is growing.&lt;/p&gt;

&lt;p&gt;Now it's time to build.&lt;/p&gt;

&lt;p&gt;If you found this article useful, I'd love to hear your thoughts.&lt;/p&gt;

&lt;p&gt;Questions, suggestions, architecture feedback, and contributions are always welcome.&lt;/p&gt;

&lt;p&gt;⭐ If you'd like to support the project, consider starring the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/sujankim/jarvis-ai-platform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Your AI. Your Data. Your Machine.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>opensource</category>
      <category>springboot</category>
    </item>
  </channel>
</rss>
