How we're teaching a Java AI assistant to remember you across sessions
What Is Jarvis?
A month ago, I released Jarvis AI Platform — an open-source, local-first AI assistant built entirely with Java and the Spring ecosystem.
Phase 1 (released as v0.1.0) gave Jarvis the ability to:
- Chat via CLI with streaming token responses
- Authenticate with JWT + Argon2id
- Persist conversations to PostgreSQL
- Fall back from Ollama to Gemini automatically
But Jarvis had one critical limitation: it forgot everything between sessions.
Part 1 of this series:
Building a Local-First AI Assistant with Spring Boot 4 and Spring AI 2.0 (DEV Community)
Phase 2 changes that.
The Memory Problem
Without memory, every session starts blank.
Session 1
You: "I'm building an AI platform in Java"
Jarvis: "That sounds interesting!"
Session 2 (next day)
You: "How is my project going?"
Jarvis: "I don't know what project you're referring to."
This is a fundamental limitation of LLMs: the model has no persistent memory, so it forgets everything the moment the conversation ends.
Phase 2 solves this by building memory AROUND the AI.
The Architecture
Our memory system has four layers.
Layer 1: Working Memory (Phase 1 — Done)
Fresh context injected into every single prompt:
- Current date and time
- Username and role
- Session ID and active model
This is why Jarvis knows "it's Tuesday" without searching the internet.
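As a rough illustration of what "injected into every prompt" can look like, here is a minimal sketch that renders the working-memory fields into a text block to prepend to the prompt. The WorkingContext record, the render method, and the output wording are assumptions made for this example, not Jarvis's actual classes.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical sketch: renders per-request context as a prompt preamble. */
final class WorkingMemorySketch {

    /** Illustrative holder; the field names are assumptions, not Jarvis's real types. */
    record WorkingContext(LocalDateTime now, String username, String role,
                          String sessionId, String activeModel) {}

    // Includes the weekday name so the model can answer "what day is it?"
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEEE, yyyy-MM-dd HH:mm", Locale.ENGLISH);

    /** Builds the text block that would be prepended to every prompt. */
    static String render(WorkingContext ctx) {
        return String.join("\n",
                "Current date and time: " + ctx.now().format(FORMAT),
                "User: " + ctx.username() + " (role: " + ctx.role() + ")",
                "Session: " + ctx.sessionId(),
                "Active model: " + ctx.activeModel());
    }

    public static void main(String[] args) {
        WorkingContext ctx = new WorkingContext(
                LocalDateTime.of(2025, 1, 7, 9, 30),
                "dravin", "USER", "session-1", "example-model");
        System.out.println(render(ctx));
    }
}
```

Because the block is rebuilt on every request, the model always sees the current time rather than whatever was true when the session started.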
Layer 2: Session Memory (Phase 1 — Done)
Conversation history within a single session:
- Last 20 messages loaded from PostgreSQL
- Cached in Redis for ~1 ms access (vs ~50 ms from PostgreSQL)
- Invalidated after each message exchange
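The three bullets above describe a cache-aside pattern over a bounded message window. Here is a minimal, dependency-free sketch of that idea. Two in-memory maps stand in for PostgreSQL and Redis, and the class and method names are hypothetical. The real implementation is reactive and talks to actual stores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of cache-aside session history with a 20-message window. */
final class SessionMemorySketch {
    static final int WINDOW = 20;

    // Stand-ins for the real stores (assumption: plain maps instead of PostgreSQL/Redis)
    private final Map<String, List<String>> database = new HashMap<>();
    private final Map<String, List<String>> cache = new HashMap<>();

    /** Counts how often we had to fall through to the slow store. */
    int databaseReads = 0;

    /** Returns the last WINDOW messages, preferring the cache. */
    List<String> recentMessages(String sessionId) {
        List<String> cached = cache.get(sessionId);
        if (cached != null) {
            return cached; // cache hit
        }
        databaseReads++; // cache miss: read from the database and populate the cache
        List<String> all = database.getOrDefault(sessionId, List.of());
        List<String> window =
                List.copyOf(all.subList(Math.max(0, all.size() - WINDOW), all.size()));
        cache.put(sessionId, window);
        return window;
    }

    /** Persists a message and invalidates the cached window for that session. */
    void append(String sessionId, String message) {
        database.computeIfAbsent(sessionId, k -> new ArrayList<>()).add(message);
        cache.remove(sessionId);
    }
}
```

Invalidating on write keeps the cache from serving a stale window. The trade-off is that the first read after each write goes back to the database.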
Layer 3: Long-Term Memory (Phase 2 — Building)
Facts Jarvis learns about you across ALL sessions:
- FACT: "User's name is Dravin"
- GOAL: "Building Jarvis AI Platform"
- PREFERENCE: "Prefers detailed explanations"
- CONTEXT: "Uses Windows 11, 16GB RAM"
- EVENT: "Published first article on Dev.to"
Stored in PostgreSQL with importance scores (0.0–1.0).
Higher importance memories are injected into prompts first.
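To make "higher importance first" concrete, here is a small sketch of ranking memories by score and rendering the top ones into a prompt section. The Memory record, the MemoryType enum, the prompt wording, and the scores used in the usage note are illustrative assumptions. This part of Phase 2 is still being built.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: pick the most important memories for prompt injection. */
final class MemoryRankingSketch {

    enum MemoryType { FACT, GOAL, PREFERENCE, CONTEXT, EVENT }

    /** Illustrative shape of a stored memory; importance must be within 0.0 to 1.0. */
    record Memory(MemoryType type, String content, double importance) {
        Memory {
            if (importance < 0.0 || importance > 1.0) {
                throw new IllegalArgumentException("importance must be in [0.0, 1.0]");
            }
        }
    }

    /** Returns at most 'limit' memories, highest importance first. */
    static List<Memory> topByImportance(List<Memory> memories, int limit) {
        return memories.stream()
                .sorted(Comparator.comparingDouble(Memory::importance).reversed())
                .limit(limit)
                .toList();
    }

    /** Renders the selected memories as a block that could be added to a prompt. */
    static String toPromptSection(List<Memory> memories) {
        StringBuilder sb = new StringBuilder("Known about the user:\n");
        for (Memory m : memories) {
            sb.append("- [").append(m.type()).append("] ")
              .append(m.content()).append('\n');
        }
        return sb.toString();
    }
}
```

For example, given a FACT scored 0.9, a PREFERENCE scored 0.5, and an EVENT scored 0.3, topByImportance(memories, 2) keeps the FACT and the PREFERENCE, in that order. Capping the count keeps the injected section from crowding out the conversation itself.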
Layer 4: Semantic Memory (Phase 2 — Building)
Meaning-based search using pgvector embeddings.
User asks:
How is my coding project going?
Text search finds:
nothing
Semantic search finds:
User is building Jarvis AI Platform
Even though the words are completely different.
How pgvector Works In Jarvis
pgvector is a PostgreSQL extension that adds vector similarity search.
Here is how we use it.
Step 1: Install pgvector
We build a custom Docker image because the standard PostgreSQL image does not include pgvector.
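FROM postgres:16-alpine
# Toolchain needed to compile the extension
RUN apk add --no-cache git build-base clang llvm postgresql-dev
# The PostgreSQL build configuration references clang-19; point that name at Alpine's clang
RUN ln -sf "$(which clang)" /usr/local/bin/clang-19
# Build and install pgvector v0.7.4
RUN git clone --branch v0.7.4 --depth 1 \
    https://github.com/pgvector/pgvector.git /tmp/pgvector \
    && cd /tmp/pgvector && make && make install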
This was one of the hardest parts. Alpine Linux packages clang under different names than PostgreSQL expects.
Step 2: Add Vector Column To Memories
ALTER TABLE memories
ADD COLUMN embedding vector(768);
We use Ollama's nomic-embed-text model which produces 768-dimensional vectors.
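On the Java side, an embedding typically arrives as a float array and has to be turned into pgvector's text form, a bracketed comma-separated list such as [0.1,0.2,0.3], before it can be bound to a vector(768) parameter. Here is a minimal sketch of that conversion with a dimension check. The class and method names are hypothetical, and how Jarvis actually binds the value is not shown here.

```java
/** Hypothetical sketch: convert an embedding to pgvector's text representation. */
final class VectorLiteralSketch {

    /** Matches the vector(768) column used for nomic-embed-text embeddings. */
    static final int DIMENSIONS = 768;

    /** Formats the embedding as "[v1,v2,...]" after validating its length. */
    static String toVectorLiteral(float[] embedding, int expectedDimensions) {
        if (embedding.length != expectedDimensions) {
            throw new IllegalArgumentException(
                    "expected " + expectedDimensions
                            + " dimensions but got " + embedding.length);
        }
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            // Reject NaN and infinity up front rather than sending them to the database
            if (!Float.isFinite(embedding[i])) {
                throw new IllegalArgumentException("non-finite value at index " + i);
            }
            if (i > 0) {
                sb.append(',');
            }
            sb.append(embedding[i]);
        }
        return sb.append(']').toString();
    }

    /** Convenience overload for the 768-dimensional column. */
    static String toVectorLiteral(float[] embedding) {
        return toVectorLiteral(embedding, DIMENSIONS);
    }
}
```

Checking the length in application code gives a clearer error than a failed insert, and it catches the case where someone swaps in an embedding model with a different output size.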
Step 3: Semantic Search Function
CREATE FUNCTION search_memories_by_embedding(
    p_user_id UUID,
    p_embedding vector(768),
    p_limit INTEGER DEFAULT 5
)
RETURNS TABLE (content TEXT, similarity FLOAT)
LANGUAGE sql STABLE
AS $$
    -- similarity = 1 - cosine distance, so closer vectors score higher
    SELECT m.content,
           1 - (m.embedding <=> p_embedding) AS similarity
    FROM memories m
    WHERE m.user_id = p_user_id
      AND m.embedding IS NOT NULL
    ORDER BY m.embedding <=> p_embedding ASC
    LIMIT p_limit;
$$;
The <=> operator computes cosine distance.
Lower distance means higher similarity.
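For intuition about the numbers the function returns, here is the same cosine distance computed by hand in plain Java. This is only an illustration of the math and the class name is made up. In Jarvis the computation happens inside PostgreSQL.

```java
/** Hypothetical sketch: cosine distance as 1 minus cosine similarity. */
final class CosineDistanceSketch {

    /**
     * Returns 1 - (a . b) / (|a| * |b|).
     * 0 means same direction, 1 means orthogonal, 2 means opposite.
     */
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            // Direction is undefined for a zero vector
            throw new IllegalArgumentException("zero vector has no direction");
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0};
        System.out.println(cosineDistance(query, new double[] {2.0, 0.0}));  // same direction
        System.out.println(cosineDistance(query, new double[] {0.0, 3.0}));  // orthogonal
        System.out.println(cosineDistance(query, new double[] {-1.0, 0.0})); // opposite
    }
}
```

Only direction matters, not length, which is why the similarity column in the SQL function can be read as 1 minus this value.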
Redis Session Caching
Every chat message used to query PostgreSQL for session history (~50ms).
With Redis caching:
- First message: PostgreSQL (~50ms) then cached in Redis
- Subsequent messages: Redis HIT (~1ms)
- Cache format: JSON Lines with role, ID, content, error flag, timestamp
We learned the hard way that caching full R2DBC record objects causes Jackson deserialization issues.
Our solution was to serialize only the essential fields.
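Here is a minimal sketch of what "only the essential fields" can look like: a small record with the five cached fields, written as one JSON object per line. The CachedMessage record and the hand-written escaping are assumptions made to keep the example dependency-free. In a Spring project you would normally let Jackson serialize a plain DTO like this one.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: cache only essential message fields as JSON Lines. */
final class SessionCacheFormatSketch {

    /** Flat DTO with no persistence metadata, so it round-trips cleanly. */
    record CachedMessage(String role, String id, String content,
                         boolean error, String timestamp) {}

    /** Escapes a string for use inside a JSON string literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        // Remaining control characters become four-digit hex escapes
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }

    /** Serializes one message as a single-line JSON object. */
    static String toJsonLine(CachedMessage m) {
        return "{\"role\":\"" + escape(m.role()) + "\""
                + ",\"id\":\"" + escape(m.id()) + "\""
                + ",\"content\":\"" + escape(m.content()) + "\""
                + ",\"error\":" + m.error()
                + ",\"timestamp\":\"" + escape(m.timestamp()) + "\"}";
    }

    /** Joins messages with newlines: one JSON object per line. */
    static String toJsonLines(List<CachedMessage> messages) {
        return messages.stream()
                .map(SessionCacheFormatSketch::toJsonLine)
                .collect(Collectors.joining("\n"));
    }
}
```

The important detail is that newlines inside message content are escaped, so each cached message always occupies exactly one line and the value can be split back into messages safely.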
What Is Left To Build
Phase 2 is approximately 40% complete.
Remaining:
- Memory entity + repository
- MemoryService CRUD operations
- Memory extraction from conversations
- Memory injection into prompts
- Semantic memory search
- CLI memory commands
- Conversation summarization
All of these are open issues on GitHub and we are actively looking for contributors.
What We Learned
1. Building pgvector in Docker is harder than expected
PostgreSQL on Alpine uses clang-19 for LLVM/JIT support, but Alpine only provides newer versions.
We had to create compatibility symlinks for the PostgreSQL build process.
2. WebFlux + R2DBC + Redis requires careful architecture
You cannot mix blocking and reactive code carelessly.
Our rule:
- R2DBC for application queries
- JDBC for Flyway migrations and pgvector setup
- Redis via ReactiveRedisTemplate
- Never call .block() outside the CLI layer
3. Spring AI 2.0 is ready for production
We have been on Spring AI M8 since day one.
The ChatClient API is clean, provider abstraction works well, and streaming is reliable.
Java developers no longer need Python for AI applications.
Contributing
Jarvis is open source (Apache 2.0) and actively seeking contributors.
Good First Issues
- Add memory list CLI command
- Add memory REST API endpoints
- Write unit tests for MemoryService
Advanced Issues
- Memory extraction from conversations
- pgvector semantic search integration
- Conversation summarization
GitHub:
https://github.com/sujankim/jarvis-ai-platform
If you are a Java developer who has felt left out of the AI revolution, you no longer have to be.
The tools are here.
Your AI. Your Data. Your Machine.
Follow me for Part 3: implementing semantic memory retrieval with pgvector.