Sujan Lamichhane

Posted on Jun 11

Jarvis AI Platform: Building Long-Term Memory with pgvector and Spring AI

#ai #java #springboot #opensource

How we're teaching a Java AI assistant to remember you across sessions

What Is Jarvis?

A month ago, I released Jarvis AI Platform — an open-source, local-first AI assistant built entirely with Java and the Spring ecosystem.

Phase 1 (released as v0.1.0) gave Jarvis the ability to:

Chat via CLI with streaming token responses
Authenticate with JWT + Argon2id
Persist conversations to PostgreSQL
Fall back from Ollama to Gemini automatically

But Jarvis had one critical limitation: it forgot everything between sessions.

Here is Part 1 of this series:

Building a Local-First AI Assistant with Spring Boot 4 and Spring AI 2.0 - DEV Community

Phase 2 changes that.

The Memory Problem

Without memory, every session starts blank.

Session 1

You: "I'm building an AI platform in Java"

Jarvis: "That sounds interesting!"

Session 2 (next day)

You: "How is my project going?"

Jarvis: "I don't know what project you're referring to."

This is a fundamental limitation of LLMs: they are stateless. The model keeps nothing between requests, so anything it appears to "remember" has to be sent again in the prompt. Once the conversation ends, that context is gone.

Phase 2 solves this by building memory AROUND the AI.

The Architecture

Our memory system has four layers.

Layer 1: Working Memory (Phase 1 — Done)

Fresh context injected into every single prompt:

Current date and time
Username and role
Session ID and active model

This is why Jarvis knows "it's Tuesday" without searching the internet.
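As a rough illustration of the working-memory layer, the sketch below renders the values listed above into a short prompt preamble. The `WorkingMemory` record, its field names, and the output format are assumptions made for this example, not the actual Jarvis classes.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical holder for the per-request values listed above.
record WorkingMemory(LocalDateTime now, String username, String role,
                     String sessionId, String activeModel) {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEEE, d MMMM yyyy HH:mm", Locale.ENGLISH);

    // Renders the values as a short preamble for the system prompt.
    String toPromptPreamble() {
        return String.join("\n",
                "Current date and time: " + now.format(FORMAT),
                "User: " + username + " (role: " + role + ")",
                "Session: " + sessionId,
                "Active model: " + activeModel);
    }
}
```

Because the preamble is rebuilt for every request, the model always sees the current weekday and time without any lookup.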

Layer 2: Session Memory (Phase 1 — Done)

Conversation history within a single session:

Last 20 messages loaded from PostgreSQL
Cached in Redis for ~1ms access (vs ~50ms from PostgreSQL)
Invalidated after each message exchange
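The load, cache, and invalidate cycle can be sketched as a cache-aside pattern. To stay dependency-free, this example uses a `ConcurrentHashMap` in place of Redis and a plain `Function` in place of the PostgreSQL query. Both are stand-ins, and `SessionHistoryCache` is a hypothetical name rather than a class from the Jarvis codebase.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch. The map stands in for Redis and the function
// stands in for the database query; both are assumptions.
class SessionHistoryCache {
    static final int WINDOW = 20; // last 20 messages, as described above

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> loadFromDatabase;
    int databaseReads = 0; // exposed only to make cache hits observable

    SessionHistoryCache(Function<String, List<String>> loadFromDatabase) {
        this.loadFromDatabase = loadFromDatabase;
    }

    // Returns the cached window, loading and trimming it on a miss.
    List<String> history(String sessionId) {
        return cache.computeIfAbsent(sessionId, id -> {
            databaseReads++;
            List<String> all = loadFromDatabase.apply(id);
            int from = Math.max(0, all.size() - WINDOW);
            return List.copyOf(all.subList(from, all.size()));
        });
    }

    // Drops the cached window so the next read sees new messages.
    void invalidate(String sessionId) {
        cache.remove(sessionId);
    }
}
```

Reading the same session twice reaches the loader only once. After `invalidate`, the next read goes back to the loader.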

Layer 3: Long-Term Memory (Phase 2 — Building)

Facts Jarvis learns about you across ALL sessions:

FACT: "User's name is Dravin"
GOAL: "Building Jarvis AI Platform"
PREFERENCE: "Prefers detailed explanations"
CONTEXT: "Uses Windows 11, 16GB RAM"
EVENT: "Published first article on Dev.to"

Stored in PostgreSQL with importance scores (0.0–1.0).

Higher-importance memories are injected into prompts first.
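A minimal sketch of importance-first injection follows. The `Memory` record, the `MemoryType` enum, and the `MemoryPromptBuilder` helper are hypothetical shapes chosen for illustration. The real entity is still on the to-do list, so treat the field names as assumptions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// The five categories shown in the examples above.
enum MemoryType { FACT, GOAL, PREFERENCE, CONTEXT, EVENT }

// Hypothetical shape of one long-term memory row.
record Memory(MemoryType type, String content, double importance) {
    Memory {
        if (importance < 0.0 || importance > 1.0) {
            throw new IllegalArgumentException("importance must be in [0.0, 1.0]");
        }
    }
}

class MemoryPromptBuilder {
    // Keeps the `limit` most important memories, highest score first,
    // and renders each one as "TYPE: content" on its own line.
    static String build(List<Memory> memories, int limit) {
        return memories.stream()
                .sorted(Comparator.comparingDouble(Memory::importance).reversed())
                .limit(limit)
                .map(m -> m.type() + ": " + m.content())
                .collect(Collectors.joining("\n"));
    }
}
```

Capping the number of injected memories keeps the prompt within the model's context budget, and sorting by score decides which memories make the cut.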

Layer 4: Semantic Memory (Phase 2 — Building)

Meaning-based search using pgvector embeddings.

User asks:

How is my coding project going?

Text search finds:

nothing

Semantic search finds:

User is building Jarvis AI Platform

Even though the words are completely different.

How pgvector Works In Jarvis

pgvector is a PostgreSQL extension that adds vector similarity search.

Here is how we use it.

Step 1: Install pgvector

We build a custom Docker image because the standard PostgreSQL image does not include pgvector.

FROM postgres:16-alpine

RUN apk add --no-cache git build-base clang llvm postgresql-dev

RUN ln -sf "$(which clang)" /usr/local/bin/clang-19

RUN git clone --branch v0.7.4 --depth 1 \
    https://github.com/pgvector/pgvector.git /tmp/pgvector \
    && cd /tmp/pgvector && make && make install

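This was one of the hardest parts. The PostgreSQL build looks for a binary named clang-19, but Alpine installs clang under a different name, which is why the Dockerfile creates the symlink before building pgvector.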

Step 2: Add Vector Column To Memories

ALTER TABLE memories
ADD COLUMN embedding vector(768);

We use Ollama's nomic-embed-text model, which produces 768-dimensional vectors.
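One way to pass an embedding from Java to this column is pgvector's text form, a bracketed comma-separated list such as `[0.1,0.2,0.3]`. The helper below is a hypothetical sketch (`PgVectorLiteral` is not part of Jarvis or of any library). It also checks the dimension count up front, since a `vector(768)` column rejects vectors of any other length.

```java
import java.util.StringJoiner;

class PgVectorLiteral {
    static final int DIMENSIONS = 768; // matches vector(768) above

    // Formats an embedding in pgvector's text form, e.g. "[0.1,0.2,0.3]".
    static String format(float[] embedding) {
        if (embedding.length != DIMENSIONS) {
            throw new IllegalArgumentException(
                    "expected " + DIMENSIONS + " dimensions, got " + embedding.length);
        }
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : embedding) {
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }
}
```

The resulting string can be bound as an ordinary text parameter and cast to `vector` in the SQL statement. Whether Jarvis does this or uses a driver-level vector type is not something this sketch assumes.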

Step 3: Semantic Search Function

CREATE FUNCTION search_memories_by_embedding(
    p_user_id UUID,
    p_embedding vector(768),
    p_limit INTEGER DEFAULT 5
)
RETURNS TABLE (content TEXT, similarity FLOAT)
LANGUAGE sql
STABLE
AS $$
    -- <=> returns cosine distance, so 1 - distance is cosine similarity
    SELECT m.content,
           1 - (m.embedding <=> p_embedding) AS similarity
    FROM memories m
    WHERE m.user_id = p_user_id
      AND m.embedding IS NOT NULL
    -- smallest distance first, i.e. most similar memories first
    ORDER BY m.embedding <=> p_embedding ASC
    LIMIT p_limit;
$$;

The <=> operator computes cosine distance.

Lower distance means higher similarity.
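To make the numbers concrete, here is the same calculation in plain Java: cosine distance is 1 minus the cosine of the angle between two vectors. This illustrates the formula only and is not how pgvector is implemented internally. The class name is made up for the example.

```java
class CosineDistance {
    // Returns 1 - cos(angle between a and b).
    // 0 means same direction, 1 means orthogonal, 2 means opposite.
    // A zero-length vector yields NaN because its direction is undefined.
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Only direction matters here, not magnitude: scaling either vector by a positive factor leaves the distance unchanged.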

Redis Session Caching

Every chat message used to query PostgreSQL for session history (~50ms).

With Redis caching:

First message: PostgreSQL (~50ms) then cached in Redis
Subsequent messages: Redis HIT (~1ms)
Cache format: JSON Lines with role, ID, content, error flag, timestamp

We learned the hard way that caching full R2DBC record objects causes Jackson deserialization issues.

Our solution was to serialize only the essential fields.
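The idea of caching only the essential fields can be sketched with a small dedicated record that is written out as JSON Lines. `CachedMessage` and its field names are assumptions based on the cache format listed above. The JSON is hand-rolled here so the example has no dependencies, whereas a real project would more likely hand this record to Jackson's `ObjectMapper`.

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical cache DTO carrying only the fields listed above.
record CachedMessage(String id, String role, String content,
                     boolean error, Instant timestamp) {

    // Serializes this message as a single-line JSON object.
    String toJsonLine() {
        return "{\"id\":\"" + escape(id) + "\",\"role\":\"" + escape(role)
                + "\",\"content\":\"" + escape(content)
                + "\",\"error\":" + error
                + ",\"timestamp\":\"" + timestamp + "\"}";
    }

    // Escapes characters that would break a JSON string or split the line.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.toString();
    }

    // JSON Lines: one object per line, in conversation order.
    static String toJsonLines(List<CachedMessage> messages) {
        return messages.stream()
                .map(CachedMessage::toJsonLine)
                .collect(Collectors.joining("\n"));
    }
}
```

Keeping the cached shape separate from the database record means the cache format does not change whenever the persistence model does.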

What Is Left To Build

Phase 2 is approximately 40% complete.

Remaining:

Memory entity + repository
MemoryService CRUD operations
Memory extraction from conversations
Memory injection into prompts
Semantic memory search
CLI memory commands
Conversation summarization

All of these are open issues on GitHub and we are actively looking for contributors.

What We Learned

1. Building pgvector in Docker is harder than expected

PostgreSQL on Alpine uses clang-19 for LLVM/JIT support, but Alpine only provides newer versions.

We had to create compatibility symlinks for the PostgreSQL build process.

2. WebFlux + R2DBC + Redis requires careful architecture

You cannot mix blocking and reactive code carelessly.

Our rule:

R2DBC for application queries
JDBC for Flyway migrations and pgvector setup
Redis via ReactiveRedisTemplate
Never call .block() outside the CLI layer

3. Spring AI 2.0 is ready for production

We have been on the Spring AI M8 milestone build since day one.

The ChatClient API is clean, provider abstraction works well, and streaming is reliable.

Java developers no longer need Python for AI applications.

Contributing

Jarvis is open source (Apache 2.0) and actively seeking contributors.

Good First Issues

Add memory list CLI command
Add memory REST API endpoints
Write unit tests for MemoryService

Advanced Issues

Memory extraction from conversations
pgvector semantic search integration
Conversation summarization

GitHub:

https://github.com/sujankim/jarvis-ai-platform

If you are a Java developer who has felt left out of the AI revolution, you no longer have to be.

The tools are here.

Your AI. Your Data. Your Machine.

Follow me for Part 3: implementing semantic memory retrieval with pgvector.