This post was originally published on jamjet.dev.
TL;DR
If you're building AI agents in Java today, your options for persistent memory range from "store the last 20 chat messages in Postgres" to "run a Python service in a sidecar container and call it over HTTP." There is no Java-native equivalent to Mem0, Zep, or Letta — the libraries Python developers reach for when they need real memory.
This post is a tour of every option a Java developer has in April 2026, why most of them stop at chat history, what "real memory" should actually mean, and one library we shipped to fill the gap.
The scenario every Java AI developer recognises
You're building an AI agent in Spring Boot. Maybe it's a customer support copilot, maybe it's a coding assistant, maybe it's a research agent. You wire up Spring AI or LangChain4j, write a few tools, and the first conversation works.
Then your user comes back the next day. The agent doesn't remember them. It doesn't remember they're allergic to peanuts. It doesn't remember they're working on the Acme migration. It doesn't remember they prefer verbose explanations. Every conversation starts from zero.
You search for "Java AI agent memory" and end up with three kinds of results:
- Tutorials on how to store chat messages in Postgres
- Marketing pages for Mem0 and Zep — Python only
- GitHub issues asking why there's no Java SDK
This is the gap.
"Memory" means three different things
Before we tour the libraries, we need to be precise. There are at least three different things people mean when they say "agent memory":
1. Conversation history. The last N messages of the current session. Solved problem — every framework ships this.
2. State checkpointing. Snapshots of agent execution state for resume and replay. Solved by LangGraph, Koog persistence, Temporal-style runtimes.
3. Long-term knowledge memory. Facts about the user, their preferences, their projects, their history — extracted from conversations, stored durably, retrievable across sessions, and de-conflicted when they change. This is what Mem0 and Zep do. It is not solved on the JVM.
The rest of this post is about the third one.
What real memory needs
- Fact extraction. An LLM reads a conversation and pulls out discrete, atomic facts.
- Conflict detection. When a new fact contradicts an old one, the system invalidates the old fact.
- Hybrid retrieval. Vector + keyword + graph walk fused together.
- Temporal reasoning. Facts have validity windows.
- Token-budgeted context assembly. Pick which facts go in the prompt and respect the budget.
- Decay and consolidation. Stale facts fade, frequent facts get promoted, duplicates merge.
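To make the conflict-detection and temporal points concrete, here is a minimal plain-Java sketch of a fact store with validity windows. All names here are hypothetical illustration, not Engram's API; a real system would also need embeddings, an LLM in the loop, and dedup of repeated identical facts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of long-term fact storage with validity windows.
// A null validTo means the fact is still current; a conflicting update
// closes the old window instead of deleting the row, so history survives.
class FactStore {
    record Fact(String subject, String attribute, String value,
                long validFrom, Long validTo) {
        boolean isCurrent() { return validTo == null; }
    }

    private final List<Fact> facts = new ArrayList<>();

    // Add a fact; invalidate any current fact for the same
    // subject+attribute that carries a different value.
    void add(String subject, String attribute, String value, long now) {
        for (int i = 0; i < facts.size(); i++) {
            Fact f = facts.get(i);
            if (f.isCurrent() && f.subject().equals(subject)
                    && f.attribute().equals(attribute)
                    && !f.value().equals(value)) {
                // Close the old fact's validity window rather than delete it.
                facts.set(i, new Fact(subject, attribute, f.value(), f.validFrom(), now));
            }
        }
        facts.add(new Fact(subject, attribute, value, now, null));
    }

    // Facts whose validity window is still open for this subject.
    List<Fact> current(String subject) {
        return facts.stream()
                .filter(f -> f.subject().equals(subject) && f.isCurrent())
                .toList();
    }
}
```

If the user said "I live in Austin" in January and "I just moved to Denver" in June, a store like this answers "where do they live?" with Denver while still being able to answer "where did they live in March?" with Austin.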
Tour: every option Java developers have today
LangChain4j ChatMemory
Most popular JVM AI framework. Ships ChatMemory interface with MessageWindowChatMemory and TokenWindowChatMemory. Persistence via developer-implemented ChatMemoryStore.
What it does: stores message objects, respects token/count limits. What it does not do: extract facts, deduplicate, retrieve semantically, reason about time. The docs are explicit — ChatMemory is a container abstraction.
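To be concrete about what a message-window container does, and why it isn't knowledge memory, here's the idea in plain Java. This is illustrative only, not the actual LangChain4j classes:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Plain-Java sketch of a message-window chat memory: keep the last N
// messages, evict the oldest. (Illustrative; not the LangChain4j code.)
class WindowMemory {
    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    WindowMemory(int maxMessages) { this.maxMessages = maxMessages; }

    // Append a message; evict from the front once the window is full.
    void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    List<String> messages() { return List.copyOf(messages); }
}
```

Notice what eviction means: once "I'm allergic to peanuts" scrolls out of the window, it is gone. Nothing was extracted, so nothing survives.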
Spring AI ChatMemory
Shipped GA in 2025 with broad backend support: JDBC, Cassandra, Mongo, Neo4j, Cosmos DB. Three advisors plug it into ChatClient.
The VectorStoreChatMemoryAdvisor is the closest thing to "semantic memory" — it writes conversation messages into your VectorStore and retrieves them by similarity at prompt time. But it indexes raw messages, not extracted facts. No entity model, no relationship graph, no conflict detection.
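The distinction matters in practice. Here's a plain-Java sketch (not Spring AI code; keyword matching stands in for vector similarity) of what recall over raw messages returns versus recall over extracted facts:

```java
import java.util.List;

// Sketch of the difference between indexing raw messages and indexing
// extracted facts. (Illustrative only; substring match stands in for
// embedding similarity.)
class RecallDemo {
    static final List<String> RAW_MESSAGES = List.of(
        "hey! quick thing before the meeting - I'm allergic to peanuts, "
        + "also can you resend the Acme doc? thanks!");

    static final List<String> EXTRACTED_FACTS = List.of(
        "user is allergic to peanuts");

    // Naive recall: return every stored item matching the query term.
    static List<String> recall(List<String> store, String term) {
        return store.stream().filter(s -> s.contains(term)).toList();
    }
}
```

Recalling "peanuts" from the raw-message store drags the meeting chatter and the Acme doc request into your prompt along with the allergy; the fact store returns one clean atom. At scale, that difference is your token budget.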
Google ADK for Java
Ships 1.0.0 with two memory implementations: InMemoryMemoryService (keyword matching only) and VertexAiMemoryBankService (Vertex AI only). Memory Bank is excellent but Google Cloud-locked.
Koog (JetBrains)
Kotlin-first framework with AgentMemory storing facts by Concept, Subject, Scope. Closest competitor on the "facts about subjects" axis.
Two caveats: Java consumption is awkward, and GitHub issue JetBrains/koog#1001 documents that AgentMemory floods prompts as facts accumulate — no token budgeting.
Embabel
Rod Johnson's JVM agent framework. Uses a blackboard pattern — shared state per agent run.
Per the maintainers: "in Embabel it's not about conversational memory so much as domain objects that are stored in the blackboard during the flow." Long-term memory is an explicit non-goal.
Mem0 Java SDK (the one that doesn't exist)
The top Google result is me.pgthinker:mem0-client-java, a community wrapper at version 0.1.3, last updated nine months ago, with 9 GitHub stars. It's a thin REST client requiring a Python Mem0 server alongside your JVM app.
No official Mem0 Java client exists. Python and Node.js only.
Zep Java SDK (also doesn't exist)
Zep's official clients are Python, TypeScript, and Go. No Java SDK.
DIY (what most teams actually do)
When Java teams need real memory today, they assemble:
- Postgres + pgvector (or Qdrant) for embeddings
- JdbcChatMemoryRepository for messages
- Custom advisor that calls an LLM to extract facts
- Custom retrieval layer combining vector and keyword search
- Nightly cron job for decay and dedup
- Custom token-budgeting in the prompt builder
Roughly 1,500–3,000 lines of bespoke Java per team. Quietly diverges between projects. Rarely gets temporal reasoning right. Almost never gets consolidation right.
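As one example of the bespoke code involved, the token-budgeting piece reduces to "rank the facts, then greedily pack until the budget is spent." A minimal sketch, with hypothetical names and length/4 as a crude stand-in for real tokenization:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of token-budgeted context assembly: sort facts by relevance
// score, then greedily pack them under the budget. A real implementation
// would count tokens with the model's actual tokenizer. (Illustrative.)
class ContextAssembler {
    record ScoredFact(String text, double score) {}

    // Rough heuristic: ~4 characters per token.
    static int approxTokens(String s) { return Math.max(1, s.length() / 4); }

    static List<String> assemble(List<ScoredFact> facts, int tokenBudget) {
        List<ScoredFact> ranked = new ArrayList<>(facts);
        ranked.sort((a, b) -> Double.compare(b.score(), a.score()));
        List<String> selected = new ArrayList<>();
        int used = 0;
        for (ScoredFact f : ranked) {
            int cost = approxTokens(f.text());
            if (used + cost > tokenBudget) continue; // over budget, try cheaper facts
            selected.add(f.text());
            used += cost;
        }
        return selected;
    }
}
```

This is exactly the piece Koog's AgentMemory is missing (per the prompt-flooding issue above), and it is deceptively easy to get subtly wrong once decay scores and recency enter the ranking.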
The pattern
Every Java memory option lives in one of two boxes:
- Chat history persistence (LangChain4j, Spring AI core, Embabel)
- State checkpointing (LangGraph4j, Koog persistence)
Nothing in between. No JVM-native library that does fact extraction + conflict resolution + temporal graph + hybrid retrieval + consolidation in one dependency.
The Python ecosystem has had Mem0 since 2024 and Zep/Graphiti since early 2025. The Java ecosystem is roughly 18 months behind.
What we built
I run JamJet. As we built our agent runtime, the memory gap kept showing up. So we built a memory layer.
Engram is a durable memory system that covers the checklist from earlier in this post:
- Fact extraction from conversation messages via LLM
- Conflict detection — vector similarity threshold plus LLM resolution
- Hybrid retrieval — vector + SQLite FTS5 keyword + graph walk
- Temporal knowledge graph with validity windows
- Token-budgeted context assembly with three output formats
- 5-operation consolidation engine: decay, promote, dedup, summarize, reflect
- MCP server option
Runs against SQLite by default. No Postgres, no Qdrant, no Neo4j, no Python sidecar.
```java
import dev.jamjet.engram.EngramClient;
import dev.jamjet.engram.EngramConfig;

import java.util.List;
import java.util.Map;

try (var memory = new EngramClient(EngramConfig.defaults())) {
    memory.add(
        List.of(
            Map.of("role", "user", "content", "I'm allergic to peanuts and live in Austin"),
            Map.of("role", "assistant", "content", "Got it, I'll remember that.")
        ),
        "alice", null, null
    );

    var context = memory.context(
        "what should I cook for dinner",
        "alice", null, 1000, "system_prompt"
    );

    System.out.println(context.get("text"));
}
```
Maven Central:
```xml
<dependency>
    <groupId>dev.jamjet</groupId>
    <artifactId>jamjet-sdk</artifactId>
    <version>0.4.3</version>
</dependency>
```
Apache 2.0. Rust runtime published as jamjet-engram on crates.io.
What it doesn't do (yet)
- No Spring Boot auto-configuration yet (starter on roadmap)
- No JDBC backend (SQLite-first, Postgres in 0.5.x)
- No managed cloud option
- No published LongMemEval / DMR scores yet (benchmarks running, not going to cherry-pick)
Try it
```xml
<dependency>
    <groupId>dev.jamjet</groupId>
    <artifactId>jamjet-sdk</artifactId>
    <version>0.4.3</version>
</dependency>
```
Or run it as an MCP server:
```shell
cargo install jamjet-engram-server
engram serve --db memory.db
```
GitHub: github.com/jamjet-labs/jamjet
If you've been quietly rolling your own memory layer in Java, I'd love to hear what you ended up with. Reach out via GitHub issues or the comments below.