This post was originally published on jamjet.dev.
TL;DR
If you're building AI agents in Java today, your options for persistent memory range from "store the last 20 chat messages in Postgres" to "run a Python service in a sidecar container and call it over HTTP." There is no Java-native equivalent to Mem0, Zep, or Letta — the libraries Python developers reach for when they need real memory.
This post is a tour of every option a Java developer has in April 2026, why most of them stop at chat history, what "real memory" should actually mean, and one library we shipped to fill the gap.
The scenario every Java AI developer recognises
You're building an AI agent in Spring Boot. Maybe it's a customer support copilot, maybe it's a coding assistant, maybe it's a research agent. You wire up Spring AI or LangChain4j, write a few tools, and the first conversation works.
Then your user comes back the next day. The agent doesn't remember them. It doesn't remember they're allergic to peanuts. It doesn't remember they're working on the Acme migration. It doesn't remember they prefer verbose explanations. Every conversation starts from zero.
You search for "Java AI agent memory" and end up with three kinds of results:
- Tutorials on how to store chat messages in Postgres
- Marketing pages for Mem0 and Zep — Python only
- GitHub issues asking why there's no Java SDK
This is the gap.
"Memory" means three different things
Before we tour the libraries, we need to be precise. There are at least three different things people mean when they say "agent memory":
1. Conversation history. The last N messages of the current session. Solved problem — every framework ships this.
2. State checkpointing. Snapshots of agent execution state for resume and replay. Solved by LangGraph, Koog persistence, Temporal-style runtimes.
3. Long-term knowledge memory. Facts about the user, their preferences, their projects, their history — extracted from conversations, stored durably, retrievable across sessions, and de-conflicted when they change. This is what Mem0 and Zep do. It is not solved on the JVM.
The rest of this post is about the third one.
What real memory needs
- Fact extraction. An LLM reads a conversation and pulls out discrete, atomic facts.
- Conflict detection. When a new fact contradicts an old one, the system invalidates the old fact.
- Hybrid retrieval. Vector + keyword + graph walk fused together.
- Temporal reasoning. Facts have validity windows.
- Token-budgeted context assembly. Pick which facts go in the prompt and respect the budget.
- Decay and consolidation. Stale facts fade, frequent facts get promoted, duplicates merge.
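To make the conflict-detection and temporal points concrete, here is a minimal plain-Java sketch of a fact store with validity windows. All names here are hypothetical illustration, not Engram's API; a real system would also need embeddings, an LLM in the loop, and dedup of repeated identical facts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of long-term fact storage with validity windows.
// A null validTo means the fact is still current; a conflicting update
// closes the old window instead of deleting the row, so history survives.
class FactStore {
    record Fact(String subject, String attribute, String value,
                long validFrom, Long validTo) {
        boolean isCurrent() { return validTo == null; }
    }

    private final List<Fact> facts = new ArrayList<>();

    // Add a fact; invalidate any current fact for the same
    // subject+attribute that carries a different value.
    void add(String subject, String attribute, String value, long now) {
        for (int i = 0; i < facts.size(); i++) {
            Fact f = facts.get(i);
            if (f.isCurrent() && f.subject().equals(subject)
                    && f.attribute().equals(attribute)
                    && !f.value().equals(value)) {
                // Close the old fact's validity window rather than delete it.
                facts.set(i, new Fact(subject, attribute, f.value(), f.validFrom(), now));
            }
        }
        facts.add(new Fact(subject, attribute, value, now, null));
    }

    // Facts whose validity window is still open for this subject.
    List<Fact> current(String subject) {
        return facts.stream()
                .filter(f -> f.subject().equals(subject) && f.isCurrent())
                .toList();
    }
}
```

If the user said "I live in Austin" in January and "I just moved to Denver" in June, a store like this answers "where do they live?" with Denver while still being able to answer "where did they live in March?" with Austin.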
Tour: every option Java developers have today
LangChain4j ChatMemory
Most popular JVM AI framework. Ships ChatMemory interface with MessageWindowChatMemory and TokenWindowChatMemory. Persistence via developer-implemented ChatMemoryStore.
What it does: stores message objects, respects token/count limits. What it does not do: extract facts, deduplicate, retrieve semantically, reason about time. The docs are explicit — ChatMemory is a container abstraction.
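To be concrete about what a message-window container does, and why it isn't knowledge memory, here's the idea in plain Java. This is illustrative only, not the actual LangChain4j classes:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Plain-Java sketch of a message-window chat memory: keep the last N
// messages, evict the oldest. (Illustrative; not the LangChain4j code.)
class WindowMemory {
    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    WindowMemory(int maxMessages) { this.maxMessages = maxMessages; }

    // Append a message; evict from the front once the window is full.
    void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    List<String> messages() { return List.copyOf(messages); }
}
```

Notice what eviction means: once "I'm allergic to peanuts" scrolls out of the window, it is gone. Nothing was extracted, so nothing survives.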
Spring AI ChatMemory
Shipped GA in 2025 with broad backend support: JDBC, Cassandra, Mongo, Neo4j, Cosmos DB. Three advisors plug it into ChatClient.
The VectorStoreChatMemoryAdvisor is the closest thing to "semantic memory" — it writes conversation messages into your VectorStore and retrieves them by similarity at prompt time. But it indexes raw messages, not extracted facts. No entity model, no relationship graph, no conflict detection.
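The distinction matters in practice. Here's a plain-Java sketch (not Spring AI code; keyword matching stands in for vector similarity) of what recall over raw messages returns versus recall over extracted facts:

```java
import java.util.List;

// Sketch of the difference between indexing raw messages and indexing
// extracted facts. (Illustrative only; substring match stands in for
// embedding similarity.)
class RecallDemo {
    static final List<String> RAW_MESSAGES = List.of(
        "hey! quick thing before the meeting - I'm allergic to peanuts, "
        + "also can you resend the Acme doc? thanks!");

    static final List<String> EXTRACTED_FACTS = List.of(
        "user is allergic to peanuts");

    // Naive recall: return every stored item matching the query term.
    static List<String> recall(List<String> store, String term) {
        return store.stream().filter(s -> s.contains(term)).toList();
    }
}
```

Recalling "peanuts" from the raw-message store drags the meeting chatter and the Acme doc request into your prompt along with the allergy; the fact store returns one clean atom. At scale, that difference is your token budget.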
Google ADK for Java
Ships 1.0.0 with two memory implementations: InMemoryMemoryService (keyword matching only) and VertexAiMemoryBankService (Vertex AI only). Memory Bank is excellent but Google Cloud-locked.
Koog (JetBrains)
Kotlin-first framework with AgentMemory storing facts by Concept, Subject, Scope. Closest competitor on the "facts about subjects" axis.
Two caveats: Java consumption is awkward, and GitHub issue JetBrains/koog#1001 documents that AgentMemory floods prompts as facts accumulate — no token budgeting.
Embabel
Rod Johnson's JVM agent framework. Uses a blackboard pattern — shared state per agent run.
Per the maintainers: "in Embabel it's not about conversational memory so much as domain objects that are stored in the blackboard during the flow." Long-term memory is an explicit non-goal.
Mem0 Java SDK (the one that doesn't exist)
The top Google result is me.pgthinker:mem0-client-java, a community wrapper at version 0.1.3, last updated nine months ago, with 9 GitHub stars. It's a thin REST client requiring a Python Mem0 server alongside your JVM app.
No official Mem0 Java client exists. Python and Node.js only.
Zep Java SDK (also doesn't exist)
Zep's official clients are Python, TypeScript, and Go. No Java SDK.
DIY (what most teams actually do)
When Java teams need real memory today, they assemble:
- Postgres + pgvector (or Qdrant) for embeddings
- JdbcChatMemoryRepository for messages
- Custom advisor that calls an LLM to extract facts
- Custom retrieval layer combining vector and keyword search
- Nightly cron job for decay and dedup
- Custom token-budgeting in the prompt builder
Roughly 1,500–3,000 lines of bespoke Java per team. Quietly diverges between projects. Rarely gets temporal reasoning right. Almost never gets consolidation right.
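As one example of the bespoke code involved, the token-budgeting piece reduces to "rank the facts, then greedily pack until the budget is spent." A minimal sketch, with hypothetical names and length/4 as a crude stand-in for real tokenization:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of token-budgeted context assembly: sort facts by relevance
// score, then greedily pack them under the budget. A real implementation
// would count tokens with the model's actual tokenizer. (Illustrative.)
class ContextAssembler {
    record ScoredFact(String text, double score) {}

    // Rough heuristic: ~4 characters per token.
    static int approxTokens(String s) { return Math.max(1, s.length() / 4); }

    static List<String> assemble(List<ScoredFact> facts, int tokenBudget) {
        List<ScoredFact> ranked = new ArrayList<>(facts);
        ranked.sort((a, b) -> Double.compare(b.score(), a.score()));
        List<String> selected = new ArrayList<>();
        int used = 0;
        for (ScoredFact f : ranked) {
            int cost = approxTokens(f.text());
            if (used + cost > tokenBudget) continue; // over budget, try cheaper facts
            selected.add(f.text());
            used += cost;
        }
        return selected;
    }
}
```

This is exactly the piece Koog's AgentMemory is missing (per the prompt-flooding issue above), and it is deceptively easy to get subtly wrong once decay scores and recency enter the ranking.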
The pattern
Every Java memory option lives in one of two boxes:
- Chat history persistence (LangChain4j, Spring AI core, Embabel)
- State checkpointing (LangGraph4j, Koog persistence)
Nothing in between. No JVM-native library that does fact extraction + conflict resolution + temporal graph + hybrid retrieval + consolidation in one dependency.
The Python ecosystem has had Mem0 since 2024 and Zep/Graphiti since early 2025. The Java ecosystem is roughly 18 months behind.
What we built
I run JamJet. As we built our agent runtime, the memory gap kept showing up. So we built a memory layer.
Engram is a durable memory system that covers the checklist from earlier in this post:
- Fact extraction from conversation messages via LLM
- Conflict detection — vector similarity threshold plus LLM resolution
- Hybrid retrieval — vector + SQLite FTS5 keyword + graph walk
- Temporal knowledge graph with validity windows
- Token-budgeted context assembly with three output formats
- 5-operation consolidation engine: decay, promote, dedup, summarize, reflect
- MCP server option
Runs against SQLite by default. No Postgres, no Qdrant, no Neo4j, no Python sidecar.
```java
import dev.jamjet.engram.EngramClient;
import dev.jamjet.engram.EngramConfig;

import java.util.List;
import java.util.Map;

try (var memory = new EngramClient(EngramConfig.defaults())) {
    memory.add(
        List.of(
            Map.of("role", "user", "content", "I'm allergic to peanuts and live in Austin"),
            Map.of("role", "assistant", "content", "Got it, I'll remember that.")
        ),
        "alice", null, null
    );

    var context = memory.context(
        "what should I cook for dinner",
        "alice", null, 1000, "system_prompt"
    );

    System.out.println(context.get("text"));
}
```
Maven Central:
```xml
<dependency>
    <groupId>dev.jamjet</groupId>
    <artifactId>jamjet-sdk</artifactId>
    <version>0.4.3</version>
</dependency>
```
Apache 2.0. Rust runtime published as jamjet-engram on crates.io.
What it doesn't do (yet)
- No Spring Boot auto-configuration yet (starter on roadmap)
- No JDBC backend (SQLite-first, Postgres in 0.5.x)
- No managed cloud option
- No published LongMemEval / DMR scores yet (benchmarks running, not going to cherry-pick)
Try it
```xml
<dependency>
    <groupId>dev.jamjet</groupId>
    <artifactId>jamjet-sdk</artifactId>
    <version>0.4.3</version>
</dependency>
```
Or run it as an MCP server:
```shell
cargo install jamjet-engram-server
engram serve --db memory.db
```
GitHub: github.com/jamjet-labs/jamjet
If you've been quietly rolling your own memory layer in Java, I'd love to hear what you ended up with. Reach out via GitHub issues or the comments below.