Here's a problem every developer building AI agents has hit: your agent is smart for exactly one session. Close the chat, come back tomorrow, and it has no idea who you are, what you were working on, or what it already told you.
The standard fix is to dump the chat history back into the context window. That works until it doesn't — context windows fill up, latency spikes, costs balloon, and the agent still can't reason about when things happened or what changed.
Graphiti takes a fundamentally different approach. Instead of stuffing raw transcripts into a context window, it builds a temporal knowledge graph that tracks entities, relationships, and facts — including when those facts became true and when they were superseded.
What Graphiti Actually Is
Graphiti is an open-source framework by Zep for building and querying temporal context graphs for AI agents. It's the engine behind Zep's managed memory platform, but it's fully usable standalone.
The core idea: instead of treating memory as "a big pile of text the agent can search," Graphiti structures memory as a graph of entities (people, products, concepts), relationships between them, and facts with explicit time validity.
A fact in Graphiti looks like: "Kendra prefers Adidas shoes (as of March 2026)." If she switches to Nike in June, the old fact gets invalidated — not deleted — and the new one takes its place. Both are queryable. You can ask "what does Kendra prefer now?" and "what did Kendra prefer in March?"
This is what "temporal" means in practice. Every piece of information has a timeline.
Why This Matters More Than You Think
If you've built agents that run for more than a few turns, you've hit at least one of these:
The context window tax. Shoving full chat histories into the context window is the brute-force approach to agent memory. It works for short conversations, but at 115K tokens, you're looking at 30-second response times and massive API bills. Zep's benchmarks show their graph-based approach uses ~1.6K tokens for the same queries (under 2% of the baseline) with 90% lower latency.
The "it forgot" problem. Without structured memory, agents can't track state changes. If a user updates their preference, the old preference is still sitting somewhere in the transcript. The agent might retrieve the stale one. Graphiti's temporal invalidation handles this automatically — old facts are marked as superseded, not just buried under newer text.
The temporal reasoning gap. "Which happened first — when I updated the config or when the build broke?" Standard RAG can't answer this reliably. It retrieves text chunks by semantic similarity, not chronological order. Graphiti's bi-temporal tracking makes time-based queries first-class operations.
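Here's a minimal sketch of why two timestamps per record help with that last one. It is plain Python, not Graphiti's API: the `Event` class and its `occurred_at` / `recorded_at` fields are hypothetical names for "when it happened" and "when the agent was told about it".

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    description: str
    occurred_at: datetime   # event time: when it happened in the world
    recorded_at: datetime   # ingestion time: when the agent learned of it


def ts(day, hour):
    """Hypothetical helper: a UTC timestamp in March 2026."""
    return datetime(2026, 3, day, hour, tzinfo=timezone.utc)


events = [
    # Reported first, but happened second.
    Event("build broke", occurred_at=ts(10, 14), recorded_at=ts(10, 15)),
    # Mentioned a day later, but happened first.
    Event("config updated", occurred_at=ts(10, 9), recorded_at=ts(11, 10)),
]

by_event_time = sorted(events, key=lambda e: e.occurred_at)
by_ingestion = sorted(events, key=lambda e: e.recorded_at)

print(by_event_time[0].description)  # config updated
print(by_ingestion[0].description)   # build broke
```

Transcript order only gives you the second ordering. Keeping the event time as its own field is what lets "which happened first?" be answered by a sort instead of a guess.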
How It Works Under the Hood
Graphiti's architecture has three core layers:
1. Episodes (the raw data)
Everything that goes into Graphiti starts as an episode — a chunk of raw data, whether it's a chat message, a JSON document, or unstructured text. Episodes are the ground truth. Every derived fact traces back to the episode that produced it.
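To picture the "every fact traces back to an episode" idea, here's a toy sketch. The `Episode` and `DerivedFact` classes and the `provenance` helper are hypothetical stand-ins for illustration, not Graphiti's own types.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Episode:
    """A raw input exactly as it arrived (hypothetical stand-in type)."""
    episode_id: str
    body: str
    received_at: datetime


@dataclass
class DerivedFact:
    """A statement extracted from one or more episodes."""
    text: str
    source_episode_ids: list[str] = field(default_factory=list)


def provenance(fact: DerivedFact, episodes: dict[str, Episode]) -> list[str]:
    """Return the raw bodies of the episodes a fact was derived from."""
    return [episodes[eid].body for eid in fact.source_episode_ids]


episodes = {
    "ep-1": Episode(
        "ep-1",
        "I switched from VS Code to Cursor last week.",
        datetime(2026, 3, 9, tzinfo=timezone.utc),
    ),
}
fact = DerivedFact("User uses Cursor", source_episode_ids=["ep-1"])
print(provenance(fact, episodes))
```

Because the fact only stores episode ids, the original wording is never lost: you can always go back and check what the user actually said.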
2. Entities and Relationships (the graph)
From episodes, Graphiti extracts entities (nodes) and relationships (edges). An LLM processes the raw data and identifies who/what is involved and how they relate to each other.
The interesting part: you can define your own entity and edge types upfront using Pydantic models (prescribed ontology), or let Graphiti discover structure from your data (learned ontology). Start simple, add structure as patterns emerge.
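A prescribed ontology might look something like the sketch below. The `Developer` and `Tool` models and their fields are made up for illustration. My understanding is that `add_episode` takes a name-to-model mapping through an `entity_types` argument, but treat that as an assumption and check the docs for your installed version.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Developer(BaseModel):
    """A person who writes software (hypothetical entity type)."""
    preferred_editor: Optional[str] = Field(
        default=None, description="Editor or IDE the developer currently uses"
    )
    primary_language: Optional[str] = Field(
        default=None, description="Programming language used most often"
    )


class Tool(BaseModel):
    """A piece of software a developer works with (hypothetical entity type)."""
    category: Optional[str] = Field(
        default=None, description="Kind of tool, e.g. 'editor' or 'database'"
    )


# Mapping from type name to model. Assumption: a mapping shaped like this
# is what Graphiti expects for custom entity types.
entity_types = {"Developer": Developer, "Tool": Tool}

example = Developer(preferred_editor="Cursor")
print(example)
```

Every field is optional on purpose: an extractor often sees only part of the picture in a single message, so a model that demands everything up front would reject most real inputs.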
3. Temporal Validity (what makes it different)
Every fact in the graph carries a validity window. When new information contradicts an existing fact, the old fact gets an end timestamp. It's not deleted — it's invalidated. This means you can query the graph at any point in time and get the state of the world as it was then.
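The invalidate-don't-delete rule is easy to sketch in plain Python, using the Kendra example from earlier. This is a toy model, not Graphiti's implementation: `Fact` and `FactStore` are hypothetical, and the store naively treats any new value for the same subject and predicate as a contradiction, whereas the real system relies on an LLM to make that call.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_at: datetime                     # when the fact became true
    invalid_at: Optional[datetime] = None  # when it was superseded, if ever


class FactStore:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, subject, predicate, value, at):
        """Record a new fact and close any open fact it contradicts."""
        for f in self.facts:
            same_slot = (f.subject, f.predicate) == (subject, predicate)
            if same_slot and f.invalid_at is None:
                f.invalid_at = at  # invalidate, do not delete
        self.facts.append(Fact(subject, predicate, value, valid_at=at))

    def as_of(self, subject, predicate, when):
        """Return the value that was valid at `when`, or None."""
        for f in self.facts:
            if (f.subject, f.predicate) != (subject, predicate):
                continue
            if f.valid_at <= when and (f.invalid_at is None or when < f.invalid_at):
                return f.value
        return None


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


store = FactStore()
store.assert_fact("Kendra", "prefers_shoes", "Adidas", at=utc(2026, 3, 1))
store.assert_fact("Kendra", "prefers_shoes", "Nike", at=utc(2026, 6, 1))

print(store.as_of("Kendra", "prefers_shoes", utc(2026, 4, 15)))  # Adidas
print(store.as_of("Kendra", "prefers_shoes", utc(2026, 7, 1)))   # Nike
print(len(store.facts))                                          # 2
```

Both facts are still in the store after the update. The only thing that changed on the old one is its end timestamp, which is exactly what makes the April query answerable.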
This is a fundamentally different model from vector-based RAG, where you're just doing similarity search over chunks with no concept of "this information replaced that information."
Graphiti vs. GraphRAG vs. Standard RAG
| Aspect | Standard RAG | GraphRAG | Graphiti |
|---|---|---|---|
| Data updates | Batch reindex | Batch recompute | Incremental, real-time |
| Time handling | None | Basic timestamps | Bi-temporal with auto-invalidation |
| Contradictions | Retrieves both old and new | LLM summarization | Automatic invalidation, history preserved |
| Retrieval | Vector similarity | LLM summarization chains | Hybrid: semantic + keyword + graph traversal |
| Query latency | Sub-second | Seconds to tens of seconds | Sub-second |
| Schema | None | Fixed clusters | Custom Pydantic models or auto-discovered |
The practical difference: GraphRAG was designed for static document corpora. It's great for summarizing a fixed set of documents. Graphiti was designed for data that changes — user preferences, business state, ongoing conversations, real-world events.
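The "Hybrid" retrieval cell in the table deserves a quick illustration. One common way to merge several ranked result lists is reciprocal rank fusion (RRF). The sketch below shows that general technique with made-up fact ids. I'm not claiming it mirrors Graphiti's internals, and the constant `k=60` is just the conventional default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one, best first.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical results from three retrievers for the same query.
semantic = ["fact_cursor", "fact_vscode", "fact_vim"]
keyword = ["fact_vscode", "fact_cursor"]
graph = ["fact_cursor", "fact_copilot"]

print(reciprocal_rank_fusion([semantic, keyword, graph]))
# ['fact_cursor', 'fact_vscode', 'fact_copilot', 'fact_vim']
```

RRF only looks at ranks, never raw scores, so you can combine a cosine similarity, a keyword score, and a graph distance without normalizing any of them.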
Getting Started (It's Simpler Than You'd Expect)
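```bash
pip install graphiti-core
# For the FalkorDB example below, install the FalkorDB extra instead:
pip install "graphiti-core[falkordb]"
```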
You need a graph database (Neo4j, FalkorDB, or Amazon Neptune) and an LLM API key (OpenAI, Anthropic, or Gemini all work).
The fastest way to try it:
```bash
# Start FalkorDB locally
docker run -p 6379:6379 -p 3000:3000 -it --rm falkordb/falkordb:latest
```
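```python
import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.driver.falkordb_driver import FalkorDriver


async def main():
    # Connect to the local FalkorDB container started above
    driver = FalkorDriver(host="localhost", port=6379)
    graphiti = Graphiti(graph_driver=driver)
    await graphiti.build_indices_and_constraints()

    # Add an episode (raw data); reference_time anchors relative dates like "last week"
    await graphiti.add_episode(
        name="user_chat",
        episode_body="User said they switched from VS Code to Cursor last week and love the AI integration.",
        source_description="chat_message",
        reference_time=datetime.now(timezone.utc),
    )

    # Search the graph
    results = await graphiti.search("What IDE does the user prefer?")
    print(results)

asyncio.run(main())
```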
That's it. Graphiti handles entity extraction, relationship mapping, and temporal tracking automatically. You add data, you search — the graph builds itself.
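Once you have search results, the time window on each one is what you'll want to show the agent. The helper below is a sketch that assumes each result exposes `fact`, `valid_at`, and `invalid_at` attributes, which is my understanding of the edge objects Graphiti returns. Verify against your installed version. The `describe` function and the stand-in results are hypothetical, so this runs without a database or API key.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def describe(edge):
    """Format one search result as 'fact [valid from -> valid until]'."""
    start = edge.valid_at.date().isoformat() if edge.valid_at else "unknown"
    end = edge.invalid_at.date().isoformat() if edge.invalid_at else "now"
    return f"{edge.fact} [{start} -> {end}]"


# Stand-in objects shaped like the assumed result attributes.
fake_results = [
    SimpleNamespace(
        fact="User uses Cursor",
        valid_at=datetime(2026, 3, 2, tzinfo=timezone.utc),
        invalid_at=None,
    ),
    SimpleNamespace(
        fact="User uses VS Code",
        valid_at=None,
        invalid_at=datetime(2026, 3, 2, tzinfo=timezone.utc),
    ),
]

for line in map(describe, fake_results):
    print(line)
# User uses Cursor [2026-03-02 -> now]
# User uses VS Code [unknown -> 2026-03-02]
```

Putting the window next to the fact in the prompt lets the model see for itself which statement is current and which one is history.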
The MCP Server (This Is Where It Gets Interesting)
Graphiti ships with an MCP server that lets Claude, Cursor, and other MCP-compatible tools use Graphiti as a memory backend directly. Deploy it with Docker and Neo4j, and your AI assistant gets persistent, temporally aware memory without you having to write any custom memory management code.
This is relevant if you're building agent workflows that span multiple sessions. Instead of hacking together file-based memory or hoping the context window holds everything, you get structured, queryable memory with time awareness built in.
The Benchmarks Tell a Clear Story
Zep (powered by Graphiti) scored 94.8% on Deep Memory Retrieval versus MemGPT's 93.4%. More impressively, on LongMemEval (a much harder benchmark with 500 human-curated questions, covering temporal reasoning and knowledge updates among other abilities), Zep hit 63.8% accuracy versus the full-context baseline's 55.4% with GPT-4o-mini. And it did it with under 2% of the tokens and 90% less latency.
The full-context approach (dumping everything into the context window) scored 60.2% with GPT-4o on the same benchmark. Zep scored 71.2%. That's not a marginal improvement — that's the difference between an agent that sort of remembers and one that actually tracks what happened when.
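If you want to sanity-check those gaps yourself, the arithmetic is short. This sketch uses only the figures quoted in this article. The `improvement` helper is a hypothetical name.

```python
def improvement(baseline, score):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = score - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


# LongMemEval accuracy figures quoted above (percent).
print(improvement(55.4, 63.8))  # GPT-4o-mini: (8.4, 15.2)
print(improvement(60.2, 71.2))  # GPT-4o:      (11.0, 18.3)

# Context size: ~1.6K tokens versus ~115K for the full-context baseline.
token_share = round(100.0 * 1.6 / 115, 1)
print(token_share)  # 1.4
```

So the gain is 8.4 points with GPT-4o-mini and 11.0 points with GPT-4o, on a context that is about 1.4% of the size.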
When You Should (and Shouldn't) Use This
Use Graphiti when:
- Your agent needs to remember things across sessions
- Facts change over time and you need to track what changed
- You're building personalized experiences where user state matters
- You need temporal reasoning ("what happened before X?")
- Context window costs are becoming a problem
Stick with standard RAG when:
- You're searching a static document corpus
- Your data doesn't change frequently
- You don't need time-based reasoning
- Simple semantic search is good enough for your use case
The Bigger Picture
The AI agent memory space is heating up. Mem0, Letta (MemGPT), and Zep/Graphiti are all attacking the problem from different angles. Anthropic shipped persistent memory for Claude Managed Agents in April 2026. The industry has collectively realized that stateless agents are a dead end for anything beyond simple Q&A.
Graphiti's bet is that graph-based temporal reasoning will outperform flat memory approaches as agent tasks get more complex. The benchmarks support this — especially for temporal reasoning tasks where standard approaches fall apart.
The framework is open source, actively maintained, and the community is growing. If you're building agents that need to remember things and reason about time, it's worth spending an afternoon with.
Have you tried Graphiti or any other agent memory framework? What's your current approach to handling memory across sessions? Would love to hear what's working (and what isn't) in production.