<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gregory Dickson</title>
    <description>The latest articles on DEV Community by Gregory Dickson (@gregory_dickson_6dd6e2b55).</description>
    <link>https://dev.to/gregory_dickson_6dd6e2b55</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3621284%2Feaeb3a73-3aff-4141-b3e9-a16200e5aba6.png</url>
      <title>DEV Community: Gregory Dickson</title>
      <link>https://dev.to/gregory_dickson_6dd6e2b55</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gregory_dickson_6dd6e2b55"/>
    <language>en</language>
    <item>
      <title>MemoryGraph vs Graphiti: Choosing the Right Memory for Your AI Agent</title>
      <dc:creator>Gregory Dickson</dc:creator>
      <pubDate>Fri, 26 Dec 2025 13:56:17 +0000</pubDate>
      <link>https://dev.to/gregory_dickson_6dd6e2b55/memorygraph-vs-graphiti-choosing-the-right-memory-for-your-ai-agent-526k</link>
      <guid>https://dev.to/gregory_dickson_6dd6e2b55/memorygraph-vs-graphiti-choosing-the-right-memory-for-your-ai-agent-526k</guid>
      <description>&lt;p&gt;&lt;em&gt;When general-purpose memory meets coding-specific memory&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;December 2025 - Gregory Dickson&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You've decided your AI agent needs persistent memory. Context loss between sessions is one of the biggest friction points in AI-assisted development.&lt;/p&gt;

&lt;p&gt;Now you're comparing options. If you've done any research, you've probably found &lt;a href="https://github.com/getzep/graphiti" rel="noopener noreferrer"&gt;Graphiti&lt;/a&gt;. With 21,000+ GitHub stars, Y Combinator backing, and a &lt;a href="https://arxiv.org/abs/2501.13956" rel="noopener noreferrer"&gt;peer-reviewed architecture paper&lt;/a&gt;, it's the category leader in AI agent memory.&lt;/p&gt;

&lt;p&gt;So why would you consider anything else?&lt;/p&gt;

&lt;p&gt;Because &lt;strong&gt;the best tool depends on what you're building&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This post offers an honest comparison to help you choose. We built MemoryGraph, so we're biased. But we'll be fair about where Graphiti excels and where we think MemoryGraph is the better fit.&lt;/p&gt;




&lt;h2&gt;
  
  
  The TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;If You're Building...&lt;/th&gt;
&lt;th&gt;Consider...&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A general AI agent (customer service, personal assistant, enterprise bot)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Graphiti&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A coding agent (Claude Code, Cursor, Aider, Continue)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MemoryGraph&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;An agent that needs temporal queries across all entity types&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Graphiti&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;An agent that needs to know what &lt;em&gt;solved&lt;/em&gt; what, what &lt;em&gt;caused&lt;/em&gt; what&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MemoryGraph&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production infrastructure with Neo4j/FalkorDB already deployed&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Graphiti&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zero-infrastructure local development&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MemoryGraph&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you're building a coding agent and want to get started in 60 seconds without infrastructure, MemoryGraph is purpose-built for you. If you're building a general-purpose agent and have database infrastructure, Graphiti is excellent.&lt;/p&gt;




&lt;h2&gt;
  
  
  What They Have in Common
&lt;/h2&gt;

&lt;p&gt;Both MemoryGraph and Graphiti are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Graph-based&lt;/strong&gt;: not flat vector stores&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP-compatible&lt;/strong&gt;: work with Claude Desktop, Cursor, and other MCP clients&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0 licensed&lt;/strong&gt;: open source, enterprise-friendly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python-native&lt;/strong&gt;: built for the AI/ML ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationship-aware&lt;/strong&gt;: store entities &lt;em&gt;and&lt;/em&gt; their connections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both emerged from the same insight: &lt;strong&gt;vector similarity alone isn't enough for agent memory&lt;/strong&gt;. When you ask "What did we decide last week?" or "What caused this bug?", you need relationships and temporal context, not just embedding similarity.&lt;/p&gt;
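&lt;p&gt;As a rough illustration of that insight (toy code, not either library's API), compare what similarity ranking and an explicit relationship can each answer:&lt;/p&gt;

```python
# Illustrative sketch (neither library's API): why text similarity alone
# cannot answer "what solved this error?".
memories = {
    "error_1": "Redis timeout error under load",
    "note_1": "Notes on Redis timeout tuning",       # similar wording, no causal link
    "solution_1": "Increased connection pool size",  # different wording, the actual fix
}

def word_overlap(a, b):
    """Crude stand-in for embedding similarity: Jaccard overlap of words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa.intersection(wb)) / len(wa.union(wb))

ranked = sorted(memories, key=lambda k: word_overlap(memories["error_1"], memories[k]), reverse=True)
# Similarity ranks the unrelated note above the actual fix:
print(ranked)  # ['error_1', 'note_1', 'solution_1']

# An explicit relationship answers the question directly:
edges = [("solution_1", "SOLVES", "error_1")]
fixes = [src for src, rel, dst in edges if rel == "SOLVES" and dst == "error_1"]
print(fixes)  # ['solution_1']
```

&lt;p&gt;The note shares words with the error, so similarity ranks it higher than the fix; the &lt;code&gt;SOLVES&lt;/code&gt; edge makes the answer unambiguous.&lt;/p&gt;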




&lt;h2&gt;
  
  
  Where They Differ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Target Use Case
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Graphiti&lt;/strong&gt; is designed for &lt;em&gt;any&lt;/em&gt; AI agent. Its tagline is "Build Real-Time Knowledge Graphs for AI Agents," and the examples in its docs include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Kendra loves Adidas shoes"&lt;/li&gt;
&lt;li&gt;Customer preferences across sessions&lt;/li&gt;
&lt;li&gt;Business entity relationships&lt;/li&gt;
&lt;li&gt;User interaction history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This generality means Graphiti can model any domain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MemoryGraph&lt;/strong&gt; is designed specifically for &lt;em&gt;coding&lt;/em&gt; agents. Every feature is optimized for software development workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;12 memory types built for code (&lt;code&gt;solution&lt;/code&gt;, &lt;code&gt;problem&lt;/code&gt;, &lt;code&gt;error&lt;/code&gt;, &lt;code&gt;fix&lt;/code&gt;, &lt;code&gt;code_pattern&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;35+ relationship types for development (&lt;code&gt;SOLVES&lt;/code&gt;, &lt;code&gt;CAUSES&lt;/code&gt;, &lt;code&gt;DEPENDS_ON&lt;/code&gt;, &lt;code&gt;IMPROVES&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;Integration patterns for Claude Code, Cursor, Aider, Continue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This specificity means less configuration for coding use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Relationship Model
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Graphiti&lt;/strong&gt; uses a flexible triplet model where you define your own ontology:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Graphiti: Define custom entity and edge types
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Person&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EntityNode&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EntityNode&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Loves&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EntityEdge&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;strength&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This flexibility enables custom ontologies for any domain, but requires upfront design work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MemoryGraph&lt;/strong&gt; provides 35+ pre-defined relationship types organized into 7 categories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# MemoryGraph: Use built-in coding relationships
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create_relationship&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;from_memory_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;solution_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;to_memory_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_456&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relationship_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SOLVES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# One of 35+ built-in types
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Causal&lt;/strong&gt;: &lt;code&gt;CAUSES&lt;/code&gt;, &lt;code&gt;TRIGGERS&lt;/code&gt;, &lt;code&gt;LEADS_TO&lt;/code&gt;, &lt;code&gt;PREVENTS&lt;/code&gt;, &lt;code&gt;BREAKS&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: &lt;code&gt;SOLVES&lt;/code&gt;, &lt;code&gt;ADDRESSES&lt;/code&gt;, &lt;code&gt;ALTERNATIVE_TO&lt;/code&gt;, &lt;code&gt;IMPROVES&lt;/code&gt;, &lt;code&gt;REPLACES&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context&lt;/strong&gt;: &lt;code&gt;OCCURS_IN&lt;/code&gt;, &lt;code&gt;APPLIES_TO&lt;/code&gt;, &lt;code&gt;WORKS_WITH&lt;/code&gt;, &lt;code&gt;REQUIRES&lt;/code&gt;, &lt;code&gt;USED_IN&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning&lt;/strong&gt;: &lt;code&gt;BUILDS_ON&lt;/code&gt;, &lt;code&gt;CONTRADICTS&lt;/code&gt;, &lt;code&gt;CONFIRMS&lt;/code&gt;, &lt;code&gt;GENERALIZES&lt;/code&gt;, &lt;code&gt;SPECIALIZES&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Similarity&lt;/strong&gt;: &lt;code&gt;SIMILAR_TO&lt;/code&gt;, &lt;code&gt;VARIANT_OF&lt;/code&gt;, &lt;code&gt;RELATED_TO&lt;/code&gt;, &lt;code&gt;ANALOGY_TO&lt;/code&gt;, &lt;code&gt;OPPOSITE_OF&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow&lt;/strong&gt;: &lt;code&gt;FOLLOWS&lt;/code&gt;, &lt;code&gt;DEPENDS_ON&lt;/code&gt;, &lt;code&gt;ENABLES&lt;/code&gt;, &lt;code&gt;BLOCKS&lt;/code&gt;, &lt;code&gt;PARALLEL_TO&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality&lt;/strong&gt;: &lt;code&gt;EFFECTIVE_FOR&lt;/code&gt;, &lt;code&gt;INEFFECTIVE_FOR&lt;/code&gt;, &lt;code&gt;PREFERRED_OVER&lt;/code&gt;, &lt;code&gt;DEPRECATED_BY&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For coding agents, these relationships are immediately useful without ontology design.&lt;/p&gt;
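&lt;p&gt;A toy sketch (plain Python, not MemoryGraph's API) of what typed edges buy you: a traversal can follow only &lt;code&gt;CAUSES&lt;/code&gt; links to find a root cause, ignoring other relationship types along the way:&lt;/p&gt;

```python
# Hedged sketch: walking only CAUSES edges backwards to find root causes,
# while an ADDRESSES edge on the same node is deliberately not followed.
from collections import deque

edges = [
    ("pool_exhaustion", "CAUSES", "redis_timeout"),
    ("redis_timeout", "CAUSES", "checkout_failures"),
    ("retry_patch", "ADDRESSES", "redis_timeout"),  # present, but not a causal link
]

def causal_ancestors(node):
    """Breadth-first walk over incoming CAUSES edges only."""
    found, queue = [], deque([node])
    while queue:
        current = queue.popleft()
        for src, rel, dst in edges:
            if rel == "CAUSES" and dst == current:
                found.append(src)
                queue.append(src)
    return found

print(causal_ancestors("checkout_failures"))  # ['redis_timeout', 'pool_exhaustion']
```

&lt;p&gt;With an untyped or free-form ontology you would first have to decide what "causes" means in your domain; here the edge type already carries that semantics.&lt;/p&gt;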

&lt;h3&gt;
  
  
  3. Entity Extraction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Graphiti&lt;/strong&gt; uses LLM-powered entity extraction. When you add an episode (a piece of text), it automatically extracts entities and relationships:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Graphiti: Automatic extraction
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;graphiti&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;episode_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I fixed the timeout by adding retry logic with exponential backoff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EpisodeType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# LLM extracts: entities, relationships, timestamps
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This eliminates manual data structuring, but adds latency (LLM calls) and cost (tokens).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MemoryGraph&lt;/strong&gt; uses explicit storage. You decide what to store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# MemoryGraph: Explicit storage
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;store_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;solution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fixed timeout with retry logic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Added exponential backoff with max 3 retries...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exponential-backoff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you control over exactly what's stored, with no LLM extraction overhead. The tradeoff is that your agent (or you) must explicitly store memories.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture comparison:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Graphiti (automatic extraction):
┌─────────┐    LLM     ┌──────────┐   Neo4j   ┌───────────┐
│ Episode │ ─────────▶ │ Entities │ ────────▶ │ Knowledge │
│  (text) │  Extract   │ + Edges  │   Store   │   Graph   │
└─────────┘  500ms-2s  └──────────┘           └───────────┘

MemoryGraph (explicit storage):
┌────────┐   Direct    ┌────────────┐
│ Memory │ ──────────▶ │ SQLite/Neo │   No LLM required
└────────┘    &amp;lt;5ms     └────────────┘
     │
     ▼ (explicit)
┌──────────────┐
│ Relationship │   You control what's linked
└──────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The extraction trade-off:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Graphiti (automatic)&lt;/th&gt;
&lt;th&gt;MemoryGraph (explicit)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cognitive load&lt;/td&gt;
&lt;td&gt;Lower: just feed it text&lt;/td&gt;
&lt;td&gt;Higher: you decide what to store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Relationship discovery&lt;/td&gt;
&lt;td&gt;May find implicit connections&lt;/td&gt;
&lt;td&gt;Only what you specify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage latency&lt;/td&gt;
&lt;td&gt;500ms-2s (LLM call)&lt;/td&gt;
&lt;td&gt;&amp;lt;5ms (direct write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per memory&lt;/td&gt;
&lt;td&gt;$0.003-$0.01 (token cost)&lt;/td&gt;
&lt;td&gt;$0 (no LLM)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extraction quality&lt;/td&gt;
&lt;td&gt;Depends on model/prompts&lt;/td&gt;
&lt;td&gt;Deterministic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
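
&lt;p&gt;Back-of-envelope arithmetic using the table's own figures shows what automatic extraction adds up to over a month of heavy use (these are the post's estimates, not benchmarks):&lt;/p&gt;

```python
# Monthly cost and latency of LLM-based extraction, using the ranges
# from the comparison table above.
memories_per_day = 50
days = 30
cost_per_memory = (0.003, 0.01)  # Graphiti token-cost range from the table
latency_s = (0.5, 2.0)           # LLM extraction latency range from the table

total = memories_per_day * days
print(f"{total} memories per month")
print(f"LLM cost: ${total * cost_per_memory[0]:.2f} to ${total * cost_per_memory[1]:.2f}")
print(f"Time spent extracting: {total * latency_s[0] / 60:.1f} to {total * latency_s[1] / 60:.1f} minutes")
```

&lt;p&gt;At 50 memories a day, that is roughly $4.50 to $15.00 and 12 to 50 minutes of extraction time per month; explicit storage pays neither.&lt;/p&gt;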

&lt;h3&gt;
  
  
  4. Infrastructure Requirements
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Graphiti&lt;/strong&gt; requires a graph database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Graphiti setup&lt;/span&gt;
docker run neo4j...              &lt;span class="c"&gt;# Or FalkorDB, Kuzu, Neptune&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;NEO4J_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;bolt://localhost:7687
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;NEO4J_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...        &lt;span class="c"&gt;# Required for entity extraction&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;graphiti-core[neo4j]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is appropriate for production systems, but it's friction for getting started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MemoryGraph&lt;/strong&gt; defaults to SQLite with zero configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# MemoryGraph setup&lt;/span&gt;
pipx &lt;span class="nb"&gt;install &lt;/span&gt;memorygraphMCP
claude mcp add &lt;span class="nt"&gt;--scope&lt;/span&gt; user memorygraph &lt;span class="nt"&gt;--&lt;/span&gt; memorygraph
&lt;span class="c"&gt;# Done. Database created automatically.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can upgrade to Neo4j, FalkorDB, or cloud sync later. But the default works immediately.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Temporal Model
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Graphiti&lt;/strong&gt; has a sophisticated bi-temporal model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Valid time&lt;/strong&gt;: When the fact was true in the real world&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transaction time&lt;/strong&gt;: When the fact was recorded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This enables queries like "What did we know about X as of March 2024?" and handles contradictions by invalidating old edges rather than deleting them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MemoryGraph&lt;/strong&gt; also supports bi-temporal tracking (added in v0.10.0, inspired by Graphiti):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# MemoryGraph temporal queries
&lt;/span&gt;&lt;span class="n"&gt;march_2024&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tzinfo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;solutions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_related_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;as_of&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;march_2024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;changes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;what_changed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;since&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;one_week_ago&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both handle temporal queries well. Graphiti's bi-temporal model is more sophisticated, tracking validity intervals on every edge; MemoryGraph covers the common cases: point-in-time queries and change tracking.&lt;/p&gt;
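
&lt;p&gt;A minimal sketch of the bi-temporal idea itself (illustrative, not Graphiti's actual schema): each fact carries a valid-time interval plus the time it was recorded, and a contradiction closes the old interval rather than deleting the row:&lt;/p&gt;

```python
# Bi-temporal facts: point-in-time queries return what was believed then.
from datetime import datetime, timezone
from operator import le  # le(a, b) means "a is at or before b"

def ts(y, m, d):
    return datetime(y, m, d, tzinfo=timezone.utc)

facts = [
    # (fact, valid_from, valid_to, recorded_at)
    ("timeout=30s solves it", ts(2024, 2, 1), ts(2024, 3, 10), ts(2024, 2, 1)),  # later invalidated
    ("pool_size=50 solves it", ts(2024, 3, 10), None, ts(2024, 3, 10)),          # current belief
]

def known_as_of(when):
    """What we believed at `when`: recorded by then, and still valid then."""
    return [fact for fact, valid_from, valid_to, recorded in facts
            if le(recorded, when) and le(valid_from, when)
            and (valid_to is None or not le(valid_to, when))]

print(known_as_of(ts(2024, 3, 1)))  # ['timeout=30s solves it']
print(known_as_of(ts(2024, 4, 1)))  # ['pool_size=50 solves it']
```

&lt;p&gt;Because the superseded fact keeps its closed interval, "what did we believe in March?" stays answerable after the belief changes.&lt;/p&gt;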

&lt;h3&gt;
  
  
  6. Query Model
&lt;/h3&gt;

&lt;p&gt;Both systems are "graph-based" but query differently:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graphiti&lt;/strong&gt; uses hybrid retrieval (from the &lt;a href="https://arxiv.org/abs/2501.13956" rel="noopener noreferrer"&gt;arXiv paper&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Semantic similarity search (embeddings)&lt;/li&gt;
&lt;li&gt;BM25 full-text search (Lucene via Neo4j)&lt;/li&gt;
&lt;li&gt;Breadth-first graph traversal from seed nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MemoryGraph&lt;/strong&gt; uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FTS5 full-text search (SQLite) or native graph queries (Neo4j/FalkorDB)&lt;/li&gt;
&lt;li&gt;Tag-based filtering with exact match&lt;/li&gt;
&lt;li&gt;Typed relationship traversal with configurable depth&lt;/li&gt;
&lt;li&gt;Three search tolerance modes: &lt;code&gt;strict&lt;/code&gt;, &lt;code&gt;normal&lt;/code&gt; (stemming), &lt;code&gt;fuzzy&lt;/code&gt; (typo-tolerant)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Graphiti's hybrid approach excels at finding semantically related content across large, unstructured graphs. MemoryGraph's typed traversal excels at answering specific questions like "what solved this error?" or "what depends on this component?"&lt;/p&gt;
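
&lt;p&gt;The tolerance modes map onto tokenizer choices in SQLite's FTS5, which the SQLite backend builds on. Treating &lt;code&gt;strict&lt;/code&gt; as no stemming and &lt;code&gt;normal&lt;/code&gt; as porter stemming is our assumption here, but FTS5 itself makes the difference easy to see:&lt;/p&gt;

```python
# Two FTS5 indexes over the same row: the default tokenizer requires the
# exact token, while the porter tokenizer matches 'retries' to 'retry'.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE strict_idx USING fts5(title)")                     # exact tokens
conn.execute("CREATE VIRTUAL TABLE normal_idx USING fts5(title, tokenize='porter')")  # stemmed tokens
for table in ("strict_idx", "normal_idx"):
    conn.execute(f"INSERT INTO {table} VALUES (?)", ("Fixed timeout with retry logic",))

query = "retries"  # not the literal word that was stored
strict = conn.execute("SELECT title FROM strict_idx WHERE strict_idx MATCH ?", (query,)).fetchall()
normal = conn.execute("SELECT title FROM normal_idx WHERE normal_idx MATCH ?", (query,)).fetchall()
print(len(strict), len(normal))
```

&lt;p&gt;The exact-token index finds nothing; the stemmed index returns the stored memory.&lt;/p&gt;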




&lt;h2&gt;
  
  
  Practical Comparison: A Debugging Workflow
&lt;/h2&gt;

&lt;p&gt;Here's how each tool handles a common coding scenario.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Scenario
&lt;/h3&gt;

&lt;p&gt;You're debugging a Redis timeout issue. Over several sessions, you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Encounter the error&lt;/li&gt;
&lt;li&gt;Try a fix (increasing the timeout); it doesn't work&lt;/li&gt;
&lt;li&gt;Try another fix (adding retry logic); it causes a memory leak&lt;/li&gt;
&lt;li&gt;Find the root cause (connection pool exhaustion)&lt;/li&gt;
&lt;li&gt;Implement the real fix (increasing the pool size)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  With Graphiti
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Session 1: Encounter error
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;graphiti&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debug_session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;episode_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Got RedisTimeoutError after 30 seconds. Stack trace shows connection.execute() hanging.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EpisodeType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Session 2: Try timeout fix
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;graphiti&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debug_session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;episode_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Increased Redis timeout to 60s. Still getting timeouts under load.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EpisodeType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Session 3: Try retry logic
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;graphiti&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debug_session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;episode_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Added retry logic with exponential backoff. Now seeing memory growth - possible leak.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EpisodeType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ... and so on
&lt;/span&gt;
&lt;span class="c1"&gt;# Later: Query what happened
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;graphiti&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Redis timeout fixes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Graphiti's LLM extraction will create entities and relationships from this text. The quality depends on the extraction prompts and model.&lt;/p&gt;

&lt;h3&gt;
  
  
  With MemoryGraph
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Session 1: Store the error
&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;store_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RedisTimeoutError under load&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Connection.execute() hangs after 30s under concurrent requests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;production&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Session 2: Store failed attempt
&lt;/span&gt;&lt;span class="n"&gt;attempt1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;store_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;solution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Increased Redis timeout to 60s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Changed timeout config. Still fails under load - not the root cause.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;create_relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempt1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ADDRESSES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Attempted to address
&lt;/span&gt;&lt;span class="nf"&gt;create_relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempt1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INEFFECTIVE_FOR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# But didn't work
&lt;/span&gt;
&lt;span class="c1"&gt;# Session 3: Store attempt that caused new problem
&lt;/span&gt;&lt;span class="n"&gt;attempt2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;store_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;solution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Added retry with exponential backoff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implemented retry logic. Works for timeout but causes memory growth.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;partial-fix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;leak&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;store_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Memory leak from retry logic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Each retry holds connection reference, causing memory growth under load.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory-leak&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;create_relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempt2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ADDRESSES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;create_relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempt2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;leak&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CAUSES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# This fix caused a new problem
&lt;/span&gt;
&lt;span class="c1"&gt;# Session 4: Find root cause and real fix
&lt;/span&gt;&lt;span class="n"&gt;root_cause&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;store_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Redis connection pool exhaustion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Default pool size of 10 is exhausted under load, causing queued connections to timeout.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;connection-pool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;root-cause&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;real_fix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;store_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;solution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Increased Redis connection pool to 50&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Set REDIS_POOL_SIZE=50. Handles concurrent load without timeouts or retries.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;connection-pool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;create_relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root_cause&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CAUSES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;create_relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_fix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;root_cause&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SOLVES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;create_relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_fix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attempt1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IMPROVES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;create_relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;real_fix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attempt2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REPLACES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Later: Query the full picture
&lt;/span&gt;&lt;span class="nf"&gt;recall_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is a queryable graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[pool_exhaustion] ──CAUSES──▶ [timeout_error]
       │                            ▲
       │                            │
       ▼                    ┌───────┴───────┐
[real_fix: pool=50]         │               │
       │              [attempt1: 60s]  [attempt2: retry]
       │                    │               │
       ├──IMPROVES─────────▶│               │
       │                    │               ▼
       └──REPLACES─────────────────────▶[memory_leak]
                                            ▲
                                            │
                              [attempt2]──CAUSES──┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you ask "What happened with Redis?" six months later, MemoryGraph returns this entire causal chain, including what &lt;em&gt;didn't&lt;/em&gt; work and why.&lt;/p&gt;
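&lt;p&gt;As an illustrative sketch (plain Python, not MemoryGraph's actual query API), walking that chain is just a reachability pass over the edges created above:&lt;/p&gt;

```python
# Edge list mirroring the relationships created in the sessions above
# (the variable names from the example are reused here as node IDs).
edges = [
    ("attempt1", "ADDRESSES", "timeout_error"),
    ("attempt1", "INEFFECTIVE_FOR", "timeout_error"),
    ("attempt2", "ADDRESSES", "timeout_error"),
    ("attempt2", "CAUSES", "memory_leak"),
    ("root_cause", "CAUSES", "timeout_error"),
    ("real_fix", "SOLVES", "root_cause"),
    ("real_fix", "IMPROVES", "attempt1"),
    ("real_fix", "REPLACES", "attempt2"),
]

def causal_chain(start):
    """Every memory reachable from `start`, following edges in both directions."""
    seen = {start}
    changed = True
    while changed:
        changed = False
        for src, _rel, dst in edges:
            if (src in seen) != (dst in seen):  # edge crosses the frontier
                seen.update((src, dst))
                changed = True
    return seen

# All six memories, including the failed attempts, belong to one chain
chain = causal_chain("timeout_error")
```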




&lt;h2&gt;
  
  
  Decision Framework
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Choose Graphiti If:
&lt;/h3&gt;

&lt;p&gt;✅ You're building a &lt;strong&gt;general-purpose AI agent&lt;/strong&gt; (not specifically for coding)&lt;/p&gt;

&lt;p&gt;✅ You want &lt;strong&gt;automatic entity extraction&lt;/strong&gt; from unstructured text&lt;/p&gt;

&lt;p&gt;✅ You need &lt;strong&gt;sophisticated temporal queries&lt;/strong&gt; across arbitrary entity types&lt;/p&gt;

&lt;p&gt;✅ You already have &lt;strong&gt;Neo4j, FalkorDB, or similar infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ You want a &lt;strong&gt;commercial platform&lt;/strong&gt; with support (Zep Cloud)&lt;/p&gt;

&lt;p&gt;✅ You're okay with &lt;strong&gt;LLM costs&lt;/strong&gt; for entity extraction&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose MemoryGraph If:
&lt;/h3&gt;

&lt;p&gt;✅ You're building with &lt;strong&gt;Claude Code, Cursor, Aider, or Continue&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ You want &lt;strong&gt;coding-specific relationships&lt;/strong&gt; out of the box (&lt;code&gt;SOLVES&lt;/code&gt;, &lt;code&gt;CAUSES&lt;/code&gt;, &lt;code&gt;DEPENDS_ON&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;✅ You want &lt;strong&gt;zero infrastructure&lt;/strong&gt;: SQLite by default, with the option to upgrade later&lt;/p&gt;

&lt;p&gt;✅ You prefer &lt;strong&gt;explicit control&lt;/strong&gt; over what gets stored&lt;/p&gt;

&lt;p&gt;✅ You want to &lt;strong&gt;get started in 60 seconds&lt;/strong&gt;, not 60 minutes&lt;/p&gt;

&lt;p&gt;✅ You want &lt;strong&gt;local-first&lt;/strong&gt; with optional cloud sync&lt;/p&gt;




&lt;h2&gt;
  
  
  What About Using Both?
&lt;/h2&gt;

&lt;p&gt;This is a valid architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Graphiti&lt;/strong&gt; for your product's user-facing memory (customer preferences, conversation history, business entities)&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;MemoryGraph&lt;/strong&gt; for your &lt;em&gt;development&lt;/em&gt; workflow (what you learned building the product)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They solve different problems. Graphiti helps your AI agent remember your &lt;em&gt;users&lt;/em&gt;. MemoryGraph helps your coding agent remember your &lt;em&gt;codebase&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What If You Choose Wrong?
&lt;/h2&gt;

&lt;p&gt;Both systems use standard data formats. Migration is possible:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MemoryGraph → Graphiti&lt;/strong&gt;: Export memories as JSON, feed them as episodes. Graphiti's LLM will re-extract entities and relationships (you'll lose your explicit relationship types but gain Graphiti's automatic extraction).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graphiti → MemoryGraph&lt;/strong&gt;: Export entities and edges. Map entity types to MemoryGraph's 12 memory types, map edge types to the 35 relationship types. Manual mapping required, but no data loss.&lt;/p&gt;

&lt;p&gt;Neither system creates vendor lock-in at the data layer. Choose based on current needs; you can migrate if requirements change.&lt;/p&gt;
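&lt;p&gt;The MemoryGraph-to-Graphiti direction can be sketched as a small transformation step. The export field names (&lt;code&gt;title&lt;/code&gt;, &lt;code&gt;content&lt;/code&gt;, &lt;code&gt;created_at&lt;/code&gt;) and the episode payload shape are assumptions for illustration; adapt them to the actual export and ingestion schemas:&lt;/p&gt;

```python
import json

def memories_to_episodes(export_json):
    """Convert a MemoryGraph JSON export into episode payloads for ingestion.

    Field names here are illustrative, not a documented schema.
    """
    episodes = []
    for memory in json.loads(export_json):
        episodes.append({
            "name": memory["title"],
            # Fold title + content into one text body; the receiving system's
            # LLM re-extracts entities and relationships from this text.
            "episode_body": f'{memory["title"]}. {memory["content"]}',
            "reference_time": memory.get("created_at"),
        })
    return episodes

export = json.dumps([
    {"title": "Increased Redis connection pool to 50",
     "content": "Set REDIS_POOL_SIZE=50.",
     "created_at": "2025-12-01T00:00:00Z"},
])
episodes = memories_to_episodes(export)
```

&lt;p&gt;Note that the explicit &lt;code&gt;SOLVES&lt;/code&gt;/&lt;code&gt;CAUSES&lt;/code&gt; edges are flattened into prose here, which is exactly the relationship-type loss described above.&lt;/p&gt;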




&lt;h2&gt;
  
  
  Getting Started with MemoryGraph
&lt;/h2&gt;

&lt;p&gt;If MemoryGraph sounds right for your use case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
pipx &lt;span class="nb"&gt;install &lt;/span&gt;memorygraphMCP

&lt;span class="c"&gt;# Add to Claude Code&lt;/span&gt;
claude mcp add &lt;span class="nt"&gt;--scope&lt;/span&gt; user memorygraph &lt;span class="nt"&gt;--&lt;/span&gt; memorygraph

&lt;span class="c"&gt;# Start using&lt;/span&gt;
claude
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"Remember this: Use pytest fixtures for database tests"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"What do you remember about testing?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No database setup. No Docker. No API keys.&lt;/p&gt;

&lt;p&gt;See &lt;a href="https://memorygraph.dev" rel="noopener noreferrer"&gt;memorygraph.dev&lt;/a&gt; for documentation, or &lt;a href="https://github.com/gregorydickson/memory-graph" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; for the source.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started with Graphiti
&lt;/h2&gt;

&lt;p&gt;If Graphiti is the better fit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start Neo4j&lt;/span&gt;
docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 7474:7474 &lt;span class="nt"&gt;-p&lt;/span&gt; 7687:7687 neo4j

&lt;span class="c"&gt;# Install&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;graphiti-core[neo4j]

&lt;span class="c"&gt;# Configure&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;NEO4J_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;bolt://localhost:7687
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See &lt;a href="https://github.com/getzep/graphiti" rel="noopener noreferrer"&gt;github.com/getzep/graphiti&lt;/a&gt; for documentation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Graphiti and MemoryGraph both solve the fundamental problem of AI agent memory. They're both graph-based, both MCP-compatible, both Apache 2.0 licensed.&lt;/p&gt;

&lt;p&gt;The difference is focus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graphiti&lt;/strong&gt; is a general-purpose temporal knowledge graph for any AI agent. It's mature, well-funded, and production-proven.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MemoryGraph&lt;/strong&gt; is a coding-specific memory system for AI coding agents. It's opinionated, zero-config, and built for developers who want to start in 60 seconds.&lt;/p&gt;

&lt;p&gt;Choose the tool that matches your use case. For coding agents, we think MemoryGraph is the better fit. For general AI agents, Graphiti is excellent.&lt;/p&gt;

&lt;p&gt;And if you're building both? Use both.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;MemoryGraph is open source under Apache 2.0. Try it at &lt;a href="https://memorygraph.dev" rel="noopener noreferrer"&gt;memorygraph.dev&lt;/a&gt; or star us on &lt;a href="https://github.com/memorygraphdev/memorygraph" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Gregory Dickson is a Senior AI Developer &amp;amp; Solutions Architect specializing in AI/ML development and cloud architecture. He's the creator of MemoryGraph, an open-source MCP memory server using graph-based relationship tracking.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>ai</category>
      <category>tooling</category>
      <category>agents</category>
    </item>
    <item>
      <title>What Deep Learning Theory Teaches Us About AI Memory</title>
      <dc:creator>Gregory Dickson</dc:creator>
      <pubDate>Fri, 26 Dec 2025 13:53:52 +0000</pubDate>
      <link>https://dev.to/gregory_dickson_6dd6e2b55/what-deep-learning-theory-teaches-us-about-ai-memory-796</link>
      <guid>https://dev.to/gregory_dickson_6dd6e2b55/what-deep-learning-theory-teaches-us-about-ai-memory-796</guid>
      <description>&lt;p&gt;&lt;em&gt;How rate reduction and lossy compression principles from Berkeley's new textbook could reshape how we build persistent memory for LLMs&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Memory Problem No One Talks About
&lt;/h2&gt;

&lt;p&gt;Every AI coding assistant you use today has the same dirty secret: it forgets everything the moment your session ends. That brilliant debugging session where Claude figured out your codebase architecture? Gone. The context about your team's coding conventions that took 20 messages to establish? Evaporated.&lt;/p&gt;

&lt;p&gt;We're building MemoryGraph to solve this problem—a graph-based memory system that gives LLMs persistent, queryable memory across sessions. But as we dove deeper into the architecture, we kept hitting the same fundamental question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does it actually mean to "remember" something well?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's not enough to just store text and retrieve it. Human memory doesn't work that way. We compress experiences into schemas, organize knowledge hierarchically, and somehow retrieve exactly what's relevant from decades of accumulated experience in milliseconds.&lt;/p&gt;

&lt;p&gt;Then we discovered a new textbook that changed how we think about this problem entirely.&lt;/p&gt;




&lt;h2&gt;
  
  
  "Learning Deep Representations of Data Distributions"
&lt;/h2&gt;

&lt;p&gt;In August 2025, Yi Ma's lab at Berkeley released &lt;a href="https://ma-lab-berkeley.github.io/deep-representation-learning-book/" rel="noopener noreferrer"&gt;&lt;em&gt;Learning Deep Representations of Data Distributions&lt;/em&gt;&lt;/a&gt;—an open-source textbook that presents a unified mathematical framework for understanding deep learning through the lens of compression.&lt;/p&gt;

&lt;p&gt;Their central thesis is deceptively simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"We compress to learn, and we learn to compress."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The book argues that intelligence—whether biological or artificial—fundamentally involves discovering low-dimensional structure in high-dimensional data and transforming that data into compact, structured representations.&lt;/p&gt;

&lt;p&gt;This isn't just philosophy. They provide rigorous mathematics showing that popular neural network architectures (ResNets, Transformers, CNNs) can be derived as iterative optimization steps that maximize something called &lt;strong&gt;rate reduction&lt;/strong&gt;—a measure of how well representations compress data while preserving important distinctions.&lt;/p&gt;

&lt;p&gt;Reading this, we realized: this framework maps directly onto the memory storage problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Rate Reduction: A New Way to Think About Memory Quality
&lt;/h2&gt;

&lt;p&gt;The book introduces a principle called &lt;strong&gt;Maximal Coding Rate Reduction (MCR²)&lt;/strong&gt;. Here's the intuition:&lt;/p&gt;

&lt;p&gt;Imagine you have a collection of memories from different categories—bug fixes, architectural decisions, API documentation, team preferences. A good memory representation should do two things simultaneously:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maximize expansion between categories&lt;/strong&gt;: Memories about bug fixes should live in a completely different "region" of your representation space than memories about team preferences. You want these categories to be as distinguishable as possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maximize compression within categories&lt;/strong&gt;: All your bug fix memories should cluster tightly together. They share common structure—problem, cause, solution—and your representation should capture that.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Mathematically, this is expressed as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ΔR = R(all memories) - Σ R(memories in each category)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where R is the "coding rate"—essentially, how many bits you'd need to encode the data. You want to maximize ΔR: the total coding rate should be high (diverse memories), but the sum of within-category rates should be low (similar memories cluster).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This gives us a concrete metric for memory quality that goes beyond simple retrieval accuracy.&lt;/strong&gt;&lt;/p&gt;
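&lt;p&gt;The article's simplified ΔR can be computed directly with the logdet rate estimate used in the MCR² literature. This toy sketch (the distortion parameter &lt;code&gt;eps&lt;/code&gt; and the 2-D data are assumptions for illustration) shows that grouping memories by their true category yields a higher ΔR than mixing categories:&lt;/p&gt;

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Approximate bits needed to encode the rows of Z up to distortion eps
    (the logdet rate estimate from the MCR^2 line of work)."""
    n, d = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps ** 2)) * Z.T @ Z)
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Simplified ΔR from the text: total rate minus summed per-category rates."""
    labels = np.asarray(labels)
    within = sum(coding_rate(Z[labels == c], eps) for c in np.unique(labels))
    return coding_rate(Z, eps) - within

# Two memory "categories" lying along orthogonal directions
bug_fixes = np.tile([1.0, 0.0], (4, 1))
preferences = np.tile([0.0, 1.0], (4, 1))
Z = np.vstack([bug_fixes, preferences])

dr_true = rate_reduction(Z, [0] * 4 + [1] * 4)   # correct grouping
dr_mixed = rate_reduction(Z, [0, 1] * 4)         # categories scrambled
# The true grouping scores higher: categories separate, members cluster
```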




&lt;h2&gt;
  
  
  How This Applies to LLM Memory Systems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem with Flat Embeddings
&lt;/h3&gt;

&lt;p&gt;Most vector databases treat all memories the same way: convert text to a 384- or 768-dimensional embedding, store it, retrieve by cosine similarity.&lt;/p&gt;

&lt;p&gt;But this ignores the structure we know exists in the data. A memory about a "person" is fundamentally different from a memory about a "code pattern." Treating them identically wastes representational capacity and makes retrieval harder.&lt;/p&gt;

&lt;p&gt;The Berkeley framework suggests a different approach: &lt;strong&gt;type-specific subspaces&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured Embedding Spaces
&lt;/h3&gt;

&lt;p&gt;Instead of one flat embedding space, imagine memories organized into learned subspaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                    Embedding Space (384-dim)                │
│                                                             │
│   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│   │   Person     │  │   Project    │  │   Solution   │     │
│   │  Subspace    │  │  Subspace    │  │  Subspace    │     │
│   │   (64-dim)   │  │  (128-dim)   │  │   (96-dim)   │     │
│   │              │  │              │  │              │     │
│   │  • Alice     │  │ • ProjectX   │  │ • Fix#123    │     │
│   │  • Bob       │  │ • MemGraph   │  │ • Fix#456    │     │
│   │  • Carol     │  │ • API-v2     │  │ • Pattern#7  │     │
│   └──────────────┘  └──────────────┘  └──────────────┘     │
│                                                             │
│        ↑ Orthogonal subspaces (maximally separated)         │
└─────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each memory type gets projected into its own subspace. These subspaces are learned to be orthogonal—maximizing separation between types. Within each subspace, similar memories cluster together—maximizing compression.&lt;/p&gt;

&lt;p&gt;This is rate reduction in action: expand between categories, compress within them.&lt;/p&gt;
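&lt;p&gt;A minimal sketch of this idea uses disjoint coordinate blocks, the simplest way to get mutually orthogonal subspaces inside one embedding space. The block layout follows the diagram's dimensions; a real system would learn these projections rather than hard-code axis-aligned blocks:&lt;/p&gt;

```python
import numpy as np

# Hypothetical layout matching the diagram: 64 + 128 + 96 dims inside 384
TOTAL_DIM = 384
SUBSPACES = {"person": (0, 64), "project": (64, 192), "solution": (192, 288)}

def project(embedding, memory_type):
    """Keep only the coordinates belonging to the type's subspace."""
    lo, hi = SUBSPACES[memory_type]
    mask = np.zeros(TOTAL_DIM)
    mask[lo:hi] = 1.0
    return embedding * mask

rng = np.random.default_rng(0)
e_person = project(rng.normal(size=TOTAL_DIM), "person")
e_project = project(rng.normal(size=TOTAL_DIM), "project")
# Different types land in orthogonal subspaces: their dot product is exactly 0
```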




&lt;h2&gt;
  
  
  The Graph as a Compression Mechanism
&lt;/h2&gt;

&lt;p&gt;Here's where things get interesting for MemoryGraph specifically.&lt;/p&gt;

&lt;p&gt;The Berkeley book shows that neural network layers can be understood as iterative compression steps. Each layer transforms representations to be more compact and more structured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We realized: a knowledge graph already does this.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider how MemoryGraph stores information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Raw Input: "Alice fixed the authentication bug in the login 
            service yesterday by adding proper token validation"

Graph Representation:
  (Alice:Person) --[AUTHORED]--&amp;gt; (Fix#892:Solution)
  (Fix#892:Solution) --[RESOLVES]--&amp;gt; (AuthBug:Error)
  (AuthBug:Error) --[AFFECTS]--&amp;gt; (LoginService:Project)
  (Fix#892:Solution) --[INVOLVES]--&amp;gt; (TokenValidation:CodePattern)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This graph representation is a &lt;strong&gt;lossy compression&lt;/strong&gt; of the original text. We've extracted the essential structure—who, what, where, how—and discarded the rest. The entities are compressed representations (cluster centers), and the relationships define how to navigate between them.&lt;/p&gt;

&lt;p&gt;In the language of the Berkeley book:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entities&lt;/strong&gt; = compressed representations of many observations (low-dimensional subspace centers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relations&lt;/strong&gt; = transformation operators between subspaces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observations&lt;/strong&gt; = high-dimensional raw data that gets compressed into entity updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The graph structure itself encodes the low-dimensional manifold that rate reduction seeks to discover.&lt;/p&gt;




&lt;h2&gt;
  
  
  Compression in Action: MemoryGraph's Inference Engine
&lt;/h2&gt;

&lt;p&gt;This isn't just theory—we've already implemented automatic compression in MemoryGraph's inference engine. When you save a memory, the system automatically discovers and creates new relationships you didn't explicitly define.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transitive Compression
&lt;/h3&gt;

&lt;p&gt;Consider this scenario. You create two explicit relationships:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Auth Service" --[DEPENDS_ON]--&amp;gt; "JWT Library"
"JWT Library" --[DEPENDS_ON]--&amp;gt; "Crypto Utils"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The inference engine automatically adds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Auth Service" --[DEPENDS_ON]--&amp;gt; "Crypto Utils" (inferred, transitive)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;strong&gt;path compression&lt;/strong&gt;—reducing a multi-hop traversal to a single edge. In information-theoretic terms, we're eliminating redundancy in the graph structure. The transitive relationship was always &lt;em&gt;implicitly&lt;/em&gt; there; the inference engine makes it &lt;em&gt;explicit&lt;/em&gt; and queryable.&lt;/p&gt;
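&lt;p&gt;One pass of this inference fits in a few lines. As a sketch (the confidence decay factor is an illustrative assumption, not the engine's actual parameter):&lt;/p&gt;

```python
def infer_transitive(explicit_edges, decay=0.7):
    """A --rel--> B and B --rel--> C yields an inferred A --rel--> C.

    Each edge is (src, dst, confidence); inferred confidence decays per hop.
    """
    known = {(src, dst) for src, dst, _ in explicit_edges}
    inferred = []
    for a, b, conf_ab in explicit_edges:
        for b2, c, conf_bc in explicit_edges:
            if b == b2 and a != c and (a, c) not in known:
                inferred.append((a, c, decay * min(conf_ab, conf_bc)))
    return inferred

explicit = [
    ("Auth Service", "JWT Library", 1.0),
    ("JWT Library", "Crypto Utils", 1.0),
]
new_edges = infer_transitive(explicit)
# Yields the single-hop shortcut: Auth Service --DEPENDS_ON--> Crypto Utils
```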

&lt;h3&gt;
  
  
  Type Inference as Semantic Compression
&lt;/h3&gt;

&lt;p&gt;The engine also performs type inference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;type_from_solves&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;If memory SOLVES a problem → type becomes "solution"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;type_from_fixes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;If memory FIXES an error → type becomes "fix"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;type_from_causes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;If memory CAUSES a problem → type becomes "problem"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is semantic compression: instead of storing "this memory has a SOLVES relationship to a problem-type entity," we compress that pattern into a single type label. The type &lt;em&gt;is&lt;/em&gt; the compressed representation of the memory's structural role in the graph.&lt;/p&gt;
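&lt;p&gt;The rules table compresses to a small lookup. This sketch is illustrative; the real engine's rule precedence and default type are assumptions here:&lt;/p&gt;

```python
# Mapping from the rules table above (rule names in comments)
TYPE_RULES = {
    "SOLVES": "solution",   # type_from_solves
    "FIXES": "fix",         # type_from_fixes
    "CAUSES": "problem",    # type_from_causes
}

def infer_memory_type(outgoing_relationships, default="note"):
    """Compress a memory's structural role in the graph into one type label."""
    for rel_type, _target in outgoing_relationships:
        if rel_type in TYPE_RULES:
            return TYPE_RULES[rel_type]
    return default  # "note" is a hypothetical fallback type

inferred = infer_memory_type([("REFERENCES", "docs"), ("SOLVES", "AuthBug")])
```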

&lt;h3&gt;
  
  
  Co-occurrence Affinity
&lt;/h3&gt;

&lt;p&gt;Our cloud tier includes an even more interesting rule: &lt;strong&gt;co-occurrence affinity&lt;/strong&gt;. When two memories share multiple common connections (say, 3+ shared neighbors), the engine infers they're related—even if no one explicitly connected them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memory A --[USES]--&amp;gt; React
Memory A --[USES]--&amp;gt; TypeScript  
Memory A --[PART_OF]--&amp;gt; Frontend Module

Memory B --[USES]--&amp;gt; React
Memory B --[USES]--&amp;gt; TypeScript
Memory B --[PART_OF]--&amp;gt; Frontend Module

Inferred: Memory A --[RELATED_TO]--&amp;gt; Memory B (confidence: 0.45)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the graph equivalent of finding low-dimensional structure: memories that occupy similar "positions" in the relationship space (share many neighbors) are likely semantically related, even without explicit links.&lt;/p&gt;
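&lt;p&gt;The mechanics reduce to shared-neighbor counting. In this sketch the per-neighbor confidence weight is an assumption chosen to reproduce the 0.45 score from the example; the actual scoring formula may differ:&lt;/p&gt;

```python
from itertools import combinations

def cooccurrence_affinity(neighbors, threshold=3, weight=0.15):
    """Infer RELATED_TO edges between memories sharing at least `threshold`
    neighbors. Confidence grows with the number of shared neighbors."""
    inferred = []
    for a, b in combinations(sorted(neighbors), 2):
        shared = neighbors[a].intersection(neighbors[b])
        if len(shared) >= threshold:
            inferred.append((a, "RELATED_TO", b, round(weight * len(shared), 2)))
    return inferred

neighbors = {
    "Memory A": {"React", "TypeScript", "Frontend Module"},
    "Memory B": {"React", "TypeScript", "Frontend Module"},
}
links = cooccurrence_affinity(neighbors)
# Three shared neighbors cross the threshold, linking A and B
```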

&lt;h3&gt;
  
  
  Confidence Scores and Lossy Compression
&lt;/h3&gt;

&lt;p&gt;Every inferred relationship carries a confidence score (0-1). Lower confidence means the inference is more speculative—it's a "lossier" compression of the underlying evidence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"relationship_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DEPENDS_ON"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"inferred"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rule"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"transitive_depends_on"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"depth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Users can tune this tradeoff: accept more inferred relationships (richer graph, more noise) or fewer (sparser graph, higher precision). This is exactly the rate-distortion tradeoff that information theory describes—you choose how much fidelity to sacrifice for how much compression.&lt;/p&gt;

&lt;p&gt;The cleanup API makes this explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;POST /inference/cleanup?min_confidence&lt;span class="o"&gt;=&lt;/span&gt;0.3&amp;amp;max_age_days&lt;span class="o"&gt;=&lt;/span&gt;30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This removes low-confidence edges older than 30 days—literally pruning the graph to maintain a target compression quality.&lt;/p&gt;
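&lt;p&gt;Under the hood, such an endpoint presumably amounts to a filter over inferred edges. A minimal sketch of that logic, assuming a simple &lt;code&gt;Edge&lt;/code&gt; record—the field names are illustrative, not MemoryGraph's actual schema:&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Edge:
    source: str
    target: str
    inferred: bool
    confidence: float
    created_at: datetime

def cleanup(edges: list[Edge], min_confidence: float = 0.3,
            max_age_days: int = 30) -> list[Edge]:
    """Drop inferred edges that are both low-confidence and stale.

    Explicit (non-inferred) edges are never pruned.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    return [
        e for e in edges
        if not (e.inferred
                and e.confidence < min_confidence
                and e.created_at < cutoff)
    ]

old = datetime.now(timezone.utc) - timedelta(days=90)
edges = [
    Edge("a", "b", True, 0.2, old),    # pruned: inferred, weak, stale
    Edge("a", "c", True, 0.9, old),    # kept: high confidence
    Edge("a", "d", False, 0.1, old),   # kept: explicit edge
]
print(len(cleanup(edges)))  # 2
```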




&lt;h2&gt;
  
  
  Practical Implementation Ideas
&lt;/h2&gt;

&lt;p&gt;The inference engine demonstrates that compression principles already work in MemoryGraph. Based on the Berkeley framework, we're exploring several enhancements that go deeper:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Type-Aware Embeddings
&lt;/h3&gt;

&lt;p&gt;As currently planned for the SDK, our semantic search would treat all content identically. But the Berkeley framework suggests projecting embeddings through type-specific learned transformations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TypeAwareEmbedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Learned projections for each entity type
&lt;/span&gt;        &lt;span class="c1"&gt;# Dimensions chosen based on type complexity
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;projections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;person&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LinearProjection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LinearProjection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;solution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LinearProjection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;96&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LinearProjection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_pattern&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LinearProjection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entity_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;projection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;projections&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;projection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;projection&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dimensionality of each subspace reflects the inherent complexity of that type. People are relatively simple to characterize; projects have more nuance. This extends our existing type inference—instead of just labeling types, we'd represent them in mathematically distinct subspaces.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Progressive Memory Consolidation
&lt;/h3&gt;

&lt;p&gt;The book describes how deep networks progressively compress representations layer by layer. Our inference engine already does single-pass compression. We can extend this to multi-layer consolidation over time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 0: Raw Observations (high-dimensional, ephemeral)
         "User mentioned they prefer tabs over spaces"
         "User asked about Python formatting"
         "User corrected a spacing issue in code"
              │
              ▼ Compression (after session) [EXISTING: type inference]

Layer 1: Working Memory (mid-dimensional, session-persistent)  
         "User has strong code formatting preferences"
              │
              ▼ Compression (after multiple sessions) [NEW: consolidation]

Layer 2: Consolidated Knowledge (low-dimensional, long-term)
         Entity property: coding_style = "strict_formatting"
              │
              ▼ Compression (over time) [NEW: schema evolution]

Layer 3: Core Identity (minimal, permanent)
         Entity: User with trait "detail_oriented"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mirrors how human memory consolidation works—episodic memories compress into semantic knowledge over time. Our existing &lt;code&gt;similar_tags_affinity&lt;/code&gt; rule hints at this: memories that share structure get linked. The next step is actually &lt;em&gt;merging&lt;/em&gt; them.&lt;/p&gt;
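&lt;p&gt;A toy illustration of what Layer 0 to Layer 1 merging could look like: group raw observations by topic and replace them with a counted summary. All names here are hypothetical, not MemoryGraph's implementation:&lt;/p&gt;

```python
from collections import Counter

def consolidate(observations: list[dict], min_count: int = 3) -> list[dict]:
    """Merge raw observations sharing a topic into one summary record.

    Toy stand-in for Layer 0 -> Layer 1 consolidation: keeps an evidence
    count instead of every individual episode.
    """
    by_topic = Counter(obs["topic"] for obs in observations)
    return [
        {"topic": topic, "evidence_count": count, "layer": 1}
        for topic, count in by_topic.items()
        if count >= min_count
    ]

obs = [
    {"topic": "formatting", "text": "prefers tabs"},
    {"topic": "formatting", "text": "asked about Python formatting"},
    {"topic": "formatting", "text": "corrected spacing"},
    {"topic": "testing", "text": "ran pytest once"},
]
print(consolidate(obs))
# [{'topic': 'formatting', 'evidence_count': 3, 'layer': 1}]
```

A real implementation would summarize the merged content (likely with an LLM) rather than just counting, but the trigger logic is the same shape.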

&lt;h3&gt;
  
  
  3. Expansion-Compression Retrieval
&lt;/h3&gt;

&lt;p&gt;Our planned hybrid search (ADR-005) optimizes for similarity. The rate reduction framework suggests we should &lt;em&gt;also&lt;/em&gt; optimize for distinctiveness:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieval_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                    &lt;span class="n"&gt;other_retrieved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Compression term: how relevant is this memory?
&lt;/span&gt;    &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
        &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Expansion term: how distinct is this from other results?
&lt;/span&gt;    &lt;span class="c1"&gt;# This prevents returning 5 near-duplicate memories
&lt;/span&gt;    &lt;span class="n"&gt;distinctiveness&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="nf"&gt;cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;other_retrieved&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Balance both objectives (like rate reduction's R - Rc)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;distinctiveness&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This directly implements the MCR² principle: maximize total coding rate (diverse results) while minimizing within-group coding rate (each result is relevant). It prevents retrieval from returning redundant results—a common failure mode of pure similarity search.&lt;/p&gt;
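&lt;p&gt;In practice a score like this would typically be applied greedily, re-scoring candidates against what has already been selected (the same idea as maximal marginal relevance). A self-contained sketch with toy 2-D vectors—it penalizes the &lt;em&gt;maximum&lt;/em&gt; redundancy rather than the mean, and everything here is illustrative rather than MemoryGraph's API:&lt;/p&gt;

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_diverse(query_vec, candidates, k=2, alpha=0.7):
    """Greedily pick k items, balancing relevance and distinctiveness."""
    selected = []
    remaining = dict(candidates)  # name -> embedding vector
    while remaining and len(selected) < k:
        def score(name):
            sim = cosine(query_vec, remaining[name])
            if not selected:
                return sim
            # Penalize similarity to anything already chosen
            redundancy = max(cosine(remaining[name], candidates[s])
                             for s in selected)
            return alpha * sim + (1 - alpha) * (1.0 - redundancy)
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

candidates = {
    "dup_a": [1.0, 0.0],    # near-duplicate of dup_b
    "dup_b": [0.99, 0.05],
    "other": [0.6, 0.8],    # less similar to the query, but distinct
}
# With a strong diversity weight, the near-duplicate is skipped
print(select_diverse([1.0, 0.0], candidates, alpha=0.3))
# ['dup_a', 'other']
```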

&lt;h3&gt;
  
  
  4. Graph-Guided Manifold Navigation
&lt;/h3&gt;

&lt;p&gt;Our inference engine already exploits graph structure for discovery. We can extend this to retrieval:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;graph_aware_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Find entry points via embedding similarity
&lt;/span&gt;    &lt;span class="n"&gt;seeds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;semantic_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Expand along graph edges (follow the manifold)
&lt;/span&gt;    &lt;span class="c1"&gt;# This uses our existing relationship structure
&lt;/span&gt;    &lt;span class="n"&gt;expanded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seeds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;frontier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;seeds&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;frontier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Relations define valid transitions on the manifold
&lt;/span&gt;            &lt;span class="c1"&gt;# Inferred edges from our engine help here!
&lt;/span&gt;            &lt;span class="n"&gt;neighbors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_related_entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;expanded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;neighbors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;frontier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;expanded&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seeds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Re-rank by combined graph + semantic relevance
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;rank_by_rate_reduction_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expanded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The graph provides a strong inductive bias about which memories are likely relevant together. Transitive inferred edges (from our inference engine) act as "shortcuts" on the manifold—they represent compressed paths through the relationship space.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Inference Rules as Compression Operators
&lt;/h3&gt;

&lt;p&gt;Looking at our inference rules through the Berkeley lens reveals their true nature:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;Compression Type&lt;/th&gt;
&lt;th&gt;Information-Theoretic Interpretation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;transitive_depends_on&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Path compression&lt;/td&gt;
&lt;td&gt;Removes redundant traversals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;type_from_solves&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Semantic compression&lt;/td&gt;
&lt;td&gt;Encodes role in single label&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;co_occurrence_affinity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Structural compression&lt;/td&gt;
&lt;td&gt;Finds shared subspace membership&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;reverse_solves&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bidirectional encoding&lt;/td&gt;
&lt;td&gt;Enables queries from either direction&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This suggests new rules we could add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Cluster compression: memories with 5+ shared tags → merge into summary entity
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high_overlap_merge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shared_tags &amp;gt;= 5 AND same_project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create_summary_entity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compression_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lossy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Temporal compression: daily memories → weekly summary
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temporal_consolidation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;age &amp;gt; 7 days AND same_topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compress_to_summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preserve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key_decisions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blockers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outcomes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The inference engine proves that compression principles work for AI memory. We're now exploring how to go deeper:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Already Built:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transitive inference (path compression)&lt;/li&gt;
&lt;li&gt;Type inference (semantic compression)&lt;/li&gt;
&lt;li&gt;Co-occurrence affinity (structural compression)&lt;/li&gt;
&lt;li&gt;Confidence-based cleanup (rate-distortion tradeoff)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Actively Researching:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Type-aware embeddings&lt;/strong&gt;: How do we train type-specific projections without massive labeled datasets? Self-supervised approaches using the graph structure itself look promising—the relationships themselves encode a supervision signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consolidation triggers&lt;/strong&gt;: When should observations compress into entity updates? Too aggressive and we lose detail; too conservative and memory bloats. We're exploring information-theoretic triggers: consolidate when adding a new observation wouldn't increase the coding rate significantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-type retrieval&lt;/strong&gt;: How do we handle queries spanning multiple types? "Find solutions that Alice worked on for projects using Python" crosses Person, Solution, and Project subspaces. The graph edges provide a natural answer—they're the transformation operators between subspaces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate reduction metrics&lt;/strong&gt;: Can we use MCR² as an actual quality metric during development? This would let us evaluate memory architectures in a principled way, not just via retrieval benchmarks. Early experiments suggest the metric correlates well with subjective "memory usefulness."&lt;/p&gt;
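&lt;p&gt;For concreteness, the coding rate from the MCR² papers, R(Z) = ½ log det(I + d/(nε²) ZZᵀ), is cheap to compute at small scale. Here's a NumPy sketch of rate reduction (total rate minus the weighted within-group rates), intended as an illustration of the metric rather than production code:&lt;/p&gt;

```python
import numpy as np

def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """R(Z) = 1/2 log det(I + d/(n eps^2) Z Z^T), where Z is d x n."""
    d, n = Z.shape
    M = np.eye(d) + (d / (n * eps**2)) * (Z @ Z.T)
    return 0.5 * np.linalg.slogdet(M)[1]

def rate_reduction(Z: np.ndarray, labels: np.ndarray, eps: float = 0.5) -> float:
    """Delta R = R(Z) - sum_j (n_j / n) R(Z_j).

    High when the whole set is diverse but each group is compact.
    """
    _, n = Z.shape
    within = sum(
        (np.sum(labels == j) / n) * coding_rate(Z[:, labels == j], eps)
        for j in np.unique(labels)
    )
    return coding_rate(Z, eps) - within

# Two well-separated clusters: correct grouping scores higher than
# a random shuffle of the same labels.
rng = np.random.default_rng(0)
A = 0.1 * rng.normal(size=(2, 20)) + np.array([[1.0], [0.0]])
B = 0.1 * rng.normal(size=(2, 20)) + np.array([[0.0], [1.0]])
Z = np.hstack([A, B])
labels = np.array([0] * 20 + [1] * 20)
good = rate_reduction(Z, labels)
bad = rate_reduction(Z, rng.permutation(labels))
```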




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;The Berkeley book's framework suggests something profound: compression isn't just a storage optimization—it's the fundamental operation of learning itself.&lt;/p&gt;

&lt;p&gt;Every time you explain a complex system by its key components, you're doing rate reduction. Every time you recognize a pattern across multiple experiences, you're finding low-dimensional structure. Every time you organize knowledge hierarchically, you're building a manifold.&lt;/p&gt;

&lt;p&gt;For AI memory systems, this means we shouldn't think of memory as a retrieval problem with storage attached. &lt;strong&gt;Memory is a compression problem with retrieval as a side effect.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Get the compression right—find the true low-dimensional structure in the experiences—and retrieval becomes almost trivial. The memories that matter will naturally cluster together, and the right memory for a given context will be the one that reduces uncertainty the most.&lt;/p&gt;

&lt;p&gt;MemoryGraph's inference engine is our first step down this path. Transitive compression, type inference, and co-occurrence affinity are all compression operators—they find structure and make it explicit. The next steps—type-aware embeddings, progressive consolidation, and rate-reduction-guided retrieval—push further toward a system that truly learns from experiences rather than just storing them.&lt;/p&gt;

&lt;p&gt;The theoretical grounding matters because it tells us &lt;em&gt;why&lt;/em&gt; these approaches work and &lt;em&gt;what&lt;/em&gt; to try next. When your inference rule creates a transitive edge, that's not just a database optimization—it's the system discovering that a three-hop path can be compressed to one hop without losing essential information. When type inference labels a memory as "solution," it's compressing the memory's structural role into a single semantic label.&lt;/p&gt;

&lt;p&gt;That's what we're building toward with MemoryGraph. Not just a database that stores what AI assistants experience, but a system that truly learns from those experiences—compressing them into structured knowledge that makes every future interaction more intelligent.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;MemoryGraph is an open-source graph-based memory system for AI assistants. Check out the project at &lt;a href="https://github.com/gregorydickson/memory-graph" rel="noopener noreferrer"&gt;github.com/gregorydickson/memory-graph&lt;/a&gt; or try the cloud platform at &lt;a href="https://memorygraph.dev" rel="noopener noreferrer"&gt;memorygraph.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The Berkeley textbook "Learning Deep Representations of Data Distributions" is freely available at &lt;a href="https://ma-lab-berkeley.github.io/deep-representation-learning-book/" rel="noopener noreferrer"&gt;ma-lab-berkeley.github.io/deep-representation-learning-book&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Buchanan, S., Pai, D., Wang, P., &amp;amp; Ma, Y. (2025). &lt;em&gt;Learning Deep Representations of Data Distributions&lt;/em&gt;. Online textbook.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chan, K.H.R., Yu, Y., You, C., Qi, H., Wright, J., &amp;amp; Ma, Y. (2022). ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction. &lt;em&gt;Journal of Machine Learning Research&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Yu, Y., Buchanan, S., Pai, D., et al. (2024). White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is? &lt;em&gt;Journal of Machine Learning Research&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Tags: #AI #Memory #LLM #MachineLearning #KnowledgeGraphs #Compression #Research&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>memory</category>
      <category>agents</category>
    </item>
    <item>
      <title>Building a Prolog-Inspired Inference Engine for AI Coding Agents</title>
      <dc:creator>Gregory Dickson</dc:creator>
      <pubDate>Thu, 11 Dec 2025 14:32:03 +0000</pubDate>
      <link>https://dev.to/gregory_dickson_6dd6e2b55/building-a-prolog-inspired-inference-engine-for-ai-coding-agents-48l</link>
      <guid>https://dev.to/gregory_dickson_6dd6e2b55/building-a-prolog-inspired-inference-engine-for-ai-coding-agents-48l</guid>
      <description>&lt;p&gt;&lt;em&gt;How we're adding automatic relationship discovery to MemoryGraph using FalkorDB and good old-fashioned AI techniques&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;If you've ever used an AI coding assistant like Claude Code, Cursor, or GitHub Copilot, you've probably noticed they have the memory of a goldfish. Every session starts fresh. You explain your project architecture, your coding conventions, your preferences—and tomorrow, you do it all again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/gregorydickson/memory-graph" rel="noopener noreferrer"&gt;MemoryGraph&lt;/a&gt; is an open-source project that gives AI coding agents persistent, graph-based memory. But storing memories is only half the battle. The real magic happens when the system starts &lt;em&gt;understanding&lt;/em&gt; the connections you didn't explicitly create.&lt;/p&gt;

&lt;p&gt;We're building an inference engine. Here's how.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Prolog Connection
&lt;/h2&gt;

&lt;p&gt;Before diving into implementation, let's talk about why graph databases and inference feel so natural together.&lt;/p&gt;

&lt;p&gt;If you squint at a graph database query, it looks suspiciously like Prolog, my first (well, actually my second) programming language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight prolog"&gt;&lt;code&gt;&lt;span class="c1"&gt;% Prolog&lt;/span&gt;
&lt;span class="ss"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;tom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;mary&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;
&lt;span class="ss"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;mary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;ann&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;
&lt;span class="ss"&gt;grandparent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:-&lt;/span&gt; &lt;span class="ss"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="ss"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;Y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Z&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Cypher (FalkorDB/Neo4j)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tom&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:PARENT&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mary&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mary&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:PARENT&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ann&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Query: find grandparents&lt;/span&gt;
&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:PARENT&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:PARENT&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both are fundamentally declarative. You describe &lt;em&gt;what&lt;/em&gt; you want, not &lt;em&gt;how&lt;/em&gt; to find it. The system figures out the traversal.&lt;/p&gt;

&lt;p&gt;This insight shapes our entire approach: &lt;strong&gt;inference rules are just parameterized Cypher queries&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We're Building
&lt;/h2&gt;

&lt;p&gt;When a developer stores a memory like "Auth Service depends on JWT Library," and later adds "JWT Library depends on Crypto Utils," we want the system to automatically understand that Auth Service &lt;em&gt;transitively&lt;/em&gt; depends on Crypto Utils.&lt;/p&gt;

&lt;p&gt;More ambitiously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If something &lt;code&gt;SOLVES&lt;/code&gt; a &lt;code&gt;problem&lt;/code&gt;, it's probably a &lt;code&gt;solution&lt;/code&gt; (type inference)&lt;/li&gt;
&lt;li&gt;If two memories share two or more connections, they're probably related (affinity detection)&lt;/li&gt;
&lt;li&gt;If A &lt;code&gt;CAUSES&lt;/code&gt; problem P and B &lt;code&gt;SOLVES&lt;/code&gt; P, then A and B are connected (problem-solution bridging)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this should happen automatically, in the background, without slowing down writes.&lt;/p&gt;
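&lt;p&gt;To make the transitive case concrete, here's the same derivation as a self-contained Python toy (an illustration, not MemoryGraph internals): walk explicit &lt;code&gt;DEPENDS_ON&lt;/code&gt; edges and give each derived edge a confidence of 1/depth, the same idea the inference rules use.&lt;/p&gt;

```python
# Toy transitive-dependency derivation (illustration, not MemoryGraph code).
explicit = {("auth-service", "jwt-library"), ("jwt-library", "crypto-utils")}

def infer_transitive(edges, max_depth=3):
    """Derive {(a, c): confidence} for chains of length 2..max_depth."""
    inferred = {}
    frontier = dict.fromkeys(edges, 1)  # paths of length 1
    for depth in range(2, max_depth + 1):
        next_frontier = {}
        for (a, b) in frontier:
            for (x, c) in edges:
                if x == b and a != c and (a, c) not in edges:
                    next_frontier[(a, c)] = depth
                    # Longer chains get lower confidence.
                    inferred.setdefault((a, c), 1.0 / depth)
        frontier = next_frontier
    return inferred

derived = infer_transitive(explicit)
# auth-service transitively depends on crypto-utils with confidence 0.5
```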

&lt;h2&gt;
  
  
  Why FalkorDB?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.falkordb.com/" rel="noopener noreferrer"&gt;FalkorDB&lt;/a&gt; is a Redis-based graph database with full Cypher support. For MemoryGraph, it offers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt; - Sub-millisecond queries for the graph sizes we're dealing with&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cypher&lt;/strong&gt; - Industry-standard query language, portable knowledge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis Protocol&lt;/strong&gt; - Easy deployment, familiar ops story&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-Database Processing&lt;/strong&gt; - We can push inference logic into the database itself&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last point is crucial. Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read data → Process in Python → Write results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Run Cypher query that reads AND writes in one transaction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Here's the high-level flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                     Memory Write                            │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│              Store Memory (immediate)                        │
│              Return to User (&amp;lt; 10ms)                        │
│              Queue for Inference                            │
└──────────────────────────┬──────────────────────────────────┘
                           │ (async, batched)
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   Inference Engine                          │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Rule: transitive_depends_on                         │   │
│  │  Rule: type_from_solves                              │   │
│  │  Rule: co_occurrence_affinity                        │   │
│  └─────────────────────────────────────────────────────┘   │
│                           │                                 │
│                           ▼                                 │
│              FalkorDB (Cypher execution)                    │
│              Creates edges marked {inferred: true}          │
└─────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: &lt;strong&gt;inference is decoupled from the write path&lt;/strong&gt;. Users never wait for inference to complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining Rules as Cypher
&lt;/h2&gt;

&lt;p&gt;Each inference rule is a self-contained Cypher query that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Matches a pattern involving the triggering memory&lt;/li&gt;
&lt;li&gt;Creates new relationships (marked as inferred)&lt;/li&gt;
&lt;li&gt;Returns a count for logging&lt;/li&gt;
&lt;/ol&gt;
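&lt;p&gt;The &lt;code&gt;InferenceRule&lt;/code&gt; object used in the snippets below is assumed to be a small value type. A minimal sketch (the real class may carry more fields, such as priority or an enabled flag):&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferenceRule:
    """A named, self-contained Cypher query run against FalkorDB.

    Each query takes a $memory_id parameter, creates relationships
    marked {inferred: true}, and returns a count for logging.
    """
    name: str
    description: str
    query: str

rule = InferenceRule(
    name="transitive_depends_on",
    description="Propagate DEPENDS_ON transitively",
    query="MATCH ... MERGE ... RETURN count(r) as created",  # body elided
)
```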

&lt;p&gt;Here's the transitive dependency rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nc"&gt;InferenceRule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transitive_depends_on&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Propagate DEPENDS_ON transitively (A→B→C means A→C)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        MATCH path = (a:Memory {id: $memory_id})-[:DEPENDS_ON*2..3]-&amp;gt;(c:Memory)
        WHERE a &amp;lt;&amp;gt; c
          AND NOT (a)-[:DEPENDS_ON {inferred: true}]-&amp;gt;(c)
        WITH a, c, min(length(path)) as depth
        MERGE (a)-[r:DEPENDS_ON {inferred: true}]-&amp;gt;(c)
        ON CREATE SET r.rule = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;transitive_depends_on&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,
                      r.depth = depth,
                      r.confidence = 1.0 / depth,
                      r.created_at = datetime()
        RETURN count(r) as created
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;[:DEPENDS_ON*2..3]&lt;/code&gt; - Match paths of length 2-3 (we don't want infinite chains)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WHERE NOT (a)-[:DEPENDS_ON {inferred: true}]-&amp;gt;(c)&lt;/code&gt; - Don't create duplicates&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;confidence: 1.0 / depth&lt;/code&gt; - Longer chains = lower confidence&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inferred: true&lt;/code&gt; - Mark it so we can filter/weight differently in search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The beauty is that this runs entirely in FalkorDB. No data leaves the database.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Batching Strategy
&lt;/h2&gt;

&lt;p&gt;Running inference on every single write would be wasteful. If a developer is rapidly creating memories, we'd thrash the database with redundant queries.&lt;/p&gt;

&lt;p&gt;Instead, we batch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InferenceService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pending_memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;deque&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxlen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;batch_delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch_delay&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;queue_for_inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Called on every write - returns immediately&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pending_memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_processor_running&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_batch_processor&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_batch_processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Waits, then processes accumulated memories&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;batch_delay&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Let writes accumulate
&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pending_memories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pending_memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;popleft&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
                     &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pending_memories&lt;/span&gt;&lt;span class="p"&gt;)))]&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_run_inference_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 2-second delay means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single writes: a 2-second delay before inference runs (invisible to the user)&lt;/li&gt;
&lt;li&gt;Burst writes: All processed together efficiently&lt;/li&gt;
&lt;li&gt;No thundering herd on the database&lt;/li&gt;
&lt;/ul&gt;
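&lt;p&gt;The debounce-and-drain behavior is easy to check in isolation. This toy (a simplified stand-in for the service above) simulates a burst of 25 writes and drains them in batches of at most 10:&lt;/p&gt;

```python
import asyncio
from collections import deque

pending = deque()   # stands in for self.pending_memories
batches = []        # records what each inference pass would receive

async def batch_processor(batch_size=10, batch_delay=0.01):
    await asyncio.sleep(batch_delay)  # let a burst of writes accumulate
    while pending:
        batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
        batches.append(batch)

async def main():
    for i in range(25):               # burst of 25 writes
        pending.append(f"mem-{i}")
    await batch_processor()           # one processor run drains them all

asyncio.run(main())
# batches now hold 10, 10, and 5 memory ids respectively
```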

&lt;h2&gt;
  
  
  Inference-Aware Search
&lt;/h2&gt;

&lt;p&gt;Creating inferred edges is useless if search doesn't leverage them. Here's how we blend explicit and inferred relationships:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;m:&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;m.title&lt;/span&gt; &lt;span class="ow"&gt;CONTAINS&lt;/span&gt; &lt;span class="n"&gt;$query&lt;/span&gt; &lt;span class="ow"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;m.content&lt;/span&gt; &lt;span class="ow"&gt;CONTAINS&lt;/span&gt; &lt;span class="n"&gt;$query&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;base_score&lt;/span&gt;

&lt;span class="c1"&gt;// Boost from explicit (user-created) relationships&lt;/span&gt;
&lt;span class="k"&gt;OPTIONAL&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;related:&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;r1.inferred&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="ow"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;r1.inferred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_score&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;explicit_boost&lt;/span&gt;

&lt;span class="c1"&gt;// Smaller boost from inferred relationships&lt;/span&gt;
&lt;span class="k"&gt;OPTIONAL&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;inferred:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="ss"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;inferred:&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_score&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;explicit_boost&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;coalesce&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r2.confidence&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;final_score&lt;/span&gt;

&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_score&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;final_score&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explicit relationships get more weight (0.3) than inferred ones (0.15), and inferred edges are further scaled by their confidence score. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User-created connections are always prioritized&lt;/li&gt;
&lt;li&gt;High-confidence inferences boost results&lt;/li&gt;
&lt;li&gt;Low-confidence guesses have minimal impact&lt;/li&gt;
&lt;/ul&gt;
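&lt;p&gt;The weighting is easier to sanity-check outside Cypher. Here is the same blend in plain Python (a sketch mirroring the constants above, not production code):&lt;/p&gt;

```python
def blended_score(explicit_count, inferred_confidences,
                  base=1.0, w_explicit=0.3, w_inferred=0.15):
    """Blend a base text-match score with relationship boosts.

    explicit_count: number of user-created relationships.
    inferred_confidences: confidence of each inferred relationship.
    """
    return (base
            + explicit_count * w_explicit
            + sum(w_inferred * c for c in inferred_confidences))

# Two explicit edges, plus inferred edges at confidence 1.0 and 0.5:
score = blended_score(2, [1.0, 0.5])  # 1.0 + 0.6 + 0.15 + 0.075
```

&lt;p&gt;High-confidence inferred edges approach half the weight of an explicit edge; low-confidence ones barely move the score.&lt;/p&gt;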

&lt;h2&gt;
  
  
  The Type Inference Pattern
&lt;/h2&gt;

&lt;p&gt;One of my favorite rules is type inference. MemoryGraph has a taxonomy of memory types: &lt;code&gt;solution&lt;/code&gt;, &lt;code&gt;problem&lt;/code&gt;, &lt;code&gt;error&lt;/code&gt;, &lt;code&gt;fix&lt;/code&gt;, &lt;code&gt;pattern&lt;/code&gt;, etc.&lt;/p&gt;

&lt;p&gt;But users often just dump content without classifying it. The inference engine can help:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nc"&gt;InferenceRule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type_from_solves&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If memory SOLVES a problem, infer it&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s a solution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        MATCH (m:Memory {id: $memory_id})-[:SOLVES]-&amp;gt;(p:Memory {type: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;problem&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;})
        WHERE m.type = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; OR m.type IS NULL
        SET m.type = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;solution&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, m.type_inferred = true
        RETURN m.id as updated
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you create a memory and link it with &lt;code&gt;SOLVES&lt;/code&gt; to something typed as &lt;code&gt;problem&lt;/code&gt;, the system infers your memory is a &lt;code&gt;solution&lt;/code&gt;. Simple, but surprisingly useful for keeping the knowledge graph clean.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud-Only Premium Features
&lt;/h2&gt;

&lt;p&gt;We're building MemoryGraph as open-source with a cloud offering. Some inference rules only make sense (or are only cost-effective) in the cloud:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Affinity Detection&lt;/strong&gt; - Find memories that share multiple connections:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;a:&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;id:&lt;/span&gt; &lt;span class="n"&gt;$memory_id&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;common:&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;b:&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
  &lt;span class="ow"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:AFFINITY&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;common&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;shared_count&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;shared_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="k"&gt;MERGE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="py"&gt;r:&lt;/span&gt;&lt;span class="n"&gt;AFFINITY&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;
    &lt;span class="py"&gt;inferred:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
    &lt;span class="py"&gt;strength:&lt;/span&gt; &lt;span class="nf"&gt;toFloat&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared_count&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="err"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
    &lt;span class="py"&gt;shared_connections:&lt;/span&gt; &lt;span class="n"&gt;shared_count&lt;/span&gt;
&lt;span class="ss"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Problem-Solution Bridging&lt;/strong&gt; - Connect root causes to their fixes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;cause:&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:CAUSES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;problem:&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:SOLVES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;solution:&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;cause&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;solution&lt;/span&gt;
&lt;span class="k"&gt;MERGE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cause&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:ADDRESSED_BY&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;inferred:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="py"&gt;via_problem:&lt;/span&gt; &lt;span class="n"&gt;problem.id&lt;/span&gt;&lt;span class="ss"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;solution&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These run asynchronously in the cloud, invisible to users but enriching their knowledge graphs over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling False Positives
&lt;/h2&gt;

&lt;p&gt;Inference isn't perfect. Sometimes the system will create relationships that don't make sense. Our mitigations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Everything is marked&lt;/strong&gt; - &lt;code&gt;{inferred: true}&lt;/code&gt; means we can always filter it out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence scores&lt;/strong&gt; - Lower confidence = less impact on search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Periodic cleanup&lt;/strong&gt; - A background job prunes old, low-confidence edges:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;inferred:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="ss"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;r.confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;
  &lt;span class="ow"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;r.created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="ss"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'P30D'&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;User feedback&lt;/strong&gt; - Future: let users thumbs-down bad inferences, feeding back into rule tuning&lt;/li&gt;
&lt;/ol&gt;
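&lt;p&gt;The pruning policy is worth unit-testing on its own. A Python predicate mirroring the thresholds in the cleanup query (a sketch, not the shipped job):&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

def should_prune(confidence, created_at, now=None,
                 min_confidence=0.3, max_age_days=30):
    """Prune inferred edges that are both low-confidence and stale."""
    now = now or datetime.now(timezone.utc)
    too_old = now - created_at > timedelta(days=max_age_days)
    return confidence < min_confidence and too_old

now = datetime(2025, 12, 26, tzinfo=timezone.utc)
old = datetime(2025, 11, 1, tzinfo=timezone.utc)
```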

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This inference engine is the foundation for more ambitious features:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM-Powered Classification&lt;/strong&gt; - For memories the rules can't classify, use a small/fast model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;llm_classify_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-haiku-20240307&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Update memory type based on response
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Temporal Inference&lt;/strong&gt; - Memories created close together with shared tags are probably related:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;a:&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="ss"&gt;),&lt;/span&gt; &lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;b:&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;duration.between&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a.created_at&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b.created_at&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="n"&gt;.minutes&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
  &lt;span class="ow"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="ow"&gt;IN&lt;/span&gt; &lt;span class="n"&gt;a.tags&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="ow"&gt;IN&lt;/span&gt; &lt;span class="n"&gt;b.tags&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;MERGE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:TEMPORAL_PROXIMITY&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;inferred:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="ss"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cross-Project Patterns&lt;/strong&gt; - In enterprise deployments, detect common problem-solution pairs across teams (anonymized, of course).&lt;/p&gt;
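&lt;p&gt;A minimal sketch of what that detection could look like, assuming anonymized records that carry only a project label, problem tags, and a solution tag (the field names and grouping below are illustrative, not the shipped implementation):&lt;br&gt;
&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical anonymized records: a project label, problem tags, and the
# tag of the solution that resolved the problem.
memories = [
    {"project": "team-a", "problem_tags": ("timeout", "retry"), "solution": "connection_pooling"},
    {"project": "team-b", "problem_tags": ("retry", "timeout"), "solution": "connection_pooling"},
    {"project": "team-c", "problem_tags": ("auth", "jwt"), "solution": "token_refresh"},
]

def cross_project_patterns(records):
    """Return problem/solution pairs that recur in more than one project."""
    seen = defaultdict(set)
    for record in records:
        key = (frozenset(record["problem_tags"]), record["solution"])
        seen[key].add(record["project"])
    return {key: projects for key, projects in seen.items() if len(projects) > 1}

patterns = cross_project_patterns(memories)
# The timeout/retry -> connection_pooling pair shows up in two teams.
```

&lt;p&gt;In the real engine this would run as a graph query rather than a Python loop, but the shape of the rule is the same: group by problem signature, count distinct projects.&lt;/p&gt;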

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;MemoryGraph is open source: &lt;a href="https://github.com/gregorydickson/memory-graph" rel="noopener noreferrer"&gt;github.com/gregorydickson/memory-graph&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The inference engine is coming in the next release. If you're building AI-powered developer tools and need persistent memory, give it a look. &lt;/p&gt;

&lt;p&gt;Or if you just think graph databases and declarative inference are cool (they are), come contribute. We're always looking for new rules to add to the engine.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building MemoryGraph at &lt;a href="https://memorygraph.dev" rel="noopener noreferrer"&gt;memorygraph.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#graphdatabase&lt;/code&gt; &lt;code&gt;#python&lt;/code&gt; &lt;code&gt;#opensource&lt;/code&gt; &lt;code&gt;#devtools&lt;/code&gt; &lt;code&gt;#falkordb&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Discussion Questions
&lt;/h2&gt;

&lt;p&gt;I'd love to hear from the community:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What inference rules would be useful for your workflow?&lt;/strong&gt; We're always looking for patterns that would help developers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How do you handle "memory" in your AI tooling today?&lt;/strong&gt; Curious what workarounds people have built.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prolog nostalgia?&lt;/strong&gt; Anyone else miss declarative logic programming? There's something elegant about it that modern systems have lost.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Drop a comment below or find me on &lt;a href="https://github.com/gregorydickson" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>database</category>
      <category>ai</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Context-Efficient AI Coding Agent Memory Without Abandoning MCP</title>
      <dc:creator>Gregory Dickson</dc:creator>
      <pubDate>Sun, 07 Dec 2025 13:29:38 +0000</pubDate>
      <link>https://dev.to/gregory_dickson_6dd6e2b55/memorygraph-context-efficient-mcp-memory-without-abandoning-mcp-ii0</link>
      <guid>https://dev.to/gregory_dickson_6dd6e2b55/memorygraph-context-efficient-mcp-memory-without-abandoning-mcp-ii0</guid>
      <description>&lt;h2&gt;
  
  
  The Context Window Problem Is Real
&lt;/h2&gt;

&lt;p&gt;If you've worked with AI coding agents, you've experienced it: your agent slows down, token costs spike, or tasks fail because the context window hit its limit. A recent article highlighted this pain point, showing that just three popular MCP servers consumed 26% of a coding agent's context window.&lt;/p&gt;

&lt;p&gt;The culprit? MCP servers that pre-load dozens of tool definitions into the context window whether the agent needs them or not. Some memory solutions expose 40+ tools, each with verbose descriptions that compound into thousands of tokens before your agent even starts working.&lt;/p&gt;
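&lt;p&gt;A quick back-of-envelope sketch using the common rough heuristic of about 4 characters per token (the description lengths are illustrative, not measurements of any particular server):&lt;br&gt;
&lt;/p&gt;

```python
def estimated_tokens(text):
    """Common rough heuristic: about 4 characters per token for English prose."""
    return len(text) // 4

# Illustrative description sizes, not measurements of any real server:
# a verbose tool description runs ~600 characters, a concise one ~150.
verbose_server = ["d" * 600 for _ in range(40)]   # 40 tools, verbose
lean_server = ["d" * 150 for _ in range(9)]       # 9 tools, concise

verbose_cost = sum(estimated_tokens(d) for d in verbose_server)  # 6000 tokens
lean_cost = sum(estimated_tokens(d) for d in lean_server)        # 333 tokens
# Every one of those tokens is spent before the agent does any work.
```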

&lt;p&gt;This is a legitimate concern. But the solution isn't to abandon MCP entirely—it's to design MCP servers with context efficiency as a first-class requirement.&lt;/p&gt;

&lt;h2&gt;
  
  
  MemoryGraph's Approach: Judicious Tool Design
&lt;/h2&gt;

&lt;p&gt;MemoryGraph takes a different path. Instead of offering every conceivable memory operation as a separate tool, we designed around a core principle: &lt;strong&gt;minimum tools, maximum capability&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Numbers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;th&gt;Default Tools&lt;/th&gt;
&lt;th&gt;Typical Context Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Heavy MCP servers&lt;/td&gt;
&lt;td&gt;40+ tools&lt;/td&gt;
&lt;td&gt;20-30% of context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MemoryGraph Core&lt;/td&gt;
&lt;td&gt;9 tools&lt;/td&gt;
&lt;td&gt;~2-3% of context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MemoryGraph Extended&lt;/td&gt;
&lt;td&gt;11 tools&lt;/td&gt;
&lt;td&gt;~3-4% of context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Nine tools. That's it for 95% of use cases. And each tool description is crafted to be concise while remaining discoverable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool Profiles: Context When You Need It
&lt;/h3&gt;

&lt;p&gt;We implemented &lt;strong&gt;tool profiles&lt;/strong&gt; to give users explicit control over their context footprint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Core mode (default) - 9 tools, minimal context&lt;/span&gt;
memorygraph

&lt;span class="c"&gt;# Extended mode - 11 tools, adds statistics and advanced queries&lt;/span&gt;
memorygraph &lt;span class="nt"&gt;--profile&lt;/span&gt; extended
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most users never need extended mode. The core profile provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory CRUD&lt;/strong&gt;: store, get, update, delete, search (5 tools)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationships&lt;/strong&gt;: create links, traverse graph (2 tools)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery&lt;/strong&gt;: fuzzy recall, session briefings (2 tools)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That covers storing solutions, linking problems to fixes, recalling past work, and catching up on project context. Extended mode adds database statistics and complex relationship queries—useful for power users, but users have to opt in.&lt;/p&gt;
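&lt;p&gt;Internally, profile selection can be as simple as swapping tool lists. The identifiers in this sketch are approximated from the summary above; the names MemoryGraph actually registers may differ:&lt;br&gt;
&lt;/p&gt;

```python
# Tool identifiers approximated from the core-profile summary above; the
# names MemoryGraph actually registers may differ.
CORE_TOOLS = [
    "store_memory", "get_memory", "update_memory",     # memory CRUD
    "delete_memory", "search_memories",
    "create_relationship", "traverse_graph",           # relationships
    "recall_memories", "session_briefing",             # discovery
]
EXTENDED_TOOLS = CORE_TOOLS + ["graph_statistics", "advanced_query"]

def tools_for_profile(profile="core"):
    """Return the tool set a given --profile flag would expose."""
    return EXTENDED_TOOLS if profile == "extended" else CORE_TOOLS
```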

&lt;h2&gt;
  
  
  Why We Didn't Abandon MCP
&lt;/h2&gt;

&lt;p&gt;Some memory vendors have moved from MCP to CLI interfaces, arguing that agents are "natively fluent" in shell commands. While there's merit to this argument, we believe it conflates two separate concerns:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Problem Isn't MCP—It's Tool Sprawl
&lt;/h3&gt;

&lt;p&gt;MCP itself is a thin protocol. The context cost comes from tool &lt;em&gt;definitions&lt;/em&gt;, not the protocol. A well-designed MCP server with 9 concise tools uses far less context than a CLI wrapper with verbose &lt;code&gt;--help&lt;/code&gt; output that gets loaded anyway.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. CLI Loses MCP's Ecosystem Benefits
&lt;/h3&gt;

&lt;p&gt;MCP provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standardized tool discovery across clients (Claude Code, Cursor, VS Code Copilot, etc.)&lt;/li&gt;
&lt;li&gt;Consistent installation and configuration&lt;/li&gt;
&lt;li&gt;Client-managed tool execution and error handling&lt;/li&gt;
&lt;li&gt;Cross-platform support without wrapper scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Moving to CLI means maintaining separate integrations for each coding agent, handling authentication differently per environment, and losing the growing MCP ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Graph Relationships Are Our Value Prop
&lt;/h3&gt;

&lt;p&gt;A CLI interface forces flat, document-style storage. MemoryGraph's power comes from &lt;strong&gt;typed relationships&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[timeout_fix] --CAUSES--&amp;gt; [memory_leak] --SOLVED_BY--&amp;gt; [connection_pooling]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query: "What happened with retry logic?" returns the full causal chain—something flat storage can't provide efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concise Tool Descriptions: How We Stay Lean
&lt;/h2&gt;

&lt;p&gt;Here's an example of how we approach tool descriptions. Compare a verbose approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Verbose (typical)
recall_memories: This is the recommended starting point for recalling past 
memories and learnings from your knowledge graph. This tool wraps search_memories with optimal defaults for natural language queries. When you want to search for past work, solutions, problems, patterns, or project context, use this tool first. 
It automatically uses fuzzy matching which handles plurals, tenses, and case 
variations. Results always include relationship context showing what connects to what. This is simpler than search_memories for common use cases because it has optimized default settings applied. Pass a natural language query and optionally filter by memory types or project path. Results are ranked by relevance with match quality hints included.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Versus our actual approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Concise (MemoryGraph)
recall_memories: Search memories with fuzzy matching and relationship context. 
Best starting point for "What did we learn about X?" queries. Handles plurals and tenses automatically.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same capability, fraction of the tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means in Practice
&lt;/h2&gt;

&lt;p&gt;When you add MemoryGraph to Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add &lt;span class="nt"&gt;--scope&lt;/span&gt; user memorygraph &lt;span class="nt"&gt;--&lt;/span&gt; memorygraph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your agent gets persistent memory with graph relationships while consuming roughly &lt;strong&gt;2-3% of context&lt;/strong&gt;—leaving the rest for your actual work.&lt;/p&gt;

&lt;p&gt;Compare that to solutions that consume 20%+ before you've even asked a question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Our Commitment
&lt;/h2&gt;

&lt;p&gt;We're adding context footprint tracking to our documentation and website. Users should know exactly how much context each MCP server costs before they install it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upcoming improvements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Published context token counts per tool profile&lt;/li&gt;
&lt;li&gt;Tool description audit to minimize verbosity&lt;/li&gt;
&lt;li&gt;Continued focus on "minimum tools, maximum capability"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The context window problem is real, but MCP isn't the enemy. Tool sprawl is. MemoryGraph proves you can have powerful graph-based memory with relationship tracking while staying context-efficient.&lt;/p&gt;

&lt;p&gt;Nine tools. Graph relationships. 2-3% context usage.&lt;/p&gt;

&lt;p&gt;That's the balance we've found.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Get started:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pipx &lt;span class="nb"&gt;install &lt;/span&gt;memorygraphMCP
claude mcp add &lt;span class="nt"&gt;--scope&lt;/span&gt; user memorygraph &lt;span class="nt"&gt;--&lt;/span&gt; memorygraph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/gregorydickson/memory-graph" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://github.com/gregorydickson/memory-graph/blob/main/docs/" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;MemoryGraph is an open-source MCP memory server for AI coding agents. We believe context efficiency and powerful features aren't mutually exclusive.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>MCP Gets Tasks: A Game-Changer for Long-Running AI Operations</title>
      <dc:creator>Gregory Dickson</dc:creator>
      <pubDate>Fri, 05 Dec 2025 19:54:25 +0000</pubDate>
      <link>https://dev.to/gregory_dickson_6dd6e2b55/mcp-gets-tasks-a-game-changer-for-long-running-ai-operations-2kel</link>
      <guid>https://dev.to/gregory_dickson_6dd6e2b55/mcp-gets-tasks-a-game-changer-for-long-running-ai-operations-2kel</guid>
      <description>&lt;p&gt;&lt;strong&gt;The Model Context Protocol is adding async task support—and it's going to fundamentally change how AI agents handle complex, time-intensive work.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;December 5, 2025 - Gregory Dickson&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The Model Context Protocol (MCP) has been revolutionizing how AI agents interact with external tools and data sources since its release. But there's been a significant limitation holding back more sophisticated use cases: every tool call blocks until completion. No way to check progress. No way to retrieve results later. No way to handle operations that take minutes or hours.&lt;/p&gt;

&lt;p&gt;That's about to change.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: When Tool Calls Take Too Long
&lt;/h2&gt;

&lt;p&gt;If you've built any serious MCP server, you've hit this wall. Maybe you're wrapping a workflow API that processes large datasets. Maybe you're orchestrating multiple AI agents. Maybe you're running comprehensive test suites or complex data analysis pipelines.&lt;/p&gt;

&lt;p&gt;The current pattern forces an uncomfortable choice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 1: Block and wait&lt;/strong&gt; - Your agent sits idle for minutes or hours while a single operation completes. If the connection drops, you lose everything and start over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 2: Split into multiple tools&lt;/strong&gt; - Create &lt;code&gt;start_job&lt;/code&gt;, &lt;code&gt;check_status&lt;/code&gt;, and &lt;code&gt;get_result&lt;/code&gt; tools. Now you're relying on prompt engineering to make the agent poll correctly. Sometimes it works. Sometimes the agent "forgets" to check back. Sometimes it hallucinates job IDs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 3: Build a polling server&lt;/strong&gt; - Your MCP server does nothing but poll other services. You're just moving the problem around.&lt;/p&gt;

&lt;p&gt;None of these are good solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter SEP-1686: Tasks
&lt;/h2&gt;

&lt;p&gt;The MCP core team has accepted &lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1686" rel="noopener noreferrer"&gt;SEP-1686&lt;/a&gt;, a specification for &lt;strong&gt;first-class async task support&lt;/strong&gt; in the protocol. And it's elegant.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;Tasks introduce a three-phase pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 1. CREATE - Start the operation, get task metadata back immediately&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;callTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;analyze_dataset&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;large_file.csv&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
  &lt;span class="na"&gt;createTask&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3600000&lt;/span&gt; &lt;span class="c1"&gt;// Keep results for 1 hour&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Returns immediately with taskId: "abc-123", status: "working"&lt;/span&gt;

&lt;span class="c1"&gt;// 2. POLL - Check status when you want&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getTaskStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// { status: "working", pollInterval: 5000 }&lt;/span&gt;

&lt;span class="c1"&gt;// 3. RETRIEVE - Get the actual result when complete&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getTaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Returns the actual tool call result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your host application stays in control. The agent can do other work. You can show progress in your UI. If the connection drops, you can reconnect and fetch results using the task ID.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Generic Primitive&lt;/strong&gt;: This isn't just for tools. Tasks work with &lt;em&gt;any&lt;/em&gt; MCP request type—tools, resources, prompts, sampling, you name it. The same pattern, consistently applied across the entire protocol.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Idempotent &amp;amp; Retry-Safe&lt;/strong&gt;: Client-generated task IDs mean you can safely retry requests without creating duplicate tasks. Perfect for unreliable networks.&lt;/p&gt;
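&lt;p&gt;A toy sketch of why client-generated IDs make retries safe—a dict-backed stand-in for a task-supporting server, not the actual SEP-1686 wire protocol:&lt;br&gt;
&lt;/p&gt;

```python
import uuid

class ToyTaskServer:
    """Dict-backed stand-in: the same task ID always maps to the same task."""

    def __init__(self):
        self.tasks = {}

    def create_task(self, task_id, operation):
        # A retried request re-sends the same client-generated ID, so it
        # finds the existing task instead of starting a duplicate operation.
        if task_id not in self.tasks:
            self.tasks[task_id] = {"status": "working", "operation": operation}
        return self.tasks[task_id]

server = ToyTaskServer()
task_id = str(uuid.uuid4())                              # generated client-side
first = server.create_task(task_id, "analyze_dataset")
retry = server.create_task(task_id, "analyze_dataset")   # network blip, resent
# first is retry: only one task was ever created
```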

&lt;p&gt;&lt;strong&gt;Resource Management&lt;/strong&gt;: Built-in TTL (time-to-live) support means servers can clean up completed tasks automatically. No memory leaks from abandoned operations.&lt;/p&gt;
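&lt;p&gt;A small sketch of what TTL-based cleanup could look like server-side (timestamps in seconds; the bookkeeping here is illustrative, not spec text):&lt;br&gt;
&lt;/p&gt;

```python
import time

class ToyTaskStore:
    """Completed tasks carry an expiry; a sweep reclaims the stale ones."""

    def __init__(self):
        self.tasks = {}

    def complete(self, task_id, result, ttl_ms, now=None):
        now = time.monotonic() if now is None else now
        self.tasks[task_id] = {"result": result, "expires_at": now + ttl_ms / 1000}

    def sweep(self, now=None):
        """Drop tasks whose TTL has elapsed; return the reclaimed IDs."""
        now = time.monotonic() if now is None else now
        alive = {tid: t for tid, t in self.tasks.items() if t["expires_at"] > now}
        reclaimed = sorted(set(self.tasks) - set(alive))
        self.tasks = alive
        return reclaimed

store = ToyTaskStore()
store.complete("abc-123", {"rows": 10}, ttl_ms=3600000, now=0)  # 1-hour TTL
swept_early = store.sweep(now=10)     # within the TTL: nothing reclaimed
swept_late = store.sweep(now=3601)    # past it: "abc-123" is cleaned up
```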

&lt;p&gt;&lt;strong&gt;Graceful Degradation&lt;/strong&gt;: Servers that don't support tasks just ignore the metadata and return results normally. No version negotiation needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bidirectional&lt;/strong&gt;: Either clients &lt;em&gt;or&lt;/em&gt; servers can create tasks. A server can task-ify a sampling request that needs user input, for example.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Impact
&lt;/h2&gt;

&lt;p&gt;Amazon cited several production use cases driving this specification:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare &amp;amp; Life Sciences&lt;/strong&gt;: Molecular analysis jobs processing hundreds of thousands of data points over several hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Automation&lt;/strong&gt;: SDLC workflows spanning multiple teams and systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Migration&lt;/strong&gt;: Automated refactoring across large codebases with dependency analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Execution&lt;/strong&gt;: Comprehensive test suites with thousands of cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Agent Systems&lt;/strong&gt;: Agents that need to coordinate without blocking each other&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't edge cases. These are fundamental patterns for production AI applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  MemoryGraph + Tasks = Powerful Memory Operations
&lt;/h2&gt;

&lt;p&gt;I'm particularly excited about this because of what it means for &lt;a href="https://github.com/gregorydickson/memorygraph" rel="noopener noreferrer"&gt;MemoryGraph&lt;/a&gt;, my open-source MCP memory server.&lt;/p&gt;

&lt;p&gt;MemoryGraph uses graph-based relationship tracking to give AI agents sophisticated, queryable memory. But some operations are computationally expensive:&lt;/p&gt;

&lt;h3&gt;
  
  
  Complex Graph Traversals
&lt;/h3&gt;

&lt;p&gt;Finding all solutions related to a problem, following relationship chains, or exploring multi-hop connections across hundreds of memories—these queries can take time, especially as the graph grows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Batch Memory Operations
&lt;/h3&gt;

&lt;p&gt;Importing large conversation histories, bulk relationship creation, or memory consolidation operations that process hundreds of nodes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Semantic Search at Scale
&lt;/h3&gt;

&lt;p&gt;Vector similarity searches across large memory sets, especially with complex filtering or multi-term queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory Curation
&lt;/h3&gt;

&lt;p&gt;Background cleanup operations, relationship strength decay, automated summarization of old memories, or graph optimization.&lt;/p&gt;
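&lt;p&gt;Relationship strength decay, for instance, could be as simple as an exponential half-life. The formula and the 30-day half-life here are illustrative choices, not MemoryGraph's shipped curation policy:&lt;br&gt;
&lt;/p&gt;

```python
def decayed_strength(strength, age_days, half_life_days=30.0):
    """Exponential decay: a relationship's strength halves every half-life."""
    return strength * 0.5 ** (age_days / half_life_days)

decayed_strength(1.0, 0)    # 1.0  - fresh relationship, full strength
decayed_strength(1.0, 30)   # 0.5  - one half-life old
decayed_strength(1.0, 60)   # 0.25 - two half-lives old
```

&lt;p&gt;Running that over every edge in a large graph is exactly the kind of background job that benefits from a task: kick it off, keep working, check back later.&lt;/p&gt;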

&lt;p&gt;&lt;strong&gt;With task support, MemoryGraph can:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Return immediately&lt;/strong&gt; for expensive queries, letting agents continue other work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide progress updates&lt;/strong&gt; as complex traversals complete&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache results&lt;/strong&gt; so agents can retrieve them multiple times without re-computation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support background operations&lt;/strong&gt; without blocking the conversation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable proactive polling&lt;/strong&gt; from host applications to show memory operation status in the UI&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's what it might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Start a complex memory query&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;callTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;memorygraph:recall_memories&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;authentication solutions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;maxDepth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Deep relationship traversal&lt;/span&gt;
    &lt;span class="na"&gt;includeRelated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;createTask&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Agent continues with other tasks...&lt;/span&gt;

&lt;span class="c1"&gt;// Host application polls and shows progress&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getTaskStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// UI shows: "Searching memories... (traversed 450 nodes)"&lt;/span&gt;

&lt;span class="c1"&gt;// Retrieve when ready&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getTaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Timeline &amp;amp; Implementation
&lt;/h2&gt;

&lt;p&gt;The specification is &lt;strong&gt;already accepted&lt;/strong&gt; and targeted for the &lt;strong&gt;DRAFT-2025-11-25&lt;/strong&gt; milestone. The full spec text is available in &lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol/pull/1732" rel="noopener noreferrer"&gt;PR #1732&lt;/a&gt;, and SDK updates are in progress.&lt;/p&gt;

&lt;p&gt;MemoryGraph will add task support once the official SDKs land. I'm planning to start with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Semantic search operations&lt;/strong&gt; (initial implementation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex graph traversals&lt;/strong&gt; with relationship depth &amp;gt; 2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch imports&lt;/strong&gt; for large memory sets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background memory curation&lt;/strong&gt; operations&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Future Possibilities
&lt;/h2&gt;

&lt;p&gt;The task primitive is designed to be extensible. Future enhancements being discussed include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Push notifications&lt;/strong&gt; for state changes (no polling needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intermediate results&lt;/strong&gt; (stream partial outputs as they're available)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nested tasks&lt;/strong&gt; (hierarchical workflows with parent/child relationships)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These would enable even more sophisticated patterns, like a memory query that spawns subtasks for different relationship types, or real-time streaming of search results as they're found.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Tasks aren't just a nice-to-have feature. They're a fundamental building block that unlocks entire categories of MCP applications that simply weren't feasible before.&lt;/p&gt;

&lt;p&gt;You can now build MCP servers that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrap existing workflow APIs cleanly&lt;/li&gt;
&lt;li&gt;Handle genuinely long-running operations (minutes to hours)&lt;/li&gt;
&lt;li&gt;Support sophisticated multi-step processes&lt;/li&gt;
&lt;li&gt;Enable true agent concurrency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And you can do it with a &lt;strong&gt;standard, well-defined protocol pattern&lt;/strong&gt; instead of ad-hoc conventions that every server implements differently.&lt;/p&gt;

&lt;p&gt;For MemoryGraph specifically, this means more sophisticated memory operations without blocking agents, better user experience in host applications, and the ability to handle much larger memory graphs efficiently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get Involved
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Follow the specification&lt;/strong&gt;: &lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1686" rel="noopener noreferrer"&gt;SEP-1686 on GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try MemoryGraph&lt;/strong&gt;: &lt;a href="https://github.com/gregorydickson/memorygraph" rel="noopener noreferrer"&gt;github.com/gregorydickson/memorygraph&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The future of AI tooling is async. And it's arriving in MCP.
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Gregory Dickson is a Senior AI Developer &amp;amp; Solutions Architect specializing in AI/ML development and cloud architecture. He's the creator of MemoryGraph, an open-source MCP memory server using graph-based relationship tracking.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
