Gregory Dickson

Building a Prolog-Inspired Inference Engine for AI Coding Agents

How we're adding automatic relationship discovery to MemoryGraph using FalkorDB and good old-fashioned AI techniques


If you've ever used an AI coding assistant like Claude Code, Cursor, or GitHub Copilot, you've probably noticed they have the memory of a goldfish. Every session starts fresh. You explain your project architecture, your coding conventions, your preferences—and tomorrow, you do it all again.

MemoryGraph is an open-source project that gives AI coding agents persistent, graph-based memory. But storing memories is only half the battle. The real magic happens when the system starts understanding the connections you didn't explicitly create.

We're building an inference engine. Here's how.

The Prolog Connection

Before diving into implementation, let's talk about why graph databases and inference feel so natural together.

If you squint at a graph database query, it looks suspiciously like Prolog, my first (well, actually my second) programming language:

% Prolog
parent(tom, mary).
parent(mary, ann).
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

// Cypher (FalkorDB/Neo4j)
CREATE (tom)-[:PARENT]->(mary)
CREATE (mary)-[:PARENT]->(ann)

// Query: find grandparents
MATCH (x)-[:PARENT]->(y)-[:PARENT]->(z)
RETURN x, z

Both are fundamentally declarative. You describe what you want, not how to find it. The system figures out the traversal.

This insight shapes our entire approach: inference rules are just parameterized Cypher queries.

What We're Building

When a developer stores a memory like "Auth Service depends on JWT Library," and later adds "JWT Library depends on Crypto Utils," we want the system to automatically understand that Auth Service transitively depends on Crypto Utils.

More ambitiously:

  • If something SOLVES a problem, it's probably a solution (type inference)
  • If two memories share 2+ connections, they're probably related (affinity detection)
  • If A CAUSES problem P and B SOLVES P, then A and B are connected (problem-solution bridging)

All of this should happen automatically, in the background, without slowing down writes.

Why FalkorDB?

FalkorDB is a Redis-based graph database with full Cypher support. For MemoryGraph, it offers:

  1. Speed - Sub-millisecond queries for the graph sizes we're dealing with
  2. Cypher - Industry-standard query language, portable knowledge
  3. Redis Protocol - Easy deployment, familiar ops story
  4. In-Database Processing - We can push inference logic into the database itself

That last point is crucial. Instead of:

Read data → Process in Python → Write results

We can do:

Run Cypher query that reads AND writes in one transaction

The Architecture

Here's the high-level flow:

┌─────────────────────────────────────────────────────────────┐
│                     Memory Write                            │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│              Store Memory (immediate)                        │
│              Return to User (< 10ms)                        │
│              Queue for Inference                            │
└──────────────────────────┬──────────────────────────────────┘
                           │ (async, batched)
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   Inference Engine                          │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Rule: transitive_depends_on                         │   │
│  │  Rule: type_from_solves                              │   │
│  │  Rule: co_occurrence_affinity                        │   │
│  └─────────────────────────────────────────────────────┘   │
│                           │                                 │
│                           ▼                                 │
│              FalkorDB (Cypher execution)                    │
│              Creates edges marked {inferred: true}          │
└─────────────────────────────────────────────────────────────┘

The key insight: inference is decoupled from the write path. Users never wait for inference to complete.
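
To make that concrete, here's a minimal sketch of what the write handler could look like under this design (the db and inference_service names are illustrative wiring, not the actual MemoryGraph API):

async def handle_memory_write(payload: dict) -> dict:
    # Persist the memory synchronously - the only thing the caller waits for
    memory_id = await db.store_memory(payload)

    # Hand the id to the inference service; this only appends to an in-memory
    # queue and returns immediately (see the batching code below)
    await inference_service.queue_for_inference(memory_id)

    return {"id": memory_id, "status": "stored"}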

Defining Rules as Cypher

Each inference rule is a self-contained Cypher query that:

  1. Matches a pattern involving the triggering memory
  2. Creates new relationships (marked as inferred)
  3. Returns a count for logging
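
The InferenceRule container itself can be a small dataclass. Here's a minimal sketch; any fields beyond name, description, and query are assumptions:

from dataclasses import dataclass, field

@dataclass
class InferenceRule:
    name: str           # Stable identifier, also stamped onto inferred edges
    description: str    # Human-readable summary for logs and docs
    query: str          # Parameterized Cypher, executed once per triggering memory
    params: dict = field(default_factory=dict)  # Extra parameters beyond $memory_id (assumed)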

Here's the transitive dependency rule:

InferenceRule(
    name="transitive_depends_on",
    description="Propagate DEPENDS_ON transitively (A→B→C means A→C)",
    query="""
        MATCH path = (a:Memory {id: $memory_id})-[:DEPENDS_ON*2..3]->(c:Memory)
        WHERE a <> c
          AND NOT (a)-[:DEPENDS_ON {inferred: true}]->(c)
        WITH a, c, length(path) as depth
        MERGE (a)-[r:DEPENDS_ON {
            inferred: true,
            rule: 'transitive_depends_on',
            depth: depth,
            confidence: 1.0 / depth,
            created_at: datetime()
        }]->(c)
        RETURN count(r) as created
    """
)

Let's break this down:

  • [:DEPENDS_ON*2..3] - Match paths of length 2-3 (we don't want infinite chains)
  • WHERE NOT (a)-[:DEPENDS_ON {inferred: true}]->(c) - Don't create duplicates
  • confidence: 1.0 / depth - Longer chains = lower confidence
  • inferred: true - Mark it so we can filter/weight differently in search

The beauty is that this runs entirely in FalkorDB. No data leaves the database.
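
Executing a rule is then a single round trip to the graph. A minimal sketch using the falkordb Python client (the graph name and result handling are assumptions):

from falkordb import FalkorDB

client = FalkorDB(host="localhost", port=6379)
graph = client.select_graph("memorygraph")  # graph name is an assumption

def run_rule(rule: InferenceRule, memory_id: str) -> int:
    """Run one inference rule for one memory; return how many edges it created."""
    result = graph.query(rule.query, {"memory_id": memory_id})
    # The rule's "RETURN count(r) as created" lands in the first row/column
    return result.result_set[0][0] if result.result_set else 0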

The Batching Strategy

Running inference on every single write would be wasteful. If a developer is rapidly creating memories, we'd thrash the database with redundant queries.

Instead, we batch:

import asyncio
from collections import deque


class InferenceService:
    def __init__(self, db, batch_size=10, batch_delay=2.0):
        self.db = db
        self.batch_size = batch_size
        self.batch_delay = batch_delay
        self.pending_memories = deque(maxlen=1000)
        self._processor_running = False

    async def queue_for_inference(self, memory_id: str):
        """Called on every write - returns immediately"""
        self.pending_memories.append(memory_id)

        if not self._processor_running:
            self._processor_running = True
            asyncio.create_task(self._batch_processor())

    async def _batch_processor(self):
        """Waits, then processes accumulated memories"""
        try:
            await asyncio.sleep(self.batch_delay)  # Let writes accumulate

            while self.pending_memories:
                batch = [self.pending_memories.popleft()
                         for _ in range(min(self.batch_size, len(self.pending_memories)))]
                await self._run_inference_batch(batch)
        finally:
            self._processor_running = False

The 2-second delay means:

  • Single writes: a 2-second latency to inference (invisible to the user)
  • Burst writes: All processed together efficiently
  • No thundering herd on the database
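
For completeness, _run_inference_batch is just a loop over the registered rules. A minimal sketch, assuming the service keeps its rules in a self.rules list and reusing run_rule from the FalkorDB snippet above:

import logging

logger = logging.getLogger(__name__)

# Method of InferenceService, continuing the class above
async def _run_inference_batch(self, batch: list[str]):
    """Apply every registered rule to every memory in the batch."""
    for memory_id in batch:
        for rule in self.rules:
            try:
                created = run_rule(rule, memory_id)
                if created:
                    logger.info("rule %s created %d edges for %s", rule.name, created, memory_id)
            except Exception:
                # One bad rule or memory shouldn't poison the rest of the batch
                logger.exception("rule %s failed for %s", rule.name, memory_id)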

Inference-Aware Search

Creating inferred edges is useless if search doesn't leverage them. Here's how we blend explicit and inferred relationships:

MATCH (m:Memory)
WHERE m.title CONTAINS $query OR m.content CONTAINS $query
WITH m, 1.0 as base_score

// Boost from explicit (user-created) relationships
OPTIONAL MATCH (m)-[r1]-(related:Memory)
WHERE r1.inferred IS NULL OR r1.inferred = false
WITH m, base_score, count(r1) * 0.3 as explicit_boost

// Smaller boost from inferred relationships
OPTIONAL MATCH (m)-[r2 {inferred: true}]-(inferred:Memory)
WITH m, base_score, explicit_boost,
     sum(CASE WHEN r2 IS NULL THEN 0 ELSE 0.15 * coalesce(r2.confidence, 0.5) END) as inferred_boost
WITH m, base_score + explicit_boost + inferred_boost as final_score

RETURN m, final_score
ORDER BY final_score DESC
LIMIT 20

Explicit relationships get more weight (0.3) than inferred ones (0.15), and inferred edges are further scaled by their confidence score. This means:

  • User-created connections are always prioritized
  • High-confidence inferences boost results
  • Low-confidence guesses have minimal impact
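
As a quick sanity check on the weighting, here's the arithmetic for an illustrative memory with two explicit edges and three inferred edges at confidences 0.9, 0.5, and 0.2:

base_score = 1.0
explicit_boost = 2 * 0.3                        # two user-created relationships
inferred_boost = 0.15 * sum([0.9, 0.5, 0.2])    # inferred edges, scaled by confidence
final_score = base_score + explicit_boost + inferred_boost  # 1.84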

The Type Inference Pattern

One of my favorite rules is type inference. MemoryGraph has a taxonomy of memory types: solution, problem, error, fix, pattern, etc.

But users often just dump content without classifying it. The inference engine can help:

InferenceRule(
    name="type_from_solves",
    description="If memory SOLVES a problem, infer it's a solution",
    query="""
        MATCH (m:Memory {id: $memory_id})-[:SOLVES]->(p:Memory {type: 'problem'})
        WHERE m.type = 'general' OR m.type IS NULL
        SET m.type = 'solution', m.type_inferred = true
        RETURN m.id as updated
    """
)

If you create a memory and link it with SOLVES to something typed as problem, the system infers your memory is a solution. Simple, but surprisingly useful for keeping the knowledge graph clean.

Cloud-Only Premium Features

We're building MemoryGraph as open-source with a cloud offering. Some inference rules only make sense (or are only cost-effective) in the cloud:

Affinity Detection - Find memories that share multiple connections:

MATCH (a:Memory {id: $memory_id})-[r1]-(common:Memory)-[r2]-(b:Memory)
WHERE a <> b
  AND NOT (a)-[:AFFINITY]-(b)
WITH a, b, count(DISTINCT common) as shared_count
WHERE shared_count >= 2
MERGE (a)-[r:AFFINITY {
    inferred: true,
    strength: toFloat(shared_count) / 5.0,
    shared_connections: shared_count
}]-(b)

Problem-Solution Bridging - Connect root causes to their fixes:

MATCH (cause:Memory)-[:CAUSES]->(problem:Memory)<-[:SOLVES]-(solution:Memory)
WHERE cause <> solution
MERGE (cause)-[:ADDRESSED_BY {inferred: true, via_problem: problem.id}]->(solution)

These run asynchronously in the cloud, invisible to users but enriching their knowledge graphs over time.

Handling False Positives

Inference isn't perfect. Sometimes the system will create relationships that don't make sense. Our mitigations:

  1. Everything is marked - {inferred: true} means we can always filter it out
  2. Confidence scores - Lower confidence = less impact on search
  3. Periodic cleanup - A background job prunes old, low-confidence edges:
MATCH ()-[r {inferred: true}]-()
WHERE r.confidence < 0.3
  AND r.created_at < datetime() - duration('P30D')
DELETE r
  4. User feedback - Future: let users thumbs-down bad inferences, feeding back into rule tuning
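
Scheduling that cleanup (item 3 above) can be a simple periodic task. A minimal sketch, reusing the graph handle from the FalkorDB snippet earlier; the daily interval is an assumption:

async def prune_inferred_edges(interval_hours: int = 24):
    """Periodically drop stale, low-confidence inferred edges."""
    while True:
        graph.query("""
            MATCH ()-[r {inferred: true}]-()
            WHERE r.confidence < 0.3
              AND r.created_at < datetime() - duration('P30D')
            DELETE r
        """)
        await asyncio.sleep(interval_hours * 3600)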

What's Next

This inference engine is the foundation for more ambitious features:

LLM-Powered Classification - For memories the rules can't classify, use a small/fast model:

# Assumes an initialized async Anthropic client, e.g. client = anthropic.AsyncAnthropic()
async def llm_classify_memory(memory_id: str):
    memory = await db.get_memory(memory_id)

    if memory.type == "general":
        response = await client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=50,
            messages=[{
                "role": "user",
                "content": f"Classify: {memory.title}\n{memory.content[:500]}"
            }]
        )
        # Update memory type based on response (e.g. response.content[0].text)

Temporal Inference - Memories created close together with shared tags are probably related:

MATCH (a:Memory), (b:Memory)
WHERE a <> b
  AND a.created_at <= b.created_at
  AND duration.between(a.created_at, b.created_at).minutes < 30
  AND any(tag IN a.tags WHERE tag IN b.tags)
MERGE (a)-[:TEMPORAL_PROXIMITY {inferred: true}]->(b)

Cross-Project Patterns - In enterprise deployments, detect common problem-solution pairs across teams (anonymized, of course).

Try It Yourself

MemoryGraph is open source: github.com/gregorydickson/memory-graph

The inference engine is coming in the next release. If you're building AI-powered developer tools and need persistent memory, give it a look.

Or if you just think graph databases and declarative inference are cool (they are), come contribute. We're always looking for new rules to add to the engine.


Building MemoryGraph at memorygraph.dev.


Tags: #ai #graphdatabase #python #opensource #devtools #falkordb


Discussion Questions

I'd love to hear from the community:

  1. What inference rules would be useful for your workflow? We're always looking for patterns that would help developers.

  2. How do you handle "memory" in your AI tooling today? Curious what workarounds people have built.

  3. Prolog nostalgia? Anyone else miss declarative logic programming? There's something elegant about it that modern systems have lost.

Drop a comment below or find me on GitHub.
