Rakshit

Posted on May 27

Why I Built the "Infrastructure Layer" Under Every AI Coding Agent

#ai #webdev #programming #python

AI Coding Agents Still Forget Everything — So I Built the Memory Layer Underneath Them

AI coding agents are getting very good at editing files, running tests, and opening PRs.

After heavily using tools like Cursor, Claude Code, and GitHub Copilot, I noticed they all share the same core limitation:

They have no persistent understanding of your system.

Ask the same question next week and they:

re-read the repo from scratch
re-run expensive LLM calls
forget prior incidents
lose architectural context
and still don’t know what actually happened in production

So instead of building another coding agent, I built the layer underneath them.

Introducing ASIL

ASIL (Engineering Intelligence Infrastructure) is a persistent, temporal, causal knowledge graph for software systems.

It connects:

code
commits
deployments
incidents
logs
metrics
architecture drift
AI memory

into one queryable system that any AI agent can access through MCP.

The goal is simple:

Stop making AI agents rediscover the same engineering knowledge over and over again.

The Core Idea

Most coding agents understand:

the current codebase

ASIL understands:

how the system evolved
what changed
what broke
why it broke
and what evidence supports that conclusion

Instead of:

“GPT thinks this caused the outage”

ASIL derives causal chains from observable system state:

deployment timelines
incident timestamps
metric shifts
runtime dependencies
postmortems
service relationships

Every conclusion includes:

evidence
confidence scores
derivation chains
citations

No black-box “AI intuition.”
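To make the "every conclusion includes" list concrete, here is a minimal sketch of what such a record could look like. This is not ASIL's actual schema: the class names `Evidence` and `Conclusion`, their fields, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """One observable fact backing a conclusion (hypothetical shape)."""
    source: str     # assumed label, e.g. "deploy-log" or "postmortem"
    reference: str  # citation pointing back at the raw record
    summary: str


@dataclass
class Conclusion:
    """A claim that cannot exist without evidence attached (hypothetical shape)."""
    claim: str
    confidence: float  # expected range: 0.0 to 1.0
    evidence: list[Evidence] = field(default_factory=list)
    derivation: list[str] = field(default_factory=list)  # ordered reasoning steps

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not self.evidence:
            raise ValueError("a conclusion needs at least one piece of evidence")

    def citations(self) -> list[str]:
        """Return the references a reader can follow to check the claim."""
        return [item.reference for item in self.evidence]


# Illustrative values only; none of these identifiers come from a real system.
example = Conclusion(
    claim="Deploy of service A preceded the latency incident",
    confidence=0.8,
    evidence=[Evidence("deploy-log", "deploy:example-1", "service A deployed")],
    derivation=["deploy finished shortly before the first alert"],
)
```

The point of the constructor check is that a conclusion with no evidence is rejected at creation time instead of being stored and trusted later.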

What ASIL Can Do

Ask Questions About a Repo

uv run asil ask "How does auth work in this repo?"

ASIL combines:

graph retrieval
vector search
verifier passes
episodic memory

to return:

cited answers
confidence scoring
cached reasoning for future sessions

Replay Production Incidents

uv run asil replay INC-2026-04-12

ASIL reconstructs:

deployment timelines
causal chains
affected services
architecture drift
metric changes

as a dependency-aware replay graph.

Think:

Time-travel debugging for distributed systems.

Detect Architecture Drift

uv run asil drift report

ASIL learns expected dependency boundaries and flags:

undocumented coupling
boundary violations
dependency creep

before the PR merges.
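A rough way to picture the boundary check: compare the dependency edges observed in a change against the edges that are allowed. The sketch below assumes the allowed set is written by hand, whereas ASIL is described as learning it. The function name `find_boundary_violations` and the service names are hypothetical.

```python
def find_boundary_violations(observed_edges, allowed_edges):
    """Return observed (caller, callee) edges that are not in the allowed set."""
    return sorted(set(observed_edges) - set(allowed_edges))


# Hypothetical services, for illustration only.
allowed = {
    ("web", "auth"),
    ("web", "billing"),
    ("billing", "ledger"),
}
observed = [
    ("web", "auth"),
    ("web", "ledger"),      # web reaching past billing into ledger
    ("billing", "ledger"),
]

violations = find_boundary_violations(observed, allowed)
```

Here `violations` contains only the `("web", "ledger")` edge. A real implementation would also need to derive `observed` from imports or runtime traces, which this sketch leaves out.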

Work With Any Coding Agent

ASIL exposes 13 MCP tools usable from:

Cursor
Claude Code
OpenHands
Aider
or custom agents

The agents become clients of the intelligence layer.

The Unexpected Benefit: Massive LLM Cost Reduction

ASIL stores every verified engineering conclusion in persistent memory.

When someone asks a semantically similar question later, ASIL can reuse the prior verified reasoning instead of re-running the full LLM pipeline.

On cache hits, the cost drops close to:

just the embedding lookup

Repeated engineering queries get cheaper over time, especially across teams that ask overlapping questions.
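The cache-hit path can be sketched as a nearest-neighbor lookup over stored question embeddings. This is an illustration under assumptions, not ASIL's implementation: the `SemanticCache` class, the `0.9` threshold, and the toy two-dimensional vectors are invented here, and a real setup would get embeddings from a model and store them in a vector database instead of a Python list.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Reuse a stored answer when a new question embeds close to an old one."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed cutoff; would need tuning in practice
        self._entries = []          # list of (embedding, answer) pairs

    def store(self, embedding, answer):
        self._entries.append((list(embedding), answer))

    def lookup(self, embedding):
        """Return the best answer at or above the threshold, else None."""
        best_answer, best_score = None, self.threshold
        for stored, answer in self._entries:
            score = cosine_similarity(embedding, stored)
            if score >= best_score:
                best_answer, best_score = answer, score
        return best_answer


cache = SemanticCache()
cache.store([1.0, 0.0], "verified answer about auth flow")

hit = cache.lookup([0.99, 0.1])   # nearly the same direction as the stored vector
miss = cache.lookup([0.0, 1.0])   # orthogonal, so no reuse
```

On a hit the only work done is computing the embedding and the similarity scan; on a miss (`None`) the caller would fall through to the full pipeline and store the verified result afterwards.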

The Part I Care About Most

ASIL does not let the LLM invent causality.

That rule shapes the entire architecture.

Causal links come from deterministic signals:

temporal proximity
lagged correlation
explicit references
runtime graph relationships

The LLM consumes evidence.

It does not fabricate it.
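Of the signals listed above, lagged correlation is the easiest to show in a few lines. The sketch below is one plain way to compute it (Pearson correlation at each candidate lag) and is not taken from ASIL's code. The function names, the `max_lag` parameter, and the toy series are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_lag(cause, effect, max_lag):
    """Find the lag (in samples) at which cause[t] best correlates with effect[t + lag].

    Returns a (lag, correlation) tuple. Only non-negative lags are tried,
    because a cause is expected to come before its effect.
    """
    best = (0, pearson(cause, effect))
    for lag in range(1, max_lag + 1):
        if len(cause) - lag < 2:
            break  # too few overlapping samples to correlate
        r = pearson(cause[:-lag], effect[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Toy data: error spikes follow deploy events two samples later.
deploy_events = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
error_spikes = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]

lag, strength = best_lag(deploy_events, error_spikes, max_lag=3)
```

For this toy data the best lag is 2 samples with a correlation of essentially 1. The calculation is deterministic: the same inputs always produce the same lag and score, which is what lets the result be cited as evidence. Correlation alone still does not prove causation, so a signal like this would presumably be combined with the other ones in the list.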

That distinction matters once AI systems start participating in production engineering workflows.

Built for Local-First Engineering

Everything runs locally:

Neo4j
Qdrant
Postgres
Redis
Prometheus
Loki
Grafana

No central server.

No telemetry.

Your graph stays yours.

The only optional network dependency is the reasoning LLM.

Why I’m Building It Open Source

Most AI tooling is racing toward:

“make the agent better at editing code”

I think the more important problem is:

“give agents persistent engineering intelligence”

That means:

memory
causality
runtime awareness
architecture understanding
confidence-weighted reasoning
reproducible evidence

That’s the layer ASIL is trying to build.

Built solo over 6 months with:
Python, FastAPI, Neo4j, Qdrant, Postgres, Tree-sitter, Next.js, Tailwind, ReactFlow, and MCP tooling.

Top comments (1)

Harjot Singh • May 31

"No persistent understanding of your system" is the right diagnosis, and calling it the infrastructure layer is the right altitude, because this isn't a feature any single agent should own; it's a substrate they should all share. The re-read-the-repo-from-scratch tax is two costs in one: tokens (you pay to re-ingest the codebase every session) and quality (the agent rebuilds a shallow understanding each time instead of accumulating a deep one). A persistent memory layer underneath Cursor, Claude Code, and Copilot is exactly right because your system's architecture and your team's decisions don't change per tool, so the understanding shouldn't be trapped in one vendor's session. The detail that decides whether it works is what you persist. Raw file content re-loaded is just context-stuffing with extra steps; the win is durable distilled understanding (the architecture, the conventions, the why-we-did-it-this-way, and the never-do-this constraints) that the agent retrieves rather than re-derives. Memory as shared infrastructure, not a per-agent toggle. That cross-tool durable-understanding instinct is core to how I think about agent memory in Moonshift. What are you storing: distilled architecture and decisions, or an index of the code itself that the agent queries?