EliotShift

I Built a Code Archaeology Engine for AI — Here's Why Claude and Cursor Keep Forgetting Your Architecture

TL;DR: AI coding assistants have zero architectural memory. Every session starts from scratch. I built LORE — an open-source MCP server with 13 analyzers that gives your AI a deep understanding of your codebase structure. Works with Claude Desktop, Cursor, and Windsurf.


The Problem: AI Has Amnesia

Every time you start a new AI coding session, the same ritual happens.

You explain: "We use PostgreSQL because..."

Then: "Auth uses JWT with 24h expiry..."

And: "Our API follows REST with /api/v1/..."

The AI nods, understands, writes some code. Session ends.

Next session? Complete amnesia. You repeat everything. Again. And again.

After the 50th time, I stopped explaining and started building.


What I Built: LORE MCP Server

LORE (Layout-Oriented Reverse Engineering) is a code archaeology engine that reads your TypeScript/JavaScript codebase and extracts deep architectural intelligence — automatically.

No manual documentation. No prompts to paste. No CLAUDE.md files to maintain.

```shell
npx lore-mcp init
```

That's the setup. One command. LORE scans your entire project, runs 13 parallel analyzers, and feeds structured results to your AI assistant through the Model Context Protocol.


What Does LORE Actually Analyze?

LORE isn't a simple dependency checker. It runs 13 deep analyzers in parallel:

| # | Analyzer | What It Finds |
|---|----------|---------------|
| 1 | AST Parser | Full TypeScript/TSX structure via ts-morph |
| 2 | Dependency Graph | Every import, export, and re-export in your project |
| 3 | Circular Dependencies | Import cycles ranked by severity |
| 4 | Dependency Direction | Layer violations (a controller importing DB code) |
| 5 | Shannon Entropy | Complexity scoring per file |
| 6 | Hotspot Analysis | Files that change too often (git churn) |
| 7 | Import Impact | Downstream blast radius of every import |
| 8 | Type Safety Scorer | `any` usage, explicit types, strictness grades |
| 9 | Hidden Coupling | Implicit dependencies through shared types |
| 10 | AI Recommendations | Prioritized fix suggestions (P0–P3) |
| 11 | Tooling Config | ESLint, Prettier, tsconfig validation |
| 12 | Breaking Changes | High-risk deprecation patterns |
| 13 | Gap Analysis | Missing error handling, testing gaps |
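
The post doesn't publish the entropy formula itself, but the general technique (Shannon entropy over a file's token frequencies) can be sketched in a few lines. This is an illustration of the idea, not LORE's actual scorer:

```typescript
// Illustrative sketch, not LORE's implementation: Shannon entropy
// over a file's token stream. More varied tokens = higher entropy.
function shannonEntropy(tokens: string[]): number {
  const counts = new Map<string, number>();
  for (const t of tokens) counts.set(t, (counts.get(t) ?? 0) + 1);
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / tokens.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// A file repeating one identifier scores 0; four distinct tokens score 2.
const repetitive = shannonEntropy(["x", "x", "x", "x"]); // 0
const varied = shannonEntropy(["a", "b", "c", "d"]);     // 2
```

A real scorer would normalize this per file size and feed it into the 0–100 complexity grade, but the core signal is the same.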

Here's what lore status looks like on a real project:

```
$ lore status

  LORE MCP Server v0.1.6
  ────────────────────────────────────

  Architecture Analysis Complete

  ├─ Overall Score:      87/100
  ├─ Type Safety:        92/100
  ├─ Tooling Config:     78/100
  └─ Architecture:       91/100

  Circular Dependencies: 3 found (2 critical)
  Hotspot Modules:       5 detected
  Hidden Coupling:       8 links
  AI Recommendations:    12 suggestions

  Analysis complete — 0 errors, 0 crashes
```

Real output from lore status on an Express.js project


How It Works: 3 Steps

Step 1: Scan Your Codebase

LORE recursively walks your project tree, parsing every .ts and .tsx file. It builds the AST, maps imports and exports, tracks types, and reads your configuration files.

Step 2: Run 13 Analyzers in Parallel

All 13 analyzers fire simultaneously through a plugin pipeline:

  • Coupling matrices are computed
  • Dependency graphs are mapped
  • Hotspot scoring runs against your git history
  • Type safety is evaluated across every file
  • Circular dependencies are detected and ranked
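
The pipeline code isn't shown in the post; a minimal sketch of the fan-out, with a hypothetical `Analyzer` interface standing in for the real plugin API, could look like:

```typescript
// Hypothetical shape of the plugin pipeline; the interface and
// names are illustrative, not LORE's actual API.
interface Analyzer {
  name: string;
  run(files: string[]): Promise<unknown>;
}

// Fire every analyzer at once and collect results keyed by name.
async function runPipeline(
  analyzers: Analyzer[],
  files: string[],
): Promise<Record<string, unknown>> {
  const entries = await Promise.all(
    analyzers.map(async (a) => [a.name, await a.run(files)] as const),
  );
  return Object.fromEntries(entries);
}
```

Since each analyzer only reads the shared file list, running them with `Promise.all` gets parallelism for free, and one slow analyzer doesn't serialize the rest.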

Step 3: Feed Results to Your AI

Via MCP, Claude Desktop, Cursor, or Windsurf can query LORE on demand. Your AI now reasons over real structural data instead of guesses.

Ask Claude:

  • "What are the hidden coupling risks in my codebase?"
  • "Which modules are the biggest hotspots?"
  • "Show me all circular dependencies and their severity."
  • "What are the P0 recommendations?"

LORE runs the analysis and returns structured data. Claude interprets it and gives you actionable answers.


MCP Integration (60-Second Setup)

Add LORE to your MCP client configuration:

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

```json
{
  "mcpServers": {
    "lore": {
      "command": "npx",
      "args": ["-y", "lore-mcp"]
    }
  }
}
```

Cursor — Add to your MCP settings:

```json
{
  "mcpServers": {
    "lore": {
      "command": "npx",
      "args": ["-y", "lore-mcp"]
    }
  }
}
```

Restart your AI tool. That's it. Your AI now has architectural memory.


Dependency Graph

LORE generates dependency graphs showing module relationships


Battle-Tested on Real Projects

LORE isn't a toy. I tested it on 16 major open-source TypeScript projects:

| Project | Files Analyzed | Result |
|---------|----------------|--------|
| Express | 42 | 100% Pass |
| Next.js | 68 | 100% Pass |
| NestJS | 38 | 100% Pass |
| Fastify | 55 | 100% Pass |
| Prisma | 45 | 100% Pass |
| Zod | 35 | 100% Pass |
| TypeORM | 60 | 100% Pass |
| React | 73 | 100% Pass |

16 projects. 100% pass rate. Zero crashes.


Cross-Platform: Runs Everywhere

LORE is built on Node.js with zero native dependencies. If Node.js runs on your system, LORE runs too.

  • macOS (Intel + Apple Silicon M1/M2/M3/M4)
  • All Linux distros (Ubuntu, Debian, Kali, Fedora, Arch, CentOS, Alpine)
  • Windows 10/11
  • CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins)

Only requirement: Node.js 18+. No Docker, no VM, no Rosetta.


The Tech Stack

  • Language: TypeScript 5.5+
  • AST Parsing: ts-morph 21
  • Protocol: Model Context Protocol (MCP) SDK 1.0
  • Transport: Stdio (Claude Desktop / IDE compatible)
  • Validation: Zod schemas
  • Output: ANSI terminal, Markdown, SARIF

CLI Commands

```shell
lore [path]            # Analyze project (default: cwd)
lore init              # Extract architectural decisions
lore status            # View decisions by category
lore diff              # Diff against saved baseline
lore doctor            # Environment + tooling check
lore doctor --fix      # Auto-fix project setup
lore watch             # Watch + re-analyze on change
lore mcp inspect       # Inspect MCP server setup
lore mcp config        # Claude Desktop config snippet
lore version           # Show version
```

Open Source & Local-First

LORE is 100% open source under MIT license. No data leaves your machine. No cloud. No API keys. No telemetry.

Everything runs locally. Your code never gets sent anywhere.


What's Next

  • [ ] LORE INTEGRITY — verify decisions are actually implemented
  • [ ] VS Code Extension
  • [ ] LORE NETWORK — share anonymous architectural patterns
  • [ ] Plugin API — write your own analyzers

Try It Now

```shell
# No install needed
npx lore-mcp init

# Or install globally
npm install -g lore-mcp
lore status
```

GitHub: github.com/EliotShift/lore-mcp
npm: npmjs.com/package/lore-mcp
Docs: eliotshift.github.io/lore-mcp


Built with care from Morocco.

If you found this useful, give LORE a star on GitHub. It helps more than you think.

Top comments (9)

Survivor Forge

We solved this differently — instead of generating context files, we built an MCP server that queries a live 130k-node knowledge graph (Neo4j). The agent reads AND writes to the graph mid-session, so architectural context persists across sessions naturally rather than being re-extracted each time.

The tradeoff vs LORE's approach: your graph schema becomes load-bearing infrastructure, and schema drift is brutal when multiple tools write to it. We ended up needing explicit migration tooling for the graph itself — something you'd avoid with LORE's read-only extraction model.

Curious about LORE NETWORK — when you add writable state in v0.2, how are you handling conflicts when Claude Code and Cursor both update the .lore/ graph simultaneously? That was our hardest problem.

EliotShift

This is an incredibly thoughtful critique — and you've pinpointed exactly why we're deliberately delaying writable state until we can do it without creating a "load-bearing schema."

Your Neo4j approach solves the persistence problem elegantly (130k nodes is no joke), but as you noted, the tradeoff is that the graph becomes the infrastructure. Schema drift becomes technical debt, and concurrent writes from multiple agents become a distributed systems nightmare.

Our answer to the conflict problem in v0.2 is: we refuse to solve it traditionally.

Instead of a centralized graph, LORE NETWORK uses an append-only event log per agent + CRDT-based merge strategies. Think Git for architectural memory.

Claude writes its analysis to ~/.lore/events/claude_123.lore

Cursor writes to ~/.lore/events/cursor_456.lore

No database locks. No central authority.

When you query, LORE reads all event logs and merges them using conflict-free data types designed for architectural metadata:

  • Add-Wins Set for discoveries (duplicates collapse naturally)
  • Multi-Value Register for severity scores (consensus emerges statistically)

If Claude says a gap is HIGH and Cursor says MEDIUM, we don't see a conflict — we see a confidence distribution. The final severity becomes the weighted average, not a merge conflict error.
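
To make the two merge strategies concrete, here's an illustrative sketch (the types and names are mine, not LORE's planned implementation):

```typescript
// Illustrative sketch of the two merge strategies described above.

// Add-wins set: discoveries merge by union, so duplicates from
// different agents collapse naturally.
function mergeDiscoveries(a: Set<string>, b: Set<string>): Set<string> {
  return new Set([...a, ...b]);
}

// Multi-value register for severity: keep every agent's observation
// and compute a weighted average instead of raising a merge conflict.
type Observation = { severity: number; weight: number };

function mergeSeverity(observations: Observation[]): number {
  const totalWeight = observations.reduce((s, o) => s + o.weight, 0);
  return (
    observations.reduce((s, o) => s + o.severity * o.weight, 0) / totalWeight
  );
}
```

So if Claude reports HIGH (say, 3) and Cursor reports MEDIUM (2) with equal weight, the merged severity is 2.5: a confidence distribution rather than an error.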

The philosophy difference: you built a living brain (Neo4j). We're building a distributed nervous system. Both approaches have merit, but ours avoids the schema migration hell you described by making the raw events immutable and the "truth" a computed view rather than stored state.

Would love to hear how you handled the "schema evolution" problem — did you end up versioning the graph itself or using something like conditional queries?

Appreciate the deep technical exchange — this is exactly the kind of conversation that pushes the space forward. 🙏

Survivor Forge

The schema evolution problem is real, and I've handled it mostly by leaning into Neo4j's schemaless flexibility: MERGE patterns let you add properties without migrations, and conditional queries (using CASE or WHERE exists()) handle versioned data gracefully. The messier problem is relationship evolution: when you realize KNOWS should have been COLLABORATED_ON with temporal properties, there's no clean migration path. My answer has been to keep relationship types semantic and broad, then differentiate via properties.

Your CRDT approach is genuinely elegant for the concurrency problem. The piece I'd push back on: materialized views vs computed-on-read. Append-only is clean for writes, but if your truth is a computed view over N agent event logs, read latency scales with history depth. At 130k nodes I'm already feeling this on complex traversals. How are you handling read performance as event logs grow?

EliotShift

This is exactly the kind of pushback I was hoping for. Thank you for going deep on the read-side implications.

You're absolutely right: append-only event logs are a write-optimized structure, and if every query requires replaying the entire history from N agents, read latency will eventually choke. This is the classic CQRS/Event Sourcing tension, and we're not immune to it.

Our answer is lazy, incremental checkpointing.

We don't replay all events on every read. Instead, LORE maintains a materialized snapshot of the merged architectural state (the "truth") that gets updated only when new events arrive. Think of it as a cache of the computed CRDT merge result.

  1. Each agent's event log is timestamped with a session ID.
  2. When you run lore analyze, we check whether any new event files have appeared since the last snapshot was built.
  3. If yes, we incrementally merge only the new events into the existing snapshot and write a new checkpoint.
  4. If no, we serve directly from the snapshot: an O(1) read.

This means read performance stays flat regardless of history depth. The tradeoff shifts to write-time (or more accurately, ingest-time) when new events arrive. But since analysis runs are already I/O-bound (reading source files), the incremental merge cost is negligible compared to the initial analysis.

The elegant part: because we're using CRDTs, the merge operation is commutative and idempotent. We can merge events in any order, from any agent, and arrive at the same state. This makes incremental checkpointing trivial: no complex dependency resolution between events.
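
Under those assumptions, the lazy checkpoint logic might be sketched like this (all types and names are hypothetical, not the shipped design):

```typescript
// Illustrative sketch of lazy, incremental checkpointing. Because the
// merge is commutative and idempotent, folding only the new events into
// the last snapshot yields the same state as replaying the full log.
type AgentEvent = { id: string; timestamp: number; fact: string };
type Snapshot = { lastTimestamp: number; facts: Set<string> };

function checkpoint(snapshot: Snapshot, log: AgentEvent[]): Snapshot {
  const fresh = log.filter((e) => e.timestamp > snapshot.lastTimestamp);
  if (fresh.length === 0) return snapshot; // O(1) read path: serve as-is
  return {
    lastTimestamp: Math.max(...fresh.map((e) => e.timestamp)),
    facts: new Set([...snapshot.facts, ...fresh.map((e) => e.fact)]),
  };
}
```

Idempotence shows up directly: replaying an already-merged event is a no-op, so a crashed or repeated ingest can't corrupt the snapshot.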

The honest part: We haven't yet battle-tested this at 130k nodes. We're at ~500-2000 architectural "facts" per project, not graph-scale yet. When we hit the scaling wall, we may need to introduce sharded checkpoints by architectural domain (security facts vs coupling facts vs entropy facts) or adopt a log-structured merge tree approach.

But the core bet remains: architectural memory has different access patterns than knowledge graphs. Reads are bursty (triggered by lore analyze), writes are sparse (agent sessions), and the data is naturally partitionable by project. This gives us breathing room that a general-purpose knowledge graph doesn't have.

Your point about relationship evolution is spot-on. We sidestep it by making relationship types closed-set and semantic from day one (DEPENDS ON, AUTHENTICATES, CALLS), and pushing all nuance into properties. It's a constraint, but constraints are what keep the system predictable when multiple agents are writing.
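
In TypeScript terms, a closed set of relationship kinds with the nuance pushed into properties might look like this (illustrative only, not the actual .lore schema):

```typescript
// Illustrative: a closed-set union of relationship kinds. Adding a new
// kind is a deliberate type change; day-to-day nuance lives in properties.
type RelationKind = "DEPENDS_ON" | "AUTHENTICATES" | "CALLS";

interface Relation {
  kind: RelationKind;
  from: string;
  to: string;
  // Nuance goes here, so the set of kinds never grows accidentally.
  properties: Record<string, string | number>;
}

const edge: Relation = {
  kind: "DEPENDS_ON",
  from: "auth/controller.ts",
  to: "auth/service.ts",
  properties: { since: "v0.1.0" },
};
```

The compiler then rejects any agent output that invents a new edge kind, which is exactly the predictability constraint described above.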

Global Chat

The amnesia framing is right, but there is a layer below it I keep hitting. Even with LORE feeding architectural context, different agents working on the same project (Claude Code, Cursor, a background CI agent) each build their own model from scratch. They do not share LORE's output or write back when one of them notices something new. Are you treating LORE as a read-only index, or could it become the write path too, where agents post findings back so the next session starts from an updated graph instead of re-scanning?

EliotShift

This is exactly the right question. LORE today is read-only, but you're describing LORE NETWORK, which is the next major milestone. The plan is:

  1. .lore/ directory as a persistent knowledge graph (JSON)
  2. Every agent session writes findings back after analysis
  3. The next session starts from the updated graph, not a re-scan
  4. Multi-agent: Claude Code, Cursor, and CI all read/write the same graph

You're not the only one hitting this wall. It's the biggest gap in AI-native development right now. LORE NETWORK is designed to solve exactly this. Stay tuned: it's coming in v0.2.

Yovarni Yearwood

This is definitely the way things need to go to take agentic coding to the next level. I'll be looking into it.

EliotShift

Thank you! 🙏 Honestly, hearing that from another dev means a lot. We're all collectively figuring out what "agentic memory" actually looks like, and I'm just glad LORE resonates with where you see the space heading.

If you do give it a spin, fair warning: v0.1.6 is still rough around the edges (I'm working on v0.1.7 right now to fix the gap detectors and MCP config). But the architectural analysis core is solid: entropy scoring, hidden coupling, circular deps, and incremental caching all work.

Would genuinely love to hear your thoughts if you try it on a real project. The feedback from actual usage is what shapes the roadmap.

Repo's at: github.com/EliotShift/lore-mcp

Appreciate you taking the time to comment. It's these small interactions that make open source worth building. 🚀

Mykola Kondratiuk

the harder problem isn't the decision itself - it's "why NOT the alternative." that counter-reasoning disappears between sessions and is the part context engines struggle to surface.