CyborgNinja1

Posted on Apr 11 • Originally published at shieldcortex.ai

We Studied Claude Code's Source. Here's How Anthropic's AI Actually Remembers — And Why It's Broken.

#ai #security #opensource #machinelearning

When Claude Code's source was exposed via npm sourcemaps on March 31, 2026, we did what any security company would — we audited it.

Not to exploit it. Not to clone it. To understand how the most popular AI coding agent handles the thing that matters most: memory.

Here's what we found, what's broken, and what we built to fix it.

How Claude Code Remembers Things

Deep inside Claude Code's TypeScript source, there's a module called memdir — the memory directory system. It's more sophisticated than you'd expect:

1. Four Memory Types

Claude Code doesn't just dump everything into one bucket. It classifies memories into four types:

User — who you are, your preferences, your expertise level
Feedback — corrections you've given ("don't mock the database in tests")
Project — what's being built, deadlines, who's doing what
Reference — documentation, API specs, stable knowledge

Each type has rules about when to save and how to use it. This is smart design — it prevents the agent from treating a casual preference the same as a critical project deadline.
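
As a rough illustration, the taxonomy maps naturally onto a TypeScript union type. This is a sketch, not Claude Code's actual definitions: only the four category names come from the list above, and `MemoryRecord`, `MEMORY_TYPES`, and `parseMemoryType` are hypothetical names.

```typescript
// Hypothetical sketch: only the four category names come from the article.
type MemoryType = "user" | "feedback" | "project" | "reference";

interface MemoryRecord {
  type: MemoryType;
  title: string;
  body: string;
  savedAt: Date;
}

const MEMORY_TYPES: readonly MemoryType[] = ["user", "feedback", "project", "reference"];

// Narrow an untrusted string (for example, a header field in a memory file)
// to a known MemoryType, or undefined if it is not one of the four.
function parseMemoryType(raw: string): MemoryType | undefined {
  const normalized = raw.trim().toLowerCase();
  return MEMORY_TYPES.find((t) => t === normalized);
}
```

Validating the type at the boundary means every later rule ("when to save", "how to use") can switch on a closed set of values instead of free-form strings.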

2. LLM-Powered Recall

Here's the surprising part: Claude Code doesn't just use embeddings for memory search. It uses Sonnet as a selector.

When you ask something, it:

Scans all memory file headers and descriptions
Sends the manifest + your query to Sonnet
Asks: "Which 5 memories are relevant?"
Loads only those files into context

This is smarter than pure vector similarity because the LLM can reason about the intent behind a query, not just how close two pieces of text sit in embedding space. But it's also slower and costs tokens on every recall.
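
The recall steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from memdir: the prompt wording, the `MemoryHeader` shape, and all function names are hypothetical, and the model call is abstracted behind a `Selector` function so no particular API is implied.

```typescript
interface MemoryHeader {
  file: string;
  description: string;
}

// Stand-in for the model call (hypothetical): takes a prompt and resolves to
// the file names the model picked.
type Selector = (prompt: string) => Promise<string[]>;

// Step 1: turn the scanned headers into a compact manifest.
function buildManifest(headers: MemoryHeader[]): string {
  return headers.map((h) => `- ${h.file}: ${h.description}`).join("\n");
}

// Model output is untrusted: drop unknown or repeated names and cap the count.
function sanitizeSelection(picked: string[], headers: MemoryHeader[], limit: number): string[] {
  const known = new Set(headers.map((h) => h.file));
  const unique = Array.from(new Set(picked));
  return unique.filter((file) => known.has(file)).slice(0, limit);
}

// Steps 2 to 4: send manifest plus query, ask for the relevant files,
// and return only the names that should be loaded into context.
async function selectMemories(
  query: string,
  headers: MemoryHeader[],
  selector: Selector,
  limit: number = 5,
): Promise<string[]> {
  const prompt = [
    `Query: ${query}`,
    "Memories:",
    buildManifest(headers),
    `Which ${limit} memories are relevant? Reply with file names only.`,
  ].join("\n");
  return sanitizeSelection(await selector(prompt), headers, limit);
}
```

Keeping the selector injectable makes the cost trade-off explicit: every call to `selectMemories` is one extra model round trip before the real request starts.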

3. DreamTask — The Agent That Sleeps

This is the most fascinating feature. Claude Code has a background task called DreamTask that runs while you're idle.

Like biological sleep, it:

Reviews recent sessions
Consolidates short-term memories into long-term storage
Merges duplicates
Prunes contradictions

The codebase literally calls it "dreaming." An AI agent that processes experiences into lasting memories while idle. That's not a gimmick — it's architecturally sound.
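
To make the consolidation idea concrete, here is a deliberately simplified sketch. It assumes each memory can be reduced to a key and a value, which is our simplification and not how DreamTask works internally; `Note` and `consolidate` are hypothetical names. Under that assumption, duplicates and contradictions collapse into the same rule: for each key, the newest entry wins.

```typescript
interface Note {
  key: string;     // what the memory is about, e.g. "UserService path"
  value: string;   // the claim itself
  savedAt: number; // timestamp in milliseconds
}

// Merge duplicates and resolve contradictions with a "newest wins" rule.
// Real contradiction detection would need semantic comparison; this sketch
// only compares normalized keys.
function consolidate(notes: Note[]): Note[] {
  const latest = new Map<string, Note>();
  for (const note of notes) {
    const key = note.key.trim().toLowerCase();
    const existing = latest.get(key);
    if (existing === undefined || note.savedAt > existing.savedAt) {
      latest.set(key, { ...note, key });
    }
  }
  // Return the survivors in chronological order.
  return Array.from(latest.values()).sort((a, b) => a.savedAt - b.savedAt);
}
```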

4. Two-Tier Architecture

Memory is split into:

MEMORY.md — an index file (max 200 lines) loaded every session
Topic files — detailed memories loaded on demand

The index acts as a router. The agent always knows what it remembers, but it loads the details of a memory only when they are needed. This keeps context windows manageable.
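
A small sketch of the two-tier idea, assuming the index is plain text and topic files are fetched through a caller-supplied reader. Only the 200-line cap comes from the description above; `loadIndex`, `TwoTierMemory`, and the reader callback are hypothetical.

```typescript
const MAX_INDEX_LINES = 200; // cap stated in the article

// Tier 1: the index is always loaded, but never beyond the cap.
function loadIndex(indexText: string, maxLines: number = MAX_INDEX_LINES): string[] {
  return indexText.split("\n").slice(0, maxLines);
}

class TwoTierMemory {
  readonly index: string[];
  private readonly readTopic: (file: string) => string;
  private readonly cache = new Map<string, string>();

  constructor(indexText: string, readTopic: (file: string) => string) {
    this.index = loadIndex(indexText);
    this.readTopic = readTopic;
  }

  // Tier 2: a topic file is read only on first request, then cached.
  topic(file: string): string {
    const cached = this.cache.get(file);
    if (cached !== undefined) return cached;
    const body = this.readTopic(file);
    this.cache.set(file, body);
    return body;
  }
}
```

Constructing the object costs at most 200 lines of context; topic bodies are paid for only when something actually asks for them.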

The Three Critical Flaws

But here's where it gets concerning.

Flaw 1: No Staleness Decay

Claude Code has a memoryAge.ts module that calculates how old a memory is and adds warnings like "This memory is 47 days old. Claims may be outdated."

But this is just a text warning appended to the memory. There's no actual confidence decay. A 90-day-old memory about your codebase architecture is treated with the same weight as something saved today. The warning exists, but the system doesn't act on it.

In practice, this means stale code-state memories get asserted as fact. The agent "remembers" that UserService is in src/services/, but you refactored it 3 weeks ago. Citing a specific path makes the stale claim sound more authoritative, not less.
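
The difference between a text warning and real decay is easy to show. In the sketch below, `ageWarning` reproduces the warning wording quoted above, while `decayedConfidence` is one possible alternative: exponential decay with a 30-day half-life. The half-life and all function names are assumptions chosen for illustration, not values taken from Claude Code or ShieldCortex.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function ageInDays(savedAt: Date, now: Date): number {
  return Math.max(0, Math.floor((now.getTime() - savedAt.getTime()) / MS_PER_DAY));
}

// Text-only approach: the memory's weight is unchanged, it just gets a note.
function ageWarning(days: number): string {
  return `This memory is ${days} days old. Claims may be outdated.`;
}

// Decay approach: confidence halves every halfLifeDays (hypothetical default).
function decayedConfidence(days: number, halfLifeDays: number = 30): number {
  return Math.pow(0.5, days / halfLifeDays);
}

// A ranking step can then multiply relevance by confidence, so an old memory
// has to be much more relevant than a fresh one to be recalled.
function rankScore(relevance: number, days: number): number {
  return relevance * decayedConfidence(days);
}
```

With these defaults a 90-day-old memory keeps one eighth of its original weight, so it can still surface, but only when nothing fresher competes with it.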

Flaw 2: No Security Pipeline

This is the big one. Any content goes into memory without security scanning.

There's no:

Prompt injection detection on memory writes
Credential leak scanning
Encoding attack detection
Trust scoring by source
Anomaly detection on write patterns

If an attacker can get text into your agent's context (via a malicious README, a poisoned API response, a crafted error message), that text can end up in permanent memory. Next session, the agent loads it as trusted context.

This is memory poisoning, and Claude Code has zero defences against it.
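
To show what even a basic write-time check looks like, here is a minimal pattern scanner. It is an illustrative sketch only: the rules are a handful of well-known heuristics we picked for the example, not ShieldCortex's pipeline, and regexes like these are easy to evade on their own.

```typescript
interface Finding {
  rule: string;
  match: string;
}

// Hypothetical starter rules. None of the patterns use the global flag,
// so exec() stays stateless between calls.
const RULES: { rule: string; pattern: RegExp }[] = [
  { rule: "prompt-injection", pattern: /ignore (all )?(previous|prior) instructions/i },
  { rule: "credential", pattern: /AKIA[0-9A-Z]{16}/ }, // AWS access key ID format
  { rule: "credential", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  { rule: "encoding", pattern: /[\u200B-\u200D\u2060\uFEFF]/ }, // zero-width characters
];

// Run every rule against the candidate memory text before it is stored.
function scanMemoryWrite(text: string): Finding[] {
  const findings: Finding[] = [];
  for (const { rule, pattern } of RULES) {
    const m = pattern.exec(text);
    if (m !== null) findings.push({ rule, match: m[0] });
  }
  return findings;
}

// A write is allowed only when the scan comes back clean.
function isSafeToStore(text: string): boolean {
  return scanMemoryWrite(text).length === 0;
}
```

The point is where the check sits: on the write path, before the text becomes something a future session loads as trusted context.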

Flaw 3: Single-Agent Only

Claude Code's memory is scoped to one user on one machine. There's a teamMem feature (behind a feature flag), but it's rudimentary — shared files in a team directory with no access control.

In a world where companies are deploying fleets of AI agents (we run 6), you need:

Private vs shared memory scopes
Per-agent access control
Cross-agent knowledge sharing with trust boundaries
Audit trails on who wrote what

Claude Code has none of this.
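
The requirements above fit in a few dozen lines once scope and ownership are part of the memory record. The sketch below is a hypothetical in-memory store, not code from Claude Code or ShieldCortex: `ScopedStore` and its rules (owners write, team scope is readable by everyone, every attempt is logged) are assumptions made for the example.

```typescript
type Scope = "private" | "team";

interface ScopedMemory {
  id: string;
  owner: string; // agent that wrote the memory
  scope: Scope;
  body: string;
}

interface AuditEntry {
  agent: string;
  action: "read" | "write";
  memoryId: string;
  allowed: boolean;
}

class ScopedStore {
  private readonly memories = new Map<string, ScopedMemory>();
  readonly audit: AuditEntry[] = [];

  // Agents may only write memories they own, and may not overwrite
  // a memory owned by another agent.
  write(agent: string, memory: ScopedMemory): boolean {
    const existing = this.memories.get(memory.id);
    const allowed = memory.owner === agent && (existing === undefined || existing.owner === agent);
    this.audit.push({ agent, action: "write", memoryId: memory.id, allowed });
    if (allowed) this.memories.set(memory.id, memory);
    return allowed;
  }

  // Team memories are readable by any agent; private ones only by the owner.
  read(agent: string, id: string): string | undefined {
    const memory = this.memories.get(id);
    const allowed = memory !== undefined && (memory.scope === "team" || memory.owner === agent);
    this.audit.push({ agent, action: "read", memoryId: id, allowed });
    return allowed && memory ? memory.body : undefined;
  }
}
```

Denied attempts are logged as well as successful ones, which is what turns the audit trail into something you can use to spot a misbehaving agent.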

What We Built in 24 Hours

After studying the source, we shipped ShieldCortex v4.0.0 — taking the best architectural ideas and fixing the security gaps.

From Claude Code's Design (borrowed and improved):

Memory Type Taxonomy — user, feedback, project, reference types with validation
Dream Mode — background consolidation that merges duplicates, archives stale memories, and detects contradictions (shieldcortex consolidate)
Positive Feedback Capture — Claude Code only saves corrections. We also save confirmations: "This approach worked because..." Agents that only learn from failure become overcautious.

What Claude Code Is Missing (we added):

Staleness Scoring — actual confidence decay, not just text warnings. Memories older than 2 days get flagged. 30+ days triggers archival review.
6-Layer Defence Pipeline — every memory write is scanned for prompt injection, credential leaks, encoding attacks, and anomalous patterns before storage
Memory Scopes — private vs team scoping for multi-agent deployments
LLM Reranking — optional Sonnet-powered reranking on top of embedding search (inspired by Claude Code's approach, but configurable)
Save Filtering — blocks saving information that's derivable from code/git (file paths, import statements, env vars). Only stores what the codebase can't tell you.
Supply Chain Scanner — shieldcortex audit --deps catches malicious packages, typosquats, and suspicious postinstall scripts
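
As an illustration of the save-filtering idea, the sketch below rejects candidate memories that look like something the repository already records. The three patterns and the names `DERIVABLE_PATTERNS` and `derivableReason` are hypothetical heuristics written for this example; they are not the rules ShieldCortex ships.

```typescript
// Hypothetical heuristics for "the codebase already knows this".
const DERIVABLE_PATTERNS: { reason: string; pattern: RegExp }[] = [
  { reason: "import statement", pattern: /^\s*import\s.+\sfrom\s+['"]/m },
  { reason: "environment variable", pattern: /\b[A-Z][A-Z0-9_]{2,}=\S+/ },
  { reason: "file path", pattern: /(?:[\w.-]+\/)+[\w.-]+\.\w+/ },
];

// Returns why a candidate memory should be skipped, or undefined if it
// looks like knowledge that only a human or a past session could supply.
function derivableReason(text: string): string | undefined {
  const hit = DERIVABLE_PATTERNS.find(({ pattern }) => pattern.test(text));
  return hit ? hit.reason : undefined;
}
```

A filter like this would block "UserService is in src/services/user.ts" while still letting through a correction such as "don't mock the database in tests", which no amount of reading the code would reveal.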

npm install -g shieldcortex
shieldcortex consolidate          # Dream mode
shieldcortex cortex confirm       # Capture what worked
shieldcortex audit --deps         # Supply chain scan

589 tests. Full backward compatibility. Open source.

The Lesson

Claude Code's memory architecture is genuinely well-designed. The type taxonomy, LLM-powered recall, and DreamTask consolidation are smart engineering decisions.

But good architecture without security is a liability. Every memory write is an attack surface. Every recalled memory is an assertion the agent trusts. If you can poison the memory, you control the agent.

Anthropic built a brain. They forgot the immune system.

That's what ShieldCortex is.