Note from the author: You're reading a Dev.to adaptation. The original on NexusTrade includes interactive trace viewers, animated diagrams, equity curve visualizations, and embedded course exercises. Read it there for the full experience.
If you are a serious AI practitioner, you know that Cursor is better than Claude Code.
The surveys disagree for now, but they won't for long. The Pragmatic Engineer's March 2026 survey found Claude Code has 46% developer love vs Cursor's 19%. Among the 55% of developers who regularly use AI agents, Claude Code is the clear leader at 71% usage.
Claude Code has real advantages. The Max plan gives you parallel agents with high usage limits. It's token-efficient, using 50-75% fewer tokens than older Anthropic models. Its SWE-bench scores are genuinely better.
But I use Cursor. The UX is in a different league. Composer 2 is fast, maintains coherence across hundreds of actions, plans across files before writing a single line. It lets me switch models per task. And most importantly, it does a better job managing memory. Cursor's memory is your codebase, and your codebase doesn't lie.
That last point is the one most people argue about without understanding the engineering underneath. This article is about the engineering underneath.
## The problem every agent builder hits
The base model has no memory. You knew that. But the implications for agents are worse than they sound.
Every time you start a new session, the agent starts from zero. It doesn't know what it built last week. It doesn't know which strategies it already tested. It doesn't know that you told it three sessions ago to always use 3-year backtests instead of 1-year. It has to rediscover everything.
In practice, that means the agent wastes your iterations on exploration that already happened. When Aurora launched without memory, it took 20+ iterations just to get to a half-decent strategy on every single run. The agent was smart. It just couldn't remember. Every run was a cold start.
Memory is the engineering layer that fixes this. The question is how.
Q: Why does an AI agent forget everything between sessions, even if it performed well the last time?
A: Large language models are stateless by design. They process the tokens in the current context window and produce output. When the session ends, nothing persists inside the model. Anything the agent "learned" during the run only existed in the conversation context, which is gone. Memory is an external system built on top of the stateless model, not something the model itself provides.
## Four ways to give an agent memory. Only one scales.
There's a spectrum of approaches, from naive to production-grade. Most tutorials only cover the first two. Here's all four, including what Cursor and Claude Code actually do.
| Approach | How it works | Who uses it |
|---|---|---|
| In-context (file dump) | Copy the entire conversation into a file. Prepend it to the next session's prompt. | Toy projects, short sessions |
| LLM-maintained notes (Dream Mode) | Model reads the session and writes structured notes to organized markdown files. Consolidates during idle time. | Claude Code |
| Vector / RAG | Documents converted to embeddings. Semantic similarity search retrieves the most relevant chunks. | Cursor (codebase), many production apps |
| Structured DB + typed queries | Typed memory records with domain fields. LLM generates DB query fields. Retrieval is targeted, not fuzzy. | NexusTrade (Aurora) |
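The first row, the naive file dump, fits in a few lines. This sketch is illustrative (the file name and transcript format are assumptions, not any particular tool's behavior):

```python
from pathlib import Path

MEMORY_FILE = Path("session_memory.txt")  # hypothetical location

def load_memory() -> str:
    """Read the previous session's transcript, if any."""
    return MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""

def build_prompt(user_message: str) -> str:
    """Prepend the entire prior transcript to the new prompt."""
    memory = load_memory()
    prefix = f"Previous session:\n{memory}\n\n" if memory else ""
    return prefix + f"User: {user_message}"

def save_turn(user_message: str, reply: str) -> None:
    """Append this turn so the next session can replay it."""
    with MEMORY_FILE.open("a") as f:
        f.write(f"User: {user_message}\nAgent: {reply}\n")
```

It works, and then it stops working: the file grows without bound, and eventually the "memory" alone exceeds the context window.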
## How Claude Code actually handles memory: no vector database
When Anthropic accidentally shipped Claude Code's source code in March 2026, the most surprising finding wasn't the hidden virtual pet system or the "undercover mode." It was the memory architecture.
It doesn't use RAG or Pinecone. It's plain markdown files in a directory with a 25KB index cap. Anthropic invested in the maintenance loop instead of the storage layer.
The interesting part is Dream Mode: a 4-phase consolidation loop that runs during idle time.
**Phase 1 — Orient:** Reads the ENTRYPOINT.md index to understand what's already stored.

```markdown
# Memory Index

## user-preferences.md
Prefers TypeScript. Strict mode always.
Dislikes verbose comments.

## project-context.md
Working on NexusTrade agent memory.
Stack: Node, MongoDB, Redis.

## corrections.md
2026-03-14: No any-type in TS.
2026-04-01: Short paragraphs preferred.

[index: 18.4 KB / 25 KB cap]
```
**Phase 2 — Gather:** Greps transcripts narrowly. The prompt is explicit: don't read everything. Look only for things you already suspect matter.

```shell
$ grep -nE "prefer|always|never|told" ~/.claude/logs/2026-04-12.md
47: "I prefer snake_case for vars"
203: "never use console.log in prod"
891: "always add JSDoc to exports"
# Full transcript: ignored.
```
**Phase 3 — Consolidate:** Merges new findings into existing notes. Converts relative dates to absolute. Deletes contradictions when new information overrides old.

```text
BEFORE:
"Yesterday: prefers short paragraphs"
"Told to avoid console.log"
"Prefers verbose comments"   ← stale

AFTER:
"2026-04-11: short paragraphs"
"never use console.log in prod"
# "verbose comments" deleted — contradicted by newer entry
```
**Phase 4 — Prune:** Enforces the 25KB cap. Removes stale pointers and compresses low-priority entries.

```text
measure(ENTRYPOINT.md)                → 27.2 KB (over cap)
remove_stale_pointers()               → 25.8 KB
compress_entries(priority="low", n=3) → 24.1 KB ✓
# Loop complete. Next run: next idle window.
```
The prompt literally says: "Don't exhaustively read transcripts. Look only for things you already suspect matter."
LLMs are already good at reading and writing text. The hard part of memory isn't storage. It's maintenance: keeping notes accurate, consolidated, and bounded. That's what Dream Mode solves, and it works.
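The maintenance mechanics (merge new findings, let newer entries override older ones, enforce the byte cap) can be sketched deterministically. In Dream Mode the LLM does the merging; the topic-key heuristic and record shapes below are purely illustrative:

```python
CAP_BYTES = 25 * 1024  # the 25KB index cap described above

def consolidate(existing: list[str], new: list[str]) -> list[str]:
    """Merge new findings into existing notes. A newer entry on the
    same topic replaces the older, contradicted one."""
    notes = existing[:]
    for entry in new:
        key = entry.split(":")[0]  # crude topic key, illustrative only
        notes = [n for n in notes if not n.startswith(key)]
        notes.append(entry)
    return notes

def prune(notes: list[str], cap: int = CAP_BYTES) -> list[str]:
    """Enforce the byte cap by dropping the oldest entries first."""
    while sum(len(n) + 1 for n in notes) > cap and notes:
        notes.pop(0)
    return notes
```

The point of the sketch: none of this requires a database. The difficulty is entirely in deciding what to keep, which is why Anthropic spent the effort on the loop rather than the storage.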
There's a deeper limitation too. Dream Mode memory is an LLM writing notes about what it thinks happened. Those notes can drift. They can miss things. And as you'll see, they can be poisoned when the underlying data is wrong.
Cursor's memory is your codebase. It doesn't interpret. It doesn't summarize. It indexes what's actually there. For software development, that's the more trustworthy foundation.
## How Cursor handles memory: a vector index of your codebase
Cursor solves a different problem than Claude Code. It's not trying to remember what you told it last week. It's trying to navigate a codebase it has never seen, at a scale where reading every file would be prohibitively expensive.
The architecture: Cursor uses tree-sitter to parse code into Abstract Syntax Trees, creating semantic chunks (classes, methods, functions). Those chunks get converted to vector embeddings and stored in Turbopuffer, a specialized vector database. Change detection uses Merkle trees to identify exactly which files changed without re-indexing the whole codebase.
If you haven't worked with vector embeddings before: an embedding model converts any piece of content into a list of numbers that captures its meaning. Things that mean similar things end up close together in that space. At query time, your question gets embedded the same way, and the database returns whichever stored chunks are mathematically nearest to it.
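A toy illustration of "close together in that space," using hand-made three-dimensional vectors in place of a real embedding model:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Invented 3-dimensional "embeddings" (real models use ~1536 dims):
auth_fn  = [0.9, 0.1, 0.0]   # "verify user password"
login_fn = [0.8, 0.2, 0.1]   # "check login credentials"
chart_fn = [0.0, 0.1, 0.9]   # "render price chart"

cosine(auth_fn, login_fn)  # high: semantically related
cosine(auth_fn, chart_fn)  # low: unrelated
```

The two auth-related vectors score high against each other despite sharing no words; the charting one scores low against both. That is the entire trick behind semantic retrieval.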
Cursor runs this same pipeline across your entire codebase. The indexing pipeline in five steps:
**Step 1 — Parse:** tree-sitter walks every source file and produces an Abstract Syntax Tree.

```python
tree = parser.parse(source_bytes)  # tree-sitter parses bytes, not str

def collect_chunks(node, chunks):
    # Keep only semantically meaningful units: one function/class per chunk.
    if node.type in ("function_definition", "class_definition", "method_definition"):
        chunks.append({
            "text": node.text.decode(),
            "type": node.type,
            "file": file_path,
            "lines": (node.start_point[0], node.end_point[0]),
        })
    for child in node.children:
        collect_chunks(child, chunks)

chunks = []
collect_chunks(tree.root_node, chunks)
```
**Step 2 — Chunk:** Each AST node becomes an independent chunk with metadata. One function = one chunk.
**Step 3 — Embed:** Each chunk's text gets converted to a high-dimensional vector stored in Turbopuffer. Two chunks doing similar things will be close in vector space even if they share no words.

```python
vector = embed_model.encode(chunk["text"])
# vector = [0.021, -0.14, 0.88, ...] (1536 dimensions)
turbopuffer.upsert(id=chunk["id"], vector=vector, metadata={...})
```
**Step 4 — Diff:** When you save a file, Cursor hashes it and compares against a Merkle tree. Only changed files get re-chunked and re-embedded.
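The change-detection step can be sketched with plain hashes. A real Merkle tree adds interior nodes per directory so comparison can skip whole unchanged subtrees; this flat version shows the core idea (function names are illustrative):

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def snapshot(files: list[Path]) -> dict[str, str]:
    """Leaf level: one content hash per file."""
    return {str(p): file_hash(p) for p in files}

def root_hash(snap: dict[str, str]) -> str:
    """Changes if and only if some file hash changed."""
    combined = "".join(h for _, h in sorted(snap.items()))
    return hashlib.sha256(combined.encode()).hexdigest()

def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Only these files need re-chunking and re-embedding."""
    return [p for p, h in new.items() if old.get(p) != h]
```

A single root-hash comparison answers "did anything change?" in O(1); the per-file hashes then pinpoint exactly what to re-index.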
**Query time — Retrieve:** Your question gets embedded the same way. Turbopuffer finds the top-K closest chunks and injects them into the model context.

```python
query_vec = embed_model.encode("how does auth middleware work?")
results = turbopuffer.query(vector=query_vec, top_k=20, filters={"language": "typescript"})
# inject results into model prompt
```
This is why Cursor wins for active development. If you ask it about a function defined three directories away, it finds it. The memory is your codebase, indexed semantically, updated incrementally.
The "rules" system (.cursor/rules/) is the persistent context layer on top: project conventions, coding standards, architectural decisions. Unlike Dream Mode, Cursor doesn't write these for you. You write them.
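A hypothetical rules file, to make the contrast concrete (the exact on-disk format varies by Cursor version; the contents here are invented for illustration):

```markdown
# .cursor/rules/conventions.md  (hypothetical example)
- Use TypeScript strict mode everywhere; no `any`.
- Backtests default to 3-year windows, not 1-year.
- Never use console.log in production code.
```

Nothing writes or consolidates this for you. It stays exactly as accurate as you keep it, which is both its weakness and its safety.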
Claude Code vs. Cursor — memory comparison:
| Capability | Claude Code | Cursor |
|---|---|---|
| Storage format | Markdown files (25KB index cap) | Vector embeddings (Turbopuffer) |
| Auto-writes memory | Yes (Dream Mode, idle consolidation) | No (you write the rules) |
| Codebase navigation | Limited (text search, no semantic index) | Excellent (AST parsing + semantic retrieval) |
| Cross-session learning | Yes (Dream Mode carries forward) | Partial (rules persist, embeddings persist) |
| Memory poisoning risk | Yes (bad sessions corrupt future context) | Low (codebase is source of truth) |
Q: What is retrieval augmented generation (RAG) and when do you actually need it?
A: RAG is the pattern of storing documents as vector embeddings, then at query time retrieving the most semantically similar chunks and injecting them into the model context. You need it when your knowledge base is too large to fit in a context window, and when you need semantic similarity search across unstructured text. You don't need it when your knowledge is structured (a regular database query is faster and more precise), or when your documents are small enough to fit in the prompt directly.
## What NexusTrade does differently: structured memory with semantic queries
Vector databases are great for fuzzy text matching: "find me code that does authentication." They are terrible at structured constraints: "find me only backtests for NVDA where the Sharpe ratio was above 1.5." For trading, memory isn't fuzzy. It's math.
NexusTrade stores memory as typed AgentSummary records in MongoDB. After each run (or at iteration 20, when the context window gets summarized), the agent writes a structured document.
Each document has: semanticInsights (up to 24 deduplicated patterns from this run), proceduralLessons (up to 12 meta-lessons about how to run better next time), and structured portfolio records with tickers, strategy type, instrument type, and backtest metrics per period.
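As a sketch, the shape of such a record might look like the following. Field names are inferred from the description above and are not NexusTrade's actual schema:

```python
from typing import TypedDict

class BacktestMetrics(TypedDict):
    """Illustrative metrics; the real record holds more fields."""
    sharpe: float
    total_return_pct: float
    period: str  # e.g. "3y"

class AgentSummary(TypedDict):
    """One structured memory document written per agent run."""
    insightsPipelineVersion: int
    tickers: list[str]
    strategyType: str              # e.g. "momentum", "mean-reversion"
    instrumentType: str            # "equity" or "options"
    semanticInsights: list[str]    # up to 24 deduplicated patterns
    proceduralLessons: list[str]   # up to 12 meta-lessons
    backtests: list[BacktestMetrics]
```

Because every field is typed, retrieval can filter on exact values instead of hoping a similarity search surfaces the right document.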
Before a new agent run, a separate fast LLM call reads the current conversation and extracts structured MongoDB query fields: which tickers are relevant, which strategy types, equity or options, any keywords. Then targeted retrieval injects only matching past summaries into the planner.
If you're asking about NVDA options, it pulls past NVDA options runs. If you're asking about momentum strategies, it pulls past momentum runs. The retrieval is typed, not fuzzy.
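A sketch of that retrieval step: the LLM's extracted fields become an ordinary MongoDB filter (field names are illustrative, mirroring the description above):

```python
def build_memory_filter(fields: dict) -> dict:
    """Turn LLM-extracted query fields into a targeted MongoDB filter.
    Only fields the LLM actually extracted constrain the query."""
    query: dict = {}
    if fields.get("tickers"):
        query["tickers"] = {"$in": fields["tickers"]}
    if fields.get("strategyType"):
        query["strategyType"] = fields["strategyType"]
    if fields.get("instrumentType"):
        query["instrumentType"] = fields["instrumentType"]
    return query

build_memory_filter({"tickers": ["NVDA"], "instrumentType": "options"})
# matches past NVDA options runs, nothing else
```

Compare this with a vector search: there is no similarity threshold to tune and no risk of a "close enough" document about a different ticker slipping in.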
The result: agents that start sessions already knowing what worked, what failed, and what to avoid. Cold starts went from 20+ iterations to a fraction of that.
## The part nobody talks about: memory poisoning
War story — the options backtest bug: During the options trading beta, I had a bug in the spread backtest engine. Credit spreads were being calculated incorrectly — sometimes showing catastrophic losses on strategies that would have been fine, sometimes showing phantom gains on strategies that would have lost money.

The agent ran. It learned from the results. It wrote those lessons into memory. "Bull call spreads on META led to catastrophic losses." "Mean-reversion spreads on NVDA are unreliable." None of it was true. The strategies weren't bad. The backtester was wrong. The next runs came in already poisoned. The agent avoided spreads entirely — it was confident about it. It had "evidence."

The fix was `insightsPipelineVersion`. Every `AgentSummary` document is written with the current pipeline version number. Retrieval only returns documents matching the current version. Bump the version and every old document goes silent instantly — still in the database for analytics, but they stop matching the filter. We're now on version 6. Each bump corresponds to a backtesting bug fix that would have corrupted agent memory if old summaries kept being injected.
This is the risk neither Cursor nor Claude Code faces at the same level. When your memory is downstream of a system that can be wrong, you need a versioning mechanism. A flat markdown file doesn't give you one.
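The mechanism itself is one line on the query path. A sketch, with the field name taken from the article and the rest illustrative:

```python
CURRENT_PIPELINE_VERSION = 6  # bumped after each backtester bug fix

def versioned_filter(base_query: dict) -> dict:
    """Gate every retrieval on the current pipeline version.
    Old summaries stay in the DB but stop matching."""
    return {**base_query, "insightsPipelineVersion": CURRENT_PIPELINE_VERSION}
```

Bumping the constant is the entire rollback: no migration, no deletion, and the poisoned documents remain available for post-mortem analytics.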
## Memory that makes the agent better over time
Storing memory is one thing. Using it to improve is another. NexusTrade runs a background worker that scores every completed agent run, extracts what worked, and injects those patterns into the next run automatically.
A prompt enhancer queries top performers and extracts their successful tool sequences. A planner enhancer finds few-shot examples relevant to the current task by keyword. Both cache results and inject learned patterns into future prompts automatically. Most agents plateau at run 1: they produce the same output on run 50 as on run 1, because nothing carries forward. This is the gap between a demo and a system.
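The enhancer's core selection step reduces to something like this (the scoring function and record shapes are illustrative, not NexusTrade's internals):

```python
def top_tool_sequences(runs: list[dict], k: int = 3) -> list[list[str]]:
    """Rank completed runs by score and return the tool sequences
    of the top performers, for injection into future prompts."""
    ranked = sorted(runs, key=lambda r: r["score"], reverse=True)
    return [r["tools"] for r in ranked[:k]]
```

Everything else (the background worker, the cache, the prompt templating) is plumbing around this one decision: which past behavior deserves to be copied forward.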
## Reading about memory isn't the same as using it
The concepts in this article are straightforward on paper. File dumps, LLM summarization, vector embeddings, structured queries. But the questions that actually matter only come up when you run it: what does Aurora remember from your past sessions? What gets injected, and what gets filtered out?
Module 4 of AI Agents from Scratch puts you in the system directly. You query Aurora's memory and see what it has stored from your past runs. You watch the LLM read your prompt, generate query fields, pull matching summaries, and inject them before the agent starts.
You'll see exactly what the agent knows before it knows what you're about to ask. This is not a simulation. It's the live production system, and what it has stored is specific to you.
Start Module 4 — free, no credit card
Part 4 of 5 in the AI Agents from Scratch series.
Try NexusTrade's AI trading agent free: https://nexustrade.io




