<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: martinlepage26-bit</title>
    <description>The latest articles on DEV Community by martinlepage26-bit (@martinlepage26bit).</description>
    <link>https://dev.to/martinlepage26bit</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3887707%2F9f5388fd-f7b1-4f9f-87bf-6f08707d552f.png</url>
      <title>DEV Community: martinlepage26-bit</title>
      <link>https://dev.to/martinlepage26bit</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/martinlepage26bit"/>
    <language>en</language>
    <item>
      <title>I tried 4 approaches to AI agent memory. Here's what actually worked.</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:41:46 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/i-tried-4-approaches-to-ai-agent-memory-heres-what-actually-worked-2a22</link>
      <guid>https://dev.to/martinlepage26bit/i-tried-4-approaches-to-ai-agent-memory-heres-what-actually-worked-2a22</guid>
      <description>&lt;h1&gt;
  
  
  I tried 4 approaches to AI agent memory. Here's what actually worked.
&lt;/h1&gt;

&lt;p&gt;Six months ago I started building a governance SaaS product with Claude Code as my primary dev partner. The codebase grew. The context problem grew faster.&lt;/p&gt;

&lt;p&gt;I tried four approaches to keeping the agent oriented across sessions. Three of them failed in predictable ways. Here's what I learned from each.&lt;/p&gt;




&lt;h2&gt;
  
  
  Approach 1: Long CLAUDE.md
&lt;/h2&gt;

&lt;p&gt;The obvious starting point. One file, everything in it — project description, architectural decisions, tech stack, naming conventions, open questions, constraints, active tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; It worked for the first two months. Then the file hit ~600 lines and started failing silently. The agent would read it, acknowledge constraints, then propose something that violated a constraint buried in paragraph 14. It wasn't hallucinating — it was attending correctly to the first ~300 tokens and poorly to the rest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure mode:&lt;/strong&gt; Flat context doesn't scale. The most relevant information competes with everything else. As the file grows, the signal-to-noise ratio drops and you can't fix it by curating better — the file just becomes a negotiation between what you cut and what the agent needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When it works:&lt;/strong&gt; Projects with a short, stable context that fits in ~200 lines and doesn't evolve much. Anything living longer than a month will outgrow it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Approach 2: Raw note dump + grep/search
&lt;/h2&gt;

&lt;p&gt;Second attempt: put everything in a directory of Markdown files, let the agent search when it needs context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; The agent searched correctly but retrieved fragments. A decision log retrieved in isolation — without the surrounding context of what problem it was solving, what came before it, what constraints shaped it — is almost useless. The agent would find the "what" without the "why."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure mode:&lt;/strong&gt; Full-text search retrieves by keyword match, not by meaning. And even when it retrieves the right note, a standalone note without links to adjacent concepts gives the agent a fragment, not understanding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When it works:&lt;/strong&gt; Narrow, well-scoped queries where the right answer is self-contained. Not for architectural context that depends on a web of prior decisions.&lt;/p&gt;
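
&lt;p&gt;For context, the search layer was about this simple. A minimal sketch (the directory layout and query are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pathlib import Path

def search_notes(keyword, notes_dir):
    # Keyword match over every markdown file. Returns matching lines
    # only: exactly the fragment-without-context problem described above.
    hits = []
    for path in sorted(Path(notes_dir).rglob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if keyword.lower() in line.lower():
                hits.append((path.name, line.strip()))
    return hits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The hit is a single line from the middle of a note — the "what" with none of the "why" around it.&lt;/p&gt;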




&lt;h2&gt;
  
  
  Approach 3: Embeddings + semantic search (RAG)
&lt;/h2&gt;

&lt;p&gt;Third attempt: embed all notes with &lt;code&gt;sentence-transformers&lt;/code&gt;, query by cosine similarity, feed top-k results as context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; Better recall than keyword search, but a new failure mode appeared — similarity isn't relevance. A note about your authentication design and a note about your deployment checklist can be equally "similar" to a query about your API architecture. The model retrieved plausible-sounding context that wasn't actually the right context.&lt;/p&gt;
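
&lt;p&gt;The retrieval loop itself has a simple shape. A sketch of it (the real version embedded notes with &lt;code&gt;sentence-transformers&lt;/code&gt;; the bag-of-words &lt;code&gt;embed&lt;/code&gt; here is a toy stand-in so the example is self-contained):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model (e.g. model.encode
    # from sentence-transformers).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, notes, k=3):
    # Rank notes by similarity to the query and keep the top k.
    q = embed(query)
    scored = sorted(notes, key=lambda n: cosine(q, embed(n)), reverse=True)
    return scored[:k]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Swap in a real embedding model and the failure mode is unchanged: the ranking is by similarity, not by relevance to your project.&lt;/p&gt;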

&lt;p&gt;More importantly, RAG returns chunks. Chunks don't have relationships. The agent got the right paragraph but not the connected decision that made that paragraph meaningful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure mode:&lt;/strong&gt; Semantic similarity measures distance in embedding space, not logical relevance in your project. And chunked retrieval destroys the graph structure that makes notes meaningful to each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When it works:&lt;/strong&gt; Finding notes you forgot existed, surfacing material you didn't know to search for. Good as a discovery layer, bad as the primary retrieval mechanism for agent context.&lt;/p&gt;




&lt;h2&gt;
  
  
  Approach 4: Structured knowledge graph with mandatory inline linking
&lt;/h2&gt;

&lt;p&gt;What finally held up: restructuring all project knowledge as a linked graph, where the agent navigates by traversal rather than by reading or searching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw/       unsynthesized captures, never modified
wiki/      synthesized notes — each requires ≥2 inline [[links]]
CLAUDE.md  ~50 lines pointing to the project hub
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The key constraint:&lt;/strong&gt; every note in &lt;code&gt;wiki/&lt;/code&gt; must link to at least two related notes &lt;em&gt;in the body&lt;/em&gt; — not in a trailing "Related" section. A link in the body means the connection is part of the reasoning, not an afterthought. Orphan notes don't exist to the agent.&lt;/p&gt;
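
&lt;p&gt;The "at least two inline links" rule is mechanical enough to lint. A sketch of the check (assumes &lt;code&gt;[[wikilink]]&lt;/code&gt; syntax and a trailing "Related" heading to exclude; adapt to your own conventions):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)\]\]")

def inline_link_count(text):
    # Ignore anything under a trailing "Related" heading; only links
    # woven into the body count toward the minimum.
    body = re.split(r"(?mi)^#+ *related\b", text)[0]
    return len(WIKILINK.findall(body))

def orphan_notes(wiki_dir):
    # Notes with fewer than two inline links are invisible to traversal.
    return [p.name for p in sorted(Path(wiki_dir).glob("*.md"))
            if inline_link_count(p.read_text(encoding="utf-8")) in (0, 1)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run it before committing notes and the "orphan notes don't exist" problem never accumulates.&lt;/p&gt;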

&lt;p&gt;&lt;strong&gt;Why traversal beats retrieval:&lt;/strong&gt; The agent starts at &lt;code&gt;CLAUDE.md&lt;/code&gt;, follows the link to the project hub, follows links from there to the decision log and active constraints, and reaches relevant context in 3 hops — without searching, without reading everything, without similarity scoring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The note type that changed everything:&lt;/strong&gt; Decision logs with a "Rejected Alternatives" section. Not just what was decided, but what was explicitly ruled out and why. The agent reads this before suggesting anything architectural. It doesn't re-propose the rejected approach because it already knows why it was rejected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What broke the pattern:&lt;/strong&gt; Notes without links. An insight captured in isolation is invisible to traversal. The discipline of linking before saving — finding the related project, person, concept, or decision and wiring the new note into the graph — is what makes the whole system work. It takes 30 extra seconds per note. It saves 15 minutes per session.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;Start with the graph structure from session one, not after the context problem appears. The worst time to restructure your knowledge is when you're 150 notes in with no backlinks.&lt;/p&gt;

&lt;p&gt;The minimum viable version: one project hub note, one decision log, one active constraints note, wired together. Three notes, all linked. Everything else can come later.&lt;/p&gt;




&lt;p&gt;I packaged the vault structure — skeleton, templates, note types, skill guides, optional local runtime — as a $49 template: &lt;strong&gt;&lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;Obsidian Agent Vault&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The four approaches above are what I went through before landing on the graph structure. Hopefully this saves you the same six-month detour.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>obsidian</category>
      <category>productivity</category>
      <category>claudeai</category>
    </item>
    <item>
      <title>Why Your CLAUDE.md Fails at Scale (and What to Replace It With)</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:41:10 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/why-your-claudemd-fails-at-scale-and-what-to-replace-it-with-3o1h</link>
      <guid>https://dev.to/martinlepage26bit/why-your-claudemd-fails-at-scale-and-what-to-replace-it-with-3o1h</guid>
      <description>&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is one of the best ideas in AI-assisted development. A project-level instruction file the agent reads before every session — architecture overview, tech stack, constraints, conventions. For small projects and short contexts, it works well.&lt;/p&gt;

&lt;p&gt;At scale it breaks. Here's exactly where, why, and what to use instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three failure modes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Failure 1: Attention dilution past ~300 tokens
&lt;/h3&gt;

&lt;p&gt;Language models don't read flat files the way humans do. They weight content throughout the context, and that weighting isn't uniform — content earlier in a long file tends to receive more attention than content buried in the middle.&lt;/p&gt;

&lt;p&gt;Past about 300 tokens (roughly 50-60 lines), a &lt;code&gt;CLAUDE.md&lt;/code&gt; becomes a competition. The specific architectural constraint you need the agent to apply right now is competing for attention with the boilerplate at the top, the tech stack list, the deployment instructions, the conventions section. &lt;/p&gt;

&lt;p&gt;The result: the agent reads the file but acts on the parts that got weighted highest — which may not be the parts relevant to your current task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure 2: Stale context mixed with current context
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; files accumulate. You add the new constraint when it's established. You rarely remove the old one when it's superseded. Six months in, the file contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Constraints that no longer apply (the auth approach you changed in February)&lt;/li&gt;
&lt;li&gt;Architectural decisions that have been superseded (the caching strategy you replaced)&lt;/li&gt;
&lt;li&gt;Notes that were relevant during one sprint but shouldn't be shaping AI behavior now&lt;/li&gt;
&lt;li&gt;Actual current constraints buried among the outdated ones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent can't distinguish fresh from stale. It treats a constraint from eight months ago with the same weight as one established last week, unless you've carefully dated and annotated everything — which adds length and compounds the attention problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure 3: One file can't serve multiple retrieval needs
&lt;/h3&gt;

&lt;p&gt;Different sessions need different context. A session working on the auth flow needs auth architecture, the security constraints, the token handling decisions. A session working on the data pipeline needs schema decisions, the ETL constraints, the performance requirements.&lt;/p&gt;

&lt;p&gt;A flat &lt;code&gt;CLAUDE.md&lt;/code&gt; either includes everything (long, diluted) or misses something (short, incomplete). There's no way to load exactly what's relevant for the current session without pre-loading everything and hoping attention lands on the right parts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The breaking point
&lt;/h2&gt;

&lt;p&gt;In practice, &lt;code&gt;CLAUDE.md&lt;/code&gt; works reliably up to about 50-80 lines (300-500 tokens). Most projects hit this limit within the first two months. After that, adding more content actively degrades the context quality — longer file, more competition, worse attention on the sections that matter.&lt;/p&gt;
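
&lt;p&gt;You can watch for the breaking point instead of discovering it. A rough check, using the common ~4-characters-per-token approximation (the 500-token / 80-line thresholds are the working numbers from this article, not a hard limit):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def claude_md_report(text):
    # Crude size check for an entry file: ~4 characters per token.
    lines = len(text.splitlines())
    tokens = len(text) // 4
    within = tokens in range(501) and lines in range(81)
    return {"lines": lines, "est_tokens": tokens,
            "verdict": "ok" if within else "split into a graph"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;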

&lt;p&gt;The symptom: the agent starts making suggestions that violate constraints you know are in the file. Not because it didn't read it — because the constraint was weighted below the threshold of effective influence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The replacement: entry file + knowledge graph
&lt;/h2&gt;

&lt;p&gt;The pattern that scales is a short entry file that boots traversal into a structured knowledge graph. Instead of one long file the agent reads linearly, you give it a 50-line entry point that points to exactly what it needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The entry file (&lt;code&gt;CLAUDE.md&lt;/code&gt;) — stays short, always:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# [Project Name]&lt;/span&gt;

&lt;span class="gu"&gt;## Start here&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Read [[Session State — Project Name]] → current position, next step, open questions
&lt;span class="p"&gt;2.&lt;/span&gt; Read [[Active Constraints — Project Name]] → non-negotiable limits for this project
&lt;span class="p"&gt;3.&lt;/span&gt; Navigate to [[Project Hub]] for architecture and decisions

&lt;span class="gu"&gt;## Quick reference&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Stack: React 18 + FastAPI + Cloudflare Workers
&lt;span class="p"&gt;-&lt;/span&gt; Repo: ~/repos/project-name  
&lt;span class="p"&gt;-&lt;/span&gt; Deploy: &lt;span class="sb"&gt;`npm run build &amp;amp;&amp;amp; wrangler deploy`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Tests: &lt;span class="sb"&gt;`pytest`&lt;/span&gt; (backend) / &lt;span class="sb"&gt;`npm test`&lt;/span&gt; (frontend)

&lt;span class="gu"&gt;## Agent behavior&lt;/span&gt;
Load session state before proposing anything.
Check active constraints before suggesting solutions.
Do not re-propose items in decision logs marked as rejected.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole &lt;code&gt;CLAUDE.md&lt;/code&gt;. 30 lines. Clear traversal instructions. No content — only pointers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The knowledge graph (in Obsidian or any linked markdown system):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session State — Project Name.md    ← 5-field operational record, updated every session
Active Constraints — Project Name.md  ← deployment, compliance, architecture, scope limits
Project Hub.md                     ← links to all decision logs, architecture notes, open work
Decision Log — Auth Layer.md       ← decision + reasoning + rejected alternatives
Decision Log — API Gateway.md
Architecture — Data Pipeline.md
Open Questions — Project Name.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each file is focused. Each file is current — because you update the specific file when something changes, not the monolithic &lt;code&gt;CLAUDE.md&lt;/code&gt;. The agent reads only what's relevant to the current session.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the agent traverses it
&lt;/h2&gt;

&lt;p&gt;At session start, the agent reads the entry file (30 lines, under 200 tokens). The entry file points to session state and active constraints — the two files relevant to every session. The session state points to the project hub; the hub points to the specific decision logs and architecture notes relevant to what you're working on.&lt;/p&gt;

&lt;p&gt;The agent follows links. Two or three hops reach any relevant context. No attention dilution — it's loading targeted files, not scanning a wall of text.&lt;/p&gt;
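
&lt;p&gt;The hop count is checkable. A breadth-first walk over the &lt;code&gt;[[link]]&lt;/code&gt; graph (notes are passed as an in-memory mapping here; a real vault would read files and resolve link names to paths):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re
from collections import deque

LINK = re.compile(r"\[\[([^\]|#]+)\]\]")

def hops_from(entry, notes):
    # notes: mapping of note name to its markdown body.
    # Returns the hop distance from the entry note to every reachable note.
    dist = {entry: 0}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for target in LINK.findall(notes.get(current, "")):
            if target not in dist:
                dist[target] = dist[current] + 1
                queue.append(target)
    return dist
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If anything important sits more than two or three hops from the entry file, the hub needs a more direct link to it.&lt;/p&gt;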

&lt;h2&gt;
  
  
  Migration: what to pull out of CLAUDE.md
&lt;/h2&gt;

&lt;p&gt;If you have a long &lt;code&gt;CLAUDE.md&lt;/code&gt; that's already hitting these failure modes, migration is straightforward:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What's in your CLAUDE.md&lt;/th&gt;
&lt;th&gt;Where it goes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Current project state&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Session State&lt;/code&gt; — update it every session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Non-obvious rules and limits&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Active Constraints&lt;/code&gt; — organized by category&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architectural decisions&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Decision Log — [Topic]&lt;/code&gt; — with reasoning and rejected alternatives&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Out-of-scope items&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Active Constraints&lt;/code&gt; → Scope Boundaries section&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tech stack / quick reference&lt;/td&gt;
&lt;td&gt;Stays in &lt;code&gt;CLAUDE.md&lt;/code&gt; — this is genuinely useful at the top level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment commands&lt;/td&gt;
&lt;td&gt;Stays in &lt;code&gt;CLAUDE.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After migration, your &lt;code&gt;CLAUDE.md&lt;/code&gt; should be under 50 lines. If it's still longer than that, you haven't moved enough out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The compounding benefit
&lt;/h2&gt;

&lt;p&gt;The graph structure doesn't just fix the scale problem — it builds compounding context. Each decision log entry becomes a durable record. The session state carries the project forward. The constraints file stays current as the project evolves.&lt;/p&gt;

&lt;p&gt;Six months in, you have an accumulated record of every decision, every rejected alternative, every constraint — all retrievable in two hops, all informing every session. The context quality improves continuously as you work, not just when you remember to update &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;The Obsidian vault template that ships with this graph structure — entry file pattern, session-state format, active constraints template, decision log format, hub template, skill guides — is $49.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;https://pharosml.gumroad.com/l/kvbhdo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're hitting &lt;code&gt;CLAUDE.md&lt;/code&gt; length problems on your current project, the migration takes an afternoon. The template has the target structure ready to populate.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudeai</category>
      <category>devtools</category>
      <category>obsidian</category>
    </item>
    <item>
      <title>Managing AI Context Across Multiple Projects Without Context Bleed</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:40:34 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/managing-ai-context-across-multiple-projects-without-context-bleed-590a</link>
      <guid>https://dev.to/martinlepage26bit/managing-ai-context-across-multiple-projects-without-context-bleed-590a</guid>
      <description>&lt;p&gt;Single-project AI workflows are tractable. You have one &lt;code&gt;CLAUDE.md&lt;/code&gt;, one context, one set of decisions. You brief the agent once, keep the session-state current, and it works.&lt;/p&gt;

&lt;p&gt;Multi-project workflows break this in three specific ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context bleed.&lt;/strong&gt; You've been working on Project A all morning. The auth architecture, the deployment constraints, the open questions — all of that is live in your session. You switch to Project B. The agent carries residual context from A into B. It makes suggestions that would be correct for A but are wrong for B.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Re-briefing overhead.&lt;/strong&gt; You switch projects cleanly, but now you need to re-establish context for B from scratch. Fifteen minutes of re-orientation. You do this every context switch, several times a day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision contamination.&lt;/strong&gt; You've made decisions on Project A that you're tempted to apply to Project B because they're fresh in mind — even though B has different constraints that make those decisions wrong.&lt;/p&gt;

&lt;p&gt;I run six simultaneous workstreams. Here's the structure that eliminated all three.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core principle: one session-state per project, never shared
&lt;/h2&gt;

&lt;p&gt;The session-state is the working memory for a single project at a single point in time. It must be completely isolated. Nothing from Project A's session-state should ever appear in Project B's.&lt;/p&gt;

&lt;p&gt;This sounds obvious. The implementation is less obvious.&lt;/p&gt;

&lt;p&gt;The naive approach — one &lt;code&gt;CLAUDE.md&lt;/code&gt; per project directory — works if your projects live in separate repos you never open simultaneously. It breaks the moment you're switching between projects in a shared knowledge environment, or when projects share common concepts and you're not careful about which project's context the agent is loading.&lt;/p&gt;

&lt;p&gt;The structure that actually works has three layers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1: Project-isolated context files
&lt;/h2&gt;

&lt;p&gt;Each project gets its own set of context files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wiki/
  projects/
    project-alpha/
      Alpha Hub.md          ← entry point, links out
      Alpha Session State.md ← 5-field operational record
      Alpha Constraints.md   ← non-obvious limits
      Alpha Decision Log.md  ← decisions + rejected alternatives
    project-beta/
      Beta Hub.md
      Beta Session State.md
      Beta Constraints.md
      Beta Decision Log.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing is shared between these directories. If two projects share a technology or pattern, they each link to the same concept note in &lt;code&gt;wiki/concepts/&lt;/code&gt; — but their session-states, constraints, and decision logs are completely separate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 2: The project hub as session entry point
&lt;/h2&gt;

&lt;p&gt;Each project hub is a clean entry point that loads exactly what the agent needs for that project — and nothing else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project Alpha Hub&lt;/span&gt;

&lt;span class="gs"&gt;**Active as of:**&lt;/span&gt; 2026-04-20

&lt;span class="gu"&gt;## What This Is&lt;/span&gt;
One sentence. What Project Alpha is building.

&lt;span class="gu"&gt;## Current State&lt;/span&gt;
→ Read [[Alpha Session State]] — updated at every session close

&lt;span class="gu"&gt;## Constraints&lt;/span&gt;
→ [[Alpha Constraints]] — non-obvious limits, always check before proposing

&lt;span class="gu"&gt;## Decisions&lt;/span&gt;
→ [[Alpha Decision Log]] — what was decided and what was rejected

&lt;span class="gu"&gt;## Key Context&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Stack: [specifics]
&lt;span class="p"&gt;-&lt;/span&gt; Repo: ~/repos/alpha
&lt;span class="p"&gt;-&lt;/span&gt; Deploy: [specifics]

&lt;span class="gu"&gt;## For the agent&lt;/span&gt;
Load session state first. Check constraints before proposing solutions.
Do not apply patterns from other projects unless explicitly asked.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last line is load-bearing. It explicitly tells the agent not to transfer patterns from other sessions. This won't prevent all context bleed (the agent doesn't have true memory isolation), but it establishes a clear behavioral norm.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 3: The routing entry file
&lt;/h2&gt;

&lt;p&gt;With six projects, you need a routing layer — a master entry that points to the right hub without loading everything at once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project Router&lt;/span&gt;

&lt;span class="gu"&gt;## Active Projects&lt;/span&gt;

| Project | Hub | Last Session | Status |
|---------|-----|-------------|--------|
| Alpha | [[Alpha Hub]] | 2026-04-20 | Active |
| Beta | [[Beta Hub]] | 2026-04-19 | Blocked — waiting on client |
| Gamma | [[Gamma Hub]] | 2026-04-18 | Active |
| Delta | [[Delta Hub]] | 2026-04-15 | Parked |
| Epsilon | [[Epsilon Hub]] | 2026-04-20 | Active |
| Zeta | [[Zeta Hub]] | 2026-04-10 | Closing |

&lt;span class="gu"&gt;## How to start a session&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Name the project in your first message
&lt;span class="p"&gt;2.&lt;/span&gt; Load the hub for that project
&lt;span class="p"&gt;3.&lt;/span&gt; Read session state before anything else
&lt;span class="p"&gt;4.&lt;/span&gt; Work within that project's context only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your &lt;code&gt;CLAUDE.md&lt;/code&gt; points to this router. At session start, you tell the agent which project you're working on. It loads the hub, reads the session-state, and operates in that project's context.&lt;/p&gt;

&lt;p&gt;The key: you only load one hub per session. The router exists so you can see all projects at a glance, but loading it doesn't load all the session-states. You navigate to the right one explicitly.&lt;/p&gt;
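
&lt;p&gt;Because the router is a plain markdown table, tooling can read it too. A sketch that pulls project, hub, and status out of the table (column layout assumed to match the router above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def parse_router(markdown):
    # Pull project, hub, and status out of the router's markdown table.
    rows = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 4 and cells[0] and cells[0] != "Project":
            if set(cells[0]) == {"-"}:
                continue  # table separator row
            rows[cells[0]] = {"hub": cells[1], "status": cells[3]}
    return rows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Useful for a morning glance at which projects are active, blocked, or parked without loading any hub.&lt;/p&gt;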

&lt;h2&gt;
  
  
  The context-switch protocol
&lt;/h2&gt;

&lt;p&gt;When switching projects mid-session:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Close the current project.&lt;/strong&gt; Update its session-state (what changed, what was decided, next step). This takes 3–4 minutes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start a new session&lt;/strong&gt; (or explicitly tell the agent you're switching projects). Don't try to continue in the same session with a different project context — the residual context is too strong.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open the new project.&lt;/strong&gt; Tell the agent which project you're switching to. Load its hub. Read its session-state. Work.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The session-state update at step 1 is the mechanism that makes the switch clean. If you don't update before switching, you lose the current project's state and you can't re-establish it without re-reading everything.&lt;/p&gt;
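
&lt;p&gt;Step 1 is small enough to script. A sketch that appends a timestamped close entry to a session-state file (the three fields mirror the close described above; the exact format is yours to choose):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from datetime import datetime, timezone

def close_session(path, changed, decided, next_step):
    # Append a timestamped close block so the next session can
    # re-orient without re-reading the whole project.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = (f"\n## Session close {stamp}\n"
             f"- Changed: {changed}\n"
             f"- Decided: {decided}\n"
             f"- Next step: {next_step}\n")
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry)
    return entry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;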

&lt;h2&gt;
  
  
  Handling shared concepts
&lt;/h2&gt;

&lt;p&gt;Projects often share concepts: a shared API, a common design system, a regulatory framework both projects operate under.&lt;/p&gt;

&lt;p&gt;These go in &lt;code&gt;wiki/concepts/&lt;/code&gt; or &lt;code&gt;wiki/shared/&lt;/code&gt; — notes that any project can link to but that don't belong to any single project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wiki/
  concepts/
    EU AI Act Requirements.md   ← both Projects Alpha and Gamma link here
    Design System Tokens.md     ← shared by Alpha, Beta, Epsilon
    PIPEDA Compliance Notes.md  ← shared regulatory context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each project's constraint file links to the relevant shared concepts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Alpha Constraints&lt;/span&gt;

&lt;span class="gu"&gt;## Regulatory&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; PIPEDA applies — see [[PIPEDA Compliance Notes]] for specifics
&lt;span class="p"&gt;-&lt;/span&gt; EU AI Act: deferred to v2 (see [[EU AI Act Requirements]] for scope)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The shared concept note is updated once. All projects that link to it get the update automatically. No duplicated content, no risk of projects having different versions of the same regulatory information.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;p&gt;A typical day running this system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9:00am — Project Alpha session&lt;/strong&gt;&lt;br&gt;
Open Claude. Load Alpha Hub. Read Alpha Session State. Work on the endpoint spec that was the "next step" from yesterday's close. 90 seconds to orient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;11:30am — Switch to Project Gamma&lt;/strong&gt;&lt;br&gt;
Close Alpha: update session-state (decided on caching strategy, rejected option C, next step is load testing). Start fresh session. Load Gamma Hub. Read Gamma Session State. 90 seconds to orient. Work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2:00pm — Back to Alpha&lt;/strong&gt;&lt;br&gt;
Start fresh session. Load Alpha Hub. Read Alpha Session State (which I updated at 11:30). Pick up from the next step I left. No re-briefing. 90 seconds.&lt;/p&gt;

&lt;p&gt;The context bleed is gone because each session is a clean start from an isolated session-state. The re-briefing overhead is gone because the session-state captures everything. The decision contamination is gone because each project's decision log is separate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost without this structure
&lt;/h2&gt;

&lt;p&gt;Without isolated session-states, context-switching has a compounding tax. Each switch costs 15 minutes of re-orientation. Each project degrades the next one slightly through residual context. Decisions made in one project bleed into another.&lt;/p&gt;

&lt;p&gt;For six projects, that tax is 1–2 hours of overhead per day. Over a working month, that compounds to 20–40 hours lost.&lt;/p&gt;

&lt;p&gt;The structure above takes half a day to set up once. It pays back within the first week.&lt;/p&gt;




&lt;p&gt;The vault template that ships with the per-project hub structure, session-state format, constraint files, decision logs, and routing pattern is $49.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;https://pharosml.gumroad.com/l/kvbhdo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A guided setup is $299 — I configure the multi-project structure for your specific workstream mix. Worth it if you're running 4+ simultaneous projects and the context overhead is already noticeable.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>obsidian</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why AI Code Review Keeps Flagging the Wrong Things (and How to Fix It)</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:39:58 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/why-ai-code-review-keeps-flagging-the-wrong-things-and-how-to-fix-it-4j2e</link>
      <guid>https://dev.to/martinlepage26bit/why-ai-code-review-keeps-flagging-the-wrong-things-and-how-to-fix-it-4j2e</guid>
      <description>&lt;p&gt;AI code review has a consistent failure mode: it surfaces generic issues and misses the ones that actually matter for your project.&lt;/p&gt;

&lt;p&gt;You paste a function. The AI flags potential null pointer issues, suggests adding error logging, recommends extracting a constant. These are fine observations — for a codebase it knows nothing about. But it misses the actual problem: this function violates the constraint that all auth-path operations must complete under 200ms, which you established three months ago when you separated the auth layer from the API gateway.&lt;/p&gt;

&lt;p&gt;The AI didn't miss this because it's bad at code review. It missed it because you never told it the constraint existed.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI code review is actually reviewing
&lt;/h2&gt;

&lt;p&gt;When you paste code for review without context, the AI is doing generic static analysis with pattern matching against common issues. It's reviewing your code against:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;General best practices for the language&lt;/li&gt;
&lt;li&gt;Common error patterns it's seen in training&lt;/li&gt;
&lt;li&gt;Standard code quality heuristics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is not reviewing against:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your architectural decisions&lt;/li&gt;
&lt;li&gt;Your explicit constraints&lt;/li&gt;
&lt;li&gt;The alternatives you already ruled out and why&lt;/li&gt;
&lt;li&gt;The tech debt you're intentionally carrying&lt;/li&gt;
&lt;li&gt;The SLA requirements that define what "good" looks like for this specific function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The gap between "generic code review" and "project-aware code review" is the difference between a smart stranger looking at your code and a teammate who's been on the project for six months.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four things AI code review doesn't know without you
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Architectural decisions that constrain "correct"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your auth layer is separate from the API gateway. Any suggestion that couples them is wrong — not because coupling is bad in general, but because you've decided it's wrong for this project for specific reasons (independent scaling, latency isolation, deploy cycle separation).&lt;/p&gt;

&lt;p&gt;Without knowing this, the AI will periodically suggest approaches that violate your architecture. These suggestions look reasonable in isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Intentional tech debt&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You have a known N+1 query in the profile endpoint. You know about it. You've decided to fix it in Q3 when you migrate to the new data layer. Until then, don't touch it.&lt;/p&gt;

&lt;p&gt;Without knowing this, the AI will flag it every time you paste code that touches the profile endpoint. You'll spend time explaining why you're not fixing it right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Non-obvious constraints&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your application runs on Cloudflare Workers. There's no filesystem. No long-running processes. Any suggestion that involves either of those is invalid — not occasionally, but categorically, for every function in the codebase.&lt;/p&gt;

&lt;p&gt;Without knowing this, the AI will suggest solutions that are architecturally impossible for your deployment target.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Rejected alternatives&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You evaluated three caching strategies last month. Two were ruled out for specific reasons. The AI doesn't know this. It will suggest one of the rejected approaches as a "potential improvement."&lt;/p&gt;

&lt;p&gt;Without knowing this, you'll spend time re-evaluating an approach you already rejected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix: a context file for code review
&lt;/h2&gt;

&lt;p&gt;Before asking for code review, give the AI a context file. Not a full architecture document — a focused summary of what the AI needs to review correctly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Code Review Context — [Project Name]&lt;/span&gt;

&lt;span class="gu"&gt;## Architecture Constraints&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Auth layer is separate from API gateway (independent scaling + latency isolation)
&lt;span class="p"&gt;  -&lt;/span&gt; Do NOT suggest coupling these
&lt;span class="p"&gt;-&lt;/span&gt; Deployment: Cloudflare Workers (no filesystem, no long-running processes, no Node.js APIs)
&lt;span class="p"&gt;  -&lt;/span&gt; Any suggestion requiring filesystem or persistent process is invalid

&lt;span class="gu"&gt;## Performance Requirements&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Auth-path endpoints: &amp;lt; 200ms P95
&lt;span class="p"&gt;-&lt;/span&gt; Data-path endpoints: &amp;lt; 500ms P95
&lt;span class="p"&gt;-&lt;/span&gt; Flag anything that adds synchronous operations on the hot path

&lt;span class="gu"&gt;## Known Tech Debt (Do Not Flag)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; N+1 query in profile endpoint — tracked, fixing in Q3 migration
&lt;span class="p"&gt;-&lt;/span&gt; Legacy error format in /api/v1/&lt;span class="err"&gt;*&lt;/span&gt; routes — maintained for backwards compat

&lt;span class="gu"&gt;## Rejected Approaches&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; In-memory caching: rejected — Workers are stateless, cache doesn't persist between requests
&lt;span class="p"&gt;-&lt;/span&gt; Unified middleware: rejected — couples auth and data deploy cycles
&lt;span class="p"&gt;-&lt;/span&gt; Session tokens in KV: rejected — doesn't meet compliance requirements

&lt;span class="gu"&gt;## What to Focus On&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Correctness against the constraints above
&lt;span class="p"&gt;-&lt;/span&gt; Edge cases specific to the Cloudflare Workers runtime
&lt;span class="p"&gt;-&lt;/span&gt; Anything that violates the auth/data separation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this context loaded, the AI reviews against your actual project requirements. It won't suggest unified middleware — it knows that was rejected. It won't suggest filesystem operations — it knows the deployment target. It will flag the 200ms constraint violation you actually need to catch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this lives in a structured knowledge system
&lt;/h2&gt;

&lt;p&gt;If you're using a knowledge vault for project context, the code review constraints belong in your active constraints note — the same one your &lt;code&gt;CLAUDE.md&lt;/code&gt; points to for every session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Active Constraints — [Project Name]&lt;/span&gt;

&lt;span class="gu"&gt;## Deployment&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Cloudflare Workers only — no filesystem, no persistent processes

&lt;span class="gu"&gt;## Performance&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Auth path: &amp;lt; 200ms P95
&lt;span class="p"&gt;-&lt;/span&gt; Data path: &amp;lt; 500ms P95

&lt;span class="gu"&gt;## Architecture (locked decisions)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Auth layer separate from API gateway (see [[Decision: Auth Layer Separation]])
&lt;span class="p"&gt;-&lt;/span&gt; No unified middleware (see [[Decision: API Gateway Architecture]])

&lt;span class="gu"&gt;## Known Tech Debt (intentional, do not flag)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; N+1 query in profile endpoint — scheduled for Q3 migration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When this note is linked from your project hub and the &lt;code&gt;CLAUDE.md&lt;/code&gt; points to the hub, the agent reads the constraints before every session — including code review sessions. You don't have to paste the context file manually each time.&lt;/p&gt;
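
&lt;p&gt;The pointer itself can be a single line in the entry file. A minimal sketch, assuming the note name used above:&lt;br&gt;
&lt;/p&gt;

```markdown
## Code Review
→ Read [[Active Constraints — Project Name]] before reviewing any code.
```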

&lt;h2&gt;
  
  
  The shift in review quality
&lt;/h2&gt;

&lt;p&gt;Code review without project context: generic, noisy, misses architectural violations, surfaces known tech debt as new findings.&lt;/p&gt;

&lt;p&gt;Code review with project context: project-specific, accurate, flags actual violations of your constraints, respects intentional debt decisions.&lt;/p&gt;

&lt;p&gt;Same model. Same code. Different context.&lt;/p&gt;




&lt;p&gt;The vault structure that maintains this context automatically — active constraints note, decision logs with rejected alternatives, hub template, session-state — is packaged as a $49 Obsidian template.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;https://pharosml.gumroad.com/l/kvbhdo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;$299 guided setup for teams who want it configured for their specific stack. The code review context pattern above is one of eight note types included in the template.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devtools</category>
      <category>claudeai</category>
    </item>
    <item>
      <title>How to Architect AI Agent Memory That Survives Context Window Limits</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:39:23 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/how-to-architect-ai-agent-memory-that-survives-context-window-limits-1k1m</link>
      <guid>https://dev.to/martinlepage26bit/how-to-architect-ai-agent-memory-that-survives-context-window-limits-1k1m</guid>
      <description>&lt;p&gt;The most common advice for giving Claude Code project context: write a &lt;code&gt;CLAUDE.md&lt;/code&gt; file. Put your architecture decisions, tech stack, constraints, and current state in there. Keep it updated.&lt;/p&gt;

&lt;p&gt;This works until it doesn't.&lt;/p&gt;

&lt;p&gt;Past about 300 tokens, attention dilutes. The most relevant constraint competes with everything else in the file. You end up with a &lt;code&gt;CLAUDE.md&lt;/code&gt; that's 2,000 lines long and still misses context you need. The agent reads the whole thing and effectively prioritizes the first third.&lt;/p&gt;

&lt;p&gt;The fix isn't a better &lt;code&gt;CLAUDE.md&lt;/code&gt;. It's a different retrieval architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core problem: retrieval, not storage
&lt;/h2&gt;

&lt;p&gt;When you give an agent a long flat file, the retrieval model is "read everything." That's fine for small amounts of context. For anything beyond a few hundred tokens, it degrades — not because the model is bad, but because you're asking it to find a needle in a haystack it has to read start-to-finish.&lt;/p&gt;

&lt;p&gt;The architecture that works is &lt;strong&gt;graph traversal&lt;/strong&gt;: the agent starts from a short entry point and follows links to reach relevant context. Three hops covers anything specific. You never load everything at once.&lt;/p&gt;
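
&lt;p&gt;The traversal model is easy to sketch. A toy version in Python, with a plain dict standing in for the vault and hypothetical note names:&lt;br&gt;
&lt;/p&gt;

```python
import re

# Toy vault: note name -> body containing [[wiki links]] (hypothetical names)
vault = {
    "CLAUDE.md": "Start at [[Project Hub]]; read [[Session State]] first.",
    "Project Hub": "Architecture: [[Active Constraints]]; history: [[Decision Log]].",
    "Session State": "Next step is blocked on [[Decision Log]].",
    "Active Constraints": "Cloudflare Workers only; see [[Decision Log]].",
    "Decision Log": "Rejected unified middleware (couples deploy cycles).",
}

def reachable(start, hops):
    """Return every note reachable from `start` in at most `hops` link-follows."""
    frontier, seen = {start}, {start}
    for _ in range(hops):
        frontier = {
            link
            for note in frontier
            for link in re.findall(r"\[\[(.+?)\]\]", vault.get(note, ""))
            if link in vault and link not in seen
        }
        seen |= frontier
    return seen

print(sorted(reachable("CLAUDE.md", 3)))
```

&lt;p&gt;Three hops from the entry point reach every note in this toy graph, while any single hop loads only a handful of notes: that is the context-budget win.&lt;/p&gt;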

&lt;h2&gt;
  
  
  The three-zone structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw/              ← unsynthesized captures (never modified by the agent)
wiki/             ← synthesized, linked knowledge notes
session-state.md  ← live operational context per project
CLAUDE.md         ← 50-line entry point → graph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Zone 1: &lt;code&gt;raw/&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Captures go here exactly as written — meeting notes, paste-ins, half-formed thoughts. The agent knows this is staging material, not established knowledge. It can reference it but should never reason from it as settled fact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zone 2: &lt;code&gt;wiki/&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Synthesized notes only. Each note requires at least two inline &lt;code&gt;[[links]]&lt;/code&gt; — not a trailing "Related" section, but links woven into the body where the connection is actually made. This creates the traversal graph.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zone 3: &lt;code&gt;session-state.md&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Five fields, updated at every session close:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Objective&lt;/span&gt;
What this project is trying to accomplish.

&lt;span class="gu"&gt;## Active Constraints&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Deployment: Cloudflare Workers only (no Node.js runtime)
&lt;span class="p"&gt;-&lt;/span&gt; Compliance: PIPEDA in scope; EU AI Act deferred to v2
&lt;span class="p"&gt;-&lt;/span&gt; Timeline: Revenue-positive by June 22

&lt;span class="gu"&gt;## Decisions Made&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; API gateway scoped separately from auth layer
  (reasons: latency isolation, independent scaling)
&lt;span class="p"&gt;-&lt;/span&gt; Rejected: unified middleware — couples deploy cycles,
  adds latency on every data request

&lt;span class="gu"&gt;## Open Questions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Whether to proxy through gateway on internal calls
      (UNRESOLVED — blocker for /auth/verify implementation)
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Caching strategy for user profile endpoint

&lt;span class="gu"&gt;## Next Step&lt;/span&gt;
Write /api/auth/verify endpoint spec.
Internal proxy question must be resolved first.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The CLAUDE.md entry pattern
&lt;/h2&gt;

&lt;p&gt;The entry file is not where context lives. It's where traversal starts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project: [Name]&lt;/span&gt;

&lt;span class="gu"&gt;## Current State&lt;/span&gt;
→ Read [[Session State — Project Name]] for live context.

&lt;span class="gu"&gt;## Architecture&lt;/span&gt;
→ [[Project Hub]] — technical decisions entry point
→ [[Active Constraints]] — non-obvious limits in effect
→ [[Decision Log Index]] — decisions made + alternatives rejected

&lt;span class="gu"&gt;## Quick Context&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Stack: React 18 + FastAPI + MongoDB + Cloudflare Pages
&lt;span class="p"&gt;-&lt;/span&gt; Repo: ~/repos/project-name
&lt;span class="p"&gt;-&lt;/span&gt; Deploy: &lt;span class="sb"&gt;`npm run build &amp;amp;&amp;amp; wrangler publish`&lt;/span&gt;

&lt;span class="gu"&gt;## Agent Behavior&lt;/span&gt;
Read session-state first. Follow links to relevant context
before proposing solutions. Do not re-propose options listed
in "Alternatives Rejected" in any decision log.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;~50 lines, ~400 tokens. The agent reads this, then follows links to retrieve exactly what it needs. Context in 2–3 hops; never loads everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision log — highest ROI note type
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Decision: [Title]&lt;/span&gt;

&lt;span class="gs"&gt;**Date:**&lt;/span&gt; 2026-03-12  
&lt;span class="gs"&gt;**Status:**&lt;/span&gt; Locked

&lt;span class="gu"&gt;## Decision&lt;/span&gt;
[What was decided]

&lt;span class="gu"&gt;## Reasoning&lt;/span&gt;
[Why this approach. The non-obvious parts.]

&lt;span class="gu"&gt;## Alternatives Rejected&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [Option A]: rejected because [specific reason]
&lt;span class="p"&gt;-&lt;/span&gt; [Option B]: rejected because [specific reason]

&lt;span class="gu"&gt;## Open Questions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] [Anything still unresolved about this decision]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "Alternatives Rejected" section is what earns the most. When this note is linked from the project hub, the agent reads it before proposing anything. It doesn't re-propose Option A — it already knows why you said no.&lt;/p&gt;

&lt;p&gt;Without this record: the agent periodically re-proposes rejected approaches because the reasoning that ruled them out only existed in a closed chat window.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mandatory linking rule
&lt;/h2&gt;

&lt;p&gt;Every &lt;code&gt;wiki/&lt;/code&gt; note requires at least two inline &lt;code&gt;[[links]]&lt;/code&gt;. Not links in a trailing section — links woven into the body where the connection is made.&lt;/p&gt;

&lt;p&gt;This isn't aesthetic. A note with no backlinks is an orphan: the agent can't traverse to it. Your decision log doesn't exist to the agent if nothing links to it.&lt;/p&gt;
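
&lt;p&gt;Orphans are also cheap to detect mechanically. A sketch that operates on a name-to-body dict (a real version would glob the &lt;code&gt;wiki/&lt;/code&gt; folder; the note names here are hypothetical):&lt;br&gt;
&lt;/p&gt;

```python
import re

def orphan_notes(notes, root):
    """notes: {name: body}. Return notes that nothing links to via [[...]],
    ignoring the designated entry-point note."""
    linked = {target
              for body in notes.values()
              for target in re.findall(r"\[\[(.+?)\]\]", body)}
    return sorted(set(notes) - linked - {root})

demo = {
    "CLAUDE": "→ [[Project Hub]]",
    "Project Hub": "→ [[Active Constraints]]",
    "Active Constraints": "Cloudflare Workers only.",
    "Decision: Auth Split": "Locked.",  # no inbound links: invisible to the agent
}
print(orphan_notes(demo, root="CLAUDE"))  # → ['Decision: Auth Split']
```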

&lt;p&gt;The enforcement pattern that works: &lt;strong&gt;hub templates&lt;/strong&gt; that scaffold the link structure before you fill in content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# [[Project Name]] Hub&lt;/span&gt;

&lt;span class="gu"&gt;## Current State&lt;/span&gt;
→ [[Session State — Project Name]]

&lt;span class="gu"&gt;## Technical Architecture&lt;/span&gt;
→ [[Architecture Overview]]
→ [[Active Constraints]]
→ [[API Design Decisions]]

&lt;span class="gu"&gt;## Open Work&lt;/span&gt;
→ [[Sprint Log]]
→ [[Open Questions — Project Name]]

&lt;span class="gu"&gt;## Key People&lt;/span&gt;
→ [[Client Name]]
→ [[Stakeholder Name]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hub creates entry points. Entry points create traversal paths. Traversal paths mean context reaches the agent without the agent reading everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optional: local runtime for offline or local-model use
&lt;/h2&gt;

&lt;p&gt;For Ollama / LM Studio / llama.cpp workflows, the vault ships with a Python runtime:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;setup.sh&lt;/code&gt;&lt;/strong&gt; — installs deps, builds the vector index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;sentence-transformers
python embed.py   &lt;span class="c"&gt;# ~2 min for 200+ notes, all-MiniLM-L6-v2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;ask.py&lt;/code&gt;&lt;/strong&gt; — hybrid query (vector similarity + backlink traversal):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python ask.py &lt;span class="s2"&gt;"what constraints apply to the auth module?"&lt;/span&gt;
python ask.py &lt;span class="s2"&gt;"what did we decide about the API gateway?"&lt;/span&gt; &lt;span class="nt"&gt;--top&lt;/span&gt; 5
python ask.py &lt;span class="s2"&gt;"current project state"&lt;/span&gt; &lt;span class="nt"&gt;--full&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;vault_watcher.py&lt;/code&gt;&lt;/strong&gt; — watches for new notes, updates index on save.&lt;/p&gt;
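
&lt;p&gt;The watcher idea reduces to comparing file mtimes between scans. A minimal polling sketch of the mechanism (not the shipped script, which is the real implementation):&lt;br&gt;
&lt;/p&gt;

```python
import pathlib

def changed_since(vault_dir, snapshot):
    """Compare note mtimes against `snapshot`; return (changed, new_snapshot)."""
    now = {p: p.stat().st_mtime for p in pathlib.Path(vault_dir).glob("*.md")}
    changed = sorted(p for p, mtime in now.items() if snapshot.get(p) != mtime)
    return changed, now

# In a loop: re-embed only `changed`, then carry the new snapshot forward.
```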

&lt;p&gt;Why hybrid? Vector similarity retrieves semantically close notes; backlink traversal then widens the result by following links from those notes. You get related content by meaning &lt;em&gt;and&lt;/em&gt; by structure — chunks with relationships, not just chunks.&lt;/p&gt;
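
&lt;p&gt;The hybrid step can be sketched in a few lines. A toy shared-word scorer stands in for the real embedding model here, and the note bodies are hypothetical:&lt;br&gt;
&lt;/p&gt;

```python
import re
from collections import Counter

notes = {
    "Active Constraints": "Constraints: auth path under 200ms P95. "
                          "Workers only, no filesystem. See [[Decision: Gateway]].",
    "Decision: Gateway": "Auth layer separate from gateway. Rejected unified middleware.",
    "Sprint Log": "Shipped the profile endpoint fix.",
}

def words(text):
    return Counter(re.findall(r"\w+", text.lower()))

def hybrid(query, top=1):
    # 1. similarity retrieval: best-scoring notes (word overlap as a stand-in
    #    for embedding cosine similarity)
    score = lambda name: sum((words(query) & words(notes[name])).values())
    hits = sorted(notes, key=score, reverse=True)[:top]
    # 2. structural widening: follow [[links]] out of the hits
    linked = {link for name in hits
              for link in re.findall(r"\[\[(.+?)\]\]", notes[name]) if link in notes}
    return hits + sorted(linked - set(hits))

print(hybrid("what constraints apply to the auth module?"))
```

&lt;p&gt;The second step is what plain similarity search skips: the decision note comes back not because it matched the query wording, but because the constraint note links to it.&lt;/p&gt;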

&lt;p&gt;Runs fully on-device. No cloud required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;After 212 notes and six months of daily use:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Session startup&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;td&gt;90 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Re-proposed rejected approaches&lt;/td&gt;
&lt;td&gt;Weekly&lt;/td&gt;
&lt;td&gt;Never&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Handoff cost on context-switch&lt;/td&gt;
&lt;td&gt;Full re-briefing&lt;/td&gt;
&lt;td&gt;Read the hub&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Same model. Different retrieval architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The skeleton
&lt;/h2&gt;

&lt;p&gt;The vault — note types, hub templates, decision-log format, skill guides, session-state protocol, and local runtime — is packaged as a $49 Obsidian template.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;https://pharosml.gumroad.com/l/kvbhdo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also: $299 guided setup (structure configured for your specific project type), $2,500 for teams who want a shared memory layer.&lt;/p&gt;

&lt;p&gt;The architecture above is the complete system. The template is six months of iteration baked into a skeleton you can drop into an existing Obsidian vault in an afternoon.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>obsidian</category>
      <category>devtools</category>
      <category>claudeai</category>
    </item>
    <item>
      <title>The 3 Notes Every AI-Assisted Project Needs Before the First Session</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:38:26 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/the-3-notes-every-ai-assisted-project-needs-before-the-first-session-54gn</link>
      <guid>https://dev.to/martinlepage26bit/the-3-notes-every-ai-assisted-project-needs-before-the-first-session-54gn</guid>
      <description>&lt;p&gt;Most advice about AI context management assumes you're building a full system. Vault architecture, note types, skill guides, linking rules — useful eventually, but a lot to absorb before you've even started.&lt;/p&gt;

&lt;p&gt;This is the minimal version. Three notes. You can create them in 20 minutes. They'll cut your session startup time in half before you've built anything else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Note 1: The project hub
&lt;/h2&gt;

&lt;p&gt;One file per project. Call it &lt;code&gt;[ProjectName] Hub.md&lt;/code&gt;. Put it somewhere you'll remember.&lt;/p&gt;

&lt;p&gt;Contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# [Project Name] Hub&lt;/span&gt;

&lt;span class="gu"&gt;## What This Is&lt;/span&gt;
One sentence. What this project is trying to accomplish.

&lt;span class="gu"&gt;## Current State&lt;/span&gt;
What's true right now. Not the goal — the present position.
&lt;span class="p"&gt;-&lt;/span&gt; What's built / done
&lt;span class="p"&gt;-&lt;/span&gt; What's in progress
&lt;span class="p"&gt;-&lt;/span&gt; What's blocked

&lt;span class="gu"&gt;## Stack / Environment&lt;/span&gt;
The non-obvious technical details:
&lt;span class="p"&gt;-&lt;/span&gt; Language/framework versions
&lt;span class="p"&gt;-&lt;/span&gt; Deployment target
&lt;span class="p"&gt;-&lt;/span&gt; Key external dependencies
&lt;span class="p"&gt;-&lt;/span&gt; Anything that constrains how the AI should suggest solutions

&lt;span class="gu"&gt;## Key Files&lt;/span&gt;
The 3–5 files the AI should know about. Paths, not descriptions.

&lt;span class="gu"&gt;## Links&lt;/span&gt;
→ [[Active Constraints — Project Name]]
→ [[Session State — Project Name]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the file your &lt;code&gt;CLAUDE.md&lt;/code&gt; (or equivalent) points to. Instead of loading everything at once, the agent reads the hub and follows links to what it needs.&lt;/p&gt;

&lt;p&gt;Keep it under 100 lines. If it grows past that, the project needs a more structured setup — but for most projects, this is enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Note 2: Active constraints
&lt;/h2&gt;

&lt;p&gt;A separate file: &lt;code&gt;Active Constraints — [ProjectName].md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is the note that earns its keep fastest. It captures the non-obvious rules — the ones an agent wouldn't know from reading your code, and that you've been re-explaining at the start of every session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Active Constraints — [Project Name]&lt;/span&gt;

&lt;span class="gu"&gt;## Deployment&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [e.g. Cloudflare Workers only — no Node.js filesystem access]
&lt;span class="p"&gt;-&lt;/span&gt; [e.g. Must stay within free tier limits for now]

&lt;span class="gu"&gt;## Compliance / Legal&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [e.g. PIPEDA in scope — no US-only data storage]
&lt;span class="p"&gt;-&lt;/span&gt; [e.g. Client NDA — no third-party AI APIs on their data]

&lt;span class="gu"&gt;## Technical&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [e.g. No new dependencies without approval — bundle size constraint]
&lt;span class="p"&gt;-&lt;/span&gt; [e.g. Postgres only — no other databases]

&lt;span class="gu"&gt;## Timeline&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [e.g. Hard launch: June 22 — no scope creep past MVP definition]

&lt;span class="gu"&gt;## Explicitly Out of Scope (This Phase)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [Thing you keep getting asked about that's intentionally deferred]
&lt;span class="p"&gt;-&lt;/span&gt; [Another deferred decision]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "Explicitly Out of Scope" section is the one most people skip. It's also the one that saves the most time. When the agent suggests something you've already decided not to do this phase, you need to be able to say "it's in the out-of-scope list" rather than re-arguing the point.&lt;/p&gt;

&lt;p&gt;Fill this in before your first session. Update it whenever a constraint changes or a scope decision gets made.&lt;/p&gt;

&lt;h2&gt;
  
  
  Note 3: Session state
&lt;/h2&gt;

&lt;p&gt;One file per project: &lt;code&gt;Session State — [ProjectName].md&lt;/code&gt;. Five fields. Update it at the end of every session. Read it at the start of the next one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Session State — [Project Name]&lt;/span&gt;

&lt;span class="gs"&gt;**Last updated:**&lt;/span&gt; [date]

&lt;span class="gu"&gt;## Objective&lt;/span&gt;
[What this project is currently trying to accomplish — may be more specific than the hub]

&lt;span class="gu"&gt;## Decisions Made&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [Decision]: [brief reason why]
&lt;span class="p"&gt;  -&lt;/span&gt; Rejected: [alternative] — [why it was ruled out]
&lt;span class="p"&gt;-&lt;/span&gt; [Another decision]

&lt;span class="gu"&gt;## Open Questions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] [Something explicitly unresolved — not a to-do, an open decision]
&lt;span class="p"&gt;-&lt;/span&gt; [ ] [Another unresolved question]

&lt;span class="gu"&gt;## Blockers&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [Anything currently preventing forward progress]

&lt;span class="gu"&gt;## Next Step&lt;/span&gt;
[The exact action to take at the start of the next session. One sentence. Specific enough that the agent can start without clarification.]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "Decisions Made" and "Open Questions" fields are the ones that matter most.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decisions Made&lt;/strong&gt; prevents re-litigation. When you write "Rejected: unified middleware — couples deploy cycles, adds latency on every data request," the agent reads that before suggesting anything architectural. It doesn't re-propose unified middleware. The no is on record.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open Questions&lt;/strong&gt; prevents silent gap-filling. Questions not marked open get filled in — by you with guesses, by the agent with plausible inference. Writing them down forces both of you to hold the uncertainty rather than pretend it doesn't exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next Step&lt;/strong&gt; is what makes session re-entry frictionless. Not "continue working on auth." Specific: "Write the /api/auth/verify endpoint spec. The internal proxy question is a blocker — resolve that first."&lt;/p&gt;

&lt;h2&gt;
  
  
  Wiring it up
&lt;/h2&gt;

&lt;p&gt;In your &lt;code&gt;CLAUDE.md&lt;/code&gt; (or whatever entry file you use), add three lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Context&lt;/span&gt;
→ Read [[ProjectName Hub]] for current state and stack.
→ Read [[Active Constraints — ProjectName]] before proposing solutions.
→ Read [[Session State — ProjectName]] to understand where we are and what's next.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The agent reads the hub, checks constraints, loads current state. The session starts with context instead of a briefing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this gets you
&lt;/h2&gt;

&lt;p&gt;Without these three notes, every session opens cold. You re-explain the project. The agent suggests something you've already ruled out. You correct it. You remember a constraint at minute eight and add it to the prompt. You spend 15 minutes getting to useful work.&lt;/p&gt;

&lt;p&gt;With these three notes, you read them at session open (2 minutes), the agent reads them too, and the session starts from current state. Not from zero.&lt;/p&gt;

&lt;p&gt;That shift — from 15 minutes to 2 minutes — is the whole value. Everything else (full vault structure, skill guides, linked knowledge graph) is an amplification of this core pattern.&lt;/p&gt;




&lt;p&gt;These three notes are part of a larger vault system I've been running for six months on a 212-note Obsidian vault. The full skeleton — note types, hub templates, decision-log format, linking rules, skill guides, session-state protocol, optional local runtime — is packaged as a $49 template.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;https://pharosml.gumroad.com/l/kvbhdo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But you don't need the template to start. Create the three notes above this week. See what happens to your session startup time. Then decide if the larger system is worth building.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>obsidian</category>
      <category>productivity</category>
      <category>claudeai</category>
    </item>
    <item>
      <title>Every new Claude session started from zero. Here's how I fixed it.</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:37:49 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/every-new-claude-session-started-from-zero-heres-how-i-fixed-it-4eng</link>
      <guid>https://dev.to/martinlepage26bit/every-new-claude-session-started-from-zero-heres-how-i-fixed-it-4eng</guid>
      <description>&lt;h1&gt;
  
  
  Every new Claude session started from zero. Here's how I fixed it.
&lt;/h1&gt;

&lt;p&gt;There's a specific kind of frustration that comes after you've been working with Claude Code for a few weeks.&lt;/p&gt;

&lt;p&gt;You open a new session. You start describing your project. The agent asks a clarifying question — and you realize it's the same question it asked in session three. You've explained this before. You've made this decision. You've already ruled out that approach, and you even remember &lt;em&gt;why&lt;/em&gt;. But the agent doesn't.&lt;/p&gt;

&lt;p&gt;The context window closes. The session ends. Everything that wasn't written somewhere permanent evaporates.&lt;/p&gt;

&lt;p&gt;I hit this wall hard while working on a governance software product with interconnected decisions across infrastructure, product architecture, compliance scope, and research writing. Every session felt like onboarding a new contractor who needed to be caught up from scratch. I was spending 20% of every session on re-explanation.&lt;/p&gt;

&lt;p&gt;The fix wasn't a better prompt. It was a memory layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the problem actually is
&lt;/h2&gt;

&lt;p&gt;Claude Code is stateless by design. Each session starts with whatever you hand it at the top: a &lt;code&gt;CLAUDE.md&lt;/code&gt;, some file paths, maybe a few recent commits. That's the entire context budget for "what is this project and how does it work."&lt;/p&gt;

&lt;p&gt;Most people solve this by making &lt;code&gt;CLAUDE.md&lt;/code&gt; longer. That doesn't scale. You can't fit the full reasoning behind a year of architectural decisions into a header file.&lt;/p&gt;

&lt;p&gt;What you actually need is a &lt;strong&gt;knowledge graph&lt;/strong&gt; that the agent can traverse — not a flat dump it has to read all at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture that works
&lt;/h2&gt;

&lt;p&gt;I use Obsidian as the memory layer. The structure has three zones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw/         ← unsynthesized captures, preserved exactly as written
wiki/        ← synthesized notes with inline [[links]] to related concepts
CLAUDE.md    ← entry point: points to the project hub, not a wall of text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;wiki/&lt;/code&gt; directory is the knowledge graph. Every note in it must have at least two meaningful inline &lt;code&gt;[[links]]&lt;/code&gt; to other notes — not links dumped at the bottom in a "Related" section, but links woven into the body where the actual connection is made.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;CLAUDE.md&lt;/code&gt; is deliberately short. It points to a &lt;strong&gt;project hub note&lt;/strong&gt;, which links outward to decision logs, architecture notes, people, open questions, and constraints. The agent traverses the graph instead of reading a monolith.&lt;/p&gt;
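&lt;p&gt;A minimal sketch of that entry point (the note names here are illustrative, not part of any required convention):&lt;/p&gt;

```markdown
# CLAUDE.md

Memory layer: Obsidian vault in `vault/`.

Start every session by reading [[Project Hub]] in `vault/wiki/`.
From the hub, follow links to:

- [[Active Constraints]] before proposing anything
- [[Decision Log]] entries before revisiting architecture
- [[Open Questions]] for current unknowns

Never create orphan notes. Link every new note into the hub.
```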

&lt;h2&gt;
  
  
  The note types that matter
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Decision log:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Decision: Auth Layer Scope&lt;/span&gt;

&lt;span class="gs"&gt;**Status:**&lt;/span&gt; Locked  
&lt;span class="gs"&gt;**Date:**&lt;/span&gt; 2026-03-12

&lt;span class="gu"&gt;## Decision&lt;/span&gt;
Keep auth separate from the API gateway layer.

&lt;span class="gu"&gt;## Alternatives Rejected&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Unified middleware: rejected — couples deploy cycles, adds latency on every request.
&lt;span class="p"&gt;-&lt;/span&gt; Auth-in-gateway: rejected — obscures auth logic, harder to test independently.

&lt;span class="gu"&gt;## Open Questions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Whether internal calls should bypass gateway or proxy through
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When this note is linked from the project hub, the agent sees it before making architectural suggestions. It won't re-propose the unified middleware. It already knows why you rejected it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Active constraints note:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project Constraints&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; [[Deployment Target]]: Cloudflare Pages + Workers only (no VMs)
&lt;span class="p"&gt;-&lt;/span&gt; [[Language]]: TypeScript frontend, Python backend — no language additions without discussion
&lt;span class="p"&gt;-&lt;/span&gt; [[Compliance]]: PIPEDA + Quebec Law 25 in scope; EU AI Act out of scope v1
&lt;span class="p"&gt;-&lt;/span&gt; [[Timeline]]: Revenue-positive by 2026-06-22
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Short, linked, updated as constraints change. The agent loads this in the first traversal hop and never proposes something that violates a locked constraint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open questions note:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Open Questions&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; [ ] Railway vs Hetzner for backend hosting — decision pending cost analysis
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Whether regulatory corpus ingestion is in scope for v1
&lt;span class="p"&gt;-&lt;/span&gt; [ ] CF Pages recreate vs rename — CF dashboard action required first
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This replaces the end-of-session "TODO" you write in chat that disappears when the session closes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The linking rule that makes it work
&lt;/h2&gt;

&lt;p&gt;The structure only holds if every note is &lt;strong&gt;connected into the graph&lt;/strong&gt; — reachable via traversal, not just search.&lt;/p&gt;

&lt;p&gt;A note that exists in &lt;code&gt;wiki/&lt;/code&gt; but has no backlinks from anywhere is invisible to the agent unless it reads every file. That defeats the purpose.&lt;/p&gt;

&lt;p&gt;The test I use: can the agent get from &lt;code&gt;CLAUDE.md&lt;/code&gt; to this note in three hops or fewer? If not, something in the chain is broken.&lt;/p&gt;
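&lt;p&gt;The three-hop test is easy to automate. Here's a rough sketch in Python; the flat &lt;code&gt;notes&lt;/code&gt; dict and the simplified link regex are illustrative assumptions, not the template's tooling:&lt;/p&gt;

```python
import re

# Sketch: check which notes CLAUDE.md can reach within three [[link]] hops.
# `notes` maps note titles to raw markdown bodies (illustrative data).
notes = {
    "CLAUDE.md": "Start at [[Project Hub]].",
    "Project Hub": "Decisions: [[Decision - Auth Scope]]. Rules: [[Active Constraints]].",
    "Decision - Auth Scope": "Constraint produced: [[Active Constraints]].",
    "Active Constraints": "From [[Decision - Auth Scope]].",
    "Orphan Note": "Nothing links here, so traversal never finds it.",
}

def links(body):
    # Extract [[wiki link]] targets, ignoring aliases and heading anchors.
    return re.findall(r"\[\[([^\]|#]+)", body)

def reachable(start, max_hops=3):
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        frontier = {t for n in frontier
                    for t in links(notes.get(n, "")) if t not in seen}
        seen |= frontier
    return seen

orphans = sorted(set(notes) - reachable("CLAUDE.md"))
print(orphans)  # ['Orphan Note']
```

&lt;p&gt;Anything the script flags is a broken chain: either the note is a true orphan, or a hub link is missing somewhere along the path.&lt;/p&gt;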

&lt;p&gt;I enforce this with a simple protocol:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before creating any note, search for related existing notes&lt;/li&gt;
&lt;li&gt;Always add &lt;code&gt;[[links]]&lt;/code&gt; inline in the body (not just at the end)&lt;/li&gt;
&lt;li&gt;After creating a note, update the relevant hub or MOC page so it links back&lt;/li&gt;
&lt;li&gt;Never create orphan notes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With 212 notes in my vault, the agent reliably finds relevant context without brute-force reading. It traverses. That's the difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I packaged
&lt;/h2&gt;

&lt;p&gt;After building and refining this over several months, I packaged the vault structure as a template:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vault skeleton&lt;/strong&gt; with the raw/wiki/maps/templates structure pre-configured&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hub templates&lt;/strong&gt; (project, person, concept, decision) that enforce the linking pattern before you fill in content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Four named skill guides&lt;/strong&gt; (Caelir for research synthesis, Ilyris for topic mapping, Ariun for linking hygiene, Mnara for archiving stale material) — these are plain-language instruction files the agent follows when invoked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optional local runtime&lt;/strong&gt; (setup.sh + ask.py + vault_watcher.py) for running queries against the vault without a cloud subscription&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLAUDE.md patterns&lt;/strong&gt; that actually scale as the vault grows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The template is $49 and available here: &lt;strong&gt;&lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;Obsidian Agent Vault&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's also a $299 guided setup option if you want help adapting the structure to your specific project type, and a $2,500 team license for small teams wanting a shared memory layer.&lt;/p&gt;




&lt;p&gt;The core insight: an AI agent with a well-structured graph of your project's decisions, constraints, and open questions is a fundamentally different tool than one reading a long &lt;code&gt;CLAUDE.md&lt;/code&gt; cold. The structure is what makes context persistent. The linking discipline is what makes it traversable.&lt;/p&gt;

&lt;p&gt;Happy to share the hub template or the linking-hygiene skill guide in the comments if there's interest.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>obsidian</category>
      <category>productivity</category>
      <category>claudeai</category>
    </item>
    <item>
      <title>Most AI "Hallucinations" Are Context Failures, Not Model Failures</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:37:14 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/most-ai-hallucinations-are-context-failures-not-model-failures-4ce</link>
      <guid>https://dev.to/martinlepage26bit/most-ai-hallucinations-are-context-failures-not-model-failures-4ce</guid>
      <description>&lt;p&gt;"Hallucination" is the word we use when an AI model produces something plausible but wrong. It's treated as a fundamental model failure — an inherent limitation of probabilistic text generation that we're stuck managing.&lt;/p&gt;

&lt;p&gt;I want to challenge that framing, at least for applied AI work.&lt;/p&gt;

&lt;p&gt;Most of what gets called hallucination in real workflows isn't the model inventing things out of nothing. It's the model filling in missing context with its best guess. And most of the time, the context isn't actually missing from reality — it's just missing from what you gave the model.&lt;/p&gt;

&lt;p&gt;That's a context problem, not a model problem. And it's fixable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What hallucination actually looks like in practice
&lt;/h2&gt;

&lt;p&gt;You're writing a technical specification. You ask Claude to add a section on authentication. It writes something technically reasonable but inconsistent with your actual architecture — it suggests OAuth when you've already decided on JWT for specific reasons, proposes a token rotation period that conflicts with your security policy, and doesn't mention the session handling constraint that came out of last month's incident review.&lt;/p&gt;

&lt;p&gt;Is that a hallucination? Sort of. The model generated plausible content. But the failures are all traceable to missing context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It didn't know you'd decided on JWT (no decision log)&lt;/li&gt;
&lt;li&gt;It didn't know your security policy constraints (no active-constraints record)&lt;/li&gt;
&lt;li&gt;It didn't know about the incident and what you learned from it (no session history)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Give the model all three pieces of context explicitly, and those specific errors disappear. The model wasn't wrong because it can't reason about authentication — it was wrong because it was reasoning from incomplete information.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap between capability and consistency
&lt;/h2&gt;

&lt;p&gt;Modern LLMs are remarkably capable. They can reason about complex domains, maintain logical consistency within a context window, and produce high-quality output when given the right inputs.&lt;/p&gt;

&lt;p&gt;The problem in applied work isn't capability — it's consistency. The same model, asked the same question at different times with different context, produces different answers. Some of those differences are fine (appropriate variation). Some are errors that get called hallucinations.&lt;/p&gt;

&lt;p&gt;The consistency gap is almost entirely explained by context variation. What you told the model in this session vs. last session. What you remembered to include vs. forgot. What changed in your project since the last time you worked on this.&lt;/p&gt;

&lt;h2&gt;
  
  
  What structured context actually prevents
&lt;/h2&gt;

&lt;p&gt;When I introduced a session-state protocol into my workflow — a structured record of active constraints, decisions made, open questions, and current project state, read by the AI at the start of every session — the incidence of the errors I was calling "hallucinations" dropped substantially.&lt;/p&gt;

&lt;p&gt;Not to zero. There are genuine model errors that context doesn't fix. But the large majority of my workflow failures were context failures that looked like model failures.&lt;/p&gt;

&lt;p&gt;Specific patterns that disappeared:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constraint violations.&lt;/strong&gt; "The AI keeps suggesting X even though I've told it we can't do X." Once the constraint was written into the active-constraints field and read every session, this stopped. The model was never incapable of respecting the constraint — it just wasn't being told about it consistently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision revisiting.&lt;/strong&gt; "The AI keeps re-opening questions I've already answered." Once the decisions-made field included not just the decision but the rationale and the rejected alternatives, this stopped. The model could see not just what was decided but why — making the closed question clearly closed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stale-reference errors.&lt;/strong&gt; "The AI is working from an outdated version of the spec." Once the current-state field explicitly named the version and what had changed since the last session, this stopped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-project contamination.&lt;/strong&gt; "The AI seems to be mixing up details from different projects." Once each project had its own hub note with its own explicit context, this stopped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture that makes this work
&lt;/h2&gt;

&lt;p&gt;The context needs to be:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Structured&lt;/strong&gt; — not a prose dump, but specific fields: constraints, decisions, state, next step. Structured context is more reliably loaded than narrative context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Current&lt;/strong&gt; — updated at session close, not just at project start. Context that's six weeks stale is almost as bad as no context: the model reasons confidently from a state of the project that no longer exists.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connected&lt;/strong&gt; — linked to other relevant notes. An authentication constraint note that links to the threat model, the incident log, and the relevant decision records is more useful than a standalone note.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scoped&lt;/strong&gt; — per project, not global. A hub note for each active engagement prevents cross-project contamination.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is graph-structured, file-native knowledge — Obsidian's model, applied to AI workflow.&lt;/p&gt;
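&lt;p&gt;A sketch of a hub note that satisfies all four properties at once (the project name and field contents are illustrative):&lt;/p&gt;

```markdown
# Project Hub — ExampleProject
Updated: 2026-05-01

## Active constraints
- [[Deployment Target]]: serverless only, no VMs
- [[Compliance Scope]]: PIPEDA in scope for v1

## Decisions made
- [[Decision — Auth Layer Scope]] (locked; unified middleware rejected)

## Open questions
- [ ] Backend hosting: pending cost analysis

## Next step
Wire rate-limit telemetry into the evidence endpoint.
```

&lt;p&gt;Structured fields, a freshness marker, inline links, and a single-project scope: each maps directly to one of the four requirements.&lt;/p&gt;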

&lt;h2&gt;
  
  
  The vault
&lt;/h2&gt;

&lt;p&gt;The Obsidian vault skeleton that operationalises this architecture — hub templates, session-state protocol, decision-log format, linking rules, note types — is packaged as a $49 template.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;Obsidian Agent Vault on Gumroad&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your AI workflow is producing errors you're attributing to model failures, run the diagnosis first: how much of the relevant context does the model actually have access to in each session? The answer is often "less than you think."&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#llm&lt;/code&gt; &lt;code&gt;#productivity&lt;/code&gt; &lt;code&gt;#obsidian&lt;/code&gt; &lt;code&gt;#pkm&lt;/code&gt; &lt;code&gt;#softwareengineering&lt;/code&gt; &lt;code&gt;#devtools&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>The Real Cost of Rebuilding AI Context (And How to Stop Paying It)</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:36:21 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/the-real-cost-of-rebuilding-ai-context-and-how-to-stop-paying-it-2ch7</link>
      <guid>https://dev.to/martinlepage26bit/the-real-cost-of-rebuilding-ai-context-and-how-to-stop-paying-it-2ch7</guid>
      <description>&lt;p&gt;Here's a calculation most people haven't done:&lt;/p&gt;

&lt;p&gt;How much time per week do you spend getting an AI back up to speed on what you're working on?&lt;/p&gt;

&lt;p&gt;I did this calculation six months ago. The number was uncomfortable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The baseline
&lt;/h2&gt;

&lt;p&gt;At the time I was doing about 8 meaningful AI-assisted work sessions per week — writing, research, code, analysis. Each session started with some amount of context-setting: explaining the project, pasting in relevant documents, reminding the AI of decisions we'd already made, re-establishing the constraints.&lt;/p&gt;

&lt;p&gt;I timed it. The average context-reconstruction overhead per session was &lt;strong&gt;12–15 minutes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;8 sessions × 13 minutes = ~1.75 hours per week on context reconstruction.&lt;/p&gt;

&lt;p&gt;That's one full work session. Every week. Just getting back to where I was.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you're actually reconstructing
&lt;/h2&gt;

&lt;p&gt;Context reconstruction isn't a single task. It breaks down into:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. State recovery&lt;/strong&gt; — "Where were we?" You paste in the document, scan for where you left off, remind yourself what you were trying to do. Even if you remember perfectly, the AI doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Decision archaeology&lt;/strong&gt; — "What did we already decide?" The thing you're about to ask the AI might be something you explicitly resolved two sessions ago. Without a record, you won't remember. You'll explore the same territory again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Constraint re-establishment&lt;/strong&gt; — "What are the rules?" The specific requirements, client preferences, or non-obvious constraints that shape this work. They live in your head. Every session, you rediscover how many of them matter when the AI violates them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Approach rejection&lt;/strong&gt; — "What didn't work?" The approaches you've already tried and discarded. Without a record, the AI will suggest them again. You'll spend time re-evaluating options you've already closed.&lt;/p&gt;

&lt;p&gt;Each of these has a time cost. Together, they're the overhead that prevents AI-assisted work from compounding.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ROI of a session-state protocol
&lt;/h2&gt;

&lt;p&gt;The fix is simple: write a structured note at the end of every session. Not a transcript — a state snapshot.&lt;/p&gt;

&lt;p&gt;Five fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Objective&lt;/span&gt;
[What this project is trying to accomplish]

&lt;span class="gu"&gt;## Active constraints  &lt;/span&gt;
[Non-obvious rules that shape the work]

&lt;span class="gu"&gt;## Decisions made&lt;/span&gt;
[What's been decided, with brief rationale — especially what was rejected]

&lt;span class="gu"&gt;## Open questions&lt;/span&gt;
[What's still unresolved]

&lt;span class="gu"&gt;## Next step&lt;/span&gt;
[The concrete next action, specific enough to act on immediately]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This takes 3–5 minutes to write at session close. At session open, the AI reads it. Context-reconstruction time drops from 13 minutes to 90 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weekly time saved: ~1.5 hours.&lt;/strong&gt; Per year: ~75 hours. At any reasonable hourly rate, the compounding value is significant.&lt;/p&gt;
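&lt;p&gt;The arithmetic, as a back-of-envelope check (the 49 working weeks per year is my assumption; adjust to taste):&lt;/p&gt;

```python
# Back-of-envelope: context-reconstruction overhead before and after
# the session-state protocol. Figures taken from the timings above.
sessions_per_week = 8
before_min = 13.0    # average reconstruction time, no state note
after_min = 1.5      # reconstruction time when the AI reads the state note
weeks_per_year = 49  # assumption: working weeks per year

weekly_overhead_h = sessions_per_week * before_min / 60
weekly_saved_h = sessions_per_week * (before_min - after_min) / 60
yearly_saved_h = weekly_saved_h * weeks_per_year

print(round(weekly_overhead_h, 2))  # 1.73 hours/week rebuilding context
print(round(weekly_saved_h, 2))     # 1.53 hours/week saved
print(round(yearly_saved_h))        # 75 hours/year saved
```

&lt;p&gt;The 3–5 minutes spent writing the note at session close nets against this, but even after subtracting it the saving stays around an hour a week.&lt;/p&gt;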

&lt;h2&gt;
  
  
  The less obvious ROI: decision quality
&lt;/h2&gt;

&lt;p&gt;The time calculation understates the value. The bigger return is decision quality.&lt;/p&gt;

&lt;p&gt;When you have a record of &lt;em&gt;why&lt;/em&gt; you made previous decisions, you make better decisions on subsequent sessions. You don't re-open questions that are already closed. You don't lose constraints in the noise. You don't repeat failed approaches.&lt;/p&gt;

&lt;p&gt;The AI's output quality also improves — not because the model got better, but because it's receiving better context. A well-contextualised session with an average prompt outperforms a poorly-contextualised session with a perfect prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like at scale
&lt;/h2&gt;

&lt;p&gt;The session-state protocol is the core habit. But at 10+ active projects, you need infrastructure around it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A hub note per project (canonical entry point, current state, decisions log)&lt;/li&gt;
&lt;li&gt;A raw-sources zone (captures that haven't been synthesized yet)&lt;/li&gt;
&lt;li&gt;A wiki layer (synthesized, permanently linked knowledge)&lt;/li&gt;
&lt;li&gt;MOC notes (indexes into the graph for fast navigation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what I built. It's now a 212-note Obsidian vault that serves as my persistent AI memory across all projects.&lt;/p&gt;

&lt;p&gt;The skeleton — note types, hub templates, linking patterns, session-state protocol, optional local runtime — is packaged as a template.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;Obsidian Agent Vault on Gumroad&lt;/a&gt; — $49&lt;/p&gt;

&lt;p&gt;If your AI sessions currently start with 10+ minutes of context-setting, that's the symptom. The vault addresses the cause.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: &lt;code&gt;#productivity&lt;/code&gt; &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#obsidian&lt;/code&gt; &lt;code&gt;#pkm&lt;/code&gt; &lt;code&gt;#devtools&lt;/code&gt; &lt;code&gt;#timemanagement&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The one Obsidian note that stopped Claude from re-proposing ideas I'd already rejected</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:36:16 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/the-one-obsidian-note-that-stopped-claude-from-re-proposing-ideas-id-already-rejected-30m9</link>
      <guid>https://dev.to/martinlepage26bit/the-one-obsidian-note-that-stopped-claude-from-re-proposing-ideas-id-already-rejected-30m9</guid>
      <description>&lt;h1&gt;
  
  
  The one Obsidian note that stopped Claude from re-proposing ideas I'd already rejected
&lt;/h1&gt;

&lt;p&gt;There's a specific failure pattern that shows up after a few months of AI-assisted development.&lt;/p&gt;

&lt;p&gt;You're in a session. The agent proposes an approach. You push back — you tried that, it didn't work, here's why. The session continues. Two weeks later, different session, different context: the agent proposes the same approach again. With confident reasoning that sounds better than your original objection.&lt;/p&gt;

&lt;p&gt;This isn't hallucination. It's a record-keeping problem. The reasoning behind your rejection only existed in a closed chat window. From the agent's perspective, starting fresh, the approach looks reasonable.&lt;/p&gt;

&lt;p&gt;The fix is one note type: a Decision Log with a mandatory "Alternatives Rejected" section.&lt;/p&gt;




&lt;h2&gt;
  
  
  What makes a Decision Log different from a regular note
&lt;/h2&gt;

&lt;p&gt;Most notes capture what is. Decision Logs capture what was chosen, what was ruled out, and — critically — why.&lt;/p&gt;

&lt;p&gt;The "why rejected" is the load-bearing part. Without it, you have a record of decisions. With it, you have a record the agent can reason from.&lt;/p&gt;

&lt;p&gt;Here's the format I use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;decision&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;locked&lt;/span&gt;
&lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2026-03-12&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Decision — Auth Layer Scope&lt;/span&gt;

&lt;span class="gu"&gt;## Decision&lt;/span&gt;
Keep authentication separate from the API gateway layer.

&lt;span class="gu"&gt;## Context&lt;/span&gt;
[[Project Hub — CompassAI]] — this decision came up during the evidence endpoint
security audit when rate-limit telemetry was being added.

&lt;span class="gu"&gt;## Reasoning&lt;/span&gt;
Latency isolation: auth path and data path have different SLA requirements.
Independent scaling: auth service can be scaled without coupling to API deploy cycles.
Test isolation: auth logic stays testable independently of gateway routing.

&lt;span class="gu"&gt;## Alternatives Rejected&lt;/span&gt;

&lt;span class="gs"&gt;**Unified middleware:**&lt;/span&gt;
Rejected because it couples deploy cycles — a change to auth requires redeploying
the gateway, and vice versa. Also adds latency on every request, not just auth ones.
Profiled at +18ms median on the evidence ingest path.

&lt;span class="gs"&gt;**Auth-in-gateway:**&lt;/span&gt;
Rejected because it obscures auth logic from the backend team. Makes unit testing
harder (can't test auth without spinning up the gateway mock). Discovered during
the initial security review that this pattern makes CORS enforcement ambiguous.

&lt;span class="gu"&gt;## Open Questions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Whether internal service-to-service calls should bypass gateway or proxy through
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Token refresh: handle at gateway or push to client?

&lt;span class="gu"&gt;## Links&lt;/span&gt;
[[Active Constraints — CompassAI]] · [[Decision — Rate Limit Telemetry Scope]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why each section is there
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;## Decision&lt;/code&gt;&lt;/strong&gt; — one sentence, no hedging. If you can't write the decision in one sentence, it's not a decision yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;## Context&lt;/code&gt;&lt;/strong&gt; — links back to the project hub and names the moment when this decision was made. Gives the agent temporal and project context so it understands why this mattered when it mattered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;## Reasoning&lt;/code&gt;&lt;/strong&gt; — the affirmative case. What made the chosen option correct. Keep it to 3–5 sentences. If it's longer, you're defending a bad decision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;## Alternatives Rejected&lt;/code&gt;&lt;/strong&gt; — the section that does the actual work. Name each alternative, give a specific reason it was rejected, and include any concrete data if you have it (latency numbers, test failure rates, code review findings). Generic rejections ("too complex") don't help the agent or future-you. Specific rejections do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;## Open Questions&lt;/code&gt;&lt;/strong&gt; — the uncertainties this decision leaves open. Check items off when resolved; when one becomes a decision of its own, link it to a new decision note.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;## Links&lt;/code&gt;&lt;/strong&gt; — inline links to adjacent notes. The decision log is useless if it's an orphan. It needs to be reachable from the project hub and linked to the constraints it produced.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the agent uses it
&lt;/h2&gt;

&lt;p&gt;When you wire the decision log into your project hub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project Hub — CompassAI&lt;/span&gt;

&lt;span class="gu"&gt;## Decisions&lt;/span&gt;
[[Decision — Auth Layer Scope]]
[[Decision — Rate Limit Telemetry Scope]]
[[Decision — DB Schema for Evidence Records]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the hub wired this way and your &lt;code&gt;CLAUDE.md&lt;/code&gt; pointing to it, the agent traverses this graph at the start of each session. It reads the decision log before making architectural suggestions. It sees "Unified middleware — rejected because it couples deploy cycles" and doesn't propose unified middleware.&lt;/p&gt;

&lt;p&gt;The rejection is visible before the suggestion happens. The proposal never surfaces.&lt;/p&gt;




&lt;h2&gt;
  
  
  The pattern that makes this work at scale
&lt;/h2&gt;

&lt;p&gt;A single decision log is useful. Twenty linked decision logs are a project memory.&lt;/p&gt;

&lt;p&gt;The key discipline: &lt;strong&gt;create a decision log every time you make a non-trivial architectural, product, or process decision&lt;/strong&gt; — not just the big ones. A decision about which HTTP status code to return on a specific error condition is worth logging if you spent more than five minutes reasoning about it.&lt;/p&gt;

&lt;p&gt;The cost is ~5 minutes per decision at creation time. The return is every future session where the agent doesn't re-open a closed question.&lt;/p&gt;




&lt;h2&gt;
  
  
  The two mistakes that break this
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Mistake 1: Rejected alternatives without specific reasons.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Auth-in-gateway: rejected — too complex" tells the agent nothing it can reason from. The agent doesn't know what "too complex" means in this context. The next session it might propose auth-in-gateway again because the complexity trade-off looks different with a different feature set.&lt;/p&gt;

&lt;p&gt;"Auth-in-gateway: rejected — obscures auth logic from backend team, makes unit testing without gateway mock impractical, causes CORS ambiguity during security review" is specific enough to foreclose the option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mistake 2: Decision logs with no backlinks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A decision log that isn't linked from the project hub doesn't exist to the agent during traversal. It only shows up if the agent explicitly searches for it, which doesn't happen automatically. Wire every decision log into the hub under &lt;code&gt;## Decisions&lt;/code&gt;. Wire the constraints it produced into &lt;code&gt;Active Constraints&lt;/code&gt;. Bidirectional links mean the agent reaches it from either direction.&lt;/p&gt;
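&lt;p&gt;This check is mechanical enough to script. A rough sketch, assuming note titles match link targets exactly (your filenames and conventions may differ):&lt;/p&gt;

```python
import re

# Sketch: flag decision logs that have no backlink from the project hub.
# Hub body and note titles are illustrative stand-ins.
hub = """
# Project Hub - CompassAI

## Decisions
[[Decision - Auth Layer Scope]]
[[Decision - Rate Limit Telemetry Scope]]
"""

decision_notes = [
    "Decision - Auth Layer Scope",
    "Decision - Rate Limit Telemetry Scope",
    "Decision - DB Schema for Evidence Records",  # never wired into the hub
]

hub_links = set(re.findall(r"\[\[([^\]|#]+)", hub))
unwired = [n for n in decision_notes if n not in hub_links]
print(unwired)  # ['Decision - DB Schema for Evidence Records']
```

&lt;p&gt;Run it whenever you add a decision log; an empty list means every decision is reachable from the hub.&lt;/p&gt;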




&lt;h2&gt;
  
  
  Starting point
&lt;/h2&gt;

&lt;p&gt;If you want to try this without restructuring your entire note system: just create one Decision Log for the last significant architectural decision you made on your current project.&lt;/p&gt;

&lt;p&gt;Use the format above. Write the "Alternatives Rejected" section even if it feels like overkill. Link it back to whatever project hub or &lt;code&gt;CLAUDE.md&lt;/code&gt; you already have.&lt;/p&gt;

&lt;p&gt;Run one session with the agent, give it access to that note, and watch whether it proposes the rejected alternative.&lt;/p&gt;

&lt;p&gt;If it doesn't — that's the pattern working.&lt;/p&gt;




&lt;p&gt;The full vault structure (hub templates, all note types, linking discipline, skill guides, optional local runtime) is packaged as a $49 template: &lt;strong&gt;&lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;Obsidian Agent Vault&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Decision Log template ships as part of it, along with the other three note types (project hub, active constraints, open questions). But the format above is enough to start.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>obsidian</category>
      <category>claudeai</category>
      <category>devtools</category>
    </item>
    <item>
      <title>The Context Window Is Not Your Memory</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:35:32 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/the-context-window-is-not-your-memory-d0o</link>
      <guid>https://dev.to/martinlepage26bit/the-context-window-is-not-your-memory-d0o</guid>
      <description>&lt;p&gt;There's a conflation in how most people talk about AI memory that leads them to build the wrong thing.&lt;/p&gt;

&lt;p&gt;The context window and memory are not the same thing. They serve different purposes, operate on different timescales, and fail in different ways. Conflating them produces a system that looks like it handles memory but doesn't actually compound over time.&lt;/p&gt;

&lt;p&gt;Here's the distinction and why it matters practically.&lt;/p&gt;

&lt;h2&gt;
  
  
  The context window
&lt;/h2&gt;

&lt;p&gt;The context window is the model's working memory — the span of tokens it can attend to in a single inference pass. Everything outside the window is invisible. Everything inside it is available to attention, though retrieval quality can degrade with position depending on the architecture.&lt;/p&gt;

&lt;p&gt;Key properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session-scoped&lt;/strong&gt; — it resets when the conversation ends&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expensive to fill&lt;/strong&gt; — every token you put in costs inference compute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flat&lt;/strong&gt; — there's no inherent hierarchy or structure; a crucial constraint and a filler paragraph cost the same&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bounded&lt;/strong&gt; — even very long context models have limits, and performance degrades with distance from the query&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-persistent&lt;/strong&gt; — nothing in the context window writes itself to storage automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The context window is useful for reasoning within a session. It's not useful as a persistence mechanism because it has no persistence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory
&lt;/h2&gt;

&lt;p&gt;Memory, in the sense that matters for ongoing work, is structured information that persists across sessions and is available to be selectively loaded into context when relevant.&lt;/p&gt;

&lt;p&gt;Key properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Durable&lt;/strong&gt; — survives session boundaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selective&lt;/strong&gt; — not everything needs to be loaded every time; only what's relevant to the current task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured&lt;/strong&gt; — organized so the right things can be found and loaded efficiently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linked&lt;/strong&gt; — connected to other relevant pieces so that loading one node gives you traversal hooks to related context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accumulated&lt;/strong&gt; — gets richer over time as more decisions, constraints, and state are recorded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Memory is what you build and maintain. The context window is what you load memory into at session time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the conflation is harmful
&lt;/h2&gt;

&lt;p&gt;When people treat the context window as their memory system, they build workflows that:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Paste everything in every session.&lt;/strong&gt; Load all potentially relevant documents at the start of each conversation. This works until you hit context limits; it risks missing something, costs tokens in proportion to how much you paste, and requires you to manually curate what's relevant every time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rely on chat history.&lt;/strong&gt; Use the conversation history as a de facto memory store. This fails when you start a new thread (history resets), when you need to find something specific (chat history is terrible for retrieval), and when you want to share context with a different model or tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build "memory" features on top of context.&lt;/strong&gt; Tools that summarize conversations and prepend them to the next session are still just filling the context window — with summaries instead of full history. Better than nothing, but fundamentally still ephemeral. The summary is lossy and unstructured.&lt;/p&gt;

&lt;p&gt;None of these produce the compounding effect that genuine persistent memory does, because none of them build a durable, structured, queryable store that gets richer over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture that works
&lt;/h2&gt;

&lt;p&gt;Persistent memory for AI-assisted work needs three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. External storage.&lt;/strong&gt; Outside the model, in files or a database. Not in the context window. Not in chat history. Written to disk in a format you control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Structure.&lt;/strong&gt; Not a prose dump — typed fields with known semantics. A decision record, a constraint record, and a session-state record each carry their own fields. The structure makes selective loading possible and makes the content reliably interpretable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. A loading protocol.&lt;/strong&gt; A defined process for deciding what gets loaded into context at session start. Not everything, every time — just the hub note for the active project, the session-state record, and any directly relevant linked notes. This keeps context costs low and ensures the most important information is loaded closest to the query.&lt;/p&gt;

&lt;p&gt;This is what I call the hub-and-spoke pattern: a hub note per project that's always loaded (current state, active constraints, key decisions), with spokes to more detailed notes that get loaded selectively.&lt;/p&gt;
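&lt;p&gt;The loading protocol is simple enough to sketch in a few lines. Here's an illustrative Python version; the vault layout and file-naming scheme are assumptions for the example, not part of any particular tool. Read the hub note, add the session-state note if one exists, then follow one hop of wiki-links:&lt;/p&gt;

```python
import re
from pathlib import Path

# Captures the note name in [[Note]], [[Note|alias]], or [[Note#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def load_context(vault: Path, project: str) -> str:
    """Assemble session-start context: hub note, session-state note, one hop of links."""
    hub = (vault / f"{project}.md").read_text()
    parts = [hub]
    state = vault / f"{project} - session-state.md"  # assumed naming convention
    if state.exists():
        parts.append(state.read_text())
    # One hop only: load the notes the hub links to, not the whole vault.
    for name in WIKILINK.findall(hub):
        note = vault / f"{name.strip()}.md"
        if note.exists():
            parts.append(note.read_text())
    return "\n\n---\n\n".join(parts)
```

&lt;p&gt;The one-hop rule is the point: the hub decides what's relevant, so context cost stays proportional to what the project actually needs rather than to the size of the vault.&lt;/p&gt;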

&lt;h2&gt;
  
  
  The file-native implementation
&lt;/h2&gt;

&lt;p&gt;The simplest implementation of this is also the most portable: plain Markdown files with structured sections and wiki-style &lt;code&gt;[[links]]&lt;/code&gt; between related notes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No database required&lt;/li&gt;
&lt;li&gt;No API calls for retrieval&lt;/li&gt;
&lt;li&gt;No build step for embeddings&lt;/li&gt;
&lt;li&gt;Works with any model that can read files&lt;/li&gt;
&lt;li&gt;Fully auditable (it's just text)&lt;/li&gt;
&lt;li&gt;Version-controllable&lt;/li&gt;
&lt;/ul&gt;
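&lt;p&gt;"Structured sections" can be as plain as Markdown headings. A minimal sketch, assuming each typed field lives under a &lt;code&gt;##&lt;/code&gt; heading (the heading names below are illustrative), of splitting a note so sections can be loaded selectively:&lt;/p&gt;

```python
import re

def parse_sections(note: str) -> dict[str, str]:
    """Split a Markdown note into {heading: body} so typed fields load selectively."""
    sections: dict[str, str] = {}
    current = "_preamble"  # anything before the first ## heading
    buf: list[str] = []
    for line in note.splitlines():
        m = re.match(r"##\s+(.*)", line)
        if m:
            sections[current] = "\n".join(buf).strip()
            current, buf = m.group(1).strip(), []
        else:
            buf.append(line)
    sections[current] = "\n".join(buf).strip()
    return sections
```

&lt;p&gt;Anything that can read a text file can run this, which is the portability argument in miniature.&lt;/p&gt;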

&lt;p&gt;The tradeoff vs. vector databases: discovery is harder (you need to know what you're looking for, or use a lightweight vector search layer for exploration). For ongoing projects where you know what you're working on, direct file reads are faster and more reliable than probabilistic retrieval.&lt;/p&gt;

&lt;h2&gt;
  
  
  The vault
&lt;/h2&gt;

&lt;p&gt;The Obsidian vault skeleton I use operationalizes this pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hub templates (the always-loaded entry point per project)&lt;/li&gt;
&lt;li&gt;Session-state protocol (what gets written at close and read at open)&lt;/li&gt;
&lt;li&gt;Note types with defined structure (decisions, constraints, state, open questions)&lt;/li&gt;
&lt;li&gt;Linking conventions that enable traversal&lt;/li&gt;
&lt;li&gt;Optional lightweight vector search for discovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;→ &lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;Obsidian Agent Vault on Gumroad&lt;/a&gt; — $49&lt;/p&gt;

&lt;p&gt;If your AI workflow today is primarily context-window management — pasting, summarizing, re-loading — you're working against the grain of how these systems are designed. The context window is for reasoning. Memory is for persistence. Build the right thing for each.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#llm&lt;/code&gt; &lt;code&gt;#obsidian&lt;/code&gt; &lt;code&gt;#pkm&lt;/code&gt; &lt;code&gt;#softwareengineering&lt;/code&gt; &lt;code&gt;#productivity&lt;/code&gt; &lt;code&gt;#machinelearning&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>nlp</category>
    </item>
    <item>
      <title>Why Better Prompts Aren't the Fix (And What Actually Is)</title>
      <dc:creator>martinlepage26-bit</dc:creator>
      <pubDate>Thu, 07 May 2026 18:35:31 +0000</pubDate>
      <link>https://dev.to/martinlepage26bit/why-better-prompts-arent-the-fix-and-what-actually-is-4cd9</link>
      <guid>https://dev.to/martinlepage26bit/why-better-prompts-arent-the-fix-and-what-actually-is-4cd9</guid>
      <description>&lt;p&gt;I spent three months trying to get consistent results from Claude by improving my prompts.&lt;/p&gt;

&lt;p&gt;Better structure. More examples. Chain-of-thought instructions. Role framing. Temperature tuning. The whole toolkit.&lt;/p&gt;

&lt;p&gt;The results got marginally better, then plateaued. And then I noticed something uncomfortable: the same prompt produced wildly different output depending on &lt;em&gt;when&lt;/em&gt; in a project I ran it.&lt;/p&gt;

&lt;p&gt;Early in a project, when I had full context in my head — great results. Six weeks in, after I'd context-switched five times — mediocre results, even with the "good" prompt.&lt;/p&gt;

&lt;p&gt;The prompt hadn't changed. &lt;em&gt;My context had.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual variable
&lt;/h2&gt;

&lt;p&gt;Prompts tell the model what to do. Context tells the model who's asking, what they've already decided, what constraints they're operating under, and what "good" looks like for this specific situation.&lt;/p&gt;

&lt;p&gt;When you have rich context, a mediocre prompt works fine. When you have weak context, even a perfect prompt produces generic output.&lt;/p&gt;

&lt;p&gt;This is the thing prompt engineering tutorials don't address: they optimize the instruction while assuming context is constant. It isn't. Context degrades continuously across sessions, context-switches, and team handoffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What degraded context looks like in practice
&lt;/h2&gt;

&lt;p&gt;You're working on a technical document. You've been iterating on it for two weeks. You have strong opinions about what should and shouldn't be in it — but those opinions live in your head, not anywhere the AI can see them.&lt;/p&gt;

&lt;p&gt;You open a new session. You paste in the document and your prompt. The AI helpfully adds sections you explicitly decided to exclude last week. It uses a tone you've already rejected. It misses the specific constraint that makes this project unusual.&lt;/p&gt;

&lt;p&gt;You spend 20 minutes correcting outputs that a well-contextualized session would have gotten right in one pass.&lt;/p&gt;

&lt;p&gt;Multiply that by every session, every project, every team member. That's the real cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  What structured context actually looks like
&lt;/h2&gt;

&lt;p&gt;The fix isn't a better prompt. It's a persistent record that travels with every session:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision log&lt;/strong&gt; — what's been decided and, critically, what's been &lt;em&gt;rejected&lt;/em&gt;. Not just the current answer but the ruled-out alternatives. When the AI suggests something you've already considered and discarded, you can point to the record: "We tried that. Here's why it failed."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Active constraints&lt;/strong&gt; — the specific requirements, boundaries, and non-obvious rules that apply to this project. Things that wouldn't be obvious from the artifact alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current state&lt;/strong&gt; — where the project is right now. Not a full history, just the present position: what's done, what's in progress, what's blocked and why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next step&lt;/strong&gt; — the concrete action that closes the gap between current state and goal. Not "continue working on X" — the actual next move.&lt;/p&gt;

&lt;p&gt;This structure takes 10 minutes to set up per project. It eliminates the context-reconstruction overhead on every subsequent session.&lt;/p&gt;
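&lt;p&gt;That 10-minute setup can even be scripted. A hedged sketch in Python — the file layout is an assumption, and the section names simply mirror the four record types above — that scaffolds a hub note and checks at session start that no section has gone missing:&lt;/p&gt;

```python
from pathlib import Path

# The four record types described above; names are illustrative.
SECTIONS = ["Decision log", "Active constraints", "Current state", "Next step"]

def scaffold_hub(vault: Path, project: str) -> Path:
    """Create a minimal hub note with the four sections, ready to fill in."""
    hub = vault / f"{project}.md"
    if not hub.exists():
        body = f"# {project}\n\n" + "\n\n".join(f"## {s}\n- " for s in SECTIONS)
        hub.write_text(body)
    return hub

def missing_sections(hub: Path) -> list[str]:
    """Session-start check: which of the four sections is absent from the record?"""
    text = hub.read_text()
    return [s for s in SECTIONS if f"## {s}" not in text]
```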

&lt;h2&gt;
  
  
  The compounding effect
&lt;/h2&gt;

&lt;p&gt;The real payoff isn't session 2. It's session 20.&lt;/p&gt;

&lt;p&gt;By the time you've been working on something for three months, the accumulated decisions, rejected approaches, and learned constraints are substantial. Without a record, you reconstruct a fraction of them each session and forget the rest. With a record, they're all available to the AI immediately, every time.&lt;/p&gt;

&lt;p&gt;The output quality doesn't plateau. It improves as the record grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The vault structure
&lt;/h2&gt;

&lt;p&gt;I built a file-native knowledge vault that operationalizes this pattern across all my AI-assisted work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hub notes&lt;/strong&gt; per project — one canonical entry point with the current state, decisions made, and active constraints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision log format&lt;/strong&gt; — a specific structure for recording &lt;em&gt;why&lt;/em&gt; things were rejected, not just what was chosen&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill notes&lt;/strong&gt; — reusable task templates that carry their own context requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session-state protocol&lt;/strong&gt; — a start/end ritual that updates the record so the next session starts clean&lt;/li&gt;
&lt;/ul&gt;
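&lt;p&gt;The session-state protocol in particular is small enough to sketch. Assuming a file-per-note layout (names illustrative, not from any specific tool): overwrite a single state note at close, read it back at open.&lt;/p&gt;

```python
from datetime import date
from pathlib import Path

def close_session(vault: Path, project: str, state: str, next_step: str) -> None:
    """End-of-session ritual: overwrite the state note so the next session starts clean."""
    note = vault / f"{project} - session-state.md"
    note.write_text(
        f"# Session state: {project}\n"
        f"Updated: {date.today().isoformat()}\n\n"
        f"## Current state\n{state}\n\n"
        f"## Next step\n{next_step}\n"
    )

def open_session(vault: Path, project: str) -> str:
    """Start-of-session ritual: read the state note (empty string if first session)."""
    note = vault / f"{project} - session-state.md"
    return note.read_text() if note.exists() else ""
```

&lt;p&gt;Overwriting rather than appending keeps the note a snapshot of the present, which matches the "current state, not full history" rule.&lt;/p&gt;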

&lt;p&gt;The skeleton for this system — all the note types, hub templates, linking patterns, and the optional local runtime — is packaged as a $49 Obsidian vault template.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://pharosml.gumroad.com/l/kvbhdo" rel="noopener noreferrer"&gt;Obsidian Agent Vault on Gumroad&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've been frustrated by inconsistent AI output and your first instinct has been to improve the prompt, consider that the problem might be upstream of the prompt.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: &lt;code&gt;#productivity&lt;/code&gt; &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#obsidian&lt;/code&gt; &lt;code&gt;#promptengineering&lt;/code&gt; &lt;code&gt;#devtools&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
