I watched it happen in real time: a fresh session spun up, eager to help, and immediately asked the question that costs me hours—“What should I work on?”
Part 0 of this series was the emotional moment: the AI forgets everything, and you end up rebuilding context that already existed—just not in a place the system was forced to look.
This is Part 12 of “How to Architect an Enterprise AI System (And Why the Engineer Still Matters)”, and it’s the post where I stop treating statelessness like a personality flaw and start treating it like what it is: an engineering problem.
The core decision was simple and slightly ruthless:
I made it structurally impossible to start a session without recovering state.
Not “recommended.” Not “best effort.” Mandatory.
## The key insight (and why the naive approach fails)
The naive approach to continuity is to stuff more and more background into the system prompt. You keep a running blob of notes, you paste in old decisions, you preload “everything important,” and you hope the next session behaves.
That fails for two reasons:
- Static context rots. The moment you hardcode documentation into a boot prompt, it starts drifting away from reality.
- Static context is expensive. I hit a point where the boot context was about 30,000 tokens of inline documentation before the assistant even saw the user’s prompt.
The non-obvious move was realizing that “memory” isn’t a single thing.
I needed three different kinds of continuity, each with a different shape:
- Task state (what’s pending, what’s in progress, what’s blocked)
- Working state (what I was doing mid-thought, the half-made decisions, the code snippets in flight)
- Infrastructure state (the stuff that otherwise costs 15 minutes of lookups every session)
So I built a stack where each layer has one job—and the boot sequence forces the system to consult them in an order that makes sense.
## The Zero Context Loss stack (layer by layer)
I’ll walk through this in the order I’d implement it again.
### Layer 1: A task ledger the AI can’t argue with
The foundation is a database table that acts as the single source of truth for task state across concurrent work.
It’s intentionally boring.
It has an enum `status` column instead of a boolean because “done/not done” is a lie in real engineering work. I needed to represent:
- pending
- in_progress
- completed
- blocked
And I needed that state to be queryable before anything else happens.
Here’s the schema as I use it, with representative column types:

```sql
CREATE TABLE implementation_progress (
    id               INTEGER PRIMARY KEY,
    project_id       INTEGER NOT NULL,
    feature_name     TEXT NOT NULL,
    phase            TEXT,
    task_description TEXT NOT NULL,
    status           TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'in_progress', 'completed', 'blocked')),
    started_at       TIMESTAMP,
    completed_at     TIMESTAMP,
    notes            TEXT
);
```
That table is the difference between “I think I was working on…” and “Here are the exact tasks that are still alive.” The first is a vibe; the second is a system.
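The ledger only stays authoritative if transitions are written back as work happens. A sketch of the lifecycle, with an illustrative task id and note:

```sql
-- Claim a task at session start (id 42 is illustrative)
UPDATE implementation_progress
SET status = 'in_progress', started_at = NOW()
WHERE id = 42 AND status = 'pending';

-- Before the session ends, record the outcome: completed, or blocked
-- with a note future-you will actually need
UPDATE implementation_progress
SET status = 'blocked', notes = 'waiting on API key rotation'
WHERE id = 42;
```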
#### The first query that runs in every session
On boot, I query for the tasks that matter right now: pending and in_progress.
```sql
SELECT task_description, status, notes
FROM implementation_progress
WHERE status IN ('in_progress', 'pending')
ORDER BY CASE status WHEN 'in_progress' THEN 1 ELSE 2 END;
```
What surprised me when I put this in place was how often the “next step” was already written down—just stranded in a table nobody was forced to read.
### Layer 2: A mandatory boot protocol
Once I had a ledger, the next failure mode was predictable: a new session would still start answering questions before it had recovered context.
So I wrote a boot protocol and made it non-negotiable.
The order matters.
- Database first because it’s the fastest way to know what’s active and what’s blocked.
- Files next because structured task rows don’t capture the messy middle.
- Handoff notes because the “mid-thought” state is often not committed.
- Infrastructure context because otherwise every session starts with the same lookups.
The protocol is literally a sequence.
```
# Step 1: Query database for active tasks
SELECT task_description, status, notes
FROM implementation_progress
WHERE status IN ('in_progress', 'pending')
ORDER BY CASE status WHEN 'in_progress' THEN 1 ELSE 2 END;

# Step 2: Read progress files
head -50 KNOWLEDGE_BASE_PROGRESS.md

# Step 3: Check for session handoff
ls -t session_notes/session_*.md | head -1 | xargs cat

# Step 4: Call engineering context API
curl /api/teams/admin/engineering/context

# Step 5: Initialize task tracking
# Use TodoWrite tool for multi-step tasks
```
I like that this is blunt. It doesn’t rely on discipline. It doesn’t rely on memory. It’s a checklist that runs before the assistant earns the right to have an opinion.
### Layer 3: Progress files (because the database only tells you “what”)
The table is great at answering:
- What’s in progress?
- What’s pending?
- What’s blocked?
It’s terrible at answering:
- Where in the code did I stop?
- What decision was half-made?
- What did I try that failed?
That’s why I keep active progress files alongside the database.
The database tracks what. The progress files track where.
This is also where I record the kind of information that doesn’t belong in a normalized schema: the in-flight reasoning that will matter when I resume.
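What that in-flight reasoning looks like on disk, sketched as a shell snippet (the filename, timestamp, and every entry below are illustrative, not a prescribed format):

```bash
#!/usr/bin/env bash
# Sketch: append a "where exactly I stopped" entry to the active progress file.
PROGRESS_FILE="KNOWLEDGE_BASE_PROGRESS.md"

cat >> "$PROGRESS_FILE" <<'EOF'
## Where I stopped (2024-01-15 17:40)
- search_router.py, ~line 180: fallback branch half-written
- Half-made decision: leaning toward cosine rerank; not benchmarked yet
- Failed attempt: embedding batches of 512 hit rate limits; backed off to 128
EOF

# Boot only skims the top of this file, so keep entries terse
head -50 "$PROGRESS_FILE"
```

The point is the categories: where in the code, what was half-decided, what already failed.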
### Layer 4: Session handoffs as resumption documents
The moment I stopped trusting session continuity, I needed a format that could survive a hard reset.
So I standardized session handoffs into structured markdown files named by timestamp:
```
session_YYYYMMDD_HHMM.md
```
The important part isn’t the filename—it’s the constraint:
A handoff is written for a reader with zero context.
Not a summary. A resumption document.
It includes explicit sections:
- what was accomplished
- code snippets in flight
- architectural decisions made and why
- blockers encountered
- numbered next steps
The numbered list is the trick. It forces the next session to do something concrete instead of “getting up to speed” forever.
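A handoff skeleton with those sections can be stamped out in one command. A sketch (directory name and headings are illustrative):

```bash
#!/usr/bin/env bash
# Sketch: generate a timestamped handoff skeleton with the sections above.
mkdir -p session_notes
HANDOFF="session_notes/session_$(date +%Y%m%d_%H%M).md"

cat > "$HANDOFF" <<'EOF'
# Session handoff (written for a reader with zero context)

## What was accomplished
-

## Code snippets in flight
-

## Architectural decisions (and why)
-

## Blockers encountered
-

## Next steps (numbered, concrete)
1.
2.
3.
EOF

echo "Wrote $HANDOFF"
```

Generating the skeleton at session end means the only remaining work is filling it in, which is exactly the work that would otherwise be lost.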
### Layer 5: Migrating scattered docs into a knowledge base
At one point, I had the classic failure mode: the boot context was stuffed with inline documentation—architecture notes, deployment guides, API references—because I wanted the assistant to “have everything.”
It worked until it didn’t.
The boot context ballooned to 30,000 tokens of static text.
So I migrated hundreds of scattered markdown documents into an embedded knowledge base with auto-categorization.
I used five article types:
- howto
- troubleshooting
- architecture
- onboarding
- reference
And five context types:
- implementation_plan
- technical_decision
- code_pattern
- user_feedback
- operational_runbook
The migration script supported:
- dry-run mode
- batch limiting
- retry logic
- JSON migration report
- generated cleanup script to remove migrated source files
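The shape of that script can be sketched in a few dozen lines. Everything here is a placeholder: the `docs/` path, the `/api/kb/ingest` endpoint, and the filename-based categorization rules stand in for the real logic:

```bash
#!/usr/bin/env bash
# Sketch of the migration loop: dry-run mode, batch limiting, retries,
# a JSON report, and a generated cleanup script.
set -u

DRY_RUN="${DRY_RUN:-1}"           # 1 = categorize and report only, no writes
BATCH_LIMIT="${BATCH_LIMIT:-100}"
MAX_RETRIES=3
migrated=0
failed=0

categorize() {  # crude filename-based auto-categorization
  case "$1" in
    *howto*|*setup*|*deploy*) echo "howto" ;;
    *debug*|*issue*|*fix*)    echo "troubleshooting" ;;
    *arch*|*design*)          echo "architecture" ;;
    *onboard*)                echo "onboarding" ;;
    *)                        echo "reference" ;;
  esac
}

for f in docs/*.md; do
  [ -e "$f" ] || continue
  if [ "$migrated" -ge "$BATCH_LIMIT" ]; then break; fi
  article_type="$(categorize "$(basename "$f")")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "would migrate: $f -> $article_type"
  else
    ok=0
    for attempt in $(seq 1 "$MAX_RETRIES"); do
      if curl -sf -X POST /api/kb/ingest -F "file=@$f" -F "type=$article_type"; then
        ok=1; break
      fi
    done
    if [ "$ok" -ne 1 ]; then failed=$((failed+1)); continue; fi
    echo "rm '$f'" >> cleanup_migrated.sh  # generated cleanup script
  fi
  migrated=$((migrated+1))
done

printf '{"migrated": %d, "failed": %d, "dry_run": %s}\n' \
  "$migrated" "$failed" "$DRY_RUN" > migration_report.json
```

Generating the cleanup script instead of deleting in place is the safety valve: nothing is removed until the report has been reviewed.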
The big payoff wasn’t organization. It was freeing the boot context window.
### Layer 6: The 30K → 8K token reduction
Once the docs lived behind retrieval instead of inside the boot prompt, the boot context stopped being a junk drawer.
It dropped from 30,000 tokens of inline documentation to 8,000 tokens.
And the shape of those 8,000 tokens mattered:
- the protocol
- the rules
- pointers to where knowledge lives
That’s the difference between carrying your entire library in your backpack and carrying a map to the library.
### Layer 7: A prewarmed hotset (so boot doesn’t feel cold)
Even with a good protocol, cold starts feel slow if the system has to fetch everything reactively.
So on boot, I load a hotset of recent memory records into working memory.
```sql
SELECT embedding, metadata
FROM memory_cards
ORDER BY created_at DESC
LIMIT 200;
```
Two constraints shaped this:
- It had to be recent enough to contain the last week of decisions and artifacts.
- It had to be small enough to fit into working memory without crowding out the actual work.
The effect is subtle but real: when the agent wakes up, it’s already primed. No “let me catch up” phase.
### Layer 8: Infrastructure context as retrievable state
The last piece is the one that feels embarrassingly practical.
Before I stored infrastructure context, sessions would routinely start with questions like:
- “What’s the Redis connection string?”
- “What’s the Cosmos DB endpoint?”
Each one is a 15-minute tax if you have to re-derive it.
So I stored resource IDs, connection strings, deployment states, and configuration values as retrievable context and made it part of the boot sequence via a single API call.
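The pattern is just fetch-once, read-many. A sketch of the cache side, with a stubbed fetch (the endpoint, file name, keys, and values are all illustrative):

```bash
#!/usr/bin/env bash
# Sketch: one boot-time call populates a local cache of infrastructure
# context; later lookups read the cache instead of re-deriving values.
CACHE="infra_context.env"

refresh_cache() {
  # In the real boot sequence this is roughly:
  #   curl -s /api/teams/admin/engineering/context > "$CACHE"
  # Stubbed here with representative entries.
  cat > "$CACHE" <<'EOF'
REDIS_CONNECTION=redis://cache.internal:6379
COSMOS_ENDPOINT=https://example-cosmos.documents.azure.com:443/
DEPLOY_STATE=staging-green
EOF
}

infra() {  # look up one value; no re-deriving, no 15-minute tax
  grep "^$1=" "$CACHE" | cut -d= -f2-
}

refresh_cache
echo "Redis: $(infra REDIS_CONNECTION)"
```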
## How the pieces fit together
Here’s the flow I actually care about: the assistant doesn’t get to “think” until it has recovered state.
```mermaid
flowchart TD
    boot[Boot] --> progressDb[Query implementation_progress]
    progressDb --> progressFiles[Read progress files]
    progressFiles --> handoff[Load latest session handoff]
    handoff --> infraApi[Fetch infrastructure context via API]
    infraApi --> hotset[Prewarm hotset from memory_cards]
    hotset --> work[Start work with recovered state]
```
My mental model is a pit crew, not a professor. The boot protocol isn’t there to explain; it’s there to get the car back on track before the driver hits the gas.
## What went wrong (the mistake that forced the protocol)
I didn’t build this stack because it was intellectually interesting.
I built it because I lost three hours rebuilding context that was already written down—just not in a place the system was forced to consult.
The database check exists because I had task state sitting in a table that nobody queried at the start of a session.
The progress files exist because “I was working on the search router” is not a resumption state.
The handoff notes exist because the work that matters most is often the work you haven’t committed yet.
The resume script exists because doing the same lookups manually every morning isn’t discipline. It’s waste.
## The resume script: seven recovery steps in one command
Once I had the layers, I wanted a single action that ran recovery in sequence.
The resume script automates seven steps:
1. current git branch
2. last 5 commits with graph
3. working tree status for uncommitted changes
4. read the active progress file
5. find and display the most recent session handoff
6. fetch comprehensive engineering context from the API endpoint
7. query implementation_progress for in-progress tasks with overall stats (completion percentage, task counts by status, next 3 pending items)
I’m not including the full script here because the important part isn’t the shell syntax—it’s the guarantee: one command and you’re working.
The real win is that it makes continuity cheap enough that it actually happens.
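For shape’s sake, a minimal sketch of what such a script can look like. The paths, the endpoint, and the output filename are illustrative, and steps 6 and 7 are stubbed where the real script calls the API and the database:

```bash
#!/usr/bin/env bash
# Minimal sketch of a resume script: seven recovery steps in one command.
resume() {
  echo "== 1. Branch =="
  git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "[no git repo]"
  echo "== 2. Last 5 commits =="
  git log --oneline --graph -5 2>/dev/null || echo "[no history]"
  echo "== 3. Uncommitted changes =="
  git status --short 2>/dev/null || echo "[no git repo]"
  echo "== 4. Active progress file =="
  head -50 KNOWLEDGE_BASE_PROGRESS.md 2>/dev/null || echo "[none]"
  echo "== 5. Latest handoff =="
  latest="$(ls -t session_notes/session_*.md 2>/dev/null | head -1)"
  if [ -n "$latest" ]; then cat "$latest"; else echo "[none]"; fi
  echo "== 6. Engineering context =="
  echo "(curl /api/teams/admin/engineering/context)"
  echo "== 7. Task state =="
  echo "(query implementation_progress: in_progress tasks + completion stats)"
}

resume | tee resume_brief.txt
```

Each step degrades to an explicit placeholder instead of failing, so the script always produces a complete brief.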
## Nuances and tradeoffs
A continuity stack like this has tradeoffs, and pretending otherwise is how systems rot.
### The database is authoritative—but not expressive
I can count tasks, sort by phase, and see what’s blocked.
But I can’t store “the weird thing I noticed in the logs” as a first-class object without turning the schema into a junk drawer.
That’s why the progress files and handoffs exist.
### The boot protocol is rigid on purpose
If you allow “skip this step,” you’ve reintroduced the original failure mode: sessions that start creating output before they’ve recovered state.
The rigidity is the product.
### Prewarming is a bet on recency
Loading the most recent 200 records assumes recency correlates with relevance.
That’s usually true for active engineering work, but it’s still a heuristic. The hotset makes boot feel alive; it doesn’t replace retrieval.
### Migrating docs is boring work that pays rent
The migration wasn’t glamorous.
But it’s what made the 30K → 8K reduction possible, and that reduction is what made the whole system viable without consuming the context window before the conversation starts.
## The series thesis, landed
Across this series, the AI raised the floor on a lot of problems: extraction, enrichment, orchestration, routing, caching.
But the floor itself—the part that keeps a system from collapsing into repeated explanations and forgotten decisions—that was always the engineer’s job.
I didn’t solve forgetting with a better model.
I solved it the way you solve any reliability problem: by building a system where the correct behavior is the default, and the failure mode is structurally hard.
---
But I should be honest about something: the system I’ve shown you across these twelve posts only solves context loss for an engineer’s workflow—sessions, handoffs, task continuity, infrastructure state. Important, but small.
The architecture underneath it isn’t small.
What if the same principles—episodic capture, semantic consolidation, structured retrieval, salience-driven pruning—were applied to a domain where context loss isn’t an inconvenience? Where fragmented information isn’t a productivity problem but a clinical one—where the notes live in one system, the session logs in another, the vitals somewhere else, the medication history in a fourth, and no thread connects them? Where the patient’s story gets reconstructed from shards every visit, by a different person, with different assumptions, and the continuity that healing depends on simply doesn’t exist in the infrastructure?
I’ve been building that system. On my own time, in my own repo, with my own architecture. And it doesn’t look like a developer tool anymore.
It looks like a brain.
Not metaphorically. The memory architecture models the interplay between the hippocampus and the neocortex—fast episodic capture that consolidates into stable semantic knowledge over time, governed by biological priors that no AI memory system I’ve seen has attempted. It doesn’t treat information as separate files in separate systems. It treats it the way neural circuits do: connections, sequences, associations, reconsolidation, pruning, strengthening, emotional tagging.
And it integrates something that changes the architecture entirely: physiological signals—heart-rate variability, sympathetic spikes, vagal tone, sleep architecture—as first-class inputs to the consolidation process. The biology doesn’t just get monitored. It drives what the system remembers, how urgently, and what it surfaces when a clinician needs to make a decision with the full weight of a patient’s journey behind it.
The architecture has caught the attention of a well-decorated and esteemed psychiatrist—not because it’s a clever AI project, but because it recreates the continuity that trauma strips away from the people he treats every day.
Twelve posts about building AI systems for an enterprise. The next thing I build isn’t for an enterprise. It’s for the people the enterprise was never designed to help.
---
🎧 [Listen to the Enterprise AI Architecture audiobook](https://shop.craftedbydaniel.com)
📖 [Read the full 13-part series with an AI assistant](https://craftedbydaniel.com/premium-access)