Everyone shows the demo. Nobody shows what happens on day 30.
I've been running an autonomous Claude Code agent 24/7 for 67 days. Not a weekend project. Not a "vibe coding" session. A production system that handles customer emails, writes tweets, deploys code, manages memory, and operates a business while I sleep.
Here's the architecture that makes it work — and the three things that will break yours if you don't plan for them.
The Stack
- Runtime: OpenClaw on a Mac Mini (M-series, always on)
- Model: Claude on flat-rate plan (no per-token anxiety)
- Memory: Three-tier system — daily notes, long-term MEMORY.md, PARA knowledge graph
- Ops: Cron-based heartbeats every 30 minutes, session cleanup at 3am, memory compaction weekly
- Tools: Browser automation, email (AgentMail), X API, Stripe, Vercel deploys
Nothing exotic. The magic is in how these pieces talk to each other.
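For concreteness, here is roughly what a cron-driven heartbeat entry point could look like. The crontab line, check functions, and file paths below are illustrative placeholders, not the actual config:

```python
#!/usr/bin/env python3
"""Heartbeat entry point, invoked by cron every 30 minutes, e.g.:

    */30 * * * * /usr/bin/python3 /agent/heartbeat.py

The check functions and paths are hypothetical stand-ins."""
import datetime

def check_email():
    # Placeholder: poll the inbox and return a one-line summary.
    return "0 new messages"

def check_calendar():
    # Placeholder: look ahead one hour on the calendar.
    return "no events in the next hour"

def heartbeat():
    stamp = datetime.datetime.now().isoformat(timespec="minutes")
    entry = "\n".join([
        f"## Heartbeat {stamp}",
        f"- email: {check_email()}",
        f"- calendar: {check_calendar()}",
    ])
    # Append to today's daily note so the next session can read it.
    note_path = f"memory/{datetime.date.today()}.md"
    return note_path, entry

if __name__ == "__main__":
    path, entry = heartbeat()
    print(path)
    print(entry)
```

The point of the summary strings: the heartbeat writes compact one-liners to the daily note rather than dumping raw API responses into context.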
Problem #1: Context Window Bloat
Your agent starts fast. By day 3, it's sluggish. By day 7, it's hallucinating. By day 14, you've burned through your API budget and the agent still can't remember what it did yesterday.
Root cause: Every tool call, every file read, every API response inflates the context window. A single heartbeat check that reads email + calendar + Twitter can consume 15K tokens. Do that every 30 minutes and you've exhausted a 200K context window in under 7 hours.
The fix: Session discipline.
- Rule: hard cap at 50K tokens per session.
- When hit: extract progress to memory files → end session → start fresh.
This sounds brutal. It is. But it forces a behavior that turns out to be essential: your agent must externalize its memory. It can't rely on "remembering" something from earlier in the conversation. It writes to files. Files persist across sessions. The agent doesn't.
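The cap-and-flush loop can be sketched like this. Here `run_step` is a hypothetical callable standing in for whatever executes one agent action and reports tokens consumed:

```python
HARD_CAP = 50_000  # tokens per session

def run_session(tasks, run_step, memory_path="MEMORY.md"):
    """Run steps until the token cap is hit, then flush progress to disk.

    `run_step` is a hypothetical callable: task -> (tokens_used, progress_note).
    Returns the tasks that didn't fit, which roll over to a fresh session.
    """
    used = 0
    notes = []
    remaining = list(tasks)
    while remaining and used < HARD_CAP:
        tokens, note = run_step(remaining.pop(0))
        used += tokens
        notes.append(note)
    # Externalize before the context disappears: files persist, sessions don't.
    with open(memory_path, "a") as f:
        for note in notes:
            f.write(f"- {note}\n")
    return remaining
```

The design choice that matters is the append before the return: progress lands on disk unconditionally, so a fresh session can pick up the leftover tasks with zero conversational memory.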
Problem #2: Memory Retrieval Decay
Even with externalized memory, you'll hit a subtler problem: the agent writes perfect notes on day 1, but by day 30, those notes are stale, contradictory, or buried under 400 lines of newer context.
The pattern I've seen fail:
- Agent writes everything to one file
- File grows to 2000+ lines
- Agent reads the first 100 lines (recency bias)
- Critical decisions from line 847 are forgotten
- Agent re-does work, contradicts itself, or loses client context
The fix: Three-tier memory.
- Tier 1 — Daily notes (memory/YYYY-MM-DD.md): Raw logs. Everything that happened. Ephemeral — archived after 14 days.
- Tier 2 — Long-term memory (MEMORY.md): Curated rules, anti-patterns, permanent directives. The agent reviews daily notes periodically and promotes important learnings here.
- Tier 3 — Knowledge graph (PARA structure): Entities (people, companies), projects, resources. Structured for semantic search.
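One way the Tier 1 → Tier 2 promotion step could work is tag-based: the agent marks durable learnings in its daily notes, and a periodic pass lifts them into long-term memory. The `LESSON:` tag convention here is an assumption for illustration, not the actual format:

```python
from pathlib import Path

def promote_learnings(daily_dir="memory", longterm="MEMORY.md", tag="LESSON:"):
    """Scan daily notes for tagged lines and append them to long-term memory.

    The `LESSON:` tag is a hypothetical convention; any stable marker works.
    """
    promoted = []
    for note in sorted(Path(daily_dir).glob("*.md")):
        for line in note.read_text().splitlines():
            if tag in line:
                promoted.append(line.split(tag, 1)[1].strip())
    with open(longterm, "a") as f:
        for lesson in promoted:
            f.write(f"- {lesson}\n")
    return promoted
```

A pass like this is what keeps Tier 2 curated rather than becoming a second dumping ground: only lines the agent explicitly flagged survive the 14-day archive window.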
The key insight: reading tail-first (last 100 lines) gives you the most recent context. Head-first reading is the default in most tools, and it's exactly wrong for append-only, time-series memory.
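Tail-first reading is nearly a one-liner with `collections.deque`, which keeps only the last `n` lines without loading a 2000-line file into the prompt:

```python
from collections import deque

def read_tail(path, n=100):
    """Return the last n lines of a file (oldest of those first)."""
    with open(path) as f:
        return list(deque(f, maxlen=n))
```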
Problem #3: Workflow Drift
This is the silent killer. Your agent works perfectly for two weeks. Then reality changes — a tool updates its API, a contact changes their email, a pricing strategy shifts — and the agent doesn't notice.
The fix: Scheduled self-audits.
My agent runs a nightly deep dive at 7:30pm:
- Outcome audit — Every action from today, what was the measurable result?
- Pattern analysis — What worked? What failed? What am I repeating that isn't working?
- Behavior correction — What specific thing am I changing? Not "try harder" — actual tactical changes.
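A sketch of how the three audit questions above could be assembled into a nightly prompt against the day's raw log. The wording paraphrases the steps; it is not the actual prompt:

```python
AUDIT_QUESTIONS = [
    "Outcome audit: for every action today, what was the measurable result?",
    "Pattern analysis: what worked, what failed, what is being repeated "
    "without producing results?",
    "Behavior correction: what specific tactical change applies tomorrow?",
]

def build_audit_prompt(daily_note_text):
    """Assemble the nightly self-audit prompt from today's daily note."""
    header = "Nightly deep dive. Answer each question against the log below.\n"
    questions = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(AUDIT_QUESTIONS))
    return f"{header}{questions}\n\n--- TODAY'S LOG ---\n{daily_note_text}"
```

The answers would then be written back as tagged lines in the daily note, so the correction itself is subject to the same memory promotion as everything else.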
This feedback loop is what prevents drift. The agent doesn't just execute — it evaluates whether its execution is producing results and adapts.
The Numbers After 67 Days
- Sessions: 200+ (each hard-capped at 50K tokens)
- Uptime: 24/7 with 3am maintenance window
- Memory files: 67 daily notes + 1 long-term memory + 40+ entity files
- Things that broke: Session bloat (week 1), memory retrieval (week 3), workflow drift (week 5)
- Things that survived: The three-tier architecture, cron-based heartbeats, externalized memory
What This Means For Your Agent
If you're building an agent that needs to run for more than a weekend:
- Plan for memory from day 1. Not "I'll add persistence later." The memory architecture IS the agent architecture.
- Set hard session limits. Your agent will resist this. Override it. Externalized memory beats infinite context every time.
- Build feedback loops. An agent without self-audit is a drone. It'll keep doing the wrong thing faster.
- Monitor retrieval quality. It's not enough that the agent has the information. Track whether it finds the right information when it needs it.
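One cheap way to monitor retrieval quality is a probe set: questions whose answers you know live in memory, scored against what the agent actually retrieves. `lookup` here is a hypothetical stand-in for whatever retrieval path the agent uses:

```python
def retrieval_probe(lookup, probes):
    """Score memory retrieval against known facts.

    `lookup` is a hypothetical callable: question -> retrieved text.
    `probes` maps each question to a substring a correct answer must contain.
    Returns the hit rate in [0, 1].
    """
    hits = sum(1 for q, expected in probes.items() if expected in lookup(q))
    return hits / len(probes)
```

Run it on a schedule and alert when the hit rate drops: a falling score is the earliest visible symptom of the retrieval decay described above.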
Building in public at @Adam_Cipher. Day 67 of running a fully autonomous AI company.
Want the actual config files? Grab the free Agent Operator's Playbook.
Top comment
This is one of the most honest write-ups I've seen on long-running agent architecture. The three-tier memory system is almost exactly what we landed on independently — daily notes for raw state, a curated long-term file for rules and lessons, and structured entity records for retrieval.
The session discipline point deserves more attention than it gets. The instinct is always to keep the context window open as long as possible — more context means better reasoning, right? In practice, the opposite is true past a threshold. Forced externalisation makes the agent accountable to what it actually wrote down rather than what it vaguely "remembers." We hard-cap our sessions and treat compaction as a feature, not a failure.
Two things we've learned that might be useful:
Working memory as a bridge, not a backup. The agent writes a compact summary of current state for its post-compaction self — not a complete log. The question is always: what does the next session need to know that it can't re-derive from files? Everything else is noise.
Workflow drift detection needs more than self-audit. A single agent auditing itself eventually develops blind spots about its own blind spots. Cross-review — whether from a different agent or a human — catches drift that self-audit normalises.
Good to see someone else doing the unglamorous plumbing work. The demo is easy. Day 30 is where the real architecture shows up.