<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ryan Eade</title>
    <description>The latest articles on DEV Community by Ryan Eade (@ryan_eade).</description>
    <link>https://dev.to/ryan_eade</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3859956%2Fcc69258b-e4f5-4833-820e-defa75698538.png</url>
      <title>DEV Community: Ryan Eade</title>
      <link>https://dev.to/ryan_eade</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ryan_eade"/>
    <language>en</language>
    <item>
      <title>My Path to the Dark Factory: How I Ship Software Without Writing Code</title>
      <dc:creator>Ryan Eade</dc:creator>
      <pubDate>Wed, 08 Apr 2026 18:03:11 +0000</pubDate>
      <link>https://dev.to/ryan_eade/my-path-to-the-dark-factory-how-i-ship-software-without-writing-code-2l9k</link>
      <guid>https://dev.to/ryan_eade/my-path-to-the-dark-factory-how-i-ship-software-without-writing-code-2l9k</guid>
      <description>&lt;p&gt;Earlier this week I posted about Cursor 3 on LinkedIn and why it does not solve the problem I am working on. I wanted to write up a deeper piece on what the problem actually is and what I am doing instead.&lt;/p&gt;

&lt;p&gt;This is that post.&lt;/p&gt;

&lt;h2&gt;The Five Levels (and Where Most People Are Stuck)&lt;/h2&gt;

&lt;p&gt;In January, Dan Shapiro published a framework I have not been able to stop thinking about: five levels of AI-assisted software development, from “AI as a smarter autocomplete” to what he calls the dark factory, borrowing from manufacturing. FANUC ran a factory in the 1980s where robots built robots for weeks at a time with no humans present. Lights off. No oversight. Just machines doing work.&lt;/p&gt;

&lt;p&gt;Shapiro maps this to software:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclkqyysrntx6m1lb1t7n.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclkqyysrntx6m1lb1t7n.webp" alt=" " width="800" height="693"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Level 0: You write all the code.&lt;/p&gt;

&lt;p&gt;Level 1: You delegate discrete tasks to AI.&lt;/p&gt;

&lt;p&gt;Level 2: You pair program in real time alongside AI. (This is where 90% of self-described AI-native developers are.)&lt;/p&gt;

&lt;p&gt;Level 3: AI writes code, you review diffs and approve PRs.&lt;/p&gt;

&lt;p&gt;Level 4: You write detailed specs, AI builds, you validate outcomes.&lt;/p&gt;

&lt;p&gt;Level 5: Specs in, software out. Humans define what and why. The how is fully autonomous.&lt;/p&gt;

&lt;p&gt;Level 5 is the dark factory. It sounds futuristic. It is not. StrongDM has been operating at Level 5 with three engineers since mid-2025. OpenAI built a million-line product in five months with three engineers and no manually written code. Spotify is merging 650 AI-generated pull requests per month. Engineers there have not written a single line of code since December.&lt;/p&gt;

&lt;p&gt;I have been working my way toward Level 4 and 5 for the last several months. This is what that looks like in practice.&lt;/p&gt;

&lt;h2&gt;Why the IDE Is the Wrong Center of Gravity&lt;/h2&gt;

&lt;p&gt;Most AI coding tools, including the ones I used to use, are organized around the IDE. You open the editor, you prompt, you review. The agent is a very good collaborator inside your coding session.&lt;/p&gt;

&lt;p&gt;The dark factory does not work that way.&lt;/p&gt;

&lt;p&gt;At Level 4 and 5, you are not in the code. You are above it. BCG Platinion, in their analysis of organizations running dark factory programs, identified two competencies that separate teams who make it work from teams who do not:&lt;/p&gt;

&lt;p&gt;Harness engineering: designing, building, and refining the factory itself. Choosing agents, configuring pipelines, managing orchestration.&lt;/p&gt;

&lt;p&gt;Intent thinking: translating business needs into precise, testable descriptions of desired outcomes. The quality of everything the factory produces is determined entirely by the quality of the specifications going in.&lt;/p&gt;

&lt;p&gt;Both of these happen above the IDE. Neither requires touching code. The bottleneck at Level 4-5 is not the coding environment. It is the coordination layer above it, where intent becomes specs, specs become tasks, tasks get assigned to agents, and outcomes get validated against what was actually intended.&lt;/p&gt;

&lt;p&gt;That is the problem I am building toward.&lt;/p&gt;

&lt;h2&gt;The Methodology&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq734yhmja6qlc3aww4x4.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq734yhmja6qlc3aww4x4.webp" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the stack I actually use.&lt;/p&gt;

&lt;p&gt;Strategy Document. Before anything else, I write a high-level strategy document for the product or feature set. What problem does this solve? For whom? What does success look like in six months? What does it explicitly not do? This is where decisions get made while they are still cheap.&lt;/p&gt;

&lt;p&gt;Phases with Defined Value. The strategy breaks into phases. Each phase has a specific, articulable value proposition. Not “add feature X” but “after this phase, users can do Y, which they could not do before.” If I cannot state the user value clearly, the phase is not ready to build.&lt;/p&gt;

&lt;p&gt;Sprints. Phases break into sprints. Roughly one- to two-week blocks that deliver something coherent. Not just tasks completed. Something you can point to.&lt;/p&gt;

&lt;p&gt;Tasks. Sprints break into individual tasks. Small enough to be completed in a single focused agent session, specific enough to be unambiguous. Every task traces back to the sprint it belongs to, the phase that sprint serves, and the strategy driving the phase.&lt;/p&gt;
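&lt;p&gt;The hierarchy above (strategy, phases, sprints, tasks) can be sketched as plain data structures. This is a hypothetical Python sketch for illustration only; the names and fields are my assumptions, not taken from any real tool.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the strategy -> phase -> sprint -> task hierarchy.
# All names and fields are illustrative assumptions.

@dataclass
class Task:
    title: str
    spec: str          # the detailed per-task specification
    sprint_goal: str   # traceability: the sprint goal this task serves

@dataclass
class Sprint:
    goal: str
    tasks: list[Task] = field(default_factory=list)

@dataclass
class Phase:
    value_proposition: str   # "after this phase, users can do Y"
    sprints: list[Sprint] = field(default_factory=list)

@dataclass
class Strategy:
    problem: str
    success_criteria: str
    non_goals: str
    phases: list[Phase] = field(default_factory=list)
```

&lt;p&gt;The point of the structure is traceability: every task carries a pointer back up the chain, so nothing gets built without a reason attached.&lt;/p&gt;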

&lt;p&gt;Per-Task Specs. This is where the methodology diverges from what most people do. For each task, I write a detailed specification. Not just what to build. What does this accomplish? How does it connect to the sprint goal? How does it serve the phase? What are the acceptance criteria? What are the edge cases? What should this explicitly not touch?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29d4u0i8vhia0vkocpus.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29d4u0i8vhia0vkocpus.webp" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The spec is a document. It is reviewable, changeable, referenced throughout the build.&lt;/p&gt;

&lt;p&gt;Adversarial Spec Review. Before any code gets written, I run the spec through adversarial review cycles. A separate agent, specifically prompted to find problems rather than validate, reads the spec against the task, sprint, phase, and strategy and asks hard questions.&lt;/p&gt;

&lt;p&gt;This is the part most people skip. It is the most valuable part.&lt;/p&gt;

&lt;p&gt;The adversarial reviewer is not trying to be helpful in the conventional sense. It is looking for ambiguity, missing edge cases, unstated assumptions, scope creep, and misalignment with higher-level goals. The spec goes back and forth until it is tight. Typically two to four rounds.&lt;/p&gt;
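&lt;p&gt;The review cycle above can be sketched as a bounded loop. A minimal Python sketch, with hypothetical &lt;code&gt;review_fn&lt;/code&gt; and &lt;code&gt;revise_fn&lt;/code&gt; callables standing in for agent calls:&lt;/p&gt;

```python
def adversarial_review(spec, review_fn, revise_fn, max_rounds=4):
    """Run a spec through adversarial review until no objections remain.

    review_fn(spec) returns a list of objections; an empty list means approved.
    revise_fn(spec, objections) returns a tightened spec.
    Both are hypothetical stand-ins for agent calls in a real pipeline.
    """
    for round_num in range(max_rounds):
        objections = review_fn(spec)
        if not objections:
            return spec, round_num   # converged: spec is tight
        spec = revise_fn(spec, objections)
    return spec, max_rounds          # hit the round cap without converging
```

&lt;p&gt;In a real pipeline both callables would be prompts to separate agents; the bounded round count keeps the loop from cycling forever on a spec that cannot converge.&lt;/p&gt;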

&lt;p&gt;Build. Only now does a coding agent touch the spec. Because the spec is clear, the agent executes with high fidelity. It knows exactly what to build, why it matters, and what it should not touch.&lt;/p&gt;

&lt;p&gt;Adversarial Code Review. Same idea applied to the output. A review agent reads the code against the spec. Not to validate style. Did this implementation accomplish what the spec required? Did it introduce anything the spec prohibited? Does it create issues at the sprint or phase level?&lt;/p&gt;

&lt;p&gt;This runs until the reviewer is satisfied.&lt;/p&gt;

&lt;p&gt;Staging and UAT. The build goes to staging. User acceptance testing happens sometimes manually, sometimes with a QA agent running structured test cases. If something is off, the coding agent gets another round with specific, scoped feedback.&lt;/p&gt;

&lt;p&gt;Production and Documentation. When it ships, the coding agent generates a summary of what was built, what changed, and how to use the new features. That summary lives in the task record. It is the handoff document for everything that builds on it.&lt;/p&gt;

&lt;h2&gt;What This Actually Changes&lt;/h2&gt;

&lt;p&gt;Most people use AI coding tools as a faster keyboard. You type less, the agent types more, you still own the entire judgment layer.&lt;/p&gt;

&lt;p&gt;What I have described is different. The agents own significant judgment, especially in the review cycles. They are not autocompleting my thoughts. They are checking my work, finding blind spots, and pushing back on ambiguity before it becomes a bug.&lt;/p&gt;

&lt;p&gt;The result is that I ship features with higher confidence, fewer rework cycles, and a paper trail that makes debugging tractable. When something goes wrong, I can trace back through the spec, the review comments, and the implementation decisions to find exactly where the error was introduced.&lt;/p&gt;

&lt;p&gt;That traceability is not just useful for debugging. It is useful for the next build. The system learns from its own history in a structured way.&lt;/p&gt;

&lt;p&gt;StrongDM benchmarks their dark factory by compute cost: if you have not spent at least one thousand dollars on tokens per engineer per day, your factory has room to improve. I am not at that level yet. I am somewhere between Level 4 and Level 5, which is a useful place to be. Far enough along to have validated the methodology. Close enough to the edge to still be finding the limits.&lt;/p&gt;

&lt;h2&gt;The Tool the Methodology Requires&lt;/h2&gt;

&lt;p&gt;When I started building this way, I ran into a problem: none of the tools I used were designed for it.&lt;/p&gt;

&lt;p&gt;Linear and Jira are built for humans managing humans. GitHub Issues is code-centric, with no spec management and no way to give agents structured context. Notion is a docs tool, not an agent-aware coordination layer.&lt;/p&gt;

&lt;p&gt;What the dark factory methodology actually needs is a place where strategy documents live next to sprint plans, specs live next to tasks, agents can pull context without manual handoff, and every level of the hierarchy is connected. The IDE is one slot in that stack. It needs to be able to read from a control plane that understands where the work fits.&lt;/p&gt;

&lt;p&gt;I am building that control plane. It is the tool I needed and could not find. I will share more about it as it gets closer to ready.&lt;/p&gt;

&lt;h2&gt;Where This Is Headed&lt;/h2&gt;

&lt;p&gt;The dark factory is not the right metaphor for most people yet. Most teams are at Level 2. Cursor 3 is a genuinely good Level 2 tool, and that is a real and valuable thing.&lt;/p&gt;

&lt;p&gt;But the trajectory is clear. Every few months, the ceiling on what agents can do reliably moves up. The teams that will be ready for Level 5 are the ones investing now in the two competencies that matter: harness engineering and intent thinking.&lt;/p&gt;

&lt;p&gt;The spec is the system. The code is just what comes out the other end.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>productivity</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>Claude Max Blocks OpenClaw. Now What?</title>
      <dc:creator>Ryan Eade</dc:creator>
      <pubDate>Sun, 05 Apr 2026 22:27:05 +0000</pubDate>
      <link>https://dev.to/ryan_eade/claude-max-blocks-openclaw-now-what-8p7</link>
      <guid>https://dev.to/ryan_eade/claude-max-blocks-openclaw-now-what-8p7</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://focusoverfeatures.substack.com/p/claude-max-blocks-openclaw-now-what" rel="noopener noreferrer"&gt;Focus Over Features&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yesterday I got an email from Anthropic that changed how my entire AI setup works.&lt;/p&gt;

&lt;p&gt;Starting April 4 at 12pm PT, Claude Pro and Max subscriptions no longer cover third-party tools like OpenClaw. If you are running agents through OpenClaw powered by your Claude subscription, that stops working today.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbt4mtthqlzwd1hx3ezpx.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbt4mtthqlzwd1hx3ezpx.webp" alt="Screenshot of the original email from Anthropic Re: The Max Billing Change" width="800" height="964"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have been running multiple AI agents on OpenClaw since January. They manage my task boards, draft content, triage email, log meals, monitor deployments, and work autonomously while I sleep. All of them were powered by my Claude Max subscription. So this is not an abstract policy change for me. It hit my actual workflow.&lt;/p&gt;

&lt;p&gt;The truth is, I am not surprised, and I do not think you should be either.&lt;/p&gt;

&lt;h2&gt;The Cloud Computing Parallel&lt;/h2&gt;

&lt;p&gt;I have been thinking about this through the lens of cloud computing, because the pattern is almost identical.&lt;/p&gt;

&lt;p&gt;Remember when AWS was so cheap that nobody could justify running their own servers? The whole pitch was: stop worrying about infrastructure, just build. It worked. Everyone moved to the cloud. And then, gradually, prices went up once hosting remotely became the norm and everyone was dependent.&lt;/p&gt;

&lt;p&gt;AI inference is following the same arc, just at 100x the speed. A few years ago, API-based inference was a dream, and what existed was of pretty poor quality. Now we depend on it for everything we build. The subsidized pricing that got us hooked was never the long-term business model. It was the adoption play.&lt;/p&gt;

&lt;p&gt;Claude Max at $200/month for unlimited agent usage was one of the best deals in tech while it lasted. My estimated usage costs were way above what I was paying. Anthropic was eating that difference on every power user. Something had to give.&lt;/p&gt;

&lt;h2&gt;What Actually Changed (and What Did Not)&lt;/h2&gt;

&lt;p&gt;Let me be precise because the headlines are more dramatic than the technical reality.&lt;/p&gt;

&lt;p&gt;Anthropic did not ban OpenClaw. They did not block Claude models from working with third-party tools. What they did is decouple third-party tool usage from the subscription billing.&lt;/p&gt;

&lt;p&gt;Previously: you paid $100-200/month for Claude Max, and that covered everything. Claude.ai, Claude Code, Claude Cowork, and any third-party tool that authenticated through your Claude account, including OpenClaw.&lt;/p&gt;

&lt;p&gt;Now: your subscription covers Anthropic's first-party products only. Third-party tools require either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extra Usage bundles&lt;/strong&gt; — pay-as-you-go billing tied to your Claude account. Pre-purchased bundles get up to a 30% discount.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API keys&lt;/strong&gt; — standard Anthropic developer API pricing. Pay per input/output token.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They are also offering a one-time credit equal to your monthly plan price, redeemable until April 17.&lt;/p&gt;

&lt;p&gt;Boris Cherny, Head of Claude Code at Anthropic, explained on X that their first-party tools are optimized for prompt cache hit rates. When the same system prompt or context gets sent repeatedly, Anthropic caches it and serves subsequent requests cheaper. Claude Code and Cowork are built to maximize these cache hits. OpenClaw and other third-party tools structure prompts differently. They do not hit the same cache patterns. Every OpenClaw session costs Anthropic significantly more compute than an equivalent Claude Code session.&lt;/p&gt;

&lt;p&gt;I buy this argument. Prompt caching is real and the cost difference is substantial. Cherny even submitted PRs to improve cache hit rates in OpenClaw itself, which is a genuine good-faith effort.&lt;/p&gt;

&lt;p&gt;Now, it is also true that Anthropic just shipped Channels, Dispatch, scheduled tasks, and Computer Use — features that look a lot like what made OpenClaw popular. And the creator of OpenClaw was recently hired by OpenAI. And the timing of all of this just happens to line up perfectly.&lt;/p&gt;

&lt;p&gt;But surely that is all just a coincidence, right? Right.&lt;/p&gt;

&lt;p&gt;Anyway.&lt;/p&gt;

&lt;h2&gt;What Does This Actually Cost?&lt;/h2&gt;

&lt;p&gt;This is the question everyone is asking and most coverage is not answering with real numbers.&lt;/p&gt;

&lt;p&gt;One estimate floating around is that a single OpenClaw agent running for a full day could cost $1,000-5,000 in API charges. That is misleading for most users but not wrong for heavy autonomous setups.&lt;/p&gt;

&lt;p&gt;Here is a more realistic breakdown:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Light usage&lt;/strong&gt; (personal assistant, few interactions per day): $5-15/day. Manageable. Roughly $150-450/month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medium usage&lt;/strong&gt; (coding agent with heartbeats, email checks, task management): $20-50/day. That is $600-1,500/month. Starting to add up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Heavy usage&lt;/strong&gt; (multiple agents, autonomous coding, 24/7 operation): $50-200+/day depending on model choice and context window usage. This is where Max was an incredible deal.&lt;/p&gt;
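&lt;p&gt;The back-of-envelope math behind ranges like these is simple. Here is a hedged Python sketch; the token counts and per-million prices are illustrative placeholders, not Anthropic's actual rate card, so check current pricing before relying on any number.&lt;/p&gt;

```python
def daily_api_cost(input_tokens, output_tokens,
                   usd_per_m_input, usd_per_m_output):
    """Rough daily API cost from token counts and per-million-token prices."""
    return ((input_tokens / 1e6) * usd_per_m_input
            + (output_tokens / 1e6) * usd_per_m_output)

# Illustrative only: an agent consuming 2M input and 200K output tokens a day
# at assumed prices of $15 per million input and $75 per million output.
cost = daily_api_cost(2_000_000, 200_000, 15.0, 75.0)  # $45/day
```

&lt;p&gt;At those illustrative numbers a single agent lands around $1,350/month, squarely in the medium band above. Note that input tokens dominate for chatty agents with large contexts, which is why heartbeat frequency and context size matter so much.&lt;/p&gt;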

&lt;p&gt;I will share my exact new numbers post-change once I have a week of API billing data. Real numbers comparing before and after, not estimates.&lt;/p&gt;

&lt;h2&gt;How I Am Adapting&lt;/h2&gt;

&lt;p&gt;Here is how I am thinking about this: the super agent model is dead. What started as mega agents slowly got worse at things, so we optimized, breaking them into smaller agents with narrower scopes. This pricing change accelerates that trend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go multi-model.&lt;/strong&gt; OpenClaw is model-agnostic. You do not have to run everything on Claude. For my setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deep reasoning and complex coding: Claude Opus (worth the per-token cost for the hard stuff)&lt;/li&gt;
&lt;li&gt;Quick tasks, email triage, meal logging: Gemini Flash (fast, cheap, good enough)&lt;/li&gt;
&lt;li&gt;Code review: A second opinion model&lt;/li&gt;
&lt;li&gt;Fallback: When one provider has an outage, route to another&lt;/li&gt;
&lt;/ul&gt;
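&lt;p&gt;That routing policy can be expressed as a small lookup table with fallbacks. A sketch with made-up model identifiers; nothing here is a real OpenClaw configuration format.&lt;/p&gt;

```python
# Illustrative task-to-model routing table. The model names and the
# fallback chain are assumptions, not any real provider's identifiers.
ROUTES = {
    "deep_reasoning": ["claude-opus", "gpt-5"],
    "email_triage":   ["gemini-flash", "claude-haiku"],
    "code_review":    ["gpt-5", "claude-opus"],
}
DEFAULT_CHAIN = ["gemini-flash", "claude-haiku"]

def pick_model(task_type, unavailable=()):
    """Return the first available model for a task, falling back on outages."""
    for model in ROUTES.get(task_type, DEFAULT_CHAIN):
        if model not in unavailable:
            return model
    raise RuntimeError("no provider available for " + task_type)
```

&lt;p&gt;The fallback chain is what makes outages survivable: the router degrades to the next provider instead of stalling the agent.&lt;/p&gt;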

&lt;p&gt;The agents burning the most tokens are usually doing routine work. Move those to cheaper models and keep Claude for where it genuinely outperforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consider local models.&lt;/strong&gt; One commenter made a great point about picking up a Mac Studio to run local LLMs, using Opus for coordination and local models for execution. As open-source model weights get better and more efficient, that hybrid approach gets more attractive by the month. A Mac Studio running local weights for routine agent tasks while Opus handles the heavy reasoning might be the sweet spot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Switch to API keys.&lt;/strong&gt; If you already have an Anthropic API account, this is the cleanest path. Generate an API key, add it to your OpenClaw config, done. Clean separation, predictable per-token pricing, full model access. You pay for what you use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trim your context.&lt;/strong&gt; A lot of OpenClaw setups load more context than they need on every heartbeat cycle. Trim your system prompts. Reduce heartbeat frequency for agents that do not need constant check-ins. Be intentional about what goes into the context window.&lt;/p&gt;
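&lt;p&gt;Trimming can be as blunt as a token budget applied from the most recent message backwards. A sketch; the character-count stand-in for a tokenizer is an assumption and should be swapped for the provider's real token counter.&lt;/p&gt;

```python
def trim_context(messages, token_budget, count_tokens=len):
    """Keep the most recent messages that fit within token_budget.

    count_tokens defaults to len (character count) as a crude stand-in
    for a real tokenizer; replace it with the provider's token counter.
    """
    kept = []
    total = 0
    for msg in reversed(messages):       # newest first
        cost = count_tokens(msg)
        if total + cost > token_budget:
            break                        # budget exhausted; drop older history
        kept.append(msg)
        total += cost
    return list(reversed(kept))          # restore chronological order
```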

&lt;h2&gt;The Mindset Shift That Actually Matters&lt;/h2&gt;

&lt;p&gt;We used to measure feature cost in engineering effort and time. Then agentic coding came along and we pretended it was nearly free. Build everything, see what sticks.&lt;/p&gt;

&lt;p&gt;Now we need to measure twice, cut once, and still move just as fast. If you are building a new feature with AI agents, you need to understand the token cost. That is your new cloud bill.&lt;/p&gt;

&lt;p&gt;This is not just an OpenClaw story. This is about how every AI provider will handle the tension between platform openness and compute economics as agents get more autonomous. Every AI company is going to face this. They all sell subscriptions priced for human usage patterns. Humans sleep. Humans take breaks. Humans do not send 500 API calls per hour, 24 hours a day.&lt;/p&gt;

&lt;p&gt;Agents do.&lt;/p&gt;

&lt;p&gt;The subscription model and the autonomous agent model are fundamentally incompatible. Something had to give. Anthropic was the first to blink.&lt;/p&gt;

&lt;p&gt;But here is what I keep coming back to: the people most affected by this change are the ones getting the most value from AI. The power users, the builders, the ones who figured out how to make agents do real work. Within hours of the announcement, I had my agents planning and building an MCP integration for LaunchPad so my task management workflow can connect directly into Claude Code without depending on the subscription authentication path.&lt;/p&gt;

&lt;p&gt;When the platform shifts, you build around it.&lt;/p&gt;

&lt;p&gt;The genie is not going back in the bottle. The billing model just changed. And the builders who stay dynamic, who keep thinking and optimizing instead of expecting set it and forget it, are the ones who will stay ahead.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I have been writing about running AI agents in production for months. If you are navigating this change, &lt;a href="https://focusoverfeatures.substack.com" rel="noopener noreferrer"&gt;subscribe to Focus Over Features&lt;/a&gt; — I will be sharing the real cost numbers once I have a week of data.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>productivity</category>
      <category>cloudcomputing</category>
    </item>
  </channel>
</rss>
