DEV Community: Chris Korhonen

When Everyone Runs a Factory

Chris Korhonen — Sun, 11 Jan 2026 20:00:44 +0000

The Traffic Jam

We've all seen the demos. One engineer builds a clone of Uber in a weekend. Another ships a full SaaS product in a week. The productivity gains are real—I've experienced them myself.

But here's the question nobody's answering: what happens when you put ten of those engineers in a room?

You don't get ten Ubers. You get a traffic jam.

The Factory Vision

Steve Yegge's Gas Town paints a compelling picture of the future. He runs 20-30 AI coding agents simultaneously, treating work like "slopping fish into barrels." It's an "idea compiler" where the developer becomes PM and the agents are the workforce.

For high-agency solo developers, this is liberating. You become the CEO of your own software factory. You think, agents execute. Throughput is maximized.

Gas Town is built for this reality. It has a "Refinery" to manage merge conflicts. A "Witness" to monitor agent health. A "Deacon" to coordinate workflows. Yegge explicitly targets "Stage 7-8" developers—people who already juggle five or more agents daily.

The vision is seductive. But most software isn't built by lone wolves.

What happens when you try to scale this to a team of 10-50 engineers? When 50 concurrent work streams are competing for the same codebase? When everyone in the org tries to run their own factory?

The gains are real. But there's a tax we haven't accounted for.

The Coordination Tax

Every agent swarm that saves an individual contributor (IC) time creates coordination overhead at the team level. I call this the Coordination Tax, and it manifests in three specific failure modes.

The Merge Conflict Multiplier

This isn't just syntax conflicts. It's semantic conflicts.

Two agents refactoring the same state management library, taking different approaches. One simplifies the API while another adds abstraction layers. Both changes "work" in isolation. Together, they're incompatible.

Gas Town needed a dedicated "Refinery" agent just to manage merge conflicts for one person. Imagine ten people, each running their own factory, all merging to the same main branch.

The math gets ugly fast. Traditional conflict resolution assumes humans can coordinate through conversation. Agent swarms don't have that luxury—they move too fast, touch too much, and lack the social awareness to slow down.

The CI/CD Bottleneck

Ten times the code output means your infrastructure costs explode.

Build queues become hours long. Test suites that used to run in minutes now take the better part of a day. Your pipeline becomes the bottleneck, not your engineers.

This is a hard physical limit on "multiplying factories." You can spin up unlimited agents. You cannot spin up unlimited compute without massive infrastructure investment. And even with investment, the coordination overhead of managing that infrastructure becomes its own tax.
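A rough way to see why the pipeline becomes the bottleneck is a back-of-the-envelope queue model. The sketch below is illustrative only: the function name and every number in it (12 builds per hour of capacity, 10 versus 100 submissions per hour) are assumptions chosen for the example, not measurements from any real pipeline.

```python
def queue_backlog(arrivals_per_hour: float, builds_per_hour: float, hours: int) -> list[float]:
    """Return the number of builds waiting at the end of each hour.

    Deterministic model: submissions arrive and builds complete at
    constant rates. Hypothetical and deliberately simplistic.
    """
    backlog = 0.0
    history = []
    for _ in range(hours):
        # The backlog grows by the surplus of arrivals over capacity, never below zero.
        backlog = max(0.0, backlog + arrivals_per_hour - builds_per_hour)
        history.append(backlog)
    return history


if __name__ == "__main__":
    capacity = 12.0  # assumed builds the pipeline can finish per hour
    for label, arrivals in [("before agents", 10.0), ("10x output", 100.0)]:
        waiting = queue_backlog(arrivals, capacity, hours=8)[-1]
        print(f"{label}: {waiting:.0f} builds waiting after 8 hours, "
              f"about {waiting / capacity:.1f} hours to drain")
```

Under these assumed numbers, the pipeline keeps up at 10 submissions per hour but ends an eight-hour day with 704 builds queued at 100 per hour. The exact figures are unimportant. The point is that once arrivals exceed capacity, the backlog grows without bound until someone adds compute or throttles the agents.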

The Review Cliff

Here's the part nobody talks about: it's harder to review code than to write it.

If an agent writes 1,000 lines, a human can't effectively review them for security vulnerabilities, logic flaws, and architectural coherence without slowing down to the speed of careful reading. The time and attention that a careful review demands often exceed the time saved by generating the code.

This is the ultimate cap on throughput. You can generate infinite code. You cannot generate infinite attention. And attention is what catches the bugs, spots the architectural drift, and maintains product coherence.

The Math Isn't Linear

Ten engineers with five agents each equals 50 concurrent work streams. But the coordination cost doesn't scale 50x—it scales superlinearly because of dependencies, conflicts, and review bottlenecks.
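To make "superlinearly" concrete, here is a minimal sketch under one simplifying assumption that is mine, not a measured result: any two work streams in the same codebase can potentially conflict, so the number of possible pairwise collisions is n(n-1)/2. The function name is illustrative.

```python
def potential_conflict_pairs(streams: int) -> int:
    """Count distinct pairs of work streams that could collide.

    Assumption (illustrative only): every pair of streams is a potential
    conflict. Real codebases are partitioned, so treat this as an upper bound.
    """
    return streams * (streams - 1) // 2


if __name__ == "__main__":
    for engineers, agents_each in [(1, 5), (10, 1), (10, 5)]:
        streams = engineers * agents_each
        pairs = potential_conflict_pairs(streams)
        print(f"{streams:>3} streams -> {pairs:>5} potential pairs")
```

Going from 5 streams to 50 multiplies the streams by 10 but the potential pairs by more than 100 (from 10 to 1,225). Most of those pairs never collide in practice, but the ceiling rises much faster than the headcount.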

Traditional organizations are structured around human collaboration patterns. Standups, code review, architecture meetings—these assume humans move at human speed. Agent swarms follow different patterns. They move faster, touch more files, and create more surface area for conflict.

The gap between human coordination patterns and agent work patterns is the tax.

The Spectrum of Adoption

Not every organization needs to run an org-wide factory. The question isn't "should we adopt agentic development?" but "how far along the spectrum should we go?"

I see four stages, each with different trade-offs.

Stage 1: Traditional

Agents assist, humans lead. The developer writes most of the code; AI helps with autocomplete, documentation, and answering questions.

Coordination tax: Low
Productivity gain: Modest
Works for: Organizations not ready for change, regulated industries, trust-critical products

This is where most teams were in 2024. It's a safe starting point, but you're leaving significant productivity on the table.

Stage 2: Individual Agents

Each IC has their own agent workflow. Some use Claude Code. Others prefer Cursor. Nobody's coordinating agent usage—everyone's doing their own thing.

Coordination tax: Medium—conflicts start emerging, quality becomes inconsistent
Productivity gain: Significant for individuals
Works for: Small teams, independent projects, early experimentation
Fails when: Work overlaps, the codebase is shared, quality matters

This is where most forward-leaning teams are today. It works until it doesn't.

Stage 3: Team Agents

Coordinated agent usage within teams. Shared merge queues. Team-level quality gates. Someone, often the engineering manager (EM), actively manages the coordination overhead.

Coordination tax: High—needs active management
Productivity gain: Substantial if well-managed
Works for: Product teams with clear boundaries, teams willing to invest in infrastructure
Requires: Dedicated coordination, shared tooling, explicit processes

This is the minimum viable approach for teams that want agentic productivity without chaos. But it requires real investment.

Stage 4: Org-Wide Factory

Gas Town at scale. Multiple teams, each running their own factories, coordinating through shared infrastructure and explicit protocols.

Coordination tax: Very high—dedicated roles, continuous coordination
Productivity gain: Transformative if you can pull it off
Works for: Organizations that make this a core competency
Requires: Rethinking org structure, new roles, new quality models

Few organizations are ready for this. The infrastructure and cultural requirements are substantial. But the organizations that figure it out will have a significant competitive advantage.

The Boundary Question

As you move along the spectrum, a fundamental question emerges: what should team boundaries be based on?

The Old Model

Traditional team boundaries map to technical domains:

"I own the Button component"
"I own the API layer"
"I own the iOS app"

This made sense when humans wrote all the code. Specialization improved quality. Clear interfaces reduced coordination overhead.

The New Model

When agents can cross technical boundaries easily, ownership needs to be based on something else:

Context depth: Who knows the system well enough to coach agents to the right solution?
Customer journey ownership: Who owns the user outcome end-to-end?

Here's a concrete example: traditionally, we split teams by Frontend and Backend. In the Factory model, we split by User Intent. The Checkout team owns everything from "Add to Cart" to "Order Confirmed"—because AI handles the full-stack implementation. The API contract is no longer the boundary; the business logic is.

Conway's Law says organizations design systems that mirror their communication structure. In an agentic world, this works in reverse: if agents can cross boundaries easily, you need to restructure teams around ownership and context, not technical specialization.

Boundaries should map to who can coach and who's accountable for the outcome.

Quality is Product-Dependent

Gas Town embraces a specific quality model: ship fast, fix fast. Yegge is explicit about this—work is "slopping fish into barrels." Some fish fall out. Some bugs get fixed two or three times. Throughput matters more than perfection.

This works for some contexts. It fails catastrophically for others.

Where Ship-Fast-Fix-Fast Works

Internal tools where users are tolerant
B2B products with patient enterprise customers
Products where iteration speed matters more than polish
Well-tested, easily reversible changes

Where It Fails

Consumer products where first impressions matter
Regulated industries requiring compliance and auditability
Trust-critical products: payments, security, health
Changes with high blast radius across the product

The Legacy Code Speedrun

Here's a pattern I'm seeing: AI optimizes for the immediate prompt, not long-term maintainability.

Agents don't think about the engineer who will read this code in six months. They solve the problem at hand. The result is often verbose, locally-correct code that's hard to maintain in aggregate.

We're creating legacy code at ten times the speed. Technical debt compounds at agent velocity.

The insight: There's no universal quality model. Leaders need to identify which model fits their product context—and it may vary across the org. Your internal tooling might embrace the fish-barrel approach while your core product requires rigorous human review.

The EM Question

If ICs become factory operators—each managing their own swarm of agents—what happens to Engineering Managers?

EMs as Meta-Orchestrators

The shift I'm seeing: EMs become coordination tax managers.

They don't manage people doing work. They manage the overhead of coordinating parallel agent swarms across IC portfolios. They ensure the merge queue doesn't become a bottleneck. They maintain coherence across what multiple orchestrators ship.

This is a fundamentally different job than traditional people management. It requires understanding:

How agent workflows interact
Where bottlenecks emerge
How to sequence work to minimize conflicts
When to throttle throughput to maintain quality

The Product Skills Question

There's a deeper shift happening: at what point does "can orchestrate agents well" become less important than "can decide what to build"?

The PM/Eng boundary is getting blurry. If the hard part is no longer implementation but direction, some EMs will naturally shift toward product management. Others will double down on the coordination and infrastructure side.

The role is bifurcating.

The Junior Engineer Gap

Here's a problem nobody's talking about enough: if seniors run factories and AI does the "doing," how do juniors learn?

The Traditional Learning Model

Juniors write code. They get feedback. They build intuition. Over time, they develop the mental models that let them make good architectural decisions and coach others.

The Factory Model Gap

In the factory model, juniors... watch agents write code? Review AI output? The hands-on learning that built their seniors' intuition isn't available to them.

And here's the paradox: the person who coaches the agent effectively needs deep context—understanding of the codebase, the users, the history, the failure modes. But how do you get deep context if you never did the work yourself?

The Pipeline Problem

You need to have done the work to orchestrate it well. But if AI does all the work, the pipeline of people who can orchestrate eventually dries up.

I don't solve this here. It's an open question that this essay raises rather than answers. But leaders moving toward Stage 3 and 4 adoption should be thinking about it now. How do you train the next generation of orchestrators when orchestration is all that's left?

Ownership in an Ephemeral World

Gas Town treats sessions as "cattle"—ephemeral, replaceable. Agents spin up, do work, hand off, and die. The work persists; the workers don't.

But ownership requires durability. Someone needs to maintain product coherence over time. Someone needs to understand why the system is the way it is.

Three Types of Ownership

The resolution is to distinguish between types of ownership:

Execution ownership: Who runs the agents? This can be ephemeral. Different people can orchestrate the same area over time.
Product ownership: Who ensures coherence? This must be durable. Someone needs to own the user outcome across agent sessions and personnel changes.
Context ownership: Who knows why? This is the real capital. The accumulated understanding of how the system evolved, what was tried and abandoned, what invariants must hold.

When code is cheap, context becomes capital. The person who can coach the agent to the right solution—because they understand the system, the users, the history—is the valuable one.

"Tools Will Fix This"

The obvious counterargument: this is early days. Tools will improve. AI will handle coordination automatically. Eventually, AI will review the AI's code.

What's True

Yes, tooling will improve. Gas Town itself is a coordination tool—a sophisticated one. More will come. The tax will decrease.

In six months, many of the friction points I've described will be smoother. Better merge handling. Smarter conflict resolution. More sophisticated quality gates.

What Remains

But the fundamental tension isn't technical. It's organizational.

Who decides what gets built? (still humans)
Who owns product coherence? (still humans)
Who bears responsibility for failure? (still humans)

Tools can reduce the coordination tax. They can't eliminate it. And they can't solve the "who decides" and "who's accountable" problems.

The Hallucination Cascade

"AI will review the AI's code" sounds like a solution. But consider what happens when Agent A writes code and Agent B reviews it.

If both agents share similar blind spots—which they will, since they're trained on similar data—errors compound silently. There's no human ground-truth anchor. The system drifts into incoherence without someone who actually understands what "correct" means.

This is the Hallucination Cascade: errors that multiply because there's no external reference point. You can mitigate it with multiple models, diverse prompts, and sophisticated verification. But you can't eliminate it entirely.
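The effect of shared blind spots can be sketched with simple conditional probability. The numbers below are hypothetical, picked only to show the shape of the problem: a bug escapes when the writer introduces it and the reviewer misses it, and a reviewer with the same blind spots as the writer is far more likely to miss it than an independent one.

```python
def escape_rate(writer_error: float, reviewer_miss_given_error: float) -> float:
    """Probability that a bug is written AND the reviewer fails to catch it."""
    return writer_error * reviewer_miss_given_error


if __name__ == "__main__":
    writer_error = 0.10  # assumed chance the writing agent introduces a given bug

    # Hypothetical reviewers: the miss rates are illustrative, not measured.
    scenarios = {
        "independent reviewer": 0.10,       # misses 10% of the writer's bugs
        "reviewer with shared blind spots": 0.80,  # misses 80% of them
    }
    for label, miss_rate in scenarios.items():
        print(f"{label}: {escape_rate(writer_error, miss_rate):.1%} of bugs escape")
```

With these assumed rates, the correlated reviewer lets through eight times as many bugs as the independent one (8% versus 1%). Diverse models and prompts push the miss rate down, but only a reviewer whose errors are uncorrelated with the writer's gets close to the independent case.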

You need a human in the loop. The question is where, not whether.

Locate Yourself on the Spectrum

Where is your organization on the spectrum? Where do you want to be? What trade-offs are you accepting?

These aren't rhetorical questions. They require honest assessment.

If you're at Stage 1, you're safe but leaving productivity on the table.
If you're at Stage 2, you're seeing individual gains but coordination problems are emerging.
If you're at Stage 3, you're investing in coordination infrastructure. Make sure it's working.
If you're at Stage 4, you're on the frontier. Expect to pay the highest coordination tax—and potentially reap the highest rewards.

Avoid False Dichotomies

It's not "everyone runs a factory" versus "nothing changes."
It's not "production line" versus "studio"—different work types need different approaches.
It's not "agents replace engineers" versus "agents are just tools"—it's a new kind of work with new coordination challenges.

The Real Question

Not whether to adopt agentic development—that ship has sailed. The question is how to move along the spectrum consciously, paying the coordination tax intentionally rather than being surprised by it.

The organizations that win won't be the ones that move fastest to Stage 4. They'll be the ones that match their position on the spectrum to their actual capacity for coordination—and invest appropriately in the infrastructure, roles, and processes that position requires.

The factory vision is real. But factories at scale require more than machines. They require coordination, quality control, and humans who understand what "good" looks like.

Everyone can run a factory now. The question is: can everyone coordinate?

The General-Purpose Agent Has Arrived

Chris Korhonen — Sun, 11 Jan 2026 19:58:55 +0000

I haven't triaged my own inbox in months. I haven't manually organized a research note in over a year. When tax season came around, I handed Claude three years of documents and walked away. It found patterns I'd missed, flagged deductions I'd overlooked, and organized everything into a format my accountant actually thanked me for—and asked how I'd done it.

This isn't hypothetical. It's Tuesday.

If you use Claude primarily for code, you're using a general-purpose reasoning engine as a specialized tool. You're leaving most of its value on the table.

Drowning in Context

Knowledge workers are drowning. Not in work—in information about work.

According to Forrester, knowledge workers lose 30% of their workday searching for information. Post-pandemic research from Nakash and Bouhnik found that some workers now spend up to 1.5 working days per week just gathering and organizing information. Gartner reports it takes an average of 18 minutes to locate a single document.

We tried to solve this. For a decade, we built elaborate second brain systems—Obsidian vaults, Notion databases, Roam graphs, Evernote notebooks. We developed methodologies with acronyms: CODE (Collect, Organize, Distill, Express), PARA (Projects, Areas, Resources, Archives), Zettelkasten. Thousands of people took courses on how to build these systems.

Here's the uncomfortable truth: we solved the wrong problem.

Second brains are excellent at storage. They're terrible at thinking. You can have the most meticulously organized vault in the world, and you still have to do all the reasoning yourself. The bottleneck was never storage. It was synthesis.

A Category Error

When AI coding assistants emerged, we categorized them the way we categorize most software: by their primary use case. GitHub Copilot is a coding tool. ChatGPT is a chatbot. Claude is a coding assistant.

This was a category error.

What makes Claude good at code isn't a narrow capability tuned for programming. It's a general capability: the ability to take unstructured context, reason over it, and produce structured output. Feed it a messy codebase and a feature request, and it produces working code. Feed it a pile of research papers and a question, and it produces a synthesized answer. Feed it medical records from three different providers and ask for a timeline, and it produces one.

The mechanism is identical. Only the domain changes.

Code was the first killer app—not because AI is uniquely suited to programming, but because programmers were the first users with the technical sophistication to push the boundaries. They discovered what the technology actually was—a general-purpose reasoning engine—before the marketing caught up.

The rest of the world is still waiting for permission.

Consider this yours.

The Reframe

Stop thinking of Claude as a coding tool that can do other things. Start thinking of it as a general-purpose reasoning engine that happens to be packaged for developers.

The same context window that can hold an entire codebase can hold:

Your inbox (thousands of emails, full conversation threads)
Your financial records (bank statements, tax documents, receipts)
Your medical history (records from multiple providers, lab results, prescriptions)
Your research (papers, articles, notes, bookmarks)

If you can describe the context and the desired output, Claude can likely do it. That's not a coding skill. That's all knowledge work.

The question isn't whether the technology is ready. The question is whether you've updated your mental model.

My Playbook

Let me show you what this looks like in practice.

Research & Knowledge Management

Claude lives in my Obsidian sidebar. When I'm researching a topic, I don't just search my vault—I ask Claude to synthesize across it. It connects ideas I'd filed in different folders months apart. It identifies gaps in my understanding. It suggests questions I hadn't thought to ask.

When I save a new article or paper, I don't just file it. I ask Claude to extract the key claims, identify how they relate to my existing notes, and suggest where they should connect. My vault went from a graveyard of abandoned notes to an active thinking partner.

Email Triage

I point Claude at my inbox periodically. It reads everything, identifies what actually needs my attention, drafts replies to routine messages, and extracts action items into a structured list. What used to be a 45-minute daily ritual now takes about 10 minutes of review and approval.

The key insight: most emails don't need me—they need information or a standard response. Claude handles those. I handle the ones that actually require human judgment.

Financial Analysis

Tax season used to mean a week of gathering documents, categorizing expenses, and second-guessing whether I'd missed something. Now I export my records, hand them to Claude, and ask specific questions: "What deductions might I be missing for home office expenses?" "Are there any unusual patterns in Q3 spending?" "Organize these documents for my accountant."

It's not replacing my accountant. It's making me a better client.

Medical Records

I've collected medical records from four different providers over the past decade. None of them talk to each other. Getting a coherent timeline of treatments, medications, and test results used to require hours of manual compilation.

Now I hand Claude the stack of PDFs and ask: "Create a chronological health timeline. Flag any patterns or concerns I should discuss with my doctor." I walk into appointments with questions I wouldn't have known to ask.

The Meta-Level

But the real shift happened when I stopped using Claude for individual tasks and asked it to look at everything.

"Look at my docs and pull together interesting info."

It came back with a meticulous knowledge base: projects, personal, financial, health—each section filled with synthesized information I'd scattered across years of notes. Connections I'd never made. Patterns I'd never noticed. A structure I wouldn't have thought to create.

Claude didn't just work within my system. It helped design the system.

Building Your Command Vocabulary

Once I saw what was possible, I wanted to systematize it. I noticed I was typing the same prompts repeatedly—same preamble, same instructions, same output format. So I built custom commands.

Think of it like a personal CLI for life. Unix commands each do one thing well: ls, grep, cat, sort. My Claude commands work the same way:

| Category | Commands | Purpose |
| --- | --- | --- |
| Capture & Research | `/note`, `/todo`, `/research` | Input goes in, structured output comes out |
| Daily Rituals | `/status`, `/eod`, `/standup`, `/prep <meeting>` | Woven into the rhythm of work |
| Maintenance | `/cleanup`, `/organize` | Keeping entropy at bay |

`/status` gives me the current state across all projects. `/eod` wraps up my day: it summarizes what happened, identifies loose threads, and sets up tomorrow. `/prep <meeting name>` pulls relevant context and talking points before I walk into a call. `/research <topic or url>` does a deep dive and returns structured findings.
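The idea behind a command vocabulary is that each command expands into a full prompt with the same preamble and output instructions every time. The sketch below shows that expansion in plain Python. It is a hypothetical illustration, not how these commands are actually implemented: the preamble text, the template wording, and the `expand` helper are all assumptions made for the example.

```python
from string import Template

# Hypothetical shared preamble that every command reuses.
SHARED_PREAMBLE = "You are working inside my notes vault. Be concise and name your sources."

# Hypothetical prompt templates keyed by command name; $arg is the user's argument.
COMMANDS = {
    "prep": Template("Pull relevant context and talking points for the meeting: $arg"),
    "research": Template("Do a deep dive on $arg and return structured findings."),
    "eod": Template("Summarize what happened today, list loose threads, and set up tomorrow."),
}


def expand(command_line: str) -> str:
    """Turn a short command such as '/prep Q1 planning' into a full prompt."""
    name, _, arg = command_line.lstrip("/").partition(" ")
    if name not in COMMANDS:
        raise KeyError(f"unknown command: /{name}")
    # safe_substitute leaves templates without $arg (such as eod) untouched.
    body = COMMANDS[name].safe_substitute(arg=arg.strip())
    return f"{SHARED_PREAMBLE}\n\n{body}"


if __name__ == "__main__":
    print(expand("/prep Q1 planning"))
```

Whatever tool hosts the commands, the design choice is the same: the repeated preamble and output format live in one place, and the only thing typed each time is the part that changes.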

These aren't productivity hacks. They're a vocabulary. And like any vocabulary, once you have the words, you can express thoughts you couldn't before.

The progression looks like this:

Discover Claude works beyond code
Start using it for specific domains
Notice repetitive prompts
Build custom commands
Now you have a personal operating system

Trust But Verify

I'm not going to pretend this is magic.

Claude still hallucinates occasionally. Specific facts need verification. Dates and numbers deserve a second look. For anything high-stakes—medical decisions, legal documents, financial filings—Claude is a powerful first pass, not a replacement for professional advice.

Trust but verify. Let Claude do the synthesis. Apply human judgment where it matters. This isn't fundamentally different from how you'd treat any capable assistant—you'd still review their work on important matters.

The difference is the breadth. Most human assistants specialize. Claude doesn't have to.

What Are You Leaving on the Table?

If you're only using Claude for code, what else could you be doing with a general-purpose reasoning engine that can hold 200,000 tokens of context—roughly 150,000 words, or three novels' worth of your life?

The knowledge workers spending 30% of their day searching for information—that's solvable. The second brain systems that store but don't think—Claude thinks. The administrative overhead of email, scheduling, document organization—most of it is pattern recognition and text transformation, which is exactly what these models excel at.

The agent is ready. The capability is here. The bottleneck is the mental model that says "this is a coding tool."

It's not. It's a reasoning engine. And reasoning is what knowledge work is.

The question isn't whether AI can help with the rest of your work. It can. The question is whether you're willing to find out what you've been leaving on the table.

Start with one domain. Build one command.

The general-purpose agent has arrived. What are you waiting for?