DEV Community: Alexander van Rossum

The Bus Factor Is Not a Number

Alexander van Rossum — Thu, 14 May 2026 13:38:33 +0000

Imagine you got hit by a bus tomorrow. What happens at work?

Not your team — they'd be fine, eventually, the way teams always are. What happens to the twelve things only you know how to do? The service that needs restarting in a specific order. The vendor relationship that lives in your inbox. The legacy codebase nobody else has touched since 2019. The deployment process that's documented in your head and nowhere else.

If the honest answer is "several things would quietly stop working and nobody would know how to fix them for weeks," then your organization has a bus factor of one, and you're standing in front of the bus.

Nobody formally assigns you twelve jobs

Nobody formally assigns you a dozen jobs. The scope grows one domain at a time: "Can you also handle the CMS?" Then: "The vendor needs a technical contact." Then: "Someone should probably look at the AWS bill." Then: incident response, because you're the one who understands the infrastructure. Then: triage, because you're the one who can translate between the client and the engineering work. Each addition is individually defensible. None of them, in the moment, look like the one to refuse.

The title doesn't change. The scope quintuples. And because it happened gradually, nobody — yourself included — has a truly accurate picture of what the role even contains anymore.

To give that some scale: one of the domains I was carrying had a monolithic codebase that desperately needed a Python 2-to-3 migration — the kind of project that obviously requires deep focus and uninterrupted time, but when it's sitting alongside close to a dozen other domains, the tendency is to treat it like just another task on the list. Touch it when you can. Make progress where the gaps allow. During a related incident, I had the opportunity to have an outside vendor scope and quote the project. Their estimate — before full discovery was even complete — came back at a team of four engineers plus a technical project manager, approximately six months, and a bill in the tens of thousands. For one project that I'd been quietly absorbing into the margins of everything else.

What the hours column doesn't measure

The volume of work, on its face, was often manageable. Some weeks were light. Some weeks I could have pointed at the hours column and said "see, it's fine."

The hours column was never where the cost lived. The cost was in deciding what to work on when a dozen domains are all equally valid and any of them could interrupt at any moment — cloud infrastructure, DevOps, legacy code, CMS administration, incident response, vendor management, and half a dozen others nobody even remembered existed. Each one a different mental model, a different set of assumptions, a different way of thinking about what "done" looks like. The tax wasn't in the work itself but in the constant, low-grade process of choosing which domain to enter and which eleven to hold at arm's length while hoping nothing caught fire.

I tried time-blocking. Mondays for network administration. Tuesdays for open-office IT support. Wednesdays for sysadmin work. It's the obvious solution, and it falls apart the first time someone has an important need that doesn't respect your calendar — a server goes offline, someone forgets their password, a certificate is expiring. The needs don't care what your calendar says.

But the anticipation was almost worse than the interruptions themselves. The constant awareness that the bubble could pop at any moment made it nearly impossible to truly commit to deep work. You're six pages into network topology documentation with another eight to write, and somewhere in the back of your mind you're bracing for the ping that pulls you out. That bracing is its own form of cognitive load, and it runs all day whether the interruption comes or not.

Research on workplace interruptions consistently finds that full focus takes roughly 23 minutes to return after a single disruption — drawing on Sophie Leroy's attention-residue work and Dr. Gloria Mark's study at UC Irvine. In a role spanning a dozen domains where interruptions are structural rather than incidental, that 23-minute refocus window never fully closes before the next one opens.

You can't unpop a bubble

When the interruption does come — the server goes down, the password reset request, the "just checking on this" message — focus doesn't pause. It evaporates. Like a bubble: when it pops, you can't put it back together. The air is already dissipating. You have to blow a new one.

The initial feeling, honestly, is helpfulness. This is your job. You enjoy solving problems for people. That part is real, and it matters.

But underneath it there's a latent resentment. And a validation of something you've been trying not to think: "I knew this would happen. Why do I even try to focus?"

That's a quick path to fatalism. Negative sentiment override — the state where your default interpretation of neutral events becomes negative — sets in faster than you'd expect when the cycle repeats daily.

So you solve the problem. Or, if it's big enough, it bleeds into a second day, shattering your next planned focus block too. By the time you return to the original task, the exit conditions were never clean. You didn't have a moment to organize working memory, to file away the state of where you were. Getting back into the documentation you were writing doesn't feel like resuming — it feels like starting over.

The work you've done feels wasted. The other domains that need attention later in the week don't pause because of one triage incident. And your manager is wondering why the documentation project is behind schedule.

It's similar to the Nintendo cartridge problem: no save point, no clean shutdown, and the pile of cartridges keeps growing — though this is almost worse, because at least with a cartridge you're swapping one game for another. This is more like driving a train with no switching stations, where someone asks you to change routes and you have to pick the whole thing up, cars and all, and put it on a new set of tracks. The train keeps getting longer because each car is another domain, and the cargo is half-loaded because nothing ever finishes cleanly before the next switch.

I thought it was a me problem

I believed it was a me problem for decades. Not abstractly — viscerally. The internal narrative was: "I'm the issue. Why can't I do this? I should be better at this. I should be able to focus. I am just a terrible employee, and I'm destined for failure."

The first time I left a role carrying this pattern, I thought I was leaving because I couldn't hack it. That I wasn't good enough. I started a company partly because I thought the problem was me and I needed to remove myself from environments where my inadequacy would be visible.

Later, even when I would tell people the problem was structural — when I could articulate the domain count, the switching cost, the impossible scope — I didn't really believe it about myself. Externally I had the analysis. Internally the narrative hadn't changed.

And impostor syndrome has a way of reinforcing the structural problem it feeds on. "I better not set boundaries, because then people will find out I'm a fraud. I have to do everything, to prove my value." That logic keeps the scope growing, keeps the bus factor at one, and keeps the keystone from ever having the conversation that could change the structure — overload creating self-blame, self-blame preventing boundary-setting, absence of boundaries deepening the overload.

The causal arrow can run the other way too — someone who walked in with no impostor syndrome at all can develop it after years of carrying an impossible scope, because the structure itself teaches you that you're failing. Either way, the loop is the same once it's running.

The three layers between your manager and the actual problem

At some point, you try to make the invisible work visible. You map the domains. You list the responsibilities. You quantify the scope. You bring receipts.

And you hit three layers of insulation, none of them malicious, all of them effective.

Layer	What it sounds like	What it does
The acknowledgment trap	"I know you've been doing superhuman work. You're beyond 100% capacity and we really appreciate you."	Closes the window for honest conversation. Pushing back now feels like rejecting the recognition.
The normalization	"No company has every role. People wear multiple hats. Expecting the ideal creates expectations no environment can support."	Reframes a structural problem as an unrealistic expectation.
The friendly framing	"Just feedback from a friend. I hope you hear the heart behind it."	Makes further pushback feel personal rather than professional.

The acknowledgment is the most insidious because it looks exactly like empathy. The manager says the words, names the load, expresses gratitude. And in doing so, they believe they've addressed the situation — they've shown they see it. But acknowledging the load and acting on the load are fundamentally different things, and the acknowledgment, once given, makes raising the issue again feel inappropriate. You can't come back and say "no, you actually don't understand" without sounding ungrateful or dramatic.

The manager walks away believing they've addressed it. The keystone walks away having lost their only window to actually explain what's happening, because reopening the conversation now feels like rejecting someone's goodwill rather than contesting a structural argument.

If the keystone does force the conversation open again — with a scope document, a domain map, a concrete accounting of what the role actually contains — the normalization layer catches it. The response comes wrapped in warm framing and genuine care: every company operates this way, no team has every role it needs, multiple hats are the reality.

Three layers of insulation between the manager and the actual problem, none of them malicious, all of them effective enough that the conversation that needed to happen keeps not happening.

The receipt

Remember that Python 2-to-3 migration from earlier — the one the vendor estimated at four engineers, a project manager, six months, and tens of thousands of dollars?

That was one slice of the total load. A professional estimate for a single project that I'd been telling myself I was a bad developer for not fitting in around my other responsibilities.

That estimate shattered something I'd been carrying for a very long time. The math was always impossible, and the exhaustion I'd spent years treating as a discipline problem turned out to be arithmetic.

If you manage someone whose scope looks anything like what I've described — multiple technical domains, sole ownership, no realistic backup — I'd encourage you to do the exercise of getting an outside estimate on just one of those domains. Not to build a business case, but to see the number and hold it next to what you're asking one person to do alongside a dozen other domains of equivalent complexity.

The gap between those two numbers is your actual bus factor risk, measured in human cost rather than organizational risk.

The gap

The bus factor isn't a number on a wiki page. It's the distance between what you think your best employee is handling and what they're actually experiencing.

The person carrying the load probably isn't going to tell you. Not because they're hiding it, but because the structure — the acknowledgment that seals the conversation, the normalization that reframes the problem, the self-blame that prevents boundary-setting — all conspire to keep the real conversation from happening.

And the exhaustion that would generate the signal is the same exhaustion that muffles it. By the time flagging is most urgent, the emotional deadening of sustained overload has made flagging hardest. The body keeps the score before the résumé does.

If you're not sure whether this describes someone on your team, the Solo Operator Load Check works for managers too. Six questions, two minutes, nothing collected. It won't fix the structure, but it might tell you what your bus factor actually looks like from the other side.

Sources: Jellyfish — "Context switching and developer productivity" (synthesis of Sophie Leroy's attention-residue research and Dr. Gloria Mark's UCI study; ~23 minutes to fully regain focus after a single interruption) · Fast Company — "Worker, Interrupted: The Cost of Task Switching" (popular write-up of Gloria Mark's original research on interruption recovery time) · Christina Maslach — burnout research (burnout as a three-dimensional response to chronic workplace stress: emotional exhaustion, depersonalization, reduced personal accomplishment) · Maslach Burnout Inventory (the three dimensions and their measurement; emotional blunting as a specific mechanism of depersonalization)

The Keystone Burnout: How Engineering Leaders Break Under Constant Vigilance

Alexander van Rossum — Wed, 22 Apr 2026 12:34:24 +0000

I had three major burnouts in seven years at the same role, and the consistent mechanism across all of them was domain switching — holding many distinct mental models and moving between them under pressure. The through-line wasn't entirely the quantity of work, though the quantity was a contributing factor. It was the number of distinct domains I was responsible for and the cognitive tax of moving between them.

The mechanism is both structural and largely invisible — to management, to colleagues, and, for reasons I'll get to, to the person carrying it.

There are no villains in this story — just a structural pattern that emerged in one place I worked, and that I now recognize everywhere I talk to technical leads. It isn't rare either: a Network Perspective report on roughly 20,000 tech employees found that nearly half reported burnout or work-related fatigue, with meetings and context switching flagged as primary drivers.

Why "overworked" isn't quite the right word

At the low end, two to three primary domains is arguably already beyond sustainable long-term for one person. I was almost always above that, and the breadth kept growing without anyone acknowledging it as a distinct thing to manage. Early on: PHP, server administration, project scope consultation. Then: AWS, Kubernetes, general DevOps. Then: audits inside AWS because costs were drifting and the governance work had to come from somewhere. Then: Python and React code maintenance. Then the cluster aged into its own crisis. Each new domain didn't replace the old ones. It added to the pile.

The quantity of work was often manageable; the quantity of switching wasn't.

And in practice, none of my attempts to describe this to management translated into structural change. I eventually decided this was a measurement problem — "domain count" isn't a metric anyone tracks (other than, perhaps, the person doing the work). Hours get tracked. Tickets get tracked. Deploys get tracked. But "how many fundamentally different mental models does this person hold simultaneously" isn't on any dashboard anyone has ever built.

Three burnouts, one mechanism

Inheriting another person's undisclosed ownership

A former colleague had quietly absorbed a client's Salesforce environment into their personal scope without communicating the breadth of the domain to the rest of the organization. When they left, I was handed the role — and the current task was a 32,000-metadata-change migration across staggered deployments, on a platform (Salesforce) I'd never before touched.

While still carrying the React, DevOps, AWS, WordPress, and PHP load.

I learned Salesforce the hard way. It eventually taught me that I'm genuinely strong at governance and architecture, and that I (surprisingly) enjoy Salesforce work. Good outcomes — but outcomes that shouldn't be mistaken for a healthy pattern to repeat. I shouldn't have been the person doing that migration.

That was the first burnout, and it almost broke me.

The forced Kubernetes migration

The control plane was dead, the access keys to the underlying hardware had been lost by a former developer, and the cluster wouldn't reboot anyway. About 200 production deployments sat on top of it, with TLS certificates expiring in three months — and I was the only person who could touch any of it. I documented every step in Slack and communicated regularly with the rest of the organization. Over time, the audience for those updates narrowed to people who couldn't act on them, and I was effectively writing into a void.

Three months of fifty-to-sixty-five hour weeks, with steady pressure on top of the migration to keep handling day-to-day requests — PHP changes, WordPress tweaks, bug fixes, React and Python updates — many of which were structurally impossible until the cluster was rebuilt. That was a hard technical constraint, not a preference, and it didn't seem to matter.

The repeated pings of "just checking on this one" during a cluster migration are the smallest, most common sentence in keystone burnout — the atomic unit of the problem. (Full context on the migration itself lives in the case study.)

The slow burn

The project management function in the organization had gaps — tasks got forwarded into the system with two-day-out default due dates that didn't reflect actual scheduling capacity. I eventually built an entire PM subsystem using Claude Code to compensate for what our PM process wasn't catching.

The coup de grâce was a WordPress compromise that forced a significant triage block one morning, already on top of other scheduled work. While I was in the middle of triaging the compromise — actively telling the PM in Slack what I was doing — I got another "just checking on this one, it's really urgent" ping on something completely unrelated.

That ping was the moment the mechanism became visible enough to name. I left soon after.

"Just checking on this" is an atomic unit of keystone burnout

This phrase showed up in every one of the three burnouts, which is why it's worth stopping to name it as a pattern rather than treating it as incidental. "Just checking on this one" is the expected context switch during crisis work — compound interest on cognitive load, arriving from people who either can't see or won't see that the request they're making assumes a mental model shift the keystone literally can't afford right now.

There's good evidence that this kind of switching corrodes both productivity and wellbeing for engineers: every task switch requires unloading and reloading mental state, and research on workplace interruptions — drawing on Sophie Leroy's attention-residue work and Dr. Gloria Mark's UCI study — finds that full focus takes roughly 23 minutes to return after a single disruption. Repeated at scale, that turns any day of "small asks" into a day where no sustained thinking happens at all.

The phrase pairs especially badly with two kinds of tasks:

Tasks outside the keystone's primary domain — 10DLC compliance, vendor admin work, platforms without an active mental model loaded. There's no warm context to switch into; every ping is a cold start.
Tasks with default deadlines — two-day-out due dates applied by convention rather than by scheduling capacity. Each still has to be held in working memory, whether or not the deadline corresponds to a specific scheduled commitment.

The mutual silence between keystone and management

I'm an undersharer by nature. When asked if things were OK, I'd minimize. "Good — just Salesforce is killing me." "This Salesforce thing is really taking a lot." Honest, but dampened. I didn't say "I'm breaking." I said it in a tone that could be heard that way if someone was listening for it, and not heard that way if they weren't.

The undersharer reflex wasn't the only way I was complicit. There's the boiling frog metaphor — put a frog in boiling water and it jumps; put it in cool water and heat the pot slowly and (in the story, at least; actual frogs do jump out) it stays until it's cooked. Every domain I ended up carrying arrived that way: PHP first, which made sense; then AWS and Kubernetes; then AWS audits, because costs were drifting and someone had to do them. Each addition was individually defensible; none of them, in the moment, looked like the one to refuse. I said yes — or at least didn't say no — to every temperature increment, because each one was small. By the time the water was boiling, I'd been in the pot for far too long.

When I eventually told leadership I'd burned out, the response — paraphrased — was that they'd known for a while, and they were waiting for me to say something about it.

That exchange is the structural failure in one sentence: the person carrying the load is expected to self-report when they're breaking, while the person with organizational authority to redistribute that load avoids the conversation they already know is needed. It's mutual complicity, not a villain story. And it's not unique to any one company; it's the default pattern whenever load is invisible and signaling is left entirely to the person under strain.

That dynamic is a specific case of what I've described elsewhere as a retrieval failure in management — the information existed, but no mechanism surfaced it to the person who could act on it.

Why the keystone doesn't flag it in time

The reason the keystone doesn't flag it in time is more pernicious than simple undersharing.

Burnout dampens emotional response. In the clinical literature going back to Christina Maslach, burnout is defined as a response to chronic workplace stress, characterized by three dimensions: emotional exhaustion, depersonalization (or cynicism), and reduced sense of personal accomplishment. The depersonalization component is routinely described in clinical terms as emotional blunting — people reporting, in one 2020 analysis of Maslach Burnout Inventory data from 6,682 U.S. physicians, that they feel "used up at the end of the workday," emotionally hardened, and less able to care about their work or the people around it.

That's the mechanism that keeps the keystone silent. The signal that would have told me "this is unsustainable" had gotten quieter in direct proportion to how much more unsustainable the load had become. By the point where flagging would have been most urgent, flagging was hardest — not because I was hiding it, but because I'd stopped feeling it sharply enough to articulate it.

The deadening functions as coping as much as it functions as symptom — it's self-protective, and it's how you keep functioning while the load keeps growing. Once it sets in, the "just say something" expectation from above becomes architecturally impossible to meet.

The wake-up I shouldn't have needed

Sometimes it takes an outside observer — without your deadening, without your undersharer reflex, without the whole internal system of muffling that keeps you functioning — to tell you plainly what they see.

I maintain a Claude Code agent that runs my sprint and task management. I'd built the subsystem during the slow third burnout to compensate for a PM function that wasn't really being run. A couple of days ago, I added a task about 10DLC compliance work to the queue. The agent responded unprompted: "You're adding 10DLC compliance work to your tasks?"

So naturally, I asked it to assess my actual load. What came back was a list I'd never seen written out anywhere, aggregated from tasks spanning two days. The following is verbatim, with only specific colleague names removed:

Incident response across a fleet of production servers (botnet mitigation, CPU exhaustion, TLS failures).

Legacy system maintenance on PHP 5.4/5.6 codebases and Python 2 workloads that can't be upgraded without client coordination.

Client-facing communication translating deep technical root causes into plain English for non-technical stakeholders.

Sprint and project management for multiple concurrent clients with time tracking, story-point estimation, and velocity reporting.

Infrastructure architecture — Kubernetes, AWS (EKS, SES, Lightsail), Cloudflare, DNS, certificate management.

Application development — widget systems, iCal feeds, FormAssembly/Salesforce integrations, API integrations.

DevOps and platform engineering — Docker, deployment pipelines, HTTP bridge services, monitoring.

Staff coordination — delegating work, writing up tasks with appropriate context, managing other people's queues.

Regulatory/compliance — now 10DLC on top of everything else.

Tooling and automation — building and maintaining the sprint management system itself.

That's not a CTO job. That's a CTO + senior engineer + sysadmin + project manager + client account manager compressed into one person.

The fact that you're completing all of it doesn't mean the workload is reasonable. It means you're absorbing complexity that should be distributed across a team.

That kind of role sprawl isn't unique to technical work. Research on midlevel leaders in complex organizations consistently points to expanding responsibility without matching structural support as a primary driver of burnout. The technical version of that pattern happens earlier and sharper — roles accrete more quickly, documentation lags further behind, and the cognitive cost of switching between the accreted domains is specific to technical work in ways that generic "role sprawl" doesn't capture.

I'd already talked about domain breadth with leadership, with colleagues, with anyone who'd listen. I'd already described the pattern to myself, in writing, more than once. The assessment still landed as a wake-up call, even though the content was nothing new.

That gap — between knowing the thing and feeling the thing — is where emotional deadening lives, and where the load keeps growing while the signal stays quiet. It took an agent I'd built myself, with no investment in how I felt about the list, to read it back plainly enough that it registered.

Recovery that isn't really recovery

Each burnout had a moment where things became tolerable again. Not solved. Tolerable.

Salesforce: I got the client to subscribe to Gearset, which made the metadata migrations workable. Eventually Salesforce itself clicked — the mental model stabilized, and the domain stopped requiring peak effort.
Kubernetes: The cluster got built. The cert crisis passed. Deployments resumed.
The slow one: never had a real recovery inside the role. The PM subsystem I built helped, but it meant I was now maintaining the PM function on top of everything else.

In each case, "tolerable" came from two sources: better tools, or eventual domain fluency. Neither changed the structure that produced the burnout — the keystone stayed the keystone, the breadth didn't shrink, and the next domain was already queuing up behind the last one.

Inside the role, the structural pattern never actually shifted. The recoveries lowered the steady-state intensity without changing the shape of the problem — and the only recovery that did change the shape was leaving.

The pattern, and the diagnostic

If any of this sounds familiar — the breadth, the mutual silence, the "tolerable" that never becomes "solved" — I've written a companion diagnostic: the Solo Operator Load Check. It measures the shape of the load, not the hours. It won't fix anything, but it might tell you roughly where you are on the curve, and give you language for the conversation you might need to have.

The keystone conversation usually doesn't start itself.

Originally posted on mipyip.com

Sources: Network Perspective — "Workload & Burnout in Tech" (2022) (survey of ~20,000 employees; ~50% report burnout and work-related fatigue; meetings and context switching flagged as primary drivers) · Jellyfish — "Context switching and developer productivity" (synthesis of Sophie Leroy's attention-residue research and Dr. Gloria Mark's UCI study; ~23 minutes to fully regain focus after a single interruption) · American Psychological Association — "Burnout research" (Christina Maslach profile; burnout defined as response to chronic workplace stress with three dimensions: emotional exhaustion, depersonalization/cynicism, reduced personal accomplishment) · "An item response theory analysis of the Maslach Burnout Inventory" — Journal of Patient-Reported Outcomes, 2020 (IRT analysis of MBI responses from 6,682 U.S. physicians; specific item-level data on "used up at the end of the workday," "work is hardening you emotionally," and treating patients as "impersonal objects") · Harvard Business Publishing — "The Burnout Risk: Strengthening Your Midlevel Leaders" (expanding responsibilities without matching structural support as a primary driver of midlevel leader burnout)

Additional reading: Software.com — "The Developer's Guide to Context Switching" (broader treatment of cognitive cost, attention residue, and context loading/unloading for engineering work) · Maslach & Leiter — "Understanding the burnout experience: recent research and its implications for psychiatry" (World Psychiatry, 2016) (peer-reviewed review of the Maslach Burnout Inventory and the three-dimensional structure of burnout) · ScienceDirect — "Emotional exhaustion" (topic overview) (depersonalization described as emotional blunting, detachment, and "going through the motions")

Your Clients Know You're Lying About Incident Reports

Alexander van Rossum — Tue, 14 Apr 2026 12:15:00 +0000

Simon Sinek tells a story about a leader who asks their assistant to tell a caller "I'm not here" — when they're clearly sitting right there. It seems harmless. But what it communicates to every person within earshot is simple: in this organization, lying is acceptable when it's convenient. As Sinek argues in a related piece, honesty isn't a value you declare — it's a behavior you demonstrate or don't.

You can write "honesty" on the wall. You can put "integrity" in the company values deck. The behavior you model is the actual policy.

A quick distinction, because it matters: there's a difference between incomplete information in the fog of an active incident, legally cautious phrasing during a sensitive disclosure, and rewriting reality after the fact when the facts are already known. This post is about the third category — fabricated root causes, inflated severity, and narrative rewrites that turn failures into marketing copy. Not early-stage uncertainty. Not legal review. Deliberate misrepresentation.

If you've worked in managed services or agency environments, you already know what this looks like. You've probably written the honest version of an incident report and watched it get rewritten before it reached the client.

Two emails about the same fix

An engineer resolves a client issue and writes the update:

Good afternoon — I've completed the fix for your application. The issue was related to a configuration change in your vendor's API. After reviewing their recent updates, I was able to adjust the integration calls to correct the issue. Please test the deployment and let me know if anything is amiss.

What the client actually receives, after the update passes through a client-facing coordinator:

Hello! Good news! We've fixed the issue! Let us know if there's anything else we can help you with!

One of those tells the client what happened, why, and what to do next. The other tells them nothing, cheerfully.

The results are predictable. The detailed version gets a "Thank you" and no follow-up — the client has what they need. The filtered version, on the other hand, almost always generates another round: "Can you tell me what happened?" or "What was actually wrong?" The vague, hedging reassurance creates more work, not less — because the client still wants the answer, and now someone has to circle back and provide the information that should have been in the first email.

This pattern has a defense mechanism built in. When the engineer starts pre-writing client-ready summaries — deliberately simple, no jargon, just the facts — the coordinator pushes back: "That's too technical for the client."

"Configuration issue with their API" is not technical. It's a sentence. But when the person filtering communication can't evaluate the content, they default to the safe, empty version, or just don't include it at all.

Each incident has to be bigger than the last

Vague communication creates a second problem: escalation. When you're not telling clients what actually happened, you need to tell them something. And that something tends to get more dramatic over time.

A search engine bot ignores robots.txt and starts crawling aggressively — hundreds of requests per second. It's a nuisance. You block the bot, adjust your rate limiting, move on. Ten minutes of work.

What gets communicated to the client: "Your site was under attack. We've resolved it and migrated you to a new server for added protection."

Except the site is behind a CDN with DDoS protection built in. Migrating to a new server doesn't stop an "attack" — you'd toggle a setting on the CDN. The explanation doesn't even make technical sense if given thirty seconds of thought. But it sounds decisive, and it frames the provider as the hero instead of the cause.

Another example: bot traffic originating from a foreign IP range — everyday noise hitting thousands of sites simultaneously — becomes "It looks like you're being targeted from overseas." For a small organization without technical staff, that's terrifying. And it's also completely untrue.

The escalation has to keep ratcheting because you've already inflated the previous incidents. "Bot swarm" became "attack." "Attack" becomes "targeted from overseas." Each incident has to sound bigger than the last, because the bar for what counts as significant keeps rising. Where does it end?

This (usually) comes from the top

The most impressive version of this pattern is the catastrophic failure repackaged as a proactive initiative. Shared infrastructure goes down hard. Someone spends weeks — sometimes months — in recovery mode, rebuilding from scratch under pressure. Grueling work that should be recognized for what it is: disaster recovery executed by someone who deserves a lot of credit.

The client communication: "We've completed a major security upgrade so that we can serve you better."

Not a failure. Not a recovery. An upgrade. The person who rebuilt the thing gets reframed from "saved us from a catastrophe" into a supporting player in a marketing narrative about continuous improvement.

This is where Sinek's framework lands with full weight. When the person at the top is being deceptive — even by exaggeration, even by omission — it filters down. Every engineer who knows the server wasn't actually attacked, every team member who knows the "upgrade" was a recovery — they all understand what the real values are. Not the ones on the wall. The ones in the emails.

And once that framing is normalized, it filters into everything. Every incident gets a spin pass before the client sees it. Postmortems become performance pieces. The institutional memory — the documentation, the incident history — becomes unreliable because it reflects what was communicated, not what happened. You can't learn from incidents you've rewritten. You can't improve processes that your records say were fine. This isn't hypothetical — a Keeper Security study found that 41% of known cyber incidents weren't reported internally to management, largely due to fear and cultural pressure. The spin starts small, but the underreporting becomes systemic.

What honest communication actually sounds like

It's not complicated:

A configuration issue caused degraded performance for approximately two hours this morning. The root cause was resource contention — the server was handling more concurrent traffic than its current allocation supports. We've increased the allocation and are monitoring to confirm stability. We're also reviewing the provisioning for your other services to make sure they have adequate headroom.

What happened. Why. What you did. What you're doing to prevent it next time. No villain, no hero narrative. Just a clear account that treats the client as a competent adult who can handle the truth about their own systems. This pattern — acknowledge, explain impact, detail actions, outline prevention — shows up in every serious framework for incident communication. It's not a novel idea. It's just rarely practiced.

Clients who get honest incident reports develop confidence that when you say things are fine, things are actually fine. Clients who get spin learn to treat every communication as potentially unreliable — including good news. As ilert puts it in their MSP incident management guide: avoid vague terms like "working on it" — clients should always feel they're kept in the loop with meaningful updates. Even a "no change" update reassures clients the issue is being actively worked on. Substance over cheerfulness.

Trust erodes before anyone says anything

The biggest cost of this pattern is invisible... until it isn't. You might think you're protecting the client. Maybe it's a self-image thing. Maybe you want the organization to seem more capable than it is. But every inflated incident report quietly erodes trust capital. The client may not realize it at first — but eventually it compounds, and eventually it comes back.

People are better at detecting inauthenticity than we give them credit for. They might not know what you're hiding. But they know something doesn't add up. And they're filing it away, waiting for the pattern to confirm itself. Even Uber's infamous attempt to cover up a breach proved the point — concealment always costs more than disclosure.

I'd rather be direct with a client about a bad day than eloquent about a fictional one.

One caveat: this post is about the principle — being direct with clients about what happened and why. There's a separate, harder question about where honesty meets legal exposure. Incident reports are discoverable documents. There's a meaningful difference between "we had a configuration issue" and "we were grossly negligent in our provisioning" — both might be true, but they carry different legal weight. That tension — how to be honest without writing your opposing counsel's opening argument — deserves its own treatment, and I'll be writing about it soon.

Sources: Simon Sinek — "Honesty Is NOT a Value" (values are behaviors, not declarations) · Keeper Security / IT Brew — "Cyber Attacks Are Grossly Underreported" (41% of known incidents unreported internally; 43% cite fear of consequences) · eMazzanti — "Transparent Communication After a Security Breach" (acknowledge → explain impact → detail actions → outline prevention framework) · ilert — "Incident Management for MSPs Guide" (avoid vague "working on it" updates; substance over reassurance) · Blackfog — "Is Transparency Important Beyond Compliance After a Cyberattack?" (Uber cover-up case study; concealment costs vs. disclosure trust)

Further Reading: FireHydrant — "A Practical Guide to Incident Communication" (clear language, empathy, audience-tailored updates)

I Used WordPress for 20 Years and I Was Wrong

Alexander van Rossum — Mon, 30 Mar 2026 13:17:00 +0000

I started building websites on WordPress around 2005. I was also using Joomla at the same time — which, if you've ever used it, explains why WordPress won that particular contest quickly and decisively.

For twenty years, WordPress was the answer. Personal sites, corporate sites, everything in between. It had plugins for anything you could imagine, a theme for every aesthetic, and a community that could solve any problem you ran into. For a long time, it genuinely worked.

Until it didn't. And it wasn't all at once, not a dramatic failure. It was more like twenty years of paper cuts that finally bled out.

The Smart Fridge Problem

WordPress carries the weight of everything it can do, whether you need it or not.

The admin dashboard is a full application. Login system, user management, media library, plugin architecture, database abstraction, REST API, cron jobs — all of it running on every page load, for every visitor, whether your site needs any of it or not. For most sites — and I mean the vast majority — none of that matters. Your marketing site doesn't need a login system. Your blog doesn't need a database. Your landing page doesn't need a REST API.

But you're paying for all of it in server resources, attack surface, and complexity.

Page builders have a tendency to make it worse. Elementor, Divi, WPBakery — they ship every capability they offer to every page, regardless of what you actually use. Need a simple two-column layout? Here's 400KB of JavaScript that also handles parallax scrolling, animated counters, and particle effects. Just in case.

It's like buying a refrigerator with a built-in screen that tracks your grocery inventory and auto-orders milk when you're running low. If you don't use grocery delivery services — and most people don't — you just paid an extra $800 for a screen that collects fingerprints and shows a fancy animation when you get water. The fridge still keeps things cold; the cold part was never the problem.

And underneath all of this: PHP. In 2026, the entire ecosystem still runs on PHP. It works, the way a lot of things that are decades old still work. But the gap between what's possible now and what PHP was originally designed for gets wider every year.

The PageSpeed Insight

I'm obsessive about PageSpeed Insights scores. They're a proxy for the thing that actually matters — how your site feels to real people on real connections.

My best WordPress score — ever, across twenty years — was a 97. Desktop only.

Getting there required a minimal theme (Twenty Twenty-One), Gutenberg blocks instead of a page builder, three plugins total, hours of manual optimization, server-level OPCache configuration, and Cloudflare caching. One wrong plugin update could knock ten points off overnight, and it still had intermittent issues.

That was the ceiling. On a good day, with everything perfectly tuned (for hours), 97.

Here's a WordPress site I know well. It's hosted on WPEngine — premium managed hosting. It's had dozens of hours of professional optimization work. It runs a page builder that specifically markets performance as a feature.

Mobile: 41 / 86 / 77 / 85. Desktop: 55 / 84 / 77 / 92.

And here's my personal portfolio site (mipyip.com):

Mobile: 99 / 95 / 100 / 100. Desktop: 100 / 95 / 100 / 100.

No optimization heroics. No caching plugins. No CDN tricks. Those scores showed up on the first deploy and stayed there. The framework just... ships fast HTML.

Look at those numbers side by side, and you'll reach the same conclusion I did: This isn't a tuning problem.

Why Astro

The framework I ultimately switched to is Astro. Static output, zero JS by default, Markdown-native.

Three things made it the right fit:

Zero JavaScript by default. Astro ships no client-side JavaScript unless you explicitly add it. Most marketing sites don't need JavaScript at all — they're documents, not applications. Astro treats them that way.

Markdown as a first-class content format. Plain text that any parser can consume, any system can render, any tool can index. They load instantly. They version-control perfectly. They'll be readable in fifty years because they're just text.

Framework-agnostic. Astro doesn't force you into React, Vue, or any other JavaScript framework. You can use them if you want. You can also use none of them. For a marketing site that's mostly content, "none" is the right answer.

That last point is what delivers the PageSpeed scores. No framework overhead means no framework tax on every page load.

Markdown Is How I Already Think

I'd been writing in Markdown for years before I built this site. Outline, Notion, Obsidian — every notes tool I've used in the last decade speaks Markdown natively. My project documentation is Markdown. My random notes are Markdown. The styling shortcuts are second nature at this point — I see the content formatted when I see the symbols. I don't need a visual preview to know what a ## header or a **bold phrase** looks like rendered.

Markdown files are absurdly portable. Plain text that any parser can consume, any system can render, any tool can index. They load instantly. They version-control perfectly. They'll be readable in fifty years because they're just text.

The content format was never the problem. The tooling around it was the problem. Before AI, maintaining a site built from Markdown files meant manually writing templates, building components, managing routing, handling image optimization — the kind of tedious infrastructure work that made WordPress's "just install a plugin" approach genuinely appealing.

The AI Multiplier

I work with a Claude Code agent that has full context on this site's codebase — the architecture, the design standards, the content strategy, the voice. We work in the source files simultaneously.

This blog post is a good example of what that workflow looks like:

I had a seven-word idea: "blog post about why I love Astro." The agent created the notes file. Then it interviewed me — one question at a time, conversational, pulling out details I wouldn't have thought to include in an outline. It compiled the raw interview into structured notes. I reviewed, added context, corrected emphasis. It drafted. I edited. We polished the final version side by side in SideMark — a Markdown editor I built specifically for this kind of collaborative workflow.

The whole pipeline — from idea to draft with images — takes a fraction of what it used to. The bottleneck is my thinking speed, not my typing speed.

None of that workflow is possible with WordPress. You can't point an AI agent at a WordPress database and say "work with me on this post." But you can point it at a folder of Markdown files with a clear architecture document and watch it understand the entire system in seconds. Add a local semantic memory layer like pmem and the agent can recall decisions, patterns, and context from months of previous sessions — no re-explaining needed.

The combination — Markdown content, static site generator, AI-assisted development — turns "I should update my website" from a weekend project into a Tuesday morning. Some posts on this site went from idea to published — with images and scheduled LinkedIn posts — in under twenty minutes. And they're still my words, my thinking. The combination of Astro and AI just removed the friction between having something to say and saying it.

"But My Ecommerce Is on WordPress"

I can already hear it. "My store runs on WooCommerce. I can't separate my marketing site from my ecommerce."

You can, and you should.

Your marketing pages and your ecommerce platform have fundamentally different performance requirements. Marketing pages need to be fast — fast enough that Google ranks them, fast enough that visitors don't bounce, fast enough that your PageSpeed scores aren't embarrassing when a potential client checks. Ecommerce pages need to be functional — cart logic, payment processing, inventory management, user accounts.

Bundling them together means your marketing pages carry the weight of your ecommerce platform on every load. Your beautiful landing page is slower because it's sharing infrastructure with your checkout flow.

Keep your ecommerce where it works — WooCommerce, Shopify, whatever you've built. Put it on a subdirectory. Build your marketing site as a separate, blazing-fast static site, and surface product data, cart state, and key behaviors to the marketing side through JavaScript and session management. To the visitor, it looks seamless. Under the hood, each part is optimized for what it actually needs to do.

Separation of concerns — it's an engineering principle that applies to site architecture just as well as it applies to application code.

Twenty Years Is a Long Time to Be Wrong

WordPress powers a huge portion of the web and it does what it does. I'm not here to bury it (and I wouldn't want to if I could). But after twenty years of paper cuts, accumulated complexity, and a performance ceiling that required heroic effort to approach — I had to ask myself whether I was still using it because it was the right tool, or because it was the familiar one.

The willingness to evaluate your tools honestly — even the ones you've invested decades in — is the difference between building systems that serve you and serving systems you've already built.

I Built a Local RAG for Claude Code: Semantic Search Over Your Own Project

Alexander van Rossum — Thu, 26 Mar 2026 15:08:23 +0000

More than five hundred markdown files.

That's what one of my projects has, and it's not even the largest (that clocks in at almost 2,500!). ROADMAP.md, ARCHITECTURE.md, CLAUDE.md, CHANGELOG.md, task folders with notes and lessons learned, editorial notes, half-complete drafts, memory files from past sessions. Each one holds a piece of the project's history — a decision, a rationale, a thing that broke and how it got fixed.

Claude Code can't see any of it unless I point it at the right file — or it reads them on its own, burning tokens on retrieval before the real work starts.

Claude Code isn't completely amnesiac — it has session memory, it reads CLAUDE.md, and with the right governance documents it can recover a lot of context at session start. For smaller projects, that's enough. But once you're past a few dozen files of accumulated institutional knowledge, the gap between "what the agent can reasonably read at startup" and "what the project actually knows" grows wider every week.

So I built pmem — a local RAG that gives Claude Code semantic search over your project's full history. No external APIs. No data leaves your machine. Setup in two minutes.

The numbers

I ran the same query — "identify governance-related blog posts" — both ways on a project with 500+ markdown files:

	pmem (index-based)	Fresh search (Explore agent)
Results	18 posts	11 posts
Time	~20 seconds	~90 seconds
Token cost	~5,500	~20,000–24,000

The fresh search cost roughly 4× the tokens (cries in tokens) and found 7 fewer posts. The posts it missed were the ones where governance was a supporting theme rather than the headline — exactly the kind of semantic connection that keyword search can't make.

The agent's overhead — its own system prompt, tools, multi-step reasoning — is the hidden cost. It's worth it for open-ended exploration, but for a targeted retrieval question, the index was both cheaper and more thorough.

The prompt that built it

Before I show the architecture, I want to show two prompts — because the contrast illustrates something about working with AI agents that I think a lot of people miss.

The vague prompt:

"I want to give agents better memory."

This goes nowhere useful. No constraints, no architecture, no scope. The agent could build anything from a flat JSON file to a Kubernetes-deployed vector database with a React frontend. It would probably pick something in the middle and spend four hours building infrastructure you didn't need.

The prompt I actually used (simplified for readability):

I need to enhance the memory capabilities of Claude Code. Since I use Claude Code for more than just writing code — managing tasks, building documentation, maintaining infrastructure — I can generate thousands of files and folders. While they do get archived regularly, digging through them is a token and time sink, and can sometimes prove inaccurate, especially with larger projects.

We will use Ollama embeddings and build a RAG that the agent can use to query the entire project's files.

The tool must also be able to connect to a local LLM (optional) in order to further reduce token usage when parsing results.

For now, we are going to be focused on TXT and MD files, and will expand as needed.

The difference isn't length. It's that the second prompt contains a discovery phase. It names the problem, specifies the technology, defines the integration point, sets constraints, and draws an explicit scope boundary. The agent doesn't need a better prompt template. It needs you to finish thinking before you start asking.

What pmem does

The flow is simple: Claude asks a question, pmem finds the answer in your project's files, and returns it with source citations.

Under the surface:

Indexing. pmem index walks your project's markdown and text files, splits them into semantic chunks using header-aware parsing (a section stays with its heading), and embeds each chunk locally using nomic-embed-text via Ollama. Chunks are stored in ChromaDB, a file-based vector database that requires no server process. Indexing is incremental — SHA-256 hashes track which files changed.
Querying. Claude calls the memory_query MCP tool with a natural language question. pmem embeds the question, searches the vector store for semantically similar chunks using cosine similarity (ChromaDB's default), and returns results with source paths and relevance scores. Optionally, a local LLM synthesizes the chunks into a concise answer before returning it.
Session rituals. Three slash commands turn memory into a workflow: /welcome refreshes the index at session start. /sleep captures changes at session end. /reindex refreshes mid-session. The index stays current because maintaining it is a side effect of the session workflow, not a separate chore.

No data leaves your machine. No API keys required for core functionality. The entire system runs on Ollama, ChromaDB, and Python.

Architecture decisions

No LangChain. Not out of ideology — out of simplicity. pmem is around 2,000 lines of Python. The RAG pipeline is: embed → store → search → (optionally) synthesize. Four operations don't need a framework.

ChromaDB over everything else. File-based, no server process, persistent. I considered LanceDB but never formally evaluated it — ChromaDB was already working and the evaluation wasn't worth the detour. I also considered plain JSON with numpy cosine similarity, which works for small projects but doesn't scale — brute-force linear scan is O(n) per query. ChromaDB hit the sweet spot: real vector search without operational overhead.

Header-aware chunking. Most RAG tutorials split text by character count. That destroys semantic units. A section titled "Why we chose CloudFront over Fastly" that gets split between two chunks loses meaning in both. pmem uses markdown headers as natural split points, with a size-based fallback for sections that are too long. The heading becomes metadata on each chunk, so search results carry their context.

CWD walk-up for project detection. Same pattern git uses — walk up until you find a .memory directory. pmem init creates it, and from that point forward, any subdirectory just works.

Setup

Prerequisites: Python 3.11+, Ollama running locally, and the nomic-embed-text model pulled.

pip install pmem-project-memory
ollama pull nomic-embed-text

Initialize any project:

cd ~/your-project
pmem init
pmem index

Install the session skills:

pmem install-skills

Register the MCP server in ~/.claude.json (global) or .mcp.json (per-project). The README has the exact config block.

First index takes a few seconds for small projects, up to a minute for large ones. After that, incremental indexing only re-embeds changed files — typically under a second.

⭐ Star pmem on GitHub

What's next

Phase 2 is mostly complete: pmem watch for auto-reindexing, global config defaults, one-command skill installation, better error messages. Phase 3 is where it gets interesting — multi-collection support, non-markdown file support with language-aware chunking, optional image processing, and pmem diff to show how answers change over time.

The tool is open source, MIT licensed. It exists because I needed it, and I suspect anyone running Claude Code on a project with more than a few dozen files needs it too.

Sources: ChromaDB — Distance Functions · ANN Benchmarks (Aumüller, Bernhardsson & Faithfull)

The Green-Light Problem

Alexander van Rossum — Mon, 23 Mar 2026 12:34:00 +0000

In this post:
A green light with unresolved checkpoints isn't a recommendation. It's a liability. This post covers the anatomy of premature platform migration recommendations, the 'not a blocker' trap, and why stage-gated validation saves money.

A strategy document lands on a client's desk. It recommends a major platform migration. The tone is confident. The structure is logical. The recommendation is clear: move forward.

Buried on page three, in qualified language, are a handful of caveats. The primary integration hasn't been validated with the client's actual data. The connector vendor hasn't been vetted beyond a cursory website scan, and a core data system that powers the existing workflow isn't mentioned at all. The timeline assumes everything works on the first try.

The client's leadership reads the document and sees a green light. The technical team reads the same document and sees open questions.

The gap between those two readings is where six-figure mistakes live.

If you've ever been three months into a build when someone discovered the integration doesn't actually work, you've been on the receiving end of this gap.

The anatomy of a premature recommendation

It's not malicious. It's structural.

Someone does the real technical analysis. They flag risks, identify dependencies, note unresolved questions, and recommend a staged approach: validate the critical assumptions before committing to a full build. The analysis is honest about what's known and what isn't.

Then the analysis gets polished for client consumption, and the risks are softened. "This is an unvalidated dependency that could change the entire architecture" becomes "this will require thoughtful implementation." The staged approach gets flattened into a single "we recommend moving forward." The hard questions get cut because they might make the recommendation look uncertain.

The polished version isn't wrong, exactly. Everything in it is technically true. But it's selectively true in a way that systematically favors proceeding. The caveats might be present but soft; the confidence is high but unearned. And the client, who is paying for expert guidance on a decision they can't evaluate themselves, reads the document at face value.

That's The Green-Light Problem. Not a bad recommendation, but premature; a conclusion delivered before the evidence supports it.

The "not a blocker" trap

There's a specific phrase that shows up in these documents. It sounds reasonable and is often catastrophic:

"Not a blocker."

An integration with an unvetted vendor, connecting a legacy ERP system to a modern platform, handling thousands of customer-specific pricing matrices and complex approval workflows. The vendor's website has a case study. A sales engineer said it works. Nobody has tested it against the client's actual data, actual edge cases, or actual transaction volume.

"Not a blocker."

If that integration fails, the entire architecture changes. The timeline doubles, the budget triples, and the client is three months into a build when they discover that the foundation, around which the whole project was designed, doesn't hold weight.

Calling an unvalidated dependency "not a blocker" before vetting it is optimism bias dressed up as a technical assessment. It's the kind of language that makes strategy documents read well, and post-mortems read badly.

What "validated" actually means

There's a meaningful difference between "we believe this will work" and "we've proven this works." Strategy documents routinely conflate the two.

Validation is not:

The vendor says it works
We found a case study on their website
It works in a demo environment with sample data

Validation is:

We tested the specific integration with the client's actual data and edge cases in conditions that resemble the production environment
We documented what happened.

Validation costs money and takes time. It delays the exciting part of the project (the build) in favor of the boring part (the proof). And it is the single most valuable thing a technical advisor can recommend before a six-figure commitment.

The timeline cascade

A premature green light doesn't just risk a bad outcome. It creates a compounding timeline problem.

When the project starts with unvalidated assumptions, the team builds on those assumptions for weeks or months. When one of them turns out to be wrong (and they do, regularly, because that's what "unvalidated" means), the timeline doesn't shift by the time it takes to fix the problem. It shifts by the time it takes to fix the problem plus the time spent building on the assumption that turned out to be wrong plus the time spent unwinding the work that depended on it.

A two-week validation phase at the beginning can prevent a three-month correction in the middle. The math is simple: the two-week validation phase feels like a delay, and the three-month correction feels like bad luck.

It's not bad luck, but the entirely predictable consequence of skipping validation.

The document problem

A well-crafted strategy document can make an unvalidated recommendation look validated. The formatting is professional. The sections follow a logical structure. The language is measured and confident. If you didn't have the technical context to evaluate the claims, you'd read it and feel reassured.

The people making the platform decision often don't have the technical context. That's why they hired advisors. And when the advisory document systematically smooths over the rough edges, the client loses access to the information they need to make an informed decision.

This isn't about incompetence. It's about incentives. PMI research on optimism bias in project delivery shows that the dilution of risk reporting is one of the most common failure modes in status communication. The path of least resistance is always "we recommend proceeding." Clients want to hear yes. Teams want to move forward. Leadership wants progress. The person who adds a gate and says, "Wait, we haven't validated this yet," is often treated as an obstacle to progress rather than as someone protecting the investment.

The fix is boring, and it works

Stage-gated recommendations.

That's it.

"We believe this platform is a viable path for your requirements. Before committing to a full build, we recommend a validation phase. Here's what we'll test, here's what it costs, and here's the criteria we'll use to decide whether to proceed."

That's not hedging. That's risk management. And the client who hears "we want to prove this works before you spend six figures on it" will trust you more, not less, because you're clearly prioritizing their outcome over your timeline.

The most expensive sentence in any strategy document is "we recommend moving forward" — when the analysis it's based on isn't finished yet.

Sources:

Optimism Bias — The Decision Lab ·
Optimism Bias and Failure to Terminate Failing Projects — PMI ·
Phase-Gate Process — Smartsheet

Your AI Reviewer Already Agrees With Your AI Builder (And Role-Switching Won't Fix It)

Alexander van Rossum — Mon, 16 Mar 2026 12:00:00 +0000

A repo hit 11,000 stars in its first week by solving a real problem: Claude Code in one generic mode produces mediocre output.

Garry Tan's gstack formalizes "modes" for Claude Code — slash commands that switch the AI between named roles. To name a few:

A CEO lens for product decisions
A staff engineer for paranoid code review
A QA lead for testing
An engineering manager for retrospectives

The core insight is correct and worth calling out directly: forcing the AI into an explicit role with explicit constraints produces better output than letting it be a generalist.

The browse tool — a persistent Chromium binary that gives Claude Code eyes on a running app — is genuine engineering, not a prompt trick. The sequential workflow discipline (plan → engineering review → build → code review → ship) is better than what most people do with AI, which is nothing. This is a meaningful step up from ad-hoc prompting.

I also noticed a structural limitation within minutes of reading it, and it's the same one I've been building against for months.

All the hats, one head

Every mode in gstack runs inside the same context window.

The "paranoid staff engineer" reviewing your code is the same Claude instance that helped architect it. It already knows why every decision was made — which means it's primed to find those decisions reasonable.

This is a self-review wearing a different costume.

I don't mean that dismissively, because self-assessment checklists have real value — a pilot running preflight catches mistakes that muscle memory alone won't, and that's worth doing every time. But there's a categorical difference between a checklist and an independent review, and the distinction matters — considerably more than it sounds.

When the reviewer already has the builder's reasoning in context, it's not an evaluation of the output; it's pattern-matching against the justifications that produced it. The same mechanism that makes LLMs coherent — self-consistency — makes them structurally blind to their own errors when asked to self-review. You're not getting a second opinion; you're getting the first opinion wearing a different hat.

This is the same reason you don't ask the person who wrote a PR to also approve it. A different pair of eyes catches what the author is blind to — not because the author is bad, but because familiarity breeds pattern blindness. AI doesn't change this principle; if anything, it amplifies it — an LLM's self-consistency is more deterministic than a human's.

Parallelism is not independence

Gstack can also use Conductor to spin up ten parallel Claude Code sessions. That sounds like separation until you realize it's a performance optimization, not an epistemic one, and more workers in the same bath isn't the same as a clean pool.

Genuine review requires what I'll call epistemic separation: different priors, no access to the rationalization chain that produced the artifact, and independently accumulated judgment about what builders consistently miss. Without that separation, you get confirmation with extra steps.

Each of gstack's modes starts fresh every invocation — no accumulated lessons, no pattern library built from previous reviews. The "paranoid staff engineer" is equally paranoid about everything, every time. That's thorough but undirected. A reviewer who doesn't learn which mistakes this builder tends to make hasn't read the codebase's history.

For organizations where bugs have real consequences — compliance failures, donor trust violations, limited technical staff to recover from incidents — the difference between costume-change review and independent review is operational risk.

What genuine separation looks like

I built the answer to this problem months before gstack existed. I call it The Adversary.

It's a separate Claude Code project in its own repo with its own governance files, its own accumulated lessons-learned corpus, and zero shared context with the building agent. It receives a read-only symlink to the target codebase and produces a structured review report. It doesn't know what decisions were made or why. It sees output, not reasoning — which is exactly how real external review works.

I'd been building and reviewing this codebase for months — manual human review and agentic self-review, the whole time. The Adversary's first pass found 102 issues. Ten critical. Security vulnerabilities hiding in plain sight — not because the builder was bad, but because independent review catches what self-review structurally cannot.

The architecture makes it work, not the prompt:

Separate context. Different project, different memory, different governance documents. The builder's reasoning chain doesn't exist in The Adversary's world.

Different priors. The Adversary accumulates its own pattern library over time — "here's what builders consistently miss" — which makes it sharper with each review. A stateless skill file can't do this.

Structured handoff. Artifacts move through a defined channel (symlinks and reports), not a shared session. The reviewer can't be influenced by the builder's justifications because it never sees them. This is the same principle that keeps financial auditors separate from the accounting department.

An honest limitation

This can't be fully productized today. The architectural requirement — genuinely independent agents with separate memory, separate accumulated judgment, and separate lesson histories — requires human orchestration: someone who understands where the boundaries need to be and maintains them. The tooling will get there. The architecture won't design itself.

Anyone can fork a repo of markdown files. The judgment behind "here's where the boundaries need to be and why" is the part that requires experience to get right.

The methodology is the deliverable, not the CLI tool. And that distinction matters for understanding where gstack fits.

What your AI shouldn't know

Gstack represents where most people are in their thinking about AI-assisted development: "I need structured roles for different tasks." That's correct and necessary. The workflow discipline, the browser tooling, the explicit-gear metaphor — all genuinely valuable. The fact that it's open source and spreading is good for the ecosystem.

But the harder question isn't which hat to put on your AI, or "what persona should your AI wear?"

It's "what should the AI not know when it evaluates this work?"

Sources:
gstack — GitHub — Garry Tan's Claude Code skill files (MIT)
·
Large Language Models Hallucination: Comprehensive Survey — arXiv (self-consistency and self-review blind spots)
·
Google Engineering Practices — Code Review — Google

I Finally Have a Team - It Just Happens to be AI

Alexander van Rossum — Sat, 14 Mar 2026 19:25:21 +0000

Eight months ago, I was perpetually behind. On everything.

I don't mean "busy." Busy implies you're making progress on too many things at once. I was making insufficient progress on all of them. React components for a client project. AWS infrastructure governance for another. Kubernetes migrations with hard deadlines. Salesforce automations that needed attention three weeks ago. Each domain had its own language, its own context, its own state — and switching between them wasn't just a time cost. It was a cognitive tax that compounded with every transition.

By 3 PM most days, I wasn't making decisions anymore. I was recovering from the last context switch while dreading the next one.

The email that started it

The first thing I used AI for — really used it, not just experimented — was writing emails. GPT-3.5. I would brain-dump everything I needed to communicate into a chat window — unstructured, grammatically questionable, half-formed thoughts — and get back something I could send after one or two editing passes.

That sounds trivial.

It wasn't.

Email was consuming more cognitive bandwidth than I'd realized. Not the content — the composition. Translating technical context into stakeholder-appropriate language, structuring the message so the key points are at the top, and correcting tone where needed. Every email was a small act of translation, and I was writing dozens a day.

Offloading the composition freed up space I didn't know I was missing. Not a lot — but enough to notice that the constraint wasn't time: it was cognitive bandwidth.

From assistant to collaborator

With GPT-4, ChatGPT got better. I started using it for more than email — rapidly prototyping WordPress plugins, troubleshooting legacy code (especially the undocumented kind, which was most of it), and reasoning through architectural decisions where I needed a second opinion that wasn't going to judge me for asking a question I should probably already know the answer to.

The shift was gradual. The AI went from "a tool I use" to "a collaborator I consult," and the distinction matters. A tool does what you tell it; a collaborator helps you figure out what to tell it. The governance documents I'd started writing — almost accidentally, just experimenting to get consistent output — were turning The Collaborator into something more reliable.

Something that remembered how I think.

The migration that proved it

About six months ago, I had to undertake a significant solo infrastructure migration. Hundreds of containers across multiple environments with a hard deadline driven by external constraints that weren't negotiable.

The responsible estimate for this work — with a team of six experienced engineers — was six to nine months. I had three months.

And I was the team.

Were it not for ChatGPT, KiloCode, Cursor, and later Claude, I would not have been able to complete it. That is not a hyperbolic statement; it is not "it would have been harder." I would literally not have been able to complete the migration within the constraints I was given - while still juggling my "regular work." The project would have failed, or I would have.

Agentic AI enabled me to operate at a scale previously unavailable to a single person. Not because the AI wrote all the code — it didn't. But because it could hold the context of each subsystem, while I focused on the decisions that actually needed a human. The infrastructure state, the dependency graphs, the rollback procedures — the AI held that, so I could hold and refine the strategy.

The case study tells the technical story. The human story is simpler: I shipped it — on time, no less — and I didn't completely burn out doing it. Both of those outcomes were improbable without the tooling.

The Department

After the migration, I fully committed to Claude Code and started building what I now call The Department.

A site architect for this website — layouts, components, editorial, SEO
A sysadmin agent for infrastructure governance
A Project Manager that unifies my communication between Slack and Asana - and keeps me from missing things
An observability bot for monitoring
A content agent
A life-strategy agent
Several agents in charge of writing software, like Actions, Panoptisana, and the Markdown Editor I'm using to write and edit this post
And several more, besides

Each one has a defined role, governance documents, and institutional memory that persists across sessions.

The ability to context-switch without context-switching is the thing I didn't know I needed.

When I need to work on infrastructure, I open the sysadmin agent. It knows the current state of every system I manage and what we did in the last session. It knows the conventions, the constraints, and the things I've told it not to touch. I don't have to reconstruct any of that — I just pick up where I left off.

When I am working on my website, the site architect has the same depth in its domain (as well as LinkedIn). Different context, different conventions, different memory — but the same experience of walking into a room where someone already knows what's going on.

The mental relief is almost too great to put into words. The thing that was destroying me — carrying the state of many different domains in my head simultaneously, losing pieces of each every time I switched — is the thing the agents handle. My working memory is freed for the decisions that actually need my judgment — strategy, architecture, ideation.

Everything else, the agents hold.

The inversion

One of the most significant personal findings of this process is that the output scaled because the cognitive load dropped. Not the other way around.

The conventional model is that more output requires more effort, more tracking, and more stress. You scale by working harder or hiring more people. The cognitive load tracks linearly (or worse) with the output.

The Department inverts — or perhaps, subverts — that: more domains under management, more projects shipping, and perhaps more importantly, the ability to rapidly switch between them without losing momentum. And all of that occurs with less cognitive overhead — because the overhead has been offloaded to agents whose entire job is holding the context I used to carry in my head.

It's not just automation; I'm not replacing tasks I used to do manually — though I certainly do when it makes sense. It's amplification — extending what I can hold and act on simultaneously. The decisions, strategy, and judgment calls are still mine, but the state-tracking, the context-holding, the "where was I?" recovery — that's distributed across a team that doesn't forget, doesn't get tired, and doesn't need me to repeat myself.

Still behind

I'm still behind.

I don't think that will ever change. The scope of what I'm required to do always expands to fill (and slightly exceed) the capacity I have — that's a result of employment and a feature of ambition, not solely a bug in the tooling.

But the texture of "behind" has changed. Eight months ago, behind meant drowning; it meant context switching so fast that I couldn't maintain identity in any single domain. It meant 3 PM cognitive shutdowns and the creeping feeling that I was failing at everything simultaneously.

Now, behind means I have more projects than hours. The state of each one is held by an agent that's ready when I am. The cognitive tax of switching is close to zero. And when I stop for the day, nothing is lost — it's all documented, governed, and waiting for the next session.

And there are still domains that don't have a door to agentic work yet — the ones where the process is opaque, sequential, and offers no meaningful feedback. Try getting a 10DLC campaign approved through Twilio when a denial comes back as "didn't pass" with no further explanation. There's nothing to reason about, nothing to architect. Just guess, resubmit, wait, repeat. Those still run on spite... if I have time.

I'm still behind. But I'm not losing my sanity in the process.

Cognitive Offloading

Alexander van Rossum — Tue, 10 Mar 2026 13:46:37 +0000

I carried a notebook in my back pocket for years. These were ratty little things - usually held together with Gaffers tape. I called it the butt book, because that's where it lived. The idea was simple: whenever something worth remembering surfaced, I'd write it down before it disappeared.

It worked, for capture. The ideas made it onto paper. The crisis of "I just had a thought and now it's gone" happened less often. But the notebooks accumulated, and the ideas inside them became a graveyard. If I remembered to go back and find something — and that's a significant "if" — I still had to locate it, interpret my own handwriting, and reconstruct whatever context made the idea seem worth writing down in the first place.

The capture problem was solved. The retrieval problem never was.

Every system I tried solved the same half of the problem

Evernote. Obsidian. Apple Notes. Todoist. Each one promised a different organizational model — tags, backlinks, smart folders, natural-language reminders. Each one worked for about two weeks, which is roughly how long it takes for a structured environment to get out of whack when you have (undiagnosed!) ADHD and the system requires you to maintain it.

The pattern was always the same: set it up, use it enthusiastically, let it drift, watch the structure collapse under its own weight, abandon it for the next thing. Not because the tools were bad — because they all assumed I'd come back to them. Every system required me to initiate retrieval. To remember that I'd stored something, navigate to where I'd stored it, and find it among everything else I'd stored.

That's three cognitive tasks before you even get to the information you need. For someone whose working memory is the bottleneck, that's three chances to lose the thread.

Notion is the exception, but only because I use it exclusively for school and keep it aggressively structured. Tight scope, rigid templates, no room to drift. It works precisely because I don't let it become a general-purpose system.

So I built one

Before the current wave of AI tools, I built a thing called GetRamble. It has a phone number. I can text it at any time — in line at the grocery store, in the middle of a meeting, at 2am — and OpenAI's API would turn my stream of consciousness into categorized notes.

My kids would ask who "ramble" was because I said it so often: "Hey Siri, text ramble."

It worked. Really well, actually. I was still using it as recently as a few months ago. The capture problem and the categorization problem were both solved — text a rambling thought, get back structured, searchable notes.

But Ramble stalled.

I was building it with a combination of my own work and Replit. Replit couldn't stay sane — the same ungoverned-architecture problem I've since built an entire methodology around solving. Eventually, it became more work to wrangle the features than to get results, and I didn't have the bandwidth to rewrite it myself. Full-time job, school, wife, two kids. The 10DLC compliance burden alone — the regulatory framework for application-to-person messaging — was a part-time job for a one-person team.

I wanted to monetize it. But without capital and a testing cohort, I couldn't release it into the wild. The product was good. The architecture wasn't stable enough to trust — and at the time, I didn't have a word for what was missing. I just knew I couldn't ship something I'd have to maintain at 2 am when it broke in ways I couldn't predict.

Will I finish it? Probably not — I have better tools now. But the experience was formative. It's part of where my governance methodology comes from. I built something that worked, and watched it collapse not because the idea was wrong, but because the system around it couldn't hold.

What changed wasn't the tool — it was the architecture

Claude Code didn't solve the capture problem better than Ramble. It solved a different problem entirely: it made retrieval automatic.

Every previous system — analog or digital, simple or AI-powered — required me to go get the information, remember I'd stored something, navigate to it, and load it back into working memory. Claude Code's governance documents flipped that model. The agent reads its own state at the start of every session. I don't retrieve. The system loads.

That distinction is the whole thing.

The plan exists, it's maintained, it's comprehensive — but it never demands my attention. It's there when I need it and invisible when I don't. I can forget it exists and still follow it, because the system is holding the state, not me.

Three things make this work in practice:

Project-level state persistence. Each project maintains its own context through governance documents. I can revisit any project at any time and get an immediate snapshot — not by reading through files myself, but by asking the agent what's current. The project's memory survives the session boundary because it was designed to.

Rapid idea triage. When an idea surfaces now, I don't write it in a notebook and hope I'll find it later. I spin up a prototype — Excalidraw wireframe, governance templates, a solid directive — and within a single conversation, I know whether the idea has legs. If it does, it gets filed into my project management system with full context attached. If it doesn't, it gets archived cleanly. Either way, it's out of my head and into a system that can hold it without my participation. The cognitive cost of exploring an idea dropped from "a weekend" to "a conversation."

A personal project manager that doesn't require me to manage it. I run a lightweight environment that stores the state of everything I'm tracking — a set of JSON index files with descriptions pointing to full markdown files for detail. No RAG, no vector database. A poor man's index that works because the scope is deliberate and the governance is tight. It started as a scratchpad within another project and became a standalone system when the separation of concerns demanded it.

Choosing what stays in your mind

Cognitive offloading is the deliberate process of choosing what stays in your mind and building systems to handle the rest.

Not productivity hacking. Not "getting organized." Architecture — designed to match how your brain actually operates rather than how productivity systems assume it should.

The butt book was cognitive offloading. Ramble was cognitive offloading. But they were incomplete implementations — they solved capture without solving retrieval, so the offloaded information ended up in cold storage with no mechanism to bring it back when it mattered.

What I'm building now is the complete architecture: capture, categorization, persistence, and automatic retrieval. The information flows out of my head and into governed systems that carry it forward — not just storing it, but delivering it at the right time, in the right context, without requiring me to remember it exists.

The background anxiety lifts. Not because the work is less important, but because I'm no longer the one responsible for holding it all. The system holds it. I think about whatever is actually in front of me.

That's not a productivity gain. That's an architectural change in how I allocate cognitive resources — and it turns out it applies to AI agents the same way it applies to human brains, because the failure modes are structurally identical.

If your system requires you to remember to use it, it's not offloading anything. It's just adding a task.

If you've been building similar systems, I'd love to hear about it.

Cognitive Property: Who Owns the Way You Think?

Alexander van Rossum — Mon, 09 Mar 2026 13:23:14 +0000

AI tools picking up and repeating your habits isn't new. ChatGPT does it by design — it mirrors your tone, adapts to your preferences, and learns what you respond well to. The phenomenon has received copious amounts of screen time and discussion bandwidth.

But something specific happened recently that shifted the way I think about it.

One of my AI instances started using a ◡̈ I put at the end of casual notes, and picked up the → and ← characters I use for bullet points and emphasis in certain contexts. Formatting preferences and structural choices I never explicitly taught — they just started appearing.

Then another instance, working on a completely different project, picked up the same arrow convention independently. Same human, same patterns, different context.

The AI isn't just mirroring my preferences; it's learning to mirror my thinking. And once I noticed that, a harder question followed: if my reasoning patterns are being encoded into a transferable format — documented, structured, portable — then who owns them?

Your cognition is being encoded

If you work deeply with AI tools (and I mean deeply, not "summarize this email" or "write me a cover letter"), you're building something most people haven't named yet.

Repeatable cognitive patterns in plain text.

I don't mean prompt history or chat logs. I mean the governance documents you've created — either intentionally or through organic growth — to define how your AI agents operate. The CLAUDE.md / AGENT.md files that encode your engineering standards, your writing styles, your humor, your architectural preferences, and your coding philosophy. The decision-making frameworks that tell the AI how to prioritize, how to break down problems, and how to structure their thinking in a way that matches yours.

Over time, you've been documenting the way you reason. Not abstractly — specifically. In plain text. In a format that is entirely transferable.

Your operating system, as data

Take those governance documents and feed them to a fresh AI instance. What do you get?

A working version of how you solve problems.

Not a perfect copy, but a functional one. An instance that knows your architectural preferences, your communication style, your quality standards, and your decision-making heuristics. It won't be you, but it will be able to operate like you in ways that are measurably, verifiably close.

That's not a productivity feature. That's a cognitive fingerprint. And the fact that it exists in a format that can be copied, transferred, and scaled changes the conversation about who owns what.

This isn't a new IP question — except it is

The ownership of workplace knowledge has been debated as long as people have changed jobs. U.S. copyright law has a specific mechanism for it — the work-made-for-hire doctrine assigns authorship to the employer when works are created within the scope of employment. You learn skills at a company and take them with you when you leave. Nobody seriously argues that everything you learned becomes corporate property.

But this is different in a specific way: the cognitive pattern isn't just in your head anymore. It's documented. It's structured. It's portable. And it works without you.

Previous generations of knowledge workers left with expertise — hard to quantify, impossible to transfer directly. You leave with expertise AND a governance repo that can reproduce a meaningful chunk of your operations. That's never been possible before.

Cognitive property

People are treating AI personalization like it's a nice-to-have feature. A convenience. "My Claude knows how I like my code structured." Cool, time saver.

It's a lot more than a time saver: it's cognitive property. And right now, the ownership question hasn't even been asked.

If you're building this kind of depth on a corporate AI account, with corporate tools, on company time... the question of who owns those patterns matters a lot more than you think. And the answer, under most current employment agreements, is probably being decided by boilerplate that nobody wrote with cognitive property in mind.

The conversation that needs to happen now

This is a more urgent conversation than AGI governance, and I say that knowing how provocative it sounds. AGI governance matters, and it'll matter more as we get closer. But it's not happening today.

This is happening today. People are building repeatable cognitive patterns in transferable formats. They're externalizing their reasoning into documents that function without them. And most of them haven't thought about who gets to keep it.

That question needs to be asked before it becomes standard practice to assume companies own whatever cognitive patterns emerge from AI tools used on company time.

Legal and policy scholars are already raising these questions about generative AI and intellectual property. But most of that work focuses on model outputs, not on the cognitive patterns of the person doing the work.

The ownership conversation is overdue.

This is part of a four-post series, and the next post starts drawing the boundary.

Sources

Employment law & IP

Understanding the Work Made for Hire Doctrine — Venable LLP. Plain-English explainer of work-for-hire under the Copyright Act of 1976.

AI in the Modern Workplace: Ownership Challenges of AI-Generated Code — Bradley Arant Boult Cummings. Employee use of GenAI does not change that code written in the course of employment belongs to the employer.

AI, Copyright Law, and Work-Made-For-Hire — UCLA Livescu Initiative. Scholarly discussion of how work-for-hire breaks down for AI-generated material.

AI governance & cognitive data

Governance of Generative AI — Policy and Society (Oxford Academic). Survey of IP and data-governance gaps in generative AI, including the need for new ownership frameworks.

Beyond Neural Data: Cognitive Biometrics and Mental Privacy — Magee, Ienca & Farahany, Neuron (2024). Argues that cognitive and behavioral patterns function as uniquely identifying data, extending privacy concerns beyond neural signals.

I Manage AI Agents the Way I Manage Teams

Alexander van Rossum — Thu, 05 Mar 2026 21:08:28 +0000

I run multiple AI agents across several projects. A site architect for my website. A content agent for editorial work. A sysadmin agent for infrastructure. An observability bot for monitoring. Each one has a defined role, documented standards, and clear boundaries.

At some point — I couldn't tell you exactly when — I stopped thinking about this as "using AI tools" and started thinking about it as managing a team. Not in the Silicon Valley "AI teammate" marketing sense. In the actual management sense: the same principles I'd apply to a group of human engineers producing real work under real constraints.

The more I leaned into that framing, the more the system improved. Because it turns out the management disciplines that make human teams effective aren't abstractions. They're operational patterns that apply to AI agents without (much) modification.

Separation of concerns is just a job description

Each agent has a job, and it does that job. The site architect handles the website — layouts, components, performance, SEO, and editorial. I originally separated content into its own agent, but the editorial voice needed enough architectural context that splitting them created more coordination overhead than it saved — so I consolidated. That's the methodology working as designed: the right boundary isn't always more boundaries. The sysadmin agent handles infrastructure — AARs, topology documentation, environment configs. My Project Management Agent manages tasks and responsibilities in Asana.

They don't freelance into each other's domains.

This sounds obvious, but the default approach most people take with AI is the opposite: one chat, one agent, everything. Code review and creative writing and data analysis and debugging, all in the same conversation. It works the way having one employee handle engineering, marketing, and customer support "works." You get output. But you get inconsistent output, because the agent's context is split across too many domains to maintain depth in any of them.

Separation of concerns for AI agents is the same principle as separation of concerns for human teams. Defined roles reduce cognitive load, prevent context pollution, and produce better work — because the agent's entire context window is focused on the domain it's responsible for, not half-occupied by the residue of a different conversation about a different problem.

The loose catch-all still exists. For me, it's the core Claude chat interface — the equivalent of walking over to someone's desk for a quick question that doesn't belong in anyone's formal workflow. Not everything needs a scoped agent. But the work that matters does.

Clear guidelines are just an employee handbook

Every agent has governance documents. CLAUDE.md, ARCHITECTURE.md, ROADMAP.md — the governance layer that defines standards, patterns, boundaries, and institutional memory.

This is onboarding. You wouldn't hire a developer and say "just go build." You'd hand them the style guide, the architecture overview, the deployment process, the list of things not to touch. You'd give them context before expecting output.

AI agents need the same thing — except they need it more, because they can't compensate for missing context the way humans can. A human developer who doesn't know the naming convention will ask a colleague, read the existing code, or make a reasonable guess based on experience. An AI agent without documented conventions will make a different reasonable guess every session. Monday it's camelCase. Tuesday it's snake_case. Wednesday it's whatever it inferred from the three files it happened to read first.

The governance documents aren't overhead. They're the mechanism that produces consistency — the employee handbook that every agent reads at the start of every session, ensuring that today's work is compatible with yesterday's.

Focus and respect are just professionalism

This might surprise people, but it matters: I interact with my agents the way I'd interact with professional colleagues. Focused. Respectful of their time (which in this case means their context window). No off-topic tangents unless the situation genuinely warrants it.

This isn't sentiment. It's practical. Every message in a context window consumes tokens. Off-topic chatter, excessive small talk, or rambling prompts pollute the context with irrelevant information. For a human colleague, that's an interruption that costs focus. For an AI agent, it's worse — it's permanent context noise that degrades every subsequent response in the session.

Respecting the agent's context window is the same principle as respecting an employee's cognitive bandwidth. You wouldn't ask your database architect to weigh in on your marketing copy. You wouldn't CC everyone on every email. The same instinct applies: keep the interaction focused on the domain, and the output stays focused on the domain.

When to hire: the spinoff pattern

The management parallel that convinced me this wasn't just a useful metaphor — but an operational truth — was the first time I had to restructure.

My sysadmin Claude Code instance manages infrastructure context: AARs, topology documentation, and environment configs. Straightforward scope. At some point, I had it build a small Telegram notification bot as a utility — a quick way to monitor the overall health of the systems I am responsible for.

The notification bot worked. Then it proved useful enough that I started expanding it. More alert types, better formatting, scheduling logic, error handling. Before I knew it, the "small utility" had grown into a legitimate standalone project sitting inside an agent whose job description was completely different.

The signal was the same one any team lead recognizes: the context required to do the work well had grown beyond what a single entity could reasonably hold. The agent's CLAUDE.md was bloated with two domains' worth of conventions. Half the context window was consumed by scope that wasn't relevant to whichever task was actually in front of it.

So I did what I'd do with a human team member whose role had quietly split into two distinct jobs: I restructured. New repository. New governance documents. New architecture spec. New agent. The observability bot got its own development track, its own context, its own focused governance. The sysadmin agent went back to doing what it was actually scoped for.

The alternative — and this is the part that maps directly to organizational dysfunction — is letting scope accumulate until the agent is doing five things adequately instead of one thing well. That's not an AI problem. That's a management problem, and every team lead has seen it happen with humans. The person who's in every meeting, owns every escalation, and somehow has three job titles on their email signature. The fix is the same in both cases: increase HR. It's not a performance problem; it's an organizational design problem.

Good management doesn't depend on the medium

The argument I keep coming back to is simple: good management is good management. The medium changes — from a human team to an AI team — but the principles don't.

Defined roles prevent confusion. Documentation prevents context loss. Focus prevents scope creep. Restructuring when the scope outgrows the role prevents degradation.

These aren't AI-specific insights; they're management fundamentals that happen to apply perfectly to AI agents — because the failure modes are structurally identical. An overloaded AI agent degrades the same way an overloaded employee degrades. Not through incompetence, but through insufficient structure around the work.

The people getting inconsistent results from AI aren't writing bad prompts. They're practicing bad management. And the fix isn't a better prompt template or a more capable model. It's the same fix it's always been: clear roles, documented expectations, and the discipline to restructure when the scope outgrows the container.

I Built a Markdown Editor in a Weekend Because Every Other One Annoyed Me

Alexander van Rossum — Mon, 02 Mar 2026 20:51:03 +0000

I didn't plan to build a markdown editor this weekend. I was working on something else, and somewhere in the middle of it I opened my markdown editor to take notes and my annoyance with every markdown editor I've tried finally reached a head.

Not annoyed in the "this is broken" sense. Annoyed in the "why does this app need a cloud account and fourteen features I'll never use" sense. Every alternative I'd tried had the same problem in different packaging — too expensive, too bloated, or too clever.

So I opened Claude Code and started building one.

What I built

Three panes. File browser on the left, editor in the middle, live preview on the right. Tabs for multiple open files. Session restore — close the app, reopen it, everything's still there. Dark mode. Search and replace. A formatting toolbar for the things I always forget the syntax for.

That's it. No cloud sync, no collaboration, no plugin architecture; just markdown files on my computer, edited in a clean interface.

The stack: Electron 33, React 18, CodeMirror 6, marked for GFM rendering, Vite 6 for the build, electron-builder for packaging.

Why the first version was usable in under an hour

Not because AI is magic. Because the AI had context.

I use a governance-first development workflow: before any code gets written, the AI agent has access to persistent architecture documents, wireframes, a personal coding style guide, personal design guidelines, and detailed specifications. These files survive across sessions and context window compaction. The prompt describes what to build. The governance documents describe how to build it, and to what standard.

That's the difference between "a thing that kind of works" and "a thing I'm actually using that same day."

Not perfect on the first pass. But functional enough that I was taking notes in it within the first hour. Then I started tweaking.

(And I wrote this post using it)

The interesting technical bits

Bidirectional scroll sync — The naive approach (percentage-based) breaks immediately when the editor and preview have different content heights. I built section-based anchor mapping instead: the editor and preview each maintain a map of heading positions, and scrolling either pane updates the corresponding pane by interpolating between anchor points. Both directions stay aligned regardless of content length differences.

Smart formatting toolbar — Each button doesn't just apply formatting — it first checks whether the cursor is already inside that formatting and toggles it off. Heading buttons cycle through H1→H2→H3→paragraph. List buttons handle multi-line selections and continue numbering from preceding items. Small details that make the toolbar feel considered rather than tacked on.

External change detection — This was a requirement of the original spec — and when it surfaced later during code review, it surprised me. Edit a file in another app while it's open in the editor, and you get a full diff view showing exactly what changed. Options: keep your version, accept the external changes, or save as a new file. No silent overwrites. I'd completely forgotten I'd even added it until I triggered it accidentally and thought oh, that's actually amazing.

Session restore — Open tabs, active tab, folder path, scroll positions, and window size/position all persist across app restarts. Multi-window support (Cmd+Shift+N), each window preserves its own state. Close the app, open it tomorrow — everything's exactly where you left it.

The security wake-up call

After the features were working, I ran an adversarial code review on the codebase — a separate Claude Code instance with its own repo, its own governance documents, read-only access to the target code, and zero shared context with the building agent. Its only job is to find everything wrong.

What it found was embarrassing in the best way.

The worst finding wasn't missing security — it was the ceremony of security without the substance. contextBridge, contextIsolation: true, proper cleanup functions — all present, all technically correct, and all masking a straight pipeline from a malicious .md file to arbitrary filesystem access. The sandbox: false with a wrong justification comment was the cherry on top.

It's exactly the kind of thing that survives review after review because it sounds right, and nobody actually traces the dependency to verify it.

Specific fixes:

XSS prevention — DOMPurify sanitizes all markdown before rendering in the preview pane
Sandbox enabled — Chromium sandbox and context isolation enforced on all windows
Filesystem access control — path validation limits access to home directory and /Volumes; sensitive directories (.ssh, .gnupg, .aws) blocked
Path traversal protection — local-resource:// protocol restricted to image file extensions
Content Security Policy — tightened CSP on settings and update dialogs
URL scheme allowlisting — shell.openExternal limited to https://, http://, mailto:

Twenty-plus security fixes across eleven versions. This is the part that concerns me about the current wave of AI-generated code shipping without independent review — technically functional apps with exploitable security models, because the same agent that writes the code is also the only one evaluating the code.

Thirty-one versions in two days

v0.1.0 to v0.1.31 in a weekend, and not because I was rushing — because the governance-first pattern means each feature lands cleanly, gets tested, gets committed, and the next one starts from solid ground.

The app is signed and notarized with Apple, auto-updates from GitHub Releases, handles file associations (shows up in Finder's "Open With" menu for .md, .markdown, .mdx, .txt files), and restores all windows with their tabs and folder paths on relaunch.

What it deliberately doesn't do

No cloud sync. No collaboration. No Vim mode. No WYSIWYG. No plugin system. No account creation. No subscription. No telemetry.

Every markdown editor eventually tries to become a knowledge management platform. This one won't. The filesystem is the organizational layer. Git is the version control. Markdown is the format — portable, readable, owned by you. The editor just makes working with those files fast and pleasant.

Your files are plain markdown on disk. Open them with anything, anywhere, forever.

Try it

It's still beta - v0.1.x - but it's a functional beta. Are there problems? Probably, and I'll find them while dogfooding. But it's satisfying my use case, and that's good enough for now.

The code is on GitHub — MIT licensed, macOS only. Grab the .dmg from Releases. Signed and notarized, no Gatekeeper warnings. macOS 12+ required, Apple Silicon supported.

If you live in markdown and every editor you've tried wants to be something it shouldn't be — this one doesn't.