<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alexander van Rossum</title>
    <description>The latest articles on DEV Community by Alexander van Rossum (@avanrossum).</description>
    <link>https://dev.to/avanrossum</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3794776%2F47898777-b8cf-4ccf-9c55-e5ecffb69f37.jpeg</url>
      <title>DEV Community: Alexander van Rossum</title>
      <link>https://dev.to/avanrossum</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/avanrossum"/>
    <language>en</language>
    <item>
      <title>I Used WordPress for 20 Years and I Was Wrong</title>
      <dc:creator>Alexander van Rossum</dc:creator>
      <pubDate>Mon, 30 Mar 2026 13:17:00 +0000</pubDate>
      <link>https://dev.to/avanrossum/i-used-wordpress-for-20-years-and-i-was-wrong-53hh</link>
      <guid>https://dev.to/avanrossum/i-used-wordpress-for-20-years-and-i-was-wrong-53hh</guid>
      <description>&lt;p&gt;I started building websites on WordPress around 2005. I was also using Joomla at the same time — which, if you've ever used it, explains why WordPress won that particular contest quickly and decisively.&lt;/p&gt;

&lt;p&gt;For twenty years, WordPress was the answer. Personal sites, corporate sites, everything in between. It had plugins for anything you could imagine, a theme for every aesthetic, and a community that could solve any problem you ran into. For a long time, it genuinely worked.&lt;/p&gt;

&lt;p&gt;Until it didn't. And it wasn't all at once, not a dramatic failure. It was more like twenty years of paper cuts that finally bled out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Smart Fridge Problem
&lt;/h2&gt;

&lt;p&gt;WordPress carries the weight of everything it &lt;em&gt;can&lt;/em&gt; do, whether you need it or not.&lt;/p&gt;

&lt;p&gt;The admin dashboard is a full application. Login system, user management, media library, plugin architecture, database abstraction, REST API, cron jobs — all of it running on every page load, for every visitor, whether your site needs any of it or not. For most sites — and I mean the vast majority — none of that matters. Your marketing site doesn't need a login system. Your blog doesn't need a database. Your landing page doesn't need a REST API.&lt;/p&gt;

&lt;p&gt;But you're paying for all of it in server resources, attack surface, and complexity.&lt;/p&gt;

&lt;p&gt;Page builders make it worse. Elementor, Divi, WPBakery — they ship every capability they offer to every page, regardless of what you actually use. Need a simple two-column layout? Here's 400KB of JavaScript that also handles parallax scrolling, animated counters, and particle effects. Just in case.&lt;/p&gt;

&lt;p&gt;It's like buying a refrigerator with a built-in screen that tracks your grocery inventory and auto-orders milk when you're running low. If you don't use grocery delivery services — and most people don't — you just paid an extra $800 for a screen that collects fingerprints and shows a fancy animation when you get water. The fridge still keeps things cold; the cold part was never the problem.&lt;/p&gt;

&lt;p&gt;And underneath all of this: PHP. In 2026, the entire ecosystem still runs on PHP. It works, the way a lot of things that are decades old still work. But the gap between what's possible now and what PHP was originally designed for gets wider every year.&lt;/p&gt;

&lt;h2&gt;
  
  
  The PageSpeed Insight
&lt;/h2&gt;

&lt;p&gt;I'm obsessive about PageSpeed Insights scores. They're a proxy for the thing that actually matters — how your site feels to real people on real connections.&lt;/p&gt;

&lt;p&gt;My best WordPress score — ever, across twenty years — was a 97. Desktop only.&lt;/p&gt;

&lt;p&gt;Getting there required a minimal theme (Twenty Twenty-One), Gutenberg blocks instead of a page builder, three plugins total, hours of manual optimization, server-level OPCache configuration, and Cloudflare caching. One wrong plugin update could knock ten points off overnight, and it &lt;em&gt;still&lt;/em&gt; had intermittent issues.&lt;/p&gt;

&lt;p&gt;That was the ceiling. On a good day, with everything perfectly tuned (for hours), 97.&lt;/p&gt;

&lt;p&gt;Here's a WordPress site I know well. It's hosted on WPEngine — premium managed hosting. It's had dozens of hours of professional optimization work. It runs a page builder that specifically markets performance as a feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mobile: 41 / 86 / 77 / 85. Desktop: 55 / 84 / 77 / 92.&lt;/strong&gt; (That's Performance / Accessibility / Best Practices / SEO, the four PageSpeed Insights categories.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rtao8inr3ibcqo9ce6q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rtao8inr3ibcqo9ce6q.png" alt="WordPress PageSpeed Insights — Mobile: 41 Performance" width="800" height="968"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fob01n4g6mh7yvzrr40g4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fob01n4g6mh7yvzrr40g4.png" alt="WordPress PageSpeed Insights — Desktop: 55 Performance" width="800" height="968"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here's my personal portfolio site (&lt;a href="https://mipyip.com" rel="noopener noreferrer"&gt;mipyip.com&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mobile: 99 / 95 / 100 / 100. Desktop: 100 / 95 / 100 / 100.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhpwzap67wkozw2xxbr1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhpwzap67wkozw2xxbr1.png" alt="Astro PageSpeed Insights — Mobile: 99 Performance" width="800" height="968"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5b4iqk5nsofasm1pv2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5b4iqk5nsofasm1pv2a.png" alt="Astro PageSpeed Insights — Desktop: 100 Performance" width="800" height="968"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No optimization heroics. No caching plugins. No CDN tricks. Those scores showed up on the first deploy and stayed there. The framework just... ships fast HTML.&lt;/p&gt;

&lt;p&gt;Look at those numbers side by side, and you'll reach the same conclusion I did: This isn't a tuning problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Astro
&lt;/h2&gt;

&lt;p&gt;The framework I ultimately switched to is &lt;a href="https://astro.build/" rel="noopener noreferrer"&gt;Astro&lt;/a&gt;. Static output, zero JS by default, Markdown-native.&lt;/p&gt;

&lt;p&gt;Three things made it the right fit:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero JavaScript by default.&lt;/strong&gt; Astro ships no client-side JavaScript unless you explicitly add it. Most marketing sites don't need JavaScript at all — they're documents, not applications. Astro treats them that way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Markdown as a first-class content format.&lt;/strong&gt; Posts live as plain-text files in the repository, not rows in a database, and Astro renders them natively: no importer, no plugin, no export lock-in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Framework-agnostic.&lt;/strong&gt; Astro doesn't force you into React, Vue, or any other JavaScript framework. You can use them if you want. You can also use none of them. For a marketing site that's mostly content, "none" is the right answer.&lt;/p&gt;

&lt;p&gt;That last point is what delivers the PageSpeed scores. No framework overhead means no framework tax on every page load.&lt;/p&gt;

&lt;h2&gt;
  
  
  Markdown Is How I Already Think
&lt;/h2&gt;

&lt;p&gt;I'd been writing in Markdown for years before I built this site. Outline, Notion, Obsidian — every notes tool I've used in the last decade speaks Markdown natively. My project documentation is Markdown. My random notes are Markdown. The styling shortcuts are second nature at this point — I see the content formatted when I see the symbols. I don't need a visual preview to know what a &lt;code&gt;##&lt;/code&gt; header or a &lt;code&gt;**bold phrase**&lt;/code&gt; looks like rendered.&lt;/p&gt;

&lt;p&gt;Markdown files are absurdly portable. Plain text that any parser can consume, any system can render, any tool can index. They load instantly. They version-control perfectly. They'll be readable in fifty years because they're just text.&lt;/p&gt;
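
&lt;p&gt;To make "any tool can index" concrete, here's a hypothetical sketch (the function name and file layout are mine, not from any real tool): a few lines of Python that build a table of contents across a folder of Markdown files. No database, no export step, just text.&lt;/p&gt;

```python
# Hypothetical sketch: because posts are plain text, a ten-line script can
# index an entire site's section headers with nothing but the stdlib.
import re
from pathlib import Path

def index_headers(folder: str) -> dict:
    """Map each markdown file to the list of its '##' section headers."""
    toc = {}
    for md in Path(folder).glob("**/*.md"):
        headers = re.findall(r"^## (.+)$", md.read_text(), flags=re.M)
        toc[str(md)] = headers
    return toc
```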

&lt;p&gt;The content format was never the problem. The &lt;em&gt;tooling around it&lt;/em&gt; was the problem. Before AI, maintaining a site built from Markdown files meant manually writing templates, building components, managing routing, handling image optimization — the kind of tedious infrastructure work that made WordPress's "just install a plugin" approach genuinely appealing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI Multiplier
&lt;/h2&gt;

&lt;p&gt;I work with a Claude Code agent that has full context on this site's codebase — the architecture, the design standards, the content strategy, the voice. We work in the source files simultaneously.&lt;/p&gt;

&lt;p&gt;This blog post is a good example of what that workflow looks like:&lt;/p&gt;

&lt;p&gt;I had a seven-word idea: "blog post about why I love Astro." The agent created the notes file. Then it interviewed me — one question at a time, conversational, pulling out details I wouldn't have thought to include in an outline. It compiled the raw interview into structured notes. I reviewed, added context, corrected emphasis. It drafted. I edited. We polished the final version side by side in &lt;a href="https://github.com/avanrossum/sidemark" rel="noopener noreferrer"&gt;SideMark&lt;/a&gt; — a Markdown editor I built specifically for this kind of collaborative workflow.&lt;/p&gt;

&lt;p&gt;The whole pipeline — from idea to draft with images — takes a fraction of what it used to. The bottleneck is my thinking speed, not my typing speed.&lt;/p&gt;

&lt;p&gt;None of that workflow is possible with WordPress. You can't point an AI agent at a WordPress database and say "work with me on this post." But you &lt;em&gt;can&lt;/em&gt; point it at a folder of Markdown files with a clear architecture document and watch it understand the entire system in seconds. Add a local semantic memory layer like &lt;a href="https://github.com/avanrossum/pmem-project-memory-tool-for-claude-code" rel="noopener noreferrer"&gt;pmem&lt;/a&gt; and the agent can recall decisions, patterns, and context from months of previous sessions — no re-explaining needed.&lt;/p&gt;

&lt;p&gt;The combination — Markdown content, static site generator, AI-assisted development — turns "I should update my website" from a weekend project into a Tuesday morning. Some posts on this site went from idea to published — with images &lt;em&gt;and&lt;/em&gt; scheduled LinkedIn posts — in under twenty minutes. And they're still &lt;em&gt;my&lt;/em&gt; words, &lt;em&gt;my&lt;/em&gt; thinking. The combination of Astro and AI just removed the friction between having something to say and saying it.&lt;/p&gt;

&lt;h2&gt;
  
  
  "But My Ecommerce Is on WordPress"
&lt;/h2&gt;

&lt;p&gt;I can already hear it. "My store runs on WooCommerce. I can't separate my marketing site from my ecommerce."&lt;/p&gt;

&lt;p&gt;You can, and you should.&lt;/p&gt;

&lt;p&gt;Your marketing pages and your ecommerce platform have fundamentally different performance requirements. Marketing pages need to be fast — fast enough that Google ranks them, fast enough that visitors don't bounce, fast enough that your PageSpeed scores aren't embarrassing when a potential client checks. Ecommerce pages need to be functional — cart logic, payment processing, inventory management, user accounts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bundling them together means your marketing pages carry the weight of your ecommerce platform on every load.&lt;/strong&gt; Your beautiful landing page is &lt;em&gt;slower because it's sharing infrastructure with your checkout flow.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Keep your ecommerce where it works — WooCommerce, Shopify, whatever you've built. Put it in a subdirectory. Build your marketing site as a separate, blazing-fast static site, and surface product data, cart state, and key behaviors to the marketing side through JavaScript and session management. To the visitor, it looks seamless. Under the hood, each part is optimized for what it actually needs to do.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/avanrossum/i-manage-ai-agents-the-way-i-manage-teams-1hgm"&gt;Separation of concerns&lt;/a&gt; — it's an engineering principle that applies to site architecture just as well as it applies to application code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Twenty Years Is a Long Time to Be Wrong
&lt;/h2&gt;

&lt;p&gt;WordPress powers a huge portion of the web and it does what it does. I'm not here to bury it (and I wouldn't want to if I could). But after twenty years of paper cuts, accumulated complexity, and a performance ceiling that required heroic effort to approach — I had to ask myself whether I was still using it because it was the right tool, or because it was the familiar one.&lt;/p&gt;

&lt;p&gt;The willingness to evaluate your tools honestly — even the ones you've invested decades in — is the difference between building systems that serve you and serving systems you've already built.&lt;/p&gt;

</description>
      <category>wordpress</category>
      <category>webdev</category>
      <category>astro</category>
      <category>performance</category>
    </item>
    <item>
      <title>I Built a Local RAG for Claude Code: Semantic Search Over Your Own Project</title>
      <dc:creator>Alexander van Rossum</dc:creator>
      <pubDate>Thu, 26 Mar 2026 15:08:23 +0000</pubDate>
      <link>https://dev.to/avanrossum/i-built-a-local-rag-for-claude-code-semantic-search-over-your-own-project-4gle</link>
      <guid>https://dev.to/avanrossum/i-built-a-local-rag-for-claude-code-semantic-search-over-your-own-project-4gle</guid>
      <description>&lt;p&gt;More than five hundred markdown files.&lt;/p&gt;

&lt;p&gt;That's what one of my projects has, and it's not even the largest (that clocks in at almost 2,500!). ROADMAP.md, ARCHITECTURE.md, CLAUDE.md, CHANGELOG.md, task folders with notes and lessons learned, editorial notes, half-complete drafts, memory files from past sessions. Each one holds a piece of the project's history — a decision, a rationale, a thing that broke and how it got fixed.&lt;/p&gt;

&lt;p&gt;Claude Code can't see any of it unless I point it at the right file — or it reads them on its own, burning tokens on retrieval before the real work starts.&lt;/p&gt;

&lt;p&gt;Claude Code isn't completely amnesiac — it has session memory, it reads CLAUDE.md, and with the right governance documents it can recover a lot of context at session start. For smaller projects, that's enough. But once you're past a few dozen files of accumulated institutional knowledge, the gap between "what the agent can reasonably read at startup" and "what the project actually knows" grows wider every week.&lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://github.com/avanrossum/pmem-project-memory-tool-for-claude" rel="noopener noreferrer"&gt;pmem&lt;/a&gt; — a local RAG that gives Claude Code semantic search over your project's full history. No external APIs. No data leaves your machine. Setup in two minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;I ran the same query — "identify governance-related blog posts" — both ways on a project with 500+ markdown files:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;pmem (index-based)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Fresh search (Explore agent)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;18 posts&lt;/td&gt;
&lt;td&gt;11 posts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~20 seconds&lt;/td&gt;
&lt;td&gt;~90 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~5,500&lt;/td&gt;
&lt;td&gt;~20,000–24,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The fresh search cost roughly 4× the tokens (cries in tokens) and found 7 fewer posts. The posts it missed were the ones where governance was a supporting theme rather than the headline — exactly the kind of semantic connection that keyword search can't make.&lt;/p&gt;

&lt;p&gt;The agent's overhead — its own system prompt, tools, multi-step reasoning — is the hidden cost. It's worth it for open-ended exploration, but for a targeted retrieval question, the index was both cheaper and more thorough.&lt;/p&gt;

&lt;h2&gt;
  
  
  The prompt that built it
&lt;/h2&gt;

&lt;p&gt;Before I show the architecture, I want to show two prompts — because the contrast illustrates something about working with AI agents that I think a lot of people miss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vague prompt:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I want to give agents better memory."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This goes nowhere useful. No constraints, no architecture, no scope. The agent could build anything from a flat JSON file to a Kubernetes-deployed vector database with a React frontend. It would probably pick something in the middle and spend four hours building infrastructure you didn't need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The prompt I actually used&lt;/strong&gt; (simplified for readability):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I need to enhance the memory capabilities of Claude Code. Since I use Claude Code for more than just writing code — managing tasks, building documentation, maintaining infrastructure — I can generate thousands of files and folders. While they do get archived regularly, digging through them is a token and time sink, and can sometimes prove inaccurate, especially with larger projects.&lt;/p&gt;

&lt;p&gt;We will use Ollama embeddings and build a RAG that the agent can use to query the entire project's files.&lt;/p&gt;

&lt;p&gt;The tool must also be able to connect to a local LLM (optional) in order to further reduce token usage when parsing results.&lt;/p&gt;

&lt;p&gt;For now, we are going to be focused on TXT and MD files, and will expand as needed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference isn't length. It's that the second prompt is the output of a finished discovery phase. It names the problem, specifies the technology, defines the integration point, sets constraints, and draws an explicit scope boundary. The agent doesn't need a better prompt template. It needs you to finish thinking before you start asking.&lt;/p&gt;

&lt;h2&gt;
  
  
  What pmem does
&lt;/h2&gt;

&lt;p&gt;The flow is simple: Claude asks a question, pmem finds the answer in your project's files, and returns it with source citations.&lt;/p&gt;

&lt;p&gt;Under the surface:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Indexing.&lt;/strong&gt; &lt;code&gt;pmem index&lt;/code&gt; walks your project's markdown and text files, splits them into semantic chunks using header-aware parsing (a section stays with its heading), and embeds each chunk locally using &lt;code&gt;nomic-embed-text&lt;/code&gt; via Ollama. Chunks are stored in ChromaDB, a file-based vector database that requires no server process. Indexing is incremental — SHA-256 hashes track which files changed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Querying.&lt;/strong&gt; Claude calls the &lt;code&gt;memory_query&lt;/code&gt; MCP tool with a natural language question. pmem embeds the question, searches the vector store for semantically similar chunks using &lt;a href="https://docs.trychroma.com/docs/collections/configure#distance-function" rel="noopener noreferrer"&gt;cosine similarity&lt;/a&gt; (ChromaDB's default), and returns results with source paths and relevance scores. Optionally, a local LLM synthesizes the chunks into a concise answer before returning it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Session rituals.&lt;/strong&gt; Three slash commands turn memory into a workflow: &lt;code&gt;/welcome&lt;/code&gt; refreshes the index at session start. &lt;code&gt;/sleep&lt;/code&gt; captures changes at session end. &lt;code&gt;/reindex&lt;/code&gt; refreshes mid-session. The index stays current because maintaining it is a side effect of the session workflow, not a separate chore.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
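
&lt;p&gt;The incremental part of step 1 can be sketched in a few lines. This is illustrative code, not pmem's internals (the function and state-file names are mine): hash every file, diff against the hashes saved last run, and re-embed only what changed.&lt;/p&gt;

```python
# Hypothetical sketch of incremental indexing: SHA-256 each file, compare
# against the previous run's digests, and report only the changed paths.
import hashlib
import json
from pathlib import Path

def changed_files(folder: str, state_file: str) -> list:
    """Return paths of markdown files that changed since the last call."""
    state = Path(state_file)
    old = json.loads(state.read_text()) if state.exists() else {}
    new, dirty = {}, []
    for f in sorted(Path(folder).glob("**/*.md")):
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        new[str(f)] = digest
        if old.get(str(f)) != digest:
            dirty.append(str(f))
    state.write_text(json.dumps(new))  # persist digests for the next run
    return dirty
```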

&lt;p&gt;No data leaves your machine. No API keys required for core functionality. The entire system runs on Ollama, ChromaDB, and Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture decisions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;No LangChain.&lt;/strong&gt; Not out of ideology — out of simplicity. pmem is around 2,000 lines of Python. The RAG pipeline is: embed → store → search → (optionally) synthesize. Four operations don't need a framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ChromaDB over everything else.&lt;/strong&gt; File-based, no server process, persistent. I considered LanceDB but never formally evaluated it — ChromaDB was already working and the evaluation wasn't worth the detour. I also considered plain JSON with numpy cosine similarity, which works for small projects but &lt;a href="http://ann-benchmarks.com" rel="noopener noreferrer"&gt;doesn't scale&lt;/a&gt; — brute-force linear scan is O(n) per query. ChromaDB hit the sweet spot: real vector search without operational overhead.&lt;/p&gt;
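
&lt;p&gt;For contrast, here's roughly what the "plain JSON with numpy" alternative looks like, as a hypothetical sketch: brute-force cosine similarity is a single matrix-vector product, but it touches every stored vector on every query — the O(n) scan that made me reach for a real vector store instead.&lt;/p&gt;

```python
# Hypothetical sketch of brute-force cosine search: normalize, take dot
# products against every row, sort. Simple, exact, and O(n) per query.
import numpy as np

def top_k(query: np.ndarray, store: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k rows of `store` most similar to `query`."""
    q = query / np.linalg.norm(query)
    s = store / np.linalg.norm(store, axis=1, keepdims=True)
    sims = s @ q                   # cosine similarity, one scan of all rows
    return np.argsort(-sims)[:k]   # best match first
```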

&lt;p&gt;&lt;strong&gt;Header-aware chunking.&lt;/strong&gt; Most RAG tutorials split text by character count. That destroys semantic units. A section titled "Why we chose CloudFront over Fastly" that gets split between two chunks loses meaning in both. pmem uses markdown headers as natural split points, with a size-based fallback for sections that are too long. The heading becomes metadata on each chunk, so search results carry their context.&lt;/p&gt;
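
&lt;p&gt;The core of header-aware chunking fits in a short function. A hypothetical sketch, not pmem's actual splitter (which adds the size-based fallback on top): split on markdown headers so each chunk carries its heading as metadata.&lt;/p&gt;

```python
# Hypothetical sketch: split markdown at headers so each chunk keeps its
# heading, instead of cutting blindly on character counts.
import re

def chunk_by_headers(md: str) -> list:
    """Return (heading, body) pairs, one per markdown section."""
    chunks, heading, body = [], "(preamble)", []
    for line in md.splitlines():
        m = re.match(r"^#{1,6} (.*)$", line)
        if m:
            if body:  # close out the previous section
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = m.group(1), []
        else:
            body.append(line)
    if body:
        chunks.append((heading, "\n".join(body).strip()))
    return chunks
```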

&lt;p&gt;&lt;strong&gt;CWD walk-up for project detection.&lt;/strong&gt; Same pattern git uses — walk up until you find a &lt;code&gt;.memory&lt;/code&gt; directory. &lt;code&gt;pmem init&lt;/code&gt; creates it, and from that point forward, any subdirectory just works.&lt;/p&gt;
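
&lt;p&gt;The walk-up itself is the same handful of lines git-style tools have used forever. A hypothetical sketch of the pattern (names are mine, not pmem's code):&lt;/p&gt;

```python
# Hypothetical sketch of git-style project detection: climb from the
# current directory toward the filesystem root until `.memory` appears.
from pathlib import Path

def find_project_root(start: str = ".") -> Path:
    here = Path(start).resolve()
    for candidate in [here, *here.parents]:
        if (candidate / ".memory").is_dir():
            return candidate
    raise FileNotFoundError("no .memory directory found; run `pmem init` first")
```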

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Prerequisites: Python 3.11+, Ollama running locally, and the &lt;code&gt;nomic-embed-text&lt;/code&gt; model pulled.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pmem-project-memory
ollama pull nomic-embed-text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initialize any project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~/your-project
pmem init
pmem index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install the session skills:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pmem install-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register the MCP server in &lt;code&gt;~/.claude.json&lt;/code&gt; (global) or &lt;code&gt;.mcp.json&lt;/code&gt; (per-project). The &lt;a href="https://github.com/avanrossum/pmem-project-memory-tool-for-claude" rel="noopener noreferrer"&gt;README&lt;/a&gt; has the exact config block.&lt;/p&gt;
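
&lt;p&gt;For orientation, a Claude Code MCP entry generally takes this shape — but the command and args below are placeholders, not pmem's actual invocation; copy the real block from the README:&lt;/p&gt;

```json
{
  "mcpServers": {
    "pmem": {
      "command": "your-pmem-server-command",
      "args": ["--placeholder-arg"]
    }
  }
}
```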

&lt;p&gt;First index takes a few seconds for small projects, up to a minute for large ones. After that, incremental indexing only re-embeds changed files — typically under a second.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/avanrossum/pmem-project-memory-tool-for-claude" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;⭐ Star pmem on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Phase 2 is mostly complete: &lt;code&gt;pmem watch&lt;/code&gt; for auto-reindexing, global config defaults, one-command skill installation, better error messages. Phase 3 is where it gets interesting — multi-collection support, non-markdown file support with language-aware chunking, optional image processing, and &lt;code&gt;pmem diff&lt;/code&gt; to show how answers change over time.&lt;/p&gt;

&lt;p&gt;The tool is open source, MIT licensed. It exists because I needed it, and I suspect anyone running Claude Code on a project with more than a few dozen files needs it too.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://docs.trychroma.com/docs/collections/configure#distance-function" rel="noopener noreferrer"&gt;ChromaDB — Distance Functions&lt;/a&gt; · &lt;a href="http://ann-benchmarks.com" rel="noopener noreferrer"&gt;ANN Benchmarks&lt;/a&gt; (Aumüller, Bernhardsson &amp;amp; Faithfull)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>The Green-Light Problem</title>
      <dc:creator>Alexander van Rossum</dc:creator>
      <pubDate>Mon, 23 Mar 2026 12:34:00 +0000</pubDate>
      <link>https://dev.to/avanrossum/the-green-light-problem-5hfj</link>
      <guid>https://dev.to/avanrossum/the-green-light-problem-5hfj</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;In this post:&lt;/strong&gt;&lt;br&gt;
A green light with unresolved checkpoints isn't a recommendation. It's a liability. This post covers the anatomy of premature platform migration recommendations, the 'not a blocker' trap, and why stage-gated validation saves money.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A strategy document lands on a client's desk. It recommends a major platform migration. The tone is confident. The structure is logical. The recommendation is clear: move forward.&lt;/p&gt;

&lt;p&gt;Buried on page three, in qualified language, are a handful of caveats. The primary integration hasn't been validated with the client's actual data. The connector vendor hasn't been vetted beyond a cursory website scan, and a core data system that powers the existing workflow isn't mentioned at all. The timeline assumes everything works on the first try.&lt;/p&gt;

&lt;p&gt;The client's leadership reads the document and sees a green light. The technical team reads the same document and sees open questions. &lt;/p&gt;

&lt;p&gt;The gap between those two readings is where six-figure mistakes live.&lt;/p&gt;

&lt;p&gt;If you've ever been three months into a build when someone discovered the integration doesn't actually work, you've been on the receiving end of this gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The anatomy of a premature recommendation
&lt;/h2&gt;

&lt;p&gt;It's not malicious. It's structural.&lt;/p&gt;

&lt;p&gt;Someone does the real technical analysis. They flag risks, identify dependencies, note unresolved questions, and recommend a staged approach: validate the critical assumptions before committing to a full build. The analysis is honest about what's known and what isn't.&lt;/p&gt;

&lt;p&gt;Then the analysis gets polished for client consumption, and the risks are softened. "This is an unvalidated dependency that could change the entire architecture" becomes "this will require thoughtful implementation." The staged approach gets flattened into a single "we recommend moving forward." The hard questions get cut because they might make the recommendation look uncertain.&lt;/p&gt;

&lt;p&gt;The polished version isn't wrong, &lt;em&gt;exactly&lt;/em&gt;. Everything in it is technically true. But it's selectively true in a way that systematically favors proceeding. The caveats might be present but soft; the confidence is high but unearned. And the client, who is paying for expert guidance on a decision they can't evaluate themselves, reads the document at face value.&lt;/p&gt;

&lt;p&gt;That's The Green-Light Problem. Not a bad recommendation, but premature; a conclusion delivered before the evidence supports it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "not a blocker" trap
&lt;/h2&gt;

&lt;p&gt;There's a specific phrase that shows up in these documents. It sounds reasonable and is often catastrophic:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Not a blocker."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An integration with an unvetted vendor, connecting a legacy ERP system to a modern platform, handling thousands of customer-specific pricing matrices and complex approval workflows. The vendor's website has a case study. A sales engineer said it works. Nobody has tested it against the client's actual data, actual edge cases, or actual transaction volume.&lt;/p&gt;

&lt;p&gt;"Not a blocker."&lt;/p&gt;

&lt;p&gt;If that integration fails, the entire architecture changes. The timeline doubles, the budget triples, and the client is three months into a build when they discover that the foundation, around which the whole project was designed, doesn't hold weight.&lt;/p&gt;

&lt;p&gt;Calling an unvalidated dependency "not a blocker" before vetting it is &lt;a href="https://thedecisionlab.com/biases/optimism-bias" rel="noopener noreferrer"&gt;optimism bias&lt;/a&gt; dressed up as a technical assessment. It's the kind of language that makes strategy documents read well, and post-mortems read badly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "validated" actually means
&lt;/h2&gt;

&lt;p&gt;There's a meaningful difference between "we believe this will work" and "we've proven this works." Strategy documents routinely conflate the two.&lt;/p&gt;

&lt;p&gt;Validation is not: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The vendor says it works&lt;/li&gt;
&lt;li&gt;We found a case study on their website&lt;/li&gt;
&lt;li&gt;It works in a demo environment with sample data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Validation is: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We tested the specific integration with the client's actual data and edge cases in conditions that resemble the production environment&lt;/li&gt;
&lt;li&gt;We documented what happened.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Validation costs money and takes time. It delays the exciting part of the project (the build) in favor of the boring part (the proof). And it is the single most valuable thing a technical advisor can recommend before a six-figure commitment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The timeline cascade
&lt;/h2&gt;

&lt;p&gt;A premature green light doesn't just risk a bad outcome. It creates a compounding timeline problem.&lt;/p&gt;

&lt;p&gt;When the project starts with unvalidated assumptions, the team builds on those assumptions for weeks or months. When one of them turns out to be wrong (and they do, regularly, because that's what "unvalidated" means), the timeline doesn't shift by the time it takes to fix the problem. It shifts by the time it takes to fix the problem &lt;em&gt;plus&lt;/em&gt; the time spent building on the assumption that turned out to be wrong &lt;em&gt;plus&lt;/em&gt; the time spent unwinding the work that depended on it.&lt;/p&gt;
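&lt;p&gt;As a back-of-the-envelope sketch (the week counts below are hypothetical, not drawn from any real project), the cascade looks like this:&lt;/p&gt;

```python
# Hypothetical numbers: how one unvalidated assumption cascades into slip.
def cascade_slip(fix_weeks, weeks_built_on_assumption, unwind_weeks):
    """Total slip = fixing the problem + redoing the work that was built
    on the bad assumption + unwinding everything that depended on it."""
    return fix_weeks + weeks_built_on_assumption + unwind_weeks

validation_phase = 2  # weeks spent proving the integration up front

slip = cascade_slip(fix_weeks=3, weeks_built_on_assumption=6, unwind_weeks=4)
print(slip)                    # 13 weeks: roughly the "three-month correction"
print(slip - validation_phase) # 11 weeks net saved by validating first
```

&lt;p&gt;The exact values don't matter; the structure does. The slip is never just the fix.&lt;/p&gt;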

&lt;p&gt;A two-week validation phase at the beginning can prevent a three-month correction in the middle. The math is simple; the psychology isn't: the validation phase feels like a delay, and the correction feels like bad luck.&lt;/p&gt;

&lt;p&gt;It's not bad luck, but the entirely predictable consequence of skipping validation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The document problem
&lt;/h2&gt;

&lt;p&gt;A well-crafted strategy document can make an unvalidated recommendation look validated. The formatting is professional. The sections follow a logical structure. The language is measured and confident. If you didn't have the technical context to evaluate the claims, you'd read it and feel reassured.&lt;/p&gt;

&lt;p&gt;The people making the platform decision often don't have the technical context. That's why they hired advisors. And when the advisory document systematically smooths over the rough edges, the client loses access to the information they need to make an informed decision.&lt;/p&gt;

&lt;p&gt;This isn't about incompetence. It's about incentives. PMI research on &lt;a href="https://www.pmi.org/learning/library/optimism-bias-terminate-failing-projects-3779" rel="noopener noreferrer"&gt;optimism bias in project delivery&lt;/a&gt; shows that the dilution of risk reporting is one of the most common failure modes in status communication. The path of least resistance is always "we recommend proceeding." Clients want to hear yes. Teams want to move forward. Leadership wants progress. The person who adds a gate and says, "Wait, we haven't validated this yet," is often treated as an obstacle to progress rather than as someone protecting the investment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix is boring, and it works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.smartsheet.com/phase-gate-process" rel="noopener noreferrer"&gt;Stage-gated recommendations&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That's it.&lt;/p&gt;

&lt;p&gt;"We believe this platform is a viable path for your requirements. Before committing to a full build, we recommend a validation phase. Here's what we'll test, here's what it costs, and here's the criteria we'll use to decide whether to proceed."&lt;/p&gt;

&lt;p&gt;That's not hedging. That's risk management. And the client who hears "we want to prove this works before you spend six figures on it" will trust you more, not less, because you're clearly prioritizing their outcome over your timeline.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The most expensive sentence in any strategy document is "we recommend moving forward" — when the analysis it's based on isn't finished yet.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Sources:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://thedecisionlab.com/biases/optimism-bias" rel="noopener noreferrer"&gt;Optimism Bias — The Decision Lab&lt;/a&gt; ·&lt;br&gt;
&lt;a href="https://www.pmi.org/learning/library/optimism-bias-terminate-failing-projects-3779" rel="noopener noreferrer"&gt;Optimism Bias and Failure to Terminate Failing Projects — PMI&lt;/a&gt; ·&lt;br&gt;
&lt;a href="https://www.smartsheet.com/phase-gate-process" rel="noopener noreferrer"&gt;Phase-Gate Process — Smartsheet&lt;/a&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>governance</category>
    </item>
    <item>
      <title>Your AI Reviewer Already Agrees With Your AI Builder (And Role-Switching Won't Fix It)</title>
      <dc:creator>Alexander van Rossum</dc:creator>
      <pubDate>Mon, 16 Mar 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/avanrossum/your-ai-reviewer-already-agrees-with-your-ai-builder-and-role-switching-wont-fix-it-4n1n</link>
      <guid>https://dev.to/avanrossum/your-ai-reviewer-already-agrees-with-your-ai-builder-and-role-switching-wont-fix-it-4n1n</guid>
      <description>&lt;p&gt;A repo hit 11,000 stars in its first week by solving a real problem: Claude Code in one generic mode produces mediocre output.&lt;/p&gt;

&lt;p&gt;Garry Tan's &lt;a href="https://github.com/garrytan/gstack" rel="noopener noreferrer"&gt;gstack&lt;/a&gt; formalizes "modes" for Claude Code — slash commands that switch the AI between named roles. To name a few:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A CEO lens for product decisions&lt;/li&gt;
&lt;li&gt;A staff engineer for paranoid code review&lt;/li&gt;
&lt;li&gt;A QA lead for testing&lt;/li&gt;
&lt;li&gt;An engineering manager for retrospectives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core insight is correct and worth calling out directly: forcing the AI into an explicit role with explicit constraints produces better output than letting it be a generalist.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/garrytan/gstack/tree/main/browse" rel="noopener noreferrer"&gt;browse tool&lt;/a&gt; — a persistent Chromium binary that gives Claude Code eyes on a running app — is genuine engineering, not a prompt trick. The sequential workflow discipline (plan → engineering review → build → code review → ship) is better than what most people do with AI, which is nothing. This is a meaningful step up from ad-hoc prompting.&lt;/p&gt;

&lt;p&gt;I also noticed a structural limitation within minutes of reading it, and it's the same one I've been building against for months.&lt;/p&gt;

&lt;h2&gt;
  
  
  All the hats, one head
&lt;/h2&gt;

&lt;p&gt;Every mode in gstack runs inside the same context window. &lt;/p&gt;

&lt;p&gt;The "paranoid staff engineer" reviewing your code is the same Claude instance that helped architect it. It already knows &lt;em&gt;why&lt;/em&gt; every decision was made — which means it's primed to find those decisions reasonable.&lt;/p&gt;

&lt;p&gt;This is a self-review wearing a different costume.&lt;/p&gt;

&lt;p&gt;I don't mean that dismissively, because self-assessment checklists have real value — a pilot running a preflight checklist catches mistakes that muscle memory alone won't, and that's worth doing every time. But there's a categorical difference between a checklist and an independent review, and that distinction matters considerably more than it sounds like it should.&lt;/p&gt;

&lt;p&gt;When the reviewer already has the builder's reasoning in context, it's not an evaluation of the output; it's pattern-matching against the justifications that produced it. The same mechanism that makes LLMs coherent — &lt;a href="https://arxiv.org/html/2510.06265v2" rel="noopener noreferrer"&gt;self-consistency&lt;/a&gt; — makes them structurally blind to their own errors when asked to self-review. You're not getting a second opinion; you're getting the first opinion wearing a different hat.&lt;/p&gt;

&lt;p&gt;This is the same reason you don't ask the person who wrote a PR to also approve it. &lt;a href="https://google.github.io/eng-practices/review/" rel="noopener noreferrer"&gt;A different pair of eyes catches what the author is blind to&lt;/a&gt; — not because the author is bad, but because familiarity breeds pattern blindness. AI doesn't change this principle; if anything, it amplifies it — an LLM's self-consistency is &lt;em&gt;more&lt;/em&gt; deterministic than a human's.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parallelism is not independence
&lt;/h2&gt;

&lt;p&gt;Gstack can also use &lt;a href="https://conductor.build" rel="noopener noreferrer"&gt;Conductor&lt;/a&gt; to spin up ten parallel Claude Code sessions. That sounds like separation until you realize it's a performance optimization, not an epistemic one, and more workers in the same bath isn't the same as a clean pool.&lt;/p&gt;

&lt;p&gt;Genuine review requires what I'll call &lt;strong&gt;epistemic separation&lt;/strong&gt;: different priors, no access to the rationalization chain that produced the artifact, and independently accumulated judgment about what builders consistently miss. Without that separation, you get confirmation with extra steps.&lt;/p&gt;

&lt;p&gt;Each of gstack's modes starts fresh every invocation — no accumulated lessons, no pattern library built from previous reviews. The "paranoid staff engineer" is equally paranoid about everything, every time. That's thorough but undirected. A reviewer who doesn't learn which mistakes &lt;em&gt;this&lt;/em&gt; builder tends to make hasn't read the codebase's history.&lt;/p&gt;

&lt;p&gt;For organizations where bugs have real consequences — compliance failures, donor trust violations, limited technical staff to recover from incidents — the difference between costume-change review and independent review is operational risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  What genuine separation looks like
&lt;/h2&gt;

&lt;p&gt;I built the answer to this problem months before gstack existed. I call it The Adversary.&lt;/p&gt;

&lt;p&gt;It's a separate Claude Code project in its own repo with its own governance files, its own accumulated lessons-learned corpus, and zero shared context with the building agent. It receives a read-only symlink to the target codebase and produces a structured review report. It doesn't know what decisions were made or why. It sees &lt;em&gt;output&lt;/em&gt;, not &lt;em&gt;reasoning&lt;/em&gt; — which is exactly how real external review works.&lt;/p&gt;

&lt;p&gt;I'd been building and reviewing this codebase for months — manual human review and agentic self-review, the whole time. The Adversary's first pass found 102 issues. Ten critical. Security vulnerabilities hiding in plain sight — not because the builder was bad, but because independent review catches what self-review structurally cannot.&lt;/p&gt;

&lt;p&gt;The architecture makes it work, not the prompt:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Separate context.&lt;/strong&gt; Different project, different memory, different governance documents. The builder's reasoning chain doesn't exist in The Adversary's world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Different priors.&lt;/strong&gt; The Adversary accumulates its own pattern library over time — "here's what builders consistently miss" — which makes it sharper with each review. A stateless skill file can't do this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured handoff.&lt;/strong&gt; Artifacts move through a defined channel (symlinks and reports), not a shared session. The reviewer can't be influenced by the builder's justifications because it never sees them. This is the same principle that keeps financial auditors separate from the accounting department.&lt;/p&gt;
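&lt;p&gt;A minimal sketch of that handoff boundary, with hypothetical directory and file names (the real setup is a full Claude Code project, not these few lines):&lt;/p&gt;

```python
import os
import pathlib
import tempfile

root = pathlib.Path(tempfile.mkdtemp())

# The building agent's repo: the reviewer never shares a session with it.
builder = root / "builder-project"
builder.mkdir()
(builder / "app.py").write_text("print('hello')\n")

# The reviewer's world: it sees the builder's output through a symlink,
# read-only by convention, with no access to the builder's reasoning.
adversary = root / "adversary"
adversary.mkdir()
os.symlink(builder, adversary / "target")

# Findings travel back through a defined channel (a report file),
# not a shared conversation.
report = adversary / "review-001.md"
report.write_text("# Review 001\n\n- app.py: no input validation\n")

print(report.read_text().splitlines()[0])  # → # Review 001
```

&lt;p&gt;The point of the sketch is the boundary: artifacts cross it; justifications don't.&lt;/p&gt;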

&lt;h2&gt;
  
  
  An honest limitation
&lt;/h2&gt;

&lt;p&gt;This can't be fully productized today. The architectural requirement — genuinely independent agents with separate memory, separate accumulated judgment, and separate lesson histories — requires human orchestration: someone who understands where the boundaries need to be and maintains them. The tooling will get there. The architecture won't design itself.&lt;/p&gt;

&lt;p&gt;Anyone can fork a repo of markdown files. The judgment behind "here's where the boundaries need to be and why" is the part that requires experience to get right.&lt;/p&gt;

&lt;p&gt;The methodology is the deliverable, not the CLI tool. And that distinction matters for understanding where gstack fits.&lt;/p&gt;

&lt;h2&gt;
  
  
  What your AI shouldn't know
&lt;/h2&gt;

&lt;p&gt;Gstack represents where most people are in their thinking about AI-assisted development: "I need structured roles for different tasks." That's correct and necessary. The workflow discipline, the browser tooling, the explicit-gear metaphor — all genuinely valuable. The fact that it's open source and spreading is good for the ecosystem.&lt;/p&gt;

&lt;p&gt;But the harder question isn't which hat to put on your AI.&lt;/p&gt;

&lt;p&gt;It's "what should the AI &lt;em&gt;not know&lt;/em&gt; when it evaluates this work?"&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/garrytan/gstack" rel="noopener noreferrer"&gt;gstack — GitHub&lt;/a&gt; — Garry Tan's Claude Code skill files (MIT) ·&lt;br&gt;
&lt;a href="https://arxiv.org/html/2510.06265v2" rel="noopener noreferrer"&gt;Large Language Models Hallucination: Comprehensive Survey&lt;/a&gt; — arXiv (self-consistency and self-review blind spots) ·&lt;br&gt;
&lt;a href="https://google.github.io/eng-practices/review/" rel="noopener noreferrer"&gt;Google Engineering Practices — Code Review&lt;/a&gt; — Google&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>codereview</category>
    </item>
    <item>
      <title>I Finally Have a Team - It Just Happens to be AI</title>
      <dc:creator>Alexander van Rossum</dc:creator>
      <pubDate>Sat, 14 Mar 2026 19:25:21 +0000</pubDate>
      <link>https://dev.to/avanrossum/i-finally-have-a-team-it-just-happens-to-be-ai-4ho5</link>
      <guid>https://dev.to/avanrossum/i-finally-have-a-team-it-just-happens-to-be-ai-4ho5</guid>
      <description>&lt;p&gt;Eight months ago, I was perpetually behind. On everything.&lt;/p&gt;

&lt;p&gt;I don't mean "busy." Busy implies you're making progress on too many things at once. I was making insufficient progress on all of them. React components for a client project. AWS infrastructure governance for another. Kubernetes migrations with hard deadlines. Salesforce automations that needed attention three weeks ago. Each domain had its own language, its own context, its own state — and switching between them wasn't just a time cost. It was a cognitive tax that compounded with every transition.&lt;/p&gt;

&lt;p&gt;By 3 PM most days, I wasn't making decisions anymore. I was recovering from the last context switch while dreading the next one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The email that started it
&lt;/h2&gt;

&lt;p&gt;The first thing I used AI for — really used it, not just experimented — was writing emails. GPT-3.5. I would brain-dump everything I needed to communicate into a chat window — unstructured, grammatically questionable, half-formed thoughts — and get back something I could send after one or two editing passes.&lt;/p&gt;

&lt;p&gt;That sounds trivial. &lt;/p&gt;

&lt;p&gt;It wasn't.&lt;/p&gt;

&lt;p&gt;Email was consuming more cognitive bandwidth than I'd realized. Not the content — the &lt;em&gt;composition&lt;/em&gt;. Translating technical context into stakeholder-appropriate language, structuring the message so the key points are at the top, and correcting tone where needed. Every email was a small act of translation, and I was writing dozens a day.&lt;/p&gt;

&lt;p&gt;Offloading the composition freed up space I didn't know I was missing. Not a lot — but enough to notice that the constraint wasn't time: it was &lt;em&gt;cognitive bandwidth&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  From assistant to collaborator
&lt;/h2&gt;

&lt;p&gt;With GPT-4, ChatGPT got better. I started using it for more than email — rapidly prototyping WordPress plugins, troubleshooting legacy code (especially the undocumented kind, which was most of it), and reasoning through architectural decisions where I needed a second opinion that wasn't going to judge me for asking a question I should probably already know the answer to.&lt;/p&gt;

&lt;p&gt;The shift was gradual. The AI went from "a tool I use" to "a collaborator I consult," and the distinction matters. A tool does what you tell it; a collaborator helps you figure out &lt;em&gt;what to tell it&lt;/em&gt;. The governance documents I'd started writing — almost accidentally, just experimenting to get consistent output — were turning The Collaborator into something more reliable. &lt;/p&gt;

&lt;p&gt;Something that remembered how I think.&lt;/p&gt;

&lt;h2&gt;
  
  
  The migration that proved it
&lt;/h2&gt;

&lt;p&gt;About six months ago, I had to undertake a significant solo infrastructure migration. Hundreds of containers across multiple environments with a hard deadline driven by external constraints that weren't negotiable.&lt;/p&gt;

&lt;p&gt;The responsible estimate for this work — with a team of six experienced engineers — was six to nine months. I had three months. &lt;/p&gt;

&lt;p&gt;And I was the team.&lt;/p&gt;

&lt;p&gt;Were it not for ChatGPT, KiloCode, Cursor, and later Claude, I would not have been able to complete it. That is not a hyperbolic statement; it is not "it would have been harder." I would literally not have been able to complete the migration within the constraints I was given - while still juggling my "regular work." The project would have failed, or I would have.&lt;/p&gt;

&lt;p&gt;Agentic AI enabled me to operate at a scale previously unavailable to a single person. Not because the AI wrote all the code — it didn't. But because it could hold the context of each subsystem, while I focused on the decisions that actually needed a human. The infrastructure state, the dependency graphs, the rollback procedures — the AI held that, so I could hold and refine the strategy.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/work/kubernetes-migration"&gt;case study&lt;/a&gt; tells the technical story. The human story is simpler: I shipped it — on time, no less — and I didn't &lt;em&gt;completely&lt;/em&gt; burn out doing it. Both of those outcomes were improbable without the tooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Department
&lt;/h2&gt;

&lt;p&gt;After the migration, I fully committed to Claude Code and started building what I now call The Department.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;a href="https://dev.to/avanrossum/i-manage-ai-agents-the-way-i-manage-teams-1hgm"&gt;site architect&lt;/a&gt; for this website — layouts, components, editorial, SEO&lt;/li&gt;
&lt;li&gt;A sysadmin agent for infrastructure governance&lt;/li&gt;
&lt;li&gt;A Project Manager that unifies my communication between Slack and Asana - and keeps me from missing things&lt;/li&gt;
&lt;li&gt;An observability bot for monitoring&lt;/li&gt;
&lt;li&gt;A content agent&lt;/li&gt;
&lt;li&gt;A life-strategy agent&lt;/li&gt;
&lt;li&gt;Several agents in charge of writing software, like Actions, Panoptisana, and the &lt;a href="https://dev.to/avanrossum/i-built-a-markdown-editor-in-a-weekend-because-every-other-one-annoyed-me-252e"&gt;Markdown Editor&lt;/a&gt; I'm using to write and edit this post&lt;/li&gt;
&lt;li&gt;And several more, besides&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each one has a defined role, governance documents, and institutional memory that persists across sessions.&lt;/p&gt;

&lt;p&gt;The ability to context-switch without context-switching is the thing I didn't know I needed.&lt;/p&gt;

&lt;p&gt;When I need to work on infrastructure, I open the sysadmin agent. It knows the current state of every system I manage and what we did in the last session. It knows the conventions, the constraints, and the things I've told it not to touch. I don't have to reconstruct any of that — I just pick up where I left off.&lt;/p&gt;

&lt;p&gt;When I am working on my website, the site architect has the same depth in its domain, which also covers my LinkedIn presence. Different context, different conventions, different memory — but the same experience of walking into a room where someone already knows what's going on.&lt;/p&gt;

&lt;p&gt;The mental relief is almost too great to put into words. The thing that was destroying me — carrying the state of many different domains in my head simultaneously, losing pieces of each every time I switched — is the thing the agents handle. My working memory is freed for the decisions that actually need my judgment — strategy, architecture, ideation. &lt;/p&gt;

&lt;p&gt;Everything else, the agents hold.&lt;/p&gt;

&lt;h2&gt;
  
  
  The inversion
&lt;/h2&gt;

&lt;p&gt;One of the most significant personal findings of this process is that the output scaled &lt;em&gt;because&lt;/em&gt; the cognitive load dropped. Not the other way around.&lt;/p&gt;

&lt;p&gt;The conventional model is that more output requires more effort, more tracking, and more stress. You scale by working harder or hiring more people. The cognitive load tracks linearly (or worse) with the output.&lt;/p&gt;

&lt;p&gt;The Department inverts — or perhaps subverts — that: more domains under management, more projects shipping, and perhaps more importantly, the ability to rapidly switch between them without losing momentum. And all of that occurs with less cognitive overhead — because the overhead has been &lt;a href="https://dev.to/avanrossum/cognitive-offloading-5hjm"&gt;offloaded&lt;/a&gt; to agents whose entire job is holding the context I used to carry in my head.&lt;/p&gt;

&lt;p&gt;It's not just automation; I'm not replacing tasks I used to do manually — though I certainly do when it makes sense. It's amplification — extending what I can hold and act on simultaneously. The decisions, strategy, and judgment calls are still mine, but the state-tracking, the context-holding, the "where was I?" recovery — that's distributed across a team that doesn't forget, doesn't get tired, and doesn't need me to repeat myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Still behind
&lt;/h2&gt;

&lt;p&gt;I'm still behind. &lt;/p&gt;

&lt;p&gt;I don't think that will ever change. The scope of what I'm required to do always expands to fill (and slightly exceed) the capacity I have — that's a result of employment and a feature of ambition, not solely a bug in the tooling.&lt;/p&gt;

&lt;p&gt;But the texture of "behind" has changed. Eight months ago, behind meant drowning; it meant context switching so fast that I couldn't maintain identity in any single domain. It meant 3 PM cognitive shutdowns and the creeping feeling that I was failing at everything simultaneously.&lt;/p&gt;

&lt;p&gt;Now, behind means I have more projects than hours. The state of each one is held by an agent that's ready when I am. The cognitive tax of switching is close to zero. And when I stop for the day, nothing is lost — it's all documented, governed, and waiting for the next session.&lt;/p&gt;

&lt;p&gt;And there are still domains that don't have a door to agentic work yet — the ones where the process is opaque, sequential, and offers no meaningful feedback. Try getting a 10DLC campaign approved through Twilio when a denial comes back as "didn't pass" with no further explanation. There's nothing to reason about, nothing to architect. Just guess, resubmit, wait, repeat. Those still run on spite... if I have time.&lt;/p&gt;

&lt;p&gt;I'm still behind. But I'm not losing my sanity in the process.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>systemdesign</category>
      <category>infrastructure</category>
      <category>leadership</category>
    </item>
    <item>
      <title>Cognitive Offloading</title>
      <dc:creator>Alexander van Rossum</dc:creator>
      <pubDate>Tue, 10 Mar 2026 13:46:37 +0000</pubDate>
      <link>https://dev.to/avanrossum/cognitive-offloading-5hjm</link>
      <guid>https://dev.to/avanrossum/cognitive-offloading-5hjm</guid>
      <description>&lt;p&gt;I carried a notebook in my back pocket for years. These were ratty little things - usually held together with Gaffers tape. I called it the butt book, because that's where it lived. The idea was simple: whenever something worth remembering surfaced, I'd write it down before it disappeared.&lt;/p&gt;

&lt;p&gt;It worked, for capture. The ideas made it onto paper. The crisis of "I just had a thought and now it's gone" happened less often. But the notebooks accumulated, and the ideas inside them became a graveyard. If I remembered to go back and find something — and that's a significant "if" — I still had to locate it, interpret my own handwriting, and reconstruct whatever context made the idea seem worth writing down in the first place.&lt;/p&gt;

&lt;p&gt;The capture problem was solved. The retrieval problem never was.&lt;/p&gt;

&lt;h2&gt;
  
  
  Every system I tried solved the same half of the problem
&lt;/h2&gt;

&lt;p&gt;Evernote. Obsidian. Apple Notes. Todoist. Each one promised a different organizational model — tags, backlinks, smart folders, natural-language reminders. Each one worked for about two weeks, which is roughly how long it takes for a structured environment to get out of whack when you have (undiagnosed!) ADHD and the system requires you to maintain it.&lt;/p&gt;

&lt;p&gt;The pattern was always the same: set it up, use it enthusiastically, let it drift, watch the structure collapse under its own weight, abandon it for the next thing. Not because the tools were bad — because they all assumed I'd come back to them. Every system required me to initiate retrieval. To remember that I'd stored something, navigate to where I'd stored it, and find it among everything else I'd stored.&lt;/p&gt;

&lt;p&gt;That's three cognitive tasks before you even get to the information you need. For someone whose working memory is the bottleneck, that's three chances to lose the thread.&lt;/p&gt;

&lt;p&gt;Notion is the exception, but only because I use it exclusively for school and keep it aggressively structured. Tight scope, rigid templates, no room to drift. It works precisely because I don't let it become a general-purpose system.&lt;/p&gt;

&lt;h2&gt;
  
  
  So I built one
&lt;/h2&gt;

&lt;p&gt;Before the current wave of AI tools, I built a thing called GetRamble. It had a phone number. I could text it at any time — in line at the grocery store, in the middle of a meeting, at 2 am — and OpenAI's API would turn my stream of consciousness into categorized notes.&lt;/p&gt;

&lt;p&gt;My kids would ask who "ramble" was because I said it so often: "Hey Siri, text ramble."&lt;/p&gt;

&lt;p&gt;It worked. Really well, actually. I was still using it as recently as a few months ago. The capture problem and the categorization problem were both solved — text a rambling thought, get back structured, searchable notes.&lt;/p&gt;

&lt;p&gt;But Ramble stalled.&lt;/p&gt;

&lt;p&gt;I was building it with a combination of my own work and Replit. Replit couldn't stay sane — the same ungoverned-architecture problem I've since built an entire methodology around solving. Eventually, it became more work to wrangle the features than to get results, and I didn't have the bandwidth to rewrite it myself. Full-time job, school, wife, two kids. The 10DLC compliance burden alone — the regulatory framework for application-to-person messaging — was a part-time job for a one-person team.&lt;/p&gt;

&lt;p&gt;I wanted to monetize it. But without capital and a testing cohort, I couldn't release it into the wild. The product was good. The architecture wasn't stable enough to trust — and at the time, I didn't have a word for what was missing. I just knew I couldn't ship something I'd have to maintain at 2 am when it broke in ways I couldn't predict.&lt;/p&gt;

&lt;p&gt;Will I finish it? Probably not — I have better tools now. But the experience was formative. It's part of where my governance methodology comes from. I built something that worked, and watched it collapse not because the idea was wrong, but because the system around it couldn't hold.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed wasn't the tool — it was the architecture
&lt;/h2&gt;

&lt;p&gt;Claude Code didn't solve the capture problem better than Ramble. It solved a different problem entirely: it made retrieval automatic.&lt;/p&gt;

&lt;p&gt;Every previous system — analog or digital, simple or AI-powered — required me to go get the information, remember I'd stored something, navigate to it, and load it back into working memory. Claude Code's governance documents flipped that model. The agent reads its own state at the start of every session. I don't retrieve. The system loads.&lt;/p&gt;

&lt;p&gt;That distinction is the whole thing.&lt;/p&gt;

&lt;p&gt;The plan exists, it's maintained, it's comprehensive — but it never demands my attention. It's there when I need it and invisible when I don't. I can forget it exists and still follow it, because the system is holding the state, not me.&lt;/p&gt;

&lt;p&gt;Three things make this work in practice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project-level state persistence.&lt;/strong&gt; Each project maintains its own context through governance documents. I can revisit any project at any time and get an immediate snapshot — not by reading through files myself, but by asking the agent what's current. The project's memory survives the session boundary because it was designed to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rapid idea triage.&lt;/strong&gt; When an idea surfaces now, I don't write it in a notebook and hope I'll find it later. I spin up a prototype — Excalidraw wireframe, governance templates, a solid directive — and within a single conversation, I know whether the idea has legs. If it does, it gets filed into my project management system with full context attached. If it doesn't, it gets archived cleanly. Either way, it's out of my head and into a system that can hold it without my participation. The cognitive cost of exploring an idea dropped from "a weekend" to "a conversation."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A personal project manager that doesn't require me to manage it.&lt;/strong&gt; I run a lightweight environment that stores the state of everything I'm tracking — a set of JSON index files with descriptions pointing to full markdown files for detail. No RAG, no vector database. A poor man's index that works because the scope is deliberate and the governance is tight. It started as a scratchpad within another project and became a standalone system when the &lt;a href="https://dev.to/avanrossum/i-manage-ai-agents-the-way-i-manage-teams-1hgm"&gt;separation of concerns&lt;/a&gt; demanded it.&lt;/p&gt;
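&lt;p&gt;A minimal sketch of what such an index might look like (the file names and entries here are hypothetical, not my actual system):&lt;/p&gt;

```python
import json
import pathlib
import tempfile

root = pathlib.Path(tempfile.mkdtemp())

# Full detail lives in markdown files...
(root / "notes").mkdir()
(root / "notes" / "ramble-postmortem.md").write_text(
    "# Ramble post-mortem\n\nFull detail lives here...\n")

# ...while a small JSON index holds just descriptions and paths.
index = {
    "ramble-postmortem": {
        "description": "why GetRamble stalled; 10DLC compliance burden",
        "path": "notes/ramble-postmortem.md",
    },
}
(root / "index.json").write_text(json.dumps(index, indent=2))

def lookup(query: str) -> str:
    """Scan the short descriptions for a keyword, then load the full file."""
    idx = json.loads((root / "index.json").read_text())
    for entry in idx.values():
        if query.lower() in entry["description"].lower():
            return (root / entry["path"]).read_text()
    return ""

print(lookup("10DLC").splitlines()[0])  # → # Ramble post-mortem
```

&lt;p&gt;No embeddings, no retrieval pipeline — a linear scan over descriptions, which is plenty when the scope is deliberately small.&lt;/p&gt;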

&lt;h2&gt;
  
  
  Choosing what stays in your mind
&lt;/h2&gt;

&lt;p&gt;Cognitive offloading is the deliberate process of choosing what stays in your mind and building systems to handle the rest.&lt;/p&gt;

&lt;p&gt;Not productivity hacking. Not "getting organized." Architecture — designed to match how your brain actually operates rather than how productivity systems assume it should.&lt;/p&gt;

&lt;p&gt;The butt book was cognitive offloading. Ramble was cognitive offloading. But they were incomplete implementations — they solved capture without solving retrieval, so the offloaded information ended up in cold storage with no mechanism to bring it back when it mattered.&lt;/p&gt;

&lt;p&gt;What I'm building now is the complete architecture: capture, categorization, persistence, and automatic retrieval. The information flows out of my head and into governed systems that carry it forward — not just storing it, but delivering it at the right time, in the right context, without requiring me to remember it exists.&lt;/p&gt;

&lt;p&gt;The background anxiety lifts. Not because the work is less important, but because I'm no longer the one responsible for holding it all. The system holds it. I think about whatever is actually in front of me.&lt;/p&gt;

&lt;p&gt;That's not a productivity gain. That's an architectural change in how I allocate cognitive resources — and it turns out it applies to AI agents the same way it applies to human brains, because the failure modes are structurally identical.&lt;/p&gt;

&lt;p&gt;If your system requires you to remember to use it, it's not offloading anything. It's just adding a task.&lt;/p&gt;

&lt;p&gt;If you've been building similar systems, I'd love to hear about it.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>ai</category>
      <category>governance</category>
    </item>
    <item>
      <title>Cognitive Property: Who Owns the Way You Think?</title>
      <dc:creator>Alexander van Rossum</dc:creator>
      <pubDate>Mon, 09 Mar 2026 13:23:14 +0000</pubDate>
      <link>https://dev.to/avanrossum/cognitive-property-who-owns-the-way-you-think-2960</link>
      <guid>https://dev.to/avanrossum/cognitive-property-who-owns-the-way-you-think-2960</guid>
      <description>&lt;p&gt;AI tools picking up and repeating your habits isn't new. ChatGPT does it by design — it mirrors your tone, adapts to your preferences, and learns what you respond well to. The phenomenon has received copious amounts of screen time and discussion bandwidth.&lt;/p&gt;

&lt;p&gt;But something specific happened recently that shifted the way I think about it.&lt;/p&gt;

&lt;p&gt;One of my AI instances started using a ◡̈ I put at the end of casual notes, and picked up the → and ← characters I use for bullet points and emphasis in certain contexts. Formatting preferences and structural choices I never explicitly taught — they just started appearing.&lt;/p&gt;

&lt;p&gt;Then another instance, working on a completely different project, picked up the same arrow convention independently. Same human, same patterns, different context.&lt;/p&gt;

&lt;p&gt;The AI isn't just mirroring my preferences; it's learning to mirror my thinking. And once I noticed that, a harder question followed: if my reasoning patterns are being encoded into a transferable format — documented, structured, portable — then who owns them?&lt;/p&gt;

&lt;h2&gt;
  
  
  Your cognition is being encoded
&lt;/h2&gt;

&lt;p&gt;If you work deeply with AI tools (and I mean deeply, not "summarize this email" or "write me a cover letter"), you're building something most people haven't named yet.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Repeatable cognitive patterns in plain text.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I don't mean prompt history or chat logs. I mean the governance documents you've created — either intentionally or through organic growth — to define how &lt;em&gt;your&lt;/em&gt; AI agents operate. The CLAUDE.md / AGENT.md files that encode your engineering standards, your writing styles, your humor, your architectural preferences, and your coding philosophy. The decision-making frameworks that tell the AI how to prioritize, how to break down problems, and how to structure their thinking in a way that matches yours.&lt;/p&gt;

&lt;p&gt;Over time, you've been documenting the way you reason. Not abstractly — specifically. In plain text. In a format that is entirely transferable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your operating system, as data
&lt;/h2&gt;

&lt;p&gt;Take those governance documents and feed them to a fresh AI instance. What do you get?&lt;/p&gt;

&lt;p&gt;A working version of how you solve problems.&lt;/p&gt;

&lt;p&gt;Not a perfect copy, but a functional one. An instance that knows your architectural preferences, your communication style, your quality standards, and your decision-making heuristics. It won't be you, but it will be able to operate like you in ways that are measurably, verifiably close.&lt;/p&gt;

&lt;p&gt;That's not a productivity feature. That's a &lt;a href="https://www.sciencedirect.com/science/article/pii/S0896627324006524" rel="noopener noreferrer"&gt;cognitive fingerprint&lt;/a&gt;. And the fact that it exists in a format that can be copied, transferred, and scaled changes the conversation about who owns what.&lt;/p&gt;

&lt;h2&gt;
  
  
  This isn't a new IP question — except it is
&lt;/h2&gt;

&lt;p&gt;The ownership of workplace knowledge has been debated as long as people have changed jobs. U.S. copyright law has a specific mechanism for it — the &lt;a href="https://www.venable.com/insights/publications/ip-quick-bytes/understanding-the-work-made-for-hire-doctrine" rel="noopener noreferrer"&gt;work-made-for-hire doctrine&lt;/a&gt; assigns authorship to the employer when works are created within the scope of employment. You learn skills at a company and take them with you when you leave. Nobody seriously argues that everything you learned becomes corporate property.&lt;/p&gt;

&lt;p&gt;But this is different in a specific way: the cognitive pattern isn't just in your head anymore. It's documented. It's structured. It's portable. And it works without you.&lt;/p&gt;

&lt;p&gt;Previous generations of knowledge workers left with expertise — hard to quantify, impossible to transfer directly. You leave with expertise AND a governance repo that can reproduce a meaningful chunk of your operations. That's never been possible before.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cognitive property
&lt;/h2&gt;

&lt;p&gt;People are treating AI personalization like it's a nice-to-have feature. A convenience. "My Claude knows how I like my code structured." Cool, time saver.&lt;/p&gt;

&lt;p&gt;It's a lot more than a time saver: it's &lt;em&gt;cognitive property&lt;/em&gt;. And right now, the ownership question hasn't even been asked.&lt;/p&gt;

&lt;p&gt;If you're building this kind of depth on a corporate AI account, with corporate tools, on company time... the question of who owns those patterns matters a lot more than you think. And the answer, under &lt;a href="https://www.bradley.com/insights/publications/2023/10/ai-in-the-modern-workplace-ownership-challenges-of-ai-generated-code" rel="noopener noreferrer"&gt;most current employment agreements&lt;/a&gt;, is probably being decided by boilerplate that nobody wrote with cognitive property in mind.&lt;/p&gt;

&lt;h2&gt;
  
  
  The conversation that needs to happen now
&lt;/h2&gt;

&lt;p&gt;This is a more urgent conversation than AGI governance, and I say that knowing how provocative it sounds. AGI governance matters, and it'll matter more as we get closer. But it's not happening today.&lt;/p&gt;

&lt;p&gt;This is happening today. People are building repeatable cognitive patterns in transferable formats. They're externalizing their reasoning into documents that function without them. And most of them haven't thought about who gets to keep it.&lt;/p&gt;

&lt;p&gt;That question needs to be asked before it becomes standard practice to assume companies own whatever cognitive patterns emerge from AI tools used on company time.&lt;/p&gt;

&lt;p&gt;Legal and policy scholars are &lt;a href="https://academic.oup.com/policyandsociety/article/44/1/1/7997395" rel="noopener noreferrer"&gt;already raising these questions&lt;/a&gt; about generative AI and intellectual property. But most of that work focuses on model outputs, not on the cognitive patterns of the person doing the work.&lt;/p&gt;

&lt;p&gt;The ownership conversation is overdue. &lt;/p&gt;

&lt;p&gt;This is part of a four-post series, and the next post starts drawing the boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Employment law &amp;amp; IP&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.venable.com/insights/publications/ip-quick-bytes/understanding-the-work-made-for-hire-doctrine" rel="noopener noreferrer"&gt;Understanding the Work Made for Hire Doctrine&lt;/a&gt; — Venable LLP. Plain-English explainer of work-for-hire under the Copyright Act of 1976.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bradley.com/insights/publications/2023/10/ai-in-the-modern-workplace-ownership-challenges-of-ai-generated-code" rel="noopener noreferrer"&gt;AI in the Modern Workplace: Ownership Challenges of AI-Generated Code&lt;/a&gt; — Bradley Arant Boult Cummings. Employee use of GenAI does not change that code written in the course of employment belongs to the employer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://livescu.ucla.edu/ai-copyright-law-and-work-made-for-hire/" rel="noopener noreferrer"&gt;AI, Copyright Law, and Work-Made-For-Hire&lt;/a&gt; — UCLA Livescu Initiative. Scholarly discussion of how work-for-hire breaks down for AI-generated material.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI governance &amp;amp; cognitive data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://academic.oup.com/policyandsociety/article/44/1/1/7997395" rel="noopener noreferrer"&gt;Governance of Generative AI&lt;/a&gt; — Policy and Society (Oxford Academic). Survey of IP and data-governance gaps in generative AI, including the need for new ownership frameworks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.sciencedirect.com/science/article/pii/S0896627324006524" rel="noopener noreferrer"&gt;Beyond Neural Data: Cognitive Biometrics and Mental Privacy&lt;/a&gt; — Magee, Ienca &amp;amp; Farahany, Neuron (2024). Argues that cognitive and behavioral patterns function as uniquely identifying data, extending privacy concerns beyond neural signals.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>identity</category>
      <category>intellectualproperty</category>
    </item>
    <item>
      <title>I Manage AI Agents the Way I Manage Teams</title>
      <dc:creator>Alexander van Rossum</dc:creator>
      <pubDate>Thu, 05 Mar 2026 21:08:28 +0000</pubDate>
      <link>https://dev.to/avanrossum/i-manage-ai-agents-the-way-i-manage-teams-1hgm</link>
      <guid>https://dev.to/avanrossum/i-manage-ai-agents-the-way-i-manage-teams-1hgm</guid>
      <description>&lt;p&gt;I run multiple AI agents across several projects. A site architect for my website. A content agent for editorial work. A sysadmin agent for infrastructure. An observability bot for monitoring. Each one has a defined role, documented standards, and clear boundaries.&lt;/p&gt;

&lt;p&gt;At some point — I couldn't tell you exactly when — I stopped thinking about this as "using AI tools" and started thinking about it as managing a team. Not in the Silicon Valley "AI teammate" marketing sense. In the actual management sense: the same principles I'd apply to a group of human engineers producing real work under real constraints.&lt;/p&gt;

&lt;p&gt;The more I leaned into that framing, the more the system improved. Because it turns out the management disciplines that make human teams effective aren't abstractions. They're operational patterns that apply to AI agents without (much) modification.&lt;/p&gt;

&lt;h2&gt;
  
  
  Separation of concerns is just a job description
&lt;/h2&gt;

&lt;p&gt;Each agent has a job, and it does that job. The site architect handles the website — layouts, components, performance, SEO, and editorial. I originally separated content into its own agent, but the editorial voice needed enough architectural context that splitting them created more coordination overhead than it saved — so I consolidated. That's the methodology working as designed: the right boundary isn't always more boundaries. The sysadmin agent handles infrastructure — AARs, topology documentation, environment configs. My Project Management Agent manages tasks and responsibilities in Asana.&lt;/p&gt;

&lt;p&gt;They don't freelance into each other's domains.&lt;/p&gt;

&lt;p&gt;This sounds obvious, but the default approach most people take with AI is the opposite: one chat, one agent, everything. Code review and creative writing and data analysis and debugging, all in the same conversation. It works the way having one employee handle engineering, marketing, and customer support "works." You get output. But you get inconsistent output, because the agent's context is split across too many domains to maintain depth in any of them.&lt;/p&gt;

&lt;p&gt;Separation of concerns for AI agents is the same principle as separation of concerns for human teams. Defined roles reduce cognitive load, prevent context pollution, and produce better work — because the agent's entire context window is focused on the domain it's responsible for, not half-occupied by the residue of a different conversation about a different problem.&lt;/p&gt;

&lt;p&gt;The loose catch-all still exists. For me, it's the core Claude chat interface — the equivalent of walking over to someone's desk for a quick question that doesn't belong in anyone's formal workflow. Not everything needs a scoped agent. But the work that matters does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Clear guidelines are just an employee handbook
&lt;/h2&gt;

&lt;p&gt;Every agent has governance documents. CLAUDE.md, ARCHITECTURE.md, ROADMAP.md — the governance layer that defines standards, patterns, boundaries, and institutional memory.&lt;/p&gt;

&lt;p&gt;This is onboarding. You wouldn't hire a developer and say "just go build." You'd hand them the style guide, the architecture overview, the deployment process, the list of things not to touch. You'd give them context before expecting output.&lt;/p&gt;

&lt;p&gt;AI agents need the same thing — except they need it more, because they can't compensate for missing context the way humans can. A human developer who doesn't know the naming convention will ask a colleague, read the existing code, or make a reasonable guess based on experience. An AI agent without documented conventions will make a different reasonable guess every session. Monday it's camelCase. Tuesday it's snake_case. Wednesday it's whatever it inferred from the three files it happened to read first.&lt;/p&gt;

&lt;p&gt;The governance documents aren't overhead. They're the mechanism that produces consistency — the employee handbook that every agent reads at the start of every session, ensuring that today's work is compatible with yesterday's.&lt;/p&gt;

&lt;h2&gt;
  
  
  Focus and respect are just professionalism
&lt;/h2&gt;

&lt;p&gt;This might surprise people, but it matters: I interact with my agents the way I'd interact with professional colleagues. Focused. Respectful of their time (which in this case means their &lt;em&gt;context window&lt;/em&gt;). No off-topic tangents unless the situation genuinely warrants it.&lt;/p&gt;

&lt;p&gt;This isn't sentiment. It's practical. Every message in a context window consumes tokens. Off-topic chatter, excessive small talk, or rambling prompts pollute the context with irrelevant information. For a human colleague, that's an interruption that costs focus. For an AI agent, it's worse — it's permanent context noise that degrades every subsequent response in the session.&lt;/p&gt;

&lt;p&gt;Respecting the agent's context window is the same principle as respecting an employee's cognitive bandwidth. You wouldn't ask your database architect to weigh in on your marketing copy. You wouldn't CC everyone on every email. The same instinct applies: keep the interaction focused on the domain, and the output stays focused on the domain.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to hire: the spinoff pattern
&lt;/h2&gt;

&lt;p&gt;The management parallel that convinced me this wasn't just a useful metaphor but an operational truth was the first time I had to restructure.&lt;/p&gt;

&lt;p&gt;My sysadmin Claude Code instance manages infrastructure context: AARs, topology documentation, and environment configs. Straightforward scope. At some point, I had it build a small Telegram notification bot as a utility — a quick way to monitor the overall health of the systems I am responsible for.&lt;/p&gt;

&lt;p&gt;The notification bot worked. Then it proved useful enough that I started expanding it. More alert types, better formatting, scheduling logic, error handling. Before I knew it, the "small utility" had grown into a legitimate standalone project sitting inside an agent whose job description was completely different.&lt;/p&gt;

&lt;p&gt;The signal was the same one any team lead recognizes: the context required to do the work well had grown beyond what a single entity could reasonably hold. The agent's CLAUDE.md was bloated with two domains' worth of conventions. Half the context window was consumed by scope that wasn't relevant to whichever task was actually in front of it.&lt;/p&gt;

&lt;p&gt;So I did what I'd do with a human team member whose role had quietly split into two distinct jobs: I restructured. New repository. New governance documents. New architecture spec. New agent. The observability bot got its own development track, its own context, its own focused governance. The sysadmin agent went back to doing what it was actually scoped for.&lt;/p&gt;

&lt;p&gt;The alternative — and this is the part that maps directly to organizational dysfunction — is letting scope accumulate until the agent is doing five things adequately instead of one thing well. That's not an AI problem. That's a management problem, and every team lead has seen it happen with humans. The person who's in every meeting, owns every escalation, and somehow has three job titles on their email signature. The fix is the same in both cases: restructure and add headcount. It's not a performance problem; it's an organizational design problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Good management doesn't depend on the medium
&lt;/h2&gt;

&lt;p&gt;The argument I keep coming back to is simple: good management is good management. The medium changes — from a human team to an AI team — but the principles don't.&lt;/p&gt;

&lt;p&gt;Defined roles prevent confusion. Documentation prevents context loss. Focus prevents scope creep. Restructuring when the scope outgrows the role prevents degradation.&lt;/p&gt;

&lt;p&gt;These aren't AI-specific insights; they're management fundamentals that happen to apply perfectly to AI agents — because the failure modes are structurally identical. An overloaded AI agent degrades the same way an overloaded employee degrades. Not through incompetence, but through insufficient structure around the work.&lt;/p&gt;

&lt;p&gt;The people getting inconsistent results from AI aren't writing bad prompts. They're practicing bad management. And the fix isn't a better prompt template or a more capable model. It's the same fix it's always been: clear roles, documented expectations, and the discipline to restructure when the scope outgrows the container.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>management</category>
      <category>governance</category>
    </item>
    <item>
      <title>I Built a Markdown Editor in a Weekend Because Every Other One Annoyed Me</title>
      <dc:creator>Alexander van Rossum</dc:creator>
      <pubDate>Mon, 02 Mar 2026 20:51:03 +0000</pubDate>
      <link>https://dev.to/avanrossum/i-built-a-markdown-editor-in-a-weekend-because-every-other-one-annoyed-me-252e</link>
      <guid>https://dev.to/avanrossum/i-built-a-markdown-editor-in-a-weekend-because-every-other-one-annoyed-me-252e</guid>
      <description>&lt;p&gt;I didn't plan to build a markdown editor this weekend. I was working on something else, and somewhere in the middle of it I opened my markdown editor to take notes and my annoyance with every markdown editor I've tried finally came to a head.&lt;/p&gt;

&lt;p&gt;Not annoyed in the "this is broken" sense. Annoyed in the "why does this app need a cloud account and fourteen features I'll never use" sense. Every alternative I'd tried had the same problem in different packaging — too expensive, too bloated, or too clever.&lt;/p&gt;

&lt;p&gt;So I opened &lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; and started building one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;Three panes. File browser on the left, editor in the middle, live preview on the right. Tabs for multiple open files. Session restore — close the app, reopen it, everything's still there. Dark mode. Search and replace. A formatting toolbar for the things I always forget the syntax for.&lt;/p&gt;

&lt;p&gt;That's it. No cloud sync, no collaboration, no plugin architecture; just markdown files on my computer, edited in a clean interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwmjadwkzxwkg1a5mw2x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwmjadwkzxwkg1a5mw2x.png" alt="Simple Markdown Editor — three-pane layout with file browser, editor, and live preview" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The stack: &lt;strong&gt;Electron 33&lt;/strong&gt;, &lt;strong&gt;React 18&lt;/strong&gt;, &lt;strong&gt;CodeMirror 6&lt;/strong&gt;, &lt;strong&gt;marked&lt;/strong&gt; for GFM rendering, &lt;strong&gt;Vite 6&lt;/strong&gt; for the build, &lt;strong&gt;electron-builder&lt;/strong&gt; for packaging.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the first version was usable in under an hour
&lt;/h2&gt;

&lt;p&gt;Not because AI is magic. Because the AI had context.&lt;/p&gt;

&lt;p&gt;I use a governance-first development workflow: before any code gets written, the AI agent has access to persistent architecture documents, wireframes, a personal coding style guide, personal design guidelines, and detailed specifications. These files survive across sessions and context window compaction. The prompt describes &lt;em&gt;what&lt;/em&gt; to build. The governance documents describe &lt;em&gt;how&lt;/em&gt; to build it, and to what standard.&lt;/p&gt;

&lt;p&gt;That's the difference between "a thing that kind of works" and "a thing I'm actually using that same day."&lt;/p&gt;

&lt;p&gt;Not perfect on the first pass. But functional enough that I was taking notes in it within the first hour. Then I started tweaking.&lt;/p&gt;

&lt;p&gt;(And I wrote this post using it.)&lt;/p&gt;

&lt;h2&gt;
  
  
  The interesting technical bits
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bidirectional scroll sync&lt;/strong&gt; — The naive approach (percentage-based) breaks immediately when the editor and preview have different content heights. I built section-based anchor mapping instead: the editor and preview each maintain a map of heading positions, and scrolling either pane updates the corresponding pane by interpolating between anchor points. Both directions stay aligned regardless of content length differences.&lt;/p&gt;
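&lt;p&gt;The anchor-interpolation idea fits in a few lines. Python here for brevity; the editor itself does this in the Electron renderer against CodeMirror and the preview DOM, so the function below is an illustration of the mapping, not the app's actual code:&lt;/p&gt;

```python
import bisect

def map_scroll(pos, src_anchors, dst_anchors):
    """Map a scroll offset in one pane to the matching offset in the other.

    src_anchors and dst_anchors are the pixel offsets of the same headings
    in each pane (same length, ascending). Within an anchor interval we
    interpolate linearly, so the panes stay aligned even when their total
    heights differ.
    """
    # Find the anchor interval containing pos (clamped at both ends).
    i = bisect.bisect_right(src_anchors, pos) - 1
    i = max(0, min(i, len(src_anchors) - 2))
    a0, a1 = src_anchors[i], src_anchors[i + 1]
    b0, b1 = dst_anchors[i], dst_anchors[i + 1]
    frac = (pos - a0) / (a1 - a0) if a1 != a0 else 0.0
    return b0 + frac * (b1 - b0)
```

&lt;p&gt;Running it in both directions is just a matter of swapping the argument order, which is what keeps the sync bidirectional.&lt;/p&gt;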

&lt;p&gt;&lt;strong&gt;Smart formatting toolbar&lt;/strong&gt; — Each button doesn't just apply formatting — it first checks whether the cursor is already inside that formatting and toggles it off. Heading buttons cycle through H1→H2→H3→paragraph. List buttons handle multi-line selections and continue numbering from preceding items. Small details that make the toolbar feel considered rather than tacked on.&lt;/p&gt;
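&lt;p&gt;The heading cycle is the easiest of those behaviors to show: inspect the current state, then advance it, rather than blindly prepending a &lt;code&gt;#&lt;/code&gt;. A sketch in Python for illustration; the real implementation operates on CodeMirror's document model:&lt;/p&gt;

```python
def cycle_heading(line):
    """Cycle a markdown line through H1, H2, H3, then back to paragraph."""
    stripped = line.lstrip("#").lstrip(" ")
    # Count leading '#' characters to determine the current heading level.
    level = len(line) - len(line.lstrip("#"))
    if level == 0:
        return "# " + stripped                      # paragraph becomes H1
    if level in (1, 2):
        return "#" * (level + 1) + " " + stripped   # H1 to H2, H2 to H3
    return stripped                                 # H3 back to paragraph
```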

&lt;p&gt;&lt;strong&gt;External change detection&lt;/strong&gt; — This was a requirement of the original spec — and when it surfaced later during code review, it surprised me. Edit a file in another app while it's open in the editor, and you get a full diff view showing exactly what changed. Options: keep your version, accept the external changes, or save as a new file. No silent overwrites. I'd completely forgotten I'd even added it until I triggered it accidentally and thought &lt;em&gt;oh, that's actually amazing.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f5jehrvqrxw9umzcl9p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f5jehrvqrxw9umzcl9p.png" alt="External change detection — diff view showing changes made in another editor" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session restore&lt;/strong&gt; — Open tabs, active tab, folder path, scroll positions, and window size/position all persist across app restarts. Multi-window support (Cmd+Shift+N), each window preserves its own state. Close the app, open it tomorrow — everything's exactly where you left it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The security wake-up call
&lt;/h2&gt;

&lt;p&gt;After the features were working, I ran an adversarial code review on the codebase — a separate Claude Code instance with its own repo, its own governance documents, read-only access to the target code, and zero shared context with the building agent. Its only job is to find everything wrong.&lt;/p&gt;

&lt;p&gt;What it found was embarrassing in the best way.&lt;/p&gt;

&lt;p&gt;The worst finding wasn't missing security — it was the &lt;em&gt;ceremony&lt;/em&gt; of security without the substance. &lt;code&gt;contextBridge&lt;/code&gt;, &lt;code&gt;contextIsolation: true&lt;/code&gt;, proper cleanup functions — all present, all technically correct, and all masking a straight pipeline from a malicious &lt;code&gt;.md&lt;/code&gt; file to arbitrary filesystem access. The &lt;code&gt;sandbox: false&lt;/code&gt; with a wrong justification comment was the cherry on top.&lt;/p&gt;

&lt;p&gt;It's exactly the kind of thing that survives review after review because it &lt;em&gt;sounds&lt;/em&gt; right, and nobody actually traces the dependency to verify it.&lt;/p&gt;

&lt;p&gt;Specific fixes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;XSS prevention&lt;/strong&gt; — DOMPurify sanitizes all markdown before rendering in the preview pane&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox enabled&lt;/strong&gt; — Chromium sandbox and context isolation enforced on all windows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem access control&lt;/strong&gt; — path validation limits access to home directory and &lt;code&gt;/Volumes&lt;/code&gt;; sensitive directories (&lt;code&gt;.ssh&lt;/code&gt;, &lt;code&gt;.gnupg&lt;/code&gt;, &lt;code&gt;.aws&lt;/code&gt;) blocked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Path traversal protection&lt;/strong&gt; — &lt;code&gt;local-resource://&lt;/code&gt; protocol restricted to image file extensions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content Security Policy&lt;/strong&gt; — tightened CSP on settings and update dialogs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;URL scheme allowlisting&lt;/strong&gt; — &lt;code&gt;shell.openExternal&lt;/code&gt; limited to &lt;code&gt;https://&lt;/code&gt;, &lt;code&gt;http://&lt;/code&gt;, &lt;code&gt;mailto:&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
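
&lt;p&gt;The filesystem control is the fix most worth internalizing: resolve first, check second. A sketch of that pattern (the app enforces this in the Electron main process; the function signature and lists here are hypothetical, not its actual configuration):&lt;/p&gt;

```python
from pathlib import Path

def is_path_allowed(raw, allowed_roots, blocked_dirs):
    """Resolve '..' traversal, then require the path to sit under an
    allowed root and outside every blocked directory."""
    p = Path(raw).resolve()
    under_root = any(p == r or r in p.parents for r in allowed_roots)
    in_blocked = any(p == b or b in p.parents for b in blocked_dirs)
    return under_root and not in_blocked

# Lists mirroring the post: home directory plus /Volumes, minus key dirs.
ROOTS = [Path.home(), Path("/Volumes")]
BLOCKED = [Path.home() / d for d in (".ssh", ".gnupg", ".aws")]
```

&lt;p&gt;The order matters: checking the raw string before resolving would let &lt;code&gt;..&lt;/code&gt; segments escape the allowlist.&lt;/p&gt;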

&lt;p&gt;Twenty-plus security fixes across eleven versions. This is the part that concerns me about the current wave of AI-generated code shipping without independent review — technically functional apps with exploitable security models, because the same agent that writes the code is also the only one evaluating the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thirty-one versions in two days
&lt;/h2&gt;

&lt;p&gt;v0.1.0 to v0.1.31 in a weekend, and not because I was rushing — because the governance-first pattern means each feature lands cleanly, gets tested, gets committed, and the next one starts from solid ground.&lt;/p&gt;

&lt;p&gt;The app is signed and notarized with Apple, auto-updates from GitHub Releases, handles file associations (shows up in Finder's "Open With" menu for &lt;code&gt;.md&lt;/code&gt;, &lt;code&gt;.markdown&lt;/code&gt;, &lt;code&gt;.mdx&lt;/code&gt;, &lt;code&gt;.txt&lt;/code&gt; files), and restores all windows with their tabs and folder paths on relaunch.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it deliberately doesn't do
&lt;/h2&gt;

&lt;p&gt;No cloud sync. No collaboration. No Vim mode. No WYSIWYG. No plugin system. No account creation. No subscription. No telemetry.&lt;/p&gt;

&lt;p&gt;Every markdown editor eventually tries to become a knowledge management platform. This one won't. The filesystem is the organizational layer. Git is the version control. Markdown is the format — portable, readable, owned by you. The editor just makes working with those files fast and pleasant.&lt;/p&gt;

&lt;p&gt;Your files are plain markdown on disk. Open them with anything, anywhere, forever.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;It's still beta (v0.1.x), but it's a functional beta. Are there problems? Probably, and I'll find them while dogfooding. But it's satisfying my use case, and that's good enough for now.&lt;/p&gt;

&lt;p&gt;The code is on &lt;a href="https://github.com/avanrossum/a_simple_markdown_editor" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; — MIT licensed, macOS only. Grab the &lt;code&gt;.dmg&lt;/code&gt; from &lt;a href="https://github.com/avanrossum/a_simple_markdown_editor/releases/latest" rel="noopener noreferrer"&gt;Releases&lt;/a&gt;. Signed and notarized, no Gatekeeper warnings. macOS 12+ required, Apple Silicon supported.&lt;/p&gt;

&lt;p&gt;If you live in markdown and every editor you've tried wants to be something it shouldn't be — this one doesn't.&lt;/p&gt;

</description>
      <category>markdown</category>
      <category>ai</category>
      <category>devtools</category>
      <category>electron</category>
    </item>
    <item>
      <title>The AI Perimeter: Where Automation Should End and Judgment Should Begin</title>
      <dc:creator>Alexander van Rossum</dc:creator>
      <pubDate>Thu, 26 Feb 2026 14:14:08 +0000</pubDate>
      <link>https://dev.to/avanrossum/the-ai-perimeter-where-automation-should-end-and-judgment-should-begin-2767</link>
      <guid>https://dev.to/avanrossum/the-ai-perimeter-where-automation-should-end-and-judgment-should-begin-2767</guid>
      <description>&lt;p&gt;Everyone posting about AI is selling it. The frameworks, the workflows, the "10x your productivity" threads — all of it points in one direction. Nobody builds a following by telling you to slow down.&lt;/p&gt;

&lt;p&gt;So here's my credibility pitch: I use AI agents for about 95% of my development work. I've shipped &lt;a href="https://dev.to/products/actions"&gt;features&lt;/a&gt;, caught &lt;a href="https://dev.to/blog/the-adversary"&gt;security vulnerabilities&lt;/a&gt;, and managed &lt;a href="https://dev.to/work/ai-sprint-management"&gt;entire sprint cycles&lt;/a&gt; with AI tooling that most people posting about it haven't opened. And I'm telling you there are things I won't use it for — not because I'm hedging, but because I've pushed the tool far enough to know where it breaks.&lt;/p&gt;

&lt;p&gt;I make that decision a dozen times a week, and most of the time I don't even notice I'm making it. That's not instinct — it's pattern recognition built from doing this work every day. The judgment becomes automatic. And that judgment, not the tooling itself, is the actual skill.&lt;/p&gt;

&lt;h2&gt;
  
  
  You can't water a seed that doesn't exist
&lt;/h2&gt;

&lt;p&gt;I've tried using AI to generate ideas from scratch. Not refine an idea. Not pressure-test a concept. Generate one — from nothing.&lt;/p&gt;

&lt;p&gt;It doesn't work.&lt;/p&gt;

&lt;p&gt;AI is extraordinary at expanding, refining, challenging, and structuring ideas. Hand it a rough concept and it'll find angles you missed, surface contradictions, and help you think through implications faster than you could alone. But it needs raw material. Something rough, something human, something that came from &lt;em&gt;your&lt;/em&gt; context and &lt;em&gt;your&lt;/em&gt; pattern recognition. Without that, you get the most statistically average version of whatever you asked for.&lt;/p&gt;

&lt;p&gt;The seed has to be yours. AI is an amplifier. Without a signal, it amplifies noise.&lt;/p&gt;

&lt;p&gt;Every project I've shipped started with a human idea — scribbled in Excalidraw, talked through with a friend, or captured in a voice memo at 2am. These are the same &lt;a href="https://dev.to/blog/llms-are-practically-adhd"&gt;'scaffolding' patterns&lt;/a&gt; I’ve used to manage state-loss in my own brain; the AI pipeline just turns that scaffolding into working software. But the pipeline needs an input. If you skip the human part, you get sophisticated mediocrity — technically correct, architecturally sound, and completely devoid of the insight that would have made it worth building.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dropdown fields for grief
&lt;/h2&gt;

&lt;p&gt;There's a scene in &lt;em&gt;Leviathan Wakes&lt;/em&gt; — the novel that became &lt;em&gt;The Expanse&lt;/em&gt; — where Detective Miller has to write a condolence letter. The system gives him a form:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To the [husband / wife / mother / father] of [victim name]. We are sorry to inform you that [he / she] was killed aboard [ship / station] on [date]. Please accept our condolences.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Dropdown fields for grief. Efficient. Covers all the cases. Soulless.&lt;/p&gt;

&lt;p&gt;That's what happens when you fully automate emotional communication. And the instinct to reach for AI here is understandable — writing a difficult email is &lt;em&gt;hard&lt;/em&gt;, and the blank page is intimidating. But "hard" is exactly the point. The difficulty is the signal that a human needs to be doing this.&lt;/p&gt;

&lt;p&gt;Where AI &lt;em&gt;can&lt;/em&gt; help with emotional communication is in the middle of the process, not at the beginning or end. You write the first draft — the messy, human, probably-too-long version that says what you actually mean. Then you run it through AI for structure: tighten the phrasing, catch the paragraph that buries the point, find the sentence that says two things when it should say one. Then you do a final pass as a human, because the AI's version will be cleaner but might have smoothed away the part that actually mattered.&lt;/p&gt;

&lt;p&gt;Start human. Refine with AI. Finish human. Skip any of those steps and you get either a mess or a template — and people can tell the difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  The compliance line
&lt;/h2&gt;

&lt;p&gt;I'd use AI to manage a Python 2 to Python 3 migration. Identify deprecated patterns, rewrite syntax, flag compatibility issues across a codebase. Bounded, verifiable, and the cost of a missed edge case is a failing test, not a breach. (It still needs human review — even if you use an &lt;a href="https://dev.to/blog/the-adversary"&gt;adversarial agent&lt;/a&gt; for code review, the human makes the final call.)&lt;/p&gt;

&lt;p&gt;I would not use AI to rotate secrets.&lt;/p&gt;

&lt;p&gt;I would not upload a CSV of client data to an LLM and ask it to generate invoices. Not because the model can't do the math — because a hallucinated line item creates a compliance violation and a client who will never trust you again. The financial services sector is already &lt;a href="https://biztechmagazine.com/article/2025/08/llm-hallucinations-what-are-implications-financial-institutions" rel="noopener noreferrer"&gt;grappling with this&lt;/a&gt; — inaccurate AI outputs in regulated environments don't just create errors, they create regulatory exposure. Invoicing requires auditability, and "the AI did it" is not a line item your accountant can reconcile.&lt;/p&gt;

&lt;p&gt;I would not feed PII into a public AI system. Full stop. This isn't about whether the model will get the answer right — it's about what happens to that data after it leaves your system. LLMs can &lt;a href="https://www.lasso.security/blog/llm-data-privacy" rel="noopener noreferrer"&gt;memorize and regurgitate fragments of their training data&lt;/a&gt;, and unless you're on an enterprise plan with contractual guarantees about data handling, your client's personally identifiable information is potentially entering a training pipeline you don't control and can't audit. That's not an AI problem. That's a data governance problem, and it exists whether the output is correct or not.&lt;/p&gt;

&lt;p&gt;The line isn't about capability. Modern models can do all of these things technically. The line is about what happens when they're wrong — and, in the case of PII, what happens even when they're right. A botched Python migration produces a failing test suite. A botched secret rotation produces a security incident. A hallucinated invoice produces a compliance violation. Client data in a training pipeline produces a breach of trust that no output quality can justify.&lt;/p&gt;

&lt;p&gt;And these aren't edge cases waiting to be patched. Hallucinations are &lt;a href="https://datanucleus.dev/corporate-governance-compliance/ai-hallucinations-rag-and-human-in-loop-risk-mitigation" rel="noopener noreferrer"&gt;an inherent property of how language models work&lt;/a&gt; — they predict the most statistically likely next token, not the most factually correct one. That gap doesn't close with better prompts. It closes with governance, verification, and human oversight. Treating hallucinations as bugs to be fixed is how organizations build false confidence in systems that need guardrails.&lt;/p&gt;

&lt;p&gt;The rule: if the cost of a wrong answer exceeds the cost of doing it manually, the AI shouldn't be doing it unsupervised. "Probably right" is fine for code review. It's not fine for anything where "probably" means "we might get sued."&lt;/p&gt;

&lt;p&gt;This is the same principle behind &lt;a href="https://www.ibm.com/think/topics/human-in-the-loop" rel="noopener noreferrer"&gt;human-in-the-loop design&lt;/a&gt; — and behind &lt;a href="https://dev.to/work/ai-sprint-management"&gt;my own workflow&lt;/a&gt;. The AI generates. The human executes. Not because the AI can't execute — because the gap between "can" and "should" is exactly where the expensive mistakes live.&lt;/p&gt;

&lt;h2&gt;
  
  
  Voice is collaboration, not delegation
&lt;/h2&gt;

&lt;p&gt;Every post on this site started as something I wrote. AI expanded it, tightened the structure, caught weak arguments, and helped me think through what I actually meant. But the voice is mine. The opinions are mine. The experiences are mine.&lt;/p&gt;

&lt;p&gt;If you hand an AI "write me an article about quantum mechanics," you'll get the most average article about quantum mechanics that has ever existed. Not wrong. Not interesting. Think of it as convergence to the mean — the model produces the statistical center of everything it's seen on that topic, and the statistical center of anything is, by definition, unremarkable. It's the same reason every AI-generated LinkedIn post sounds like every other AI-generated LinkedIn post.&lt;/p&gt;

&lt;p&gt;And this isn't just an aesthetic problem. GenAI is &lt;a href="https://www.pwc.com/us/en/tech-effect/ai-analytics/ai-hallucinations.html" rel="noopener noreferrer"&gt;designed to provide the most likely output&lt;/a&gt;, which means it defaults to confident, well-structured prose even when the thinking behind it is shallow. Readers trust polished writing more than they should. The result is content that sounds more authoritative than it deserves to be — and that false authority is its own kind of hallucination.&lt;/p&gt;

&lt;p&gt;Voice requires the same pattern as emotional communication: start human, refine with AI, finish human. The AI needs to know what you sound like, what you care about, what hills you'll die on. That context doesn't come from a single prompt — it comes from &lt;a href="https://dev.to/blog/what-is-pass-at-1"&gt;governance documents&lt;/a&gt; that encode your standards, your patterns, your constraints.&lt;/p&gt;

&lt;blockquote&gt;It comes from working with the tool long enough that you know its blind spots.&lt;/blockquote&gt;

&lt;p&gt;The distinction matters because the audience can always tell. "AI-generated content" and "AI-assisted content" are not the same thing. One reads like a template. The other reads like a person who had help organizing their thoughts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three questions before you automate
&lt;/h2&gt;

&lt;p&gt;Before I hand any task to an AI agent, I ask three questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I verify the output?&lt;/strong&gt; If I can check the work faster than I can do the work, AI is a net win. If verification requires as much expertise and time as the original task, I've added a step without saving anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the cost of a wrong answer low?&lt;/strong&gt; &lt;a href="https://dev.to/blog/the-adversary"&gt;Code review&lt;/a&gt; that misses something means I catch it later. A billing error means a client relationship is damaged. A compliance failure means lawyers. Match the automation level to the stakes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does sufficient context exist in the system?&lt;/strong&gt; AI works when the &lt;a href="https://dev.to/blog/what-is-pass-at-1"&gt;governance documents&lt;/a&gt; provide enough structure for a correct first-pass implementation. If the context is ambiguous, incomplete, or doesn't exist yet — the agent will fill in the gaps with confident guesses, and you won't always catch them.&lt;/p&gt;

&lt;p&gt;If any answer is "no," the task stays manual. Not forever — sometimes the fix is building the context that makes automation safe. But automating a task that fails these checks isn't efficiency. It's introducing risk and calling it productivity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it works
&lt;/h2&gt;

&lt;p&gt;This isn't an anti-AI post. My entire workflow depends on AI tooling. Well-bounded transformation work, &lt;a href="https://dev.to/blog/the-adversary"&gt;adversarial code review&lt;/a&gt; against defined standards, any task where &lt;a href="https://dev.to/blog/what-is-pass-at-1"&gt;governance documents&lt;/a&gt; provide sufficient context for a correct first pass — these are places where AI genuinely accelerates. And once the seed exists, AI is the best thinking partner most people have ever had access to. It doesn't get tired, doesn't get defensive, and will argue the other side of any position if you ask it to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool selection is the expertise
&lt;/h2&gt;

&lt;p&gt;A good chef knows when to use the food processor and when to use the knife. The processor is faster. The knife gives you control. Using the wrong one in the wrong place doesn't make you efficient — it makes you someone who doesn't understand their kitchen.&lt;/p&gt;

&lt;p&gt;AI is the most powerful tool most of us have ever had access to. That makes knowing when &lt;em&gt;not&lt;/em&gt; to use it more important, not less. The capability is not the question. The judgment is.&lt;/p&gt;

&lt;p&gt;If your AI strategy is "use AI for everything," you don't have a strategy. You have enthusiasm. And enthusiasm without judgment is how you end up with dropdown fields for grief.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://biztechmagazine.com/article/2025/08/llm-hallucinations-what-are-implications-financial-institutions" rel="noopener noreferrer"&gt;LLM Hallucinations: What Are the Implications for Financial Institutions?&lt;/a&gt; — BizTech Magazine&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.lasso.security/blog/llm-data-privacy" rel="noopener noreferrer"&gt;LLM Data Privacy: Risks, Challenges &amp;amp; Best Practices&lt;/a&gt; — Lasso Security&lt;/p&gt;

&lt;p&gt;&lt;a href="https://datanucleus.dev/corporate-governance-compliance/ai-hallucinations-rag-and-human-in-loop-risk-mitigation" rel="noopener noreferrer"&gt;AI Hallucinations, RAG and Human-in-Loop Risk Mitigation&lt;/a&gt; — DataNucleus&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.ibm.com/think/topics/human-in-the-loop" rel="noopener noreferrer"&gt;What Is Human-in-the-Loop?&lt;/a&gt; — IBM&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.pwc.com/us/en/tech-effect/ai-analytics/ai-hallucinations.html" rel="noopener noreferrer"&gt;What Are AI Hallucinations?&lt;/a&gt; — PwC&lt;/p&gt;

</description>
      <category>ai</category>
      <category>leadership</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
