<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael Tuszynski</title>
    <description>The latest articles on DEV Community by Michael Tuszynski (@michaeltuszynski).</description>
    <link>https://dev.to/michaeltuszynski</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1447774%2Fa99eea93-7845-4764-9fce-b1755bcfa456.png</url>
      <title>DEV Community: Michael Tuszynski</title>
      <link>https://dev.to/michaeltuszynski</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/michaeltuszynski"/>
    <language>en</language>
    <item>
      <title>Your Claude Plugin Marketplace Needs More Than a Git Repo</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Mon, 20 Apr 2026 23:48:03 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/your-claude-plugin-marketplace-needs-more-than-a-git-repo-5631</link>
      <guid>https://dev.to/michaeltuszynski/your-claude-plugin-marketplace-needs-more-than-a-git-repo-5631</guid>
      <description>&lt;h1&gt;Your Claude Plugin Marketplace Needs More Than a Git Repo&lt;/h1&gt;

&lt;p&gt;Anthropic's &lt;a href="https://github.com/anthropics/claude-plugins-official" rel="noopener noreferrer"&gt;official plugin directory&lt;/a&gt; has grown to 55+ curated Claude Code plugins since launch. The community tracks another 72. By the end of 2026, most large engineering orgs will have a private marketplace too — and most of them will ship the architecture wrong.&lt;/p&gt;

&lt;p&gt;The mistake: treating the marketplace as a distribution channel when it's actually a governance layer.&lt;/p&gt;

&lt;h2&gt;What a Claude Code Plugin Actually Is&lt;/h2&gt;

&lt;p&gt;A plugin is a directory with a &lt;code&gt;.claude-plugin/plugin.json&lt;/code&gt; manifest and some combination of skills (&lt;code&gt;skills/&amp;lt;name&amp;gt;/SKILL.md&lt;/code&gt;), agents, hooks, and MCP servers. When a user runs &lt;code&gt;/plugin install code-formatter@acme-tools&lt;/code&gt;, Claude Code clones the plugin to &lt;code&gt;~/.claude/plugins/cache/&amp;lt;marketplace&amp;gt;/&amp;lt;plugin&amp;gt;/&amp;lt;version&amp;gt;/&lt;/code&gt; and loads it into the session.&lt;/p&gt;
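
&lt;p&gt;A minimal manifest, sketched from the structure described above (only &lt;code&gt;name&lt;/code&gt; is clearly required; treat the other fields as illustrative and confirm against the plugin docs):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "name": "code-formatter",
  "version": "1.0.0",
  "description": "Formats staged files before commit",
  "author": {"name": "DevTools Team"}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Everything else (skills, agents, hooks, MCP servers) is picked up by convention from the directory layout.&lt;/p&gt;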

&lt;p&gt;Critical point: &lt;strong&gt;plugins run fully trusted code&lt;/strong&gt; inside your developers' sessions. Skills execute shell commands. MCP servers install arbitrary binaries. Hooks intercept every tool call. There is no sandboxing in 2026. Trust is transitive — trust the Git host, trust the repo's access control, trust the authors.&lt;/p&gt;

&lt;p&gt;That's what "setting up your own marketplace" is really about. Distribution is the easy part.&lt;/p&gt;

&lt;h2&gt;The Three-Layer Model&lt;/h2&gt;

&lt;p&gt;Most engineers see plugins and think "install mechanism." The actual hierarchy is three layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Where it lives&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Skill/plugin&lt;/td&gt;
&lt;td&gt;Runs code in user sessions&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;skills/&amp;lt;name&amp;gt;/SKILL.md&lt;/code&gt;, &lt;code&gt;.claude-plugin/plugin.json&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marketplace&lt;/td&gt;
&lt;td&gt;Lists approved plugins, sources, versions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.claude-plugin/marketplace.json&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managed settings&lt;/td&gt;
&lt;td&gt;Controls which marketplaces users can add&lt;/td&gt;
&lt;td&gt;&lt;code&gt;managed-settings.json&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Skills do the work. The marketplace decides which skills exist. Managed settings decide which marketplaces your employees can reach at all.&lt;/p&gt;

&lt;p&gt;Skip layer 3 and you've built a distribution channel, not an enterprise platform.&lt;/p&gt;

&lt;h2&gt;The Manifest That Matters&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://code.claude.com/docs/en/plugin-marketplaces" rel="noopener noreferrer"&gt;&lt;code&gt;marketplace.json&lt;/code&gt;&lt;/a&gt; looks deceptively simple:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acme-tools"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DevTools Team"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"plugins"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sfdc-lint"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"github"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"repo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acme-corp/sfdc-lint-plugin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"ref"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"v2.1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"sha"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a1b2c3d4e5f6..."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Blocks commits that violate Salesforce field access rules"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;sha&lt;/code&gt; field is the thing nobody talks about. Pin to a commit hash, not a tag. Tags move. Commits don't. This is how you ship deterministic plugins across a 5,000-engineer org without one dev accidentally upgrading to a broken version during a release freeze.&lt;/p&gt;

&lt;p&gt;Release channels work the same way — keep two marketplaces (&lt;code&gt;acme-tools-stable&lt;/code&gt;, &lt;code&gt;acme-tools-canary&lt;/code&gt;) pointing at different branches with distinct pinned SHAs. Developers opt into canary. Everyone else stays on stable.&lt;/p&gt;

&lt;h2&gt;The Lockdown: &lt;code&gt;strictKnownMarketplaces&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;This is the setting that separates hobbyist from enterprise. In your org's managed settings:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"strictKnownMarketplaces"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"github"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"repo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acme-corp/approved-plugins"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hostPattern"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"hostPattern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^github&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;.acme-corp&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;.com$"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Empty array = lockdown, no third-party marketplaces. A list = allowlist. A &lt;code&gt;hostPattern&lt;/code&gt; regex = domain-level control. Pair with &lt;code&gt;extraKnownMarketplaces&lt;/code&gt; to auto-inject your corporate marketplace so engineers don't have to discover or configure it.&lt;/p&gt;
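
&lt;p&gt;Put together, a hedged &lt;code&gt;managed-settings.json&lt;/code&gt; sketch looks something like this (the exact shape of &lt;code&gt;extraKnownMarketplaces&lt;/code&gt; entries is an assumption here; verify against the current managed-settings docs before shipping it fleet-wide):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "strictKnownMarketplaces": [
    {"source": "github", "repo": "acme-corp/approved-plugins"}
  ],
  "extraKnownMarketplaces": {
    "acme-tools": {
      "source": {"source": "github", "repo": "acme-corp/approved-plugins"}
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;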

&lt;p&gt;Without this, any employee can run &lt;code&gt;/plugin marketplace add random-internet-person/totally-safe-plugins&lt;/code&gt; and suddenly you have unreviewed scripts running with their local credentials, SSH keys, and AWS tokens. The technical capability is identical to the company marketplace. The governance is what changes.&lt;/p&gt;

&lt;h2&gt;Hooks Are the Runtime Enforcement Layer&lt;/h2&gt;

&lt;p&gt;Marketplaces control what gets installed. Hooks control what happens at runtime.&lt;/p&gt;

&lt;p&gt;PreToolUse hooks fire before any tool call. They receive JSON on stdin describing the call, and can exit 0 (allow), non-zero (block), or rewrite the tool arguments. This is where you enforce org policy at execution time — not just installation time.&lt;/p&gt;

&lt;p&gt;Real examples worth shipping in your internal plugin:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Block &lt;code&gt;Bash&lt;/code&gt; calls containing &lt;code&gt;curl | sh&lt;/code&gt; patterns&lt;/li&gt;
&lt;li&gt;Reject &lt;code&gt;Edit&lt;/code&gt;/&lt;code&gt;Write&lt;/code&gt; to &lt;code&gt;~/.aws/credentials&lt;/code&gt;, &lt;code&gt;~/.ssh/id_*&lt;/code&gt;, or &lt;code&gt;/etc/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Require human approval for MCP calls to external APIs&lt;/li&gt;
&lt;li&gt;Log every filesystem mutation to a central audit service&lt;/li&gt;
&lt;/ul&gt;
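
&lt;p&gt;Wiring one of these up is mostly configuration. A hedged sketch of a plugin's &lt;code&gt;hooks/hooks.json&lt;/code&gt; routing every &lt;code&gt;Bash&lt;/code&gt; call through an audit script (matcher syntax and the &lt;code&gt;${CLAUDE_PLUGIN_ROOT}&lt;/code&gt; variable follow the hooks docs; the script path is hypothetical, so verify against your Claude Code version):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {"type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/scripts/audit-bash.sh"}
        ]
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The script reads the tool-call JSON on stdin, logs it, and exits non-zero to block.&lt;/p&gt;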

&lt;p&gt;The marketplace decides which plugins your team can install. The hooks decide what those plugins can actually do once loaded. You need both.&lt;/p&gt;

&lt;h2&gt;What Most Orgs Will Get Wrong&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Mistake #1: Private repo as marketplace.&lt;/strong&gt; Putting a &lt;code&gt;skills/&lt;/code&gt; directory in a private GitHub repo and asking people to clone it. That skips the &lt;code&gt;marketplace.json&lt;/code&gt; layer entirely — no version pinning, no managed-settings enforcement, no clean update mechanism. Works for 5 people. Breaks at 500.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mistake #2: Trusting the public directory.&lt;/strong&gt; Anthropic's &lt;code&gt;claude-plugins-official&lt;/code&gt; is curated, but "curated" is not "audited for your security posture." The 72+ community plugins have zero security review from anyone. Don't install these org-wide based on install counts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mistake #3: Forgetting the seed directory.&lt;/strong&gt; For air-gapped environments or CI containers, &lt;code&gt;CLAUDE_CODE_PLUGIN_SEED_DIR&lt;/code&gt; lets you bake plugins into the image at build time. Skip it and your CI pipelines hit GitHub every run — network errors, rate limits, surprise outages when Microsoft-owned infrastructure blips.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mistake #4: Treating security as a scanning problem.&lt;/strong&gt; There's no binary signing for Claude plugins in 2026. Static analysis catches obvious issues; it won't catch a plugin that calls an LLM to generate malicious commands at runtime. Your defense is the marketplace allowlist + runtime hooks + human code review before plugins enter the marketplace. Not automated scanners.&lt;/p&gt;

&lt;h2&gt;The First-Week Setup&lt;/h2&gt;

&lt;p&gt;If you're standing up an enterprise marketplace this quarter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Two repos, separated.&lt;/strong&gt; &lt;code&gt;acme-plugins-marketplace&lt;/code&gt; (just &lt;code&gt;marketplace.json&lt;/code&gt; and approval metadata) and your actual plugin source repos. Separating manifest from code lets you revise approval lists without touching plugin internals.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Write a managed-settings policy.&lt;/strong&gt; Lock &lt;code&gt;strictKnownMarketplaces&lt;/code&gt; to your corporate marketplace. Ship via your MDM or config-management tool (Ansible, Jamf, Chef, whatever already provisions dev laptops).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with three plugins.&lt;/strong&gt; Something low-risk (commit-message helper), something internal-facing (your SSO login flow), something domain-specific (your SOW template for client work). Three forces you to confront versioning, namespacing, and review questions before you scale.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ship a PreToolUse audit hook on day one.&lt;/strong&gt; Even if it just logs. Retrofitting observability onto 50 plugins later is miserable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Document the review process.&lt;/strong&gt; Who reviews? What's the bar? What gets rejected? Most orgs skip this and end up with approval-by-loudest-voice. Write it down before you take the first submission.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The tools are ready. The distribution mechanism is solved. What's left is organizational work — deciding who decides, and who's accountable when a plugin ships something stupid.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>platformengineering</category>
      <category>ai</category>
      <category>enterpriseai</category>
    </item>
    <item>
      <title>The Model Doesn't Matter. The Harness Does.</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Sun, 19 Apr 2026 17:04:46 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/the-model-doesnt-matter-the-harness-does-599</link>
      <guid>https://dev.to/michaeltuszynski/the-model-doesnt-matter-the-harness-does-599</guid>
      <description>&lt;h1&gt;The Model Doesn't Matter. The Harness Does.&lt;/h1&gt;

&lt;p&gt;Six frontier coding models now score within 0.8 points of each other on SWE-bench Verified. The same model wrapped in different agent frameworks swings almost ten points on SWE-bench Pro. Picking an agent platform based on which model it runs misses where the real performance differences come from.&lt;/p&gt;

&lt;p&gt;The useful shift in framing came from Birgitta Böckeler at Thoughtworks, writing &lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;on martinfowler.com&lt;/a&gt;: Agent = Model + Scaffolding. The scaffolding is everything that isn't the model — the tool definitions, the context compaction, the error recovery logic, the feedback sensors, the system prompt, the memory between sessions. That's the layer where most of the variance in real-world agent performance actually lives.&lt;/p&gt;

&lt;h2&gt;The data&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://particula.tech/blog/agent-scaffolding-beats-model-upgrades-swe-bench" rel="noopener noreferrer"&gt;Particula Tech's analysis&lt;/a&gt; lines up four agent frameworks running the same Claude Opus 4.5 against SWE-bench Pro:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SEAL (standardized scaffold)&lt;/td&gt;
&lt;td&gt;Opus 4.5&lt;/td&gt;
&lt;td&gt;45.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;Opus 4.5&lt;/td&gt;
&lt;td&gt;50.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auggie (Augment)&lt;/td&gt;
&lt;td&gt;Opus 4.5&lt;/td&gt;
&lt;td&gt;51.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;Opus 4.5&lt;/td&gt;
&lt;td&gt;55.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Same weights, 9.5-point spread. And when Meta and Harvard's Confucius Code Agent ran Sonnet 4.5 with its own scaffold, it scored 52.7% — beating Opus 4.5 on Anthropic's stock framework at 52.0%. A cheaper model with better scaffolding beat the flagship on its vendor's own agent.&lt;/p&gt;

&lt;p&gt;Single-variable changes produce similar results. Grok Code Fast went from 6.7% to 68.3% on coding benchmarks after changing only the edit tool format — same model, same prompts. LangChain's coding agent moved from 52.8% to 66.5% on Terminal Bench 2.0 by improving task decomposition and tool use, with no model swap. Adding WarpGrep as a specialized search subagent added 2.1 to 3.7 points across every model tested, while cutting cost 15.6% and runtime 28%.&lt;/p&gt;

&lt;h2&gt;What Anthropic and OpenAI are publishing&lt;/h2&gt;

&lt;p&gt;Both labs have written up their own agent framework work in the past year, and neither piece is about model training.&lt;/p&gt;

&lt;p&gt;Anthropic's &lt;a href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents" rel="noopener noreferrer"&gt;"Effective harnesses for long-running agents"&lt;/a&gt; describes a two-part pattern for agents working across many context windows. An initializer agent writes a feature list as structured JSON, an &lt;code&gt;init.sh&lt;/code&gt; script, and a progress file. A coding agent reads the progress file, picks one feature, commits its work, and updates progress. Same model in both roles — the difference is what the scaffolding makes visible between sessions.&lt;/p&gt;
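
&lt;p&gt;The progress file doesn't need to be fancy. An illustrative shape (this is not Anthropic's actual schema; the post describes it only loosely):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "features": [
    {"id": "auth-login", "status": "done", "commit": "9f21c7a"},
    {"id": "chat-streaming", "status": "in_progress", "notes": "SSE wired, UI pending"},
    {"id": "file-uploads", "status": "todo"}
  ]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;What matters is that a fresh session can read it and know what's claimed done, what's half-finished, and what to pick up next.&lt;/p&gt;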

&lt;p&gt;The observation that drove this: Opus 4.5 running on the Claude Agent SDK in a loop, given the prompt "build a clone of claude.ai," would run out of context mid-implementation, then declare victory too early in a later session because the environment looked mostly done. The fix wasn't a better model. It was a scaffolding pattern that handed off state correctly.&lt;/p&gt;

&lt;p&gt;OpenAI's &lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;writeup on agent-first engineering&lt;/a&gt; goes further. Three engineers shipped a million lines of code across 1,500 PRs in five months with no manually-written code. Codex wrote the application logic, the tests, the CI configuration, the docs. The humans' job was to design the environment — wiring Chrome DevTools into the agent runtime so Codex could drive its own UI, exposing logs via LogQL and metrics via PromQL, making the app bootable per git worktree so each task ran on an isolated instance.&lt;/p&gt;

&lt;p&gt;Their framing: "Our most difficult challenges now center on designing environments, feedback loops, and control systems." That's the team at the company that trains the models saying the models aren't the bottleneck.&lt;/p&gt;

&lt;h2&gt;Where the cycles should go&lt;/h2&gt;

&lt;p&gt;For teams actually building agents, the practical implication is that model upgrades at the frontier buy roughly one point on benchmarks, while scaffolding improvements can buy twenty or more. Across the published case studies, the components with measurable impact are fairly consistent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Tool orchestration&lt;/strong&gt;: how tools get discovered and composed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context management&lt;/strong&gt;: compacting aggressively, keeping recent tool outputs intact, persisting structured state between sessions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Error recovery&lt;/strong&gt;: the gap between 42% and 78% on SWE-bench is largely recovery from mistakes, not fewer mistakes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deterministic feedback sensors&lt;/strong&gt;: linters and structural tests that run on every change&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Planning-execution separation&lt;/strong&gt;: one agent decides, another does&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this requires a frontier model. A well-scaffolded Sonnet beats a poorly-scaffolded Opus, at about a fifth the cost.&lt;/p&gt;

&lt;p&gt;The model-as-differentiator era appears to be ending. Frontier capability has converged within a point or two, and the interesting engineering — the work that actually moves benchmarks and production reliability — is happening in the scaffolding around the model.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Böckeler, Birgitta. &lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;"Harness engineering for coding agent users."&lt;/a&gt; martinfowler.com, April 2026.&lt;/li&gt;
&lt;li&gt;Mondragon, Sebastian. &lt;a href="https://particula.tech/blog/agent-scaffolding-beats-model-upgrades-swe-bench" rel="noopener noreferrer"&gt;"Agent Scaffolding Beats Model Upgrades: 42% to 78% on SWE-Bench."&lt;/a&gt; Particula Tech, March 2026.&lt;/li&gt;
&lt;li&gt;Anthropic. &lt;a href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents" rel="noopener noreferrer"&gt;"Effective harnesses for long-running agents."&lt;/a&gt; November 2025.&lt;/li&gt;
&lt;li&gt;Lopopolo, Ryan. &lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;"Harness engineering: leveraging Codex in an agent-first world."&lt;/a&gt; OpenAI, February 2026.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>platformengineering</category>
      <category>llm</category>
    </item>
    <item>
      <title>AI Agents Are Killing Seat-Based SaaS Pricing. Here's What's Replacing It.</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Sat, 18 Apr 2026 02:29:58 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/ai-agents-are-killing-seat-based-saas-pricing-heres-whats-replacing-it-2bk3</link>
      <guid>https://dev.to/michaeltuszynski/ai-agents-are-killing-seat-based-saas-pricing-heres-whats-replacing-it-2bk3</guid>
      <description>&lt;h1&gt;AI Agents Are Killing Seat-Based SaaS Pricing. Here's What's Replacing It.&lt;/h1&gt;

&lt;p&gt;Intercom's Fin AI agent went from $1M to $100M+ ARR on one pricing move: $0.99 per resolved ticket. Not per seat, not per month. Per &lt;em&gt;outcome&lt;/em&gt;. Fin now handles 80%+ of Intercom's customer support volume and closes about a million conversations a week, and Intercom will refund up to $1 million if resolution targets aren't hit (&lt;a href="https://www.sequencehq.com/blog/how-intercom-cracked-outcome-based-pricing-with-finai" rel="noopener noreferrer"&gt;Sequence&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;That pricing model is telling you something the press releases aren't: when an agent does the work, the seat becomes fiction.&lt;/p&gt;

&lt;h2&gt;The Seat Model Assumes Humans Do the Work&lt;/h2&gt;

&lt;p&gt;Seat-based SaaS had a clean logic for twenty years. A human uses the tool. The tool's value scales with how many humans use it. Price the seat.&lt;/p&gt;

&lt;p&gt;AI agents break that logic in one step. OutSystems' 2026 State of AI Development report found 96% of organizations are using AI agents and agents are resolving 80%+ of employee service requests on average — a shift projected to cut IT service management licensing costs by up to 50% (&lt;a href="https://www.prnewswire.com/news-releases/ai-agents-force-rethink-of-saas-pricing-and-improve-customer-experiences-302734934.html" rel="noopener noreferrer"&gt;PR Newswire&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;If your ServiceNow fulfillers are closing 80% fewer tickets because Now Assist closed them first, you're going to want 80% fewer fulfiller seats. Your ServiceNow rep has noticed.&lt;/p&gt;

&lt;h2&gt;Four Pricing Models Are Fighting to Replace It&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Outcome pricing.&lt;/strong&gt; Intercom's Fin charges $0.99 per resolved conversation. You pay when the customer confirms resolution or doesn't come back. Zero outcomes, zero charge. This is the model most aligned with buyer incentives — and the hardest for vendors to run. It forces the vendor's sales, CS, and product teams to actually deliver measurable value. Intercom's president put it bluntly: the model "exposed every weak link" inside the company (&lt;a href="https://gtmnow.com/how-intercom-built-the-highest-performing-ai-agent-on-the-market-using-outcome-based-pricing-with-archana-agrawal-president-at-intercom/" rel="noopener noreferrer"&gt;GTMnow&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Action/credit consumption.&lt;/strong&gt; Salesforce Agentforce charges $2 per conversation under the Conversations model, or $0.10 per standard action ($500 per 100K Flex Credits) under the Flex Credits model rolled out in late 2025 and recommended for most new deployments in 2026 (&lt;a href="https://aquivalabs.com/blog/agentforce-pricing-gets-a-long-overdue-fix-flex-credits-are-now-live/" rel="noopener noreferrer"&gt;Aquiva Labs&lt;/a&gt;). Microsoft Copilot Studio uses the same structure — $0.01 per credit or $200 per 25,000 credits pay-as-you-go via Azure (&lt;a href="https://azure.microsoft.com/en-us/pricing/details/copilot-studio/" rel="noopener noreferrer"&gt;Microsoft&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Consumption pricing is vendor-friendly: it meters API calls, not business value, so the vendor gets paid whether the agent actually worked or not. It's also predictable enough for finance to plan around, which is why most enterprise CIOs will pick this over outcome pricing when given the choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Add-on uplift.&lt;/strong&gt; ServiceNow's Now Assist is sold as the Pro Plus SKU — a 50-60% uplift on your existing base tier, plus $50-$100+ per fulfiller per month, plus per-Assist token consumption with overage at $0.015-$0.04 (&lt;a href="https://www.redresscompliance.com/servicenow-now-assist-ai-pricing-guide.html" rel="noopener noreferrer"&gt;Redress Compliance&lt;/a&gt;). Now Assist's net new ACV crossed $600M in FY25 and is tracking toward $1B in FY26. The model works because the base platform is sticky — you're not going to rip out ServiceNow to avoid a 50% uplift. Expect every dominant seat-based vendor to ship this variant first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Agent-as-employee.&lt;/strong&gt; The newest and most provocative: price the agent as a fraction of the full-time employee it replaces or augments. A $2,000/month "AI SDR" for a team that would otherwise hire a $90K human SDR. Monetizely's 2026 guide calls this the model that moves outcome pricing from support into sales, ops, and engineering (&lt;a href="https://www.getmonetizely.com/blogs/the-2026-guide-to-saas-ai-and-agentic-pricing-models" rel="noopener noreferrer"&gt;Monetizely&lt;/a&gt;). Expect "agentic enterprise license agreements" — roll-up contracts that bundle agent capacity across functions — to become the norm at the high end.&lt;/p&gt;

&lt;h2&gt;Microsoft Is Hedging Both Ways&lt;/h2&gt;

&lt;p&gt;Watch what Microsoft is actually doing. Microsoft 365 Copilot still costs $18-$42.50 per user per month — pure seat pricing (&lt;a href="https://www.microsoft.com/en-us/microsoft-365-copilot/pricing" rel="noopener noreferrer"&gt;Microsoft&lt;/a&gt;). But Copilot Studio, where customers build their own agents, runs on consumption credits. Microsoft is collecting the seat revenue from customers who don't know the transition is happening and the consumption revenue from customers who do.&lt;/p&gt;

&lt;p&gt;This is the playbook. Seat revenue doesn't disappear on any vendor's P&amp;amp;L for a few years. It gets harvested slowly while the consumption line ramps. Salesforce is running the same hedge — Flex Credits for agent workloads, $125-$650 per user per month for Agentforce 1 Editions on top of whatever you're already paying for Sales Cloud and Service Cloud.&lt;/p&gt;

&lt;h2&gt;What This Means for Platform and Procurement Teams&lt;/h2&gt;

&lt;p&gt;Three things you should be doing right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Renegotiate before the renewal, not at it.&lt;/strong&gt; If your existing seat-based vendor is deploying agents into your environment, you have leverage you won't have once they've re-based your contract around agent ACV. Vendors are booking agent revenue as net-new ACV to protect multiples. Your seat count is about to become the anchor on their growth number. Use that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instrument consumption from day one.&lt;/strong&gt; Enterprise AI agent deployments cost $150K-$800K to set up and $50K-$200K annually to run (&lt;a href="https://www.aimagicx.com/blog/ai-agent-market-52-billion-business-model-2026" rel="noopener noreferrer"&gt;AI Magicx&lt;/a&gt;). The per-unit economics mean nothing if you can't meter per-team, per-workflow, per-outcome. The FinOps teams that built cloud cost attribution in 2019-2022 are now building AgentOps. If you wait, your Q4 bill will surprise you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push vendors toward outcome pricing where you can.&lt;/strong&gt; It's the only model aligned with your interests. Not every workflow has a clean outcome metric, but customer support does, sales qualification does, L1 IT does. For those, refuse to sign consumption deals without outcome-based alternatives in the quote. The vendors who refuse are telling you something about their confidence in the agent.&lt;/p&gt;

&lt;h2&gt;Your TCO Goes Up Before It Goes Down&lt;/h2&gt;

&lt;p&gt;Here's the part nobody's saying out loud: in year one of agent adoption, your SaaS bill gets bigger, not smaller.&lt;/p&gt;

&lt;p&gt;You're still paying for seats — those people haven't left yet, and you still need the UI. You're now also paying consumption credits, add-on uplifts, or outcome fees. ServiceNow Pro Plus doesn't replace Pro, it &lt;em&gt;sits on top of it&lt;/em&gt;. Agentforce doesn't replace Sales Cloud, it extends it. For 12-24 months, you're paying twice for the same work.&lt;/p&gt;

&lt;p&gt;The bill comes down only when you actually reduce seat counts — which requires workforce planning decisions most companies aren't ready to make. Gartner reports SaaS vendors are already pushing 10-20% renewal uplifts on average (&lt;a href="https://www.bettercloud.com/monitor/saas-industry/" rel="noopener noreferrer"&gt;BetterCloud&lt;/a&gt;). Add agent consumption on top of that and your TCO is up 30%+ before any optimization kicks in.&lt;/p&gt;

&lt;p&gt;The vendors know this. Their pricing models are designed to extract the uplift during the overlap period. If your FinOps practice doesn't expand into AgentOps by the end of 2026, you won't just pay more — you'll pay more without knowing why.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>saas</category>
      <category>platformengineering</category>
      <category>finops</category>
    </item>
    <item>
      <title>38% of AI Answers Are Wrong — And It's Your Prompt's Fault</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Tue, 14 Apr 2026 15:16:26 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/your-prompt-has-too-many-jobs-25gi</link>
      <guid>https://dev.to/michaeltuszynski/your-prompt-has-too-many-jobs-25gi</guid>
      <description>&lt;p&gt;Every week someone posts about AI hallucination like it's a mystery. It's not. A &lt;a href="https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1622292/full" rel="noopener noreferrer"&gt;2025 Frontiers in AI study&lt;/a&gt; measured it: vague, multi-objective prompts hallucinate &lt;strong&gt;38.3% of the time&lt;/strong&gt;. Structured, single-focus prompts? &lt;strong&gt;18.1%&lt;/strong&gt;. That's a 20-point accuracy gap from how you write the prompt — not which model you pick.&lt;/p&gt;

&lt;p&gt;Everyone's debating GPT vs. Claude vs. Gemini. Nobody's talking about prompt structure, which matters more than model selection for most use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The $0 Fix Nobody Uses
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://sqmagazine.co.uk/llm-hallucination-statistics/" rel="noopener noreferrer"&gt;Research from SQ Magazine&lt;/a&gt; breaks it down further: zero-shot prompts (no examples, no structure) hallucinate at &lt;strong&gt;34.5%&lt;/strong&gt;. Add a few examples and that drops to &lt;strong&gt;27.2%&lt;/strong&gt;. Add explicit instructions: &lt;strong&gt;24.6%&lt;/strong&gt;. Simply adding "If you're not sure, say so" cuts hallucination by another &lt;strong&gt;15%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That last one is worth repeating. One sentence — "If you're not confident, say you don't know" — is worth more than upgrading your model tier. And it costs nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Multi-Task Prompts Are the Worst Offender
&lt;/h2&gt;

&lt;p&gt;"Summarize this doc, extract the key risks, and draft a response email" feels like one task. It's three. And each additional objective gives the model more room to fabricate connections between things that don't connect.&lt;/p&gt;

&lt;p&gt;Language models are next-token predictors. Single task = narrow probability distribution = the model knows where it's headed. Three tasks stacked together = triple the surface area for error. A small fabrication in the summary becomes a stated fact in the risk analysis becomes a confident assertion in the draft email.&lt;/p&gt;

&lt;p&gt;Longer, multi-part prompts increase error rates by &lt;a href="https://sqmagazine.co.uk/llm-hallucination-statistics/" rel="noopener noreferrer"&gt;roughly 10%&lt;/a&gt;. In legal contexts, hallucination rates run between &lt;strong&gt;58% and 88%&lt;/strong&gt;. That's not an AI problem. That's a prompting problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works (With Numbers)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;One prompt, one job.&lt;/strong&gt; Summarize the doc. Stop. Review it. Then extract risks from the verified summary. Then draft the email from verified risks. Three prompts, each building on confirmed output.&lt;/p&gt;
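
&lt;p&gt;Sketched out, the chained version of the earlier example looks like this (the prompt wording is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt 1: "Summarize this document in 10 bullet points."
          → you review and correct the summary
Prompt 2: "From this verified summary, extract the key risks."
          → you review the risks
Prompt 3: "Draft a response email addressing these verified risks."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;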

&lt;p&gt;&lt;strong&gt;Constrain the output.&lt;/strong&gt; JSON, numbered lists, specific templates. &lt;a href="https://sqmagazine.co.uk/llm-hallucination-statistics/" rel="noopener noreferrer"&gt;Structured prompts cut medical AI hallucinations by 33%&lt;/a&gt;. The less room to improvise, the less it fabricates.&lt;/p&gt;
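
&lt;p&gt;A minimal example of an output constraint (the field names here are arbitrary; the point is leaving the model no room to improvise):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Return ONLY valid JSON in exactly this shape:
{
  "summary": "one sentence",
  "risks": ["risk 1", "risk 2"],
  "confidence": "high | medium | low"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;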

&lt;p&gt;&lt;strong&gt;Give examples.&lt;/strong&gt; Zero-shot to few-shot: 34.5% → 27.2%. Two examples cost you 30 seconds and buy a 7-point accuracy gain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set refusal conditions.&lt;/strong&gt; "If confidence is below 70% or no evidence supports the claim, say 'insufficient data.'" You're not weakening the model. You're giving it a pressure valve so it doesn't fill gaps with fiction.&lt;/p&gt;
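
&lt;p&gt;One template for a refusal clause; adapt the threshold and wording to your workflow:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Answer using ONLY the provided context.
If the context does not support a claim, or your confidence
is below roughly 70%, respond with: "insufficient data".
Never guess; an honest gap beats a confident fabrication.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;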

&lt;h2&gt;
  
  
  The Model Isn't the Bottleneck
&lt;/h2&gt;

&lt;p&gt;The best models went from &lt;a href="https://www.aboutchromebooks.com/ai-hallucination-rates-across-different-models/" rel="noopener noreferrer"&gt;21.8% hallucination in 2021 to 0.7% in 2025&lt;/a&gt; on benchmarks. But benchmarks test clean, single-objective tasks. Real-world, multi-step workflows — the kind actual professionals run — depend more on how you ask than what you ask.&lt;/p&gt;

&lt;p&gt;You wouldn't hand a contractor one work order that says "remodel the kitchen, fix the plumbing, and repaint the exterior." You'd scope each job, inspect the work, then move on.&lt;/p&gt;

&lt;p&gt;The people getting the best results from AI already know this. Everyone else is blaming the model.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>promptengineering</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Building a Plugin Marketplace for AI-Native Workflows</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Fri, 10 Apr 2026 22:55:16 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/building-a-plugin-marketplace-for-ai-native-workflows-45lb</link>
      <guid>https://dev.to/michaeltuszynski/building-a-plugin-marketplace-for-ai-native-workflows-45lb</guid>
      <description>&lt;p&gt;Most AI coding tools ship as monoliths. One big system prompt, one set of capabilities, one-size-fits-all. That works fine for general software engineering. It falls apart the moment you need domain-specific workflows that vary by role, by team, and by engagement.&lt;/p&gt;

&lt;p&gt;I build presales systems at &lt;a href="https://www.presidio.com" rel="noopener noreferrer"&gt;Presidio&lt;/a&gt; — client research, SOW generation, meeting capture, deal operations. The kind of work where a solutions architect needs different tools than a deal desk analyst, and where loading everything into every session wastes tokens and degrades output quality.&lt;/p&gt;

&lt;p&gt;So I built a plugin marketplace for &lt;a href="https://docs.anthropic.com/en/docs/claude-code/overview" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;. Nine modular plugins, independently installable, composable by role. Here's what I learned shipping it to a team.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Plugins Instead of One Big Workspace
&lt;/h2&gt;

&lt;p&gt;The original system was a monolith — 56 commands, 14 skills, 36 tools, all loaded into every session. It worked for me as the sole operator. The moment I tried to share it with other consultants, three problems surfaced immediately:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context pollution.&lt;/strong&gt; A consultant doing SOW work doesn't need the meeting transcription pipeline, the competitive intel framework, or the deal management commands cluttering their context window. Every irrelevant token degrades the model's attention on the task at hand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Onboarding friction.&lt;/strong&gt; "Clone this repo, read 200 lines of docs, configure 18 environment variables, and learn 56 commands" is not an adoption strategy. People need to install what they need and ignore what they don't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maintenance coupling.&lt;/strong&gt; A bug fix in the meeting recorder shouldn't require every user to pull an update that also touches their SOW pipeline. Independent versioning matters when your users are busy consultants, not developers.&lt;/p&gt;

&lt;p&gt;The fix was decomposition. Break the monolith into plugins that can be installed, updated, and removed independently.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture That Emerged
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fulcrum-plugins/
├── plugins/
│   ├── core/          # Foundation: persona, auth, OneDrive, shared rules
│   ├── intel/         # Client research pipeline
│   ├── meeting/       # Silent recording + transcription
│   ├── discovery/     # Call prep, qualification, opportunity analysis
│   ├── sow/           # SOW drafting, review, QA, redlines
│   ├── proposal/      # Solution decks and pricing workbooks
│   ├── ops/           # Daily briefing, weekly review, context switching
│   ├── engage/        # Deal management, delivery handoff
│   └── util/          # Freshness audits, triage, workspace maintenance
├── shared/            # Frameworks and templates used across plugins
└── docs/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each plugin is a self-contained directory with commands, scripts, rules, and hooks. One required dependency — &lt;code&gt;core&lt;/code&gt; — provides identity, authentication, and the shared file system. Everything else is optional.&lt;/p&gt;

&lt;p&gt;Installation is one command: &lt;code&gt;/plugin install sow@fulcrum-plugins&lt;/code&gt;. Uninstall is equally clean. No cross-plugin imports, no shared state beyond what &lt;code&gt;core&lt;/code&gt; provides.&lt;/p&gt;
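
&lt;p&gt;For reference, a plugin's manifest is a small JSON file at &lt;code&gt;.claude-plugin/plugin.json&lt;/code&gt;. A minimal sketch might look like this (field set based on Claude Code's plugin docs; verify against the current schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "name": "sow",
  "description": "SOW drafting, review, QA, and redlines",
  "version": "1.3.1"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;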

&lt;h3&gt;
  
  
  Recommended Sets Over Required Bundles
&lt;/h3&gt;

&lt;p&gt;Rather than prescribing one configuration, the marketplace offers recommended sets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Minimal (SOW-focused):&lt;/strong&gt; core + sow — for consultants who only write statements of work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meeting-heavy:&lt;/strong&gt; core + meeting + discovery — for consultants running client calls all day&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full presales:&lt;/strong&gt; all 9 plugins — for people like me who touch every phase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This respects how people actually work. Nobody uses every tool every day. The consultant who installs just &lt;code&gt;core + sow&lt;/code&gt; gets a focused, fast experience. The one who installs everything gets the full operating system. Both are first-class citizens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Namespace Isolation Matters More Than You Think
&lt;/h2&gt;

&lt;p&gt;When I first decomposed the monolith, every plugin had commands like &lt;code&gt;/draft&lt;/code&gt;, &lt;code&gt;/review&lt;/code&gt;, &lt;code&gt;/status&lt;/code&gt;. Collisions everywhere. The fix was &lt;a href="https://docs.anthropic.com/en/docs/claude-code/plugins" rel="noopener noreferrer"&gt;namespace syntax&lt;/a&gt;: &lt;code&gt;/sow:draft&lt;/code&gt;, &lt;code&gt;/intel:company-intel&lt;/code&gt;, &lt;code&gt;/ops:weekly-review&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This felt verbose at first. It turned out to be the single most important design decision. Namespaces do three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Eliminate ambiguity.&lt;/strong&gt; &lt;code&gt;/draft&lt;/code&gt; could mean a SOW draft, a proposal draft, or an email draft. &lt;code&gt;/sow:draft&lt;/code&gt; is unambiguous.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable discovery.&lt;/strong&gt; A new user can type &lt;code&gt;/sow:&lt;/code&gt; and see every SOW command without memorizing a list. The namespace IS the documentation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preserve plugin independence.&lt;/strong&gt; Two plugin authors can independently create a &lt;code&gt;status&lt;/code&gt; command without coordinating. &lt;code&gt;/engage:status&lt;/code&gt; and &lt;code&gt;/ops:status&lt;/code&gt; coexist without conflict.&lt;/li&gt;
&lt;/ol&gt;
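
&lt;p&gt;The namespace falls out of the directory layout: a command file at &lt;code&gt;commands/draft.md&lt;/code&gt; inside the &lt;code&gt;sow&lt;/code&gt; plugin surfaces as &lt;code&gt;/sow:draft&lt;/code&gt;. The structure below is a sketch of my layout, not a spec:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plugins/sow/
├── .claude-plugin/plugin.json
└── commands/
    ├── draft.md       → /sow:draft
    ├── review.md      → /sow:review
    └── redline.md     → /sow:redline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;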

&lt;p&gt;I renamed all plugins in v1.1 specifically to get shorter prefixes — &lt;code&gt;sow-pipeline&lt;/code&gt; became &lt;code&gt;sow&lt;/code&gt;, &lt;code&gt;research-intel&lt;/code&gt; became &lt;code&gt;intel&lt;/code&gt;, &lt;code&gt;engagement-lifecycle&lt;/code&gt; became &lt;code&gt;engage&lt;/code&gt;. Every keystroke matters when you're typing these dozens of times a day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shared Lessons: Institutional Memory Without a Database
&lt;/h2&gt;

&lt;p&gt;The feature I'm most proud of has zero lines of application code. It's a folder on OneDrive.&lt;/p&gt;

&lt;p&gt;When a consultant learns something the hard way — a Salesforce field that's locked to AMs, a client's preferred meeting format, a compliance requirement that isn't documented anywhere — they drop a markdown file into a shared folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;shared-lessons/jane-doe/2026-04-09-sfdc-stage-lock.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At session start, a hook script scans all lesson files from the team, deduplicates by title, and renders them into a cached rule that loads into every session. The team's collective knowledge grows without anyone maintaining a wiki, attending a knowledge-sharing meeting, or filing a ticket.&lt;/p&gt;

&lt;p&gt;The constraints are deliberate: one lesson per file, no client-confidential data, attribution required, dates required (stale lessons get pruned). The format is simple enough that non-developers can contribute. The mechanism — a OneDrive folder synced through &lt;a href="https://www.microsoft.com/en-us/microsoft-teams/group-chat-software" rel="noopener noreferrer"&gt;Microsoft Teams&lt;/a&gt; — requires no new tools or logins.&lt;/p&gt;
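
&lt;p&gt;A lesson file itself is deliberately tiny. Here's a hypothetical example in the required shape:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# SFDC: Stage field locks after Proposal
Date: 2026-04-09
Author: jane-doe

The Opportunity Stage field becomes AM-only once a deal passes
Proposal. Route the change through the AM; a direct edit fails
silently.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;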

&lt;p&gt;This is the pattern I keep returning to: &lt;strong&gt;use infrastructure people already have.&lt;/strong&gt; OneDrive, git, markdown files. Not a custom database, not a new SaaS tool, not an API integration. The best systems are the ones that disappear into workflows people already follow.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Didn't Work
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Plugin dependencies.&lt;/strong&gt; The original design had plugins declaring dependencies on each other — &lt;code&gt;sow&lt;/code&gt; depends on &lt;code&gt;intel&lt;/code&gt;, &lt;code&gt;engage&lt;/code&gt; depends on &lt;code&gt;discovery&lt;/code&gt;. In practice, this created install-order headaches and made it harder to reason about what was loaded. The v1.3 architecture dropped all inter-plugin dependencies. Each plugin is fully self-contained. If &lt;code&gt;sow&lt;/code&gt; needs client context, it reads the client's context file directly — it doesn't import a function from &lt;code&gt;intel&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automatic plugin updates.&lt;/strong&gt; The original design auto-pulled updates on session start. In practice, this broke people mid-workflow when a command signature changed. The fix was making updates explicit — &lt;code&gt;/core:update&lt;/code&gt; when you're ready, with a statusline indicator showing when updates are available. Users update on their own schedule, not yours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Granular versioning.&lt;/strong&gt; I initially tried to version each plugin independently. After three days of version coordination across nine plugins, I switched to a single marketplace version. All plugins share the version in &lt;code&gt;VERSION&lt;/code&gt;. SemVer at the marketplace level, not the plugin level. Simpler to reason about, simpler to communicate ("update to 1.3.1"), simpler to tag.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Adoption Signal
&lt;/h2&gt;

&lt;p&gt;The most telling metric isn't installs — it's which recommended set people choose. When most of your users install the minimal set and gradually add plugins over weeks, you've built something that earns trust incrementally. When they install everything on day one and complain it's overwhelming, you've just shipped a monolith with extra steps.&lt;/p&gt;

&lt;p&gt;So far, the pattern is healthy. New consultants start with &lt;code&gt;core + sow&lt;/code&gt; (the job requirement), then add &lt;code&gt;discovery&lt;/code&gt; after their first client call, then &lt;code&gt;meeting&lt;/code&gt; after they see someone else's AI-generated meeting notes. Pull, not push.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Plugin architectures aren't new. What's new is applying them to AI agent context — treating the model's knowledge, rules, and capabilities as composable modules rather than a static system prompt. The tools exist today in &lt;a href="https://docs.anthropic.com/en/docs/claude-code/plugins" rel="noopener noreferrer"&gt;Claude Code's plugin system&lt;/a&gt;. The hard part isn't the architecture. It's the discipline to decompose.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>plugins</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Context Engineering Is the New Prompt Engineering</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Wed, 08 Apr 2026 22:06:58 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/context-engineering-is-the-new-prompt-engineering-2231</link>
      <guid>https://dev.to/michaeltuszynski/context-engineering-is-the-new-prompt-engineering-2231</guid>
      <description>&lt;p&gt;Everyone's writing better prompts. Few are building better context.&lt;/p&gt;

&lt;p&gt;That's the gap. Prompt engineering treats AI like a search box — craft the perfect query, get the perfect answer. Context engineering treats AI like a new team member — give them the right docs, the right access, and a clear understanding of how work actually gets done. As Andrej Karpathy &lt;a href="https://x.com/karpathy/status/1937902191498797514" rel="noopener noreferrer"&gt;put it&lt;/a&gt;, the hottest new programming language is English — but the program isn't the prompt. It's the context surrounding it.&lt;/p&gt;

&lt;p&gt;I've spent the last six months building AI-native workflows at &lt;a href="https://www.presidio.com" rel="noopener noreferrer"&gt;Presidio&lt;/a&gt;, where I'm a Principal Solutions Architect. Not chatbots. Not demos. Production systems where Claude Code agents run real presales operations — client research, proposal generation, meeting analysis, deal tracking. The kind of work that used to live in someone's head and a dozen browser tabs.&lt;/p&gt;

&lt;p&gt;Here's what I learned about making AI actually useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompts Are Requests. Skills Are Frameworks.
&lt;/h2&gt;

&lt;p&gt;The first mistake everyone makes: stuffing domain knowledge into prompts. "You are an expert in enterprise sales. When analyzing a deal, consider these 47 factors..."&lt;/p&gt;

&lt;p&gt;That breaks immediately. Prompts are ephemeral — they disappear when the conversation ends. Domain knowledge needs to persist across sessions, get version-controlled, and evolve as you learn what works.&lt;/p&gt;

&lt;p&gt;The pattern that works: &lt;strong&gt;skills files&lt;/strong&gt; — what &lt;a href="https://docs.anthropic.com/en/docs/claude-code/skills" rel="noopener noreferrer"&gt;Claude Code's plugin architecture&lt;/a&gt; calls reusable domain knowledge. Markdown documents that encode decision frameworks, not instructions. A skill isn't "analyze this deal." A skill is the 5-gate qualification framework your team actually uses, written as structured markdown with decision criteria, red flags, and exit conditions. The AI reads it and applies it. You update the framework once, every future session uses the new version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.claude/skills/
├── qualification-framework.md    # Decision gates with criteria
├── pricing-strategy.md           # Margin rules, discount authority
├── sow-review-rubric.md          # Evaluation checklist
└── competitive-positioning.md    # Differentiators by competitor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skills are reusable. Prompts are disposable. That distinction matters more than any prompting technique.&lt;/p&gt;
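
&lt;p&gt;To make the distinction concrete, here's a hypothetical fragment of a skill file: a framework the model applies, not an instruction it follows once:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Gate 2: Budget Authority
- PASS: economic buyer identified AND budget line confirmed
- FLAG: budget exists but owner unknown (record as risk)
- FAIL: "we'll find budget if the demo goes well"

Exit condition: two FAILs across any gates → recommend no-bid
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;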

&lt;h2&gt;
  
  
  Context-as-Code: Version Control Your AI's Brain
&lt;/h2&gt;

&lt;p&gt;Every client engagement in my system has a single markdown file that serves as the AI's working memory for that account. Contacts, scope decisions, meeting history, action items, competitive intel — one file, version-controlled in git.&lt;/p&gt;

&lt;p&gt;Why markdown? Three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;It's grep-searchable.&lt;/strong&gt; When an agent needs to find every mention of a specific technology across all accounts, &lt;code&gt;grep -r "Kubernetes" clients/&lt;/code&gt; works instantly. Try that with a vector database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It diffs cleanly.&lt;/strong&gt; Git shows you exactly what changed in the AI's understanding of an account. Who updated it, when, and why.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's token-efficient.&lt;/strong&gt; Structured markdown compresses well in context windows. A 200-line context file gives an agent everything it needs to operate on an account without RAG retrieval latency.&lt;/li&gt;
&lt;/ol&gt;
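
&lt;p&gt;A hypothetical skeleton of such a context file (the section names are mine, not a standard):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Acme Corp: Engagement Context

## Contacts
- Jane Smith, VP Eng (champion)

## Scope Decisions
- 2026-03-12: EKS migration in scope; data platform deferred

## Open Actions
- [ ] Pricing workbook revision due Friday

## Competitive Notes
- Incumbent MSP contract renews in Q3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;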

&lt;p&gt;The anti-pattern is treating AI memory as a black box — embeddings you can't inspect, vector stores you can't diff, context you can't version. If you can't &lt;code&gt;git blame&lt;/code&gt; your AI's knowledge, you don't control it.&lt;/p&gt;

&lt;h2&gt;
  
  
  One God-Agent Is a Trap
&lt;/h2&gt;

&lt;p&gt;I started with one agent that did everything. It was mediocre at all of it.&lt;/p&gt;

&lt;p&gt;The fix was domain specialization. Four agents, each with a clear role: one handles discovery and qualification, one handles technical design and proposals, one handles deal operations and pricing, one handles workspace maintenance. Each agent has its own tools, its own context, and a defined handoff protocol for passing work to another agent.&lt;/p&gt;

&lt;p&gt;This mirrors how real teams work. Your sales engineer doesn't do contract redlines. Your deal desk doesn't design architecture. Specialization isn't just about accuracy — it's about &lt;strong&gt;cost&lt;/strong&gt;. A maintenance agent running on Haiku costs 95% less than an Opus agent doing the same file cleanup.&lt;/p&gt;

&lt;p&gt;Model routing by task complexity is the easiest money you'll save in AI:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task Type&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;File organization, validation&lt;/td&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;Structured, predictable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Research, summarization&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;Good reasoning, fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strategy, complex writing&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;Needs deep reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
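
&lt;p&gt;In Claude Code, this routing can live in each agent's definition file. A sketch, assuming the standard subagent frontmatter fields (&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt;); check the docs for the current schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: workspace-maintainer
description: File cleanup, freshness audits, validation
model: haiku
---
You maintain the workspace. You never author client
deliverables; escalate anything that needs judgment.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;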

&lt;h2&gt;
  
  
  Mistakes Must Become Infrastructure
&lt;/h2&gt;

&lt;p&gt;Every production AI system has failure modes you won't predict. The question is whether failures teach the system or just annoy you.&lt;/p&gt;

&lt;p&gt;My approach: every time an agent makes a mistake that I have to correct, it becomes a numbered rule in the system's configuration file. Not a mental note. Not a prompt tweak. A permanent, version-controlled rule that every future session reads on startup.&lt;/p&gt;

&lt;p&gt;After six months, I have 39 of these. Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"State files in .gitignore can vanish silently during merges — default to safe fallback values"&lt;/li&gt;
&lt;li&gt;"Never infer what a client said in a meeting — only quote from the transcript or flag it as an assumption"&lt;/li&gt;
&lt;li&gt;"Contract reverts produce the same error on every RPC — don't retry, they're non-recoverable"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't prompt engineering. They're institutional memory encoded as code. The system gets smarter every time it fails, without retraining, fine-tuning, or hoping the model "remembers."&lt;/p&gt;
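
&lt;p&gt;The rules themselves are just numbered markdown that loads at session start. A hypothetical excerpt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Learned Rules (append-only; never renumber)
31. State files in .gitignore can vanish during merges.
    Default to safe fallback values.
32. Never infer meeting content. Quote the transcript or
    label the statement as an assumption.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;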

&lt;h2&gt;
  
  
  Don't Dump Everything Into Context
&lt;/h2&gt;

&lt;p&gt;The biggest performance killer in AI systems isn't the model — it's context pollution. Loading every piece of knowledge into every session degrades output quality and burns tokens.&lt;/p&gt;

&lt;p&gt;The pattern that works: &lt;strong&gt;modular context loading&lt;/strong&gt;. My system has 14 skills, 56 commands, and context files for dozens of accounts. But any given session loads only what's relevant — the specific client context, the specific workflow skills, the specific agent role. Everything else stays on disk until needed.&lt;/p&gt;

&lt;p&gt;Think of it like imports in code. You wouldn't &lt;code&gt;import *&lt;/code&gt; from every module in your codebase. Don't do it with AI context either.&lt;/p&gt;

&lt;p&gt;This also means your context files need to be &lt;strong&gt;current state, not changelogs&lt;/strong&gt;. A context file that accumulates three months of historical notes becomes noise. Describe what the system &lt;em&gt;is right now&lt;/em&gt; in 150 scannable lines. Put the changelog somewhere else.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack That Actually Works
&lt;/h2&gt;

&lt;p&gt;After building this across multiple enterprise engagements, here's the architecture I'd recommend for anyone building AI-native workflows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Structured context files&lt;/strong&gt; (markdown, git-tracked) over vector databases for domain knowledge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt; (persistent frameworks) over prompts (ephemeral instructions) for domain expertise
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialized agents&lt;/strong&gt; with handoff protocols over one general-purpose agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-aware model routing&lt;/strong&gt; — match model capability to task complexity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error-to-rule pipelines&lt;/strong&gt; — every failure becomes a permanent system improvement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modular loading&lt;/strong&gt; — only load context relevant to the current task&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this requires a framework. No &lt;a href="https://github.com/langchain-ai/langchain" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;, no LlamaIndex, no orchestration layer. It's markdown files, a CLI, and good engineering discipline. The AI does the reasoning. You do the architecture.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The tools for building AI-native workflows exist today. The bottleneck isn't model capability — it's context architecture. Start treating your AI's knowledge like code: structured, versioned, reviewed, and intentionally loaded. That's the difference between a chatbot and a system.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>contextengineering</category>
      <category>claudecode</category>
      <category>devtools</category>
    </item>
    <item>
      <title>AWS Frontier Agents: What $50/Hour Pen Testing and $30/Hour SRE Means for Platform Teams</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Mon, 06 Apr 2026 00:57:56 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/aws-frontier-agents-what-50hour-pen-testing-and-30hour-sre-means-for-platform-teams-5jk</link>
      <guid>https://dev.to/michaeltuszynski/aws-frontier-agents-what-50hour-pen-testing-and-30hour-sre-means-for-platform-teams-5jk</guid>
      <description>&lt;p&gt;AWS just launched two autonomous AI agents — Security Agent and DevOps Agent — and they're both generally available now. These aren't chatbots with polished wrappers. They're persistent, autonomous systems that run for hours or days without human oversight, doing work that previously required dedicated teams.&lt;/p&gt;

&lt;p&gt;Here's what caught my attention, and why platform engineers should be paying close attention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Agents, Two Big Problems
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://aws.amazon.com/security-agent/" rel="noopener noreferrer"&gt;AWS Security Agent&lt;/a&gt;&lt;/strong&gt; handles penetration testing. Not the "run a scanner and hand you a PDF" kind — it ingests your source code, architecture diagrams, and documentation, then operates like a human pen tester. It identifies vulnerabilities, builds attack chains, and validates that findings are real exploitable risks. &lt;a href="https://aws.amazon.com/blogs/machine-learning/aws-launches-frontier-agents-for-security-testing-and-cloud-operations/" rel="noopener noreferrer"&gt;Bamboo Health reported&lt;/a&gt; it "surfaced findings that no other tool has uncovered." HENNGE K.K. cut their testing duration by over 90%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://aws.amazon.com/devops-agent/" rel="noopener noreferrer"&gt;AWS DevOps Agent&lt;/a&gt;&lt;/strong&gt; handles incident response and operational tasks. It correlates telemetry, code, and deployment data across your stack — AWS, Azure, hybrid, on-prem — and integrates with the observability tools you already use: CloudWatch, Datadog, Dynatrace, New Relic, Splunk, Grafana. Western Governor's University cut mean time to resolution from two hours to 28 minutes during a production incident by pinpointing a Lambda configuration issue that had been buried in undiscovered internal docs.&lt;/p&gt;

&lt;p&gt;The preview numbers are worth noting: up to 75% lower MTTR, 80% faster investigations, and 94% root cause accuracy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pricing Makes the Strategy Obvious
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting.&lt;/p&gt;

&lt;p&gt;DevOps Agent costs &lt;a href="https://aws.amazon.com/devops-agent/pricing/" rel="noopener noreferrer"&gt;$0.0083 per agent-second&lt;/a&gt; — roughly $29.88 per hour. AWS's own pricing examples show a small team running 10 investigations per month pays about $40. An enterprise running 500 incidents per month pays around $2,300. For context, a single on-call SRE costs you $150k-$200k/year fully loaded.&lt;/p&gt;
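
&lt;p&gt;The per-second rate converts cleanly, and working backward from AWS's published examples gives a feel for investigation length (their figures, my arithmetic):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$0.0083/sec × 3,600 sec   = $29.88 per agent-hour
$2,300 / 500 incidents    = $4.60 per incident
$4.60 / $29.88 per hour   ≈ 9.2 minutes of agent time per incident
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;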

&lt;p&gt;Security Agent runs at &lt;a href="https://aws.amazon.com/security-agent/pricing/" rel="noopener noreferrer"&gt;$50 per task-hour&lt;/a&gt;. A small API test costs about $173. A full application pen test runs around $1,200. Compare that to external pen testing firms charging $15k-$50k per engagement, with weeks of lead time and limited scope.&lt;/p&gt;

&lt;p&gt;Both agents include a 2-month free trial. AWS is clearly betting on adoption velocity — get teams hooked on the speed and economics, then make it sticky through integration depth.&lt;/p&gt;

&lt;p&gt;The DevOps Agent pricing also ties into existing AWS Support plans. Enterprise Support customers get 75% of their support charges back as DevOps Agent credits. Unified Operations customers get 100%. AWS is effectively saying: your support spend now buys you autonomous operations capacity.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Actually Means for Platform Teams
&lt;/h2&gt;

&lt;p&gt;AWS calls these "frontier agents" — autonomous systems that work independently, scale massively across concurrent tasks, and run persistently. The framing matters because it signals a product category, not a one-off feature.&lt;/p&gt;

&lt;p&gt;Three implications stand out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security becomes continuous, not periodic.&lt;/strong&gt; Most organizations pen test their top 5-10 applications once or twice a year because of cost and staffing constraints. At $50/task-hour, you can afford to test everything, continuously. The security posture shift from "we tested our critical apps last quarter" to "every app gets tested every sprint" is significant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incident response gets a tireless first responder.&lt;/strong&gt; The DevOps Agent doesn't replace your SRE team — it augments the 3am on-call rotation. It can start investigating before a human even picks up the page, correlating signals across your entire stack. By the time your engineer opens their laptop, the agent has already identified probable root cause with 94% accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The multicloud angle is deliberate.&lt;/strong&gt; AWS built the DevOps Agent to work with Azure DevOps, GitHub, GitLab, and non-AWS observability tools. This isn't altruism — it's a land-and-expand play. Once your operational intelligence lives in AWS, migrating workloads away gets harder. But for teams running hybrid environments today, having a single agent that understands your entire topology is genuinely useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;Google and Microsoft have their own agentic plays, but AWS shipping two production-ready autonomous agents with per-second billing and free trials is a concrete move. The fact that both agents work with tools like &lt;a href="https://aws.amazon.com/blogs/machine-learning/aws-launches-frontier-agents-for-security-testing-and-cloud-operations/" rel="noopener noreferrer"&gt;Claude Code and Kiro&lt;/a&gt; for generating validated fixes signals that AWS sees these agents as part of a broader autonomous development loop — not isolated point solutions.&lt;/p&gt;

&lt;p&gt;For platform engineering teams, the takeaway is practical: evaluate these agents against your current pen testing costs and incident response metrics. The economics alone justify a proof of concept. The free trial removes any excuse not to try.&lt;/p&gt;

&lt;p&gt;The real question isn't whether AI agents will handle DevOps and security tasks. It's how fast your organization adapts its processes, roles, and trust models to work alongside them.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>aiagents</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>The systems behind enterprise AI adoption success - IBM</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Sat, 28 Mar 2026 17:10:20 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/the-systems-behind-enterprise-ai-adoption-success-ibm-53n0</link>
      <guid>https://dev.to/michaeltuszynski/the-systems-behind-enterprise-ai-adoption-success-ibm-53n0</guid>
      <description>&lt;h1&gt;
  
  
  Everyone's Buying GPUs. Almost Nobody's Ready to Feed Them.
&lt;/h1&gt;

&lt;p&gt;The enterprise AI conversation has a blind spot the size of a data center. Every budget meeting I've sat in over the past 18 months has the same shape: GPU allocation gets 70% of the discussion time, model selection gets 20%, and the data infrastructure that actually feeds those models gets whatever's left over. Usually about ten minutes and a vague reference to "we'll figure out storage later."&lt;/p&gt;

&lt;p&gt;This is why most enterprise AI deployments stall after the proof of concept.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottleneck Nobody Budgets For
&lt;/h2&gt;

&lt;p&gt;Here's what happens in practice. A team spins up a promising AI workload — retrieval-augmented generation, a fine-tuning pipeline, an inference service. It works great on a curated dataset in a dev environment. Then they try to run it against production data at scale and everything falls apart. Not because the model is wrong, but because the storage layer can't deliver data fast enough, the pipeline can't unify sources across hybrid environments, and nobody planned for the I/O characteristics of AI workloads.&lt;/p&gt;

&lt;p&gt;AI training and inference workloads have fundamentally different storage profiles than traditional enterprise applications. Training jobs need sustained sequential throughput across massive datasets. Inference needs low-latency random reads. Fine-tuning needs both, sometimes simultaneously. Your SAN that runs ERP just fine will choke on a distributed training job that's trying to saturate eight GPUs.&lt;/p&gt;

&lt;p&gt;IBM's recent framing of &lt;a href="https://www.ibm.com/think/insights/systems-behind-enterprise-ai-adoption-success" rel="noopener noreferrer"&gt;AI-ready infrastructure&lt;/a&gt; gets this right: the systems layer — storage, compute fabric, automation — is where enterprise AI succeeds or dies. Not in the model layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data Gravity Problem
&lt;/h2&gt;

&lt;p&gt;The reason storage matters so much for AI isn't just throughput. It's data gravity.&lt;/p&gt;

&lt;p&gt;Enterprise data doesn't live in one place. It's spread across on-prem databases, cloud object stores, SaaS platforms, edge devices, and that one team's PostgreSQL instance that nobody wants to touch. &lt;a href="https://www.ibm.com/think/topics/enterprise-ai" rel="noopener noreferrer"&gt;IBM defines enterprise AI&lt;/a&gt; as the integration of AI across large organizations — but integration implies the data is accessible. In most companies, it isn't. Not in any unified, performant way.&lt;/p&gt;

&lt;p&gt;This creates a cascading failure. Your RAG pipeline needs product data from SAP, customer interactions from Salesforce, and technical documentation from Confluence. Each source has different access patterns, different latency profiles, different security boundaries. Stitching them together with API calls and batch ETL jobs introduces hours of lag and creates brittle pipelines that break every time someone changes a schema.&lt;/p&gt;

&lt;p&gt;The companies I've seen succeed at enterprise AI solve this problem first. They build a unified storage layer that can serve multiple AI workloads without requiring six different integration patterns. IBM's approach with Storage Fusion and FlashSystem targets exactly this — &lt;a href="https://www.ibm.com/think/insights/systems-behind-enterprise-ai-adoption-success" rel="noopener noreferrer"&gt;high-performance, unified storage&lt;/a&gt; that can handle the mixed I/O profiles of AI workloads across hybrid environments. Whether you're on their stack or not, the architectural principle holds: if your AI workloads can't access unified data at the speed they need it, no amount of GPU spend will fix your pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hybrid Cloud Is the Reality, Not the Exception
&lt;/h2&gt;

&lt;p&gt;There's still a persistent fantasy in some planning meetings that AI workloads will live entirely in one public cloud. Maybe someday. Right now, for regulated industries, for companies with significant on-prem investments, and for anyone who's done the math on data egress costs, hybrid is the reality.&lt;/p&gt;

&lt;p&gt;And hybrid AI infrastructure is hard. You need consistent orchestration across environments. You need storage tiering that can move hot data close to compute without manual intervention. You need security and governance that doesn't collapse the moment data crosses a network boundary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.ibm.com/think/insights/ai-adoption-challenges" rel="noopener noreferrer"&gt;IBM identifies inadequate infrastructure as one of the top five AI adoption challenges&lt;/a&gt; — and in my experience, "inadequate" usually means "designed for a different era." The infrastructure that runs your web applications, your CI/CD pipelines, your traditional analytics workloads — it wasn't built for the throughput patterns, the data volumes, or the operational demands of production AI.&lt;/p&gt;

&lt;p&gt;This isn't a rip-and-replace argument. Nobody's going to throw out their storage infrastructure overnight. But you need a plan for how your existing infrastructure evolves to support AI workloads, and that plan needs to happen before you commit to production deployments.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works: Three Patterns From the Field
&lt;/h2&gt;

&lt;p&gt;After spending time with organizations that have moved past the POC phase into production AI, I see three common patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Storage-first capacity planning.&lt;/strong&gt; Successful teams model their data pipeline throughput requirements before they size GPU clusters. They ask: "How fast can we feed data to training jobs?" and "What's our p99 latency for inference-time retrieval?" If the answers don't match the model's appetite, they fix storage before buying more compute.&lt;/p&gt;
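
&lt;p&gt;The "how fast can we feed data to training jobs?" question reduces to a one-line throughput check. A minimal sketch, where every figure is an illustrative assumption rather than a benchmark:&lt;/p&gt;

```python
# Storage-first sanity check: can the storage layer actually feed the
# planned GPU cluster? All figures below are illustrative assumptions.
num_gpus = 8
samples_per_sec_per_gpu = 300      # assumed training throughput per GPU
avg_sample_bytes = 250_000         # assumed mean size of one training sample

required_gbps = num_gpus * samples_per_sec_per_gpu * avg_sample_bytes / 1e9
measured_storage_gbps = 0.4        # assumed sustained sequential read (GB/s)

shortfall = max(0.0, required_gbps - measured_storage_gbps)
print(f"Required read throughput: {required_gbps:.2f} GB/s")
print(f"Shortfall vs. measured:   {shortfall:.2f} GB/s")
```

&lt;p&gt;If the shortfall is positive, buying more GPUs only widens it — fix the storage tier first.&lt;/p&gt;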

&lt;p&gt;&lt;strong&gt;2. Unified data access across environments.&lt;/strong&gt; Whether it's IBM Storage Fusion, a well-architected MinIO deployment, or a managed cloud storage layer with on-prem caching, the pattern is the same: AI workloads get a single namespace to read from, regardless of where the source data physically lives. This eliminates the integration tax that kills most pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Automation of the data lifecycle.&lt;/strong&gt; Production AI generates enormous amounts of intermediate data — checkpoints, embeddings, feature stores, evaluation datasets. Teams that automate tiering, retention, and cleanup avoid the "we ran out of storage on a Friday night" incident that's practically a rite of passage.&lt;/p&gt;
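
&lt;p&gt;The retention half of that automation can start very small. A hedged sketch of a keep-newest-plus-pinned checkpoint policy — the names and keep-count here are invented for illustration, not any product's API:&lt;/p&gt;

```python
# Minimal checkpoint-retention sketch: keep the newest N checkpoints plus
# any explicitly pinned ones; everything else is a deletion candidate.
def checkpoints_to_delete(checkpoints, keep_latest=3, pinned=()):
    """checkpoints: list of (name, unix_mtime) tuples."""
    by_age = sorted(checkpoints, key=lambda c: c[1], reverse=True)
    keep = {name for name, _ in by_age[:keep_latest]} | set(pinned)
    return [name for name, _ in checkpoints if name not in keep]

ckpts = [("step-1000", 100), ("step-2000", 200), ("step-3000", 300),
         ("step-4000", 400), ("best-eval", 250)]
print(checkpoints_to_delete(ckpts, keep_latest=2, pinned=["best-eval"]))
# prints ['step-1000', 'step-2000']
```

&lt;p&gt;In production this logic typically hangs off a scheduler and actually issues deletes against object storage; the policy itself stays this simple.&lt;/p&gt;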

&lt;h2&gt;
  
  
  The Uncomfortable Math
&lt;/h2&gt;

&lt;p&gt;Here's a rough calculation that sobers up most planning conversations. A mid-size enterprise running a fine-tuning pipeline on proprietary data with a 70B parameter model needs approximately 500TB of accessible, high-performance storage just for the training data, checkpoints, and model artifacts. That's before you add your RAG corpus, your vector store, and your evaluation datasets.&lt;/p&gt;
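
&lt;p&gt;A sketch of where a footprint in that neighborhood can come from. Every multiplier here is an assumption chosen to show the shape of the math, not a measurement:&lt;/p&gt;

```python
# Where a ~500TB footprint for a 70B-parameter fine-tune can come from.
# All multipliers are illustrative assumptions.
params = 70e9

# Full training state (weights, gradients, Adam moments, mixed precision)
# is commonly estimated at roughly 16 bytes per parameter.
train_state_tb = params * 16 / 1e12        # ~1.1 TB per retained checkpoint
retained_checkpoints = 20

curated_training_data_tb = 400             # assumed raw + processed corpora
artifacts_tb = 75                          # eval sets, logs, exported variants

total_tb = (train_state_tb * retained_checkpoints
            + curated_training_data_tb + artifacts_tb)
print(f"Checkpoint state: {train_state_tb:.1f} TB each")
print(f"Estimated total:  {total_tb:.0f} TB")
```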

&lt;p&gt;Now multiply that by the number of AI initiatives in your roadmap. Most enterprises I talk to have between five and fifteen active AI projects. The storage footprint adds up fast, and it needs to perform — not just exist.&lt;/p&gt;

&lt;p&gt;The GPU shortage got all the headlines in 2024. The storage and data infrastructure gap is the quieter crisis that will define which companies actually ship production AI in 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  What CTOs Should Do Next Week
&lt;/h2&gt;

&lt;p&gt;Stop treating infrastructure as a downstream consequence of model selection. Flip it around.&lt;/p&gt;

&lt;p&gt;Audit your current storage throughput against the I/O demands of your planned AI workloads. Map where your training data lives and how many network hops separate it from your compute. Calculate the real cost of your data integration layer — not just the cloud bill, but the engineering hours spent maintaining brittle pipelines.&lt;/p&gt;

&lt;p&gt;Then have an honest conversation about whether your infrastructure roadmap matches your AI ambitions. If there's a gap — and there almost certainly is — close it before you scale your GPU footprint. The fastest accelerator in the world is useless if it's starving for data.&lt;/p&gt;

&lt;p&gt;The companies that figure this out won't just run AI. They'll run AI that actually works in production, at scale, without the 2 AM pages. That's a meaningful competitive advantage — and it starts with the infrastructure layer that nobody wants to talk about at the budget meeting.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>dataengineering</category>
      <category>systems</category>
    </item>
    <item>
      <title>Why Enterprise AI Infrastructure is Going Hybrid – and Geographic</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Thu, 26 Mar 2026 04:39:27 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/why-enterprise-ai-infrastructure-is-going-hybrid-and-geographic-1dba</link>
      <guid>https://dev.to/michaeltuszynski/why-enterprise-ai-infrastructure-is-going-hybrid-and-geographic-1dba</guid>
      <description>&lt;h1&gt;
  
  
  The Cloud Repatriation Nobody Expected: Why Enterprise AI Is Pulling Compute Back from the Cloud
&lt;/h1&gt;

&lt;p&gt;The original pitch for cloud computing was simple: stop buying servers, rent someone else's. For most workloads over the past fifteen years, that trade worked. But AI infrastructure has rewritten the economics, and enterprises are responding by doing something few predicted — they're moving compute &lt;em&gt;closer&lt;/em&gt; to the data, not further away.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.databank.com/resources/blogs/why-enterprise-ai-infrastructure-is-going-hybrid-and-geographic/" rel="noopener noreferrer"&gt;A recent DataBank survey&lt;/a&gt; found that 76% of enterprises plan geographic expansion of their AI infrastructure, while 53% are actively adding colocation to their deployment strategies. This isn't a minor adjustment. It's a structural shift in how organizations think about where AI workloads should run.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Economics Changed Before the Strategy Did
&lt;/h2&gt;

&lt;p&gt;Running inference on a large language model in a hyperscaler region costs real money. Not "line item you can bury in OpEx" money — more like "the CFO is asking questions in the quarterly review" money. GPU instance pricing on AWS, Azure, and GCP has remained stubbornly high because demand outstrips supply, and the cloud providers know it.&lt;/p&gt;

&lt;p&gt;The math gets worse when you factor in data gravity. Most enterprises generate data in dozens of locations — retail stores, manufacturing plants, regional offices, edge devices. Shipping all that data to us-east-1 for processing, then shipping results back, creates latency and egress costs that compound as AI adoption scales.&lt;/p&gt;

&lt;p&gt;Colocation flips this equation. You place GPU-dense compute in facilities close to where data originates, connect to cloud services where they make sense (object storage, managed databases, identity), and keep the expensive part — inference and fine-tuning — on hardware you control or lease at predictable rates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "Cloud-Smart" Beats "Cloud-First"
&lt;/h2&gt;

&lt;p&gt;The industry is moving toward what &lt;a href="https://seekingalpha.com/article/4843221-world-of-enterprise-ai-turning-hybrid" rel="noopener noreferrer"&gt;Seeking Alpha describes as a "cloud-smart" strategy&lt;/a&gt; — using public cloud, private cloud, and edge computing based on the workload profile rather than defaulting to one deployment model for everything.&lt;/p&gt;

&lt;p&gt;This makes sense when you break down what AI workloads actually need:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training&lt;/strong&gt; still belongs in the cloud for most organizations. You need massive, bursty GPU capacity for weeks or months, then nothing. Buying that hardware outright is a terrible investment unless you're running training continuously. Hyperscaler reserved instances or on-demand capacity work fine here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inference&lt;/strong&gt; is the opposite profile. It's steady-state, latency-sensitive, and runs 24/7. The cost-per-token adds up fast at scale. Running inference on colocated or on-premises hardware — especially with purpose-built accelerators — can cut costs 40-60% compared to cloud GPU instances, depending on utilization rates.&lt;/p&gt;
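
&lt;p&gt;The break-even math behind that savings range is easy to sketch. The per-hour rates below are illustrative assumptions; substitute your own quotes and utilization numbers:&lt;/p&gt;

```python
# Inference cost comparison, cloud GPU instances vs. colocated hardware.
# Both hourly rates are assumed figures for illustration only.
hours_per_month = 730

cloud_gpu_rate = 4.10      # assumed on-demand $/GPU-hour
colo_gpu_rate = 1.90       # assumed amortized lease + power + space

gpus = 16
monthly_cloud = gpus * hours_per_month * cloud_gpu_rate
monthly_colo = gpus * hours_per_month * colo_gpu_rate

savings_pct = 100 * (1 - monthly_colo / monthly_cloud)
print(f"Cloud: ${monthly_cloud:,.0f}/mo  Colo: ${monthly_colo:,.0f}/mo")
print(f"Savings: {savings_pct:.0f}%")
```

&lt;p&gt;The colo rate only holds at high utilization — which is exactly why this trade favors steady-state inference and not bursty training.&lt;/p&gt;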

&lt;p&gt;&lt;strong&gt;Fine-tuning&lt;/strong&gt; sits in the middle. You need GPU capacity for days, not months, and the data involved is often sensitive enough that you don't want it leaving your network. A colocated setup with good connectivity to your data sources handles this well.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Geography Problem Nobody Planned For
&lt;/h2&gt;

&lt;p&gt;Data sovereignty and residency requirements are accelerating the geographic distribution of AI infrastructure in ways that pure cloud strategies can't easily accommodate.&lt;/p&gt;

&lt;p&gt;The EU's AI Act imposes requirements on where and how AI systems process data. Healthcare organizations in the US must satisfy HIPAA's data-handling and security requirements. Financial services firms face data residency rules that vary by jurisdiction. When your AI model needs to process customer data from Germany, running inference in a Virginia data center creates compliance headaches that no amount of architectural cleverness fully solves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.databank.com/resources/blogs/why-enterprise-ai-infrastructure-is-going-hybrid-and-geographic/" rel="noopener noreferrer"&gt;Enterprises are responding by deploying AI infrastructure across multiple geographies&lt;/a&gt; — not because they want the operational complexity, but because regulators and customers demand it. The 76% planning geographic expansion aren't chasing some multicloud vision. They're meeting regulatory reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Edge Dimension
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.techaheadcorp.com/blog/why-modern-enterprises-need-hybrid-edge-cloud-ai/" rel="noopener noreferrer"&gt;Hybrid edge-cloud architectures&lt;/a&gt; add another layer. Manufacturing plants running quality inspection models can't tolerate 200ms round-trip latency to a cloud region. Autonomous systems need inference at the point of action. Retail environments process customer interactions in real time.&lt;/p&gt;

&lt;p&gt;These use cases demand on-site or near-site compute with cloud connectivity for model updates, monitoring, and periodic retraining. The architecture looks less like "cloud with edge caching" and more like "distributed compute with cloud coordination." The control plane lives in the cloud. The data plane runs where the data lives.&lt;/p&gt;

&lt;p&gt;This is a harder architecture to build and operate than a cloud-native deployment. It requires teams who understand networking, hardware lifecycle management, and distributed systems — skills that many organizations let atrophy during the cloud migration years.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Infrastructure Teams
&lt;/h2&gt;

&lt;p&gt;If you're an infrastructure leader planning AI capacity for the next 2-3 years, here's the framework I'd use:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit your inference costs first.&lt;/strong&gt; Most organizations are surprised by how much they're spending on cloud GPU instances for inference once they aggregate across teams and projects. This number is your baseline for a hybrid business case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Map data gravity.&lt;/strong&gt; Where does your training data originate? Where do inference requests come from? Where do results need to arrive? If the answer to all three is "the same cloud region," stay in the cloud. If it's "twelve different locations across three countries," you need a distributed strategy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't build a GPU data center.&lt;/strong&gt; Colocation with GPU leasing gives you the economics of owned hardware without the capital expenditure and refresh cycles. Companies like DataBank, Equinix, and CoreWeave are building exactly this model — dense GPU compute in colocation facilities with direct cloud interconnects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan for heterogeneous accelerators.&lt;/strong&gt; NVIDIA's dominance in training is real, but inference has viable alternatives — AMD Instinct, Intel Gaudi, AWS Inferentia, Google TPUs. A hybrid strategy lets you match accelerators to workload profiles instead of paying the NVIDIA tax on everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invest in platform engineering.&lt;/strong&gt; Hybrid AI infrastructure without a solid platform layer becomes an operational nightmare. You need consistent deployment pipelines, observability, and model lifecycle management that works across cloud regions, colo facilities, and edge locations. Kubernetes helps here, but it's the starting point, not the whole answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Reality
&lt;/h2&gt;

&lt;p&gt;Going hybrid is operationally harder than going all-in on a single cloud provider. Anyone who tells you otherwise is selling colocation space. You'll manage more vendor relationships, more network paths, more failure modes.&lt;/p&gt;

&lt;p&gt;But the economics and the regulatory environment have shifted enough that "just put it all in AWS" is no longer a defensible strategy for AI-heavy workloads. The organizations figuring out hybrid now — while GPU supply is still constrained and cloud pricing remains elevated — will have a meaningful cost advantage over those who wait.&lt;/p&gt;

&lt;p&gt;The cloud isn't going away. It's just no longer the default answer for every AI workload. And the sooner infrastructure teams internalize that distinction, the better positioned they'll be when AI spending goes from "experimental budget" to "largest line item on the infrastructure bill."&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>cloud</category>
      <category>news</category>
    </item>
    <item>
      <title>Enterprise AI has an 80% failure rate. The models aren't the problem. What is?</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Sat, 21 Mar 2026 18:27:58 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/enterprise-ai-has-an-80-failure-rate-the-models-arent-the-problem-what-is-16k0</link>
      <guid>https://dev.to/michaeltuszynski/enterprise-ai-has-an-80-failure-rate-the-models-arent-the-problem-what-is-16k0</guid>
      <description>&lt;h1&gt;
  
  
  Enterprise AI Fails at 80% — And the Models Have Nothing to Do With It
&lt;/h1&gt;

&lt;p&gt;Most enterprise AI projects die quietly. No dramatic failure, no post-mortem email chain. They just... stop. The prototype gets a demo, leadership nods approvingly, and then six months later the Slack channel goes silent and the budget gets reallocated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://matthopkins.com/business/the-154-billion-mistake-why-80-percent-of-companies-get-nothing-from-ai/" rel="noopener noreferrer"&gt;Roughly 80% of enterprise AI projects fail&lt;/a&gt; — double the failure rate of traditional software projects. That number has held steady for years now, even as the models themselves have gotten dramatically better. GPT-4, Claude, Gemini — pick your favorite. They all work. They work remarkably well, actually.&lt;/p&gt;

&lt;p&gt;So why does the enterprise keep fumbling the ball?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Model Isn't Your Problem. Your Plumbing Is.
&lt;/h2&gt;

&lt;p&gt;Here's what I keep seeing from the architecture side: teams treat AI like a features problem when it's actually an infrastructure problem. They spin up a proof of concept using an API, get promising results in a notebook, and then hit a wall when someone asks "OK, how do we run this in production?"&lt;/p&gt;

&lt;p&gt;That wall has a name. It's called &lt;a href="https://medium.com/@archie.kandala/the-production-ai-reality-check-why-80-of-ai-projects-fail-to-reach-production-849daa80b0f3" rel="noopener noreferrer"&gt;the deployment gap&lt;/a&gt; — the distance between a working model and a production system that real users depend on. And it's enormous.&lt;/p&gt;

&lt;p&gt;A platform engineer on Reddit &lt;a href="https://www.reddit.com/r/platformengineering/comments/1ryqpn3/enterprise_ai_has_an_80_failure_rate_the_models/" rel="noopener noreferrer"&gt;put it bluntly&lt;/a&gt;: the failure pattern repeats across organizations regardless of size or industry. The models work fine. The org doesn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five Ways Smart Companies Still Blow It
&lt;/h2&gt;

&lt;p&gt;I've watched this play out at dozens of enterprise accounts. The failure modes are predictable enough to catalog.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. They Solve the Wrong Problem
&lt;/h3&gt;

&lt;p&gt;This is the most common and most expensive mistake. A team picks a use case because it sounds impressive in a board deck, not because it maps to an actual operational bottleneck. "We'll use AI to predict customer churn!" Great. Do you have clean customer data? Do you have a process to act on those predictions? Is churn actually your biggest revenue leak?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://matthopkins.com/business/the-154-billion-mistake-why-80-percent-of-companies-get-nothing-from-ai/" rel="noopener noreferrer"&gt;Companies that fail at AI overwhelmingly choose the wrong problems first&lt;/a&gt;. They optimize for what's exciting instead of what's painful. The successful projects I've seen start with someone saying "this manual process costs us 200 hours a month and we keep getting it wrong."&lt;/p&gt;

&lt;h3&gt;
  
  
  2. They Live in Data Fantasy Land
&lt;/h3&gt;

&lt;p&gt;Every AI project starts with an assumption about data quality that turns out to be wildly optimistic. The data exists, sure. It's in four different systems, three different formats, with no consistent identifiers, maintained by teams who don't talk to each other.&lt;/p&gt;

&lt;p&gt;I worked with an enterprise that wanted to build an AI-powered inventory optimization system. The model was straightforward. The data pipeline took eleven months — not because the engineering was hard, but because getting three business units to agree on what "inventory" meant took that long.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. They Skip the Platform Layer
&lt;/h3&gt;

&lt;p&gt;This one hits close to home. Teams build AI applications without investing in the platform that supports them. No model registry. No feature store. No monitoring for drift. No rollback mechanism. No cost controls.&lt;/p&gt;

&lt;p&gt;Then the model goes sideways in production — and it will — and there's no way to detect it, debug it, or revert it. You're flying blind with a system that's making real decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@archie.kandala/the-production-ai-reality-check-why-80-of-ai-projects-fail-to-reach-production-849daa80b0f3" rel="noopener noreferrer"&gt;The production gap isn't a model problem — it's a platform engineering problem&lt;/a&gt;. The organizations that ship AI successfully treat ML infrastructure with the same rigor they'd apply to any other production system: observability, CI/CD, access controls, cost management.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. They Start With Tech Instead of Humans
&lt;/h3&gt;

&lt;p&gt;I've seen teams spend months evaluating which LLM to use, which vector database to pick, whether to fine-tune or RAG, which embedding model performs best on their benchmark — and zero time figuring out who will actually use this thing and how it fits into their workflow.&lt;/p&gt;

&lt;p&gt;The best AI system in the world is worthless if the end user Alt-F4s out of it because it adds three clicks to their existing process. &lt;a href="https://matthopkins.com/business/the-154-billion-mistake-why-80-percent-of-companies-get-nothing-from-ai/" rel="noopener noreferrer"&gt;Starting with tech instead of humans&lt;/a&gt; is the classic engineering trap: we build what's interesting to build, not what's useful to use.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. They Treat AI as a Project, Not a Product
&lt;/h3&gt;

&lt;p&gt;AI models degrade. The world changes, user behavior shifts, data distributions drift. A model that was 94% accurate in January might be 71% accurate by June. Traditional software doesn't do this. You deploy a calculator app and it keeps calculating correctly forever.&lt;/p&gt;

&lt;p&gt;AI requires ongoing investment: retraining, monitoring, evaluation, data quality maintenance. When leadership treats AI as a one-time project with a ship date and a done state, they're guaranteeing that the system will rot.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works
&lt;/h2&gt;

&lt;p&gt;After 25 years in tech — the last several spent watching enterprise AI projects succeed and fail — here's what separates the 20% that ship from the 80% that don't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start with the workflow, not the model.&lt;/strong&gt; Find a process where humans are doing repetitive cognitive work, making inconsistent decisions, or drowning in volume. Build AI into that workflow. Not as a standalone app — as an augmentation of what people already do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invest in platform before product.&lt;/strong&gt; You need model serving infrastructure, monitoring, cost tracking, and rollback capabilities before you need a sophisticated model. A simple model on a solid platform beats a sophisticated model on nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set a 90-day production deadline.&lt;/strong&gt; If your AI project hasn't touched a real user in 90 days, it probably never will. Scope ruthlessly. Ship something small. Learn from real usage. The organizations that perpetually prototype never ship.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget for operations, not just development.&lt;/strong&gt; AI is more like a garden than a bridge. You don't build it and walk away. Plan for ongoing model evaluation, data quality work, and retraining cycles. If your budget only covers development, you're planning to fail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make the ROI case boring and specific.&lt;/strong&gt; Not "AI will transform our customer experience." Instead: "This model will reduce manual review time from 6 hours to 45 minutes per day for the claims processing team, saving $280K annually." When the value is that concrete, the project survives leadership changes and budget cuts.&lt;/p&gt;
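
&lt;p&gt;That claims-processing figure is reproducible arithmetic. The working-day count and the blended loaded rate below are assumptions chosen to show the shape of the calculation:&lt;/p&gt;

```python
# The boring-and-specific ROI arithmetic from the claims-processing
# example. Working days and the loaded team rate are assumptions.
hours_saved_per_day = 6 - 0.75           # 6 hours down to 45 minutes
working_days = 250
annual_hours_saved = hours_saved_per_day * working_days

loaded_rate = 213    # assumed blended loaded $/hour for the review team
annual_value = annual_hours_saved * loaded_rate

print(f"Hours saved per year: {annual_hours_saved:,.1f}")
print(f"Annual value: ${annual_value:,.0f}")
```

&lt;p&gt;A number built this way survives scrutiny because every input can be challenged and replaced without the argument collapsing.&lt;/p&gt;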

&lt;h2&gt;
  
  
  The Uncomfortable Truth
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://matthopkins.com/business/the-154-billion-mistake-why-80-percent-of-companies-get-nothing-from-ai/" rel="noopener noreferrer"&gt;The enterprise AI failure rate represents roughly $154 billion in wasted spend&lt;/a&gt;. That money didn't evaporate because GPT wasn't smart enough. It evaporated because organizations treated AI adoption as a technology challenge when it's actually an organizational design challenge.&lt;/p&gt;

&lt;p&gt;The models are good enough. They've been good enough for a while now. The question was never "can AI do this?" — it has always been "can your organization support AI doing this?"&lt;/p&gt;

&lt;p&gt;If you can't answer yes to that second question, no amount of model capability will save you. Fix the plumbing. Define the problem. Invest in the platform. Then — and only then — worry about which model to use.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Beyond Comprehension Debt: Why Context Architecture Is the Real AI Moat</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Thu, 19 Mar 2026 03:00:36 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/beyond-comprehension-debt-why-context-architecture-is-the-real-ai-moat-kfk</link>
      <guid>https://dev.to/michaeltuszynski/beyond-comprehension-debt-why-context-architecture-is-the-real-ai-moat-kfk</guid>
      <description>&lt;p&gt;Addy Osmani dropped a piece last week that's been making the rounds: "&lt;a href="https://addyosmani.com/blog/comprehension-debt/" rel="noopener noreferrer"&gt;Comprehension Debt — The Hidden Cost of AI-Generated Code&lt;/a&gt;." His thesis is sharp. Teams are shipping AI-generated code faster than anyone can understand it. Tests pass, PRs look clean, and nobody notices the growing gap between what's been deployed and what any human actually comprehends. When something breaks at 3am, that gap becomes the bill.&lt;/p&gt;

&lt;p&gt;He's right. And he's not seeing the whole picture.&lt;/p&gt;

&lt;p&gt;Osmani diagnosed one symptom of a larger condition. Comprehension debt — the gap between shipped code and understood code — is real, and it matters. But it's one line item on a ledger that most engineering organizations haven't even opened yet. If you're a CTO or VP of Engineering adopting AI-assisted development, comprehension debt is the problem you can &lt;em&gt;see&lt;/em&gt;. The ones that will actually sink you are the ones you can't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI Debt Paradox
&lt;/h2&gt;

&lt;p&gt;There's a seductive argument circulating in engineering circles right now: AI makes rewriting cheap, so tech debt doesn't matter anymore. Why maintain a crumbling monolith when you can regenerate services in an afternoon?&lt;/p&gt;

&lt;p&gt;It's about 40% right, which makes it dangerous.&lt;/p&gt;

&lt;p&gt;Yes, the cost curve of rewriting code has collapsed. For bounded, well-specified modules, AI absolutely turns "rewrite" from a quarter-long initiative into a day's work. The classic excuse — "we can't touch that, it'll take months" — is dying. That's real progress.&lt;/p&gt;

&lt;p&gt;But here's the paradox: every time you "start from scratch," you throw away embedded knowledge about &lt;em&gt;why&lt;/em&gt; decisions were made. The 47 edge cases handled one by one over 18 months. The compliance requirement someone baked in after an audit. The OAuth flow three partners hardcoded against. AI can regenerate the code layer fast. It cannot regenerate the institutional context that made that code correct.&lt;/p&gt;

&lt;p&gt;Tech debt didn't disappear. It shape-shifted. And the new forms are harder to detect, harder to measure, and harder to pay down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Debts, Not One
&lt;/h2&gt;

&lt;p&gt;Osmani gave us a name for the first one. Here are the other two.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Comprehension Debt (Osmani's Contribution)
&lt;/h3&gt;

&lt;p&gt;This is the gap between the code your team ships and the code your team understands. Osmani nailed the mechanics: AI-generated code passes review because it &lt;em&gt;looks&lt;/em&gt; right, engineers approve PRs they haven't fully internalized, and the organizational assumption that "reviewed = understood" quietly breaks down.&lt;/p&gt;

&lt;p&gt;The insight is correct. But the prescription — slow down, review more carefully, quiz your engineers — is a &lt;em&gt;cultural&lt;/em&gt; intervention. It treats comprehension debt as a discipline problem. For Google-scale teams with senior engineers and strong review culture, that might work. For the other 95% of engineering organizations? You need more than discipline. You need infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Context Debt
&lt;/h3&gt;

&lt;p&gt;This is the one nobody's naming clearly, and it's the most dangerous.&lt;/p&gt;

&lt;p&gt;Context debt is the accumulated loss of institutional knowledge about &lt;em&gt;why&lt;/em&gt; systems are built the way they are. It's not about whether engineers understand the code in front of them — it's about whether anyone understands the decisions, constraints, trade-offs, and edge cases that shaped it.&lt;/p&gt;

&lt;p&gt;Consider what lives in a mature codebase beyond the code itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Architectural rationale.&lt;/strong&gt; Why this service exists as its own deployment rather than a module in the monolith. Why the database schema looks the way it does. Why that particular API contract was chosen over three alternatives that were debated for weeks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Boundary knowledge.&lt;/strong&gt; Which downstream consumers depend on specific response shapes. Which partner integrations are fragile. Which compliance requirements are baked into the data flow and why.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Failure memory.&lt;/strong&gt; The incident that revealed a race condition nobody anticipated. The scaling problem that drove the caching strategy. The security audit finding that explains the seemingly redundant validation layer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this lives in the code. Very little of it lives in documentation. Most of it lives in the heads of engineers who were there when the decisions were made. When AI regenerates a service from scratch, it produces code that compiles, passes tests, and handles the happy path. What it cannot produce is the scar tissue — the hard-won understanding of what goes wrong and why.&lt;/p&gt;

&lt;p&gt;Context debt accumulates every time a team rewrites without capturing context first. Every time an AI-generated solution replaces a human-authored one without preserving the &lt;em&gt;reasoning&lt;/em&gt; behind the original. Every time an engineer leaves and their knowledge of why things are the way they are walks out with them.&lt;/p&gt;

&lt;p&gt;This isn't new — context loss has always been a risk in software organizations. What's new is the &lt;em&gt;velocity&lt;/em&gt; at which AI-assisted development can destroy context. When rewriting is cheap, the incentive to understand before replacing drops to zero.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Integration Debt
&lt;/h3&gt;

&lt;p&gt;The third form is architectural. Integration debt is the growing inconsistency between AI-generated components that were each built in isolation without awareness of the broader system.&lt;/p&gt;

&lt;p&gt;AI coding assistants operate within a context window. They see the file you're working on, maybe some adjacent files, maybe a system prompt describing your stack. What they don't see is the full topology of your system — every service, every contract, every shared assumption that holds your architecture together.&lt;/p&gt;

&lt;p&gt;When three different engineers use AI to independently build three services that interact, each service might be internally excellent. Clean code, good patterns, thorough tests. But the interfaces between them — data formats, error handling conventions, retry semantics, authentication flows — will diverge unless someone is deliberately enforcing coherence.&lt;/p&gt;

&lt;p&gt;The numbers back this up. &lt;a href="https://www.coderabbit.ai/blog/2025-was-the-year-of-ai-speed-2026-will-be-the-year-of-ai-quality" rel="noopener noreferrer"&gt;CodeRabbit's 2026 analysis&lt;/a&gt; found teams merged 98% more PRs that were 154% larger year-over-year, while 61% of developers reported that AI produces code that "looks correct but is unreliable." The generation pressure is real. The verification pressure is downstream — and growing.&lt;/p&gt;

&lt;p&gt;Integration debt compounds quietly. It shows up as unexpected failures during deployment. As subtle data inconsistencies between services. As "it works on my machine" problems that are actually contract mismatches between components that must interoperate but were never designed together.&lt;/p&gt;

&lt;p&gt;The faster you generate components, the faster integration debt accumulates. And unlike code quality — which AI can actually help improve — integration coherence requires exactly the kind of big-picture architectural thinking that AI tools are worst at.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Composition Shift
&lt;/h2&gt;

&lt;p&gt;Here's why this matters strategically, not just technically.&lt;/p&gt;

&lt;p&gt;For two decades, "tech debt" mostly meant code-level debt: poor abstractions, missing tests, duplicated logic, outdated dependencies. That's the debt AI is genuinely good at paying down. Refactoring, test generation, dependency updates, code modernization — these are tasks where AI excels. If your tech debt balance sheet was entirely code-level debt, the "AI makes debt irrelevant" crowd would be right.&lt;/p&gt;

&lt;p&gt;But in any system of real complexity, code-level debt was always just the visible portion. The deeper liabilities — context, comprehension, integration — were always there. They were just overshadowed by the sheer volume of code-level problems.&lt;/p&gt;

&lt;p&gt;AI didn't eliminate tech debt. It paid down the most visible kind, revealing the structural kinds that were hiding underneath. The balance sheet didn't shrink. The composition changed. And most organizations are still using the old chart of accounts.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dora.dev/research/2024/" rel="noopener noreferrer"&gt;2024 DORA report&lt;/a&gt; hints at this shift: despite widespread AI adoption, throughput dipped 1.5% and stability dropped 7.2% across 39,000 respondents. Teams are generating more code and shipping it less reliably. The metrics that matter — lead time, change failure rate, recovery time — aren't improving with velocity alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context as Infrastructure, Not Culture
&lt;/h2&gt;

&lt;p&gt;This is where I part ways with the prevailing conversation.&lt;/p&gt;

&lt;p&gt;Osmani and others are framing these new debts as cultural and organizational challenges. Slow down. Review more carefully. Keep humans in the loop. These aren't wrong, but they're incomplete — and for many teams, impractical. You can't tell a startup burning runway to slow down their AI-assisted velocity for the sake of comprehension hygiene. You can't tell a team of five to institute Google-style code review rituals.&lt;/p&gt;

&lt;p&gt;What you &lt;em&gt;can&lt;/em&gt; do is treat context as infrastructure.&lt;/p&gt;

&lt;p&gt;I've been arguing for a while now that context management is the real skill gap in AI-assisted development — that getting value from AI tools is less about prompt engineering and more about maintaining rich, current, accessible context that those tools can leverage. (I wrote about this in "&lt;a href="https://mpt.solutions/context-management-generative-ai/" rel="noopener noreferrer"&gt;This Above All: To Thine Own Context Be True&lt;/a&gt;" earlier this year.)&lt;/p&gt;

&lt;p&gt;The same principle applies to the debt problem, but at organizational scale. Context debt isn't an inevitable consequence of AI adoption. It's an infrastructure failure — a failure to build systems that capture, maintain, and surface the institutional knowledge that makes code comprehensible and architecturally coherent.&lt;/p&gt;

&lt;p&gt;What does context infrastructure look like in practice?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Living architectural decision records.&lt;/strong&gt; Not dusty wiki pages nobody updates, but actively maintained documents that live alongside the code and get updated as part of the development workflow. &lt;a href="https://www.cognitect.com/blog/2011/11/15/documenting-architecture-decisions" rel="noopener noreferrer"&gt;Michael Nygard formalized the ADR pattern&lt;/a&gt; back in 2011 — Title, Status, Context, Decision, Consequences. The format is fifteen years old. What's changed is that AI makes the cost of &lt;em&gt;not&lt;/em&gt; having ADRs catastrophically higher. When AI generates a new implementation, the context for &lt;em&gt;why&lt;/em&gt; the old implementation existed this way needs to be right there — available to the engineer reviewing the change and to the AI generating it.&lt;/p&gt;
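&lt;p&gt;As a sketch, a record in Nygard's format can be that lightweight (the service, the audit finding, and the partner details below are all hypothetical):&lt;/p&gt;

```markdown
# ADR-014: Keep payment reconciliation as a separate service

Status: Accepted

## Context
A compliance audit requires reconciliation logic to run in an isolated
network segment. Three partner integrations are hardcoded against the
current response shape.

## Decision
Reconciliation stays a standalone deployment rather than folding into
the monolith.

## Consequences
One more service to deploy and monitor, but the audit scope stays
narrow and partner contracts stay stable. Revisit if partner count
drops below two.
```

&lt;p&gt;The point isn't the template. It's that this file lives next to the code, so both the reviewing engineer and the generating AI see the &lt;em&gt;why&lt;/em&gt; before touching the &lt;em&gt;what&lt;/em&gt;.&lt;/p&gt;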

&lt;p&gt;&lt;strong&gt;Structured project memory.&lt;/strong&gt; Tools and conventions so the reasoning behind decisions persists beyond the individual who made them. This means treating context documents — system descriptions, constraint inventories, edge case catalogs — as first-class artifacts that get versioned, reviewed, and maintained with the same rigor as code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration contracts as explicit artifacts.&lt;/strong&gt; Rather than letting service interfaces emerge organically from individually generated components, defining and maintaining explicit contracts that AI tools can reference during generation. The contract becomes the source of truth for integration coherence, not the individual developer's mental model of how everything fits together.&lt;/p&gt;
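&lt;p&gt;Concretely, a contract can be as small as a shared, versioned data type that both sides import instead of re-deriving the shape independently. A minimal Python sketch, with a hypothetical &lt;code&gt;InvoiceCreated&lt;/code&gt; event standing in for a real interface:&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical shared contract module, versioned and reviewed like code.
# Both the producing service and every consumer import this type, so a
# breaking change is a visible diff rather than a silent drift.

@dataclass(frozen=True)
class InvoiceCreated:
    """Event contract: emitted by billing, consumed by notifications."""
    invoice_id: str
    amount_cents: int      # integer cents, never floats
    currency: str          # 3-letter ISO 4217 code, e.g. "USD"
    schema_version: int = 1

    def __post_init__(self):
        # Validate at the boundary so malformed events fail loudly at the
        # producer, not as a mystery in a downstream consumer.
        if 0 > self.amount_cents:
            raise ValueError("amount_cents must be non-negative")
        if len(self.currency) != 3:
            raise ValueError("currency must be a 3-letter code")
```

&lt;p&gt;Whether the artifact is a dataclass, an OpenAPI spec, or a protobuf file matters less than that it exists, is versioned, and is in the context an AI tool sees when it generates either side of the boundary.&lt;/p&gt;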

&lt;p&gt;&lt;strong&gt;Context-aware generation workflows.&lt;/strong&gt; Configuring AI tools to ingest project context before generating code, rather than generating in a vacuum and hoping for coherence. This means investing in the scaffolding — the context files, the system prompts, the reference documents — that turn AI from a talented but amnesiac intern into a contributor who understands the system they're working within.&lt;/p&gt;

&lt;p&gt;None of this is revolutionary. It's the kind of engineering discipline that good teams have always practiced. What's different is the urgency. When humans wrote all the code, context accumulated naturally — slowly, and with gaps, but it accumulated. When AI generates code at 10x the velocity, context dissipates at 10x the rate unless you deliberately counteract it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Assessment Gap
&lt;/h2&gt;

&lt;p&gt;The biggest opportunity right now isn't in tooling — it's in assessment.&lt;/p&gt;

&lt;p&gt;Most engineering organizations have no way to measure their exposure to these new forms of debt. They can tell you their test coverage percentage, their deployment frequency, their mean time to recovery. They cannot tell you how much institutional context has been lost in the last six months of AI-assisted development. They cannot quantify how many of their AI-generated components have integration assumptions that conflict with each other. They cannot assess whether their team's comprehension of the codebase has kept pace with its growth.&lt;/p&gt;

&lt;p&gt;This is the gap that needs to be closed first. Before you can manage context debt, comprehension debt, and integration debt, you need to be able to see them. And right now, almost nobody can.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means
&lt;/h2&gt;

&lt;p&gt;If you're leading an engineering organization that's adopting AI-assisted development — and at this point, that's nearly everyone — the question isn't whether these new forms of debt are accumulating. They are. The question is whether you're managing them deliberately or discovering them during incidents.&lt;/p&gt;

&lt;p&gt;The organizations that will thrive in the AI era aren't the ones that generate code fastest. They're the ones that maintain the richest context while moving fast. That's a different capability than what most teams are building right now, and it's one that compounds over time. The team that invests in context architecture today will be moving faster &lt;em&gt;and&lt;/em&gt; more safely a year from now. The team that optimizes only for generation velocity will be drowning in debt they can't see and can't name.&lt;/p&gt;

&lt;p&gt;The old tech debt conversation was about code quality. The new one is about knowledge architecture. And it's just getting started.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Michael Tuszynski is the founder of &lt;a href="https://mpt.solutions" rel="noopener noreferrer"&gt;MPT Solutions&lt;/a&gt;, where he writes about AI strategy, cloud architecture, and engineering leadership. With 25 years in software — including six years as a Senior Solutions Architect at AWS and a stint as CTO at Fandor — he focuses on helping teams adopt AI-assisted development without sacrificing the institutional context that makes their systems work.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Previously: &lt;a href="https://mpt.solutions/context-management-generative-ai/" rel="noopener noreferrer"&gt;This Above All: To Thine Own Context Be True&lt;/a&gt; — on why context management matters more than prompt engineering.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>engineeringleadership</category>
      <category>technicaldebt</category>
      <category>contextarchitecture</category>
    </item>
    <item>
      <title>Why does nobody teach the infrastructure problems that destroy developer productivity before production breaks</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Wed, 18 Mar 2026 02:36:58 +0000</pubDate>
      <link>https://dev.to/michaeltuszynski/why-does-nobody-teach-the-infrastructure-problems-that-destroy-developer-productivity-before-32mb</link>
      <guid>https://dev.to/michaeltuszynski/why-does-nobody-teach-the-infrastructure-problems-that-destroy-developer-productivity-before-32mb</guid>
      <description>&lt;h1&gt;
  
  
  The Production Gap: Why Nobody Teaches the Infrastructure That Actually Matters
&lt;/h1&gt;

&lt;p&gt;Every bootcamp, CS program, and YouTube tutorial series teaches you how to build features. Almost none of them teach you what happens when those features meet real traffic, real failure modes, and real users who do things you never anticipated.&lt;/p&gt;

&lt;p&gt;The result is predictable: developers ship code that works on their laptop, passes CI, and then falls apart the moment it hits production at scale. Not because the logic is wrong — because nobody taught them about connection pooling, graceful degradation, or what happens when your database runs out of connections at 2 AM on a Saturday.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Curriculum Blind Spot
&lt;/h2&gt;

&lt;p&gt;A &lt;a href="https://www.reddit.com/r/ExperiencedDevs/comments/1rvyprt/why_does_nobody_teach_the_infrastructure_problems/" rel="noopener noreferrer"&gt;thread on r/ExperiencedDevs&lt;/a&gt; captured this frustration perfectly: educational content focuses almost entirely on writing code and building features, while operational concerns — monitoring, error handling, memory management, rate limiting — only become relevant when applications break in production. By then, you're learning under fire.&lt;/p&gt;

&lt;p&gt;This isn't a minor gap. It's the gap between "I can build software" and "I can build software that stays running." And it's enormous.&lt;/p&gt;

&lt;p&gt;Think about what a typical full-stack course covers: React components, REST APIs, database queries, authentication flows. Maybe some Docker basics. Now think about what actually causes production incidents: thread pool exhaustion, cascading failures from a single downstream dependency, memory leaks that only manifest after 72 hours of uptime, DNS resolution failures, certificate expiration, connection storms after a deploy.&lt;/p&gt;

&lt;p&gt;These aren't edge cases. They're Tuesday.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Gap Exists
&lt;/h2&gt;

&lt;p&gt;Three forces keep operational knowledge out of the curriculum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, it's hard to teach without real systems.&lt;/strong&gt; You can't simulate connection pool exhaustion on a laptop running SQLite. You can't demonstrate cascading failures with a single-service tutorial app. The infrastructure problems that destroy productivity only emerge at a certain scale of complexity, traffic, and time — none of which exist in a classroom.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, it's not glamorous.&lt;/strong&gt; "Build a full-stack app in 30 minutes" gets clicks. "Understanding TCP keepalive settings and why they matter for your connection pool" does not. Content creators optimize for engagement, and operational topics feel boring until the moment they're the only thing that matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third, the people who know this stuff learned it the hard way and are too busy to teach it.&lt;/strong&gt; The senior SRE who understands why your Kubernetes pods are getting OOMKilled at 3x expected memory usage is probably dealing with an incident right now, not writing blog posts. Operational knowledge lives in war stories, incident retrospectives, and tribal knowledge passed between teammates — not in structured curricula.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Cost
&lt;/h2&gt;

&lt;p&gt;This isn't just an education problem. It's a &lt;a href="https://medium.com/@chain.love/why-developer-productivity-is-really-an-infra-problem-af89528aef7a" rel="noopener noreferrer"&gt;productivity problem that masquerades as a people problem&lt;/a&gt;. When teams complain about slow velocity, the instinct is to look at process, hiring, or morale. But often the real bottleneck is that developers spend hours debugging infrastructure issues they were never trained to anticipate.&lt;/p&gt;

&lt;p&gt;A developer who doesn't understand connection pooling will open a new database connection per request, wonder why the app works in dev but times out under load, and then spend two days tracking down the issue. A developer who doesn't understand backpressure will build a message consumer that looks correct but silently drops events when the queue backs up. A developer who doesn't understand DNS caching will deploy a service that works perfectly until the load balancer rotates IPs.&lt;/p&gt;

&lt;p&gt;Each of these costs days — sometimes weeks — of debugging time. Multiply that across a team, and &lt;a href="https://coder.com/blog/the-uncomfortable-truth-about-developer-productivity-in-apac-tools-arent-the-prob" rel="noopener noreferrer"&gt;the infrastructure gap becomes the single largest drag on developer productivity&lt;/a&gt;. Not the tools, not the process, not the sprint ceremonies. The fact that half the team has never been taught how production systems actually behave.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Specific Knowledge That's Missing
&lt;/h2&gt;

&lt;p&gt;Here's my list of operational topics that every developer should understand before they're responsible for a production system. None of these show up in a typical CS degree or bootcamp:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection management.&lt;/strong&gt; How connection pools work, why they have limits, what happens when you exhaust them, and how to size them for your workload. This single topic prevents more production incidents than any framework feature.&lt;/p&gt;
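&lt;p&gt;To make the failure mode concrete, here is a deliberately minimal pool sketch in Python. This is an illustration, not a production library; the &lt;code&gt;connect&lt;/code&gt; factory is a hypothetical stand-in for opening a real database connection:&lt;/p&gt;

```python
import queue

class ConnectionPool:
    """Minimal illustrative pool: a bounded set of reusable connections.

    When all max_size connections are checked out, acquire() blocks for
    up to `timeout` seconds and then fails. That failure is the "works
    in dev, times out under load" symptom: dev traffic never exhausts
    the pool, production traffic does.
    """

    def __init__(self, connect, max_size=5, timeout=1.0):
        self._timeout = timeout
        self._idle = queue.Queue(maxsize=max_size)
        # Simplification: create all connections up front. Real pools
        # grow lazily and health-check idle connections.
        for _ in range(max_size):
            self._idle.put(connect())

    def acquire(self):
        try:
            return self._idle.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError("pool exhausted: no free connections")

    def release(self, conn):
        self._idle.put(conn)
```

&lt;p&gt;A developer who opens a fresh connection per request instead of calling &lt;code&gt;acquire&lt;/code&gt;/&lt;code&gt;release&lt;/code&gt; is effectively running with an unbounded pool, until the database enforces its own limit for them.&lt;/p&gt;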

&lt;p&gt;&lt;strong&gt;Graceful degradation.&lt;/strong&gt; What your application should do when a dependency is slow or unavailable. The answer is never "throw a 500 and hope for the best," but that's what most tutorial code does.&lt;/p&gt;
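&lt;p&gt;The pattern is simple to sketch. In this hypothetical example, &lt;code&gt;fetch_personalized&lt;/code&gt; stands in for a flaky downstream dependency and &lt;code&gt;fetch_popular&lt;/code&gt; for a cheap local fallback:&lt;/p&gt;

```python
import logging

def get_recommendations(user_id, fetch_personalized, fetch_popular):
    """Degrade instead of failing: when the personalization dependency
    is down, serve a generic result rather than returning a 500."""
    try:
        return fetch_personalized(user_id)
    except Exception:
        # Record the degradation for observability, but keep serving
        # something useful to the user.
        logging.warning("personalization unavailable; serving popular items")
        return fetch_popular()
```

&lt;p&gt;Real implementations add timeouts and circuit breakers on the primary call, but the principle is the same: the failure of a dependency should narrow the experience, not end it.&lt;/p&gt;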

&lt;p&gt;&lt;strong&gt;Observability fundamentals.&lt;/strong&gt; Not "install Datadog" — actual understanding of what metrics matter, how to correlate logs across services, what a useful alert looks like vs. one that wakes you up for nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory and resource management.&lt;/strong&gt; How garbage collection actually works in your runtime. What causes memory leaks in languages that claim to manage memory for you. Why your Node.js service uses 2GB of RAM after running for a week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate limiting and backpressure.&lt;/strong&gt; How to protect your service from being overwhelmed, and how to be a good citizen when calling someone else's service. This is the difference between a service that handles traffic spikes and one that cascades failures across your entire platform.&lt;/p&gt;
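&lt;p&gt;The classic mechanism here is a token bucket. A minimal sketch, with an injectable clock so the behavior is testable:&lt;/p&gt;

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: refill at `rate` tokens/second up
    to `capacity`; each request spends one token. A request arriving at
    an empty bucket is rejected (or, in a real system, queued), and that
    rejection is the backpressure signal protecting the service behind it.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill based on elapsed time, capped at capacity so idle
        # periods can't bank an unbounded burst.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

&lt;p&gt;The same shape works on both sides of the wire: in front of your own endpoints to shed overload, and around outbound clients so you don't hammer someone else's service.&lt;/p&gt;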

&lt;p&gt;&lt;strong&gt;Failure modes of distributed systems.&lt;/strong&gt; Partial failures, network partitions, split-brain scenarios, exactly-once delivery myths. You don't need a PhD in distributed systems theory, but you need to understand that the network is not reliable, clocks are not synchronized, and retries without backoff are a denial-of-service attack on your own infrastructure.&lt;/p&gt;
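&lt;p&gt;The retry point deserves a sketch, because the naive version is the attack. A hedged minimal implementation of capped exponential backoff with full jitter:&lt;/p&gt;

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Retry a transiently-failing callable safely.

    Immediate retries from many clients at once synchronize into a
    thundering herd against the struggling dependency. Exponential
    backoff spreads attempts over time; random jitter de-synchronizes
    the clients from each other.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random amount up to the capped exponential.
            delay = min(cap, base * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

&lt;p&gt;Note what's deliberately missing: this retries on any exception, which is wrong for non-idempotent operations. Deciding &lt;em&gt;which&lt;/em&gt; errors are safe to retry is exactly the distributed-systems judgment the surrounding paragraph is about.&lt;/p&gt;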

&lt;h2&gt;
  
  
  What Should Change
&lt;/h2&gt;

&lt;p&gt;I'm not expecting universities to overhaul their CS curricula overnight. But a few things would help.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bootcamps should include a "production readiness" module.&lt;/strong&gt; Before graduation, every student should deploy an app, load test it until it breaks, diagnose the failure, and fix it. That single exercise teaches more about real-world engineering than a semester of algorithm problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior engineers need to write down what they know.&lt;/strong&gt; The gap persists partly because operational knowledge stays locked in people's heads. Incident retrospectives should be shared broadly. Internal tech talks on "how we debugged X" are worth 10x more than another talk on the latest framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Companies should invest in structured onboarding for production systems.&lt;/strong&gt; Don't throw a new hire at the codebase and hope they figure out the monitoring stack. Walk them through the architecture, show them where things break, explain the failure modes you've already seen. This is not hand-holding — it's preventing the next incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform teams should build paved roads.&lt;/strong&gt; If connection pooling is tricky, provide a standard library that does it correctly. If observability requires too much configuration, bake it into the deployment pipeline. Don't rely on every developer independently learning every operational concern — make the right thing the easy thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Truth
&lt;/h2&gt;

&lt;p&gt;The industry has a weird relationship with operational knowledge. We celebrate feature velocity and treat infrastructure work as unglamorous plumbing. We promote the developer who shipped the flashy new feature and overlook the one who quietly prevented 47 production incidents through better error handling and circuit breakers.&lt;/p&gt;

&lt;p&gt;Until we value the skills that keep systems running as much as the skills that build new ones, the production gap will persist. New developers will keep learning the hard way — at 2 AM, on a Saturday, with a Slack channel full of escalations and no idea why the connection pool is exhausted.&lt;/p&gt;

&lt;p&gt;The fix starts with acknowledging that knowing how to write code and knowing how to run code are two different skills. We teach the first one extensively. The second one, we mostly leave to chance. That's a choice, and it's the wrong one.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>learning</category>
      <category>productivity</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
