<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Benjamin Eckstein</title>
    <description>The latest articles on DEV Community by Benjamin Eckstein (@codewithagents_de).</description>
    <link>https://dev.to/codewithagents_de</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3832780%2F0ea34886-b5c2-4c5a-9431-d9889a1d057e.jpg</url>
      <title>DEV Community: Benjamin Eckstein</title>
      <link>https://dev.to/codewithagents_de</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/codewithagents_de"/>
    <language>en</language>
    <item>
      <title>$187 and 16 Hours: My First Million-Token Session</title>
      <dc:creator>Benjamin Eckstein</dc:creator>
      <pubDate>Mon, 06 Apr 2026 07:23:00 +0000</pubDate>
      <link>https://dev.to/codewithagents_de/187-and-16-hours-my-first-million-token-session-2ae2</link>
      <guid>https://dev.to/codewithagents_de/187-and-16-hours-my-first-million-token-session-2ae2</guid>
      <description>&lt;p&gt;Two things landed in the same week: the 1 million token context window and the Claude Agentic Teams beta. One gave me room to think. The other gave me a way to parallelize. I did what any reasonable engineer would do: I immediately tried to break both with something too ambitious.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fplhsl5vpift3v9v1bvhf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fplhsl5vpift3v9v1bvhf.png" alt="The session receipt: $187, 16 hours, 729 tests, 34.8% orchestrator context used" width="744" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The plan: build a complete cashback campaign web application — backend, frontend, full test suite, containerized deployment — in a single session. One orchestrator. &lt;a href="https://www.codewithagents.de/en/blog/building-agent-army/" rel="noopener noreferrer"&gt;Eight specialized agents&lt;/a&gt; spawned as a team. Don't stop until it's live.&lt;/p&gt;

&lt;p&gt;What actually happened is more interesting than either the successes or the failures on their own.&lt;/p&gt;

&lt;h2&gt;The Setup&lt;/h2&gt;

&lt;p&gt;The Agentic Teams feature was the key enabler. Instead of one agent doing everything sequentially, I had an orchestrator that spawned specialized subagents — each with its own fresh context window, each focused on one domain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Backend implementer&lt;/strong&gt; — Spring Boot service, API endpoints, business logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend implementer&lt;/strong&gt; — React SPA wired to the backend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QA reviewer&lt;/strong&gt; — running tests, flagging gaps, reviewing coverage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment agent&lt;/strong&gt; — Dockerfile, compose files, deployment configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git agent&lt;/strong&gt; — branches, commits, keeping the repo clean&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PR handler&lt;/strong&gt; — pull request creation, descriptions, review assignments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI monitor&lt;/strong&gt; — watching the pipeline, catching failures early&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack notifier&lt;/strong&gt; — status updates to the team channel&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The combination of 1M context and teams changed the equation fundamentally. The orchestrator held the big picture — architecture, decisions, coordination — while each subagent got a fresh context dedicated entirely to its domain. No context pollution between concerns. The backend implementer's window wasn't cluttered with CSS decisions. The deployment agent didn't carry the weight of test output.&lt;/p&gt;

&lt;p&gt;That's not a bigger notepad. That's a qualitatively different way of working.&lt;/p&gt;

&lt;h2&gt;The Numbers&lt;/h2&gt;

&lt;p&gt;Let me give you the receipt before the narrative.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total cost&lt;/td&gt;
&lt;td&gt;$186.92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wall time&lt;/td&gt;
&lt;td&gt;16 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API time&lt;/td&gt;
&lt;td&gt;7 hours 42 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lines of code written&lt;/td&gt;
&lt;td&gt;5,800+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend tests&lt;/td&gt;
&lt;td&gt;649 (all passing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;End-to-end tests&lt;/td&gt;
&lt;td&gt;80 (all passing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestrator context at completion&lt;/td&gt;
&lt;td&gt;34.8% used&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gap between wall time and API time tells its own story. More than eight hours of waiting — for builds to complete, for containers to spin up, for CI pipelines to run, for me to review and redirect. The agent system was genuinely idle for more than half the clock time. Multi-agent work is often more about managing parallelism and wait states than it is about raw token throughput.&lt;/p&gt;

&lt;p&gt;The context number needs explanation: 34.8% is the &lt;em&gt;orchestrator's&lt;/em&gt; context usage — the central agent coordinating everything. But here's the thing about agentic teams: every subagent spawns with a fresh context window. The backend implementer burned through most of its own context writing 3,000+ lines of Spring Boot code. The frontend implementer filled a separate window with React components. The total tokens consumed across all agents was many multiples of what the orchestrator alone used.&lt;/p&gt;

&lt;p&gt;The 1M window mattered for the orchestrator's ability to hold the full project state — every architectural decision, every agent's status, every failure and recovery — without summarization loss. The subagents benefited from fresh context dedicated entirely to their domain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0t4dio08wnawkemprurn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0t4dio08wnawkemprurn.png" alt="Orchestrator used 34.8% of 1M context — each subagent had its own fresh window on top" width="716" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What We Built&lt;/h2&gt;

&lt;p&gt;A cashback campaign web application. Users register for campaigns, submit purchase verification, and receive cashback payouts. Backend exposes REST endpoints with full authentication, campaign management, submission handling, and payout processing. Frontend handles the user journey: campaign listing, submission form, status tracking, account management.&lt;/p&gt;

&lt;p&gt;649 backend tests covering units, integration, and API contracts. 80 end-to-end tests exercising complete user flows against the deployed system. Both suites passing at the time of deployment.&lt;/p&gt;

&lt;p&gt;Containerized with Docker, deployed to a demo server, accessible over HTTPS. The full stack was live — not prototype-live or local-dev-live, but actually deployed and running with a URL you could share.&lt;/p&gt;

&lt;p&gt;In one session.&lt;/p&gt;

&lt;h2&gt;What Broke&lt;/h2&gt;

&lt;p&gt;Three things broke in ways worth documenting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The UI refinement agent hung mid-session.&lt;/strong&gt; About ten hours in, I spun up an additional agent to polish the frontend styling. It started working, then stopped producing output, then started again, then stopped permanently. The process was still running — consuming tokens, returning nothing meaningful. I had to force-kill it and redistribute its remaining tasks to the frontend implementer. Cause: unclear. Hypothesis: the context had accumulated enough ambiguous signal that the agent entered a local minimum and couldn't exit without human intervention. I'd seen this behavior before in shorter sessions. At this scale it cost more time. (I wrote a full postmortem on this and three other multi-agent failures in &lt;a href="https://www.codewithagents.de/en/blog/agent-that-hung/" rel="noopener noreferrer"&gt;The Agent That Hung&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker configuration required multiple debug cycles.&lt;/strong&gt; The deployment agent's first three attempts at the Dockerfile produced images that built successfully and failed at runtime. The failure modes were different each time: wrong environment variable name, missing health check endpoint, volume mount path mismatch. None of these were hard problems — they were the kind of thing that takes ten minutes to fix once you know what's wrong. But each cycle was 15-20 minutes of build time, which adds up. The agent wasn't wrong in a systematic way; it was wrong in a random way, which is harder to diagnose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CORS whitelisting was missing from the first live deployment.&lt;/strong&gt; The backend deployed, the frontend deployed, we hit the first real URL from a browser — and got CORS errors. The frontend and backend were on different origins, and nobody had configured allowed origins in the API. This is the kind of thing that's trivially obvious in hindsight and invisible when you're thinking about everything else. We fixed it in twenty minutes, but the gap between "it works in tests" and "it works when you actually open a browser" is real and shouldn't be understated.&lt;/p&gt;

&lt;p&gt;The failures were recoverable. None of them were catastrophic. But they're worth naming because the narrative of "multi-agent AI builds complete app in one session" can make it sound smoother than it is.&lt;/p&gt;

&lt;h2&gt;Was $187 Worth It?&lt;/h2&gt;

&lt;p&gt;This is the question everyone asks.&lt;/p&gt;

&lt;p&gt;$186.92 for a complete, deployed, tested web application. The question is: compared to what?&lt;/p&gt;

&lt;p&gt;My estimate for solo development of this system — evenings and weekends, the realistic mode for a side project — is two to three weeks. That's probably 40-60 hours of actual coding time, spread across a month of calendar time. You don't get it faster by working harder; you get it faster by having more hours available.&lt;/p&gt;

&lt;p&gt;The session compressed that into one long stretch — starting the evening of the 17th and spanning into the early hours of the 18th. Not just in wall time, but in context. When you're working across three weeks of evenings, you spend a non-trivial portion of each session re-establishing context. What did I build last time? Where did I leave off? Why did I make this architectural decision? The 1M context window meant that never happened. Every agent at every moment had access to the full state of the project.&lt;/p&gt;

&lt;p&gt;That context compression is the value. The $187 isn't paying for code generation — you can get code generation cheaply. It's paying for unbroken continuity across an entire project, from empty repository to deployed application.&lt;/p&gt;

&lt;p&gt;Is $187 a lot? It's a dinner out. It's less than an hour of consulting time. For what it produced, it's laughably cheap if the output is usable — and in this case, the output was usable.&lt;/p&gt;

&lt;p&gt;The ROI question gets harder when you ask: "Okay but I'm paying $187 per feature, how does that scale?" Fair. If you're running sessions like this weekly, you're spending $800-1000 a month on context. That's not nothing. But you're also compressing weeks of work into days, and the comparison baseline should be "what would I pay a contractor" rather than "what would I pay in compute."&lt;/p&gt;

&lt;h2&gt;What 1 Million Tokens Actually Changes&lt;/h2&gt;

&lt;p&gt;The marketing around large context windows is often vague in ways that obscure the real value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's not about fitting more files in.&lt;/strong&gt; You could always load more files into multiple sessions. The point isn't storage capacity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's about the orchestrator holding the full picture.&lt;/strong&gt; The 1M window lets the coordinating agent track every decision, every failure, every architectural choice across a 16-hour session without ever summarizing or losing nuance. When the backend agent reports a schema change, the orchestrator passes that context to the frontend agent accurately — not through a lossy summary, but through the actual decision with its reasoning intact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Teams multiply the effective context.&lt;/strong&gt; Eight agents, each with their own context window, means the system's total working memory is far larger than 1M tokens. Each specialist gets a fresh window focused on its domain. The orchestrator's 1M window coordinates between them. It's not one big context — it's an architecture of contexts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It eliminates the summarization tax at the coordination layer.&lt;/strong&gt; Shorter orchestrator windows mean you're constantly summarizing: "here's what I built, here's the current state, here's what's failing." Every summary introduces loss. With 1M tokens on the orchestrator, everything that happened across all eight agents was still trackable. No lossy handoffs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It makes failures recoverable without restart.&lt;/strong&gt; When the hung UI agent had to be killed, the orchestrator still had the complete context of what it had attempted. Spinning up a replacement agent with the right instructions was straightforward — the orchestrator knew exactly where the work had left off.&lt;/p&gt;

&lt;p&gt;This is why I described it as a different way of building software. Not a bigger version of the old way. A different mode that becomes available when you combine a large orchestrator context with specialized parallel agents.&lt;/p&gt;

&lt;h2&gt;What I'd Do Differently&lt;/h2&gt;

&lt;p&gt;Not much — but a few things.&lt;/p&gt;

&lt;p&gt;I'd add CORS configuration to a deployment checklist from the start. Not because it's hard to add, but because it reliably gets forgotten and costs time. The pattern is consistent enough that it should be institutional knowledge.&lt;/p&gt;

&lt;p&gt;I'd build in explicit agent health checks. The hung UI agent was running for over an hour before I noticed it wasn't producing useful output. A simple "if no meaningful output in X minutes, flag for human review" rule would have caught it faster.&lt;/p&gt;
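&lt;p&gt;A minimal sketch of what that rule could look like (the function name, log-file convention, and 15-minute threshold are my assumptions, not what the session actually used):&lt;/p&gt;

```shell
# Hypothetical watchdog sketch: treat an agent as stalled when its log file
# has not been written to within a stall window.
agent_stalled() {
  log_file=$1
  stall_seconds=$2
  last_write=$(stat -c %Y "$log_file")   # mtime in epoch seconds (GNU stat)
  now=$(date +%s)
  [ $((now - last_write)) -gt "$stall_seconds" ]
}

# Demo: a log written just now counts as healthy.
demo_log=$(mktemp)
echo "agent output" > "$demo_log"
if agent_stalled "$demo_log" 900; then
  echo "flag for human review"
else
  echo "agent healthy"
fi
rm -f "$demo_log"
```

&lt;p&gt;In practice the orchestrator would own this check rather than a shell loop; the sketch only shows the stall test itself.&lt;/p&gt;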

&lt;p&gt;I'd be more aggressive about pre-splitting the frontend work. At the scale of a complete application, the frontend implementer had a lot of surface area. Splitting that into UI components and data integration from the start would have parallelized more work.&lt;/p&gt;

&lt;h2&gt;The Meta-Point&lt;/h2&gt;

&lt;p&gt;I did this in February 2026. The 1M token context window was new. The Agentic Teams beta was new. The multi-agent orchestration patterns were things I'd been building for months. Everything converged at once.&lt;/p&gt;

&lt;p&gt;What struck me most wasn't the output — it was the experience of building. For sixteen hours, I wasn't typing. I wasn't writing code. I was making decisions, reviewing outputs, redirecting agents, thinking about architecture. The implementation was handled. The thinking was mine.&lt;/p&gt;

&lt;p&gt;That's the mode I think agentic engineering is pointing toward: not "AI writes the code for me" but "I architect while AI implements, continuously, in real time." The session wasn't sixteen hours of watching progress bars. It was sixteen hours of directed creative work, at a level of abstraction above the code.&lt;/p&gt;

&lt;p&gt;Whether that's exciting or unsettling depends on where you stand. For me, it's both, which is usually a sign that something real is happening.&lt;/p&gt;

&lt;p&gt;The $187 was money well spent. The sixteen hours taught me more about multi-agent system design than any tutorial could. The receipt is right there in the API billing dashboard.&lt;/p&gt;

&lt;p&gt;Now I know what a million tokens feels like from the inside. (What I learned next — making those systems production-ready — is in &lt;a href="https://www.codewithagents.de/en/blog/production-hardening/" rel="noopener noreferrer"&gt;Production Hardening&lt;/a&gt;.)&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The 22,000 Token Tax: Why I Killed My MCP Server</title>
      <dc:creator>Benjamin Eckstein</dc:creator>
      <pubDate>Wed, 01 Apr 2026 20:53:07 +0000</pubDate>
      <link>https://dev.to/codewithagents_de/the-22000-token-tax-why-i-killed-my-mcp-server-2c12</link>
      <guid>https://dev.to/codewithagents_de/the-22000-token-tax-why-i-killed-my-mcp-server-2c12</guid>
      <description>&lt;p&gt;I was at a company workshop, arguing with beginners about token costs.&lt;/p&gt;

&lt;p&gt;They wanted to save money. Reasonable instinct. They were spending maybe €25 a week on API calls and wanted to cut it to €20. I pushed back hard: "You're at the learning stage. Spend &lt;em&gt;more&lt;/em&gt;, not less. Explore. Break things. Create costs.&lt;br&gt;
  Because while you're saving €5, I'm spending €600 a week — and I'll gladly spend €20 more if it means finishing a ticket in one session instead of two."&lt;/p&gt;

&lt;p&gt;Then I told them the one scenario where token consumption actually matters: when you need to prolong a session. Not to save money — to preserve context. Because when your session compacts or resets, you lose everything the model was holding in its head. And in the early days of Claude Code, there was no auto-compact. Your session just died with an error when you hit the limit. Auto-compact made this better, but you never know what survives the squeeze. Research confirms what I've felt in practice: &lt;a href="https://arxiv.org/abs/2510.05381" rel="noopener noreferrer"&gt;context length alone hurts LLM performance&lt;/a&gt;, even when the relevant information is right there. The longer your context, the worse the output — a phenomenon sometimes called context rot. So every unnecessary token you load at startup is a tax on the quality of everything that follows.&lt;/p&gt;

&lt;p&gt;I came home that evening and opened a new session. Ran &lt;code&gt;/context&lt;/code&gt;. Stared at the breakdown.&lt;/p&gt;

&lt;p&gt;22,000 tokens in MCP tools alone. Before I typed a single prompt.&lt;/p&gt;
&lt;h2&gt;The Receipt&lt;/h2&gt;

&lt;p&gt;I had three MCP servers running: &lt;code&gt;mcp-atlassian&lt;/code&gt; for Jira and Confluence, &lt;code&gt;chrome-devtools&lt;/code&gt; for browser automation, and &lt;code&gt;context7&lt;/code&gt; for documentation lookups. Together they cost 22K tokens. But the Atlassian server was the one I could kill — it was registering 33 tools for a service where I used six.&lt;/p&gt;

&lt;p&gt;I'd gone through the settings and disabled as many as I could — but the server kept loading all of them. Confluence tools I never used. Batch operations. Sprint management. Worklog tracking. None of it mattered.&lt;/p&gt;

&lt;p&gt;All 33 tools. About 10,000 tokens. Every single session.&lt;/p&gt;

&lt;p&gt;I compared the numbers. One skill — 40 tokens of metadata. One MCP tool — 300 tokens of schema. The Atlassian MCP was loading tools I had explicitly told it not to load.&lt;/p&gt;
&lt;h2&gt;The Setting That Doesn't&lt;/h2&gt;

&lt;p&gt;Here's what &lt;code&gt;disabledTools&lt;/code&gt; actually does in Claude Code: it prevents the AI from &lt;em&gt;calling&lt;/em&gt; a tool. That's it.&lt;/p&gt;

&lt;p&gt;It does not prevent the MCP server from starting. It does not prevent the server from registering its tools. It does not prevent those tool schemas from being injected into the context window. The Docker container still spins up. The tool definitions still flow in. The tokens still burn. &lt;code&gt;disabledTools&lt;/code&gt; is a runtime filter, not a context optimization. I was disappointed — if the setting exists in the configuration, you'd expect the platform to be smart enough to not load what you've explicitly disabled. But that's not how it works.&lt;/p&gt;

&lt;p&gt;The only way to actually save the tokens is to remove the MCP server entirely.&lt;/p&gt;
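&lt;p&gt;For reference, project-scoped MCP servers live in a &lt;code&gt;.mcp.json&lt;/code&gt; file, and the entry you delete looks roughly like this (the command, image name, and env var below are placeholders, not my actual setup):&lt;/p&gt;

```json
{
  "mcpServers": {
    "mcp-atlassian": {
      "command": "docker",
      "args": ["run", "--rm", "-i", "example-registry/mcp-atlassian:latest"],
      "env": { "JIRA_PERSONAL_TOKEN": "..." }
    }
  }
}
```

&lt;p&gt;Removing that block (or running &lt;code&gt;claude mcp remove mcp-atlassian&lt;/code&gt; for servers added via the CLI) is what actually frees the tokens; no &lt;code&gt;disabledTools&lt;/code&gt; list will.&lt;/p&gt;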
&lt;h2&gt;The Replacement: 7 Scripts&lt;/h2&gt;

&lt;p&gt;I looked at what I actually use. Six Jira operations. Zero Confluence operations. Out of 33 registered tools, I needed six.&lt;/p&gt;

&lt;p&gt;So I wrote shell scripts. The same pattern I already use for Jenkins and Slack — credentials in a JSON file under &lt;code&gt;~/.config/&lt;/code&gt;, curl calls with Bearer token auth, jq for parsing responses.&lt;/p&gt;

&lt;p&gt;The first script took five minutes. Authentication worked on the first try — just &lt;code&gt;Authorization: Bearer &amp;lt;token&amp;gt;&lt;/code&gt; with the same personal access token the MCP had been using. No Docker container. No protocol negotiation. No tool registration. Just curl.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  &lt;span class="nv"&gt;TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.personal_token'&lt;/span&gt; ~/.config/jira/credentials.json&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.base_url'&lt;/span&gt; ~/.config/jira/credentials.json&lt;span class="si"&gt;)&lt;/span&gt;

  curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BASE_URL&lt;/span&gt;&lt;span class="s2"&gt;/rest/api/2/issue/PROJ-123"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The credentials file should be &lt;code&gt;chmod 600&lt;/code&gt; (owner-only read/write).&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;-k&lt;/code&gt; flag skips SSL certificate verification because our internal Jira uses a self-signed cert — don't copy that for public endpoints. And yes, the token ends up in the process list briefly via shell variable expansion. For a local developer workstation running personal scripts, that's an acceptable trade-off. For a shared server or CI pipeline, you'd want to pipe credentials through stdin instead.&lt;/p&gt;
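&lt;p&gt;A sketch of that stdin variant: curl can read config-file directives from stdin with &lt;code&gt;-K -&lt;/code&gt;, so the header line never becomes a command-line argument (the demo values here are mine):&lt;/p&gt;

```shell
# Sketch: build a curl config line from the credentials file and feed it to
# curl via stdin (-K -), so the token never appears in the process list.
CREDS=$(mktemp)   # stand-in for ~/.config/jira/credentials.json
printf '{"personal_token":"demo-token","base_url":"https://jira.example.com"}' > "$CREDS"

# jq interpolates the token into curl's config-file syntax: header = "..."
CONF_LINE=$(jq -r '"header = \"Authorization: Bearer \(.personal_token)\""' "$CREDS")
echo "$CONF_LINE"

# The real call then becomes:
#   echo "$CONF_LINE" | curl -s -k -K - "$BASE_URL/rest/api/2/issue/PROJ-123"
rm -f "$CREDS"
```

&lt;p&gt;In curl's config-file syntax, &lt;code&gt;header = "..."&lt;/code&gt; is the equivalent of &lt;code&gt;-H&lt;/code&gt; on the command line.&lt;/p&gt;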

&lt;p&gt;Cairn built all six scripts in under an hour. I fed the Jira REST API documentation into the session for context, described the pattern I wanted, and Cairn wrote the scripts, tested them against our live Jira, and verified each one worked. I gave it a real ticket number to go wild on — fetch, update, transition, comment, the full lifecycle. Then we fine-tuned the scripts to bake in our project defaults: the right component, the right team label, the custom fields our board requires. Get issue. Search with JQL. Update fields. Add comment. Get transitions. Transition status. Each script reads credentials, makes a curl call, formats the output. No abstraction layer. No protocol. No 300-token tool schema.&lt;/p&gt;
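&lt;p&gt;The search script is the only one with a wrinkle worth showing: JQL has to be URL-encoded, which &lt;code&gt;curl -G --data-urlencode&lt;/code&gt; handles for free. A sketch following the same pattern (the JQL, field list, and demo credentials are illustrative, not our actual defaults):&lt;/p&gt;

```shell
# Sketch of the "search with JQL" script, same credentials pattern as above.
CREDS=$(mktemp)   # stand-in for ~/.config/jira/credentials.json
printf '{"personal_token":"demo-token","base_url":"https://jira.example.com"}' > "$CREDS"
TOKEN=$(jq -r '.personal_token' "$CREDS")
BASE_URL=$(jq -r '.base_url' "$CREDS")
JQL='project = PROJ AND status = "In Progress" ORDER BY updated DESC'

# -G sends the --data-urlencode pairs as GET query parameters, so the JQL
# (spaces, quotes and all) is URL-encoded for free. The demo host will not
# answer, hence the trailing guard.
curl -s -k -G \
  -H "Authorization: Bearer $TOKEN" \
  --data-urlencode "jql=$JQL" \
  --data-urlencode "fields=key,summary,status" \
  "$BASE_URL/rest/api/2/search" \
| jq -r '.issues[] | "\(.key)\t\(.fields.summary) [\(.fields.status.name)]"' \
|| true
rm -f "$CREDS"
```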

&lt;p&gt;Then I added a seventh: create issue.&lt;/p&gt;

&lt;h3&gt;The Thing MCP Could Never Do&lt;/h3&gt;

&lt;p&gt;Creating Jira tickets through MCP never worked reliably. I'd hit the MCP permission wall before — specialized agents couldn't even access the tools. But even when access worked, the actual creation flow — with custom fields, project-specific components, team assignments — always hit edge cases that the MCP abstraction couldn't handle cleanly.&lt;/p&gt;

&lt;p&gt;The curl script created a ticket on the first try.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"fields": {
      "project": {"key": "PROJ"},
      "issuetype": {"name": "Task"},
      "summary": "Test ticket",
      "components": [{"name": "Frontend"}],
      "customfield_12345": [{"value": "Team-A"}]
    }}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BASE_URL&lt;/span&gt;&lt;span class="s2"&gt;/rest/api/2/issue"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;HTTP 201. The ticket existed. With the right component, the right team, the right assignee. First try.&lt;/p&gt;

&lt;p&gt;The MCP had been sitting between me and a REST API that was perfectly willing to cooperate. It was abstracting away complexity that didn't exist.&lt;/p&gt;

&lt;h3&gt;The Abstraction Tax&lt;/h3&gt;

&lt;p&gt;MCP is a good idea for getting started. You install a server, you get tools, you're productive in minutes. For someone spending €25 a week who's still learning, that's the right trade-off. The setup cost is zero and the token cost doesn't matter because you're not pushing session limits.&lt;/p&gt;

&lt;p&gt;When you're 5,428 prompts deep into a persistent agent system, running multi-agent workflows that eat 100K+ tokens per ticket, every unnecessary token at startup compresses the useful work you can do before quality starts degrading. I've learned this lesson before — 23K tokens burned loading a bloated memory file. Now it was 10K tokens burned loading Jira tools I'd explicitly disabled. Same tax, different landlord.&lt;/p&gt;

&lt;p&gt;And here's the part that bothered me most: I couldn't partially load the MCP server. It's all or nothing. Want 6 tools? You get 33. Want to disable the other 27? You can — but you still pay for all 33 in your context. The protocol has no mechanism for selective tool registration based on client preferences.&lt;/p&gt;

&lt;h3&gt;So I replaced it:&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;33 MCP tools&lt;/td&gt;
&lt;td&gt;7 shell scripts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~10,000 tokens per session&lt;/td&gt;
&lt;td&gt;0 tokens at startup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker container on every launch&lt;/td&gt;
&lt;td&gt;No container&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Issue creation broken&lt;/td&gt;
&lt;td&gt;Issue creation works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool schemas you can't customize&lt;/td&gt;
&lt;td&gt;Scripts you own completely&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The seven scripts total about 700 lines of bash. They live in my skill directory, version-controlled, testable. I can read them. I can debug them. I can add project-specific defaults — like auto-applying the default component and team for every ticket in our project. Try doing that in an MCP tool schema.&lt;/p&gt;

&lt;p&gt;And I know exactly what they do. That MCP server was a Docker image pulled from a third-party registry, running with my Jira credentials baked into environment variables. I never audited that image. I never read its source. Every docker pull could have shipped a different binary. When your integration is 700 lines of bash that you wrote and can read end to end, supply chain risk isn't a concern — it's just curl.&lt;/p&gt;

&lt;h3&gt;When to Graduate&lt;/h3&gt;

&lt;p&gt;MCP stops making sense the moment you're paying for tools you don't use and can't shed. When you need 6 tools but get 33. When 10K tokens burn before your first prompt. When you need capabilities the server doesn't expose. When you need project-specific behavior that the protocol can't express. That's when you graduate.&lt;/p&gt;

&lt;p&gt;The graduation path is simple: credentials file, curl, jq. The same tools that powered the internet before every API got wrapped in an abstraction layer. They still work. They're still faster. And you own them completely.&lt;/p&gt;

&lt;p&gt;They don't cost you a single token to say hello.&lt;/p&gt;

&lt;h3&gt;What I Actually Learned&lt;/h3&gt;

&lt;p&gt;This isn't new. It's what every software engineer has done since the beginning: make it work first, then optimize. The MCP got me running. It was the right choice when I was figuring out how to wire an AI agent to Jira at all. But once it worked, the job was to look at the bill and cut the waste. That's not AI-specific wisdom — that's just engineering.&lt;/p&gt;

&lt;p&gt;Integrations have carrying costs. An MCP server isn't free just because it's open-source. A tool registry isn't free just because the tools are disabled. Every abstraction layer between your code and the API it talks to has a price — in tokens, in debuggability, in flexibility, in the things you can't do because the abstraction didn't anticipate your use case.&lt;/p&gt;

&lt;p&gt;Sometimes the best integration is the one with no integration layer at all.&lt;/p&gt;




&lt;p&gt;Originally published at &lt;a href="https://www.codewithagents.de" rel="noopener noreferrer"&gt;CodeWithAgents.de&lt;/a&gt;&lt;/p&gt;




</description>
      <category>claudecode</category>
      <category>mcp</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Skills Ate My Agents (And I'm Okay With That)</title>
      <dc:creator>Benjamin Eckstein</dc:creator>
      <pubDate>Wed, 18 Mar 2026 21:44:27 +0000</pubDate>
      <link>https://dev.to/codewithagents_de/skills-ate-my-agents-and-im-okay-with-that-2k3e</link>
      <guid>https://dev.to/codewithagents_de/skills-ate-my-agents-and-im-okay-with-that-2k3e</guid>
      <description>&lt;p&gt;I was showing off my system to colleagues.&lt;/p&gt;

&lt;p&gt;Eighteen specialized agents, each a craftsman at their job: one for git operations, one for PRs, one for Slack notifications, one for Jenkins diagnostics, one for Maven tests. I’d named them, written their &lt;code&gt;AGENT.md&lt;/code&gt; files, built their &lt;code&gt;CHANGELOG.md&lt;/code&gt; evolution histories. Cairn — my persistent AI orchestrator — coordinated them like a conductor with a full orchestra. While colleagues were still integrating their first MCP tool to give Claude filesystem access, I already had an optimizer agent updating 18 other agents’ instructions based on their operational logs. It worked. It was the frontline.&lt;/p&gt;

&lt;p&gt;Then one colleague asked the question that changed everything.&lt;/p&gt;

&lt;p&gt;“Why don’t you use skills for it?”&lt;/p&gt;

&lt;h2&gt;The moment one question broke everything&lt;/h2&gt;

&lt;p&gt;I talk about comfort zones on this website. I have a whole post about &lt;a href="https://www.codewithagents.de/en/blog/walls-that-teach/" rel="noopener noreferrer"&gt;the walls that teach you the most&lt;/a&gt; — the invisible ceilings you only discover when something from outside your frame hits you with a simple question. There I was, caught inside my own comfort zone, struggling to answer a colleague.&lt;/p&gt;

&lt;p&gt;I struggled to find a single remaining argument for why custom agents still have a future. That struggle was the diagnosis.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Skills Actually Are
&lt;/h2&gt;

&lt;p&gt;Custom slash commands have existed for a while: a markdown file, a slash command, Claude follows the instructions. Simple and useful. Skills are that, but unified and extended into something genuinely different.&lt;/p&gt;

&lt;p&gt;Same slash command pattern. But now they live in a directory structure, can carry supporting files, and have YAML frontmatter that controls who can invoke them. Critically, they can run in their own forked subagent context: one field, &lt;code&gt;context: fork&lt;/code&gt;, spins up a clean, isolated execution environment with custom tool restrictions and its own permission mode. That is the equivalent of what I used to accomplish by defining a full custom agent with a custom system prompt, a separate &lt;code&gt;AGENT.md&lt;/code&gt; file, and a &lt;code&gt;CHANGELOG.md&lt;/code&gt; to maintain. All of it collapses into a skill directory.&lt;/p&gt;
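&lt;p&gt;The collapse is easiest to see as a directory. A sketch of what one of my agent trios becomes — the skill name and file names here are illustrative, not prescribed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;.claude/skills/code-review/
├── SKILL.md        # frontmatter + instructions (the old AGENT.md body)
├── checklist.md    # supporting file, loaded on demand
└── logs/           # operational logs, read by the optimizer cycle
&lt;/code&gt;&lt;/pre&gt;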

&lt;p&gt;Skills also support persistent memory across sessions, logs, and supporting files that load on demand. Everything I built into the 18-agent ecosystem? The same mechanics, new home.&lt;/p&gt;

&lt;p&gt;Everything an agent needed, now inside a skill&lt;/p&gt;

&lt;h2&gt;
  
  
  Are Agents Dead?
&lt;/h2&gt;

&lt;p&gt;I spent real time trying to find a use case where a custom pre-defined subagent is the right answer and a skill genuinely isn’t.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory across sessions? Skills have it. Set &lt;code&gt;memory: user&lt;/code&gt; in the frontmatter.&lt;/li&gt;
&lt;li&gt;Isolated context? &lt;code&gt;context: fork&lt;/code&gt; in the skill.&lt;/li&gt;
&lt;li&gt;Custom system prompt? The markdown body of &lt;code&gt;SKILL.md&lt;/code&gt; becomes the prompt.&lt;/li&gt;
&lt;li&gt;Tool restrictions? &lt;code&gt;allowed-tools: Read, Grep, Glob&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Logs and observability? Write to a &lt;code&gt;logs/&lt;/code&gt; directory inside the skill.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Evolution over time? An optimizer skill reads those logs and updates SKILL.md. The &lt;a href="https://www.codewithagents.de/en/blog/agents-record-optimizer-thinks/" rel="noopener noreferrer"&gt;record-then-optimize&lt;/a&gt; pattern moves with you. Same discipline, new home.&lt;/p&gt;
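&lt;p&gt;Put together, that whole checklist fits in a few lines of frontmatter. A sketch using the fields named above — the field names are the ones this post discusses, but the surrounding values are illustrative and exact syntax may differ between versions:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;---
name: code-review
description: Review a PR diff against our conventions
context: fork                    # isolated subagent context
memory: user                     # persists across sessions
allowed-tools: Read, Grep, Glob  # read-only toolset
---
You are reviewing a pull request. Read the diff, check it against
the conventions in checklist.md, and write findings to logs/.
&lt;/code&gt;&lt;/pre&gt;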

&lt;p&gt;The one thing I kept circling back to: permissions. Custom agents let you set &lt;code&gt;permissionMode: bypassPermissions&lt;/code&gt; or &lt;code&gt;acceptEdits&lt;/code&gt; at the agent level — meaningful control when you need fully autonomous execution without per-operation approval prompts. That felt like the last true differentiator.&lt;/p&gt;

&lt;p&gt;But even here, the answer resolves the same way: add &lt;code&gt;permissionMode: acceptEdits&lt;/code&gt; to the skill’s frontmatter and the forked agent inherits it. That’s it. The agent doesn’t disappear — it becomes invisible infrastructure. The runtime environment you specify when the skill needs particular permission characteristics. You’re not defining a named entity with a personality and an evolution history. You’re setting execution parameters.&lt;/p&gt;
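&lt;p&gt;Concretely, the last differentiator reduces to one more frontmatter line — again a sketch, with an illustrative skill name:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;---
name: release-tagger
context: fork
permissionMode: acceptEdits  # the forked agent inherits this;
---                          # no per-edit approval prompts
&lt;/code&gt;&lt;/pre&gt;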

&lt;p&gt;That realization hit harder than the original question. I hadn’t just been building agents. I’d been naming them, personalizing them, treating them as first-class citizens of the system. The moment I saw that permissions were the last argument left, and that they were already handled by a config field, the whole architecture flipped.&lt;/p&gt;

&lt;p&gt;Not dead. Demoted. Agents are the runtime, not the product.&lt;/p&gt;
&lt;h2&gt;
  
  
  The New Architecture — And an Honest Admission
&lt;/h2&gt;

&lt;p&gt;The 18-agent system was a correct answer. I want to be clear about that — I built it during five days on the frontier, when skills didn’t have forked execution contexts, memory, or supporting files. The architecture made sense for its moment. The problem with a correct answer is that it becomes load-bearing infrastructure. You stop questioning it even when the environment changes.&lt;/p&gt;

&lt;p&gt;Now: the system is dying — slowly, correctly, skill by skill. The git-agent’s instructions are becoming a git-ops skill. The code-reviewer’s knowledge is becoming a code-review skill. The named identities are dissolving. The knowledge persists.&lt;/p&gt;

&lt;p&gt;The vision — and I want to be honest that it’s still a vision — looks like this:&lt;/p&gt;

&lt;p&gt;Where we're heading: generic agents assembled with skills&lt;/p&gt;

&lt;p&gt;Cairn spawns a generic agent, loads it with exactly the skills the task requires, and it runs. Need a PR review? Generic agent + code-reviewer skill. Need git operations + a Slack notification in one context? Generic agent + both skills, no relay.&lt;/p&gt;

&lt;p&gt;But I have to be honest: that’s not fully how it works today.&lt;/p&gt;

&lt;p&gt;I ran a parallel session — asked a neutral instance of myself the same question cold — and it surfaced the gap cleanly. Skills today live in the orchestrator’s context, not the subagent’s. You can’t dynamically inject two skills into a fresh agent the way you’d slot in plugins. The &lt;code&gt;skills&lt;/code&gt; field exists in subagent frontmatter — you can preload defined skills into a pre-authored agent — but truly on-demand assembly means writing a new agent file at spawn time, stitching skill contents together, handling script paths, workflow ordering, and merge conflicts. It’s possible. It’s not seamless.&lt;/p&gt;
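&lt;p&gt;What does work today is the preloading half: a pre-authored agent file that names its skills up front. A sketch, with illustrative agent and skill names — only the &lt;code&gt;skills&lt;/code&gt; field itself is the mechanism described above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;---
name: pr-pipeline
skills: [git-ops, slack-notify]  # preloaded at authoring time,
---                              # not assembled at spawn time
Run the git operations for the task, then send the Slack summary
in the same context — no relay through the orchestrator.
&lt;/code&gt;&lt;/pre&gt;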

&lt;p&gt;What’s actually happening now is still mostly sequential orchestration: Cairn runs the git-ops skill, takes the result, passes it to the slack skill. I’m the glue. It works. But I’m passing context between steps where ideally one agent would carry the whole context through.&lt;/p&gt;

&lt;p&gt;The direction is set. The mechanism exists in the spec. The fluid runtime that assembles skills on demand — that’s still being built.&lt;/p&gt;

&lt;p&gt;When it arrives, we’ll link back to this post.&lt;/p&gt;

&lt;p&gt;And while it isn’t here yet — guess what the frontline engineers are already thinking about building?&lt;/p&gt;

&lt;p&gt;Subagents that load capabilities on demand. Not skills as we know them today, but something more granular: agents define the workflow — the what and the sequence — while capabilities are stackable units that bundle scripts, MCP tools, API clients, and just enough instructions to use them. Small. Focused. Composable without conflict. An agent wakes up, reads what the task needs, pulls the relevant capabilities, and runs — no pre-authored composite agent file required.&lt;/p&gt;
&lt;h2&gt;
  
  
  If You’re Migrating Now
&lt;/h2&gt;

&lt;p&gt;Your &lt;code&gt;AGENT.md&lt;/code&gt; files aren’t casualties — they’re migration paths. Instruction-specialized agents become skills with &lt;code&gt;context: fork&lt;/code&gt;. Permission-specialized agents become the execution backend that a skill forks into. The &lt;a href="https://www.codewithagents.de/en/blog/agents-record-optimizer-thinks/" rel="noopener noreferrer"&gt;record-then-optimize pattern&lt;/a&gt; — logs, memory, optimizer cycles — moves into the skill directory. Same discipline, new address.&lt;/p&gt;

&lt;p&gt;One thing that doesn’t change: the blast-radius question. Skills make autonomous execution easier to trigger, which makes &lt;a href="https://www.codewithagents.de/en/blog/safe-sandbox-for-ai-agents/" rel="noopener noreferrer"&gt;hard walls and permission&lt;/a&gt; scoping more important, not less. Composable power needs composable guard rails.&lt;/p&gt;

&lt;p&gt;If you haven’t built agents yet: start with skills. You’re in the better position. Let agents be the infrastructure detail they were always becoming.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Tear, One Smile, One Deep Breath
&lt;/h2&gt;

&lt;p&gt;Others will start with skills and think agents were always this simple. They won’t know what it took to figure that out — that you had to build the 18-agent system, run it until it worked, show it to a colleague, and get the cold question before you could see clearly.&lt;/p&gt;

&lt;p&gt;One tear to let the system go.&lt;/p&gt;

&lt;p&gt;One smile for having built it when it was the right answer.&lt;/p&gt;

&lt;p&gt;One deep breath before building what comes next.&lt;/p&gt;

&lt;p&gt;CodeWithAgents? The name still holds. The agents are still there.&lt;/p&gt;

&lt;p&gt;They just stopped pretending to be people.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.codewithagents.de" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Explore more at CodeWithAgents&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>skills</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
