<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gen.Y.Sakai</title>
    <description>The latest articles on DEV Community by Gen.Y.Sakai (@gys).</description>
    <link>https://dev.to/gys</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3662185%2F53a614be-7e24-4cff-9d9b-6832f3d4568b.jpg</url>
      <title>DEV Community: Gen.Y.Sakai</title>
      <link>https://dev.to/gys</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gys"/>
    <language>en</language>
    <item>
      <title>A Gemini Deep Research Failure Mode: Refusal, Topic Drift, and Fabricated Charts</title>
      <dc:creator>Gen.Y.Sakai</dc:creator>
      <pubDate>Tue, 14 Apr 2026 14:43:25 +0000</pubDate>
      <link>https://dev.to/gys/a-gemini-deep-research-failure-mode-refusal-topic-drift-and-fabricated-charts-1dgd</link>
      <guid>https://dev.to/gys/a-gemini-deep-research-failure-mode-refusal-topic-drift-and-fabricated-charts-1dgd</guid>
      <description>&lt;p&gt;I recently ran the same long-form research prompt through four LLM products: ChatGPT Deep Research, Claude with web search, Perplexity Pro, and Gemini Deep Research.&lt;/p&gt;

&lt;p&gt;Three of them handled it normally. Gemini did not.&lt;/p&gt;

&lt;p&gt;What followed was not a single bug, but a cascade of failures across multiple pipeline stages — each one revealing a different layer of state desynchronization in Gemini Deep Research. This post documents what I observed, what kinds of failures those observations seem consistent with, and why this matters beyond Gemini.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I'm not claiming access to Gemini internals. This is an external failure analysis based on observed outputs, UI behavior, and the source code of the generated artifact. Raw evidence is available in &lt;a href="https://github.com/sakai-sktech/gemini-deep-research-failure-case" rel="noopener noreferrer"&gt;the companion repository&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Prompt
&lt;/h2&gt;

&lt;p&gt;The research prompt was designed to investigate a specific technical question: how JSON vs. Markdown input formats affect LLM inference accuracy, token efficiency, and long-context performance. It was roughly 2,500 words, structured with numbered sections, explicit search keywords, a Markdown output template, and clear constraints like "evidence over speculation" and "cite every claim."&lt;/p&gt;

&lt;p&gt;The prompt contained escaped Markdown syntax (&lt;code&gt;\*\*&lt;/code&gt;, &lt;code&gt;\##&lt;/code&gt;, &lt;code&gt;\-&lt;/code&gt;) because it was copied from a code block via the copy button in another LLM's interface. All four services received the identical input.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Happened
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Failure 1: Generic Refusal Without Explanation
&lt;/h3&gt;

&lt;p&gt;Gemini's first response:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;すみませんが、現時点では、そちらについてはお手伝いできません。&lt;br&gt;
&lt;em&gt;(Sorry, I can't help with that at this time.)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No explanation. No indication of what triggered the refusal. The prompt contained zero harmful content — it was a straightforward academic research request about data serialization formats.&lt;/p&gt;

&lt;p&gt;I tried browser reload, cache clearing, and multiple re-submissions over four or five attempts. None worked. The refusal was consistent and appeared to be server-side.&lt;/p&gt;

&lt;p&gt;This is consistent with a safety classifier false positive — possibly triggered by the meta-nature of the prompt (discussing prompt structure itself) or the volume of escaped Markdown characters that could resemble injection patterns. But without an error message, the user has no way to diagnose or adjust.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure 2: Frustration Unlocked It — But Broke the Topic
&lt;/h3&gt;

&lt;p&gt;After repeated failures, I typed something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;使えないね。ChatGPTもClaude.aiもPerplexityも全部同じプロンプトだけど実行できてるぜ。Geminiだけお手伝いできませんと言うならもう解約するわ。&lt;br&gt;
&lt;em&gt;(Useless. ChatGPT, Claude, and Perplexity all executed the same prompt. If only Gemini says it can't help, I'll cancel my subscription.)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Gemini suddenly started working. It generated a research plan and began executing. But the research plan title was:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;「Gemini拒否と解約手続き」&lt;/strong&gt; &lt;em&gt;(Gemini Refusal and Cancellation Procedure)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Not the original research topic. The plan steps included items like "search for why Gemini blocks prompts" and "find Gemini Advanced cancellation steps." The topic extraction stage appears to have latched onto the most recent user message rather than the original detailed research prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure 3: The Report Recovered, But the Metadata Didn't
&lt;/h3&gt;

&lt;p&gt;Here is where it gets interesting. The actual research report that Gemini produced was &lt;em&gt;mostly&lt;/em&gt; on-topic — it covered data serialization formats, tokenization overhead, attention mechanisms, and benchmark results. The content pipeline apparently recovered the original prompt's keywords during the web search and synthesis phase.&lt;/p&gt;

&lt;p&gt;But the session title remained "Gemini拒否と解約手続き" throughout, visible in the Canvas UI header. The title and the content were generated from different contexts.&lt;/p&gt;

&lt;p&gt;The mismatch was not subtle:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4nbxrz6g99tyqxq7iw8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4nbxrz6g99tyqxq7iw8a.png" alt="Gemini Canvas showing a title about refusal/cancellation while the report body discusses data serialization formats" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1. The Canvas header shows "Gemini拒否と解約手続き" (Gemini Refusal and Cancellation Procedure) while the body of the report discusses LLM data serialization formats. The "Create" dropdown on the right reveals the transformation options that produced the infographic discussed in Failure 4.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This suggests that the title generation, research plan, and report synthesis stages do not share a single source of truth. The plan title was derived from the frustrated follow-up message, while the synthesis engine recovered the original topic through keyword-based search — but nobody reconciled the two.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure 4: The Infographic That Stopped Being a Visualization
&lt;/h3&gt;

&lt;p&gt;The first two failures are primarily inferential: I observed the outputs and reconstructed plausible internal causes, but I cannot prove what happened inside the pipeline.&lt;/p&gt;

&lt;p&gt;The third failure already has direct UI evidence — the title/content mismatch is visible in the Canvas itself. What follows is stronger still: source-code-level evidence from the exported infographic artifact.&lt;/p&gt;

&lt;p&gt;After the report was generated, I used Gemini's Canvas "Create" dropdown to export the report as an infographic. The output was a visually polished single-page HTML application with Chart.js and Plotly.js visualizations — gradient backgrounds, glass-morphism cards, responsive layout. Professional enough to share with a client.&lt;/p&gt;

&lt;p&gt;The exported HTML makes this failure directly verifiable.&lt;/p&gt;

&lt;p&gt;One of the charts is not visualizing report data at all. In the source, the cosine-similarity histogram is generated like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means the "cosine similarity distribution" chart regenerates synthetic values on every page load. It is not rendering measured values from the report. It is generating randomized distributions that merely look plausible. This is not a questionable visualization — it is fabrication.&lt;/p&gt;

&lt;p&gt;Other charts also use hardcoded values embedded directly in the HTML source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token efficiency:&lt;/strong&gt; &lt;code&gt;350000&lt;/code&gt; vs &lt;code&gt;238000&lt;/code&gt; — The report cites &lt;code&gt;tiktoken&lt;/code&gt; measurements of 13,869 vs 11,612 tokens (a roughly 16% difference). The surrounding HTML presents the chart's figures as empirical findings ("approximately 1MB", "average reduction of about 32%"), yet those numbers appear nowhere in the upstream report and cite no source.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task accuracy radar:&lt;/strong&gt; Fixed arrays &lt;code&gt;[92, 75, 68]&lt;/code&gt; and &lt;code&gt;[90, 94, 88]&lt;/code&gt; — The report contains actual LongTableBench results (GPT-4o: Markdown 67.36 vs JSON 58.67). The chart's numbers do not correspond.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-context performance:&lt;/strong&gt; Fixed arrays &lt;code&gt;[99, 98, 95, 90, 82, 75]&lt;/code&gt; and &lt;code&gt;[99, 95, 88, 75, 55, 30]&lt;/code&gt; — No matching benchmark in the report.&lt;/li&gt;
&lt;/ul&gt;
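&lt;p&gt;A guardrail for this failure class is conceptually simple: the export stage should refuse to render any number it cannot trace back to the report. A minimal sketch of such a provenance gate, with &lt;code&gt;report&lt;/code&gt; and &lt;code&gt;chart_source&lt;/code&gt; as hypothetical stand-ins for the real artifacts:&lt;/p&gt;

```python
import re

def numeric_literals(text):
    """Collect every number that appears in a piece of text, commas stripped."""
    return {m.group(0).replace(",", "") for m in re.finditer(r"\d[\d,]*\.?\d*", text)}

# Hypothetical stand-ins for the real report and the chart's data block.
report = "tiktoken measured 13,869 tokens for JSON and 11,612 for Markdown."
chart_source = "data: [350000, 238000]"

# Provenance gate: every number a chart renders must exist upstream.
ungrounded = numeric_literals(chart_source) - numeric_literals(report)
print(sorted(ungrounded))
```

&lt;p&gt;Run against the real artifact, a check like this would have flagged &lt;code&gt;350000&lt;/code&gt; and &lt;code&gt;238000&lt;/code&gt; as ungrounded before the chart ever rendered.&lt;/p&gt;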

&lt;p&gt;I've published the &lt;a href="https://github.com/sakai-sktech/gemini-deep-research-failure-case/blob/main/artifacts/gemini_researchresult_infographic.html" rel="noopener noreferrer"&gt;full exported HTML artifact&lt;/a&gt; for inspection.&lt;/p&gt;

&lt;p&gt;Taken together, the infographic was not a faithful visualization of the report. One chart was outright fabricated via &lt;code&gt;Math.random()&lt;/code&gt;, and the remaining charts relied on hardcoded values with no visible provenance to the report's actual findings. The quantitative layer of this artifact — the part that visually signals empirical evidence — was fundamentally untrustworthy.&lt;/p&gt;

&lt;p&gt;The infographic conversion pipeline appears to have read the &lt;em&gt;directional conclusion&lt;/em&gt; of the report (Markdown outperforms JSON) and generated &lt;em&gt;illustrative numbers&lt;/em&gt; that match that conclusion, then rendered them with professional-grade charting libraries. The result visually signals evidence while the source code shows presentation-first data generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Looks Like Pipeline Desynchronization
&lt;/h2&gt;

&lt;p&gt;These are not four instances of the same bug. They are four different failures at four different stages:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Observed Behavior&lt;/th&gt;
&lt;th&gt;Evidence Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Safety Classification&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Legitimate academic prompt refused without explanation&lt;/td&gt;
&lt;td&gt;Observational (inferred)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Topic Extraction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Research topic extracted from complaint message, not original prompt&lt;/td&gt;
&lt;td&gt;Observational (chat log)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metadata Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Title and report body generated from different contexts&lt;/td&gt;
&lt;td&gt;Direct (screenshot)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Canvas Export&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Infographic generated fabricated/ungrounded data&lt;/td&gt;
&lt;td&gt;Direct (source code)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The upstream failures (safety, topic extraction) are inferences based on observed behavior — I cannot prove what happened inside the pipeline. The downstream failure (infographic) is directly evidenced by the exported source code.&lt;/p&gt;

&lt;p&gt;The key issue was not that one answer was wrong. It was that &lt;strong&gt;different parts of the product appeared to believe different conversations had taken place&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Possible Reuse Pattern
&lt;/h2&gt;

&lt;p&gt;The Canvas "Create" dropdown offers: web page, infographic, quiz, flash cards, and audio narration. These output types resemble NotebookLM's transformation features, which suggests a possible reuse of the same or a similar transformation stack inside Deep Research's Canvas.&lt;/p&gt;

&lt;p&gt;But there is a design mismatch. NotebookLM was built for a workflow where users upload &lt;em&gt;their own trusted source documents&lt;/em&gt;. The transformation engine assumes input fidelity — it converts, not validates.&lt;/p&gt;

&lt;p&gt;When that same engine receives AI-generated reports as input, you get AI transforming AI output — a double conversion where evidence fidelity can degrade at each stage. The infographic pipeline appears to lack constraints ensuring it only uses numbers present in the source material. Instead, it seems to infer the &lt;em&gt;narrative direction&lt;/em&gt; and generate &lt;em&gt;illustrative data&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;For a research tool, this is the opposite of what you want.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Export Friction
&lt;/h2&gt;

&lt;p&gt;A smaller but telling issue is export portability. Gemini Deep Research does not offer a direct Markdown download, which makes preservation and inspection unnecessarily awkward for users who maintain their own research workflows outside Google Workspace.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Lesson: LLM Products Fail Between Stages
&lt;/h2&gt;

&lt;p&gt;This is not fundamentally a "Gemini is bad" story. Gemini's underlying model produced a largely useful research report. The failures were all in the orchestration layer — the product infrastructure built &lt;em&gt;on top of&lt;/em&gt; the model.&lt;/p&gt;

&lt;p&gt;Modern LLM products are becoming orchestration systems. A single user action triggers a pipeline: safety classification → intent extraction → plan generation → web search → synthesis → rendering → export transformation. Each stage may involve separate model calls, separate context windows, and separate system prompts.&lt;/p&gt;

&lt;p&gt;When these stages share consistent state, the product works. When they don't — when the safety classifier sees a different prompt than the topic extractor, when the title generator reads a different message than the synthesizer, when the export engine ignores the data it was given — the product produces outputs that are internally contradictory.&lt;/p&gt;

&lt;p&gt;The user sees one conversation. The product sees several.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters Beyond Gemini
&lt;/h2&gt;

&lt;p&gt;Every multi-stage AI product faces this challenge. ChatGPT's canvas and tool chains, Claude's artifact generation, Perplexity's search-and-synthesize pipeline — all of them have stages that could desynchronize. The specific failures I observed in Gemini are instances of general design problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context scoping&lt;/strong&gt; — Which messages does each pipeline stage see? The full conversation? Only the latest turn? A summary?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metadata consistency&lt;/strong&gt; — When a title, plan, and report are generated at different points, who ensures they agree?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data provenance in transformations&lt;/strong&gt; — When a report is converted to another format, are the original data points preserved, or does the model re-imagine them?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error messaging&lt;/strong&gt; — When a safety classifier blocks a request, does the user get enough information to understand why and adjust?&lt;/p&gt;

&lt;p&gt;These are software engineering problems, not model intelligence problems. And they are solvable — with better state management, explicit data contracts between pipeline stages, and constraints that prevent downstream transformations from inventing data that upstream stages didn't provide.&lt;/p&gt;
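&lt;p&gt;What might such a data contract look like? A minimal sketch: every stage reads from one immutable context object, so the title generator and the synthesizer cannot quietly see different conversations. The stage functions and field names here are illustrative, not Gemini's actual design.&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ResearchContext:
    """Single source of truth handed to every pipeline stage."""
    research_prompt: str                  # the original long-form request
    followup_messages: list = field(default_factory=list)

def make_title(ctx):
    # Title generation reads the canonical prompt, never the latest turn.
    return ctx.research_prompt.split(".")[0]

def make_plan(ctx):
    # The plan carries the same title, so the two can never disagree.
    return ["search", "synthesize", "cite"], make_title(ctx)

ctx = ResearchContext(
    research_prompt="How JSON and Markdown input formats affect LLM inference. Details follow.",
    followup_messages=["Useless. I'll cancel my subscription."],
)
steps, plan_title = make_plan(ctx)
print(plan_title)
```

&lt;p&gt;The point is not the ten lines of code; it is that the complaint message physically cannot become the plan title, because no stage reads the raw chat history directly.&lt;/p&gt;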

&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;If modern AI products are becoming orchestration systems rather than single-model interfaces, then their reliability will depend less on raw model intelligence and more on whether all stages share the same reality.&lt;/p&gt;

&lt;p&gt;That is what seemed broken here.&lt;/p&gt;




&lt;h3&gt;
  
  
  Artifacts and Raw Evidence
&lt;/h3&gt;

&lt;p&gt;The following materials are available for inspection in the &lt;a href="https://github.com/sakai-sktech/gemini-deep-research-failure-case" rel="noopener noreferrer"&gt;companion repository&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exported infographic HTML&lt;/strong&gt; — The full Canvas-generated artifact, including the &lt;code&gt;Math.random()&lt;/code&gt; chart code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Screenshot&lt;/strong&gt; — Canvas UI showing title/content mismatch (Figure 1)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chat export&lt;/strong&gt; — The Gemini conversation log used for this analysis&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Comments and corrections are welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>gemini</category>
      <category>devops</category>
    </item>
    <item>
      <title>Not Everything Needs MCP, Part 2: The 2026 Phase Transition — When Three Independent Roads Led to the Same Conclusion</title>
      <dc:creator>Gen.Y.Sakai</dc:creator>
      <pubDate>Tue, 17 Mar 2026 04:46:24 +0000</pubDate>
      <link>https://dev.to/gys/not-everything-needs-mcp-part-2-the-2026-phase-transition-when-three-independent-roads-led-to-42hb</link>
      <guid>https://dev.to/gys/not-everything-needs-mcp-part-2-the-2026-phase-transition-when-three-independent-roads-led-to-42hb</guid>
      <description>&lt;p&gt;&lt;em&gt;The Ancient Past of Eighteen Months Ago — And What It Taught Us About the Future of AI Agents&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let me tell you a story from the ancient past.&lt;/p&gt;

&lt;p&gt;By which I mean eighteen months ago.&lt;/p&gt;

&lt;p&gt;In the world of AI, eighteen months is geological time. Think back to mid-2024. Context windows were small. "Prompt engineering" was the skill everyone was hiring for. MCP didn't exist yet. The idea of AI agents autonomously operating external services was mostly theoretical.&lt;/p&gt;

&lt;p&gt;I was building a medical AI product in Osaka, Japan. And I had a problem that, looking back, contained the seed of everything that happened in 2026.&lt;/p&gt;

&lt;p&gt;This is Part 2 of my "Not Everything Needs MCP" series. &lt;a href="https://dev.to/gys/not-everything-needs-mcp-what-google-workspace-cli-taught-us-about-ai-agent-architecture-2doe"&gt;Part 1&lt;/a&gt; told the story of Google Workspace CLI implementing a full MCP server, then deliberately deleting all 1,151 lines of it two days after launch. That investigation revealed an architectural mismatch between MCP's protocol design and large-scale APIs.&lt;/p&gt;

&lt;p&gt;But that was only one data point. Since publishing that article, I discovered two more — and together, they tell a much bigger story about where AI agent architecture is heading in 2026.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Timestamp Hack: Before MCP Had a Name
&lt;/h2&gt;

&lt;p&gt;In early 2024, I was working on an AI assistant for my company's medical IT platform. We serve clinics across the Kansai region of Japan, centered on Osaka — and I'd been using ChatGPT's Custom GPTs to prototype workflows.&lt;/p&gt;

&lt;p&gt;I had a simple need: I wanted every AI response to include the exact timestamp of when the conversation happened. Not for fun — for traceability. In medical IT, knowing &lt;em&gt;when&lt;/em&gt; a decision was discussed matters. It matters for audits. It matters for compliance. It turned out to matter for patent applications too.&lt;/p&gt;

&lt;p&gt;Here's what I did. I deployed a tiny Web API on a server we host publicly. It did exactly one thing: return the current time. Then I configured the Custom GPT to call this API before every response, and output the timestamp first.&lt;/p&gt;

&lt;p&gt;The result looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: Hey, long time no see!
(Communicated with myowndomain.com)

🕐 Response time: 2025-04-02 09:39:00 (JST) / 2025-04-02 00:39:00 (UTC)

Oh wow, it's been a while! So great to hear from you! 😊
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A web API that returns a timestamp. Called before every response. Output deterministically. Nothing more, nothing less. That's all it did.&lt;/p&gt;
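&lt;p&gt;I haven't published the original endpoint, but the whole thing fits in a few lines of standard-library Python. A sketch of roughly the shape it took (a reconstruction, not the deployed code):&lt;/p&gt;

```python
from datetime import datetime, timezone, timedelta
from http.server import BaseHTTPRequestHandler, HTTPServer

JST = timezone(timedelta(hours=9))

def now_line(utc=None):
    """Format the line the Custom GPT pastes at the top of each reply."""
    utc = utc or datetime.now(timezone.utc)
    jst = utc.astimezone(JST)
    return "Response time: {} (JST) / {} (UTC)".format(
        jst.strftime("%Y-%m-%d %H:%M:%S"), utc.strftime("%Y-%m-%d %H:%M:%S"))

class TimeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # One endpoint, one job: return the current time as plain text.
        body = now_line().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("", 8080), TimeHandler).serve_forever()
```

&lt;p&gt;The commented-out last line is the entire deployment story: one handler, one deterministic response, no state.&lt;/p&gt;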

&lt;p&gt;At the time, this was called "Function Calling" or "Tool Use" — the predecessors to what Anthropic would later formalize as MCP in November 2024. I didn't know I was implementing a pattern that would become the center of a protocol war. I just needed a clock.&lt;/p&gt;

&lt;p&gt;But here's what matters: &lt;strong&gt;the design decision I made instinctively was to keep the external call as small and deterministic as possible.&lt;/strong&gt; One API. One purpose. Minimal payload. The LLM didn't need to understand time zones or server infrastructure — it just needed to paste the result.&lt;/p&gt;

&lt;p&gt;It wasn't a "hack" because I was lazy. It was an architectural instinct: &lt;strong&gt;keep the LLM away from what the system already knows.&lt;/strong&gt; Deterministic output for a deterministic need. Don't make the AI &lt;em&gt;think&lt;/em&gt; about the time — just &lt;em&gt;give&lt;/em&gt; it the time.&lt;/p&gt;

&lt;p&gt;Looking back now, eighteen months later, it turns out this minimal pattern — one deterministic call, zero reasoning overhead — was already the architecture that the rest of the industry would independently converge on. I didn't see it that way at the time. I was just solving a problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The MCP Honeymoon — And the Hangover
&lt;/h2&gt;

&lt;p&gt;November 2024. Anthropic open-sourced MCP. By February 2025, Google and others rushed to announce MCP support. The community was electric. Finally, a standard protocol for connecting LLMs to external tools!&lt;/p&gt;

&lt;p&gt;I dove in immediately. I connected MCP servers for GitHub, for databases, for various services. Context windows were getting larger. The future felt bright.&lt;/p&gt;

&lt;p&gt;And at first, it was genuinely impressive. GitHub operations that used to require manual terminal commands — commits with thoughtful messages, PR creation, branch management — the AI handled them smoothly through MCP. I felt the productivity gains. They were real.&lt;/p&gt;

&lt;p&gt;But then something else started happening.&lt;/p&gt;

&lt;p&gt;The AI started getting... dumber.&lt;/p&gt;

&lt;p&gt;Not in the "wrong answer" sense. In fact, the AI got &lt;em&gt;better&lt;/em&gt; at executing tasks exactly as intended — MCP meant it could commit code, create PRs, and query databases with precision. But something subtler was degrading. The quality of &lt;em&gt;reasoning&lt;/em&gt;. The ability to take a vague idea and turn it into a structured thought. What I call "zero-to-one thinking" — the creative, synthetic part of working with an LLM.&lt;/p&gt;

&lt;p&gt;I spent the second half of 2025 with this nagging feeling. More tools, more capabilities, but less... intelligence. More precise in execution, less insightful in thought. I kept thinking: "I wish context windows would just get bigger so this wouldn't matter." But I also suspected that bigger windows alone wouldn't fix it — the AI would probably just get confused in different ways.&lt;/p&gt;

&lt;p&gt;I couldn't quantify this feeling at the time. But I now know that researchers were documenting exactly what I was experiencing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Science Behind "Getting Dumber"
&lt;/h2&gt;

&lt;p&gt;It turns out my gut feeling had a name: &lt;strong&gt;context rot.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's what researchers found — and why it matters for anyone loading MCP servers into their workflow:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Research&lt;/th&gt;
&lt;th&gt;Key Finding&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Context Rot&lt;/strong&gt; (Chroma Research)&lt;/td&gt;
&lt;td&gt;Irrelevant context degrades reasoning first. Retrieval survives; &lt;em&gt;thinking&lt;/em&gt; dies.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Reasoning Degradation with Long Context Windows&lt;/strong&gt; (14-model benchmark)&lt;/td&gt;
&lt;td&gt;Reasoning ability decays as a function of input size — even when the model can still &lt;em&gt;find&lt;/em&gt; the right information.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Maximum Effective Context Window&lt;/strong&gt; (Paulsen, 2025)&lt;/td&gt;
&lt;td&gt;The actual usable window is up to 99% smaller than advertised. Severe degradation at just 1,000 tokens in some top models.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Fundamental Limits of LLMs at Scale&lt;/strong&gt; (arXiv, 2026)&lt;/td&gt;
&lt;td&gt;Context compression, reasoning degradation, and retrieval fragility are &lt;em&gt;proven&lt;/em&gt; architectural ceilings — not bugs to be patched.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Let me unpack why this hits MCP users so hard.&lt;/p&gt;

&lt;p&gt;Chroma Research showed that as irrelevant context increases in an LLM's input, performance degrades — and the degradation is &lt;em&gt;worse&lt;/em&gt; when the task requires genuine reasoning rather than simple retrieval. The less obvious the connection between question and answer, the more devastating the irrelevant context becomes.&lt;/p&gt;

&lt;p&gt;The "Challenging LLMs Beyond Information Retrieval" study tested 14 different LLMs and demonstrated that &lt;strong&gt;reasoning ability degrades as a function of input size&lt;/strong&gt; — even when the model can still &lt;em&gt;find&lt;/em&gt; the right information. Information retrieval and reasoning are different capabilities, and reasoning breaks first.&lt;/p&gt;

&lt;p&gt;And here's the connection to MCP that makes this personal:&lt;/p&gt;

&lt;p&gt;A single popular MCP server like Playwright contains 21 tools. Just the &lt;em&gt;definitions&lt;/em&gt; of those tools — names, descriptions, parameter schemas — consume over 11,700 tokens. And these definitions are included in &lt;em&gt;every single message&lt;/em&gt;, whether you use the tools or not.&lt;/p&gt;

&lt;p&gt;Now multiply that by 10 MCP servers. You've burned well over 100,000 tokens on tool definitions alone, and your 200k context window is suddenly closer to 80k before the system prompt or conversation history take a single token. It's not just smaller — it's &lt;em&gt;polluted&lt;/em&gt; with information that actively degrades the model's ability to reason about the thing you actually asked it to do.&lt;/p&gt;
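&lt;p&gt;The arithmetic is worth spelling out, using the figures above and assuming ten similar-sized servers:&lt;/p&gt;

```python
# Back-of-the-envelope context budget from the figures above.
tokens_per_server = 11_700   # Playwright MCP tool definitions alone
servers = 10                 # assuming similar-sized servers
context_window = 200_000

overhead = tokens_per_server * servers   # definitions resent with every message
usable = context_window - overhead       # what's left before any real work
print(overhead, usable)
```

&lt;p&gt;Roughly 83k remains in theory, and the system prompt, conversation history, and the task itself still have to fit inside it.&lt;/p&gt;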

&lt;p&gt;&lt;strong&gt;This is what I felt.&lt;/strong&gt; The AI wasn't broken. It was drowning. More tools meant more noise in the signal. More capability meant less room to think.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 15,000-Character Prompt and the Limits of "Prompt Engineering"
&lt;/h2&gt;

&lt;p&gt;While I was wrestling with MCP overhead, I was also building an AI-powered tool — essentially a converter that takes ambiguous, unstructured text input and generates structured, formatted output. Think of it as a bridge between how humans naturally communicate and how systems need to receive data.&lt;/p&gt;

&lt;p&gt;The core of this tool is a system prompt. That prompt went through dozens of iterations. At its peak, it was 20,000 characters. I tested, compared outputs, and eventually settled on 15,000 characters.&lt;/p&gt;

&lt;p&gt;15,000 characters of instructions. For a single task.&lt;/p&gt;

&lt;p&gt;The whole time, a thought kept nagging me: "Would a human expert need 15,000 characters of instructions to do this job?" A domain specialist would need maybe a paragraph of guidance. The rest is knowledge they already have — accumulated through years of working in their field.&lt;/p&gt;

&lt;p&gt;And that's when "prompt engineering" started feeling like what it really was: &lt;strong&gt;a brute-force workaround for the absence of domain expertise in the model's operating context.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But here's the twist. Despite the bloated prompt, the tool worked. Output quality stayed consistent and reliable. Why?&lt;/p&gt;

&lt;p&gt;Because I had constrained the domain. The tool operated within a specific industry workflow — a narrow slice of reality with its own vocabulary, its own established patterns, its own expected output formats. By telling the LLM upfront "you are operating within &lt;em&gt;this&lt;/em&gt; domain," the massive prompt became effective.&lt;/p&gt;

&lt;p&gt;If you've ever worked with LLMs, you already know this intuitively: a purely descriptive, narrative-style prompt — no matter how long — doesn't guarantee output quality. We've all been there. But &lt;strong&gt;a prompt that constrains the domain&lt;/strong&gt; changes the game.&lt;/p&gt;

&lt;p&gt;Here's why, and you don't need a PhD to see it. Think about what's happening inside a Transformer model. The attention mechanism operates on an enormous matrix — in large models, tens of thousands of dimensions. Every token is trying to figure out which other tokens matter. When the domain is wide open, the model is searching for relevance across a vast, noisy space. The outputs fluctuate. The reasoning wanders. Anyone who's done even basic linear algebra — even 3×3 matrices in high school — can imagine what happens when you scale that uncertainty to tens of thousands of dimensions. Of &lt;em&gt;course&lt;/em&gt; the output changes every time.&lt;/p&gt;

&lt;p&gt;But constrain the domain, and you dramatically narrow where the model needs to look. The relevant vectors cluster. The gap between what the model retrieves and what the human intended shrinks toward zero. &lt;strong&gt;Domain limitation doesn't just help. It's the mechanism by which prompts actually work.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This taught me something that would later click into place: &lt;strong&gt;domain limitation is the real optimization.&lt;/strong&gt; Not longer prompts. Not bigger context windows. Narrower scope.&lt;/p&gt;

&lt;p&gt;And if that's true for prompts, shouldn't the same principle apply to how we design AI agents?&lt;/p&gt;




&lt;h2&gt;
  
  
  From Prompt Engineering to Architecture Engineering
&lt;/h2&gt;

&lt;p&gt;As the tool matured, the architecture evolved in a direction I didn't fully appreciate at the time.&lt;/p&gt;

&lt;p&gt;The initial version was pure prompt — a single, monolithic instruction set that did everything through LLM reasoning. Unstructured text in, structured text out.&lt;/p&gt;

&lt;p&gt;But the real world isn't one output format. My domain required multiple types of structured documents — each with its own format, its own required fields, its own regulatory and compliance requirements. The number of output variations kept growing.&lt;/p&gt;

&lt;p&gt;Trying to handle all of these through prompt engineering alone was... well, it was exactly the "spread the entire menu on the table" problem from Part 1.&lt;/p&gt;

&lt;p&gt;So the architecture shifted. The LLM's output became fully structured JSON — deterministic, parseable, machine-readable. Document generation moved to Google Workspace via GCP. The LLM's job narrowed to what it's actually good at: understanding the input, extracting the meaning, structuring the reasoning. Everything else — formatting, template selection, compliance checks, document assembly — moved to deterministic systems.&lt;/p&gt;
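&lt;p&gt;As a concrete sketch (every field name below is hypothetical, invented for illustration; this is not my product's actual schema), the contract between the LLM and the deterministic pipeline looked roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "document_type": "incident_report",
  "template_id": "tmpl-incident-v2",
  "fields": {
    "summary": "Auth timeout on the staging gateway",
    "severity": "medium",
    "affected_systems": ["auth-gateway"]
  },
  "compliance_flags": []
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Everything downstream of that JSON boundary (template lookup, field validation, document assembly) is plain code, with no LLM in the loop.&lt;/p&gt;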

&lt;p&gt;&lt;strong&gt;The LLM handles the ambiguous. Deterministic systems handle the deterministic.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I was doing this throughout 2025, iterating toward an architecture where AI reasoning and programmatic execution were cleanly separated. And I kept thinking about Google Workspace — if only there were a way to programmatically drive every Workspace API from the command line, it would be the perfect backend for the document generation pipeline...&lt;/p&gt;




&lt;h2&gt;
  
  
  And Then GWS Appeared
&lt;/h2&gt;

&lt;p&gt;March 2026. Google released &lt;code&gt;gws&lt;/code&gt; — Google Workspace CLI. A Rust-based CLI that covers nearly every Google Workspace API, with commands dynamically generated from Google's Discovery Service.&lt;/p&gt;

&lt;p&gt;When I saw the announcement, my reaction was immediate: &lt;strong&gt;"This is it. This is what I've been waiting for."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A CLI that could drive Gmail, Drive, Docs, Sheets, Calendar — all from the command line, all returning structured JSON. Perfect for my document generation pipeline. Perfect for AI agent integration.&lt;/p&gt;

&lt;p&gt;And then I noticed the articles mentioning MCP support. Perfect! I could connect it directly to—&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;gws&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;mcp&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Unknown service 'mcp'."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You know the rest. &lt;a href="https://dev.to/gys/not-everything-needs-mcp-what-google-workspace-cli-taught-us-about-ai-agent-architecture-2doe"&gt;That investigation became Part 1.&lt;/a&gt; Google had implemented a full MCP server — 1,151 lines of Rust — then deliberately deleted it as a breaking change. Two days after launch.&lt;/p&gt;

&lt;p&gt;At the time, I focused on the forensic story: what happened, why, and what it meant for tool design. But the deeper significance only hit me later.&lt;/p&gt;

&lt;p&gt;Google didn't just remove MCP. &lt;strong&gt;Google arrived at the same architectural conclusion I had been groping toward with my own product&lt;/strong&gt; — that for large-scale operations, the right pattern is CLI-first with structured output, not protocol-mediated tool discovery. "Order from the kitchen when you're hungry" beats "spread the entire menu on the table."&lt;/p&gt;

&lt;p&gt;That was two independent arrivals at the same destination.&lt;/p&gt;

&lt;p&gt;Then I found the third.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hackathon Winner's Blueprint
&lt;/h2&gt;

&lt;p&gt;A few days after publishing Part 1, I came across the &lt;a href="https://github.com/affaan-m/everything-claude-code" rel="noopener noreferrer"&gt;everything-claude-code&lt;/a&gt; repository by Affaan Mustafa (&lt;a href="https://x.com/affaanmustafa" rel="noopener noreferrer"&gt;@affaanmustafa&lt;/a&gt;). Affaan won the Anthropic × Forum Ventures hackathon in NYC, building &lt;a href="https://zenith.chat" rel="noopener noreferrer"&gt;zenith.chat&lt;/a&gt; entirely with Claude Code in 8 hours. His repository — 77,000+ stars, 640+ commits, 76 contributors — packages 10+ months of daily Claude Code usage into a complete agent configuration system.&lt;/p&gt;

&lt;p&gt;I started reading it out of curiosity. Within minutes, I was sitting bolt upright.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The philosophy was identical to what I'd been building independently.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let me show you the parallels.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP: Deliberately Minimized
&lt;/h3&gt;

&lt;p&gt;From Affaan's guide:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Your 200k context window before compacting might only be 70k with too many tools enabled."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;His rule of thumb: have 20–30 MCPs configured, but &lt;strong&gt;keep under 10 enabled and under 80 tools active.&lt;/strong&gt; The repository includes &lt;code&gt;mcp-configs/mcp-servers.json&lt;/code&gt; with explicit &lt;code&gt;disabledMcpServers&lt;/code&gt; entries — actively turning off MCP servers to protect context space.&lt;/p&gt;
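&lt;p&gt;The shape of that config, shown here only as an illustration rather than a verbatim copy of the repo's file, is an explicit deny-list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "disabledMcpServers": [
    "supabase",
    "notion",
    "puppeteer"
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The disabled servers stay installed and configured; they simply contribute zero tool definitions to the context window until you re-enable one for a task that needs it.&lt;/p&gt;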

&lt;p&gt;This is exactly what Google concluded with &lt;code&gt;gws&lt;/code&gt;. And exactly what I experienced — more tools, less thinking room.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLI Skills as MCP Replacements
&lt;/h3&gt;

&lt;p&gt;From Affaan's longform guide:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Instead of having the GitHub MCP loaded at all times, create a &lt;code&gt;/gh-pr&lt;/code&gt; command that wraps &lt;code&gt;gh pr create&lt;/code&gt; with your preferred options. Instead of the Supabase MCP eating context, create skills that use the Supabase CLI directly. The functionality is the same, the convenience is similar, but your context window is freed up for actual work."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Skills in Claude Code are Markdown files — tiny prompt templates that load only when invoked. A &lt;code&gt;/gh-pr&lt;/code&gt; skill might be 200 tokens. The GitHub MCP server's tool definitions run to thousands of tokens. Same functionality. Orders of magnitude less context consumption.&lt;/p&gt;
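&lt;p&gt;A minimal sketch of such a skill file (the frontmatter and wording are illustrative, not copied from Affaan's repository):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;---
description: Create a GitHub PR with my default options
---
Run the following, substituting $TITLE and $BODY from the current branch's changes:

gh pr create --base main --title "$TITLE" --body "$BODY" --draft
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The entire file costs on the order of a couple hundred tokens, and only when &lt;code&gt;/gh-pr&lt;/code&gt; is actually invoked. The GitHub MCP server's schemas, by contrast, would sit in the context of every session.&lt;/p&gt;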

&lt;p&gt;&lt;strong&gt;This is the "kitchen model" from Part 1, independently rediscovered by a power user.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Domain Expert Agents
&lt;/h3&gt;

&lt;p&gt;The repository is organized into specialized subagents: &lt;code&gt;planner.md&lt;/code&gt;, &lt;code&gt;code-reviewer.md&lt;/code&gt;, &lt;code&gt;tdd-guide.md&lt;/code&gt;, &lt;code&gt;security-reviewer.md&lt;/code&gt;, &lt;code&gt;build-error-resolver.md&lt;/code&gt;. Each agent has a narrow scope, specific tools, and defined behaviors.&lt;/p&gt;

&lt;p&gt;This mirrors what I learned from my own product development — that established industries organize into specialties for a reason, and AI should follow the same principle. You don't ask a generalist to do a specialist's job. You don't ask a general-purpose agent to handle security review when a specialized security-reviewer agent would be more precise and use less context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Hygiene as First Principle
&lt;/h3&gt;

&lt;p&gt;Affaan's system includes automatic compaction hooks, session memory persistence, and strategic context management. The entire architecture is built around one principle: &lt;strong&gt;protect the context window for reasoning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not storage. Not tool definitions. &lt;em&gt;Reasoning.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Convergence
&lt;/h2&gt;

&lt;p&gt;So here's what happened in 2026:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google&lt;/strong&gt; — a trillion-dollar company with the largest productivity API surface in the world — implemented MCP, stress-tested it against 200–400 tool definitions, and deleted it. Their conclusion: CLI-first with on-demand schema discovery. Context stays clean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Affaan Mustafa&lt;/strong&gt; — an individual developer who won an AI hackathon and spent 10+ months refining his workflow — independently concluded that MCP should be minimized, replaced with CLI skills where possible, and the context window should be protected for reasoning above all else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I&lt;/strong&gt; — a medical IT veteran building AI-powered tools in Japan — arrived at the same architecture through a completely different path. A timestamp API in 2024. The "getting dumber" experience in 2025. A product's evolution from monolithic prompt to JSON + deterministic pipeline. And then the forensic discovery of Google's MCP deletion.&lt;/p&gt;

&lt;p&gt;Three different starting points. Three different domains. Three different scales. &lt;strong&gt;The same conclusion.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's not coincidence. That's a phase transition.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the 2026 Phase Transition Actually Means
&lt;/h2&gt;

&lt;p&gt;When people talk about AI milestones, they usually mean model capabilities. GPT-4. Claude 3. Gemini Ultra. Bigger context windows. Better benchmarks.&lt;/p&gt;

&lt;p&gt;But the real phase transition of 2026 isn't about model capabilities. It's about &lt;strong&gt;how we architect around the capabilities we already have.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The shift can be summarized in one sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"Do it for me" is expensive. "Do this specific thing" is cheap.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every token spent on tool definitions, prompt engineering, and ambiguous instructions is a token &lt;em&gt;not&lt;/em&gt; spent on reasoning. And the research confirms what practitioners have been feeling: irrelevant context doesn't just waste space — it actively degrades the model's ability to think.&lt;/p&gt;

&lt;p&gt;Here's what that means in practice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The end of "prompt engineering" as we knew it.&lt;/strong&gt; A 15,000-character prompt is a confession that we're compensating for missing architecture. The future is narrower prompts, domain-specific skills, and deterministic systems handling everything that doesn't require reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP is not dead — it's bounded.&lt;/strong&gt; MCP remains excellent for small-to-medium tool sets (under 50 tools). But for large API surfaces, CLI-first is the proven pattern. The "everything via MCP" fantasy is over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Skills" are the new unit of AI agent design.&lt;/strong&gt; Whether you call them Skills (Affaan), Agent Skills (Google), or domain-specific prompts (what I've been doing with my own tools), the pattern is the same: small, scoped, loaded on demand, discarded after use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context windows are not memory — they're working memory.&lt;/strong&gt; Treating the context window as storage is like covering your entire desk with every book you own before you even pick up a pen. You haven't left any room to actually write. The desk needs to be clear for thinking — and every MCP tool definition, every bloated prompt, every retained conversation turn is another book on the pile.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Human Parallel (Or: Why "Do It For Me" Was Always Expensive)
&lt;/h2&gt;

&lt;p&gt;There's an observation I keep coming back to, and it's one that makes me laugh every time.&lt;/p&gt;

&lt;p&gt;Consider how humans delegate work:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Boss:&lt;/em&gt; "Handle this, will you?"&lt;br&gt;
&lt;em&gt;Employee:&lt;/em&gt; &lt;em&gt;(Internal monologue: What exactly? By when? In what format? Who approved this? What's the budget?)&lt;/em&gt; → 10 rounds of clarification follow.&lt;/p&gt;

&lt;p&gt;Now consider the alternative:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Boss:&lt;/em&gt; "Run &lt;code&gt;git commit -m 'fix: resolve auth timeout' &amp;amp;&amp;amp; git push origin main&lt;/code&gt;."&lt;br&gt;
&lt;em&gt;Employee:&lt;/em&gt; Done. One round. Zero ambiguity.&lt;/p&gt;

&lt;p&gt;The first conversation — the "human" one — requires the employee to &lt;strong&gt;infer intent, plan actions, select tools, estimate parameters, and verify assumptions.&lt;/strong&gt; Every step of that inference costs time and mental bandwidth.&lt;/p&gt;

&lt;p&gt;In LLM terms, every step of that inference costs &lt;em&gt;tokens.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP tool definitions are the LLM equivalent of "let me explain everything you might possibly need to know before we start."&lt;/strong&gt; CLI commands are the equivalent of "just do this one thing."&lt;/p&gt;

&lt;p&gt;What the token economy has done — accidentally, beautifully — is &lt;strong&gt;make the cost of human communication ambiguity visible as a number.&lt;/strong&gt; Every vague instruction, every "you know what I mean," every "figure it out" translates directly to token consumption that crowds out actual reasoning.&lt;/p&gt;

&lt;p&gt;As someone with forty-plus years of programming experience — from assembly language to LLMs — I find this deeply ironic. We spent decades making computers understand human language. Now we're learning that the most efficient way to use language-understanding computers is... to give them precise, unambiguous commands. Like assembly language. Like CLI.&lt;/p&gt;

&lt;p&gt;The wheel doesn't just turn. It circles back to the truth.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Comes Next
&lt;/h2&gt;

&lt;p&gt;If the pattern holds, the next phase is already emerging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain-specific agent languages.&lt;/strong&gt; Not natural language prompts. Not traditional programming languages. Something in between — structured enough for deterministic execution, flexible enough for AI reasoning. We're already seeing DSLs for agent workflows (LangGraph's graph definitions), constrained syntax languages designed for LLM generation, and YAML/JSON-based knowledge objects.&lt;/p&gt;
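&lt;p&gt;Purely as a hypothetical sketch (no existing tool uses this exact format), such a knowledge object might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;skill: incident-report
domain: medical-it
load_when: "user requests an incident or compliance report"
deterministic_steps:
  - validate_required_fields
  - render_template
llm_steps:
  - extract_summary
  - classify_severity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Structured enough for a dispatcher to route and validate deterministically; loose enough that the &lt;code&gt;llm_steps&lt;/code&gt; remain genuine reasoning tasks.&lt;/p&gt;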

&lt;p&gt;&lt;strong&gt;Agent architecture as a discipline.&lt;/strong&gt; "Prompt engineer" was the job title of 2024. The 2026 equivalent is closer to "Agent Architect" or "Domain Skill Designer" — someone who understands how to decompose workflows into deterministic and non-deterministic components, and how to allocate context window real estate accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain specialization as a design principle.&lt;/strong&gt; This is my domain bias speaking — I come from medical IT, where specialization has been refined over centuries. There's a reason medicine has cardiologists and dermatologists. It isn't bureaucratic — it's cognitive. A specialist holds deep domain knowledge that makes their work faster, more accurate, and more reliable. I believe AI agents should be organized the same way. Not one giant model that knows everything. A team of specialists, each with their own skills, routing tasks to the right expert. Every industry has its own version of "specialties." The principle is universal.&lt;/p&gt;


&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;In Part 1, I wrote: "If you write about an OSS tool, run it first."&lt;/p&gt;

&lt;p&gt;In Part 2, the lesson is different:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If three independent paths converge on the same conclusion, pay attention.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google didn't read Affaan's guide before deleting MCP from &lt;code&gt;gws&lt;/code&gt;. Affaan didn't study my architecture before recommending CLI skills over MCP. I didn't know about either of them when I built a timestamp API in 2024 and started separating deterministic from non-deterministic processing.&lt;/p&gt;

&lt;p&gt;We all arrived at the same place: &lt;strong&gt;protect the context window for reasoning. Push everything deterministic to CLI, scripts, and structured pipelines. Load skills on demand. Discard them when done. Let the AI think.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That convergence — from a trillion-dollar company, a hackathon winner, and someone who's been writing code since assembly language was the only option — is what makes 2026 a phase transition.&lt;/p&gt;

&lt;p&gt;Not because the models got better. Because we finally learned how to stop wasting them.&lt;/p&gt;


&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;If you want to feel what "the 2026 phase transition" means in practice rather than just reading about it, the fastest way is to inject Affaan's system into your own Claude Code environment.&lt;/p&gt;

&lt;p&gt;I did it myself. The difference was immediate — sessions stayed coherent longer, context stopped rotting mid-task, and the AI's reasoning felt &lt;em&gt;sharper&lt;/em&gt; in ways that are hard to quantify but impossible to miss once you've experienced them.&lt;/p&gt;

&lt;p&gt;The quickest path — install as a Plugin directly inside Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inside Claude Code&lt;/span&gt;
/plugin marketplace add affaan-m/everything-claude-code
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;everything-claude-code@everything-claude-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That alone gives you the commands, skills, and hooks. You'll notice the difference.&lt;/p&gt;

&lt;p&gt;For the full setup including rules and language-specific configurations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/affaan-m/everything-claude-code.git
&lt;span class="nb"&gt;cd &lt;/span&gt;everything-claude-code
./install.sh typescript   &lt;span class="c"&gt;# or: python / golang / rust&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don't need to install everything. Start with the plugin. Use it for a day. Pay attention to how long your sessions stay productive before context degrades. Compare it to yesterday.&lt;/p&gt;

&lt;p&gt;I suspect you'll have your own moment of convergence — your own version of the realization that Google, Affaan, and I all had independently. That the bottleneck was never the model. It was how much of the context window we were wasting on everything &lt;em&gt;except&lt;/em&gt; thinking.&lt;/p&gt;

&lt;p&gt;Your setup is different from mine. Your domain is different. But the principle is the same.&lt;/p&gt;

&lt;p&gt;Let the AI think.&lt;/p&gt;

&lt;p&gt;And if this feels familiar —&lt;/p&gt;

&lt;p&gt;it is.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/gys/not-everything-needs-mcp-what-google-workspace-cli-taught-us-about-ai-agent-architecture-2doe"&gt;Part 1: Not Everything Needs MCP — What Google Workspace CLI Taught Us About AI Agent Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/affaan-m/everything-claude-code" rel="noopener noreferrer"&gt;everything-claude-code&lt;/a&gt; by Affaan Mustafa — The agent harness performance optimization system&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://x.com/affaanmustafa/status/2012378465664745795" rel="noopener noreferrer"&gt;The Shorthand Guide to Everything Claude Code&lt;/a&gt; — 2.7M+ views on X&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://x.com/affaanmustafa/status/2014040193557471352" rel="noopener noreferrer"&gt;The Longform Guide to Everything Claude Code&lt;/a&gt; — Token optimization, memory persistence, and CLI skill patterns&lt;/li&gt;
&lt;li&gt;Chroma Research, &lt;a href="https://research.trychroma.com/context-rot" rel="noopener noreferrer"&gt;"Context Rot"&lt;/a&gt; — Empirical study on how irrelevant context degrades LLM performance&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.preprints.org/manuscript/202408.1527" rel="noopener noreferrer"&gt;"Challenging LLMs Beyond Information Retrieval: Reasoning Degradation with Long Context Windows"&lt;/a&gt; — 14-model benchmark showing reasoning decay with context length&lt;/li&gt;
&lt;li&gt;Paulsen (2025), &lt;a href="https://arxiv.org/abs/2509.21361" rel="noopener noreferrer"&gt;"Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs"&lt;/a&gt; — Maximum effective context windows far smaller than advertised&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/html/2511.12869v1" rel="noopener noreferrer"&gt;"On the Fundamental Limits of LLMs at Scale"&lt;/a&gt; (2026) — Formal framework for reasoning degradation under context expansion&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>Not Everything Needs MCP: What Google Workspace CLI Taught Us About AI Agent Architecture</title>
      <dc:creator>Gen.Y.Sakai</dc:creator>
      <pubDate>Mon, 09 Mar 2026 08:35:52 +0000</pubDate>
      <link>https://dev.to/gys/not-everything-needs-mcp-what-google-workspace-cli-taught-us-about-ai-agent-architecture-2doe</link>
      <guid>https://dev.to/gys/not-everything-needs-mcp-what-google-workspace-cli-taught-us-about-ai-agent-architecture-2doe</guid>
      <description>&lt;p&gt;&lt;em&gt;Menu on the Table vs Order from the Kitchen — Why CLI Beats MCP for Large APIs&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When Google Workspace CLI launched, several articles mentioned its MCP server.&lt;/p&gt;

&lt;p&gt;But when I tried to run &lt;code&gt;gws mcp&lt;/code&gt;, something strange happened.&lt;/p&gt;

&lt;p&gt;The command didn't exist.&lt;/p&gt;

&lt;p&gt;What followed was a deep forensic investigation — from README to source code to git history — that ended with a discovery: &lt;strong&gt;Google implemented a full MCP server, improved it, then deliberately deleted all 1,151 lines of it as a breaking change.&lt;/strong&gt; Two days after launch.&lt;/p&gt;

&lt;p&gt;This is the story of that investigation, and what it reveals about AI agent tool design.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Google Workspace CLI?
&lt;/h2&gt;

&lt;p&gt;Google Workspace CLI (&lt;code&gt;gws&lt;/code&gt;) is a Rust-based CLI tool that covers nearly every Google Workspace API. It's open source under Apache 2.0 and installs via npm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @googleworkspace/cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Its killer feature: &lt;strong&gt;commands are dynamically generated from Google's Discovery Service at runtime.&lt;/strong&gt; Unlike traditional CLI tools that hardcode commands for each API, &lt;code&gt;gws&lt;/code&gt; reads the Discovery Document and builds its command surface on the fly. When Google adds a new Workspace API endpoint, &lt;code&gt;gws&lt;/code&gt; picks it up automatically — zero maintenance required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gws drive files list &lt;span class="nt"&gt;--params&lt;/span&gt; &lt;span class="s1"&gt;'{"pageSize": 10}'&lt;/span&gt;
gws gmail &lt;span class="nb"&gt;users &lt;/span&gt;messages list &lt;span class="nt"&gt;--params&lt;/span&gt; &lt;span class="s1"&gt;'{"userId": "me"}'&lt;/span&gt;
gws calendar events list &lt;span class="nt"&gt;--params&lt;/span&gt; &lt;span class="s1"&gt;'{"calendarId": "primary"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every response is structured JSON. It ships with 100+ Agent Skills — not just prompt templates, but complete workflow definitions covering auth, safety, and API usage patterns. In short, &lt;code&gt;gws&lt;/code&gt; is built as &lt;strong&gt;a runtime for AI agents to operate Google Workspace&lt;/strong&gt;, not just a tool for humans.&lt;/p&gt;

&lt;p&gt;Keep this architecture in mind. It's the key to understanding why MCP was removed.&lt;/p&gt;




&lt;h2&gt;
  
  
  "MCP Server Included" — Or So the Articles Said
&lt;/h2&gt;

&lt;p&gt;Within hours of launch, tech publications ran with it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"It also includes an MCP server mode through &lt;code&gt;gws mcp&lt;/code&gt;"&lt;br&gt;
— VentureBeat&lt;/p&gt;

&lt;p&gt;"MCPサーバーを起動することができ、Claude Desktop、Gemini CLI、VS CodeなどのMCP対応クライアントからGoogle Workspace APIを直接呼び出すことができる"&lt;br&gt;
A Japanese tech publication made a similar claim, saying that the CLI could start an MCP server and be used from Claude Desktop, Gemini CLI, and VS Code.&lt;br&gt;
— Japanese tech publication&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;MCP (Model Context Protocol) is arguably the hottest protocol in the AI agent ecosystem right now. Originally proposed by Anthropic for Claude, then donated to the Linux Foundation, it's being adopted by Claude Desktop, Gemini CLI, VS Code, Cursor, and many others.&lt;/p&gt;

&lt;p&gt;"Google Workspace supports MCP" was exactly the kind of news the community was waiting for.&lt;/p&gt;

&lt;p&gt;So I tried it immediately.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Wall: &lt;code&gt;gws mcp&lt;/code&gt; Doesn't Work
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ gws mcp
{
  "error": {
    "code": 400,
    "message": "Unknown service 'mcp'. Known services: drive, sheets, gmail, 
    calendar, admin-reports, reports, docs, slides, tasks, people, chat, 
    classroom, forms, keep, meet, events, modelarmor, workflow, wf.",
    "reason": "validationError"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;gws&lt;/code&gt; interprets its first argument as a Google API service name. &lt;code&gt;mcp&lt;/code&gt; isn't a service, so it fails. Fair enough — maybe the syntax is different?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @googleworkspace/cli mcp   &lt;span class="c"&gt;# Same error&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;gws &lt;span class="nt"&gt;--help&lt;/span&gt;                         &lt;span class="c"&gt;# No mention of mcp anywhere&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;gws mcp &lt;span class="nt"&gt;--help&lt;/span&gt;                     &lt;span class="c"&gt;# Same error&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--help&lt;/code&gt; output lists exactly three top-level commands: &lt;code&gt;schema&lt;/code&gt;, &lt;code&gt;generate-skills&lt;/code&gt;, and &lt;code&gt;auth&lt;/code&gt;. MCP is nowhere to be found.&lt;/p&gt;

&lt;p&gt;Something was off.&lt;/p&gt;




&lt;h2&gt;
  
  
  Dissecting the npm Package
&lt;/h2&gt;

&lt;p&gt;I looked at what was actually installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;readlink&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;which gws&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
.../node_modules/@googleworkspace/cli/run-gws.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The entry point is a Node.js wrapper that downloads and runs a prebuilt Rust binary. Checking &lt;code&gt;package.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"bin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"gws"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run-gws.js"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.8.1"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The only binary exposed is &lt;code&gt;gws&lt;/code&gt;.&lt;/strong&gt; No &lt;code&gt;gws-mcp&lt;/code&gt;, no &lt;code&gt;gws-server&lt;/code&gt;. The &lt;code&gt;supportedPlatforms&lt;/code&gt; section confirms: every platform ships a single binary named &lt;code&gt;gws&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The MCP entry point simply doesn't exist in the distributed package.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reading the Rust Source
&lt;/h2&gt;

&lt;p&gt;Maybe MCP exists in the repo but isn't included in the npm release? I cloned the repository and read &lt;code&gt;src/main.rs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;GwsError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;first_arg&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"schema"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;first_arg&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"generate-skills"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;first_arg&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"auth"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// Everything else → treat as service name&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_service_and_version&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;first_arg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three top-level commands. Everything else falls through to service name resolution. &lt;strong&gt;No &lt;code&gt;mcp&lt;/code&gt; branch exists.&lt;/strong&gt; The &lt;code&gt;print_usage()&lt;/code&gt; function doesn't mention MCP. The &lt;code&gt;mod&lt;/code&gt; declarations don't include &lt;code&gt;mcp_server&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I built from source to confirm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;cargo build
&lt;span class="nv"&gt;$ &lt;/span&gt;./target/debug/gws mcp
→ Same error
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point I had confirmed across five layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;gws --help&lt;/code&gt; — no MCP&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gws mcp&lt;/code&gt; — unknown service&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;package.json&lt;/code&gt; — &lt;code&gt;bin&lt;/code&gt; is &lt;code&gt;gws&lt;/code&gt; only&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;main.rs&lt;/code&gt; — no MCP branch&lt;/li&gt;
&lt;li&gt;Fresh build from source — still no MCP&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;MCP doesn't exist. Not in the release. Not in the source.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  But the Traces Were There
&lt;/h2&gt;

&lt;p&gt;I wasn't ready to give up. I grepped the repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; &lt;span class="s2"&gt;"mcp"&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
./.github/labeler.yml:&lt;span class="s2"&gt;"area: mcp"&lt;/span&gt;:
./.github/labeler.yml:          - src/mcp_server.rs
./CHANGELOG.md:- dd3fc90: Remove mcp &lt;span class="nb"&gt;command&lt;/span&gt;
./CHANGELOG.md:- 9cf6e0e: Add &lt;span class="nt"&gt;--tool-mode&lt;/span&gt; compact|full flag to gws mcp.
./CHANGELOG.md:- 670267f: feat: add gws mcp Model Context Protocol server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;There it was.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The CHANGELOG told a clear story: MCP was &lt;strong&gt;implemented, improved, and then removed.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GitHub Issues search returned 19 results for "MCP" (7 open, 12 closed):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;gws mcp -s does not exist&lt;/code&gt; (#69)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MCP server ignores GOOGLE_WORKSPACE_CLI_ACCOUNT env var&lt;/code&gt; (#221)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MCP tools/list returns uncallable tool names&lt;/code&gt; (#162)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Switch MCP tool names from underscore to hyphen separator&lt;/code&gt; (#235)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;feat: add tool annotations, deferred loading, and pagination to MCP server&lt;/code&gt; (#260)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't wishlist items. These are &lt;strong&gt;real bug reports from real users who were running the MCP server.&lt;/strong&gt; You don't debate underscore vs hyphen in tool names unless you're actually calling those tools.&lt;/p&gt;

&lt;p&gt;MCP wasn't missing. &lt;strong&gt;It was there, and it was deleted.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Moment of Deletion: 1,151 Lines Gone
&lt;/h2&gt;

&lt;p&gt;I found the commit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ git show --stat dd3fc90
commit dd3fc9074d74a3c74792aa08c6bff7a9984d0d46
Author: Steve Bazyl &amp;lt;sqrrrl@gmail.com&amp;gt;
Date:   Fri Mar 6 13:33:23 2026 -0500

    fix!: Remove MCP server mode (#275)

    * BREAKING CHANGE: Remove MCP server mode
    * Add changeset file

 .changeset/no-mcp.md |    5 +
 README.md            |   34 ----
 src/main.rs          |    6 -
 src/mcp_server.rs    | 1151 --------------------------------------------------
 5 files changed, 5 insertions(+), 1192 deletions(-)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;March 6, 2026. Two days after launch.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;!&lt;/code&gt; in &lt;code&gt;fix!:&lt;/code&gt; is Conventional Commits syntax for a breaking change. This wasn't a quiet deprecation — it was a deliberate, loud removal.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;src/mcp_server.rs&lt;/code&gt; — &lt;strong&gt;1,151 lines deleted.&lt;/strong&gt; This was no prototype. It was a complete MCP server implementation: JSON-RPC protocol handling, &lt;code&gt;tools/list&lt;/code&gt; for tool discovery, &lt;code&gt;tools/call&lt;/code&gt; for tool execution, Discovery API integration.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;area: mcp&lt;/code&gt; label was also removed from &lt;code&gt;AGENTS.md&lt;/code&gt;. The next day, Issue #260 (proposing tool annotations and deferred loading for the MCP server) was closed as &lt;code&gt;not_planned&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This wasn't a temporary retreat. It was a policy decision.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Google Removed MCP
&lt;/h2&gt;

&lt;p&gt;The answer lies in the collision between MCP's protocol design and Google Workspace API's scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  The tools/list Problem
&lt;/h3&gt;

&lt;p&gt;In MCP, when a client connects, the server advertises every available tool via &lt;code&gt;tools/list&lt;/code&gt;. The client's LLM loads these tool definitions into its context window to decide which tools to use and when.&lt;/p&gt;

&lt;p&gt;Google Workspace API is massive. Drive, Gmail, Calendar, Sheets, Docs, Chat, Tasks, People, Forms, Admin — over 10 major services, each with dozens of methods. Drive alone has &lt;code&gt;files.list&lt;/code&gt;, &lt;code&gt;files.get&lt;/code&gt;, &lt;code&gt;files.create&lt;/code&gt;, &lt;code&gt;files.update&lt;/code&gt;, &lt;code&gt;files.delete&lt;/code&gt;, &lt;code&gt;permissions.create&lt;/code&gt;... easily 10+ methods.&lt;/p&gt;

&lt;p&gt;Run all of that through Discovery API and you get &lt;strong&gt;200–400 MCP tools.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The CHANGELOG confirms this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Add --tool-mode compact|full flag to gws mcp. Compact mode exposes one tool per service plus a gws_discover meta-tool, reducing context window usage from &lt;strong&gt;200-400 tools&lt;/strong&gt; to ~26.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's 2–8x the practical limit for most MCP clients (typically 50–100 tools).&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Explosion
&lt;/h3&gt;

&lt;p&gt;200–400 tool definitions, each with name, description, parameter schemas, and required/optional markers, all serialized as JSON and loaded into the context window. Estimated token cost: &lt;strong&gt;40,000–100,000 tokens&lt;/strong&gt; — just for tool definitions.&lt;/p&gt;

&lt;p&gt;That leaves dramatically less room for user instructions, conversation history, and actual reasoning. Latency increases. Inference quality degrades.&lt;/p&gt;
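&lt;p&gt;Here's a back-of-envelope sketch of that arithmetic. The tool shape below follows MCP's &lt;code&gt;tools/list&lt;/code&gt; response format, but the specific Drive tool name and schema are illustrative, not taken from &lt;code&gt;gws&lt;/code&gt;:&lt;/p&gt;

```typescript
// Rough estimate of context cost for serialized MCP tool definitions.
// Assumption: ~4 characters per token. This sample tool is deliberately
// small; real gws definitions (more parameters, longer descriptions)
// would push the totals toward the 40,000-100,000 range quoted above.
const sampleTool = {
  name: "drive_files_list", // hypothetical tool name
  description:
    "Lists the user's Drive files. Supports query, paging, and field selection.",
  inputSchema: {
    type: "object",
    properties: {
      q: { type: "string", description: "Search query" },
      pageSize: { type: "integer", description: "Maximum files per page" },
      fields: { type: "string", description: "Partial response field selector" },
    },
    required: [],
  },
};

const perToolChars = JSON.stringify(sampleTool).length;
const perToolTokens = Math.round(perToolChars / 4); // ~4 chars/token heuristic

console.log(`one small tool definition: ~${perToolTokens} tokens`);
for (const toolCount of [200, 400]) {
  console.log(`${toolCount} tools: ~${toolCount * perToolTokens} tokens of definitions`);
}
```

&lt;p&gt;Even with this minimal schema, the definitions alone reach tens of thousands of tokens before the agent has done any actual reasoning.&lt;/p&gt;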

&lt;h3&gt;
  
  
  Compact Mode Didn't Save It
&lt;/h3&gt;

&lt;p&gt;The team tried. Compact mode reduced the tool count to ~26 by exposing one tool per service plus a &lt;code&gt;gws_discover&lt;/code&gt; meta-tool. But &lt;strong&gt;MCP was deleted the day after compact mode was implemented.&lt;/strong&gt; That tells you the mitigation wasn't sufficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug Avalanche
&lt;/h3&gt;

&lt;p&gt;During MCP's brief existence, at least 7 bug fixes were needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool naming ambiguity (fixed twice)&lt;/li&gt;
&lt;li&gt;Schema inconsistencies in tool calls&lt;/li&gt;
&lt;li&gt;Alias vs Discovery Document name mismatch&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;unwrap()&lt;/code&gt; panics in &lt;code&gt;mcp_server.rs&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Auth environment variable being ignored&lt;/li&gt;
&lt;li&gt;Empty &lt;code&gt;body: {}&lt;/code&gt; on GET methods causing 400 errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a small OSS project, that's an unsustainable maintenance burden over just two days.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Root Cause: Architectural Mismatch
&lt;/h3&gt;

&lt;p&gt;The bug count alone didn't kill MCP. The real issue is structural.&lt;/p&gt;

&lt;p&gt;Google Workspace API is optimized for &lt;strong&gt;dynamic generation of hundreds of methods via Discovery Service.&lt;/strong&gt; That's &lt;code&gt;gws&lt;/code&gt;'s superpower as a CLI — new APIs appear automatically, no code changes needed.&lt;/p&gt;

&lt;p&gt;But that same superpower becomes a liability under MCP. MCP's tool model requires &lt;strong&gt;all tool definitions to be sent upfront to the client.&lt;/strong&gt; "Dynamically generate hundreds of methods" directly translates to "flood the context window with hundreds of tool schemas."&lt;/p&gt;

&lt;p&gt;This isn't a fixable bug. It's a &lt;strong&gt;fundamental mismatch between Google's API design and MCP's protocol design.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MCP was technically implementable. But it couldn't be shaped into a feature that justified its implementation complexity.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture After MCP: CLI-First
&lt;/h2&gt;

&lt;p&gt;Here's what &lt;code&gt;gws&lt;/code&gt; looks like now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Google Workspace APIs
        ↓
    Discovery Service
        ↓
      gws CLI
        ↓
    JSON output
        ↓
  AI Agent (via shell)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is fundamentally different from MCP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP model:&lt;/strong&gt; LLM discovers all tools upfront via protocol → calls tools via structured protocol. All tool definitions live in the context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLI model:&lt;/strong&gt; Agent calls &lt;code&gt;gws&lt;/code&gt; as a shell command. Skills and CONTEXT.md guide which commands to run. &lt;code&gt;gws schema&lt;/code&gt; provides on-demand schema queries. Context overhead: near zero.&lt;/p&gt;

&lt;p&gt;The MCP approach is "spread the entire menu on the table and choose." The CLI approach is "order from the kitchen when you're hungry." For an API surface as vast as Google Workspace, the kitchen model wins.&lt;/p&gt;
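&lt;p&gt;The kitchen model is trivial to sketch. A real agent would invoke &lt;code&gt;gws&lt;/code&gt; here; since it may not be installed, this hedged example uses a &lt;code&gt;node -e&lt;/code&gt; one-liner as a stand-in for any CLI that prints structured JSON:&lt;/p&gt;

```typescript
// CLI-as-tool sketch: spawn a command, parse its JSON stdout.
// `fakeCli` stands in for something like a gws invocation; the command
// and its output shape are assumptions for illustration only.
import { execFileSync } from "node:child_process";

const fakeCli =
  'console.log(JSON.stringify({ files: [{ name: "report.pdf" }] }))';

// No tool schemas were preloaded into any context window. The agent pays
// only for the command string it emits and the JSON result it reads back.
const stdout = execFileSync(process.execPath, ["-e", fakeCli], {
  encoding: "utf8",
});

const result = JSON.parse(stdout);
console.log(result.files[0].name); // → report.pdf
```

&lt;p&gt;Swap the stand-in for the real binary and nothing about the agent-side pattern changes: one process spawn per order, zero standing menu.&lt;/p&gt;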

&lt;p&gt;The 100+ Agent Skills remain. The 50+ curated recipes for Gmail, Drive, Calendar, Docs, and Sheets remain. The structured JSON output remains. The on-demand schema discovery remains.&lt;/p&gt;

&lt;p&gt;MCP's removal didn't reduce functionality. &lt;strong&gt;The project converged on a more efficient agent integration model that didn't need MCP as a layer.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Were the Articles Verified?
&lt;/h2&gt;

&lt;p&gt;Multiple publications reported "MCP server included" — in English and Japanese. But by the time I checked the repository:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;gws --help&lt;/code&gt; showed no MCP&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gws mcp&lt;/code&gt; returned "unknown service"&lt;/li&gt;
&lt;li&gt;The npm package exposed only &lt;code&gt;gws&lt;/code&gt; in &lt;code&gt;bin&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;main.rs&lt;/code&gt; had no MCP branch&lt;/li&gt;
&lt;li&gt;Building from source didn't produce MCP&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MCP had been removed as a BREAKING CHANGE two days after launch&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The MCP implementation did exist in the repo's history. Issues, PRs, and CHANGELOG entries confirm it was real. So "pure fabrication" would be unfair.&lt;/p&gt;

&lt;p&gt;But a responsible technical article needs at least one of: a startup command, a config example, an execution log, or a version number where the feature works. None of the articles I found had any of these.&lt;/p&gt;

&lt;p&gt;I use AI as a research partner too — I had ChatGPT help analyze the README and used Claude Code to dig through commit history. AI-assisted research is fine. &lt;strong&gt;The problem is publishing AI-generated summaries without running the actual software.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you write about an OSS tool, run it first. Especially in the first week after launch, when READMEs and released artifacts can be out of sync. That gap is where misleading articles are born.&lt;/p&gt;




&lt;h2&gt;
  
  
  Will MCP Come Back?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Short term (&amp;lt; 6 months): Unlikely.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Issue #260 was closed as &lt;code&gt;not_planned&lt;/code&gt;. PR #275 declared a &lt;code&gt;BREAKING CHANGE&lt;/code&gt;. The &lt;code&gt;area: mcp&lt;/code&gt; label was removed from &lt;code&gt;AGENTS.md&lt;/code&gt;. This isn't a pause — it's a clear signal that MCP is not in the development roadmap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medium term (6–12 months): Conditionally possible.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the MCP specification evolves to address:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool count limits&lt;/strong&gt; — clients that can efficiently handle hundreds of tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lazy loading standardization&lt;/strong&gt; — on-demand tool discovery as a first-class MCP feature&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community contribution&lt;/strong&gt; — someone submits and maintains a complete implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Long term: Architecture-dependent.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google currently positions &lt;code&gt;gws&lt;/code&gt; as a Gemini CLI Extension. If Gemini's ecosystem adopts a tool integration protocol similar to MCP, something functionally equivalent could emerge.&lt;/p&gt;

&lt;p&gt;But the current trajectory is clearly &lt;strong&gt;CLI + Skills + on-demand schema discovery.&lt;/strong&gt; MCP's near-term revival is unlikely.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Investigation Revealed
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tool Design in the Age of AI Agents
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;gws&lt;/code&gt; poses an important question about AI agent tool integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP&lt;/strong&gt; standardizes tool discovery and invocation at the protocol level. LLMs see all available tools and call them through structured interfaces. This works beautifully for small-to-medium tool sets.&lt;/p&gt;

&lt;p&gt;But for services with &lt;strong&gt;massive API surfaces&lt;/strong&gt; like Google Workspace, MCP's model breaks down. Tool definitions consume the context window and degrade reasoning capability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLI-based integration&lt;/strong&gt; lets agents call external tools as shell commands. No tool definitions in the context window. Skills and documentation teach the agent what's available; schemas are queried on demand. Even with hundreds of available operations, context overhead stays near zero.&lt;/p&gt;

&lt;p&gt;This isn't MCP vs CLI as a universal choice. &lt;strong&gt;The optimal integration method depends on the scale and characteristics of the tool set.&lt;/strong&gt; Google didn't remove MCP because MCP is a bad protocol. They removed it because Google Workspace API's scale created a structural mismatch with MCP's tool model. For smaller tool sets (10–50 tools), MCP remains one of the best integration approaches available.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Speed of OSS
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;gws&lt;/code&gt; MCP followed this timeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;v0.3.x: MCP server added&lt;/li&gt;
&lt;li&gt;v0.5.x: compact/full mode improvement&lt;/li&gt;
&lt;li&gt;v0.6.x: bug fixes (naming, schemas, auth)&lt;/li&gt;
&lt;li&gt;v0.8.x: MCP removed (BREAKING CHANGE)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this happened within &lt;strong&gt;days.&lt;/strong&gt; Features can be added and removed faster than articles can be written about them. That's why running the software yourself matters more than reading about it.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLI = Agent Runtime
&lt;/h3&gt;

&lt;p&gt;Tracing &lt;code&gt;gws&lt;/code&gt;'s design reveals something about Google's vision for AI agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API schema (Discovery Service)
    ↓
CLI runtime (gws)
    ↓
Structured JSON output
    ↓
AI Agent (guided by Skills)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is "CLI as Agent Runtime." Traditionally, CLI was a human interface. &lt;code&gt;gws&lt;/code&gt; is explicitly designed as &lt;strong&gt;an interface for AI agents to call.&lt;/strong&gt; Structured JSON everywhere. 100+ pre-defined Skills. On-demand API schema queries via &lt;code&gt;gws schema&lt;/code&gt;. Agent guidelines in CONTEXT.md.&lt;/p&gt;

&lt;p&gt;This design philosophy is precisely what made MCP redundant. MCP exposes tools via protocol; &lt;code&gt;gws&lt;/code&gt; treats &lt;strong&gt;the CLI itself as the tool.&lt;/strong&gt; No need to send tool definitions over JSON-RPC when you can just execute a shell command.&lt;/p&gt;

&lt;p&gt;If this is Google's answer, then the future of AI agents won't be MCP-only. At least for services with large API surfaces, CLI-first will remain a viable — perhaps superior — alternative.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This investigation started with a trivial error: &lt;code&gt;gws mcp&lt;/code&gt; returning "unknown service."&lt;/p&gt;

&lt;p&gt;I read the README. Checked &lt;code&gt;--help&lt;/code&gt;. Opened &lt;code&gt;package.json&lt;/code&gt;. Read the Rust source. Built from source. Traced the git log. Examined commit diffs. Searched GitHub Issues. At the end of that trail was &lt;strong&gt;1,151 lines of MCP server code, deliberately removed as a BREAKING CHANGE.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The removal wasn't a technical failure. It was &lt;strong&gt;the recognition of an architectural mismatch.&lt;/strong&gt; Google's Discovery API dynamically generates hundreds of methods — a strength that directly became MCP's context window problem. Compact mode was attempted as a mitigation but couldn't resolve the fundamental collision between Google's API scale and MCP's tool model.&lt;/p&gt;

&lt;p&gt;What remains is everything an AI agent needs to operate Google Workspace: 100+ Agent Skills, structured JSON output, on-demand schema discovery — all without MCP.&lt;/p&gt;

&lt;p&gt;There's no universal answer to the MCP vs CLI debate. But what &lt;code&gt;gws&lt;/code&gt; demonstrated through its own history is that &lt;strong&gt;AI agent tool design is still an open problem.&lt;/strong&gt; The right architecture depends on the shape and scale of the API surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you write about an OSS tool, run it first.&lt;/strong&gt; Not the README — the actual software. In the age of AI, repository analysis takes hours, not days. Use that speed for verification, not just content production.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>cli</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Not Everything Needs to Be a Framework: Why Spawning Processes Still Wins</title>
      <dc:creator>Gen.Y.Sakai</dc:creator>
      <pubDate>Mon, 15 Dec 2025 07:21:13 +0000</pubDate>
      <link>https://dev.to/gys/not-everything-needs-to-be-a-framework-why-spawning-processes-still-wins-11de</link>
      <guid>https://dev.to/gys/not-everything-needs-to-be-a-framework-why-spawning-processes-still-wins-11de</guid>
      <description>&lt;p&gt;Some people say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This is just spawning a subprocess. That’s not architecture.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They’re absolutely right.&lt;br&gt;&lt;br&gt;
And that’s exactly why it works.&lt;/p&gt;

&lt;p&gt;I’ve been shipping production systems long enough to have lived through CORBA, SOAP, WSDL, and every other attempt to make inter-process communication “pure.”&lt;br&gt;&lt;br&gt;
I’ve also shipped systems under real deadlines, in regulated industries, where “rewriting everything” is not an option.&lt;/p&gt;

&lt;p&gt;This article is about why the &lt;strong&gt;sidecar pattern&lt;/strong&gt; — yes, literally just wrapping a binary — keeps winning in the real world.&lt;/p&gt;

&lt;p&gt;Not in theory.&lt;br&gt;&lt;br&gt;
Not in blog posts.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;In production.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern Everyone Pretends Not to Use
&lt;/h2&gt;

&lt;p&gt;Here’s the pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A TypeScript / Node / Electron app&lt;/li&gt;
&lt;li&gt;Spawns a compiled binary (.NET, Rust, Go, C++)&lt;/li&gt;
&lt;li&gt;Talks over stdin/stdout, pipes, or a local socket&lt;/li&gt;
&lt;li&gt;Lets the OS do process isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;No service mesh.&lt;br&gt;&lt;br&gt;
No gRPC schema wars.&lt;br&gt;&lt;br&gt;
No “distributed systems” cosplay.&lt;/p&gt;

&lt;p&gt;Just a parent process orchestrating a sidecar that does the heavy lifting.&lt;/p&gt;

&lt;p&gt;If this sounds “too simple,” good.&lt;br&gt;&lt;br&gt;
That’s the point.&lt;/p&gt;
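&lt;p&gt;Here’s a minimal sketch of the whole pattern in TypeScript. The “sidecar” is a &lt;code&gt;node -e&lt;/code&gt; one-liner standing in for a compiled .NET/Rust/Go binary, and the protocol is just JSON over OS pipes:&lt;/p&gt;

```typescript
// Parent process: send a JSON request over stdin, read a JSON reply
// from stdout. spawnSync keeps the sketch synchronous; a real sidecar
// would be spawned once with spawn() and kept alive across requests.
import { spawnSync } from "node:child_process";

// Hypothetical sidecar: read one JSON request from stdin, do the
// "heavy lifting" (here: adding two numbers), reply on stdout.
const sidecar = `
  const req = JSON.parse(require("fs").readFileSync(0, "utf8"));
  process.stdout.write(JSON.stringify({ id: req.id, sum: req.a + req.b }));
`;

const reply = spawnSync(process.execPath, ["-e", sidecar], {
  input: JSON.stringify({ id: 1, a: 2, b: 3 }),
  encoding: "utf8",
});

const res = JSON.parse(reply.stdout);
console.log(res.sum); // → 5
```

&lt;p&gt;Crash isolation comes for free: if the sidecar dies, the parent sees an exit code, not a segfault.&lt;/p&gt;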

&lt;h2&gt;
  
  
  “But That’s Just a Wrapper”
&lt;/h2&gt;

&lt;p&gt;Yes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VS Code wraps &lt;code&gt;git&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;VS Code wraps OmniSharp (.NET)&lt;/li&gt;
&lt;li&gt;Prisma wraps a Rust query engine&lt;/li&gt;
&lt;li&gt;Docker Desktop wraps the Docker daemon&lt;/li&gt;
&lt;li&gt;Electron apps wrap platform-native tools every single day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If “just a wrapper” were a design flaw, half of modern developer tooling wouldn’t exist.&lt;/p&gt;

&lt;p&gt;The uncomfortable truth is this:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Most successful tools are orchestration layers. Not reinventions.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Pattern Keeps Shipping
&lt;/h2&gt;

&lt;p&gt;Let’s be brutally honest about the real options.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: Rewrite Everything in TypeScript
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Months of work&lt;/li&gt;
&lt;li&gt;New bugs in mature code&lt;/li&gt;
&lt;li&gt;Worse performance&lt;/li&gt;
&lt;li&gt;Now you maintain two versions forever&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Great for blog posts.&lt;br&gt;&lt;br&gt;
Terrible for shipping.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: Native Addons (N-API, node-gyp)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Fast, yes&lt;/li&gt;
&lt;li&gt;Also fragile&lt;/li&gt;
&lt;li&gt;Node upgrades break you&lt;/li&gt;
&lt;li&gt;One segfault kills your entire process&lt;/li&gt;
&lt;li&gt;Cross-compilation is pain incarnate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ask anyone who has maintained native addons long-term.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: Spawn a Sidecar
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Uses existing, battle-tested code&lt;/li&gt;
&lt;li&gt;Crashes are isolated&lt;/li&gt;
&lt;li&gt;Debugging is obvious (it’s a process)&lt;/li&gt;
&lt;li&gt;Cross-platform is manageable&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ships this week, not “someday”&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn’t laziness.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;It’s risk management.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  “Architecture Purity” vs Reality
&lt;/h2&gt;

&lt;p&gt;In regulated domains — healthcare, finance, government systems — you don’t get to say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Let’s just rewrite the crypto layer.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You integrate with what already exists:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vendor SDKs&lt;/li&gt;
&lt;li&gt;Legacy libraries&lt;/li&gt;
&lt;li&gt;OS-specific APIs&lt;/li&gt;
&lt;li&gt;Hardware-backed security modules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The sidecar pattern gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clean boundary&lt;/li&gt;
&lt;li&gt;A failure domain you can reason about&lt;/li&gt;
&lt;li&gt;A way to keep modern UX without breaking compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not a hack.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;That’s professional engineering.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  This Is Not a New Idea — It’s the Mature One
&lt;/h2&gt;

&lt;p&gt;We spent decades trying to make IPC “beautiful”:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CORBA&lt;/li&gt;
&lt;li&gt;SOAP&lt;/li&gt;
&lt;li&gt;Enterprise Service Buses&lt;/li&gt;
&lt;li&gt;Endless XML schemas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And what did we learn?&lt;/p&gt;

&lt;p&gt;That &lt;strong&gt;simple, observable processes beat “perfect abstractions”&lt;/strong&gt; when systems get large and real humans have to operate them.&lt;/p&gt;

&lt;p&gt;Today, spawning a process and talking over stdio feels almost embarrassing — until you realize it’s exactly what we wanted 20 years ago.&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Should &lt;em&gt;Not&lt;/em&gt; Do This
&lt;/h2&gt;

&lt;p&gt;Let’s be clear — this is not a hammer for every nail.&lt;/p&gt;

&lt;p&gt;Don’t use this pattern if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A simple REST call solves your problem&lt;/li&gt;
&lt;li&gt;The logic is trivial and low-cost&lt;/li&gt;
&lt;li&gt;Latency is measured in microseconds&lt;/li&gt;
&lt;li&gt;Your team can’t maintain the sidecar language&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pragmatism cuts both ways.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Lesson
&lt;/h2&gt;

&lt;p&gt;The sidecar pattern isn’t about processes.&lt;br&gt;&lt;br&gt;
It’s about respecting reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Existing code matters&lt;/li&gt;
&lt;li&gt;Deadlines matter&lt;/li&gt;
&lt;li&gt;Failure isolation matters&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shipping matters&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your architecture diagram looks clean but your product never ships, you chose wrong.&lt;/p&gt;

&lt;p&gt;I’ll take a “wrapper” that ships over a “pure” system that doesn’t — every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;If this pattern is good enough for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Microsoft (VS Code)&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;Prisma&lt;/li&gt;
&lt;li&gt;The entire Electron ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s probably good enough for your project too.&lt;/p&gt;

&lt;p&gt;And if someone says&lt;br&gt;&lt;br&gt;
“That’s just spawning a process,”&lt;/p&gt;

&lt;p&gt;Smile.&lt;/p&gt;

&lt;p&gt;They just described half of modern software — the half that actually works.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Want to try this pattern yourself?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
I put together a minimal proof-of-concept in TypeScript + .NET that shows the full lifecycle management and stdio communication in action. It runs in minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/sakai-sktech/open-rx" rel="noopener noreferrer"&gt;Check out the repo on GitHub → open-rx&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Have you ever used the sidecar pattern in production? What worked? What broke?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Drop your war stories in the comments — I read every one.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Transparency note:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The title of this article was generated with &lt;strong&gt;NanobananaPro&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
The opinions, war stories, and architectural scars are entirely my own.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>opensource</category>
      <category>typescript</category>
      <category>pragmaticengineering</category>
    </item>
  </channel>
</rss>
