<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ferhat Atagün</title>
    <description>The latest articles on DEV Community by Ferhat Atagün (@ferhatatagun).</description>
    <link>https://dev.to/ferhatatagun</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3968879%2Fd5dc6bce-be71-47fa-a658-dbe02d4d37d5.png</url>
      <title>DEV Community: Ferhat Atagün</title>
      <link>https://dev.to/ferhatatagun</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ferhatatagun"/>
    <language>en</language>
    <item>
      <title>See the prompt before you ship it</title>
      <dc:creator>Ferhat Atagün</dc:creator>
      <pubDate>Mon, 08 Jun 2026 11:29:39 +0000</pubDate>
      <link>https://dev.to/ferhatatagun/see-the-prompt-before-you-ship-it-51ao</link>
      <guid>https://dev.to/ferhatatagun/see-the-prompt-before-you-ship-it-51ao</guid>
      <description>&lt;p&gt;The way most teams find out their prompt is too long is in the bill. The way most teams find out their prompt is approaching the context window is when the model starts dropping the system instructions. The way most teams find out their prompt-caching boundary is in the wrong place is by graphing a hit ratio that won't climb above 30%.&lt;/p&gt;

&lt;p&gt;All three of these are diagnosable in advance, in about four seconds, for free. The reason they keep happening is that the tools every Claude developer reaches for — chat playgrounds, IDE plugins, the official SDK — are &lt;em&gt;post-hoc&lt;/em&gt;. They show you what just happened. None of them shows you what your prompt looks like &lt;em&gt;before&lt;/em&gt; you press send.&lt;/p&gt;

&lt;p&gt;The other four tools I've shipped in this suite are all post-hoc too. &lt;a href="https://claudoscope-labs.vercel.app" rel="noopener noreferrer"&gt;claudoscope&lt;/a&gt; x-rays a finished response. &lt;a href="https://agentreplay.vercel.app" rel="noopener noreferrer"&gt;agent-replay&lt;/a&gt; scrubs a finished trace. &lt;a href="https://prompt-lab-promptly.vercel.app" rel="noopener noreferrer"&gt;prompt-lab&lt;/a&gt; compares two finished runs. &lt;a href="https://tool-lab-bice.vercel.app" rel="noopener noreferrer"&gt;tool-lab&lt;/a&gt; sandboxes the agent loop. They're all "look at what just happened" microscopes. None of them is a "look at what you're about to do" lens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://context-lens-sigma.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;context-lens&lt;/strong&gt;&lt;/a&gt; is. Paste a system prompt and a user message; see exactly how the API will count them, where in the 200K window you sit, where caching boundaries fall, and what each call will cost. The pre-flight check that turns a guess into a measurement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token cost, context-window position, and prompt-caching layout are all knowable from the prompt alone — you don't need to send the request.&lt;/li&gt;
&lt;li&gt;Anthropic's &lt;code&gt;count_tokens&lt;/code&gt; endpoint gives you the exact number; a &lt;code&gt;~3.7 chars/token&lt;/code&gt; heuristic gives you a good-enough number while you type.&lt;/li&gt;
&lt;li&gt;The most useful single number is "tokens × calls/day × dollars/token" — once you can compute it before deploying, "ship this prompt" stops being an aesthetic call and becomes a budget call.&lt;/li&gt;
&lt;li&gt;A 4× difference in input length between two equivalent prompts is normal. Catching it before it goes to production saves more than the tool costs to build.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What you can actually pre-flight
&lt;/h2&gt;

&lt;p&gt;Three things, all derivable from the prompt text alone:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Exact token count.&lt;/strong&gt; Not an estimate. Anthropic ships a &lt;code&gt;/v1/messages/count_tokens&lt;/code&gt; endpoint that takes the exact same shape as &lt;code&gt;/v1/messages&lt;/code&gt; (system, messages, tools) and returns just the &lt;code&gt;input_tokens&lt;/code&gt; number. Same tokenization as the actual API call would use. No model invocation, no output, no cost beyond a single tiny request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Position in the context window.&lt;/strong&gt; Sonnet 4.5 has a 200K-token window. Going past it doesn't error; the model silently drops the oldest content, which usually means dropping your system instructions, which usually means the model stops doing what you asked. The math is &lt;code&gt;(input + max_output) / 200_000&lt;/code&gt;. You should never see "78% of window" in production without knowing about it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Cost per call.&lt;/strong&gt; Multiply input tokens by input price (&lt;code&gt;$3/M&lt;/code&gt; on Sonnet), output tokens by output price (&lt;code&gt;$15/M&lt;/code&gt;), and you have one number for the cost of one call. Multiply by your traffic and you have the bill. The interesting move: do this &lt;em&gt;before&lt;/em&gt; you commit to a prompt design, not after.&lt;/p&gt;

&lt;p&gt;The fourth thing — where prompt-caching boundaries should sit — is harder to derive purely from text, but it's still pre-flight: you choose where to put &lt;code&gt;cache_control&lt;/code&gt; based on which prefix is &lt;em&gt;stable&lt;/em&gt; across your real traffic. context-lens won't choose for you, but it will show you the boundaries you've chosen so you can sanity-check them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four-fold cost difference no one was looking for
&lt;/h2&gt;

&lt;p&gt;A real example, the worked-out kind. Two versions of the same agent system prompt:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Input tokens (counted)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;Markdown headings, examples, long taxonomy, JSON schema embedded&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3,847&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;td&gt;Single paragraph, schema implied by one example, no preamble&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;612&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Same model (Sonnet 4.5). Same user inputs (a code review task). The output was substantively equivalent on five real traffic samples — both caught the same critical bugs, both produced valid JSON, both came in under 800 output tokens.&lt;/p&gt;

&lt;p&gt;The cost differential is mechanical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A: &lt;code&gt;(3847 × 3 + 800 × 15) / 1_000_000&lt;/code&gt; = &lt;strong&gt;$0.0235&lt;/strong&gt; per call&lt;/li&gt;
&lt;li&gt;B: &lt;code&gt;(612 × 3 + 800 × 15) / 1_000_000&lt;/code&gt; = &lt;strong&gt;$0.0138&lt;/strong&gt; per call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 10,000 calls per day, that's &lt;strong&gt;$97/day saved&lt;/strong&gt;, or &lt;strong&gt;$3,000/month&lt;/strong&gt;. For a single prompt rewrite that took two hours to test in context-lens.&lt;/p&gt;

&lt;p&gt;The salient detail: I didn't &lt;em&gt;intend&lt;/em&gt; version B to be cheaper. I intended it to be more readable. The cost reduction was a side-effect that I would not have noticed without the pre-flight number, because both prompts felt "about the same length" to me in an editor. context-lens told me one was 6.3× the length of the other, in the only metric that matters: the metric the API uses.&lt;/p&gt;

&lt;p&gt;The lesson is that "feels about the same" is a uniformly bad estimator for token count, and you stop making the mistake the day you start measuring before you ship.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the heuristic mode exists
&lt;/h2&gt;

&lt;p&gt;context-lens does two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live as you type: a fast heuristic, roughly &lt;code&gt;3.7 chars/token&lt;/code&gt; for English-ish text, that updates with every keystroke. No API call, no key required, instant.&lt;/li&gt;
&lt;li&gt;On demand: a real API call to &lt;code&gt;count_tokens&lt;/code&gt; that gives you the exact number Anthropic will use.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The heuristic isn't quite right — Turkish, code, and JSON all tokenize differently than English prose, sometimes by 30%. But it's a real-time signal while you iterate, which is more useful than an accurate-but-asynchronous one while you write. When you're ready to commit, you click the button and get the exact number. The two modes are intentional: one for the iteration phase, one for the verification phase.&lt;/p&gt;

&lt;p&gt;The pattern generalizes. Every place you have a fast-approximate metric and a slow-exact one, ship both, label them clearly, default to the fast one. The fast metric should never be wrong by more than ~30%; otherwise it's not a useful approximation. ~3.7 chars/token meets that bar for the languages context-lens has to handle.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about prompt caching
&lt;/h2&gt;

&lt;p&gt;Caching is the lever most teams underuse — and the one context-lens helps with most by surfacing where the boundaries are. Anthropic lets you mark any segment of your prompt as cacheable with &lt;code&gt;cache_control: { type: "ephemeral" }&lt;/code&gt;. The next 5 minutes, requests that share that exact prefix get the cached portion at &lt;strong&gt;10% of the input price&lt;/strong&gt;. The math flips: a 4,000-token system prompt that costs &lt;code&gt;$0.012&lt;/code&gt; per cold call costs &lt;code&gt;$0.0012&lt;/code&gt; per warm call. That's 10×.&lt;/p&gt;

&lt;p&gt;The catch: every byte before the &lt;code&gt;cache_control&lt;/code&gt; boundary must be identical. If you interpolate the user's name into the system prompt — gone. If your tool list reorders between requests — gone. If you append a timestamp — gone.&lt;/p&gt;

&lt;p&gt;context-lens shows you the structure you're sending. It doesn't auto-detect cacheability for you, but it does let you toggle "assume input is cache-read" and see what the cost would be if your caching worked. If &lt;code&gt;$0.012 → $0.0012&lt;/code&gt; is interesting at your traffic level, the path to verify it works is in &lt;a href="https://claudoscope-labs.vercel.app" rel="noopener noreferrer"&gt;claudoscope&lt;/a&gt;, which shows you the actual cache-read and cache-write breakdown on a live call. The two tools are complementary: context-lens predicts, claudoscope measures.&lt;/p&gt;

&lt;p&gt;I wrote a longer piece on the caching observability case in &lt;a href="https://ferhatatagun.com/blog/prompt-caching-nobody-measures" rel="noopener noreferrer"&gt;Prompt caching is the cheapest Claude optimization. Nobody measures it.&lt;/a&gt; if you want the full argument.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd recommend you do this week
&lt;/h2&gt;

&lt;p&gt;Three escalating moves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Today (5 minutes):&lt;/strong&gt; Take whatever prompt your team is shipping right now. Paste it into context-lens with a representative user message. Note the token count. Now write a 1-paragraph version of the same prompt and paste that. If the count drops by 50% with no quality regression on three real inputs, you have a free production cost cut.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;This sprint (an afternoon):&lt;/strong&gt; Add a pre-merge step to your prompt-change workflow: every PR that touches a prompt must include the context-lens token counts (before / after) in the description. Same energy as showing test results. If a PR triples your input tokens, that should be a conversation, not a stealth deploy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;This quarter (a habit):&lt;/strong&gt; Track the prompt-cost-per-feature number across your product as a real metric. If feature X costs &lt;code&gt;$0.02/call&lt;/code&gt; and feature Y costs &lt;code&gt;$0.20/call&lt;/code&gt;, that's information you should know about before the bill teaches you. context-lens is the cheapest place to start collecting it — &lt;code&gt;count_tokens&lt;/code&gt; is free to call.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The economics of LLM apps in 2026 are not about model selection, mostly. They're about prompt design. Teams that can see their prompts before they ship them will out-compete teams that can't, on cost first and on quality second. The "see them" part is what's missing in most setups, and what context-lens is for.&lt;/p&gt;




&lt;p&gt;I shipped this in &lt;a href="https://context-lens-sigma.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;context-lens&lt;/strong&gt;&lt;/a&gt; — paste a Claude prompt, see what it costs before you ship. BYOK, no backend, runs in the browser. Source: &lt;a href="https://github.com/ferhatatagun/context-lens" rel="noopener noreferrer"&gt;github.com/ferhatatagun/context-lens&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The same protocol-level approach also powers four sibling tools — &lt;a href="https://claudoscope-labs.vercel.app" rel="noopener noreferrer"&gt;claudoscope&lt;/a&gt;, &lt;a href="https://agentreplay.vercel.app" rel="noopener noreferrer"&gt;agent-replay&lt;/a&gt;, &lt;a href="https://prompt-lab-promptly.vercel.app" rel="noopener noreferrer"&gt;prompt-lab&lt;/a&gt;, &lt;a href="https://tool-lab-bice.vercel.app" rel="noopener noreferrer"&gt;tool-lab&lt;/a&gt;. All open source, all BYOK: &lt;a href="https://ferhatatagun.com/tools" rel="noopener noreferrer"&gt;ferhatatagun.com/tools&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>anthropic</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>What I learned shipping four open-source Claude dev-tools in two weekends</title>
      <dc:creator>Ferhat Atagün</dc:creator>
      <pubDate>Mon, 08 Jun 2026 11:29:08 +0000</pubDate>
      <link>https://dev.to/ferhatatagun/what-i-learned-shipping-four-open-source-claude-dev-tools-in-two-weekends-1f4f</link>
      <guid>https://dev.to/ferhatatagun/what-i-learned-shipping-four-open-source-claude-dev-tools-in-two-weekends-1f4f</guid>
      <description>&lt;p&gt;About a month ago I tried to import the Anthropic SDK into a Next.js project and the bundler crashed. The fix was straightforward — talk to the Messages API directly, ~150 lines of TypeScript replacing the SDK — but the side-effect was that I now had a hand-rolled SSE client lying around, with all of Claude's streaming behaviour visible to me at the protocol level for the first time.&lt;/p&gt;

&lt;p&gt;That client became the seed of four small open-source tools, shipped over two weekends. Each one points a different microscope at the same protocol:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://claudoscope-labs.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;claudoscope&lt;/strong&gt;&lt;/a&gt; — live x-ray of token economics: input, cache write, cache read, output, all visible as the response streams.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://agentreplay.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;agent-replay&lt;/strong&gt;&lt;/a&gt; — paste a Claude agent trace, replay it step-by-step on a cinematic timeline.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://prompt-lab-promptly.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;prompt-lab&lt;/strong&gt;&lt;/a&gt; — run two prompts (or models) on the same input, side by side, with output/cost/latency compared.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://tool-lab-bice.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;tool-lab&lt;/strong&gt;&lt;/a&gt; — define Claude tools in a JSON editor, type the mock responses by hand, watch the agent loop play out.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All four run only in your browser, BYOK, no backend, MIT-licensed. Together they're around 400 KB gzipped; the shared SSE client is the same file in all four repos. Five long-form posts on &lt;a href="https://ferhatatagun.com/blog" rel="noopener noreferrer"&gt;ferhatatagun.com/blog&lt;/a&gt; and Medium document the engineering decisions behind each one.&lt;/p&gt;

&lt;p&gt;The work is done — the more interesting question for me now is what shipping them in this shape, on this timeline, taught me about building developer tools in the AI-tooling era.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resistance from the official SDK ended up being the most generative constraint. Without the crash, I would never have written the parser, and without the parser, I would never have noticed how much the SDK hides.&lt;/li&gt;
&lt;li&gt;"One tool per insight" beats "one tool for everything." Each of the four tools makes exactly one thing visible. They compose because they don't try to.&lt;/li&gt;
&lt;li&gt;BYOK + browser-only is a credibility multiplier. The threshold for "I'll try this" drops dramatically when there's no account to make and no server to trust.&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;&amp;lt;150-line&lt;/code&gt; shared protocol client across four projects is a more interesting reuse pattern than "extract into a library." It travels by copy-paste, but with intent.&lt;/li&gt;
&lt;li&gt;The articles are not promotion; they're scaffolding. Every tool needs a long-form artifact that explains &lt;em&gt;why&lt;/em&gt; it exists, not what it does.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The constraint that made the work possible
&lt;/h2&gt;

&lt;p&gt;If the Anthropic SDK had imported cleanly into my Next.js bundle, none of this exists. I would have used the SDK, never seen the SSE event stream, never realized that the four &lt;code&gt;usage&lt;/code&gt; fields are sitting there in every response, and shipped some boring product feature instead.&lt;/p&gt;

&lt;p&gt;What broke first was the bundler — &lt;code&gt;node:fs/promises&lt;/code&gt; from inside an agent-toolset module, deep in the SDK's transitive imports. The fix wasn't subtle: don't use the SDK. Talk to &lt;code&gt;api.anthropic.com&lt;/code&gt; directly with &lt;code&gt;fetch&lt;/code&gt;. Add the &lt;code&gt;anthropic-dangerous-direct-browser-access&lt;/code&gt; header. Parse the SSE stream by hand. About 150 lines.&lt;/p&gt;

&lt;p&gt;The interesting part wasn't the parser — it was what I saw &lt;em&gt;because&lt;/em&gt; of the parser. I'd been calling Claude for months without ever noticing that &lt;code&gt;cache_creation_input_tokens&lt;/code&gt; and &lt;code&gt;cache_read_input_tokens&lt;/code&gt; were distinct fields. I'd never looked at the granular order of &lt;code&gt;content_block_delta&lt;/code&gt; events. I'd never noticed that &lt;code&gt;tool_use&lt;/code&gt; inputs arrive as partial-JSON deltas you have to accumulate. The SDK had been doing me a favor by hiding this stuff, and I'd been doing my apps a disservice by letting it.&lt;/p&gt;

&lt;p&gt;The lesson, restated: when an SDK fights you, the fight is the gift. The work to bypass it gives you ground-truth visibility you'd never have bought yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  One tool, one thing it makes visible
&lt;/h2&gt;

&lt;p&gt;The temptation, once I had the SSE parser, was to build "a Claude developer dashboard" — one tool that did everything. I almost did. The reason I didn't is that the most useful diagnostic tools I've ever used (Wireshark, Chrome DevTools' specific panels, the React Profiler) all share a property: each panel makes &lt;em&gt;exactly one thing&lt;/em&gt; visible in a way no other tool does.&lt;/p&gt;

&lt;p&gt;So I broke the work into four:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Makes visible&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;claudoscope&lt;/td&gt;
&lt;td&gt;The four &lt;code&gt;usage&lt;/code&gt; fields, live, as cost in dollars&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;agent-replay&lt;/td&gt;
&lt;td&gt;The decision sequence inside a &lt;code&gt;messages&lt;/code&gt; array&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;prompt-lab&lt;/td&gt;
&lt;td&gt;The latency/cost/output diff between two variants&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tool-lab&lt;/td&gt;
&lt;td&gt;What the model actually does with your tool schemas&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each is a small surface area. None of them does the other three's job. They're all the same shape — paste-some-JSON, watch-some-output, see-the-thing — but the "thing" is intentionally different in each.&lt;/p&gt;

&lt;p&gt;This decomposition cost me something: I have four landing pages to maintain, four READMEs, four sets of cross-links. But it bought me an asymmetric thing: a clear pitch per tool. "X-ray a Claude API call" is easier to share than "an all-in-one Claude developer console." On a Show HN front page or a Twitter timeline, the small specific claim wins.&lt;/p&gt;

&lt;h2&gt;
  
  
  BYOK + browser-only as a trust multiplier
&lt;/h2&gt;

&lt;p&gt;The first version of each tool, in my head, had a backend. A small Node service, an API key kept server-side, maybe a rate limiter. I started building the first one this way, then stopped at the deploy step and asked: why am I making the user trust me with their key?&lt;/p&gt;

&lt;p&gt;There is no good answer. For a developer tool that the user is going to use for ten minutes to debug their own work, no backend is necessary. Their key, their requests, their data. The browser is the right runtime; &lt;code&gt;localStorage&lt;/code&gt; is the right persistence layer; "nothing leaves your tab" is the right privacy guarantee.&lt;/p&gt;

&lt;p&gt;What this changed: the "try it" threshold collapsed. No account creation. No OAuth dance. No "should I trust this site with my key?" hesitation. Open the URL, paste a key, hit send. The tool is yours in under thirty seconds. The Anthropic header named &lt;code&gt;anthropic-dangerous-direct-browser-access&lt;/code&gt; was clearly built for exactly this kind of usage — a developer wants to look at the protocol directly, on their own machine, with their own credentials.&lt;/p&gt;

&lt;p&gt;The flip side: this design only works for &lt;em&gt;developer tools used by their own creator&lt;/em&gt;. A production app that ships keys to users would still need a backend. But for the diagnostic case, BYOK + browser-only is the right architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  A 150-line client, copied four times
&lt;/h2&gt;

&lt;p&gt;The shared SSE streaming client is &lt;code&gt;src/lib/anthropic.ts&lt;/code&gt; in all four repos. Same file. Same 150ish lines. I considered extracting it to an npm package — &lt;code&gt;@ferhatatagun/claude-fetch&lt;/code&gt; or similar — and decided against it three times.&lt;/p&gt;

&lt;p&gt;The case against extraction is intuitive once you've worked at scale: a shared library across four tools creates a fan-out problem. A breaking change in the library breaks all four; a non-breaking change requires version-pinning logic; a hotfix requires four PR's to deploy. Meanwhile the four tools are &lt;em&gt;small enough that the file is reviewable in five minutes&lt;/em&gt;. There's nothing to abstract over.&lt;/p&gt;

&lt;p&gt;What I do instead: the file at the top of &lt;code&gt;src/lib/anthropic.ts&lt;/code&gt; in each repo says, in a comment, where it was last synced from. When I improve the parser in one tool, I diff the file across the four repos and reconcile. It takes minutes, not hours, and the four tools stay in sync without the ceremony of a published package.&lt;/p&gt;

&lt;p&gt;This isn't a universal pattern — for ten projects it would break down, for a hundred it's clearly wrong. But for four tools shipped by one person on weekends, it's strictly better than the npm-and-versioning alternative.&lt;/p&gt;

&lt;h2&gt;
  
  
  The articles aren't marketing — they're scaffolding
&lt;/h2&gt;

&lt;p&gt;Each of the four tools has a long-form post that explains why it exists. claudoscope has two (one on the streaming client itself, one on cache observability). prompt-lab, tool-lab, and agent-replay each have one. There are also five matching Turkish translations on ferhatatagun.com.&lt;/p&gt;

&lt;p&gt;These posts are not promotion in the marketing sense. I'm not optimizing them for SEO and I'm not pumping them on LinkedIn for impressions. (OK, I'm pumping them on LinkedIn a little. But that's not the point.)&lt;/p&gt;

&lt;p&gt;The point is: a tool that does one specific thing benefits massively from an artifact that explains &lt;em&gt;why&lt;/em&gt; that specific thing is worth doing. "Here's a tool to A/B test Claude prompts" is a less convincing pitch than "you're choosing prompts by vibes; here's what side-by-side reveals that sequential doesn't, with a worked example, and a tool for it." The article does the persuasion; the tool catches the convinced reader.&lt;/p&gt;

&lt;p&gt;Without the writing, the tools look like toys. With the writing, they look like the natural conclusion of an argument. The two work as a pair.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;A handful of small things I'd front-load if I were starting over:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Demo mode from day one.&lt;/strong&gt; I added &lt;code&gt;?demo=1&lt;/code&gt; to three of the four tools as an afterthought. It's the single highest-conversion feature — users who land on a tool and don't have a key still need something to look at, or they bounce. Should have been there at first commit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Per-tool OG cards.&lt;/strong&gt; I shipped each tool with a generic OG image and went back two days later to make per-tool 1200×630 cards in the right brand color. The first two days of traffic that came in via shared links looked generic. Should have been there at launch.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cross-linking inside the tools.&lt;/strong&gt; Each tool's footer points to the other three. I added this in the second weekend. The first weekend, every tool was a silo, and visitors discovered them one at a time. Should have been baked into the template.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A "what's in this for me" line on the landing page.&lt;/strong&gt; I had four hero descriptions like "see what Claude is doing." Better: "see prompt caching save you 90% of your bill, live, as you debug." Specific outcome &amp;gt; vague capability. I corrected this in the second pass.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of these are large fixes. They're all things that, if you've ever shipped a small developer tool, you already know. Knowing and remembering at the moment of shipping are different things.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Whatever the API surface adds next, the same pattern applies: ship a small visualizer for it the day it lands. Anthropic shipped MCP, batch, files, computer-use, and citations over the last year, and most of them still don't have great developer-side observability tools. Each one is a 200-300 line tool waiting to be built.&lt;/p&gt;

&lt;p&gt;For now, the four-tool suite is at a natural stopping point. The work I'm interested in now is around adoption — making it visible enough that the people who need these tools can find them. If you've read this far and one of the four sounds like it would have saved you time last week, take it for a spin and let me know what's missing.&lt;/p&gt;




&lt;p&gt;All four tools: &lt;a href="https://ferhatatagun.com/tools" rel="noopener noreferrer"&gt;ferhatatagun.com/tools&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Source on &lt;a href="https://github.com/ferhatatagun" rel="noopener noreferrer"&gt;github.com/ferhatatagun&lt;/a&gt;. MIT, BYOK, no backend.&lt;/p&gt;

&lt;p&gt;Articles on each one: &lt;a href="https://ferhatatagun.com/blog" rel="noopener noreferrer"&gt;ferhatatagun.com/blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>anthropic</category>
      <category>opensource</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How I debug Claude agents by replaying their trace</title>
      <dc:creator>Ferhat Atagün</dc:creator>
      <pubDate>Mon, 08 Jun 2026 11:28:37 +0000</pubDate>
      <link>https://dev.to/ferhatatagun/how-i-debug-claude-agents-by-replaying-their-trace-484</link>
      <guid>https://dev.to/ferhatatagun/how-i-debug-claude-agents-by-replaying-their-trace-484</guid>
      <description>&lt;p&gt;Your agent did something weird in production. A user reported it, you found the failed run in your logs, and you're now staring at a JSON file that's 400 messages long, half of them are &lt;code&gt;tool_result&lt;/code&gt; blocks the size of small databases, and somewhere in there is the moment the agent decided to do the wrong thing.&lt;/p&gt;

&lt;p&gt;You can't re-run the agent: the API state has moved on, the tool would behave differently now, the prompt has been updated three times since. You have only the trace.&lt;/p&gt;

&lt;p&gt;The way most of us read agent traces is: open the JSON in an editor, ctrl+F for the tool name we suspect, scroll through walls of escaped strings, try to mentally reconstruct the sequence. It takes thirty minutes, by the end of which you have one of three answers — "yeah I see what went wrong," "I'm pretty sure I see what went wrong," or "I have no idea what went wrong." About a third of the time it's the third one, and you go ship a band-aid that may or may not fix the actual problem.&lt;/p&gt;

&lt;p&gt;The thing nobody talks about is that this isn't a hard problem. The JSON contains all the information. The issue is purely &lt;em&gt;presentational&lt;/em&gt; — it's nearly impossible to read.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent traces are a sequence of decisions but stored as a wall of nested JSON. The signal is there; the format is the problem.&lt;/li&gt;
&lt;li&gt;The right primitive isn't a JSON viewer — it's a timeline. Each thought, tool call, tool result, and final answer becomes its own discrete, color-coded step.&lt;/li&gt;
&lt;li&gt;Once you can scrub through the trace step by step, the failure point becomes visually obvious in seconds instead of minutes.&lt;/li&gt;
&lt;li&gt;This is post-hoc, not interactive. You don't need to re-run the agent or hit the API — replay works on the raw trace alone.&lt;/li&gt;
&lt;li&gt;A browser-only tool can do this in 4 seconds. No backend, no key, just paste the JSON.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What an agent trace actually contains
&lt;/h2&gt;

&lt;p&gt;When you save a Claude agent run, you usually persist the &lt;code&gt;messages&lt;/code&gt; array — the full conversation including the model's responses and the tool results you fed back. A six-step agent run looks roughly like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json-doc"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Find me the cheapest flight from IST to LAX next Tuesday"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I'll search for flights and check prices."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_use"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tu_01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"search_flights"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="c1"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_result"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tool_use_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tu_01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[&amp;lt;2KB of JSON&amp;gt;]"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Looking at three of those..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_use"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tu_02"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_price"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="c1"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="c1"&gt;// ...four more steps...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every interesting moment of the agent's behaviour is in there: which tool it picked, what arguments it constructed, what it said about its own reasoning, how it interpreted the result. The structure is fundamentally a &lt;strong&gt;sequence of discrete events&lt;/strong&gt;, not a "document."&lt;/p&gt;

&lt;p&gt;But you read it as a document, because that's what an editor shows you. The brain has to do the work of converting "alternating role: assistant / role: user with tool_result content blocks" into "step 3 was a tool call to get_price with argument X, which returned Y, which the agent then interpreted as Z."&lt;/p&gt;

&lt;p&gt;That conversion is what kills your debugging time. Doing it manually for a 12-step trace takes minutes. Doing it for a 60-step agent on a complex task takes hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  The right primitive: a timeline of decisions
&lt;/h2&gt;

&lt;p&gt;The reframe is: stop reading the trace as JSON, start watching it as a sequence of decisions. Each step is one of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💭 &lt;strong&gt;Thought&lt;/strong&gt; — the model wrote text (the part of its response that isn't a tool call)&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;Tool call&lt;/strong&gt; — the model invoked a tool with specific arguments&lt;/li&gt;
&lt;li&gt;📥 &lt;strong&gt;Tool result&lt;/strong&gt; — what came back, fed into the next turn&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Final answer&lt;/strong&gt; — the model's &lt;code&gt;end_turn&lt;/code&gt;, no more tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Color-code those four event types. Lay them out in order, one card per event. You now have a timeline you can scrub, step through, and play back. The information density per card is high enough that you can read the entire trace at a glance, and zoom in only on the cards that look suspicious.&lt;/p&gt;

&lt;p&gt;The structural insight: agent debugging is closer to debugging a script with breakpoints than to reading source code. You want to step through, not skim. JSON gives you no steps; the timeline gives you nothing else.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bugs that become obvious in this view
&lt;/h2&gt;

&lt;p&gt;Three failure modes I see repeatedly when I drop a trace into the timeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The wrong tool, picked silently.&lt;/strong&gt; The model called &lt;code&gt;search_archive&lt;/code&gt; when it should have called &lt;code&gt;search_recent&lt;/code&gt;. In JSON this is one line out of 200 that flies past your eye. In the timeline it's a card with a tool name you didn't expect, and you click on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Hallucinated arguments.&lt;/strong&gt; The model called the right tool but with an argument shape that doesn't match the schema — usually because the schema is ambiguous. In JSON you see &lt;code&gt;{"q": "foo", "limit": "10"}&lt;/code&gt; and don't notice that &lt;code&gt;limit&lt;/code&gt; should have been an integer. In the timeline the tool result card right after shows a 400 error and you trace it back one step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The infinite loop precursor.&lt;/strong&gt; Some agents get stuck in a pattern where they keep calling the same tool with slightly different inputs, never reaching a conclusion. In JSON it's a wall of near-identical blocks. In the timeline it's a visual rhythm — five purple cards in a row with the same tool name — that you can see in your peripheral vision the moment you scroll.&lt;/p&gt;

&lt;p&gt;In all three cases, the bug isn't subtle. It just &lt;em&gt;looks&lt;/em&gt; subtle when it's hidden in JSON.&lt;/p&gt;

&lt;h2&gt;
  
  
  What replay gives you that re-running doesn't
&lt;/h2&gt;

&lt;p&gt;The temptation when an agent fails is to re-run it with print statements, see what happens, iterate. Don't. Three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It costs API calls.&lt;/strong&gt; A failed agent that called 15 tools costs you 15× input tokens to re-run. With caching maybe less; either way, the bill is real. Replay is free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The API state has moved.&lt;/strong&gt; The tool you call today might return different data than the tool returned during the original run. You're not debugging the original failure anymore; you're debugging &lt;em&gt;whatever happens now&lt;/em&gt;, which might be a totally different bug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model is stochastic.&lt;/strong&gt; Even at temperature 0, retries can produce different outputs. Re-running an agent and getting a &lt;em&gt;different&lt;/em&gt; failure mode means you've now got two bugs to investigate. The trace is the canonical artifact of what actually happened.&lt;/p&gt;

&lt;p&gt;Replay sidesteps all three. You're inspecting a frozen artifact, deterministically, at whatever speed you want. The bug doesn't move while you're looking at it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like in agent-replay
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://agentreplay.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;agent-replay&lt;/strong&gt;&lt;/a&gt; is the tool I built for this. Paste your trace into a JSON pane on the left. The right pane renders it as a cinematic timeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each event is a card with an icon and color&lt;/li&gt;
&lt;li&gt;You can press space to play through the trace at 1× speed (one event per second), or scrub manually&lt;/li&gt;
&lt;li&gt;Click any card to see the full content — the thought text, the tool call's input JSON, the raw tool result, expanded&lt;/li&gt;
&lt;li&gt;Filter by event type — "show me only the tool calls" or "show me only the assistant thoughts" — when you want to focus&lt;/li&gt;
&lt;li&gt;The whole thing is in your browser; no key needed, no backend, your trace never leaves the tab&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's a sample trace seeded on &lt;code&gt;?demo=1&lt;/code&gt; if you want to see what a 12-step agent looks like without copying your own data anywhere.&lt;/p&gt;

&lt;p&gt;The thing I keep finding: the moment I'm debugging is no longer "where in the JSON did the agent screw up." It's "which card looks wrong, and what does the next card show as a consequence." A 30-minute investigation becomes a 30-second one. Not because the tool is doing anything clever — it's just showing the same data in the right shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd recommend you do this week
&lt;/h2&gt;

&lt;p&gt;Three escalating moves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Today (5 minutes):&lt;/strong&gt; Find the last weird agent run you have a trace for. Paste it into agent-replay. See how long it takes to find the failure point. If it's faster than your usual JSON-scrolling approach, you just changed your debugging workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;This week (an afternoon):&lt;/strong&gt; Add a trace-export endpoint to your agent. Every agent run, finished or failed, dumps the &lt;code&gt;messages&lt;/code&gt; array to S3 or a database row. You need the trace before you need to debug it, not after.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;This quarter (a habit):&lt;/strong&gt; When a user reports "the agent did something weird," your first move is to pull the trace and open it in a timeline view, &lt;em&gt;before&lt;/em&gt; you read the user's report carefully. Most of the time you'll know what happened before you finish reading the bug report.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Agent debugging is presented as an emerging engineering discipline. It isn't — it's a tooling problem we've solved many times before for non-AI systems. We just haven't built the tools yet for this one. Once the trace is in the right shape, the bugs are obvious. The work is laying out the data, not interpreting it.&lt;/p&gt;




&lt;p&gt;I shipped this in &lt;a href="https://agentreplay.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;agent-replay&lt;/strong&gt;&lt;/a&gt; — paste a trace, scrub the timeline. No key, no backend, runs in the browser. Source: &lt;a href="https://github.com/ferhatatagun/agent-replay" rel="noopener noreferrer"&gt;github.com/ferhatatagun/agent-replay&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The same SSE client (for traces that include streaming events) also powers three sibling tools — &lt;a href="https://claudoscope-labs.vercel.app" rel="noopener noreferrer"&gt;claudoscope&lt;/a&gt;, &lt;a href="https://prompt-lab-promptly.vercel.app" rel="noopener noreferrer"&gt;prompt-lab&lt;/a&gt;, &lt;a href="https://tool-lab-bice.vercel.app" rel="noopener noreferrer"&gt;tool-lab&lt;/a&gt;. All open-source, all BYOK: &lt;a href="https://ferhatatagun.com/tools" rel="noopener noreferrer"&gt;ferhatatagun.com/tools&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>anthropic</category>
      <category>agents</category>
      <category>debugging</category>
    </item>
    <item>
      <title>Build the sandbox before you write a single tool</title>
      <dc:creator>Ferhat Atagün</dc:creator>
      <pubDate>Mon, 08 Jun 2026 11:28:05 +0000</pubDate>
      <link>https://dev.to/ferhatatagun/build-the-sandbox-before-you-write-a-single-tool-2ja3</link>
      <guid>https://dev.to/ferhatatagun/build-the-sandbox-before-you-write-a-single-tool-2ja3</guid>
      <description>&lt;p&gt;The first time you ship a Claude agent that uses tools you'll do it the obvious way: design the schema, write the actual tool function, hit the API, parse the &lt;code&gt;tool_use&lt;/code&gt; block, run the function, feed the result back, loop. It works. It also has a fundamental ordering bug:&lt;/p&gt;

&lt;p&gt;You wrote the tools before you knew if they were the right tools.&lt;/p&gt;

&lt;p&gt;By the time you've stood up a database query function, two API calls, and a thing that hits the file system, you've sunk maybe a day. You run the agent. It calls a non-existent tool. It hallucinates an argument shape that doesn't match your schema. It picks the wrong tool when both would have worked. &lt;em&gt;Now&lt;/em&gt; you're going to redesign the schema, and the four real tool implementations you wrote are going in the bin or being rewritten.&lt;/p&gt;

&lt;p&gt;The thing that makes this worse is that the failure mode looks like an "agent quality" problem when it's actually a "premature implementation" problem. The model knew what it wanted; you'd just built the wrong scaffolding around it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool implementations are the slowest part of agent development; tool &lt;em&gt;design&lt;/em&gt; is the fastest part to get wrong.&lt;/li&gt;
&lt;li&gt;Decouple them: write the tool schemas, run the agent loop with mocked responses, see how the model picks and uses the tools — then write the real implementations only for the tools that survived.&lt;/li&gt;
&lt;li&gt;The right mental model is "you play the role of every tool, by hand" — slow for the agent, fast for you, brutal for bad designs.&lt;/li&gt;
&lt;li&gt;This is a fifteen-minute exercise for a five-tool agent that would otherwise take a day, and it catches design mistakes before they touch your codebase.&lt;/li&gt;
&lt;li&gt;The whole thing fits in a browser tool with no backend.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What "premature implementation" actually looks like
&lt;/h2&gt;

&lt;p&gt;A worked example. I was building a code review agent. My first instinct was four tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;read_file&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;read a file from the repo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;search_code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;grep across the repo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;get_diff&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;show the diff for this PR&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;post_comment&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;leave a review comment&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I implemented all four. Real filesystem access. Real git invocation. Real GitHub API call. Probably four hours total. Then I ran the agent on a real PR.&lt;/p&gt;

&lt;p&gt;What happened: the agent called &lt;code&gt;get_diff&lt;/code&gt; first (good), then called &lt;code&gt;search_code&lt;/code&gt; for every single identifier in the diff (catastrophic — the diff had 200 lines, 50 unique identifiers, my rate limit ran out). It never called &lt;code&gt;read_file&lt;/code&gt; because the diff already contained the context. It called &lt;code&gt;post_comment&lt;/code&gt; once at the end with a 4,000-word essay instead of inline comments.&lt;/p&gt;

&lt;p&gt;Three of my four "real" tools were either misused or unused. The agent design was wrong, not the implementations. If I'd run the loop with mocked responses first, I would have:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Noticed it called &lt;code&gt;search_code&lt;/code&gt; 50 times → split the tool into &lt;code&gt;search_code(query, limit=5)&lt;/code&gt; with an explicit budget&lt;/li&gt;
&lt;li&gt;Noticed it never used &lt;code&gt;read_file&lt;/code&gt; → deleted it, saved myself an hour&lt;/li&gt;
&lt;li&gt;Noticed &lt;code&gt;post_comment&lt;/code&gt; was being used as &lt;code&gt;post_essay&lt;/code&gt; → split into &lt;code&gt;post_inline_comment(line, body)&lt;/code&gt; and &lt;code&gt;post_summary(body)&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That intervention takes fifteen minutes when the tools are mocked. It takes a day when they're real.&lt;/p&gt;

&lt;h2&gt;
  
  
  The role-play pattern
&lt;/h2&gt;

&lt;p&gt;The trick is shockingly simple: write your tool schemas, send a real user message to Claude, and when the model produces a &lt;code&gt;tool_use&lt;/code&gt; block, &lt;em&gt;you&lt;/em&gt; hand-type the result and feed it back. The loop runs end-to-end, but you're playing every tool.&lt;/p&gt;

&lt;p&gt;In code, this is the same agent loop everyone writes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callClaude&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;end_turn&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolUses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool_use&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toolUses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool_result&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;tool_use_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;PROMPT_USER_FOR_RESULT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;// &amp;lt;-- you fill this in&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;

  &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;toolResults&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only difference between this and a "real" agent loop is the &lt;code&gt;PROMPT_USER_FOR_RESULT&lt;/code&gt; call — instead of executing a function, it shows you what the model called and what arguments it used, and waits for you to type the answer.&lt;/p&gt;

&lt;p&gt;What that produces is surprisingly information-dense:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Did the model pick the tool I expected?&lt;/strong&gt; If it took a different path you didn't anticipate, your schema is signaling something other than what you meant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Did the input shape match my JSON schema?&lt;/strong&gt; If the model is straining to fit the schema, the schema is too rigid or too loose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How many tools did it chain?&lt;/strong&gt; A 12-step tool chain to answer one question is a sign you decomposed the toolset wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Did it ask follow-up questions before tool use?&lt;/strong&gt; That's good — it means the model is trying to disambiguate. If it doesn't, your prompt isn't asking it to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You see all of this in a five-minute conversation, before you've written a single line of real implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  When you can stop role-playing
&lt;/h2&gt;

&lt;p&gt;The sandbox isn't a permanent state. It's a phase. You run it until you've answered three questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Are these the right tools?&lt;/strong&gt; — Some get deleted, some get split, some get merged. Usually 30-50% of your initial toolset doesn't survive contact with a real prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are the schemas tight enough?&lt;/strong&gt; — You see the model picking awkward argument values; you constrain the schema (enum instead of string, required instead of optional). &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Does the agent loop terminate?&lt;/strong&gt; — Some agents will keep calling tools forever if their stopping criteria are vague. The mock-response loop surfaces this immediately because &lt;em&gt;you're&lt;/em&gt; the one getting stuck typing responses.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When those three are stable on a handful of real prompts, you write the real implementations. The implementation work is now de-risked: you know which tools to actually build, and the schemas are settled.&lt;/p&gt;

&lt;p&gt;The thing you save isn't the implementation time itself — it's the rework. Writing a tool from scratch is fast. Rewriting a tool because its schema was wrong, then updating the prompt because the new schema needs different framing, then re-running every regression input, is what eats days.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like in tool-lab
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://tool-lab-bice.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;tool-lab&lt;/strong&gt;&lt;/a&gt; is what I built to do this without setting up a project each time. Three panes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;┌─&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Tools&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(JSON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;editor)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;─────────┬─&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Conversation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;────────────────────┐&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;                             &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;user:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;review&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;PR&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"read_file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;assistant:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;I'll&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;get&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;diff.&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"search_code"&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tool_use:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;get_diff()&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_diff"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;←&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tool_result:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;YOU&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;TYPE&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"post_comment"&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;assistant:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;                   &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;                             &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;                                    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;└───────────────────────────────┴───────────────────────────────────┘&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You paste your tool schemas on the left. Type the user message. The model streams its response into the right pane. When it lands a &lt;code&gt;tool_use&lt;/code&gt; block, the conversation pauses with a text field for the result. You type whatever the tool would have returned — JSON, a string, an error, whatever. Hit continue. The loop runs again with your fake result included.&lt;/p&gt;

&lt;p&gt;It's about 12KB of relevant logic on top of the shared SSE client I wrote about &lt;a href="https://ferhatatagun.com/blog/browser-only-claude-streaming" rel="noopener noreferrer"&gt;here&lt;/a&gt;. BYOK, no backend, your tool schemas and conversations live in &lt;code&gt;localStorage&lt;/code&gt; only. There's a demo conversation seeded on &lt;code&gt;?demo=1&lt;/code&gt; so you can see the loop run without writing tools yourself.&lt;/p&gt;

&lt;p&gt;The thing I keep noticing: the tool-lab session for any new agent takes ten to twenty minutes. The agent design that comes out of it is consistently 30-50% smaller than what I would have written from intuition. Smaller agents with fewer, more focused tools are also dramatically easier to reason about when they go wrong in production — which is the other dividend of doing the sandbox phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd recommend you do this week
&lt;/h2&gt;

&lt;p&gt;Three escalating moves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Today (10 minutes):&lt;/strong&gt; Pick an agent you're already building. Paste its tool schemas into tool-lab, send a real user message, see what happens. If the agent picks the wrong tools or uses the right ones in surprising ways, you've just learned something.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;This sprint (an afternoon):&lt;/strong&gt; Make "sandbox before implementation" the default for new agents on your team. Stand up the tool schemas first, role-play five representative prompts, then write the implementations only for tools that survived. Track the count: how many initial tools made it through.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;This quarter (a habit):&lt;/strong&gt; When something goes wrong with an agent in production — wrong tool picked, weird argument shape, infinite loop — drop the trace into the sandbox before debugging the implementation. The bug is often in the design, not the code.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Tool implementations are not the hard part of agent development. &lt;em&gt;Tool design&lt;/em&gt; is. The thing that separates teams that ship reliable agents from teams that ship agents that "mostly work" isn't the quality of their tool functions; it's how many bad tool designs they killed before writing the function.&lt;/p&gt;

&lt;p&gt;You don't need a framework for this. You don't need a vendor. You need fifteen minutes and a willingness to play the role of every tool, by hand, until you know which ones deserve to be real.&lt;/p&gt;




&lt;p&gt;I shipped this in &lt;a href="https://tool-lab-bice.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;tool-lab&lt;/strong&gt;&lt;/a&gt; — define tools, mock responses, watch the agent loop. BYOK, no backend, runs in the browser. Source: &lt;a href="https://github.com/ferhatatagun/tool-lab" rel="noopener noreferrer"&gt;github.com/ferhatatagun/tool-lab&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The same SSE client also powers three sibling tools — &lt;a href="https://claudoscope-labs.vercel.app" rel="noopener noreferrer"&gt;claudoscope&lt;/a&gt;, &lt;a href="https://agentreplay.vercel.app" rel="noopener noreferrer"&gt;agent-replay&lt;/a&gt;, &lt;a href="https://prompt-lab-promptly.vercel.app" rel="noopener noreferrer"&gt;prompt-lab&lt;/a&gt;. All open-source, all BYOK: &lt;a href="https://ferhatatagun.com/tools" rel="noopener noreferrer"&gt;ferhatatagun.com/tools&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>anthropic</category>
      <category>agents</category>
      <category>llm</category>
    </item>
    <item>
      <title>Your prompt isn't better. You just remember it being better.</title>
      <dc:creator>Ferhat Atagün</dc:creator>
      <pubDate>Mon, 08 Jun 2026 11:27:10 +0000</pubDate>
      <link>https://dev.to/ferhatatagun/your-prompt-isnt-better-you-just-remember-it-being-better-3h52</link>
      <guid>https://dev.to/ferhatatagun/your-prompt-isnt-better-you-just-remember-it-being-better-3h52</guid>
      <description>&lt;p&gt;Every developer who has shipped a Claude-powered feature has had this conversation with themselves:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"OK, the old prompt was too long, this one's tighter — &lt;em&gt;feels&lt;/em&gt; like it's giving better answers… and faster too, I think? Let's ship it."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You ship it. A week later something feels off — maybe outputs are flakier on the edge cases, maybe the bill went up, maybe a coworker tells you "the AI doesn't get it anymore." You don't remember the exact previous prompt. You don't have a baseline. You change it back. Or you don't, and live with a quiet regression for a month.&lt;/p&gt;

&lt;p&gt;I have done this maybe forty times. Most of us have. The reason isn't that prompt iteration is hard. The reason is that &lt;em&gt;evaluating&lt;/em&gt; prompt iteration is hard, and we don't have the tooling for it, so we substitute taste — which works fine until it doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"It feels better" is not data. Your sample size is one query, your memory is recent, your prior is sunk cost.&lt;/li&gt;
&lt;li&gt;The minimum useful comparison is the same input through two prompts in parallel, surfacing three numbers: output (do they say the same thing?), latency (how long did each take?), cost (how much did each spend?).&lt;/li&gt;
&lt;li&gt;Models change too — comparing GPT-style verbose system prompts on Sonnet 4.5 vs Haiku 4.5 surfaces ~10× cost differences for outputs you'd score the same.&lt;/li&gt;
&lt;li&gt;Running them in parallel makes it fair: same time of day, same API state, same input. Running them sequentially in a chat window does not.&lt;/li&gt;
&lt;li&gt;A browser-only tool can do this in 4 seconds. You don't need a benchmarking framework. You need to see them side by side.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What "vibes" actually costs
&lt;/h2&gt;

&lt;p&gt;The trap with prompt tuning is that the &lt;em&gt;only&lt;/em&gt; dimension a chat-style UI shows you is the output text. You read it, decide if it sounds right, and move on. Three things get hidden:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Latency.&lt;/strong&gt; Did this take 3 seconds or 11? You squinted, kind of remembered, but you weren't watching a stopwatch. Across a thousand production requests this difference is the gap between "snappy" and "slow."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Cost.&lt;/strong&gt; The verbose system prompt that produces beautiful structured output uses 4,000 input tokens. The terse one uses 600. They both produce ~800 output tokens. At Sonnet pricing that's the difference between $14 and $4 per thousand calls. You don't see this difference looking at one response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Output drift.&lt;/strong&gt; "Cleaner" outputs sometimes mean the model lost a useful constraint. The polite preamble you stripped out was actually doing something. The structured format you added looks neat but truncates on long inputs. Side-by-side reveals this; sequential doesn't, because you remember the gist of the previous answer, not the specifics.&lt;/p&gt;

&lt;p&gt;The whole point of A/B testing is to lift all three of these into the same field of view, on the same input, at the same time. That's it. That's the entire idea. The reason most of us don't do it is that we don't have the tool — and the friction of switching between two tabs, hitting send twice, copying output into a diff viewer, and looking up cost in the dashboard is enough to make us shrug and ship.&lt;/p&gt;

&lt;h2&gt;
  
  
  Same input, two prompts, parallel
&lt;/h2&gt;

&lt;p&gt;The mechanism is unspectacular:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;outA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;outB&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="nf"&gt;runClaude&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;promptA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nf"&gt;runClaude&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;promptB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the core. Two requests fired in parallel against the same &lt;code&gt;messages&lt;/code&gt;. The trick is that both streams are happening simultaneously — same network conditions, same API load, same time-of-day cache warmth. Sequential A→B isn't a fair comparison; if the API was congested for the first call and cached the second, you're measuring weather, not signal.&lt;/p&gt;

&lt;p&gt;What you do with the two outputs is where it gets interesting. The boring version: log both, eyeball, pick one. The version that actually works: side-by-side render, each with its own latency clock, each with its own token count and cost dollars, each with a diff highlight if you want to see exactly where they disagree.&lt;/p&gt;

&lt;p&gt;The thing I've found is that 80% of the time both prompts produce &lt;em&gt;substantively equivalent&lt;/em&gt; outputs. The reason to pick one is purely on cost or latency — there's no semantic improvement, you just got a 4× cheaper version of the same answer. The remaining 20% is where the outputs actually diverge meaningfully, and that's where eyeballs are needed, but at least now you know to look.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "better" looks like in numbers
&lt;/h2&gt;

&lt;p&gt;A concrete example from last week. I had two versions of a system prompt for a code-review tool:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version A&lt;/strong&gt; — 1,800 tokens, full taxonomy of issue types, examples for each, explicit JSON schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a senior staff engineer reviewing a pull request. For each
issue you find, classify it under one of:
- correctness (the code is wrong)
- security (the code is exploitable)
- performance (the code is slow)
- maintainability (the code is hard to read)
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Version B&lt;/strong&gt; — 280 tokens, no taxonomy, schema implied by an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review this code. For each problem, return JSON like:
[{"severity": "high"|"medium"|"low", "line": 42, "issue": "..."}]
Don't comment on style; focus on bugs and security.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same input (a 600-line Python file). Both went to Sonnet 4.5. Side-by-side run:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Version A&lt;/th&gt;
&lt;th&gt;Version B&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input tokens&lt;/td&gt;
&lt;td&gt;2,640&lt;/td&gt;
&lt;td&gt;1,120&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output tokens&lt;/td&gt;
&lt;td&gt;820&lt;/td&gt;
&lt;td&gt;740&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;5.3s&lt;/td&gt;
&lt;td&gt;3.1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$0.0202&lt;/td&gt;
&lt;td&gt;$0.0145&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Issues found&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Looking at the diff: both flagged the same 5 critical issues. Version A also flagged a &lt;code&gt;# TODO&lt;/code&gt; as a maintainability issue and split a complex function into two suggested refactors. Version B was tighter — it caught fewer minor things but every single thing it caught was actionable.&lt;/p&gt;

&lt;p&gt;I shipped B. Not because it was "better" in a soft sense; because it was 28% cheaper and 41% faster for outputs that a human would consider equivalent on the work that mattered. &lt;em&gt;That&lt;/em&gt; is what an A/B framework gives you that a chat UI doesn't: a basis for the decision that isn't "feels right."&lt;/p&gt;

&lt;p&gt;If I had only run version B sequentially after deleting version A, I would have lost the comparison and convinced myself version B was either much better or much worse than it actually was.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cross-model angle
&lt;/h2&gt;

&lt;p&gt;The same setup also surfaces something subtle that I think most teams underuse: the &lt;strong&gt;right model is also a prompt choice&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Same prompt, two models — Sonnet 4.5 vs Haiku 4.5 — on the same input:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Sonnet 4.5&lt;/th&gt;
&lt;th&gt;Haiku 4.5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;4.1s&lt;/td&gt;
&lt;td&gt;0.9s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost (input+output)&lt;/td&gt;
&lt;td&gt;$0.011&lt;/td&gt;
&lt;td&gt;$0.0008&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output quality&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For the right kind of task, that's a ~13× cost reduction with a quality drop most users would never notice in a UI. The wrong kind of task — anything requiring complex multi-step reasoning — and Haiku will whiff in ways Sonnet wouldn't, and the comparison protects you from that too. You don't have to &lt;em&gt;guess&lt;/em&gt; which kind of task you have; you can measure it on five real inputs in five minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  How prompt-lab does this
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://prompt-lab-promptly.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;prompt-lab&lt;/strong&gt;&lt;/a&gt; because the friction of A/B testing prompts in my own work was high enough that I was skipping the step and shipping by vibes. The tool's whole job is to remove that friction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two prompt panes. Paste prompt A on the left, prompt B on the right.&lt;/li&gt;
&lt;li&gt;One input pane. Type the user message once.&lt;/li&gt;
&lt;li&gt;Hit run. Both responses stream into their respective panes simultaneously.&lt;/li&gt;
&lt;li&gt;Below each pane: a small scoreboard with input tokens, output tokens, latency, cost.&lt;/li&gt;
&lt;li&gt;At the bottom: a verdict line — "A: $0.0202 / 5.3s · B: $0.0145 / 3.1s · B 28% cheaper, 41% faster."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the entire UI. It's a browser tool, BYOK, no backend. It's about 8KB of relevant logic plus the streaming client from the &lt;a href="https://medium.com/@ferhatatagun/building-a-streaming-claude-client-in-the-browser-without-the-sdk-4ce8a9407d2c" rel="noopener noreferrer"&gt;previous post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can also do same-prompt-different-model, or different-prompt-different-model. The arena doesn't care which one you're testing — you set the two columns and hit run.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd recommend you do this week
&lt;/h2&gt;

&lt;p&gt;Three steps, increasing in effort:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Today (5 minutes):&lt;/strong&gt; Open prompt-lab. Take whatever prompt your team is currently shipping. Make a shorter version of it. Run them both on three real inputs. If the shorter one wins on cost+latency with no semantic loss on the inputs you care about, you just paid for your week.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;This sprint (an afternoon):&lt;/strong&gt; Build a small eval harness. Pick 10 representative inputs that span your real traffic. Run every prompt change through them before merging. Doesn't need to be fancy — a JSON file of inputs and a script that diffs outputs is enough to catch the worst regressions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;This quarter (a project):&lt;/strong&gt; Make A/B comparison part of your prompt review process. Every PR that changes a prompt should include the run output for the same 10 inputs, with the cost and latency numbers in the description. Same energy as showing test results in a code review.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The economics of LLM apps are increasingly about prompt design and model choice. The teams that compete will be the ones that measure both. The teams that don't will keep shipping vibes-based prompt changes and wondering why the bill keeps creeping up while users complain it "feels worse."&lt;/p&gt;

&lt;p&gt;You don't need to outsmart your future self. You just need to make it possible for them to look back and know what was actually changing.&lt;/p&gt;




&lt;p&gt;I shipped this in &lt;a href="https://prompt-lab-promptly.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;prompt-lab&lt;/strong&gt;&lt;/a&gt; — two prompts side by side, BYOK, no backend, runs in the browser. Source: &lt;a href="https://github.com/ferhatatagun/prompt-lab" rel="noopener noreferrer"&gt;github.com/ferhatatagun/prompt-lab&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The same SSE client also powers three sibling tools — &lt;a href="https://claudoscope-labs.vercel.app" rel="noopener noreferrer"&gt;claudoscope&lt;/a&gt;, &lt;a href="https://agentreplay.vercel.app" rel="noopener noreferrer"&gt;agent-replay&lt;/a&gt;, &lt;a href="https://tool-lab-bice.vercel.app" rel="noopener noreferrer"&gt;tool-lab&lt;/a&gt;. All open-source, all BYOK: &lt;a href="https://ferhatatagun.com/tools" rel="noopener noreferrer"&gt;ferhatatagun.com/tools&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>anthropic</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Prompt caching is the cheapest Claude optimization. Nobody measures it.</title>
      <dc:creator>Ferhat Atagün</dc:creator>
      <pubDate>Mon, 08 Jun 2026 11:27:03 +0000</pubDate>
      <link>https://dev.to/ferhatatagun/prompt-caching-is-the-cheapest-claude-optimization-nobody-measures-it-1nga</link>
      <guid>https://dev.to/ferhatatagun/prompt-caching-is-the-cheapest-claude-optimization-nobody-measures-it-1nga</guid>
      <description>&lt;p&gt;Pull up the last week of Anthropic API bills from any team shipping a Claude-powered product. Two out of three of them are paying for context they could be reading from cache for one-tenth the price. Most of them don't know it, because the dashboard doesn't tell them and the SDKs don't either — by the time the response lands, the only number anyone looks at is &lt;code&gt;output_tokens&lt;/code&gt;, and even then mostly when something seems expensive.&lt;/p&gt;

&lt;p&gt;The information is in every response. Anthropic puts it in &lt;code&gt;usage&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;312&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cache_creation_input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4180&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cache_read_input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;187&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four numbers. The first time a cached prompt runs you pay 1.25× the input price to &lt;em&gt;write&lt;/em&gt; the cache. Every subsequent call within the TTL pays 0.1× to &lt;em&gt;read&lt;/em&gt; it. The ratio between those two lines is the difference between a $3,000/month bill and a $300/month one. And almost no one is graphing it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every Claude response carries cache-hit data in &lt;code&gt;usage&lt;/code&gt;. Most apps log it nowhere.&lt;/li&gt;
&lt;li&gt;The first call after a cache miss costs &lt;code&gt;1.25× input&lt;/code&gt; extra; every hit after costs &lt;code&gt;0.1× input&lt;/code&gt;. Break-even is two reads.&lt;/li&gt;
&lt;li&gt;The cache TTL is 5 minutes by default. A request pattern that fires once every six minutes is paying the write penalty forever and getting zero benefit.&lt;/li&gt;
&lt;li&gt;The fix is observability, not code: graph cache hit ratio over time, alert when it dips, and you'll find the bug before the invoice does.&lt;/li&gt;
&lt;li&gt;A 150-line browser tool is enough to do this for any project that streams from the Messages API.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What the four numbers actually mean
&lt;/h2&gt;

&lt;p&gt;When you send a request with &lt;code&gt;cache_control: { type: "ephemeral" }&lt;/code&gt; somewhere in your messages, the API checks if it's seen an identical prefix in the last 5 minutes. There are three outcomes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cache miss, new content.&lt;/strong&gt; The full prompt is processed normally. &lt;code&gt;input_tokens&lt;/code&gt; reflects the uncached portion; &lt;code&gt;cache_creation_input_tokens&lt;/code&gt; reflects what got written into cache for next time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache hit.&lt;/strong&gt; The cached prefix is read at 10% the price. &lt;code&gt;cache_read_input_tokens&lt;/code&gt; shows what was read; &lt;code&gt;input_tokens&lt;/code&gt; is just the new suffix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TTL expired.&lt;/strong&gt; Same shape as a miss — you pay the creation surcharge again.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So a single response tells you exactly which of these three happened. Not "approximately." Exactly. Per request. For free.&lt;/p&gt;

&lt;p&gt;The pricing math (Sonnet 4.5, June 2026) shapes up like this for a 5,000-token system prompt that gets queried once and then again four minutes later:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;First call&lt;/th&gt;
&lt;th&gt;Second call&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No caching&lt;/td&gt;
&lt;td&gt;5,000 × $3 = $0.015&lt;/td&gt;
&lt;td&gt;5,000 × $3 = $0.015&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.030&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache, hit&lt;/td&gt;
&lt;td&gt;5,000 × $3.75 = $0.019&lt;/td&gt;
&lt;td&gt;5,000 × $0.30 = $0.0015&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.020&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache, miss (TTL out)&lt;/td&gt;
&lt;td&gt;5,000 × $3.75 = $0.019&lt;/td&gt;
&lt;td&gt;5,000 × $3.75 = $0.019&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.038&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The third row is the failure mode. You enabled caching, you're paying the write penalty, and nobody's actually hitting the cache. Without measurement, this row looks identical to the second in your code — same headers, same prompt structure, same response — but it's 90% more expensive than not caching at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  How a bad cache hit ratio sneaks in
&lt;/h2&gt;

&lt;p&gt;Three patterns I've watched teams ship and then quietly bleed money over:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Per-user system prompts.&lt;/strong&gt; Someone interpolated the user's name or org ID into the system prompt to feel "personalized." Every cache write is now per-user, and unless that user fires a second request within five minutes, every call pays the creation surcharge. The fix is moving the personalization into the user message and keeping the system prompt static — but you only see this fix is needed when the hit ratio graph is flat at zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Subtly drifting prompts.&lt;/strong&gt; Maybe you append the current timestamp, maybe a "today is" line, maybe you regenerate a list of available tools that arrives in a non-deterministic order. The cache key is the exact byte sequence; one character of drift and you've invalidated the whole prefix. Tools that serialize tool definitions before sending are an especially fun source of this — &lt;code&gt;JSON.stringify&lt;/code&gt; on an object with shuffled keys produces different bytes, no hit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Wrong TTL for your traffic pattern.&lt;/strong&gt; A chatbot that gets ~one message every ten minutes has a structural mismatch with a 5-minute ephemeral cache. You're paying the write penalty on every conversation turn. Either bump to the 1-hour cache (more expensive write, way longer life) or accept that caching isn't economical for your traffic shape — but you need the data to make either decision.&lt;/p&gt;

&lt;p&gt;All three of these are invisible from a code review. They're only visible in the usage telemetry.&lt;/p&gt;

&lt;h2&gt;
  
  
  The minimum viable observability
&lt;/h2&gt;

&lt;p&gt;You don't need a metrics stack for this. You need to log four fields per request and chart them. The unhelpful version is the one most teams have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude response&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_tokens&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The version that pays for itself in one week is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hitRate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache_read_input_tokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; 
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache_read_input_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache_creation_input_tokens&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude.usage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;cache_create&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache_creation_input_tokens&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;cache_read&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache_read_input_tokens&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;hit_rate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hitRate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;cost_estimate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;estimateCost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;hit_rate&lt;/code&gt; field is the one that matters. Group by route, by model, by user-agent — whatever your traffic dimensions are. Anything trending toward zero on a cache-using endpoint is a money leak.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;cost_estimate&lt;/code&gt; is what makes the dashboard land in conversations with non-engineers. Anthropic publishes pricing per token tier; the conversion is mechanical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;estimateCost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pricing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c1"&gt;// { input, output, cache_write, cache_read }&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache_creation_input_tokens&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache_write&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache_read_input_tokens&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache_read&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Five lines of arithmetic and you've got per-request dollars on every Claude call your app makes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I built a tool for this anyway
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://claudoscope-labs.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;claudoscope&lt;/strong&gt;&lt;/a&gt; because I wanted to see this data live, while the response was streaming, without instrumenting whatever app I was iterating on. The use case is "I'm about to ship a prompt change, did my cache behavior just regress?" — the slow, expensive way is deploying it and looking at logs an hour later; the fast way is pasting the request into a tool that tells you in 4 seconds.&lt;/p&gt;

&lt;p&gt;The whole thing is a browser-only client. Bring your own key, no backend. Every event from the SSE stream is parsed and the &lt;code&gt;usage&lt;/code&gt; object is broken out into a panel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─ X-Ray ────────────────────────────────────────┐
│ input         312      $0.0009                 │
│ cache write 4,180      $0.0157  ◄─ first run  │
│ cache read      0      $0.0000                 │
│ output        187      $0.0028                 │
│ ─────────────                                  │
│ total                  $0.0194                 │
│                                                │
│ hit ratio: 0% (cold) — re-run within 5m       │
└────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hit "send" a second time within the TTL and the bars rearrange — cache write goes to zero, cache read fills, the cost number drops by 90%. It's the kind of thing that's obvious once you see it move and invisible if you don't.&lt;/p&gt;

&lt;p&gt;It's about 100KB gzipped and the source is in &lt;a href="https://github.com/ferhatatagun/claudoscope" rel="noopener noreferrer"&gt;one file&lt;/a&gt;. The pricing tier logic is in another. There's no third file.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd actually recommend you do today
&lt;/h2&gt;

&lt;p&gt;The order of operations, in increasing effort:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Right now (5 minutes):&lt;/strong&gt; Open claudoscope, paste your most expensive prompt, run it twice. Look at the difference. If the hit ratio isn't ~99% on the second call, you have a cacheability bug, not an optimization opportunity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;This week (an afternoon):&lt;/strong&gt; Add the usage logging block above to every Claude call site in your app. Ship it. Don't bother building a dashboard yet — &lt;code&gt;grep&lt;/code&gt; your logs and you'll find the worst offenders in fifteen minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;This month (a sprint):&lt;/strong&gt; Move the four &lt;code&gt;usage&lt;/code&gt; fields into your real metrics pipeline (Datadog/Honeycomb/Grafana/whatever). Graph cache hit ratio by endpoint. Alert when it drops below your floor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optional (if you're me):&lt;/strong&gt; Build the visualizer because seeing it move in real time is the thing that makes it stick.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Three out of four of those are configuration, not code. The interesting part isn't the implementation; it's that almost nobody has done it. The teams I've talked to who do have it — without exception — found a cache misconfiguration in the first week of dashboards and saved more than the work cost them. The teams who don't have it are usually paying the cache creation surcharge for nothing.&lt;/p&gt;

&lt;p&gt;The Anthropic API gives you everything you need to know whether your caching is working. The only question is whether you look.&lt;/p&gt;




&lt;p&gt;I shipped this visualization in &lt;a href="https://claudoscope-labs.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;claudoscope&lt;/strong&gt;&lt;/a&gt; — bring-your-own-key, no backend, runs in the browser. Source: &lt;a href="https://github.com/ferhatatagun/claudoscope" rel="noopener noreferrer"&gt;github.com/ferhatatagun/claudoscope&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The same SSE client also powers three sibling tools — &lt;a href="https://agentreplay.vercel.app" rel="noopener noreferrer"&gt;agent-replay&lt;/a&gt;, &lt;a href="https://prompt-lab-promptly.vercel.app" rel="noopener noreferrer"&gt;prompt-lab&lt;/a&gt;, &lt;a href="https://tool-lab-bice.vercel.app" rel="noopener noreferrer"&gt;tool-lab&lt;/a&gt;. All open-source, all BYOK: &lt;a href="https://ferhatatagun.com/tools" rel="noopener noreferrer"&gt;ferhatatagun.com/tools&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>anthropic</category>
      <category>llm</category>
      <category>observability</category>
    </item>
    <item>
      <title>Building a streaming Claude client in the browser — without the SDK</title>
      <dc:creator>Ferhat Atagün</dc:creator>
      <pubDate>Mon, 08 Jun 2026 11:26:32 +0000</pubDate>
      <link>https://dev.to/ferhatatagun/building-a-streaming-claude-client-in-the-browser-without-the-sdk-5f80</link>
      <guid>https://dev.to/ferhatatagun/building-a-streaming-claude-client-in-the-browser-without-the-sdk-5f80</guid>
      <description>&lt;p&gt;I wanted to call Claude from a browser. The Anthropic SDK said no — sort of.&lt;/p&gt;

&lt;p&gt;When I tried &lt;code&gt;import Anthropic from "@anthropic-ai/sdk"&lt;/code&gt; in a Next.js app, the bundler crashed. The error pointed at &lt;code&gt;node:fs/promises&lt;/code&gt;, deep inside the package — an agent-toolset module that reads files from disk and obviously cannot run in a browser. It isn't optional code; it's pulled in by the SDK's main client entry.&lt;/p&gt;

&lt;p&gt;So either I waited for a browser-clean entry point (eventually, maybe), or I talked to the Messages API directly. The endpoint is HTTP. The streaming format is Server-Sent Events. I'd done this for OpenAI before — how hard could it be?&lt;/p&gt;

&lt;p&gt;Turns out: about 150 lines of TypeScript for a usable client, and the result is cleaner than the SDK for the kind of tool I was building. Here's what that took and why I'd recommend it for anything browser-side that touches the Claude API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The official SDK pulls in Node-only modules and breaks browser bundles.&lt;/li&gt;
&lt;li&gt;Direct &lt;code&gt;fetch&lt;/code&gt; works once you send &lt;code&gt;anthropic-dangerous-direct-browser-access: true&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The streaming format is straightforward SSE — split events on &lt;code&gt;\n\n&lt;/code&gt;, parse &lt;code&gt;data:&lt;/code&gt; lines.&lt;/li&gt;
&lt;li&gt;The only mild gotcha is &lt;code&gt;tool_use&lt;/code&gt; blocks: their &lt;code&gt;input&lt;/code&gt; arrives as &lt;code&gt;input_json_delta&lt;/code&gt; chunks you accumulate and parse at &lt;code&gt;content_block_stop&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Hand-rolled means tiny bundle, fewer abstractions, full visibility into what the protocol is doing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The CORS unlock
&lt;/h2&gt;

&lt;p&gt;Browsers won't let you &lt;code&gt;fetch()&lt;/code&gt; &lt;code&gt;https://api.anthropic.com&lt;/code&gt; by default. Anthropic ships a flag to allow it: send &lt;code&gt;anthropic-dangerous-direct-browser-access: true&lt;/code&gt; and CORS opens up. The header's name is a warning — keys typed into a browser are visible to anyone with devtools open. For a bring-your-own-key developer tool that's fine; for a production app shipping a server-side key, it isn't.&lt;/p&gt;

&lt;p&gt;With the header in place, a minimal request looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.anthropic.com/v1/messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;content-type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-api-key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic-version&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2023-06-01&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic-dangerous-direct-browser-access&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;true&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Hello.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;stream: true&lt;/code&gt; gives back a Server-Sent Events stream. The response body is a &lt;code&gt;ReadableStream&amp;lt;Uint8Array&amp;gt;&lt;/code&gt; — chunks of bytes you decode as text. Events are delimited by a blank line; each event is a couple of lines (&lt;code&gt;event: &amp;lt;type&amp;gt;&lt;/code&gt; and &lt;code&gt;data: &amp;lt;json&amp;gt;&lt;/code&gt;), and the meaningful payload lives in &lt;code&gt;data:&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the stream actually looks like
&lt;/h2&gt;

&lt;p&gt;For a plain text response, the SSE event sequence is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;message_start&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"message_start"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;content_block_start&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"content_block_start"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content_block"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;content_block_delta&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"content_block_delta"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"delta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text_delta"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hello"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;content_block_delta&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"content_block_delta"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"delta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text_delta"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;" there"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;content_block_stop&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"content_block_stop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;message_delta&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"message_delta"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"delta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"stop_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"end_turn"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"output_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;message_stop&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"message_stop"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each &lt;code&gt;content_block_delta&lt;/code&gt; carries a partial token. Concatenate the &lt;code&gt;text&lt;/code&gt; fields per &lt;code&gt;index&lt;/code&gt; and you have the streamed message. Done — for plain text.&lt;/p&gt;

&lt;p&gt;Three things make this slightly more interesting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple content blocks per message (text plus tool_use, or several tool_use blocks).&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;tool_use&lt;/code&gt; block's &lt;code&gt;input&lt;/code&gt; arrives as a sequence of partial-JSON deltas, not all at once.&lt;/li&gt;
&lt;li&gt;Aborting cleanly when the user clicks Stop.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Parsing the stream
&lt;/h2&gt;

&lt;p&gt;The parser is small. Read chunks, accumulate them in a buffer, split on &lt;code&gt;\n\n&lt;/code&gt; (the SSE event separator), and process each event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;decoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TextDecoder&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;done&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;decoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;sep&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;sep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;indexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawEvent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sep&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sep&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dataLine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rawEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;dataLine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;evt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dataLine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;TextDecoder&lt;/code&gt; with &lt;code&gt;{ stream: true }&lt;/code&gt; matters — without it you'll get garbled UTF-8 when a multi-byte character lands on a chunk boundary. Anthropic sends a lot of em-dashes; ask me how I know.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;handle(evt)&lt;/code&gt; switches on &lt;code&gt;evt.type&lt;/code&gt; and updates state. For text-only, the only events that move the UI are &lt;code&gt;content_block_delta&lt;/code&gt; (append text to the current text block) and &lt;code&gt;message_delta&lt;/code&gt; (final usage). For a full client, I keep a &lt;code&gt;blocks: Block[]&lt;/code&gt; array indexed by &lt;code&gt;evt.index&lt;/code&gt; and mutate the matching block as deltas arrive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool use: partial-JSON deltas
&lt;/h2&gt;

&lt;p&gt;Tool calling is where this gets a little trickier. When the model decides to call a tool, you get a &lt;code&gt;content_block_start&lt;/code&gt; with &lt;code&gt;content_block: { type: "tool_use", id, name, input: {} }&lt;/code&gt; — the &lt;code&gt;input&lt;/code&gt; is empty. The arguments arrive in &lt;code&gt;content_block_delta&lt;/code&gt; events shaped like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;content_block_delta&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"content_block_delta"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"delta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"input_json_delta"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"partial_json"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;cit"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;content_block_delta&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"content_block_delta"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"delta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"input_json_delta"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"partial_json"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"y&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Ist"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can't &lt;code&gt;JSON.parse&lt;/code&gt; a partial string. So I accumulate them per block index and only parse at &lt;code&gt;content_block_stop&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolUseJson&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;content_block_start&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content_block&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool_use&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool_use&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="nx"&gt;toolUseJson&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;content_block_delta&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text_delta&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;TextBlock&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;input_json_delta&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;toolUseJson&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;partial_json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;content_block_stop&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool_use&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolUseJson&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;{}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the entire tool-use accommodation. The UI gets a clean callback when the block completes, with a parsed object as &lt;code&gt;input&lt;/code&gt; ready to render.&lt;/p&gt;

&lt;p&gt;A nice consequence of the per-block accumulation: text deltas can be rendered live — typing animation, caret blink, the whole thing — while &lt;code&gt;tool_use&lt;/code&gt; cards appear only when their input is fully assembled. That feels right. Text is conversational; tool calls are commands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Abort
&lt;/h2&gt;

&lt;p&gt;Don't skip this. A streaming request that the user has clicked Stop on should actually stop, not run to completion in the background:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ac&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AbortController&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ENDPOINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="na"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ac&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;signal&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// later, when the user clicks Stop:&lt;/span&gt;
&lt;span class="nx"&gt;ac&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;reader.read()&lt;/code&gt; throws on the next iteration after abort, and &lt;code&gt;signal.aborted&lt;/code&gt; becomes true. Catch it, distinguish it from a real error, and surface a clean "stopped" state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// ... the read loop ...&lt;/span&gt;
  &lt;span class="nx"&gt;cb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onDone&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;stopReason&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;aborted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onDone&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;stopReason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aborted&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;cb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;errorMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user gets the partial response they've already seen plus a "stopped" badge, instead of a generic crash.&lt;/p&gt;

&lt;h2&gt;
  
  
  Errors that mean something
&lt;/h2&gt;

&lt;p&gt;A 401 from the API can mean several things; a 429 can mean several things. The browser hands you a &lt;code&gt;Response&lt;/code&gt; you have to drill into. Parse the body as JSON, look for &lt;code&gt;error.message&lt;/code&gt;, fall back to status-code messages, and present something the user can act on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;readError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; · &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* fall through */&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;401 · Invalid API key.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;429 · Rate limited — wait a moment.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; · Request failed.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Boring, but the difference between "the app crashed" and "your key is invalid, fix it" is the difference between a tool and a toy.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this gets you
&lt;/h2&gt;

&lt;p&gt;The whole SSE client — request, parsing, tool use, abort, errors — fits in about 150 lines of TypeScript and ships in a browser bundle that is, in my case, around 100 KB gzipped &lt;em&gt;including&lt;/em&gt; React, Tailwind v4, Framer Motion, and the rest. The SDK alone is larger than that.&lt;/p&gt;

&lt;p&gt;The other thing it gets you is honesty. The most interesting part of working with the Claude API is the streaming behaviour — caching turning on, tokens accumulating, tool calls landing one block at a time. Hiding that behind an SDK abstraction means you have to debug the SDK before you can debug your app. With direct &lt;code&gt;fetch&lt;/code&gt;, your client &lt;em&gt;is&lt;/em&gt; the protocol, and when something goes wrong you read the SSE events as they arrive.&lt;/p&gt;

&lt;p&gt;I shipped this approach in &lt;a href="https://claudoscope-labs.vercel.app/?demo=1" rel="noopener noreferrer"&gt;&lt;strong&gt;claudoscope&lt;/strong&gt;&lt;/a&gt;, a browser-only x-ray for Claude API calls. The whole token-economics visualization — cache reads, cache writes, uncached input, output, cost delta — is computed straight from the stream events described above. No SDK, no backend, no server-side proxy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/
  app/page.tsx              orchestration
  lib/anthropic.ts          the ~150-line client from this post
  lib/pricing.ts            tier-aware cost from usage events
  components/XRayPanel.tsx  what makes the data visible
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same client now powers three sibling tools — &lt;a href="https://agentreplay.vercel.app" rel="noopener noreferrer"&gt;agent-replay&lt;/a&gt;, &lt;a href="https://prompt-lab-promptly.vercel.app" rel="noopener noreferrer"&gt;prompt-lab&lt;/a&gt;, &lt;a href="https://tool-lab-bice.vercel.app" rel="noopener noreferrer"&gt;tool-lab&lt;/a&gt; — without modification. Once the SSE parsing is yours, it composes.&lt;/p&gt;

&lt;p&gt;If you've been waiting to put the Claude API in a browser tool because the SDK fights you: it's about an afternoon's work, and the result is small, debuggable, and yours.&lt;/p&gt;




&lt;p&gt;The four tools, all open-source and BYOK: &lt;a href="https://ferhatatagun.com/tools" rel="noopener noreferrer"&gt;ferhatatagun.com/tools&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Source for the SSE client described here: &lt;a href="https://github.com/ferhatatagun/claudoscope/blob/main/src/lib/anthropic.ts" rel="noopener noreferrer"&gt;github.com/ferhatatagun/claudoscope/blob/main/src/lib/anthropic.ts&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>anthropic</category>
      <category>typescript</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
