"Will Haiku do, or do I need Opus?" The fastest answer is to fire the same prompt at all three Claude models and see the answers, latency, and token counts side by side. The whole tool fits in ~150 lines of browser JavaScript: no server, no proxy. The Anthropic API supports browser-direct calls via an opt-in header (`anthropic-dangerous-direct-browser-access: true`), and the rest is just `fetch` and `Promise.all`. Here's the design: the CSRF guard the API normally enforces, why the opt-in header has "dangerous" in its name, and the parallel-fetch + per-call error handling pattern.
🤖 Demo: https://sen.ltd/portfolio/prompt-lab/
📦 GitHub: https://github.com/sen-ltd/prompt-lab
## Why the API normally rejects browser calls
The Anthropic API rejects requests originating from a browser context as a CSRF guard. A malicious page could otherwise embed a `<form action="https://api.anthropic.com/...">` and trick the user's browser into sending requests with attached cookies / Authorization headers, letting third parties exfiltrate API keys.
For BYOK tools — "use my own key in my own browser" — that guard is unnecessary. Anthropic added an opt-in header for exactly this case:
```
POST /v1/messages HTTP/1.1
Host: api.anthropic.com
x-api-key: sk-ant-...
anthropic-version: 2023-06-01
anthropic-dangerous-direct-browser-access: true
content-type: application/json
```
The "dangerous" in the header name is intentional: it's a flag that says "you understand this means anyone with browser dev tools can read the API key in the request, right?" In a production app shipping your own key, that would be catastrophic. In a BYOK tool the key belongs to the user, who already controls their own browser, so there's no leakage that they can't see themselves.
## A 150-line API client
With the opt-in header in place, the rest is plain Messages API:
```javascript
const ANTHROPIC_VERSION = "2023-06-01";

export function buildHeaders(apiKey) {
  return {
    "content-type": "application/json",
    "x-api-key": apiKey,
    "anthropic-version": ANTHROPIC_VERSION,
    "anthropic-dangerous-direct-browser-access": "true",
  };
}

export async function callOnce({ apiKey, model, prompt, maxTokens = 1024, fetchFn = globalThis.fetch }) {
  const body = { model, max_tokens: maxTokens, messages: [{ role: "user", content: prompt }] };
  const t0 = Date.now();
  const res = await fetchFn("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: buildHeaders(apiKey),
    body: JSON.stringify(body),
  });
  const elapsedMs = Date.now() - t0;
  if (!res.ok) {
    const detail = (await res.json().catch(() => ({}))).error?.message || "";
    throw new Error(`HTTP ${res.status}${detail ? ` — ${detail}` : ""}`);
  }
  const data = await res.json();
  const text = (data.content || []).filter(b => b.type === "text").map(b => b.text).join("");
  return { model, text, elapsedMs, inputTokens: data.usage?.input_tokens, outputTokens: data.usage?.output_tokens };
}
```
Two design choices worth pointing out:
- `fetchFn` is injected as an argument, defaulting to `globalThis.fetch`. The tests pass a stub instead, so the whole suite runs under `node --test` without ever touching the real API.
- Multi-block content is filtered to text only. The API returns `content: [{type: "text", ...}, {type: "tool_use", ...}, ...]` — when tools are involved you'd see non-text blocks. We `.filter(b => b.type === "text")` then `.join("")` so the text comes through cleanly regardless.
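As a concrete example of that filter, here's a hypothetical mixed content array (the `tool_use` block is made up for illustration):

```javascript
// Hypothetical response content mixing text and tool_use blocks;
// only the text blocks survive the filter.
const content = [
  { type: "text", text: "The answer is " },
  { type: "tool_use", id: "toolu_1", name: "calculator", input: { expr: "6*7" } },
  { type: "text", text: "42." },
];

const text = content.filter((b) => b.type === "text").map((b) => b.text).join("");
console.log(text); // "The answer is 42."
```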
Errors get a uniform `HTTP <status> — <message>` shape so the UI can treat any failure (401 invalid key, 429 rate limited, 500+ provider issue) the same way.
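On the UI side, that uniform shape makes failure rendering a tiny switch. A sketch, assuming the `HTTP <status> — <message>` format above (`parseHttpStatus` and `describeFailure` are hypothetical helpers, not part of the repo):

```javascript
// Pull the numeric status out of the uniform error string, if present.
function parseHttpStatus(errorMessage) {
  const m = /^HTTP (\d{3})/.exec(errorMessage);
  return m ? Number(m[1]) : null;
}

// Map known statuses to user-facing text; anything else (e.g. a network
// failure that never produced an HTTP status) passes through verbatim.
function describeFailure(errorMessage) {
  const status = parseHttpStatus(errorMessage);
  if (status === 401) return "Invalid API key";
  if (status === 429) return "Rate limited — try again shortly";
  if (status >= 500) return "Provider issue — retry later";
  return errorMessage;
}

console.log(describeFailure("HTTP 429 — rate limit exceeded"));
```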
## Parallel calls with independent failure
The naive `Promise.all([call1, call2, call3])` rejects with the first failure, leaving the other two responses orphaned. Better: a `try/catch` per call, so each model's outcome is reported independently and the UI fills in as each call settles:
```javascript
export async function callParallel({ apiKey, models, prompt, onResult, fetchFn }) {
  const tasks = models.map(async (model) => {
    try {
      const value = await callOnce({ apiKey, model, prompt, fetchFn });
      if (onResult) onResult({ model, status: "ok", value });
      return { model, status: "ok", value };
    } catch (err) {
      const error = err?.message || String(err);
      if (onResult) onResult({ model, status: "error", error });
      return { model, status: "error", error };
    }
  });
  return Promise.all(tasks);
}
```
This is `Promise.allSettled`-equivalent in spirit, with two extras: the per-task `onResult` callback fires the moment a model returns (so the UI fills incrementally instead of waiting for the slowest), and the result shape is unified (`{model, status, value | error}`) so the UI's branching is one line.
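The settle-order behaviour can be demonstrated with a self-contained stub (model names and delays are made up; `callParallelStub` mirrors the shape of `callParallel` above, with a timer standing in for the real API call):

```javascript
// onResult fires in settle order (fastest model first), while the
// returned array preserves the input order.
async function callParallelStub({ models, onResult }) {
  const delays = { haiku: 10, sonnet: 40, opus: 80 }; // ms, hypothetical
  const tasks = models.map(async (model) => {
    await new Promise((r) => setTimeout(r, delays[model]));
    const result = { model, status: "ok", value: { text: "ok" } };
    if (onResult) onResult(result); // fires as soon as this model settles
    return result;
  });
  return Promise.all(tasks);
}

const settleOrder = [];
const results = await callParallelStub({
  models: ["opus", "sonnet", "haiku"],
  onResult: (r) => settleOrder.push(r.model),
});

console.log(settleOrder.join(","));                 // haiku,sonnet,opus
console.log(results.map((r) => r.model).join(",")); // opus,sonnet,haiku
```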
The "Opus rate-limited but Sonnet/Haiku are fine" scenario is provable in the test:
```javascript
const fetchFn = async (_url, opts) => {
  const body = JSON.parse(opts.body);
  if (body.model === "claude-opus-4-7") {
    return { ok: false, status: 429, json: async () => ({ error: { message: "rate" } }) };
  }
  return { ok: true, status: 200, json: async () => ({
    content: [{ type: "text", text: "ok" }],
    usage: { input_tokens: 1, output_tokens: 1 },
  }) };
};

const results = await callParallel({ apiKey: "k", models: MODELS.map(m => m.id), prompt: "hi", fetchFn });
const byModel = Object.fromEntries(results.map(r => [r.model, r]));
assert.equal(byModel["claude-opus-4-7"].status, "error");
assert.match(byModel["claude-opus-4-7"].error, /HTTP 429/);
assert.equal(byModel["claude-sonnet-4-6"].status, "ok");
assert.equal(byModel["claude-haiku-4-5-20251001"].status, "ok");
```
## Verifying it actually parallelises
A subtle bug in this kind of code is accidentally serializing the calls, e.g. by `await`ing each one in a `for...of` loop instead of mapping them all to promises first. The test catches it by giving each stubbed call a different latency and asserting on the wall-clock total:
```javascript
const fetchFn = async (_url, opts) => {
  const body = JSON.parse(opts.body);
  const delays = {
    "claude-opus-4-7": 50, // 50ms
    "claude-sonnet-4-6": 30,
    "claude-haiku-4-5-20251001": 10,
  };
  await new Promise(r => setTimeout(r, delays[body.model]));
  return { ok: true, status: 200, json: async () => ({ /* ... */ }) };
};

const t0 = Date.now();
await callParallel({ models: MODELS.map(m => m.id), prompt: "ping", fetchFn });
const elapsed = Date.now() - t0;
assert.ok(elapsed < 80, `expected ~50ms parallel, got ${elapsed}ms`);
// Sequential would be 50 + 30 + 10 = 90 ms; parallel finishes when the slowest does.
```
Three concurrent fetches to the same origin are well within Chrome's per-origin connection limit of six.
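For contrast, here's the serializing anti-pattern the timing test is designed to catch, as a self-contained sketch with a stubbed 30 ms task standing in for `callOnce`:

```javascript
// doTask is a stand-in for a real API call with fixed 30 ms latency.
const doTask = (model) =>
  new Promise((resolve) => setTimeout(() => resolve({ model }), 30));

// BAD: awaiting inside a for...of loop runs the calls one after another.
async function serial(models) {
  const results = [];
  for (const model of models) {
    results.push(await doTask(model)); // waits for the previous call
  }
  return results;
}

// GOOD: map to promises first, then await them all together.
const parallel = (models) => Promise.all(models.map(doTask));

const t0 = Date.now();
await serial(["a", "b", "c"]);
const serialMs = Date.now() - t0;   // ~90 ms: 3 × 30 ms back to back

const t1 = Date.now();
await parallel(["a", "b", "c"]);
const parallelMs = Date.now() - t1; // ~30 ms: bounded by the slowest

console.log({ serialMs, parallelMs });
```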
## Where the trade-offs show up
The reason a comparison tool is useful at all is that most prompts get an acceptable answer from any model. The differences are:
| Dimension | Opus 4.7 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|
| Hardest reasoning, agentic flows | best | good | weak |
| Tool use, long-spec following | best | best | weak |
| Bulk-text tasks (summarise, classify, extract) | good | good | best (cost / latency) |
| Code review / suggestions | best | best | good |
| Typical latency | 3-5 s | 1-2 s | < 1 s |
| Token unit cost (in/out) | high | mid | low |
The tool's job is to make the comparison cheap: paste a prompt, click, see three columns of output with timing and token counts, and decide whether the cheaper model gets the answer right. That decision, once made, saves a recurring monthly bill.
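To make "saves a recurring monthly bill" concrete, a back-of-envelope sketch. The per-million-token rates below are placeholders, not Anthropic's actual pricing; check the current pricing page and treat only the rough ratios as meaningful:

```javascript
// PLACEHOLDER rates (USD per million tokens) — not real Anthropic pricing.
const RATES = {
  opus:   { input: 15.0, output: 75.0 },
  sonnet: { input: 3.0,  output: 15.0 },
  haiku:  { input: 0.8,  output: 4.0 },
};

// Monthly cost in USD for a steady workload of `callsPerDay` calls,
// each consuming `inTok` input and `outTok` output tokens.
function monthlyCost(model, callsPerDay, inTok, outTok) {
  const r = RATES[model];
  const perCall = (inTok * r.input + outTok * r.output) / 1e6;
  return perCall * callsPerDay * 30;
}

// 1,000 calls/day at 500 tokens in / 300 tokens out per call:
for (const model of Object.keys(RATES)) {
  console.log(model, `$${monthlyCost(model, 1000, 500, 300).toFixed(2)}/mo`);
}
```

With these placeholder numbers the spread is roughly 20x between the top and bottom rows, which is why a 5-second comparison click can be worth real money.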
## Takeaways
- The `anthropic-dangerous-direct-browser-access: true` header is the API's opt-in for browser-direct usage. The "dangerous" name warns about putting your own key in browser code; for BYOK (user's own key) it's fine.
- The whole API client is ~150 lines: `buildRequest` + `buildHeaders` + `callOnce` + `callParallel`.
- Per-task `try/catch` inside `Promise.all` gives independent failure (one model failing doesn't kill the others) plus a per-result `onResult` callback for incremental UI fills.
- A stub `fetch` with deliberately mismatched latencies lets you assert that the parallelism actually parallelises, not just that the calls succeed.
- The product use case is "which Claude model is enough for this job?" Making the comparison cheap turns model selection from a meeting into a 5-second click.
Full source on GitHub — `api.js` (~150 lines + 12 stub-fetch tests), `script.js` (UI). MIT.
Live demo — bring your own Anthropic API key. It stays in `localStorage` on this origin only; nothing is proxied through any server I run.
