SEN LLC

Calling the Anthropic API Directly From the Browser — A 150-Line BYOK Comparison Tool for Opus / Sonnet / Haiku

"Will Haiku do, or do I need Opus?" The fastest answer is to fire the same prompt at all three Claude models and see the answers + latency + token counts side by side. The whole tool fits in ~150 lines of browser JavaScript, no server, no proxy. The Anthropic API supports browser-direct calls via an opt-in header (anthropic-dangerous-direct-browser-access: true), and the rest is just fetch and Promise.all.

Here's the design: the guard the API normally enforces against browser calls, why the opt-in header has "dangerous" in its name, and the parallel-fetch + per-call error-handling pattern.

[Screenshot: prompt-lab UI with the API key field on top, a prompt textarea, and three side-by-side columns showing Opus 4.7 / Sonnet 4.6 / Haiku 4.5 responses to the same prompt]

🤖 Demo: https://sen.ltd/portfolio/prompt-lab/
📦 GitHub: https://github.com/sen-ltd/prompt-lab

Why the API normally rejects browser calls

By default the Anthropic API refuses cross-origin requests from a browser context: without the opt-in header, the server doesn't grant CORS permission, so the browser blocks the call before your code ever sees a response. The guard exists because calling the API from page code usually means an API key embedded in that code, readable by anyone with dev tools and by any script running on the page.

For BYOK tools — "use my own key in my own browser" — that guard is unnecessary. Anthropic added an opt-in header for exactly this case:

POST /v1/messages HTTP/1.1
Host: api.anthropic.com
x-api-key: sk-ant-...
anthropic-version: 2023-06-01
anthropic-dangerous-direct-browser-access: true
content-type: application/json

The "dangerous" in the header name is intentional: it's a flag that says "you understand this means anyone with browser dev tools can read the API key in the request, right?" In a production app shipping your own key, that would be catastrophic. In a BYOK tool the key belongs to the user, who already controls their own browser, so there's no leakage that they can't see themselves.
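For the curious, this is roughly what the rejection looks like from page code. A sketch, assuming the opt-in header is simply omitted; the exact error text varies by browser:

async function probeWithoutOptIn(apiKey) {
  try {
    // Same request as above, but without anthropic-dangerous-direct-browser-access.
    await fetch("https://api.anthropic.com/v1/messages", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify({
        model: "claude-haiku-4-5-20251001",
        max_tokens: 16,
        messages: [{ role: "user", content: "ping" }],
      }),
    });
  } catch (err) {
    // The CORS preflight is denied, so fetch rejects before any response
    // is readable (Chrome reports "TypeError: Failed to fetch").
    console.error("blocked by the browser:", err);
  }
}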

A 150-line API client

With the opt-in header in place, the rest is plain Messages API:

const ANTHROPIC_VERSION = "2023-06-01";

export function buildHeaders(apiKey) {
  return {
    "content-type": "application/json",
    "x-api-key": apiKey,
    "anthropic-version": ANTHROPIC_VERSION,
    "anthropic-dangerous-direct-browser-access": "true",
  };
}

export async function callOnce({ apiKey, model, prompt, maxTokens = 1024, fetchFn = globalThis.fetch }) {
  const body = { model, max_tokens: maxTokens, messages: [{ role: "user", content: prompt }] };
  const t0 = Date.now();
  const res = await fetchFn("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: buildHeaders(apiKey),
    body: JSON.stringify(body),
  });
  const elapsedMs = Date.now() - t0;
  if (!res.ok) {
    // Normalize every failure to "HTTP <status> — <detail>"; tolerate non-JSON error bodies.
    const detail = (await res.json().catch(() => ({}))).error?.message || "";
    throw new Error(`HTTP ${res.status}${detail ? ` — ${detail}` : ""}`);
  }
  const data = await res.json();
  // Keep only text blocks; tool_use and any other block types are dropped.
  const text = (data.content || []).filter(b => b.type === "text").map(b => b.text).join("");
  return { model, text, elapsedMs, inputTokens: data.usage?.input_tokens, outputTokens: data.usage?.output_tokens };
}

Two design choices worth pointing out:

  • fetchFn is injected as an argument, defaulting to globalThis.fetch. The tests pass a stub instead, so the whole suite runs under node --test without ever touching the real API.
  • Multi-block content is filtered to text only. The API returns content: [{type: "text", ...}, {type: "tool_use", ...}, ...] — when tools are involved you'd see non-text blocks. We .filter(b => b.type === "text") then .join("") so the text comes through cleanly regardless, as the sketch below shows.
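To make that concrete, here's an illustrative mixed response and what the filter yields (block shapes abridged):

const data = {
  content: [
    { type: "text", text: "Checking the weather. " },
    { type: "tool_use", id: "toolu_01", name: "get_weather", input: { city: "Tokyo" } },
    { type: "text", text: "It's sunny." },
  ],
};
const text = data.content.filter(b => b.type === "text").map(b => b.text).join("");
// -> "Checking the weather. It's sunny."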

Errors get a uniform HTTP <status> — <message> shape so the UI can treat any failure (401 invalid key, 429 rate limited, 500+ provider issue) the same way.

Parallel calls with independent failure

The naive Promise.all([call1, call2, call3]) rejects with the first failure, leaving the other two responses orphaned. Better: try/catch per call so each model's outcome is reported independently, and the UI fills in as each settles:

export async function callParallel({ apiKey, models, prompt, onResult, fetchFn }) {
  const tasks = models.map(async (model) => {
    try {
      const value = await callOnce({ apiKey, model, prompt, fetchFn });
      if (onResult) onResult({ model, status: "ok", value });
      return { model, status: "ok", value };
    } catch (err) {
      const error = err?.message || String(err);
      if (onResult) onResult({ model, status: "error", error });
      return { model, status: "error", error };
    }
  });
  return Promise.all(tasks);
}

This is Promise.allSettled-equivalent in spirit but with two extras: the per-task onResult callback fires the moment a model returns (so the UI fills incrementally instead of waiting for the slowest), and the result shape is unified ({model, status, value | error}) so the UI's branching is one line.
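Wiring this into the page is then a few lines. A sketch only; renderColumn and renderError are hypothetical stand-ins for the DOM code in script.js:

// renderColumn / renderError are hypothetical stand-ins for the real UI code;
// apiKey and prompt come from the form, MODELS is the app's model list.
callParallel({
  apiKey,
  models: MODELS.map(m => m.id),
  prompt,
  onResult: (r) => {
    if (r.status === "ok") {
      renderColumn(r.model, r.value);   // value: { text, elapsedMs, inputTokens, outputTokens }
    } else {
      renderError(r.model, r.error);    // error: e.g. "HTTP 429 — ..." from callOnce
    }
  },
});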

The "Opus rate-limited but Sonnet/Haiku are fine" scenario is exactly what the test pins down:

// Test-file context: node's built-in assert plus the module under test.
import assert from "node:assert";
// callParallel is exported from api.js; MODELS (the [{ id, ... }] model list)
// is assumed to be exported alongside it.
import { callParallel, MODELS } from "./api.js";

const fetchFn = async (_url, opts) => {
  const body = JSON.parse(opts.body);
  if (body.model === "claude-opus-4-7") {
    return { ok: false, status: 429, json: async () => ({ error: { message: "rate" } }) };
  }
  return { ok: true, status: 200, json: async () => ({
    content: [{ type: "text", text: "ok" }],
    usage: { input_tokens: 1, output_tokens: 1 },
  }) };
};

const results = await callParallel({ apiKey: "k", models: MODELS.map(m => m.id), prompt: "hi", fetchFn });
const byModel = Object.fromEntries(results.map(r => [r.model, r]));
assert.equal(byModel["claude-opus-4-7"].status, "error");
assert.match(byModel["claude-opus-4-7"].error, /HTTP 429/);
assert.equal(byModel["claude-sonnet-4-6"].status, "ok");
assert.equal(byModel["claude-haiku-4-5-20251001"].status, "ok");

Verifying it actually parallelises

A subtle way to lose the parallelism is to accidentally serialize the calls, e.g. by awaiting each callOnce in a for...of loop instead of starting all the promises first. A minimal sketch of the anti-pattern (not code from the repo):
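// Serialized anti-pattern: each call starts only after the previous one finishes.
const results = [];
for (const model of models) {
  results.push(await callOnce({ apiKey, model, prompt, fetchFn }));
}

The test catches that regression by giving each stubbed call a different latency and asserting on the wall-clock total: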

const fetchFn = async (_url, opts) => {
  const body = JSON.parse(opts.body);
  const delays = {
    "claude-opus-4-7": 50,    // 50ms
    "claude-sonnet-4-6": 30,
    "claude-haiku-4-5-20251001": 10,
  };
  await new Promise(r => setTimeout(r, delays[body.model]));
  return { ok: true, status: 200, json: async () => ({ /* ... */ }) };
};

const t0 = Date.now();
await callParallel({ apiKey: "k", models: MODELS.map(m => m.id), prompt: "ping", fetchFn });
const elapsed = Date.now() - t0;

assert.ok(elapsed < 80, `expected ~50ms parallel, got ${elapsed}ms`);
// Sequential would be 50 + 30 + 10 = 90 ms; parallel finishes when the slowest does.

Three concurrent fetches to the same origin sit comfortably within Chrome's classic per-origin connection limit of six (and over HTTP/2 they're multiplexed onto a single connection anyway).

Where the trade-offs show up

A comparison tool is useful precisely because most prompts get an acceptable answer from more than one model; the real question is how much quality you trade for cost and speed. Roughly:

| Dimension | Opus 4.7 | Sonnet 4.6 | Haiku 4.5 |
| --- | --- | --- | --- |
| Hardest reasoning, agentic flows | best | good | weak |
| Tool use, long-spec following | best | best | weak |
| Bulk-text tasks (summarise, classify, extract) | good | good | best (cost / latency) |
| Code review / suggestions | best | best | good |
| Typical latency | 3-5 s | 1-2 s | < 1 s |
| Token unit cost (in/out) | high | mid | low |

The tool's job is to make the comparison cheap to do: paste a prompt, click, see three columns of output with timing and token counts, and decide whether the cheaper model gets the answer right. Make that call once and it keeps paying off on every month's bill.

Takeaways

  • The anthropic-dangerous-direct-browser-access: true header is the API's opt-in for browser-direct usage. The "dangerous" name warns about putting your own key in browser code — for BYOK (user's own key) it's fine.
  • The whole API client is ~150 lines: buildRequest + buildHeaders + callOnce + callParallel.
  • Per-task try/catch inside Promise.all gives independent failure (one model failing doesn't kill the others) plus a per-result onResult callback for incremental UI fills.
  • A stub fetch with deliberately mismatched latencies lets you assert that the parallelism actually parallelises, not just that the calls succeed.
  • The product use case is "which Claude model is enough for this job?" — making a comparison cheap turns model selection from a meeting into a 5-second click.

Full source on GitHub: api.js (~150 lines + 12 stub-fetch tests), script.js (UI). MIT licensed.

Live demo — bring your own Anthropic API key. It stays in localStorage on this origin only; nothing is proxied through any server I run.
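For reference, that kind of key persistence can be as small as this. A hypothetical sketch; the storage key and element id are illustrative, not the demo's actual code:

// STORAGE_KEY and #api-key are illustrative names, not the demo's real ones.
const STORAGE_KEY = "prompt-lab:apiKey";
const input = document.querySelector("#api-key");

// Restore a previously saved key, and persist it on every edit.
input.value = localStorage.getItem(STORAGE_KEY) ?? "";
input.addEventListener("change", () => {
  localStorage.setItem(STORAGE_KEY, input.value.trim());
});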
