SEN LLC

Calling the Anthropic API Directly From the Browser — A 150-Line BYOK Comparison Tool for Opus / Sonnet / Haiku

"Will Haiku do, or do I need Opus?" The fastest answer is to fire the same prompt at all three Claude models and see the answers + latency + token counts side by side. The whole tool fits in ~150 lines of browser JavaScript, no server, no proxy. The Anthropic API supports browser-direct calls via an opt-in header (anthropic-dangerous-direct-browser-access: true), and the rest is just fetch and Promise.all.

Here's the design: the guard the API normally enforces against browser calls, why the opt-in header has "dangerous" in its name, and the parallel-fetch + per-call error-handling pattern.

[Screenshot: prompt-lab UI with the API key field on top, a prompt textarea, and three side-by-side columns showing Opus 4.7 / Sonnet 4.6 / Haiku 4.5 responses to the same prompt]

🤖 Demo: https://sen.ltd/portfolio/prompt-lab/
📦 GitHub: https://github.com/sen-ltd/prompt-lab

Why the API normally rejects browser calls

By default the Anthropic API refuses cross-origin requests from a browser context: without the opt-in header, the server doesn't grant CORS permission, so the browser blocks the call before your code ever sees a response. The guard exists because calling the API from page code usually means an API key embedded in that code, readable by anyone with dev tools and by any script running on the page.

For BYOK tools — "use my own key in my own browser" — that guard is unnecessary. Anthropic added an opt-in header for exactly this case:

POST /v1/messages HTTP/1.1
Host: api.anthropic.com
x-api-key: sk-ant-...
anthropic-version: 2023-06-01
anthropic-dangerous-direct-browser-access: true
content-type: application/json

The "dangerous" in the header name is intentional: it's a flag that says "you understand this means anyone with browser dev tools can read the API key in the request, right?" In a production app shipping your own key, that would be catastrophic. In a BYOK tool the key belongs to the user, who already controls their own browser, so there's no leakage that they can't see themselves.
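For the curious, this is roughly what the rejection looks like from page code. A sketch, assuming the opt-in header is simply omitted; the exact error text varies by browser:

async function probeWithoutOptIn(apiKey) {
  try {
    // Same request as above, but without anthropic-dangerous-direct-browser-access.
    await fetch("https://api.anthropic.com/v1/messages", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify({
        model: "claude-haiku-4-5-20251001",
        max_tokens: 16,
        messages: [{ role: "user", content: "ping" }],
      }),
    });
  } catch (err) {
    // The CORS preflight is denied, so fetch rejects before any response
    // is readable (Chrome reports "TypeError: Failed to fetch").
    console.error("blocked by the browser:", err);
  }
}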

A 150-line API client

With the opt-in header in place, the rest is plain Messages API:

const ANTHROPIC_VERSION = "2023-06-01";

export function buildHeaders(apiKey) {
  return {
    "content-type": "application/json",
    "x-api-key": apiKey,
    "anthropic-version": ANTHROPIC_VERSION,
    "anthropic-dangerous-direct-browser-access": "true",
  };
}

export async function callOnce({ apiKey, model, prompt, maxTokens = 1024, fetchFn = globalThis.fetch }) {
  const body = { model, max_tokens: maxTokens, messages: [{ role: "user", content: prompt }] };
  const t0 = Date.now();
  const res = await fetchFn("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: buildHeaders(apiKey),
    body: JSON.stringify(body),
  });
  const elapsedMs = Date.now() - t0;
  if (!res.ok) {
    // Normalize every failure to "HTTP <status> — <detail>"; tolerate non-JSON error bodies.
    const detail = (await res.json().catch(() => ({}))).error?.message || "";
    throw new Error(`HTTP ${res.status}${detail ? ` — ${detail}` : ""}`);
  }
  const data = await res.json();
  // Keep only text blocks; tool_use and any other block types are dropped.
  const text = (data.content || []).filter(b => b.type === "text").map(b => b.text).join("");
  return { model, text, elapsedMs, inputTokens: data.usage?.input_tokens, outputTokens: data.usage?.output_tokens };
}

Two design choices worth pointing out:

  • fetchFn is injected as an argument, defaulting to globalThis.fetch. The tests pass a stub instead, so the whole suite runs under node --test without ever touching the real API.
  • Multi-block content is filtered to text only. The API returns content: [{type: "text", ...}, {type: "tool_use", ...}, ...] — when tools are involved you'd see non-text blocks. We .filter(b => b.type === "text") then .join("") so the text comes through cleanly regardless, as the sketch below shows.
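To make that concrete, here's an illustrative mixed response and what the filter yields (block shapes abridged):

const data = {
  content: [
    { type: "text", text: "Checking the weather. " },
    { type: "tool_use", id: "toolu_01", name: "get_weather", input: { city: "Tokyo" } },
    { type: "text", text: "It's sunny." },
  ],
};
const text = data.content.filter(b => b.type === "text").map(b => b.text).join("");
// -> "Checking the weather. It's sunny."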

Errors get a uniform HTTP <status> — <message> shape so the UI can treat any failure (401 invalid key, 429 rate limited, 500+ provider issue) the same way.

Parallel calls with independent failure

The naive Promise.all([call1, call2, call3]) rejects with the first failure, leaving the other two responses orphaned. Better: try/catch per call so each model's outcome is reported independently, and the UI fills in as each settles:

export async function callParallel({ apiKey, models, prompt, onResult, fetchFn }) {
  const tasks = models.map(async (model) => {
    try {
      const value = await callOnce({ apiKey, model, prompt, fetchFn });
      if (onResult) onResult({ model, status: "ok", value });
      return { model, status: "ok", value };
    } catch (err) {
      const error = err?.message || String(err);
      if (onResult) onResult({ model, status: "error", error });
      return { model, status: "error", error };
    }
  });
  return Promise.all(tasks);
}

This is Promise.allSettled-equivalent in spirit but with two extras: the per-task onResult callback fires the moment a model returns (so the UI fills incrementally instead of waiting for the slowest), and the result shape is unified ({model, status, value | error}) so the UI's branching is one line.
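Wiring this into the page is then a few lines. A sketch only; renderColumn and renderError are hypothetical stand-ins for the DOM code in script.js:

// renderColumn / renderError are hypothetical stand-ins for the real UI code;
// apiKey and prompt come from the form, MODELS is the app's model list.
callParallel({
  apiKey,
  models: MODELS.map(m => m.id),
  prompt,
  onResult: (r) => {
    if (r.status === "ok") {
      renderColumn(r.model, r.value);   // value: { text, elapsedMs, inputTokens, outputTokens }
    } else {
      renderError(r.model, r.error);    // error: e.g. "HTTP 429 — ..." from callOnce
    }
  },
});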

The "Opus rate-limited but Sonnet/Haiku are fine" scenario is exactly what the test pins down:

// Test-file context: node's built-in assert plus the module under test.
import assert from "node:assert";
// callParallel is exported from api.js; MODELS (the [{ id, ... }] model list)
// is assumed to be exported alongside it.
import { callParallel, MODELS } from "./api.js";

const fetchFn = async (_url, opts) => {
  const body = JSON.parse(opts.body);
  if (body.model === "claude-opus-4-7") {
    return { ok: false, status: 429, json: async () => ({ error: { message: "rate" } }) };
  }
  return { ok: true, status: 200, json: async () => ({
    content: [{ type: "text", text: "ok" }],
    usage: { input_tokens: 1, output_tokens: 1 },
  }) };
};

const results = await callParallel({ apiKey: "k", models: MODELS.map(m => m.id), prompt: "hi", fetchFn });
const byModel = Object.fromEntries(results.map(r => [r.model, r]));
assert.equal(byModel["claude-opus-4-7"].status, "error");
assert.match(byModel["claude-opus-4-7"].error, /HTTP 429/);
assert.equal(byModel["claude-sonnet-4-6"].status, "ok");
assert.equal(byModel["claude-haiku-4-5-20251001"].status, "ok");

Verifying it actually parallelises

A subtle way to lose the parallelism is to accidentally serialize the calls, e.g. by awaiting each callOnce in a for...of loop instead of starting all the promises first. A minimal sketch of the anti-pattern (not code from the repo):
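// Serialized anti-pattern: each call starts only after the previous one finishes.
const results = [];
for (const model of models) {
  results.push(await callOnce({ apiKey, model, prompt, fetchFn }));
}

The test catches that regression by giving each stubbed call a different latency and asserting on the wall-clock total: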

const fetchFn = async (_url, opts) => {
  const body = JSON.parse(opts.body);
  const delays = {
    "claude-opus-4-7": 50,    // 50ms
    "claude-sonnet-4-6": 30,
    "claude-haiku-4-5-20251001": 10,
  };
  await new Promise(r => setTimeout(r, delays[body.model]));
  return { ok: true, status: 200, json: async () => ({ /* ... */ }) };
};

const t0 = Date.now();
await callParallel({ apiKey: "k", models: MODELS.map(m => m.id), prompt: "ping", fetchFn });
const elapsed = Date.now() - t0;

assert.ok(elapsed < 80, `expected ~50ms parallel, got ${elapsed}ms`);
// Sequential would be 50 + 30 + 10 = 90 ms; parallel finishes when the slowest does.

Three concurrent fetches to the same origin sit comfortably within Chrome's classic per-origin connection limit of six (and over HTTP/2 they're multiplexed onto a single connection anyway).

Where the trade-offs show up

A comparison tool is useful precisely because most prompts get an acceptable answer from more than one model; the real question is how much quality you trade for cost and speed. Roughly:

| Dimension | Opus 4.7 | Sonnet 4.6 | Haiku 4.5 |
| --- | --- | --- | --- |
| Hardest reasoning, agentic flows | best | good | weak |
| Tool use, long-spec following | best | best | weak |
| Bulk-text tasks (summarise, classify, extract) | good | good | best (cost / latency) |
| Code review / suggestions | best | best | good |
| Typical latency | 3-5 s | 1-2 s | < 1 s |
| Token unit cost (in/out) | high | mid | low |

The tool's job is to make the comparison cheap to do: paste a prompt, click, see three columns of output with timing and token counts, and decide whether the cheaper model gets the answer right. Make that call once and it keeps paying off on every month's bill.

Takeaways

  • The anthropic-dangerous-direct-browser-access: true header is the API's opt-in for browser-direct usage. The "dangerous" name warns about putting your own key in browser code — for BYOK (user's own key) it's fine.
  • The whole API client is ~150 lines: buildRequest + buildHeaders + callOnce + callParallel.
  • Per-task try/catch inside Promise.all gives independent failure (one model failing doesn't kill the others) plus a per-result onResult callback for incremental UI fills.
  • A stub fetch with deliberately mismatched latencies lets you assert that the parallelism actually parallelises, not just that the calls succeed.
  • The product use case is "which Claude model is enough for this job?" — making a comparison cheap turns model selection from a meeting into a 5-second click.

Full source on GitHub: api.js (~150 lines + 12 stub-fetch tests), script.js (UI). MIT licensed.

Live demo — bring your own Anthropic API key. It stays in localStorage on this origin only; nothing is proxied through any server I run.
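For reference, that kind of key persistence can be as small as this. A hypothetical sketch; the storage key and element id are illustrative, not the demo's actual code:

// STORAGE_KEY and #api-key are illustrative names, not the demo's real ones.
const STORAGE_KEY = "prompt-lab:apiKey";
const input = document.querySelector("#api-key");

// Restore a previously saved key, and persist it on every edit.
input.value = localStorage.getItem(STORAGE_KEY) ?? "";
input.addEventListener("change", () => {
  localStorage.setItem(STORAGE_KEY, input.value.trim());
});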
