One of the quieter architectural decisions in this project is how the three directory sites share a single Claude Haiku client. It lives in packages/shared/src/claude/index.ts and every ETL job — model summaries, game recommendations, open-source comparisons — goes through it. The interesting part isn't the singleton itself. It's the prompt caching setup and the failure path.
Why a shared library at all
Each app has its own ETL folder with its own generate-content.ts. But they all need the same two things: a consistent way to call claude-haiku-4-5-20251001, and a consistent way to handle the case where the API key isn't present (local dev, CI runs that don't need content).
Copying a new Anthropic({ apiKey }) call into three places would work, but it also means three places to update model names, three places to handle response errors, and three places where the caching setup could drift. So I extracted it.
The shared function signature is simple on purpose:
```typescript
export async function generate(opts: GenerateOptions): Promise<GenerateResult> {
```
GenerateOptions has five fields: systemPrompt, userPrompt, model, maxTokens, and cacheSystem. The caller decides whether to cache. The library handles the mechanics.
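For reference, a sketch of what those two types might look like — this is my reconstruction of the shapes described in the post, not the verbatim source:

```typescript
// Hypothetical reconstruction of the shared types; the real definitions
// live in packages/shared/src/claude/index.ts.
interface GenerateOptions {
  systemPrompt: string;
  userPrompt: string;
  model: string;         // e.g. "claude-haiku-4-5-20251001"
  maxTokens: number;
  cacheSystem?: boolean; // opt the system prompt into prompt caching
}

interface GenerateResult {
  text: string;   // the model's text response
  usage: unknown; // res.usage, surfaced but not yet read downstream
}
```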
The cacheSystem pattern
Claude's prompt caching works by marking message blocks with cache_control: { type: "ephemeral" }. When a subsequent request includes the same cached prefix within the cache's 5-minute TTL, Anthropic charges the cached-read rate for those tokens instead of the full input rate.
For batch generation — 100 models in a loop, all sharing the same system prompt — this can cut input token costs significantly. I don't have 30-day cost data yet, so I won't throw out a percentage. What I can say is that the implementation itself is a single ternary:
```typescript
const systemBlock = opts.cacheSystem
  ? [{ type: "text" as const, text: opts.systemPrompt, cache_control: { type: "ephemeral" as const } }]
  : opts.systemPrompt;
```
When cacheSystem is false, system gets a plain string and the Anthropic SDK handles it normally. When it's true, system gets a single-element array carrying the cache marker. The rest of the messages.create call is identical.
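Filled out, the whole call looks roughly like this — a minimal sketch assuming the standard @anthropic-ai/sdk client; the real module surely differs in details like error handling:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function generate(opts: GenerateOptions): Promise<GenerateResult> {
  // Cached system prompts go in as a one-element block array; uncached ones
  // stay a plain string. Everything else is the same either way.
  const systemBlock = opts.cacheSystem
    ? [{ type: "text" as const, text: opts.systemPrompt, cache_control: { type: "ephemeral" as const } }]
    : opts.systemPrompt;

  const res = await client.messages.create({
    model: opts.model,
    max_tokens: opts.maxTokens,
    system: systemBlock,
    messages: [{ role: "user", content: opts.userPrompt }],
  });

  // Plain completions come back as a single text block.
  const first = res.content[0];
  return { text: first?.type === "text" ? first.text : "", usage: res.usage };
}
```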
The callers that loop over many entries — generate-content.ts and compare.ts — both pass cacheSystem: true. One-off calls or places where the system prompt varies per request wouldn't benefit, so they'd pass false or omit it.
One thing I haven't wired up yet: logging cache_creation_input_tokens and cache_read_input_tokens from res.usage. Those numbers come back in every response and would let me see the actual hit rate. Right now I surface the usage object in GenerateResult but nothing downstream reads it. That's on the list.
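The logging itself would only be a couple of lines wherever the response comes back. The field names are what the Messages API returns on res.usage; the destructuring is my sketch:

```typescript
// Sketch: surface the cache hit rate per call. These fields appear on
// res.usage when caching is active; the logging is the missing piece.
const { input_tokens, cache_creation_input_tokens, cache_read_input_tokens } = res.usage;
console.log(
  `input=${input_tokens} cache_write=${cache_creation_input_tokens ?? 0} cache_read=${cache_read_input_tokens ?? 0}`
);
```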
Defensive JSON parsing
Every content-generation prompt asks Claude to return a JSON object with specific keys. This works well most of the time. It doesn't work 100% of the time.
The Haiku model occasionally wraps the JSON in markdown code fences, adds a leading explanation sentence, or — rarely — returns structurally valid JSON that's missing one of the expected keys. So I wrote parseOrFallback:
```typescript
function parseOrFallback(text: string, fb: GeneratedContent): GeneratedContent {
  try {
    // Grab everything from the first "{" to the last "}" — this drops
    // surrounding prose and markdown fences.
    const jsonMatch = text.match(/\{[\s\S]*\}/);
    if (!jsonMatch) return fb;
    const parsed = JSON.parse(jsonMatch[0]);
    // Validate field-by-field so one bad key doesn't discard the whole response.
    return {
      summary: parsed.summary ?? fb.summary,
      use_cases: Array.isArray(parsed.use_cases) ? parsed.use_cases : fb.use_cases,
      pros: Array.isArray(parsed.pros) ? parsed.pros : fb.pros,
      cons: Array.isArray(parsed.cons) ? parsed.cons : fb.cons,
    };
  } catch {
    return fb;
  }
}
```
The regex \{[\s\S]*\} grabs everything from the first { to the last } in the response, which strips any surrounding prose or fence markers. Then each field is validated individually: if pros is present but isn't an array, I use the fallback value for that field specifically rather than discarding the whole response.
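For example, with an illustrative messy response (not a logged one), the present fields survive and the missing ones fall back:

```typescript
// Leading prose plus a partial object — the common failure modes in one input.
const messy =
  'Sure, here is the JSON: {"summary": "A small vision model.", "pros": ["fast"]}';
const fb: GeneratedContent = { summary: "Generic summary.", use_cases: [], pros: [], cons: [] };

const content = parseOrFallback(messy, fb);
// content.summary and content.pros come from the response;
// use_cases and cons fall back field-by-field.
```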
The fallback content (fallbackContent()) generates generic but syntactically valid entries using the model name and pipeline tag. These get stored in the database with model_used = 'fallback-template' so I can query later and re-generate them when the API key is available or when the prompt improves.
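The post doesn't show fallbackContent, but the shape is roughly this — a hypothetical sketch; the real template wording and signature may differ:

```typescript
// Hypothetical sketch of the fallback generator.
function fallbackContent(modelName: string, pipelineTag: string): GeneratedContent {
  return {
    summary: `${modelName} is a ${pipelineTag} model. A curated summary is not yet available.`,
    use_cases: [`General ${pipelineTag} tasks`],
    pros: ["Listed in the directory with metadata"],
    cons: ["No generated analysis yet"],
  };
}
```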
No API key, no crash
Local development and CI jobs that don't touch content generation don't have ANTHROPIC_API_KEY set. The ETL scripts detect this with !!process.env.ANTHROPIC_API_KEY and route all rows to the fallback path instead of attempting the API call. The database still gets populated. The build still runs. No keys need to be mocked or stubbed.
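In sketch form, the routing looks something like this — models, SYSTEM_PROMPT, and upsertModelContent are stand-ins for the real ETL plumbing, not names from the repo:

```typescript
const hasKey = !!process.env.ANTHROPIC_API_KEY;
const MODEL = "claude-haiku-4-5-20251001";

for (const m of models) {
  const fb = fallbackContent(m.name, m.pipeline_tag);
  let content = fb;
  let modelUsed = "fallback-template";

  if (hasKey) {
    const res = await generate({
      systemPrompt: SYSTEM_PROMPT,
      userPrompt: `Write directory content for ${m.name}.`,
      model: MODEL,
      maxTokens: 1024,
      cacheSystem: true,
    });
    content = parseOrFallback(res.text, fb);
    modelUsed = MODEL;
  }

  // Every row lands in the database either way; fallback rows stay queryable.
  await upsertModelContent(m.id, content, modelUsed);
}
```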
This means a fresh clone of the repo can run pnpm etl and get a working site — it'll just have generic entry text instead of generated content. For prototyping new page layouts or testing the Turso connection, that's exactly the right trade-off.
The model_used column made this pattern easy to operationalize. After the first real content-generation run, I could query:
```sql
SELECT model_used, COUNT(*) FROM model_content GROUP BY model_used;
```
and immediately see how many entries were actually generated versus how many fell back to the template. That count drives whether to re-run with a higher ETL_LIMIT.
What I'd do differently
The biggest gap is the missing usage logging. I return res.usage in GenerateResult, but none of the calling scripts do anything with it. Adding a single console.log of cache_read_input_tokens per batch run would give me real data to show. I'll fix that before the month-1 cost report.
I'd also consider adding a generateBatch function that accepts an array of user prompts and works through them sequentially inside a single call. Right now generate-content.ts loops externally. Moving the loop inside would make it easier to add rate limiting and retry logic in one place rather than per-caller.
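A rough shape for that, with assumed names — the pacing and retry policy here is illustrative, not a decided design:

```typescript
// Hypothetical batch wrapper: sequential requests, pacing and retries in one place.
export async function generateBatch(
  userPrompts: string[],
  base: Omit<GenerateOptions, "userPrompt">,
  { delayMs = 500, retries = 2 } = {}
): Promise<GenerateResult[]> {
  const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
  const results: GenerateResult[] = [];
  for (const userPrompt of userPrompts) {
    for (let attempt = 0; ; attempt++) {
      try {
        results.push(await generate({ ...base, userPrompt }));
        break;
      } catch (err) {
        if (attempt >= retries) throw err;
        await sleep(delayMs * (attempt + 1)); // linear backoff before retrying
      }
    }
    await sleep(delayMs); // pace between prompts to stay under rate limits
  }
  return results;
}
```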
The singleton Anthropic client is fine for the current scale — one ETL run at a time, no concurrency. If I ever run two ETL scripts in the same process simultaneously, the singleton is still safe because the Anthropic SDK is stateless between requests. But it's worth keeping in mind.
Finally: the system prompts are hardcoded strings inside each generate-content.ts. If I were building this again, I'd probably move them to a prompts/ directory as plain .txt files. Easier to diff, easier for non-TypeScript people to read, easier to add versioning later.
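Loading them would be a one-liner per prompt — a sketch assuming a CommonJS build and a hypothetical prompts/ layout:

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical layout: etl/prompts/model-summary.txt next to the script.
const SYSTEM_PROMPT = readFileSync(join(__dirname, "prompts", "model-summary.txt"), "utf8");
```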
The pattern works well for what it is — a lightweight, shared abstraction that keeps caching consistent and keeps local dev functional. Month 1 numbers will tell whether the caching is actually saving meaningful cost, but the defensive parsing has already prevented at least a few bad database rows on the first content run.
Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.