Claude API Patterns: Streaming, Tool Use, and Prompt Caching

Three implementation patterns for calling the Claude API from a Supabase Edge Function.

1. Streaming Responses

Essential for long-form generation (blog drafts, summaries). Dramatically reduces
perceived latency — users see the first tokens in under a second.

// supabase/functions/ai-assistant/index.ts
import Anthropic from "npm:@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: Deno.env.get("ANTHROPIC_API_KEY") });

Deno.serve(async (req) => {
  const { prompt } = await req.json();

  const stream = await client.messages.stream({
    model: "claude-haiku-4-5",
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        if (
          chunk.type === "content_block_delta" &&
          chunk.delta.type === "text_delta"
        ) {
          // JSON-encode each delta so newlines in the generated text
          // can't break the SSE "data:" line framing.
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify(chunk.delta.text)}\n\n`)
          );
        }
      }
      controller.enqueue(encoder.encode("data: [DONE]\n\n"));
      controller.close();
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
});
// Flutter: receive SSE and render in real time.
// Requires: import 'dart:convert'; and import 'package:http/http.dart' as http;
final client = http.Client();
final request = http.Request('POST', Uri.parse(efUrl));
request.headers['Content-Type'] = 'application/json';
// Edge Functions verify a Supabase JWT by default; pass your anon key
// (supabaseAnonKey here stands for whatever variable holds it in your app).
request.headers['Authorization'] = 'Bearer $supabaseAnonKey';
request.body = jsonEncode({'prompt': userPrompt});
final response = await client.send(request);

response.stream
  .transform(utf8.decoder)
  .transform(const LineSplitter())
  .listen((line) {
    if (line.startsWith('data: ') && line != 'data: [DONE]') {
      // Each data line carries a JSON-encoded string (see the server side).
      setState(() => _output += jsonDecode(line.substring(6)) as String);
    }
  });

2. Tool Use (Function Calling)

Let the model trigger database lookups, calculations, or API calls.
The foundation for task-management AI, Q&A bots, and agentic features.

const tools: Anthropic.Tool[] = [
  {
    name: "search_tasks",
    description: "Search the user's task list",
    input_schema: {
      type: "object" as const,
      properties: {
        query: { type: "string", description: "Search keyword" },
        status: {
          type: "string",
          enum: ["pending", "done", "all"],
        },
      },
      required: ["query"],
    },
  },
];

const response = await client.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 1024,
  tools,
  messages: [{ role: "user", content: "What tasks are due today?" }],
});

if (response.stop_reason === "tool_use") {
  const toolUse = response.content.find((b) => b.type === "tool_use");
  if (toolUse && toolUse.type === "tool_use") {
    const results = await searchTasks(toolUse.input as SearchInput);
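    // Second round trip: hand the tool result back so the model can
    // compose its final answer.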
    const final = await client.messages.create({
      model: "claude-haiku-4-5",
      max_tokens: 1024,
      tools,
      messages: [
        { role: "user", content: "What tasks are due today?" },
        { role: "assistant", content: response.content },
        {
          role: "user",
          content: [
            {
              type: "tool_result",
              tool_use_id: toolUse.id,
              content: JSON.stringify(results),
            },
          ],
        },
      ],
    });
    return final.content[0].type === "text" ? final.content[0].text : "";
  }
}
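The snippet above assumes a searchTasks helper and a SearchInput type that it doesn't define. A minimal sketch backed by a Supabase table might look like this (the tasks table, its columns, and the query logic are hypothetical, not from the original function):

// Hypothetical helper backing the search_tasks tool.
// Table and column names ("tasks", "title", "status", "due_date")
// are illustrative assumptions.
import { createClient } from "npm:@supabase/supabase-js@2";

interface SearchInput {
  query: string;
  status?: "pending" | "done" | "all";
}

// SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are injected into
// Edge Functions automatically.
const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
);

async function searchTasks({ query, status = "all" }: SearchInput) {
  // Case-insensitive substring match on the task title.
  let builder = supabase
    .from("tasks")
    .select("id, title, status, due_date")
    .ilike("title", `%${query}%`);
  if (status !== "all") builder = builder.eq("status", status);
  const { data, error } = await builder;
  if (error) throw error;
  return data;
}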

3. Prompt Caching

When the same long system prompt is reused across many requests, caching cuts
the input cost of that prompt by up to 90% (cache reads are billed at one
tenth of the standard input rate).

const response = await client.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 512,
  system: [
    {
      type: "text",
      text: longSystemPrompt,      // FAQ content, docs, etc.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: userQuestion }],
});

// usage reports cache_creation_input_tokens and cache_read_input_tokens;
// cache_read_input_tokens > 0 means a cache hit
console.log(response.usage);

Cost comparison (claude-haiku-4-5):

Standard:       $1.00 / 1M input tokens
Cache write:    $1.25 / 1M  (+25%)
Cache read:     $0.10 / 1M  (−90%)
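To see what that means in practice (my arithmetic, assuming the rates above): a 10,000-token system prompt sent 1,000 times costs 10M tokens × $1.00/1M = $10.00 without caching. With caching it costs one cache write (≈ $0.0125) plus 999 cache reads (≈ $1.00), about $1.01 total, provided requests arrive often enough to keep the 5-minute ephemeral cache warm.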

Always enable caching for FAQ bots, CS bots, or any pattern that sends
the same long context repeatedly. One caveat: prompts below a model-specific
minimum length (a few thousand tokens for Haiku models) are not cached at all,
so very short system prompts see no benefit.

Summary

Streaming     → long-form generation / chat UX (perceived latency −90%)
Tool use      → connect AI to DB, calculations, external APIs
Prompt cache  → same system prompt reused → up to 90% cost reduction

Edge Functions run on Deno. Import npm:@anthropic-ai/sdk directly — no build step.
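If you want to try these snippets, the standard Supabase CLI flow applies: set the key with supabase secrets set ANTHROPIC_API_KEY=... and ship the function with supabase functions deploy ai-assistant.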
