Sangmin Lee

Posted on • Originally published at claudeguide.io

Claude API + GraphQL Integration Guide (2026)

Originally published at claudeguide.io/claude-api-graphql-integration

To integrate the Claude API with a GraphQL server, add an Apollo Server resolver that calls the Anthropic SDK and returns the completion as a query or mutation result. Define a completeText mutation in your schema, install @anthropic-ai/sdk, and call anthropic.messages.create() inside the resolver. For real-time output, expose a textStream subscription backed by an async iterator that yields delta tokens. The full setup takes under 30 minutes and works with any Apollo Server 4 project.


GraphQL Schema for an AI Completion Endpoint

Start by defining the types your API will expose:

# schema.graphql

type CompletionResult {
  id: String!
  content: String!
  model: String!
  inputTokens: Int!
  outputTokens: Int!
}

type StreamDelta {
  text: String!
  done: Boolean!
}

type Query {
  ping: String!
}

type Mutation {
  completeText(
    prompt: String!
    model: String
    maxTokens: Int
    systemPrompt: String
  ): CompletionResult!
}

type Subscription {
  textStream(
    prompt: String!
    model: String
    systemPrompt: String
  ): StreamDelta!
}

This schema keeps AI concerns isolated. The CompletionResult type mirrors the Anthropic response so clients can log token usage for cost tracking. See Claude API Cost and Prompt Caching Break-Even for how to translate inputTokens and outputTokens into dollar amounts.
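
To make the cost-tracking idea concrete, here is a minimal sketch that turns the two token counts from a CompletionResult into a dollar estimate. The helper name `estimateCostUsd` is hypothetical, and the prices in the example call are made-up placeholders, not actual Anthropic rates; substitute the current per-million-token prices for the model you call.

```javascript
// Hypothetical helper: converts token usage into an estimated USD cost.
// Prices are expressed per million tokens and must be supplied by the caller.
function estimateCostUsd(result, inputPricePerMTok, outputPricePerMTok) {
  const inputCost = (result.inputTokens / 1_000_000) * inputPricePerMTok;
  const outputCost = (result.outputTokens / 1_000_000) * outputPricePerMTok;
  return inputCost + outputCost;
}

// Example with placeholder prices of $2 (input) and $10 (output) per million tokens.
const exampleCost = estimateCostUsd(
  { inputTokens: 500_000, outputTokens: 100_000 },
  2,
  10
);
console.log(exampleCost.toFixed(2)); // prints "2.00"
```

A client could call this on every mutation result and aggregate the totals per user or per feature.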


Apollo Server Resolver Calling Claude

Install dependencies:

npm install @apollo/server @anthropic-ai/sdk graphql

Wire up the server and resolvers:

// server.js
import { ApolloServer } from "@apollo/server";
import { startStandaloneServer } from "@apollo/server/standalone";
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "fs";

const typeDefs = readFileSync("./schema.graphql", "utf-8");
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const resolvers = {
  Query: {
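    // The source listing breaks off after "ping: () ="; the return value and
    // the closing braces below are a minimal assumed completion.
    ping: () => "pong",
  },
};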
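
A completeText resolver matching the schema above could look roughly like the following sketch. It is not the original listing: the factory name `makeCompleteText`, the default `max_tokens` of 1024, and the choice to inject the client as a parameter (so it can be stubbed in tests) are all assumptions. The default model mirrors the one used in the streaming example later in this guide.

```javascript
// Sketch of a completeText resolver. `client` is assumed to be an Anthropic SDK
// instance (new Anthropic({ apiKey })); injecting it keeps the resolver testable.
function makeCompleteText(client) {
  return async function completeText(_, { prompt, model, maxTokens, systemPrompt }) {
    const params = {
      model: model ?? "claude-sonnet-4-5",
      max_tokens: maxTokens ?? 1024, // illustrative default
      messages: [{ role: "user", content: prompt }],
    };
    // Only send a system prompt when the caller supplied one.
    if (systemPrompt) params.system = systemPrompt;

    const message = await client.messages.create(params);

    // Join all text blocks; non-text blocks (for example tool use) are skipped.
    const content = message.content
      .filter((block) => block.type === "text")
      .map((block) => block.text)
      .join("");

    // Map the SDK's snake_case usage fields onto the CompletionResult type.
    return {
      id: message.id,
      content,
      model: message.model,
      inputTokens: message.usage.input_tokens,
      outputTokens: message.usage.output_tokens,
    };
  };
}
```

It would be registered as `Mutation: { completeText: makeCompleteText(anthropic) }` in the resolver map.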

---

## Streaming Responses Over GraphQL Subscriptions

GraphQL subscriptions let you push token deltas to the client as Claude generates them. This eliminates the long wait for a full response and enables real-time chat UIs.

```js
// subscriptions.js: add to your resolvers
// Assumes `anthropic` is the client instance created in server.js.
const subscriptionResolvers = {
  Subscription: {
    textStream: {
      // An async generator is itself an async iterator, so it can serve as `subscribe`.
      subscribe: async function* (_, { prompt, model, systemPrompt }) {
        const stream = anthropic.messages.stream({
          model: model ?? "claude-sonnet-4-5",
          max_tokens: 2048,
          system: systemPrompt ?? "You are a helpful assistant.",
          messages: [{ role: "user", content: prompt }],
        });

        // Forward each text delta as soon as it arrives.
        for await (const event of stream) {
          if (
            event.type === "content_block_delta" &&
            event.delta.type === "text_delta"
          ) {
            yield { textStream: { text: event.delta.text, done: false } };
          }
        }

        // Signal completion so the client can stop listening.
        yield { textStream: { text: "", done: true } };
      },
    },
  },
};
```


For subscriptions to work you need a WebSocket-capable transport. Use `graphql-ws` with Apollo Server:

```bash
npm install graphql-ws ws @graphql-tools/schema
```

```js
// Update server setup
import { createServer } from "http";
import { WebSocketServer } from "ws";
// Import path for graphql-ws v5; newer major versions expose it as "graphql-ws/use/ws".
import { useServer } from "graphql-ws/lib/use/ws";
import { makeExecutableSchema } from "@graphql-tools/schema";

const schema = makeExecutableSchema({
  typeDefs,
  resolvers: { ...resolvers, ...subscriptionResolvers },
});

// `app` is assumed to be the Express app that serves your Apollo HTTP endpoint.
const httpServer = createServer(app);
const wsServer = new WebSocketServer({ server: httpServer, path: "/graphql" });
useServer({ schema }, wsServer);
```


Client subscription example:

```graphql
subscription {
  textStream(prompt: "Write a haiku about GraphQL.") {
    text
    done
  }
}
```
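
On the JavaScript side, consuming that subscription might look like the sketch below. It assumes a `graphql-ws` client (created with `createClient({ url })`, where the URL is your own WebSocket endpoint) and uses its `subscribe(payload, sink)` method. The helper name `collectTextStream` and the `onDelta` callback are hypothetical.

```javascript
// Collects a textStream subscription into a single string.
// `client` is assumed to be a graphql-ws client instance.
// `onDelta` is an optional callback for incremental UI updates.
function collectTextStream(client, prompt, onDelta = () => {}) {
  const query = `
    subscription ($prompt: String!) {
      textStream(prompt: $prompt) { text done }
    }
  `;

  return new Promise((resolve, reject) => {
    let full = "";
    client.subscribe(
      { query, variables: { prompt } },
      {
        // Called once per StreamDelta pushed by the server.
        next: ({ data }) => {
          const delta = data?.textStream;
          if (delta && !delta.done) {
            full += delta.text;
            onDelta(delta.text);
          }
        },
        error: reject,
        // Called when the server-side generator finishes.
        complete: () => resolve(full),
      }
    );
  });
}
```

In a chat UI, `onDelta` would append each chunk to the visible message while the returned promise resolves with the full text.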


---

## Error Handling Patterns

The Claude API errors a resolver most often has to handle fall into three categories: rate limits (429), invalid requests (400), and transient server errors (5xx). Handle each explicitly:

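```js
import Anthropic from "@anthropic-ai/sdk";

// Assumes `anthropic` is the client instance created in server.js.
async function callClaudeWithRetry(params, maxRetries = 3) {
  let lastError;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await anthropic.messages.create(params);
    } catch (err) {
      lastError = err;

      // Rate limit: exponential backoff
      if (err instanceof Anthropic.RateLimitError) {
        const delay = Math.pow(2, attempt) * 1000;
        console.warn(`Rate limited. Retrying in ${delay}ms…`);
        await new Promise((r) => setTimeout(r, delay));
        continue;
      }

      // The source listing is cut off here; the rest is a reconstruction from the three categories above.
      // Transient server error (5xx): retry with the same backoff
      if (err instanceof Anthropic.InternalServerError) {
        await new Promise((r) => setTimeout(r, Math.pow(2, attempt) * 1000));
        continue;
      }

      // Invalid request (400) and anything else: retrying will not help
      throw err;
    }
  }
  throw lastError;
}
```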

Frequently Asked Questions

Can I use Claude API with GraphQL without Apollo Server?

Yes. The Anthropic SDK is transport-agnostic: any GraphQL server that supports custom resolvers works. Alternatives include Mercurius (Fastify) and GraphQL Yoga, which runs on Node's built-in HTTP server, Express, Fastify, and other frameworks. The schema and resolver logic shown above carry over largely unchanged; with a code-first schema builder such as Pothos, the same types are declared in code instead of SDL.

How do I handle long-running Claude requests in GraphQL mutations?

A mutation is a single request/response operation: the client waits until the resolver returns. For prompts that generate more than ~2,000 tokens, switch to a subscription-based streaming pattern so the client receives deltas incrementally. Alternatively, return a job ID from the mutation and poll a separate completionJob(id: ID!) query until the result is ready.
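
The job-ID alternative can be sketched as follows. Everything here is illustrative: the factory `createJobResolvers`, the `startCompletion` mutation name, the status strings, and the in-memory `Map` are assumptions, and a real deployment would want a persistent store or queue. `client` is assumed to be an Anthropic SDK instance.

```javascript
// In-memory job store sketch for long-running completions.
function createJobResolvers(client) {
  const jobs = new Map();
  let nextId = 1;

  return {
    Mutation: {
      // Hypothetical mutation: starts the request and returns immediately.
      startCompletion: (_, { prompt }) => {
        const id = String(nextId++);
        jobs.set(id, { id, status: "PENDING", content: null, error: null });

        // Not awaited on purpose: the result is written to the store later.
        client.messages
          .create({
            model: "claude-sonnet-4-5",
            max_tokens: 4096, // illustrative limit
            messages: [{ role: "user", content: prompt }],
          })
          .then((message) => {
            const content = message.content
              .filter((block) => block.type === "text")
              .map((block) => block.text)
              .join("");
            jobs.set(id, { id, status: "DONE", content, error: null });
          })
          .catch((err) => {
            const error = String(err?.message ?? err);
            jobs.set(id, { id, status: "FAILED", content: null, error });
          });

        return jobs.get(id);
      },
    },
    Query: {
      // Polled by the client until status is DONE or FAILED.
      completionJob: (_, { id }) => jobs.get(id) ?? null,
    },
  };
}
```

The schema would need matching types for the job object and its status, which are left out here.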

Is prompt caching compatible with GraphQL resolvers?

Yes. Prompt caching is a server-side API feature, so it does not affect the GraphQL schema or resolver signature. Pass the system prompt as a content block and add cache_control: { type: "ephemeral" } to it in the Anthropic SDK call. The cache lives on Anthropic's infrastructure for five minutes by default, and the timer resets each time the cached prefix is read. Repeated resolver calls with the same system prompt will then hit the cache and reduce input token costs, provided the prompt is long enough to meet the model's minimum cacheable length.
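
As a rough illustration, the request parameters for a cached system prompt could be built like this. The helper name `buildCachedParams` and the default `max_tokens` are assumptions; the `system` array with a `cache_control` entry follows the Messages API block format.

```javascript
// Builds Messages API parameters with a cacheable system prompt.
function buildCachedParams(prompt, systemPrompt, model = "claude-sonnet-4-5") {
  return {
    model,
    max_tokens: 1024, // illustrative default
    // The system prompt is sent as a content block so it can carry cache_control.
    system: [
      {
        type: "text",
        text: systemPrompt,
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: prompt }],
  };
}
```

Inside a resolver this would be used as `anthropic.messages.create(buildCachedParams(prompt, systemPrompt))`. The response's `usage.cache_creation_input_tokens` and `usage.cache_read_input_tokens` fields indicate whether a given call wrote to or read from the cache.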

How do I secure a public-facing GraphQL AI endpoint?

Three layers are recommended: (1) depth and complexity limits to prevent runaway nested queries (Apollo Server does not enforce these by itself, so add validation rules such as graphql-depth-limit or graphql-query-complexity), (2) rate-limit the resolver by user or IP using a middleware like graphql-rate-limit, and (3) validate the prompt argument length with a custom scalar before it reaches the Anthropic SDK. Never expose your ANTHROPIC_API_KEY to the client; all Anthropic calls must happen server-side.
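
For the third layer, the length check itself is small. The sketch below uses a plain function rather than a custom scalar to stay dependency-free; the same logic could sit in a scalar's parsing hooks. The name `validatePrompt` and the 8,000-character limit are arbitrary illustrative choices, not values from the Anthropic API.

```javascript
// Arbitrary illustrative limit; tune it to your own cost and latency budget.
const MAX_PROMPT_CHARS = 8000;

// Returns the trimmed prompt, or throws if it is unusable.
function validatePrompt(prompt, maxChars = MAX_PROMPT_CHARS) {
  if (typeof prompt !== "string") {
    throw new TypeError("prompt must be a string");
  }
  const trimmed = prompt.trim();
  if (trimmed.length === 0) {
    throw new RangeError("prompt must not be empty");
  }
  if (trimmed.length > maxChars) {
    throw new RangeError(`prompt exceeds ${maxChars} characters`);
  }
  return trimmed;
}
```

Calling `validatePrompt(args.prompt)` as the first line of each resolver rejects oversized input before any tokens are billed.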

Can I use DataLoader to batch Claude API calls in GraphQL?

Not in the usual way. DataLoader coalesces the resolver calls made within the same tick into a single call to your batch function. The Anthropic Messages API takes one conversation per request (unlike the Message Batches API), so that batch function would still have to issue one request per prompt, and DataLoader does not provide the usual N+1 benefit here. For bulk processing, use the Message Batches API directly and return results asynchronously. For the Claude Code complete workflow, see Claude Code Complete Guide.
