Originally published at claudeguide.io/claude-api-graphql-integration
Claude API + GraphQL Integration Guide (2026)
To integrate the Claude API with a GraphQL server, add an Apollo Server resolver that calls the Anthropic SDK and returns the completion as a query or mutation result. Define a completeText mutation in your schema, install @anthropic-ai/sdk, and call anthropic.messages.create() inside the resolver. For real-time output, expose a textStream subscription backed by an async iterator that yields delta tokens. The full setup takes under 30 minutes and works with any Apollo Server 4 project.
## GraphQL Schema for an AI Completion Endpoint
Start by defining the types your API will expose:
# schema.graphql
type CompletionResult {
  id: String!
  content: String!
  model: String!
  inputTokens: Int!
  outputTokens: Int!
}

type StreamDelta {
  text: String!
  done: Boolean!
}

type Query {
  ping: String!
}

type Mutation {
  completeText(
    prompt: String!
    model: String
    maxTokens: Int
    systemPrompt: String
  ): CompletionResult!
}

type Subscription {
  textStream(
    prompt: String!
    model: String
    systemPrompt: String
  ): StreamDelta!
}
This schema keeps AI concerns isolated. The CompletionResult type mirrors the Anthropic response so clients can log token usage for cost tracking. See Claude API Cost and Prompt Caching Break-Even for how to translate inputTokens and outputTokens into dollar amounts.
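As a rough illustration of the cost-tracking idea, here is a minimal sketch that turns the two token counts into a dollar figure. The function name `estimateCostUsd` and the `inputPerMTok` / `outputPerMTok` fields are hypothetical, and the rates in the example call are made up; substitute the published per-million-token prices for the model you actually use.

```javascript
// Converts CompletionResult token counts into an estimated USD cost.
// `rates` holds prices per one million tokens; the caller must supply them.
function estimateCostUsd(usage, rates) {
  const inputCost = (usage.inputTokens / 1_000_000) * rates.inputPerMTok;
  const outputCost = (usage.outputTokens / 1_000_000) * rates.outputPerMTok;
  return inputCost + outputCost;
}

// Example with placeholder rates (NOT real prices).
const exampleCost = estimateCostUsd(
  { inputTokens: 1200, outputTokens: 350 },
  { inputPerMTok: 2, outputPerMTok: 10 }
);
console.log(`Estimated cost: $${exampleCost.toFixed(6)}`);
```

Keeping the rates as an argument rather than a constant means a price change only touches configuration, not resolver code.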
## Apollo Server Resolver Calling Claude
Install dependencies:
npm install @apollo/server @anthropic-ai/sdk graphql
Wire up the server and resolvers:
// server.js
import { ApolloServer } from "@apollo/server";
import { startStandaloneServer } from "@apollo/server/standalone";
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "fs";

const typeDefs = readFileSync("./schema.graphql", "utf-8");
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const resolvers = {
  Query: {
    ping: () =
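A minimal sketch of how a completeText resolver could map the Anthropic response onto the CompletionResult type. The factory name `createCompleteTextResolver`, the default `maxTokens` of 1024, and the choice to inject the client as an argument are assumptions made for illustration, not the article's original code. Injecting the client keeps the resolver testable without network access.

```javascript
// Builds a completeText resolver around an injected Anthropic-style client.
// Hypothetical helper: the name and the defaults below are illustrative.
function createCompleteTextResolver(client, defaults = {}) {
  const {
    model: defaultModel = "claude-sonnet-4-5",
    maxTokens: defaultMaxTokens = 1024,
  } = defaults;

  return async function completeText(_parent, args) {
    const { prompt, model, maxTokens, systemPrompt } = args;

    const params = {
      model: model ?? defaultModel,
      max_tokens: maxTokens ?? defaultMaxTokens,
      messages: [{ role: "user", content: prompt }],
    };
    // Only send `system` when the caller supplied one.
    if (systemPrompt) params.system = systemPrompt;

    const message = await client.messages.create(params);

    // The response `content` is an array of blocks; join the text blocks.
    const content = message.content
      .filter((block) => block.type === "text")
      .map((block) => block.text)
      .join("");

    return {
      id: message.id,
      content,
      model: message.model,
      inputTokens: message.usage.input_tokens,
      outputTokens: message.usage.output_tokens,
    };
  };
}
```

With the `anthropic` instance from server.js, this would be registered as `Mutation: { completeText: createCompleteTextResolver(anthropic) }`.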
---
## Streaming Responses Over GraphQL Subscriptions
GraphQL subscriptions let you push token deltas to the client as Claude generates them. This eliminates the long wait for a full response and enables real-time chat UIs.
// subscriptions.js: add to your resolvers.
// `anthropic` is the client instance created in server.js.
const subscriptionResolvers = {
  Subscription: {
    textStream: {
      // An async generator is already an AsyncIterable, so no PubSub is needed.
      subscribe: async function* (_, { prompt, model, systemPrompt }) {
        const stream = await anthropic.messages.stream({
          model: model ?? "claude-sonnet-4-5",
          max_tokens: 2048,
          system: systemPrompt ?? "You are a helpful assistant.",
          messages: [{ role: "user", content: prompt }],
        });
        // Forward only text deltas; other event types carry no display text.
        for await (const event of stream) {
          if (
            event.type === "content_block_delta" &&
            event.delta.type === "text_delta"
          ) {
            yield { textStream: { text: event.delta.text, done: false } };
          }
        }
        // Final sentinel so clients know the stream is complete.
        yield { textStream: { text: "", done: true } };
      },
    },
  },
};
For subscriptions to work you need a WebSocket-capable transport. Use `graphql-ws` with Apollo Server:
npm install graphql-ws ws @graphql-tools/schema
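// Update server setup
import { createServer } from "http";
import express from "express"; // assumed HTTP framework: npm install express
import { WebSocketServer } from "ws";
// Import path for graphql-ws v5; newer major versions may expose "graphql-ws/use/ws" instead.
import { useServer } from "graphql-ws/lib/use/ws";
import { makeExecutableSchema } from "@graphql-tools/schema";

const schema = makeExecutableSchema({
  typeDefs,
  resolvers: { ...resolvers, ...subscriptionResolvers },
});

// `app` is assumed to be the Express app that also serves the HTTP GraphQL endpoint.
const app = express();
const httpServer = createServer(app);
const wsServer = new WebSocketServer({ server: httpServer, path: "/graphql" });
useServer({ schema }, wsServer);
httpServer.listen(4000); // port is an arbitrary choice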
Client subscription example:
subscription {
  textStream(prompt: "Write a haiku about GraphQL.") {
    text
    done
  }
}
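On the client, each payload carries one StreamDelta, so the UI has to stitch the pieces together. The sketch below shows one way to do that, assuming your subscription client can expose results as an async iterable of `{ data: { textStream } }` objects. The names `collectTextStream` and `onDelta` are hypothetical.

```javascript
// Collects StreamDelta payloads into one string.
// `results`: async iterable of { data: { textStream: { text, done } } }.
// `onDelta`: optional callback for live UI updates (hypothetical).
async function collectTextStream(results, onDelta = () => {}) {
  let fullText = "";
  for await (const result of results) {
    const delta = result.data?.textStream;
    if (!delta) continue; // skip payloads without data (for example, errors)
    if (delta.done) break; // sentinel emitted by the resolver
    fullText += delta.text;
    onDelta(delta.text);
  }
  return fullText;
}
```

Stopping on the `done` sentinel rather than waiting for the socket to close lets the UI re-enable its input as soon as the last token arrives.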
---
## Error Handling Patterns
Claude API errors fall into three categories: rate limits (429), invalid requests (400), and transient server errors (5xx). Handle each explicitly:
import Anthropic from "@anthropic-ai/sdk";

async function callClaudeWithRetry(params, maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await anthropic.messages.create(params);
    } catch (err) {
      lastError = err;
      // Rate limit: exponential backoff
      if (err instanceof Anthropic.RateLimitError) {
        const delay = Math.pow(2, attempt) * 1000;
        console.warn(`Rate limited. Retrying in ${delay}ms…`);
        await new Promise((r) =
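A self-contained sketch of the same three-way split: retry on 429 and 5xx with exponential backoff, fail fast on everything else. The names `withRetry`, `isRetryable`, and `baseDelayMs` are illustrative, not SDK features. The default check assumes the thrown error exposes an HTTP `status` property, as the SDK's API errors do.

```javascript
// Retries `fn` with exponential backoff for retryable errors only.
async function withRetry(fn, options = {}) {
  const {
    maxRetries = 3,
    baseDelayMs = 1000,
    // Assumption: errors carry an HTTP `status` (429 or 5xx are retryable).
    isRetryable = (err) => err?.status === 429 || err?.status >= 500,
  } = options;

  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Invalid requests (400) will not succeed on retry, so fail fast.
      if (!isRetryable(err)) throw err;
      // Skip the sleep after the final attempt.
      if (attempt < maxRetries - 1) {
        const delay = 2 ** attempt * baseDelayMs;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Inside a resolver this would wrap the SDK call, for example `withRetry(() => anthropic.messages.create(params))`.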
## Frequently Asked Questions
Can I use Claude API with GraphQL without Apollo Server?
Yes. The Anthropic SDK is transport-agnostic, so any GraphQL server that supports custom resolvers works. Alternatives include Mercurius (Fastify), GraphQL Yoga (which runs on Node's built-in HTTP server or inside frameworks such as Express), and a Pothos-built schema served by any of these. The resolver logic shown above carries over regardless of which GraphQL runtime you use; only the server bootstrap and the subscription transport differ.
How do I handle long-running Claude requests in GraphQL mutations?
A mutation is a single request/response exchange: the client waits until the resolver returns. For prompts that generate more than ~2,000 tokens, switch to a subscription-based streaming pattern so the client receives deltas incrementally. Alternatively, return a job ID from the mutation and poll a separate completionJob(id: ID!) query until the result is ready.
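The job-ID alternative can be sketched with a small in-memory store. Everything here is hypothetical (`createJobStore`, `startJob`, `getJob`, and the status strings), and a real deployment would likely use a database or queue so that jobs survive restarts.

```javascript
// In-memory job store: fine for a sketch, but jobs vanish on restart.
function createJobStore() {
  const jobs = new Map();
  let nextId = 1;

  return {
    // Starts `task` (a function returning a promise) without awaiting it
    // and returns the job ID immediately.
    startJob(task) {
      const id = String(nextId++);
      jobs.set(id, { id, status: "PENDING", result: null, error: null });
      Promise.resolve()
        .then(task)
        .then((result) => {
          jobs.set(id, { id, status: "DONE", result, error: null });
        })
        .catch((err) => {
          const message = String(err?.message ?? err);
          jobs.set(id, { id, status: "FAILED", result: null, error: message });
        });
      return id;
    },
    // Backs a completionJob(id: ID!) query; returns null for unknown IDs.
    getJob(id) {
      return jobs.get(id) ?? null;
    },
  };
}
```

The mutation resolver would call `startJob(() => anthropic.messages.create(params))` and return the ID, while the completionJob query resolver simply returns `getJob(id)`.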
Is prompt caching compatible with GraphQL resolvers?
Yes. Prompt caching is a server-side API feature, so it does not affect the GraphQL schema or resolver signature at all. Pass the system prompt as an array of content blocks and add cache_control: { type: "ephemeral" } to the block you want cached. By default the cache entry lives on Anthropic's infrastructure for five minutes and is refreshed each time it is read. Repeated resolver calls with an identical system prompt then hit the cache and reduce input token costs, provided the prompt is long enough to meet the model's minimum cacheable length.
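A minimal sketch of what the request parameters could look like with caching enabled. The helper name `buildCachedParams` and the default `max_tokens` of 1024 are assumptions. The part that matters is the shape of `system`: an array of text blocks, with `cache_control` on the block to cache.

```javascript
// Builds messages.create() params with a cacheable system prompt.
function buildCachedParams({ prompt, systemPrompt, model, maxTokens }) {
  return {
    model: model ?? "claude-sonnet-4-5",
    max_tokens: maxTokens ?? 1024,
    // `system` as an array of content blocks instead of a plain string.
    system: [
      {
        type: "text",
        text: systemPrompt,
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: prompt }],
  };
}
```

To confirm that caching is taking effect, inspect the `cache_creation_input_tokens` and `cache_read_input_tokens` fields in the response's `usage` object.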
How do I secure a public-facing GraphQL AI endpoint?
Three layers are recommended: (1) enforce query depth and complexity limits to stop runaway nested queries; Apollo Server does not apply these by default, so add validation rules from packages such as graphql-depth-limit or graphql-query-complexity, (2) rate-limit the resolver by user or IP using a middleware like graphql-rate-limit, and (3) validate the prompt argument length with a custom scalar before it reaches the Anthropic SDK. Never expose your ANTHROPIC_API_KEY to the client; all Anthropic calls must happen server-side.
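To make layers (2) and (3) concrete without pulling in extra packages, here is a hedged sketch that does both checks in a plain resolver wrapper instead of a custom scalar or graphql-rate-limit. The names `guardResolver`, `maxPromptChars`, `limit`, and `windowMs`, the `context.userId` field, and the default values are all assumptions. The fixed-window counter lives in process memory, so it only suits a single-instance server.

```javascript
// Wraps a resolver with a prompt-length check and a fixed-window rate limit.
// `now` is injectable so the window logic can be tested deterministically.
function guardResolver(resolver, options = {}) {
  const {
    maxPromptChars = 8000,
    limit = 10,
    windowMs = 60_000,
    now = Date.now,
  } = options;
  const windows = new Map(); // key -> { start, count }

  return async function guarded(parent, args, context, info) {
    if (typeof args.prompt !== "string" || args.prompt.length === 0) {
      throw new Error("prompt must be a non-empty string");
    }
    if (args.prompt.length > maxPromptChars) {
      throw new Error(`prompt exceeds ${maxPromptChars} characters`);
    }

    const key = context?.userId ?? "anonymous";
    const time = now();
    const current = windows.get(key);
    if (!current || time - current.start >= windowMs) {
      windows.set(key, { start: time, count: 1 }); // open a new window
    } else if (current.count >= limit) {
      throw new Error("rate limit exceeded");
    } else {
      current.count += 1;
    }

    return resolver(parent, args, context, info);
  };
}
```

Because validation runs before the counter is touched, rejected prompts do not consume a caller's quota.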
Can I use DataLoader to batch Claude API calls in GraphQL?
DataLoader coalesces multiple load calls made within the same tick into a single batch function call, which usually becomes one network request. The Anthropic Messages API accepts one conversation per call (unlike the Message Batches API), so DataLoader does not provide the usual N+1 benefit here. For bulk processing, use the Message Batches API directly and return results asynchronously.
For the complete Claude Code workflow, see Claude Code Complete Guide.
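For the bulk-processing route mentioned in the DataLoader answer, a batch submission is a list of requests, each with a `custom_id` and the usual Messages `params`. The sketch below only builds that list. The helper name `buildBatchRequests`, the `idPrefix` option, and the defaults are assumptions.

```javascript
// Turns an array of prompts into Message Batches request entries.
function buildBatchRequests(prompts, options = {}) {
  const {
    model = "claude-sonnet-4-5",
    maxTokens = 1024,
    idPrefix = "job",
  } = options;

  return prompts.map((prompt, index) => ({
    // custom_id must be unique within a batch; it is how results are matched.
    custom_id: `${idPrefix}-${index}`,
    params: {
      model,
      max_tokens: maxTokens,
      messages: [{ role: "user", content: prompt }],
    },
  }));
}
```

The list would be submitted with `anthropic.messages.batches.create({ requests: buildBatchRequests(prompts) })`. Results arrive later and are matched back to prompts by `custom_id`, which fits the job-ID polling pattern better than a blocking mutation.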