My AI platform worked perfectly. Lambda tools, ECS agents, streaming responses. But integrating it into frontend applications? Pure agony.
Raw HTTP calls scattered everywhere. Manual JSON parsing breaking on edge cases. Zero type safety, so temperature: "0.5" passed silently instead of temperature: 0.5. Error handling that assumed network calls never fail.
Three developers tried to integrate the platform. All three gave up on streaming because parsing Server-Sent Events manually is a nightmare.
I needed an SDK that followed one rule: If your SDK needs documentation beyond IntelliSense, your SDK is wrong.
Why Build an SDK at All?
Working with AI APIs through raw HTTP is painful for several reasons:
Type Safety: Without types, developers make mistakes. They send temperature: "0.5" instead of temperature: 0.5 and wonder why their responses are weird.
Streaming: Server-Sent Events are a nightmare to implement correctly. I've seen developers give up on streaming entirely rather than deal with parsing SSE chunks.
Error Handling: AI APIs fail in creative ways. Network timeouts, rate limits, model overloads, context length exceeded - each needs different handling strategies.
Authentication: Managing API keys, rotating tokens, handling BYOK (Bring Your Own Key) scenarios.
Discoverability: Without good IntelliSense, developers resort to copy-pasting from docs and hope for the best.
I've used dozens of API SDKs over the years. The good ones feel invisible - you just write code and it works. The bad ones require constant trips to documentation.
Contract-First Development with OpenAPI
The first decision: generate everything from OpenAPI specs instead of hand-writing types. I learned this lesson the hard way when our API evolved and the SDK fell behind.
Here's my OpenAPI spec for the core completion endpoint:
```yaml
/v1/complete:
  post:
    summary: Generate AI completion
    requestBody:
      required: true
      content:
        application/json:
          schema:
            type: object
            properties:
              messages:
                type: array
                items:
                  type: object
                  properties:
                    role:
                      type: string
                      enum: [system, user, assistant]
                    content:
                      type: string
              provider:
                type: string
                enum: [openai, anthropic, bedrock]
              model:
                type: string
              temperature:
                type: number
                minimum: 0
                maximum: 2
              stream:
                type: boolean
                default: false
    responses:
      '200':
        description: Completion response
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CompletionResponse'
          text/event-stream:
            schema:
              type: string
```
I validate the spec with @apidevtools/swagger-parser, then generate TypeScript interfaces with OpenAPI Generator:

```bash
npx @openapitools/openapi-generator-cli generate \
  -i openapi.yaml \
  -g typescript-fetch \
  -o src/generated \
  --additional-properties=typescriptThreePlus=true
```
This keeps the TypeScript types in lockstep with the API: regenerate after every spec change and the compiler flags any drift.
SDK Design Philosophy: Options Over Builders
I had to choose between two common patterns:
Builder Pattern:
```typescript
const response = await client
  .complete()
  .withModel('gpt-4')
  .withTemperature(0.7)
  .withStream(true)
  .execute();
```
Options Object:
```typescript
const response = await client.complete({
  model: 'gpt-4',
  temperature: 0.7,
  stream: true
});
```
I chose options objects for several reasons:
- Destructuring support - you can spread configuration objects
- Conditional parameters - easier to build options dynamically
- Less cognitive overhead - one method call instead of a chain
- Better TypeScript inference - the compiler can validate the entire options object at once
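To make those advantages concrete, here's a small sketch of how options objects compose with spreads and conditionals — something a builder chain can't do. `CompleteOptions`, `defaults`, and `buildRequest` are illustrative names, not SDK exports:

```typescript
// Illustrative only: these names are hypothetical, not part of the SDK.
interface CompleteOptions {
  model: string;
  temperature?: number;
  stream?: boolean;
  maxTokens?: number;
}

const defaults: CompleteOptions = { model: 'gpt-4', temperature: 0.7 };

function buildRequest(
  overrides: Partial<CompleteOptions>,
  longForm: boolean
): CompleteOptions {
  return {
    ...defaults,
    ...overrides,
    // Conditional parameters fall out naturally with object spread
    ...(longForm ? { maxTokens: 4096 } : {})
  };
}

const req = buildRequest({ temperature: 0.2 }, true);
console.log(req); // { model: 'gpt-4', temperature: 0.2, maxTokens: 4096 }
```

The compiler validates the whole merged object at once, and the same pattern works for per-environment or per-tenant configuration objects.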
Here's the core client interface:
```typescript
import createClient from 'openapi-fetch';
import type { paths } from './generated/schema'; // generated from openapi.yaml
import type { CompletionRequest, CompletionResponse, EmbeddingRequest, EmbeddingResponse } from './types';
import { parseSSEStream } from './streaming';

export interface AIGatewayOptions {
  baseUrl: string;
  apiKey?: string;
  timeout?: number;
}

export class AIGateway {
  private client: ReturnType<typeof createClient<paths>>;
  private baseUrl: string;
  private apiKey?: string;
  private timeout: number;

  constructor(options: AIGatewayOptions) {
    this.baseUrl = options.baseUrl.replace(/\/$/, '');
    this.apiKey = options.apiKey;
    this.timeout = options.timeout ?? 30000;
    this.client = createClient<paths>({
      baseUrl: this.baseUrl,
      headers: {
        'Content-Type': 'application/json',
        ...(this.apiKey ? { 'X-API-Key': this.apiKey } : {})
      }
    });
  }

  async complete(req: CompletionRequest): Promise<CompletionResponse> {
    const { data, error } = await this.client.POST('/v1/complete', {
      body: { ...req, stream: false },
      signal: AbortSignal.timeout(this.timeout)
    });
    if (error) throw new Error(`Gateway error: ${JSON.stringify(error)}`);
    return data as CompletionResponse;
  }

  async *stream(req: Omit<CompletionRequest, 'stream'>): AsyncGenerator<string> {
    const headers: Record<string, string> = {
      'Content-Type': 'application/json'
    };
    if (this.apiKey) {
      headers['X-API-Key'] = this.apiKey;
    }
    const response = await fetch(`${this.baseUrl}/v1/complete`, {
      method: 'POST',
      headers,
      body: JSON.stringify({ ...req, stream: true })
    });
    if (!response.ok) {
      const err = await response.json().catch(() => ({ error: response.statusText }));
      throw new Error(`Gateway error ${response.status}: ${err.error}`);
    }
    yield* parseSSEStream(response);
  }

  async embed(req: EmbeddingRequest): Promise<EmbeddingResponse> {
    const { data, error } = await this.client.POST('/v1/embed', {
      body: req,
      signal: AbortSignal.timeout(this.timeout)
    });
    if (error) throw new Error(`Gateway error: ${JSON.stringify(error)}`);
    return data as EmbeddingResponse;
  }
}
```
```typescript
export interface CompletionRequest {
  messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>;
  model?: string;
  temperature?: number;
  maxTokens?: number;
  provider?: 'openai' | 'anthropic' | 'bedrock';
  apiKey?: string; // BYOK support
}

export interface CompletionResponse {
  id: string;
  content: string;
  provider: string;
  model: string;
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
  metadata: {
    latency: number;
    cost?: number;
    region: string;
  };
}
```
Streaming That Actually Works
Streaming AI responses is hard to get right. I've seen developers give up because they couldn't parse Server-Sent Events correctly. My SDK handles all the complexity:
```typescript
// streaming.ts - Server-Sent Event parsing
export async function* parseSSEStream(response: Response): AsyncGenerator<string> {
  const reader = response.body?.getReader();
  if (!reader) {
    throw new Error('Response body is not readable');
  }

  const decoder = new TextDecoder();
  let buffer = '';

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() || '';

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6).trim();
          if (data === '[DONE]') {
            return;
          }
          try {
            const parsed = JSON.parse(data);
            if (parsed.content) {
              yield parsed.content;
            }
          } catch {
            // Skip malformed JSON chunks
            continue;
          }
        }
      }
    }
  } finally {
    reader.releaseLock();
  }
}
```
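Because the parser buffers partial lines, it survives chunk boundaries that split an event mid-JSON. Here's a self-contained sanity check — a condensed copy of the parser fed a synthetic stream, assuming Node 18+ where `Response` and `ReadableStream` are globals:

```typescript
// Condensed copy of the article's parseSSEStream, for a runnable demo
async function* parseSSE(response: Response): AsyncGenerator<string> {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep the trailing partial line
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6).trim();
      if (data === '[DONE]') return;
      try {
        const parsed = JSON.parse(data);
        if (parsed.content) yield parsed.content;
      } catch { /* skip malformed chunks */ }
    }
  }
}

async function collect(): Promise<string> {
  const enc = new TextEncoder();
  const body = new ReadableStream({
    start(controller) {
      // The first chunk deliberately ends mid-JSON to exercise the buffering
      controller.enqueue(enc.encode('data: {"content":"Hel'));
      controller.enqueue(enc.encode('lo"}\n\ndata: {"content":" world"}\n\ndata: [DONE]\n\n'));
      controller.close();
    }
  });
  const chunks: string[] = [];
  for await (const c of parseSSE(new Response(body))) chunks.push(c);
  return chunks.join('');
}

collect().then(text => console.log(text)); // prints "Hello world"
```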
The `stream` method on `AIGateway` shown earlier does the rest: it POSTs with `stream: true` and hands the raw `Response` to `parseSSEStream`.
Usage becomes trivial:
```typescript
// Stream a completion
for await (const chunk of client.stream({
  messages: [{ role: 'user', content: 'Write a story about...' }],
  model: 'gpt-4'
})) {
  process.stdout.write(chunk); // chunks are plain strings
}
```
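In a frontend, the usual pattern is to accumulate chunks into a growing string and re-render on each update. A minimal sketch — `streamToText` is a hypothetical helper, not an SDK export, and `onUpdate` stands in for a React `setState` or similar callback:

```typescript
// `streamToText` is a hypothetical helper, not part of the SDK.
async function streamToText(
  chunks: AsyncIterable<string>,
  onUpdate: (text: string) => void
): Promise<string> {
  let text = '';
  for await (const chunk of chunks) {
    text += chunk;
    onUpdate(text); // re-render with the partial response
  }
  return text;
}

// Stand-in for client.stream(...) so the example runs offline
async function* fakeStream() {
  yield 'Hello';
  yield ', ';
  yield 'world';
}

const finalText = streamToText(fakeStream(), t => console.log(t));
finalText.then(text => console.log(`final: ${text}`)); // "final: Hello, world"
```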
Error Handling That Makes Sense
AI APIs fail in predictable ways. Instead of generic HTTP errors, I provide specific error types with actionable information:
```typescript
export abstract class AIGatewayError extends Error {
  abstract readonly code: string;
  abstract readonly retryable: boolean;

  constructor(
    message: string,
    public readonly statusCode?: number,
    public readonly details?: Record<string, unknown>
  ) {
    super(message);
    this.name = this.constructor.name;
  }
}

export class RateLimitError extends AIGatewayError {
  readonly code = 'RATE_LIMIT_EXCEEDED';
  readonly retryable = true;

  constructor(
    public readonly retryAfter: number,
    details?: Record<string, unknown>
  ) {
    super(`Rate limit exceeded. Retry after ${retryAfter} seconds.`, 429, details);
  }
}

export class ContextLengthError extends AIGatewayError {
  readonly code = 'CONTEXT_LENGTH_EXCEEDED';
  readonly retryable = false;

  constructor(
    public readonly maxTokens: number,
    public readonly actualTokens: number
  ) {
    super(`Context length exceeded: ${actualTokens} > ${maxTokens} tokens`, 400);
  }
}

export class ModelUnavailableError extends AIGatewayError {
  readonly code = 'MODEL_UNAVAILABLE';
  readonly retryable = true;

  constructor(public readonly model: string) {
    super(`Model ${model} is currently unavailable`, 503);
  }
}

export class BudgetExceededError extends AIGatewayError {
  readonly code = 'BUDGET_EXCEEDED';
  readonly retryable = false;

  constructor(
    public readonly currentSpend: number,
    public readonly limit: number
  ) {
    super(`Monthly budget exceeded: $${currentSpend} > $${limit}`, 402);
  }
}
```
The HTTP client automatically converts API errors to typed exceptions:
```typescript
private async handleResponse<T>(response: Response): Promise<T> {
  const payload = await response.json();
  if (response.status >= 400) {
    switch (payload.code) {
      case 'rate_limit_exceeded':
        throw new RateLimitError(payload.retry_after, payload);
      case 'context_length_exceeded':
        throw new ContextLengthError(payload.max_tokens, payload.actual_tokens);
      case 'model_unavailable':
        throw new ModelUnavailableError(payload.model);
      case 'budget_exceeded':
        throw new BudgetExceededError(payload.current_spend, payload.limit);
      default:
        throw new Error(`Gateway error ${response.status}: ${payload.message}`);
    }
  }
  return payload as T;
}
```
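Callers can then branch on error types with `instanceof` instead of parsing message strings. A condensed sketch — the error classes are re-declared minimally here so the example is self-contained, and `describeFailure` is an illustrative caller-side helper, not part of the SDK:

```typescript
// Minimal re-declarations of the error classes above, for a runnable example
class RateLimitError extends Error {
  readonly retryable = true;
  constructor(public readonly retryAfter: number) {
    super(`Rate limit exceeded. Retry after ${retryAfter} seconds.`);
  }
}

class ContextLengthError extends Error {
  readonly retryable = false;
  constructor(
    public readonly maxTokens: number,
    public readonly actualTokens: number
  ) {
    super(`Context length exceeded: ${actualTokens} > ${maxTokens} tokens`);
  }
}

// `describeFailure` is a hypothetical helper showing instanceof dispatch
function describeFailure(error: unknown): string {
  if (error instanceof RateLimitError) {
    return `retry in ${error.retryAfter}s`;
  }
  if (error instanceof ContextLengthError) {
    return `trim ${error.actualTokens - error.maxTokens} tokens`;
  }
  return 'unexpected error';
}

console.log(describeFailure(new RateLimitError(30)));             // "retry in 30s"
console.log(describeFailure(new ContextLengthError(8192, 9000))); // "trim 808 tokens"
```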
Retry Logic with Exponential Backoff
Retries are built into the SDK with sensible defaults:
```typescript
export class RetryHandler {
  constructor(
    private maxRetries: number = 3,
    private baseDelay: number = 1000,
    private maxDelay: number = 30000
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
      try {
        return await fn();
      } catch (error) {
        lastError = error;
        if (error instanceof AIGatewayError && !error.retryable) {
          throw error; // Don't retry non-retryable errors
        }
        if (attempt === this.maxRetries) {
          throw error; // Last attempt failed
        }
        const delay = Math.min(
          this.baseDelay * Math.pow(2, attempt),
          this.maxDelay
        );
        if (error instanceof RateLimitError) {
          // Respect the API's rate limit guidance
          await this.sleep(error.retryAfter * 1000);
        } else {
          await this.sleep(delay);
        }
      }
    }
    throw lastError;
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
```
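To see the behaviour in isolation, here's a condensed version of the same logic exercised against a function that fails twice before succeeding. `retryAsync` is an illustrative standalone function, with millisecond delays so it runs instantly:

```typescript
// `retryAsync` condenses RetryHandler.execute for illustration only
async function retryAsync<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelay = 10
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error; // out of attempts
      // Exponential backoff: 10ms, 20ms, 40ms, ...
      await new Promise(r => setTimeout(r, baseDelay * 2 ** attempt));
    }
  }
}

let calls = 0;
const flaky = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error('transient failure');
  return 'ok';
};

const result = retryAsync(flaky);
result.then(r => console.log(`${r} after ${calls} calls`)); // "ok after 3 calls"
```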
BYOK: Bring Your Own Key
Many users want to use their own API keys for cost control or compliance. The SDK supports this seamlessly:
```typescript
// Global API key (uses platform credits)
const client = new AIPlatformClient({
  apiKey: 'your-platform-key'
});

// Per-request API key (BYOK)
await client.complete({
  messages: [{ role: 'user', content: 'Hello' }],
  provider: 'openai',
  apiKey: 'sk-user-openai-key' // Use the user's own OpenAI key
});

// Environment-based keys
const envClient = new AIPlatformClient({
  apiKeys: {
    openai: process.env.OPENAI_API_KEY,
    anthropic: process.env.ANTHROPIC_API_KEY
  }
});
```
The platform routes BYOK requests directly to the provider, so users get:
- Their own rate limits
- Direct billing relationship
- Full control over their keys
- Same SDK interface
Testing with Mock Providers
Testing AI applications is hard because real API calls are slow and expensive. I built a mock provider for unit tests:
```typescript
export class MockProvider implements Provider {
  private responses: Map<string, CompletionResponse> = new Map();

  setMockResponse(input: string, response: CompletionResponse): void {
    this.responses.set(input, response);
  }

  async complete(options: CompleteOptions): Promise<CompletionResponse> {
    const key = this.hashInput(options.messages);
    const mockResponse = this.responses.get(key);
    if (!mockResponse) {
      throw new Error(`No mock response configured for input: ${key}`);
    }
    // Simulate API latency
    await new Promise(resolve => setTimeout(resolve, 100));
    return {
      ...mockResponse,
      metadata: {
        ...mockResponse.metadata,
        latency: 100
      }
    };
  }

  // Key mocks by the last message's content, so tests can register
  // responses with plain strings like 'Hello'
  private hashInput(messages: Array<{ content: string }>): string {
    return messages[messages.length - 1]?.content ?? '';
  }
}
```
// In tests
```typescript
// In tests
const mockProvider = new MockProvider();
mockProvider.setMockResponse('Hello', {
  id: 'test-123',
  content: 'Hello! How can I help you today?',
  provider: 'mock',
  model: 'test-model',
  usage: { promptTokens: 1, completionTokens: 8, totalTokens: 9 },
  metadata: { latency: 100, region: 'test' }
});

const client = new AIPlatformClient({ provider: mockProvider });
```
Real-World Usage Examples
Here's how developers actually use the SDK in our applications:
Chat Application:
```typescript
import { AIPlatformClient } from '@ai-platform/sdk';

const client = new AIPlatformClient({
  apiKey: process.env.AI_PLATFORM_KEY
});

export async function getChatResponse(messages: Message[]): Promise<string> {
  try {
    const response = await client.complete({
      messages,
      model: 'gpt-4',
      temperature: 0.7
    });
    return response.content;
  } catch (error) {
    if (error instanceof ContextLengthError) {
      // Truncate conversation history and retry
      const truncated = messages.slice(-10);
      return getChatResponse(truncated);
    }
    throw error;
  }
}
```
Streaming Chat:
```typescript
export async function* streamChatResponse(
  messages: Message[]
): AsyncGenerator<string, void, unknown> {
  try {
    for await (const chunk of client.stream({
      messages,
      model: 'gpt-4',
      temperature: 0.7
    })) {
      yield chunk; // chunks are plain strings
    }
  } catch (error) {
    if (error instanceof RateLimitError) {
      yield `Rate limit exceeded. Retrying in ${error.retryAfter} seconds...`;
      await new Promise(resolve => setTimeout(resolve, error.retryAfter * 1000));
      yield* streamChatResponse(messages);
    } else {
      throw error;
    }
  }
}
```
Agent Workflows:
```typescript
export async function researchTopic(topic: string): Promise<ResearchResult> {
  const response = await client.agent.run({
    type: 'research',
    input: { topic },
    tools: ['search', 'summarize', 'extract'],
    humanApproval: true
  });

  return {
    summary: response.summary,
    sources: response.sources,
    confidence: response.metadata.confidence
  };
}
```
Bundle Size and Tree Shaking
Modern applications care about bundle size. The SDK is designed for optimal tree shaking:
```typescript
// Only import what you need
import { complete } from '@ai-platform/sdk/complete';
import { embed } from '@ai-platform/sdk/embed';

// Full client (if you need everything)
import { AIPlatformClient } from '@ai-platform/sdk';
```
The package exports are configured for maximum tree shaking:
```json
{
  "exports": {
    ".": {
      "import": "./dist/index.esm.js",
      "require": "./dist/index.cjs.js"
    },
    "./complete": {
      "import": "./dist/complete.esm.js",
      "require": "./dist/complete.cjs.js"
    },
    "./embed": {
      "import": "./dist/embed.esm.js",
      "require": "./dist/embed.cjs.js"
    }
  }
}
```
Documentation That Developers Actually Read
I follow a simple documentation principle: Examples first, reference second.
Every method has a practical example in the JSDoc:
````typescript
/**
 * Generate a text completion using AI models
 *
 * @example
 * ```typescript
 * const response = await client.complete({
 *   messages: [{ role: 'user', content: 'Write a haiku about TypeScript' }],
 *   model: 'gpt-4',
 *   temperature: 0.8
 * });
 *
 * console.log(response.content); // AI-generated haiku
 * ```
 *
 * @example BYOK (Bring Your Own Key)
 * ```typescript
 * const response = await client.complete({
 *   messages: [{ role: 'user', content: 'Hello' }],
 *   provider: 'openai',
 *   apiKey: 'sk-your-openai-key' // Use your own key
 * });
 * ```
 */
async complete(options: CompleteOptions): Promise<CompletionResponse>
````
The Results
After 6 months with the SDK in production:
Developer Experience Metrics:
- Integration time: 2 hours down to 15 minutes
- Support tickets: 60% reduction
- Bug reports related to API usage: 85% reduction
Adoption:
- 15 internal applications using the SDK
- 3 external partners building on the platform
- 95% of new integrations use the SDK vs raw HTTP
Performance:
- Bundle size: 45KB gzipped (with tree shaking)
- Streaming latency overhead: <5ms
- Error recovery success rate: 92%
The SDK transformed our AI platform from infrastructure to product. Developers don't think about HTTP calls, error handling, or streaming complexity anymore. They just write business logic.
What's Next
The complete SDK code is available at:
- SDK package: ai-platform-aws/packages/sdk
- Streaming example: 05-streaming-chat
Part 7 covers the production nightmares: cost tracking, authentication, security. Great developer tools mean nothing if they bankrupt your company.
Part 6 of 8 in "Building an AI Platform on AWS from Scratch". Everything I learned building production AI infrastructure - including the expensive mistakes.