DEV Community

Originally published at github.com

I ported the OpenAI Python SDK to Rust in 5 days with Claude Code.

I needed a fast OpenAI client for a realtime voice agent project. The official Python SDK is great, but I needed Rust — for WebSocket audio streaming, edge deployment to Cloudflare Workers, and sub-second latency in agentic loops with dozens of tool calls.

So I ported it. 259 commits, 5 days, 100+ API methods. The first day — 120 commits — was mostly Claude Code translating types from Python to Rust while I set up pre-commit hooks, WASM checks, and benchmarks. The rest was architecture decisions, performance tuning, and Node/Python bindings.

The result: openai-oxide — a Rust client that matches the official Python SDK's API surface while being faster and deployable to WASM.

Why Not Just Use What Exists?

The official Python and Node SDKs are solid — they reuse HTTP/2 connections, have WebSocket support for the Realtime API, and cover all endpoints. But they don't compile to WASM, and their WebSocket mode is only for the Realtime API (audio/multimodal), not for regular text-based Responses API calls.

In the Rust ecosystem, you pick async-openai for types or genai for multi-provider support — but no single crate gives you persistent WebSocket sessions for the Responses API, structured outputs with auto-generated schemas, stream helpers, and WASM deployment in one package.

For an agentic loop where the model calls read_file, search_code, edit_file, run_tests in sequence — you want all of this together. That's what we built.

Persistent WebSockets

The biggest win: keep one wss:// connection open for the entire agent cycle.

let mut session = client.ws_session().await?;

// 50 tool calls — zero TLS overhead after the first
for _ in 0..50 {
    let response = session.send(request).await?;
    // execute tool, feed result back
}

session.close().await?;

Benchmark: 10 sequential tool calls complete 40% faster than HTTP REST on the same model.
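As a rough illustration of why connection reuse matters in a tool-calling loop, the toy model below compares paying connection setup once against paying it on every call. The function names and any millisecond figures you plug in are hypothetical; this is not openai-oxide code, and it does not reproduce the benchmark above.

```rust
/// Total time for `calls` sequential requests over one persistent connection.
/// Connection setup (TCP + TLS + WebSocket upgrade) is paid once.
fn persistent_total_ms(calls: u64, setup_ms: u64, request_ms: u64) -> u64 {
    if calls == 0 {
        0
    } else {
        setup_ms + calls * request_ms
    }
}

/// Total time when every request opens a fresh connection.
fn fresh_total_ms(calls: u64, setup_ms: u64, request_ms: u64) -> u64 {
    calls * (setup_ms + request_ms)
}

/// Time saved by reusing the connection: one setup cost per call after the first.
fn saved_ms(calls: u64, setup_ms: u64, request_ms: u64) -> u64 {
    fresh_total_ms(calls, setup_ms, request_ms) - persistent_total_ms(calls, setup_ms, request_ms)
}
```

Pooled HTTP/2 clients already avoid most of this setup cost, so the model is an upper bound on what reuse alone can save.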

Structured Outputs Without Boilerplate

Every Rust OpenAI client supports response_format: json_schema. But you have to build the schema by hand:

// Other clients: manual schema construction
let schema = json!({
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"}
    },
    "required": ["answer", "confidence"],
    "additionalProperties": false
});

With openai-oxide, derive the schema from your types:

#[derive(Deserialize, JsonSchema)]
struct Answer {
    answer: String,
    confidence: f64,
}

let result = client.chat().completions()
    .parse::<Answer>(request).await?;

println!("{}", result.parsed.unwrap().answer);

One derive, both directions — the same #[derive(JsonSchema)] generates response schemas and tool parameter definitions. No manual JSON, no drift between types and schemas.
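To make the "no drift" point concrete, here is a minimal std-only sketch in which a single definition produces both the struct and its schema. The `schema_struct!` macro and `JsonType` trait are hypothetical stand-ins for a derive macro, not the crate's implementation, and they only handle flat structs with a few primitive types.

```rust
/// Maps a Rust type to its JSON Schema type name (hypothetical helper trait).
trait JsonType {
    const NAME: &'static str;
}
impl JsonType for String { const NAME: &'static str = "string"; }
impl JsonType for f64 { const NAME: &'static str = "number"; }
impl JsonType for i64 { const NAME: &'static str = "integer"; }
impl JsonType for bool { const NAME: &'static str = "boolean"; }

/// Declares a struct and generates its schema from the same field list,
/// so the two cannot disagree.
macro_rules! schema_struct {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[allow(dead_code)]
        #[derive(Debug)]
        struct $name { $($field: $ty),* }

        impl $name {
            fn json_schema() -> String {
                // One `"field":{"type":"..."}` entry per declared field.
                let props: Vec<String> = vec![
                    $(format!(
                        "\"{}\":{{\"type\":\"{}\"}}",
                        stringify!($field),
                        <$ty as JsonType>::NAME
                    )),*
                ];
                // Every field is listed as required.
                let required: Vec<String> = vec![
                    $(format!("\"{}\"", stringify!($field))),*
                ];
                format!(
                    "{{\"type\":\"object\",\"properties\":{{{}}},\"required\":[{}],\"additionalProperties\":false}}",
                    props.join(","),
                    required.join(",")
                )
            }
        }
    };
}

schema_struct!(Answer { answer: String, confidence: f64 });
```

Calling `Answer::json_schema()` yields the same object as the hand-written `json!` schema shown earlier, and adding a field to the struct updates the schema automatically.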

Zero-Copy SSE Streaming

Time-to-first-token matters for UX. Our SSE parser avoids intermediate allocations, and the client sends anti-buffering headers that keep reverse proxies from holding back chunks:

Accept: text/event-stream
Cache-Control: no-cache

Without these, Cloudflare and nginx buffer streaming responses, adding 50-200ms to TTFT. With them: 530ms TTFT on gpt-5.4.
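The "avoids intermediate allocations" idea can be sketched with borrowed slices: each `data:` payload is returned as a `&str` pointing into the receive buffer rather than copied into a new `String`. This is a simplified illustration, not the crate's parser; the function name `sse_data_lines` is made up here, and it ignores multi-line events and the other SSE fields (`event:`, `id:`, `retry:`).

```rust
/// Yields the payload of every `data:` line in an SSE buffer.
/// Each item borrows from `buf`, so no payload bytes are copied.
fn sse_data_lines<'a>(buf: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    buf.lines().filter_map(|line| {
        // Lines without the `data:` field name are skipped.
        let rest = line.strip_prefix("data:")?;
        // The SSE format allows one optional space after the colon.
        Some(rest.strip_prefix(' ').unwrap_or(rest))
    })
}
```

Because the returned slices share the buffer's lifetime, the compiler enforces that the buffer outlives every payload handed to the JSON decoder.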

Stream Helpers

Raw SSE chunks require manual stitching — tracking content deltas, assembling tool call arguments by index, detecting completion. We provide typed events:

let mut stream = client.chat().completions()
    .create_stream_helper(request).await?;

while let Some(event) = stream.next().await {
    match event? {
        ChatStreamEvent::ContentDelta { delta, snapshot } => {
            print!("{delta}"); // snapshot has full text so far
        }
        ChatStreamEvent::ToolCallDone { name, arguments, .. } => {
            execute_tool(&name, &arguments).await;
        }
        _ => {}
    }
}

Or just get the final result: stream.get_final_completion().await?
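For a sense of the stitching a stream helper hides, the sketch below accumulates content deltas into a running snapshot and assembles tool call arguments by index. The `Delta` and `Accumulator` types are hypothetical and much simpler than the real chunk types; they only show the bookkeeping.

```rust
use std::collections::BTreeMap;

/// A simplified streaming delta (hypothetical shape, for illustration only).
enum Delta {
    Content(String),
    ToolCall {
        index: usize,
        name: Option<String>,
        arguments: String,
    },
}

/// Collects deltas into a full-text snapshot and per-index tool calls.
#[derive(Default)]
struct Accumulator {
    content: String,
    /// index -> (tool name, JSON arguments assembled so far)
    tool_calls: BTreeMap<usize, (String, String)>,
}

impl Accumulator {
    fn push(&mut self, delta: Delta) {
        match delta {
            // Text fragments are appended in arrival order.
            Delta::Content(text) => self.content.push_str(&text),
            // Argument fragments are routed to their call by index;
            // the name is recorded whenever a fragment carries one.
            Delta::ToolCall { index, name, arguments } => {
                let entry = self.tool_calls.entry(index).or_default();
                if let Some(name) = name {
                    entry.0 = name;
                }
                entry.1.push_str(&arguments);
            }
        }
    }
}
```

Keying by index matters because fragments for several parallel tool calls can interleave in one stream.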

WASM Support

The entire client compiles to wasm32-unknown-unknown and runs in Cloudflare Workers:

[dependencies]
openai-oxide = { version = "0.9", default-features = false, features = ["chat", "responses"] }
worker = "0.7"

Streaming, structured outputs, retry logic — all work in WASM. Live demo.
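Supporting both native and WASM targets from one codebase typically relies on `cfg` gates. The sketch below shows the general pattern with placeholder function names and return values; it is an assumption about the approach, not code from the crate.

```rust
/// Label for the HTTP backend this build would use (placeholder values).
#[cfg(target_arch = "wasm32")]
fn transport_name() -> &'static str {
    "fetch"
}

/// Native builds get a different implementation of the same function.
#[cfg(not(target_arch = "wasm32"))]
fn transport_name() -> &'static str {
    "native"
}

/// Socket-level options such as TCP_NODELAY only apply off-WASM,
/// so callers can check this flag instead of repeating the cfg test.
fn supports_tcp_nodelay() -> bool {
    cfg!(not(target_arch = "wasm32"))
}
```

Only one `transport_name` exists in any given build, so the rest of the code calls it without knowing which target it was compiled for.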

HTTP Optimizations That Nobody Else Does

We checked: neither async-openai nor genai enables these by default:

| Optimization | Impact |
|---|---|
| gzip compression | ~30% smaller responses |
| TCP_NODELAY | Lower latency (disables Nagle) |
| HTTP/2 keep-alive (20s ping) | Prevents idle connection drops |
| HTTP/2 adaptive window | Auto-tunes flow control |
| Connection pool (4/host) | Better parallel performance |

These are all standard reqwest builder options. Source.

Benchmarks

Median of 3 runs, 5 iterations each, gpt-5.4:

Rust (Responses API)

| Test | openai-oxide | async-openai | genai |
|---|---|---|---|
| Streaming TTFT | 645ms | 685ms | 670ms |
| Function calling | 1192ms | 1748ms | 1030ms |
| WebSocket plain text | 710ms | N/A | N/A |

Node.js — oxide wins 8/8

| Test | openai-oxide | official openai | Time saved vs. official |
|---|---|---|---|
| Structured output | 1370ms | 1765ms | +22% |
| Multi-turn | 2283ms | 2859ms | +20% |
| Streaming TTFT | 534ms | 580ms | +8% |

Python — oxide wins 10/12

| Test | openai-oxide | official openai | Time saved vs. official |
|---|---|---|---|
| Multi-turn | 2260ms | 3089ms | +27% |
| Prompt-cached | 4425ms | 5564ms | +20% |
| Plain text | 845ms | 997ms | +15% |

Full reproducible benchmarks: cargo run --example benchmark --features responses --release
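The percentage column in the Node.js and Python tables is consistent with the time saved relative to the official SDK, for example (1765 - 1370) / 1765 ≈ 22%. The helpers below sketch that arithmetic together with a median over runs; the function names are illustrative and are not taken from the benchmark example.

```rust
/// Median of a set of timings in milliseconds; `None` for an empty slice.
/// Sorts the slice in place.
fn median_ms(samples: &mut [u64]) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    let mid = samples.len() / 2;
    Some(if samples.len() % 2 == 1 {
        samples[mid]
    } else {
        // Even count: average the two middle values.
        (samples[mid - 1] + samples[mid]) / 2
    })
}

/// Percentage of the baseline time that the candidate saves.
fn time_saved_pct(candidate_ms: u64, baseline_ms: u64) -> f64 {
    (baseline_ms as f64 - candidate_ms as f64) / baseline_ms as f64 * 100.0
}
```

Rounding `time_saved_pct(1370, 1765)` gives 22, matching the structured output row.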

Drop-in Replacement

For existing codebases — change one import:

Python:

# from openai import AsyncOpenAI
from openai_oxide.compat import AsyncOpenAI

# rest of code unchanged
client = AsyncOpenAI()
r = await client.chat.completions.create(model="gpt-5.4-mini", messages=[...])

Node.js:

// const OpenAI = require('openai');
const { OpenAI } = require('openai-oxide/compat');

// rest of code unchanged
const client = new OpenAI();

How This Was Built

This library grew out of my realtime TTS voice agent project, which needed a fast OpenAI client. The official Python SDK worked, but I needed Rust-level performance for WebSocket audio streaming and edge deployment.

The entire crate — 100+ API methods, typed streaming, structured outputs, WASM support, Node/Python bindings — was built in a few days using a harness engineering approach with Claude Code and my own toolkit:

  1. Setup: configured pre-commit hooks (tests, clippy, WASM check, secret scan), OpenAPI spec as ground truth, Python SDK source as reference
  2. Planning: used solo-factory skills (/plan, /build) with solograph for code intelligence (an MCP server that indexes the codebase and provides semantic search across projects)
  3. Building: initial scaffold via Ralph Loop (autonomous agent loop), then manual refinement — architecture decisions, API design, performance tuning
  4. Quality gates: every commit runs 189 tests + clippy + WASM compilation check. Artifacts and docs are auto-generated from code

The key insight: treat the Python SDK as a spec, not as code to port line-by-line. The agent handles mechanical translation (types, error mapping, serialization); you focus on Rust-specific wins (zero-copy, tagged enums, WASM cfg gates). More on the tooling approach in a separate post.

The result: Python SDK parity plus Rust-specific features (zero-copy parsing, WASM, persistent WebSockets for the Responses API) that the official Python SDK does not provide.

Try It

cargo add openai-oxide