Last week I was stitching together Serper for web + Tavily for synthesis + a YouTube transcript API just to answer one chat question. Every category was a new API contract, a new auth flow, a new JSON shape my agent had to know about.
I almost wrote a wrapper SDK. Then I realized I didn't need to — the openai SDK can already do all of this. You just point it at a different base_url.
Here's the trick
from openai import OpenAI

client = OpenAI(
    api_key="pxs_...",                        # pixserp key, not OpenAI
    base_url="https://pixserp.com/api/v1",    # one URL change
)

r = client.chat.completions.create(
    model="pixserp-standard",
    messages=[{"role": "user", "content": "EU AI Act enforcement timeline 2026"}],
)

print(r.choices[0].message.content)
print(r.choices[0].message.citations)
That's it. The openai SDK doesn't know it's talking to a search backend. It thinks it's calling chat.completions.create. The server does the routing — runs the searches, fetches pages, synthesizes a cited answer — and returns a standard OpenAI assistant message.
Inline [1], [2] markers in the content. A structured message.citations array with one entry per source. It drops into your existing typed code and renders in your existing UI.
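Here's roughly how the two wire together in a renderer. A minimal sketch; the title and url fields on each citation entry are my assumption, so inspect a live response before relying on them:

def render(msg):
    print(msg.content)                        # cited text with [1], [2] inline
    for i, c in enumerate(msg.citations, 1):  # one entry per source, same order
        print(f"[{i}] {c.get('title', '')} {c.get('url', '')}")

render(r.choices[0].message)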
Why this works at all
The openai SDK doesn't strictly validate the response shape: it expects the OpenAI wire format and deserializes whatever comes back into a ChatCompletion object. As long as the server returns:
- choices[*].message.role = "assistant"
- choices[*].message.content (the cited text)
- Standard usage block
- Optional extra fields (citations, cost) accessible via model_extra
…it just works. Same SSE streaming for stream=True. Same response_format: { type: "json_schema" } for structured output. Same tool_calls shape on the response if the backend wants to surface internal steps.
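If your SDK version doesn't expose the extras as typed attributes, pydantic still keeps them around. A minimal sketch of reading them through model_extra, reusing r from the first snippet (the cost field name comes from the list above):

msg = r.choices[0].message
extra = msg.model_extra or {}       # pydantic v2 parks undeclared fields here
print(extra.get("cost"))            # per-call cost, if the server included it
print(len(extra.get("citations", [])), "sources")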
A search backend that speaks chat-completions is a drop-in for any code that already speaks OpenAI. No wrapper, no new SDK, no second auth flow.
More than just "web"
The thing that pushed me over the edge to drop the wrapper plan: pixserp routes to ten different answer shapes through the same endpoint.
def ask(q, model="pixserp-standard"):
    r = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": q}],
    )
    return r.choices[0].message.content, r.choices[0].message.citations
ask("Cheapest direct MXP→JFK on July 18 in economy")
ask("Hotels in Barcelona Jul 15–20, 4 stars or more")
ask("Summarize https://www.youtube.com/watch?v=jNQXAC9IVRw in 5 bullets")
ask("iPhone 15 Pro under $900 with free shipping")
ask("Best ramen near Porta Garibaldi")
Flights, hotels, places, shopping, transcripts, news, images — all the same call. The citations array carries per-shape structured fields: rating, price, gps, hours, airline, check_in. Your renderer can show real cards instead of bare links.
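Sketched, that renderer can be as dumb as this. The field names are the ones listed above, but keying the card layout off which fields are present is my assumption, not a documented discriminator:

def render_card(c):
    # Pick a card layout from whichever per-shape fields showed up.
    if "airline" in c:
        return f"flight: {c['airline']}, {c.get('price', '?')}"
    if "check_in" in c:
        return f"hotel: {c.get('rating', '?')} stars, check-in {c['check_in']}"
    if "gps" in c:
        return f"place: {c.get('hours', '?')}, at {c['gps']}"
    return f"link: {c.get('url', '')}"   # plain web source fallback

content, cites = ask("Hotels in Barcelona Jul 15–20, 4 stars or more")
for c in cites:
    print(render_card(c))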
I'm not using all ten verticals in my app yet. But I'm not paying a "stitching tax" anymore either.
Streaming works the way you expect
The bit that mattered most for the chat UI:
stream = client.chat.completions.create(
    model="pixserp-fast",
    messages=[{"role": "user", "content": "What did the Fed announce today?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
Standard OpenAI SSE. The delta shape is the one your code already handles. Citations land on the final chunk in delta.citations (or via model_extra depending on your SDK version) so the references panel can render once the answer settles.
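Collecting the citations as the stream settles looks like this. A sketch; the getattr fallback is there because, as noted, older SDKs only surface them through model_extra:

stream = client.chat.completions.create(
    model="pixserp-fast",
    messages=[{"role": "user", "content": "What did the Fed announce today?"}],
    stream=True,
)

citations = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
    # The final chunk carries the sources; getattr keeps this safe on SDK
    # versions where the typed delta doesn't expose citations.
    citations = getattr(delta, "citations", None) or citations

print(f"\n\n{len(citations)} sources")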
I dropped a custom EventSource handler I'd written for a different vendor — back to the stock openai stream parser.
Per-call cost on a header
The bit dev.to readers ask about most after "does it work": cost visibility. Pixserp puts the per-call cost on the response header:
x-cost-usd: <decimal value per request>
Stick that into your structured logs and you've got per-call cost-per-customer, per-feature, per-route — without paying for a third-party billing analytics tool.
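You don't have to drop down to raw HTTP to read it; the openai SDK's with_raw_response wrapper hands you the headers next to the parsed object:

raw = client.chat.completions.with_raw_response.create(
    model="pixserp-standard",
    messages=[{"role": "user", "content": "Best ramen near Porta Garibaldi"}],
)
cost = raw.headers.get("x-cost-usd")   # per-call cost as a decimal string
r = raw.parse()                        # the usual ChatCompletion object

# ship it to your structured logs however you normally would
print({"cost_usd": cost, "model": r.model, "tokens": r.usage.total_tokens})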
Gotchas (the honest part)
Five things I tripped over while migrating:
1. Don't bring your own retries. The server handles upstream-vendor retries (Google Flights timing out, etc.) and surfaces a definitive answer or a definitive error. Wrapping a retry loop around chat.completions.create in your app double-counts and burns credit. Trust the server; fail fast on 4xx.
2. tool_calls are not OpenAI function calls. If the response includes a tool_calls array, those are the internal retrieval steps surfaced for transparency (which pages were fetched, which searches ran). They are NOT a request for you to execute a function; there's no tool loop to close. Don't submit_tool_outputs. Render them or ignore them.
3. Context windows are smaller than chat models'. Search synthesis is fast and short by design; if you throw a 30k-token system prompt at it, you're paying for tokens you won't use. Keep the user message focused on the question.
4. response_format: json_schema is web-grounded, not free-form. If you ask for {"answer": str, "year": int} and the live web doesn't know the year, the field will be null, not hallucinated. That's the feature, but worth knowing if your downstream code expects a non-null value (there's a sketch of the null guard after this list).
5. Streaming delta.citations shape depends on SDK version. openai ≥ 1.55 (check with pip show openai) exposes them in the typed delta. On older versions, read citations from r.choices[0].message.model_extra after the stream completes.
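The null guard from gotcha 4, sketched. The response_format wrapper below is the standard OpenAI json_schema shape; whether pixserp also wants strict mode is something I haven't verified:

import json

r = client.chat.completions.create(
    model="pixserp-standard",
    messages=[{"role": "user", "content": "What year do the EU AI Act GPAI rules start applying?"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "fact",
            "schema": {
                "type": "object",
                "properties": {
                    "answer": {"type": "string"},
                    "year": {"type": ["integer", "null"]},  # nullable on purpose
                },
                "required": ["answer", "year"],
            },
        },
    },
)

data = json.loads(r.choices[0].message.content)
# None means the live web didn't say; treat it as unknown, not an error.
year = data["year"] if data["year"] is not None else "unknown"
print(data["answer"], year)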
Wrapping up
If you're already calling openai.chat.completions.create, adding live web search to your agent is a base_url swap and a model-name change. No new SDK. No new auth pattern. No "we'll do it next sprint" because the integration shape changed.
For the full reference see pixserp's docs. Free credit on signup if you want to throw a real workload at it — no card.
I'm shipping the rest of my agent's verticals on top of this now. If you've been stitching three or four search APIs together for one chat reply, take fifteen minutes and try the swap. It's the smallest amount of code I've ever deleted to add a feature.
pixserp is an OpenAI-compatible AI search API — docs · pricing. Built at Teti AI.