mv7

I gave the OpenAI SDK live web search by changing one line

Last week I was stitching together Serper for web + Tavily for synthesis + a YouTube transcript API just to answer one chat question. Every category was a new API contract, a new auth flow, a new JSON shape my agent had to know about.

I almost wrote a wrapper SDK. Then I realized I didn't need to — the openai SDK can already do all of this. You just point it at a different base_url.

Here's the trick

```python
from openai import OpenAI

client = OpenAI(
    api_key="pxs_...",                        # pixserp key, not OpenAI
    base_url="https://pixserp.com/api/v1",    # one URL change
)

r = client.chat.completions.create(
    model="pixserp-standard",
    messages=[{"role": "user", "content": "EU AI Act enforcement timeline 2026"}],
)
print(r.choices[0].message.content)
print(r.choices[0].message.citations)
```

That's it. The openai SDK doesn't know it's talking to a search backend. It thinks it's calling chat.completions.create. The server does the routing — runs the searches, fetches pages, synthesizes a cited answer — and returns a standard OpenAI assistant message.

You get inline [1], [2] markers in content and a structured message.citations array with one entry per source. It drops into your existing typed code and renders in your existing UI.

Why this works at all

The openai SDK doesn't validate response shape strictly — it expects the OpenAI wire format and deserializes whatever comes back into a ChatCompletion object. As long as the server returns:

  • choices[*].message.role = "assistant"
  • choices[*].message.content (the cited text)
  • Standard usage block
  • Optional extra fields (citations, cost) accessible via model_extra

…it just works. Same SSE streaming for stream=True. Same response_format: { type: "json_schema" } for structured output. Same tool_calls shape on the response if the backend wants to surface internal steps.
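To make that list concrete, here is a hand-written example of the wire shape — the payload values are illustrative, not real pixserp output, but the structure is the standard chat-completions form plus an extra citations field:

```python
import json

# Hand-written example payload in the OpenAI chat-completions wire format.
# The values are made up for illustration; only the shape matters.
raw = json.dumps({
    "id": "chatcmpl-demo",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "pixserp-standard",
    "choices": [{
        "index": 0,
        "finish_reason": "stop",
        "message": {
            "role": "assistant",
            "content": "Enforcement begins in phases [1].",
            # extra, non-OpenAI field -- the SDK keeps it reachable via model_extra
            "citations": [{"url": "https://example.com", "title": "Example"}],
        },
    }],
    "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21},
})

payload = json.loads(raw)
message = payload["choices"][0]["message"]
print(message["content"])
print(message["citations"][0]["url"])
```

Anything the SDK's typed model doesn't know about (citations here) survives deserialization and stays accessible, which is why no wrapper is needed.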

A search backend that speaks chat-completions is a drop-in for any code that already speaks OpenAI. No wrapper, no new SDK, no second auth flow.
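Structured output rides along the same way. A sketch of a json_schema request — the schema and field names here are made up for illustration, and the simulated content stands in for what the server would return:

```python
import json

# Hypothetical structured-output request. The schema name and fields
# ("answer", "year") are invented; the response_format envelope is the
# standard OpenAI json_schema shape.
request_kwargs = {
    "model": "pixserp-standard",
    "messages": [{"role": "user", "content": "When does full enforcement start?"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "enforcement_answer",
            "schema": {
                "type": "object",
                "properties": {
                    "answer": {"type": "string"},
                    "year": {"type": ["integer", "null"]},
                },
                "required": ["answer", "year"],
            },
        },
    },
}

# Fields are grounded in the live web, so a value can come back null.
# Simulated response content for the sketch:
content = '{"answer": "Phased rollout", "year": null}'
parsed = json.loads(content)
year = parsed["year"] if parsed["year"] is not None else "unknown"
print(parsed["answer"], year)
```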

More than just "web"

The thing that pushed me over the edge to drop the wrapper plan: pixserp routes to ten different answer shapes through the same endpoint.

```python
def ask(q, model="pixserp-standard"):
    r = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": q}],
    )
    return r.choices[0].message.content, r.choices[0].message.citations

ask("Cheapest direct MXP→JFK on July 18 in economy")
ask("Hotels in Barcelona Jul 15–20, 4 stars or more")
ask("Summarize https://www.youtube.com/watch?v=jNQXAC9IVRw in 5 bullets")
ask("iPhone 15 Pro under $900 with free shipping")
ask("Best ramen near Porta Garibaldi")
```

Flights, hotels, places, shopping, transcripts, news, images — all the same call. The citations array carries per-shape structured fields: rating, price, gps, hours, airline, check_in. Your renderer can show real cards instead of bare links.
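A minimal renderer sketch along those lines — the field names follow the ones listed above, the citation values are made up, and which fields actually appear depends on the vertical the query routed to:

```python
# Turn a citation dict into a render-ready "card", keeping only the
# per-shape fields mentioned above that are present. Example data is
# invented for illustration.
def to_card(citation: dict) -> dict:
    card = {"title": citation.get("title"), "url": citation.get("url")}
    for field in ("rating", "price", "gps", "hours", "airline", "check_in"):
        if field in citation:
            card[field] = citation[field]
    return card

citations = [
    {"title": "Ramen Ya", "url": "https://example.com/ramen", "rating": 4.6},
    {"title": "Hotel Arts", "url": "https://example.com/hotel",
     "price": "$310", "check_in": "15:00"},
]
for card in (to_card(c) for c in citations):
    print(card)
```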

I'm not using all ten verticals in my app yet. But I'm not paying a "stitching tax" anymore either.

Streaming works the way you expect

The bit that mattered most for the chat UI:

```python
stream = client.chat.completions.create(
    model="pixserp-fast",
    messages=[{"role": "user", "content": "What did the Fed announce today?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```

Standard OpenAI SSE. The delta shape is the one your code already handles. Citations land on the final chunk in delta.citations (or via model_extra depending on your SDK version) so the references panel can render once the answer settles.
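An offline sketch of that accumulation pattern — mock chunk objects stand in for the SSE stream, with citations arriving only on the final delta as described; the content is invented:

```python
from types import SimpleNamespace as NS

# Mock chunks imitating the streamed delta shape; in real code these come
# from the openai stream iterator.
def chunk(content=None, citations=None):
    return NS(choices=[NS(delta=NS(content=content, citations=citations))])

stream = [
    chunk("The Fed "),
    chunk("held rates [1]."),
    chunk(citations=[{"url": "https://example.com/fed"}]),  # final chunk
]

parts, refs = [], None
for c in stream:
    delta = c.choices[0].delta
    if delta.content:
        parts.append(delta.content)
    if getattr(delta, "citations", None):
        refs = delta.citations  # render the references panel once these land

answer = "".join(parts)
print(answer)
print(refs)
```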

I dropped a custom EventSource handler I'd written for a different vendor — back to the stock openai stream parser.

Per-call cost on a header

The bit dev.to readers ask about most after "does it work": cost visibility. Pixserp puts the per-call cost on the response header:

```
x-cost-usd: <decimal value per request>
```

Stick that into your structured logs and you've got per-call cost-per-customer, per-feature, per-route — without paying for a third-party billing analytics tool.
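A sketch of that aggregation — the header values below are made up, and in the openai SDK you can reach response headers via client.chat.completions.with_raw_response.create(...):

```python
from collections import defaultdict
from decimal import Decimal

# Fold the x-cost-usd header into per-feature totals.
# Decimal avoids float drift when summing money.
totals = defaultdict(Decimal)

def record(feature: str, headers: dict) -> None:
    cost = headers.get("x-cost-usd")
    if cost is not None:
        totals[feature] += Decimal(cost)

# Invented header values for illustration:
record("chat_answer", {"x-cost-usd": "0.0042"})
record("chat_answer", {"x-cost-usd": "0.0038"})
record("flight_search", {"x-cost-usd": "0.0100"})
print(dict(totals))
```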

Gotchas (the honest part)

Five things I tripped over while migrating:

  1. Don't bring your own retries. The server handles upstream-vendor retries (Google Flights timing out, etc.) and surfaces a definitive answer or a definitive error. Wrapping a retry loop around chat.completions.create in your app double-counts and burns credit. Trust the server, fail fast on 4xx.

  2. tool_calls are not OpenAI function calls. If the response includes a tool_calls array, those are the internal retrieval steps surfaced for transparency (which page fetches, which searches). They are NOT a request for you to execute a function — there's no tool-loop to close. Don't submit_tool_outputs. Render them or ignore them.

  3. Context windows are smaller than chat models. Search synthesis is fast and short by design; if you throw a 30k-token system prompt at it, you're paying for tokens you won't use. Keep the user message focused on the question.

  4. response_format: json_schema is web-grounded, not free-form. If you ask for {"answer": str, "year": int} and the live web doesn't know the year, the field will be null — not hallucinated. That's the feature, but worth knowing if your downstream code expects a non-null value.

  5. Streaming delta.citations shape depends on SDK version. openai ≥ 1.55 (check with pip show openai) exposes them in the typed delta. On older versions, read citations from r.choices[0].message.model_extra after the stream completes.
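For gotcha 5, a version-tolerant accessor is easy to sketch: prefer the typed attribute, fall back to model_extra. The mock objects below stand in for r.choices[0].message on each SDK generation:

```python
from types import SimpleNamespace as NS

# Prefer the typed citations attribute (openai >= 1.55); fall back to
# model_extra on older releases. Returns [] when neither is present.
def get_citations(message):
    cites = getattr(message, "citations", None)
    if cites is None:
        extra = getattr(message, "model_extra", None) or {}
        cites = extra.get("citations")
    return cites or []

# Mocks imitating the two message shapes:
old_style = NS(model_extra={"citations": [{"url": "https://example.com"}]})
new_style = NS(citations=[{"url": "https://example.com"}])
print(get_citations(old_style))
print(get_citations(new_style))
```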

Wrapping up

If you're already calling openai.chat.completions.create, adding live web search to your agent is a base_url swap and a model-name change. No new SDK. No new auth pattern. No "we'll do it next sprint" because the integration shape changed.

For the full reference see pixserp's docs. Free credit on signup if you want to throw a real workload at it — no card.

I'm shipping the rest of my agent's verticals on top of this now. If you've been stitching three or four search APIs together for one chat reply, take fifteen minutes and try the swap. It's the smallest amount of code I've ever deleted to add a feature.


pixserp is an OpenAI-compatible AI search API — docs · pricing. Built at Teti AI.
