
Daniel Dong

3 Lines of Code to Add Streaming AI Responses

Streaming makes your AI app feel 3x faster. Here's the minimal code to add it to any app using an OpenAI-compatible API.

Streaming is the #1 UX upgrade for AI apps. The total generation time stays the same, but instead of waiting 3 seconds for a full response, users see the first token in under 500 ms.

Here's the minimal code, starting with the backend (Python):

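from openai import OpenAI

# Reads OPENAI_API_KEY from the environment; pass base_url=... for another OpenAI-compatible provider
client = OpenAI()
prompt = "Hello!"  # example input

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": prompt}],
    stream=True  # ← That's it
)

for chunk in response:
    # Some chunks carry no choices or no text; skip those
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)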
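The frontend reads from an endpoint (`/api/ai`) that has to pass those chunks along as they arrive. A minimal sketch of that relay step, assuming chunks shaped like the ones in the loop above; `iter_text` is a hypothetical helper name, not part of any SDK:

```python
from typing import Any, Iterable, Iterator


def iter_text(chunks: Iterable[Any]) -> Iterator[str]:
    """Yield only the non-empty text deltas from a stream of chat chunks.

    Assumes each chunk exposes chunk.choices[0].delta.content,
    which may be None or an empty string.
    """
    for chunk in chunks:
        # Some chunks may carry no choices at all; skip them.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            yield text
```

A generator like this can be handed to a framework's streaming response type (for example Flask's `Response` or FastAPI's `StreamingResponse`) so each piece is sent as soon as it is produced. Which framework serves `/api/ai` is not specified here, so treat that part as an assumption.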

Frontend (JavaScript):

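const response = await fetch('/api/ai', { method: 'POST' });
const reader = response.body.getReader();
// Reuse one decoder so multi-byte characters split across chunks decode correctly
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value, { stream: true }));  // Stream tokens
}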


Result: your users see output in real time, so the app feels much faster even though the full response takes just as long to generate.
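To check this for your own setup, you can time the first chunk separately from the full response. A rough sketch, assuming any iterable of text pieces (such as the streamed deltas from the Python loop); `measure_stream` is a hypothetical helper, and the numbers will depend on your model and network:

```python
import time
from typing import Iterable, Optional, Tuple


def measure_stream(pieces: Iterable[str]) -> Tuple[Optional[float], float, str]:
    """Return (seconds to first piece, total seconds, full text)."""
    start = time.perf_counter()
    first: Optional[float] = None
    parts = []
    for piece in pieces:
        if first is None:
            # Time until the user would see the first output.
            first = time.perf_counter() - start
        parts.append(piece)
    total = time.perf_counter() - start
    return first, total, "".join(parts)
```

The first value is what streaming improves; the second is roughly what a non-streaming call would make the user wait for.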

Try it: aibridge-api.com
