DEV Community

Daniel Dong

10x Your AI API Throughput with Batch Processing

Sending AI requests one at a time is slow. Here's how to process 100 prompts concurrently, with 10x throughput and 40% cost savings.

Sending AI requests sequentially is painfully slow: 100 prompts × 2 seconds each = 200 seconds, or more than 3 minutes.

Here's how to run them concurrently with asyncio, sending the whole batch at once instead of waiting on each reply in turn.

The Slow Way (Don't Do This)

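from openai import OpenAI

# Synchronous client; same endpoint and key placeholder as the async version below
client = OpenAI(api_key="mb-your-key", base_url="https://aibridge-api.com/v1")

# 100 prompts at ~2 seconds each = over 3 minutes of waiting
results = []
for prompt in prompts:  # prompts: a list of strings defined elsewhere
    # Each call blocks until the full response arrives
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}]
    )
    results.append(response.choices[0].message.content)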


The Fast Way: Async + Batch

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="mb-your-key",
    base_url="https://aibridge-api.com/v1"
)

async def process_prompt(prompt):
    response = await client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200
    )
    return response.choices[0].message.content

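async def batch_process(prompts):
    # Schedule every request concurrently; gather returns results in input order
    tasks = [process_prompt(p) for p in prompts]
    return await asyncio.gather(*tasks)

# prompts: a list of strings defined elsewhere
# 100 prompts in about 5 seconds (see The Results)
results = asyncio.run(batch_process(prompts))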
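
One caveat with asyncio.gather as used above: by default the first exception propagates to the caller, so a single failed request can cost you the results of the other 99. Here is a minimal sketch of collecting partial results instead. `fake_call` and `batch_process_safe` are hypothetical names, and `fake_call` stands in for process_prompt so the example runs without an API key.

```python
import asyncio

async def fake_call(prompt: str) -> str:
    """Hypothetical stand-in for process_prompt; fails on empty prompts."""
    await asyncio.sleep(0.01)  # simulated network wait
    if not prompt:
        raise ValueError("empty prompt")
    return prompt.upper()

async def batch_process_safe(prompts):
    """Run all prompts concurrently and split successes from failures."""
    tasks = [fake_call(p) for p in prompts]
    # return_exceptions=True hands exceptions back as values instead of
    # raising, and outcomes keep the same order as the input prompts.
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    results, failures = {}, {}
    for index, outcome in enumerate(outcomes):
        if isinstance(outcome, Exception):
            failures[index] = outcome
        else:
            results[index] = outcome
    return results, failures

if __name__ == "__main__":
    ok, failed = asyncio.run(batch_process_safe(["hello", "", "world"]))
    print(ok)      # successful responses keyed by prompt index
    print(failed)  # exceptions keyed by prompt index
```

Keying both dictionaries by prompt index makes it easy to resubmit only the failed prompts later.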

The Results

| Method      | 100 Prompts | 1000 Prompts |
|-------------|-------------|--------------|
| Sequential  | 3 minutes   | 30 minutes   |
| Async batch | 5 seconds   | 45 seconds   |
| Speedup     | 36x         | 40x          |
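
The speedup comes from overlapping the network waits: sequential time is roughly the sum of all latencies, while ideal concurrent time is closer to a single latency. Here is a rough simulation of that effect, assuming `asyncio.sleep` stands in for the API call and the 2-second latency is scaled down to 20 ms. All names here are illustrative. Real numbers depend on rate limits and server load, so treat this as a sketch of the mechanism and not a benchmark.

```python
import asyncio
import time

LATENCY = 0.02  # assumed per-request latency, scaled down from ~2 s

async def fake_request(prompt: str) -> str:
    """Stand-in for an API call: waits, then echoes the prompt."""
    await asyncio.sleep(LATENCY)
    return f"echo: {prompt}"

async def run_sequential(prompts):
    # Each await finishes before the next request starts
    return [await fake_request(p) for p in prompts]

async def run_concurrent(prompts):
    # All requests wait at the same time
    return await asyncio.gather(*(fake_request(p) for p in prompts))

def timed(coro_fn, prompts):
    """Run coro_fn(prompts) and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(prompts))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(20)]
    _, seq = timed(run_sequential, prompts)
    _, con = timed(run_concurrent, prompts)
    print(f"sequential {seq:.2f}s, concurrent {con:.2f}s, speedup {seq / con:.1f}x")
```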

Pro Tips

  1. Use deepseek-v4-flash for batch jobs (fastest + cheapest)
  2. Wrap each request in an asyncio.Semaphore(10) to cap concurrency and stay under rate limits
  3. Add retry logic so a failed task is retried instead of lost
  4. Save intermediate results as you go, so a crash doesn't cost you finished work
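
Tips 2 to 4 can be combined in one helper. The following is a minimal sketch under stated assumptions, not a definitive implementation: `run_batch` and all of its parameters are hypothetical names, `worker` is any async callable such as process_prompt, and the checkpoint is a JSON Lines file appended to as each prompt finishes.

```python
import asyncio
import json

async def run_batch(prompts, worker, limit=10, retries=3, base_delay=0.5,
                    checkpoint_path=None):
    """Run worker(prompt) for every prompt with bounded concurrency.

    retries is the total number of attempts per prompt and must be >= 1.
    Failed prompts come back as exception objects in the result list.
    """
    semaphore = asyncio.Semaphore(limit)  # tip 2: cap in-flight requests

    async def run_one(index, prompt):
        async with semaphore:
            for attempt in range(1, retries + 1):
                try:
                    result = await worker(prompt)
                    break
                except Exception:
                    if attempt == retries:
                        raise  # tip 3: give up after the last attempt
                    # Exponential backoff; the slot stays held while waiting,
                    # which also eases pressure on the API.
                    await asyncio.sleep(base_delay * 2 ** (attempt - 1))
        if checkpoint_path is not None:
            # tip 4: append each finished result so a crash keeps prior work
            with open(checkpoint_path, "a", encoding="utf-8") as f:
                f.write(json.dumps({"index": index, "result": result}) + "\n")
        return result

    return await asyncio.gather(
        *(run_one(i, p) for i, p in enumerate(prompts)),
        return_exceptions=True,
    )

# Example call, assuming process_prompt and prompts from the article:
# results = asyncio.run(run_batch(prompts, process_prompt,
#                                 checkpoint_path="results.jsonl"))
```

The cap trades speed for reliability: with a limit of 10 and roughly 2-second calls, 100 prompts need about 10 waves, so expect closer to 20 seconds than 5. Raise the limit as far as your rate limit allows.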

Try It

  1. Copy the code above
  2. Get a free API key → aibridge-api.com
  3. Replace your sequential loop
  4. Watch your throughput 10x
