10x Your AI API Throughput with Batch Processing

#ai #llm #api #python

Sending AI requests one at a time is slow. Here's how to process 100 prompts concurrently, with 10x throughput and 40% cost savings.

Run one after another, 100 prompts at roughly 2 seconds each take about 200 seconds: more than 3 minutes of waiting.

Here's how to process them concurrently instead, using asyncio to send the whole batch at once.
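
To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch. The function name `estimate_seconds` and the "waves" model are assumptions for illustration only: it pretends every request takes exactly the same time and ignores rate limits, retries, and network jitter.

```python
import math

def estimate_seconds(n_prompts, latency_s, concurrency=1):
    """Idealized wall-clock estimate: requests run in equal-length 'waves'.

    Hypothetical helper, not part of any SDK. Real latencies vary per request.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Number of rounds needed when `concurrency` requests run at the same time
    waves = math.ceil(n_prompts / concurrency)
    return waves * latency_s

sequential = estimate_seconds(100, 2.0)        # one request at a time
ten_at_once = estimate_seconds(100, 2.0, 10)   # ten requests in flight
print(sequential, ten_at_once, sequential / ten_at_once)
```

Under these assumptions, ten requests in flight cut the 200-second sequential run to 20 seconds, which is where a 10x figure comes from. Actual numbers depend on the provider and the prompts.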

The Slow Way (Don't Do This)

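# Assumes the same key and endpoint as the async example below; `prompts` is a list of 100 strings
from openai import OpenAI

client = OpenAI(api_key="mb-your-key", base_url="https://aibridge-api.com/v1")

# Each call blocks until its response arrives: 100 prompts x ~2 s = 3+ minutes of waiting
results = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}]
    )
    results.append(response.choices[0].message.content)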

The Fast Way: Async + Batch

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="mb-your-key",  # placeholder; load the real key from an environment variable
    base_url="https://aibridge-api.com/v1"
)

async def process_prompt(prompt):
    response = await client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200
    )
    return response.choices[0].message.content

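async def batch_process(prompts):
    # Start every request at once; gather returns results in the same order as prompts
    tasks = [process_prompt(p) for p in prompts]
    return await asyncio.gather(*tasks)

# 100 prompts in about 5 seconds, provided the API's rate limits allow 100 concurrent requests
results = asyncio.run(batch_process(prompts))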
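
Firing all 100 requests at once can run into provider rate limits, and by default a single failed request makes `asyncio.gather` raise and the other results are lost. A minimal sketch of one common way to handle both, assuming a cap of 10 concurrent requests is acceptable: `fake_completion`, `bounded_batch`, and `max_concurrency` are hypothetical names, and `fake_completion` stands in for the real API call so the example runs without a key.

```python
import asyncio

async def fake_completion(prompt):
    # Hypothetical stand-in for the real API call so the sketch runs offline
    await asyncio.sleep(0.01)
    if not prompt:
        raise ValueError("empty prompt")
    return f"echo: {prompt}"

async def bounded_batch(prompts, worker=fake_completion, max_concurrency=10):
    # The semaphore caps how many calls are in flight at the same time
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(prompt):
        async with semaphore:
            return await worker(prompt)

    # return_exceptions=True keeps one failure from discarding the other results;
    # a failed slot holds the exception object, in the same order as the prompts
    return await asyncio.gather(
        *(run_one(p) for p in prompts), return_exceptions=True
    )

results = asyncio.run(bounded_batch(["a", "", "c"], max_concurrency=2))
successes = [r for r in results if not isinstance(r, Exception)]
failures = [r for r in results if isinstance(r, Exception)]
print(len(successes), "succeeded,", len(failures), "failed")
```

To use it with the real client, pass `worker=process_prompt`. The right value for `max_concurrency` depends on the provider's limits, so treat 10 as a starting guess rather than a recommendation.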