ptimizeroracle
# Stop Rewriting the Same LLM Boilerplate: Batch-Process DataFrames in 3 Lines

I processed a 158K-row product catalog through GPT-4 last year. Free-text descriptions in, structured attributes out — brand, category, material, quality score. The script ran over a weekend.

Monday morning: $400 gone, crashed at row 91K, no checkpoint. I restarted from zero.

That was the fourth time I'd written the same batch-processing boilerplate for an LLM job. Retry logic, rate-limit handling, checkpoint/resume, cost tracking, structured output validation. Different project each time, same 150 lines of glue code. I kept rewriting it because none of the existing tools quite fit — LangChain for agents, LlamaIndex for RAG, but nothing for "just run this prompt on every row of my DataFrame and give me columns back."

So I built the thing.

## The actual problem

Here's what batch LLM processing looks like without a library:

```python
# The boilerplate you've probably written 3 times already
import json
import time

import openai

client = openai.OpenAI()
results = []

for i, row in enumerate(df.itertuples()):
    for attempt in range(3):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user",
                    "content": f"Extract brand and category from: {row.description}"}],
                response_format={"type": "json_object"},
            )
            results.append(json.loads(response.choices[0].message.content))
            break
        except Exception:
            if attempt == 2:
                # Out of retries: record a failed row instead of sleeping again
                results.append({"brand": None, "category": None})
                break
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s

# No checkpoint. No cost tracking. Fingers crossed it doesn't die at row 80K.
```

That's the minimal version — no checkpointing, no cost control, no schema enforcement. Add those and you're at 150+ lines before a single row of business logic. And you'll write it again on the next project because it's too tangled to extract cleanly.

Here's the same thing with Ondine:

```python
from ondine import QuickPipeline

result = QuickPipeline(
    source=df,
    prompt="Extract brand, category, and sentiment from: {description}",
    output_columns=["brand", "category", "sentiment"],
    model="gpt-4o-mini",
    budget_limit=25.00,
).run()

# result now has 3 new columns: brand, category, sentiment
print(result[["description", "brand", "category", "sentiment"]].head())
```

The `{description}` placeholder maps to your input column. `output_columns` tells Ondine exactly which new columns to create — it enforces that the LLM returns all three, and retries if any are missing. Checkpointing, retries, and cost tracking are all on by default.

## What I built into it (and why)

Checkpointing was first. Obviously, after the $400 crash. Ondine saves state to Parquet after every batch. Process dies at row 50K of 200K? Restart, pick up at 50K. I've recovered more runs with this than I want to admit.
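The pattern is simple even if the production version isn't. Here's a minimal, dependency-free sketch of checkpoint/resume — hypothetical helper names, JSON instead of Ondine's Parquet, and an atomic write so a crash mid-save can't corrupt the checkpoint file:

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # Ondine uses Parquet; JSON keeps this sketch stdlib-only

def load_checkpoint():
    """Return (next_row_index, results_so_far), or a fresh start."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            state = json.load(f)
        return state["next_row"], state["results"]
    return 0, []

def save_checkpoint(next_row, results):
    # Write to a temp file and rename: a crash mid-write leaves the old
    # checkpoint intact instead of a half-written one
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_row": next_row, "results": results}, f)
    os.replace(tmp, CHECKPOINT)

def process(rows, call_llm, batch_size=100):
    start, results = load_checkpoint()  # picks up mid-run after a crash
    for i in range(start, len(rows)):
        results.append(call_llm(rows[i]))
        if (i + 1) % batch_size == 0:
            save_checkpoint(i + 1, results)
    save_checkpoint(len(rows), results)
    return results
```

Kill the process at row 50K and rerun: `load_checkpoint` hands back the 50K finished rows and the loop starts there, not at zero.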

Cost control came second. Before any API call, Ondine estimates the full run cost from token counts and current pricing. You set a hard dollar cap. It stops when it hits the cap — not "logs a warning." Stops. I needed something I could start before leaving the office.
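A pre-flight estimate doesn't need to be exact to be useful — it needs to land in the right order of magnitude before the first API call goes out. A rough sketch of the idea (the ~4-characters-per-token heuristic and the prices here are illustrative assumptions, not Ondine's internals or current OpenAI pricing):

```python
# Example per-1K-token prices in USD -- illustrative values, not live pricing
PRICE_PER_1K_INPUT = 0.00015
PRICE_PER_1K_OUTPUT = 0.0006

def estimate_run_cost(rows, prompt_template, est_output_tokens=60):
    """Estimate total run cost before making any API call."""
    # Crude heuristic: ~4 characters per token for English text
    total_input_tokens = sum(
        len(prompt_template.format(description=r)) / 4 for r in rows
    )
    total_output_tokens = est_output_tokens * len(rows)
    return ((total_input_tokens / 1000) * PRICE_PER_1K_INPUT
            + (total_output_tokens / 1000) * PRICE_PER_1K_OUTPUT)

def check_budget(rows, prompt_template, budget_limit):
    # Hard stop: refuse to start a run that would blow the cap
    cost = estimate_run_cost(rows, prompt_template)
    if cost > budget_limit:
        raise RuntimeError(
            f"Estimated ${cost:.2f} exceeds budget ${budget_limit:.2f}"
        )
    return cost
```

The same comparison runs during the job against actual token usage, which is what lets it stop mid-run rather than just warn.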

Structured output is the one people underestimate. You ask GPT to return JSON and 95% of responses are valid. Fine. But at 100K rows, 5% is 5,000 broken responses scattered through your results. Ondine enforces Pydantic schemas and auto-retries on bad output (up to 3x). Define the shape, it guarantees it.

```python
from pydantic import BaseModel
from ondine import Pipeline

class ProductAttributes(BaseModel):
    brand: str
    category: str
    sentiment: float  # 0.0 to 1.0, always a float, always

result = (
    Pipeline
    .from_csv("products.csv", input_columns=["title", "description"], output_columns=["brand", "category", "sentiment"])
    .with_prompt("From {title} and {description}, extract brand, category, and sentiment score")
    .with_model("gpt-4o-mini")
    .with_schema(ProductAttributes)
    .run()
)
# result["sentiment"] is always a float. result["brand"] is always a string.
# If the LLM returns garbage, Ondine retries up to 3x before marking the row failed.
```

Multi-row batching is where throughput gets interesting. Instead of one API call per row, Ondine packs N rows into one call. Same total tokens. For 10K rows at batch_size=50, that's 200 calls instead of 10,000 — the wall time difference is significant on large runs.
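The mechanics, roughly: number each row in the prompt, ask for a JSON array keyed by index, and map results back so a missing item becomes an explicit failure rather than a silent misalignment. A sketch of that packing/unpacking — illustrative helper functions, not Ondine's actual implementation:

```python
import json

def build_batch_prompt(rows):
    # Number each row so the model can return results in matching order
    numbered = "\n".join(f"{i}: {text}" for i, text in enumerate(rows))
    return (
        "For each numbered description below, return a JSON array of objects "
        'with keys "index", "brand", "category", one object per row.\n'
        + numbered
    )

def unpack_batch_response(raw, n_rows):
    # One API response carries results for the whole batch
    items = json.loads(raw)
    by_index = {item["index"]: item for item in items}
    # A row the model skipped comes back as None -- an explicit failure,
    # not a shifted-by-one misalignment of every row after it
    return [by_index.get(i) for i in range(n_rows)]

def batched(rows, batch_size):
    # 10K rows at batch_size=50 -> 200 chunks -> 200 API calls
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]
```

The index-keyed unpacking is the part worth stealing: relying on the model to return exactly N items in order is how batches silently corrupt downstream rows.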

I added the anti-hallucination layer last, after finding the model inventing brand names that appeared nowhere in the source text. A post-processing context store (Rust + SQLite + FTS5, ~3% overhead) checks each output against the input: it flags unsupported assertions and catches contradictions when the same row appears twice with different outputs. Not foolproof, but it catches the obvious cases.
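Stripped of the Rust and FTS5, the core check is just "does this value appear anywhere in the input?" A toy version of that grounding check — hypothetical helpers; the real layer does full-text search with tokenization, not naive substring matching:

```python
def is_grounded(value, source_text):
    """True if the extracted value appears somewhere in the input text."""
    if value is None:
        return True  # nothing asserted, nothing to check
    return value.lower() in source_text.lower()

def check_outputs(row_text, extracted):
    # Return the set of string fields the model may have hallucinated
    return {field for field, value in extracted.items()
            if isinstance(value, str) and not is_grounded(value, row_text)}
```

Extracted *labels* (a category taxonomy, a sentiment score) legitimately won't appear verbatim in the input, which is one reason substring matching alone isn't enough and the real check is field-aware.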

## Under the hood

Python asyncio runs the pipeline. Hot paths — context store, schema validation, checkpoint serialization — are Rust via PyO3 because Python was the bottleneck on large runs. LiteLLM handles provider switching so the same call works against 100+ models:

```python
model="gpt-4o-mini"                    # OpenAI
model="groq/llama-3.1-70b-versatile"   # Groq (fast and cheap)
model="ollama/mistral"                 # local Ollama
model="mlx/mlx-community/Meta-Llama-3.1-8B-Instruct-4bit"  # Apple Silicon, no server
```

One thing worth knowing upfront: multi-row batching works great against cloud APIs and local vLLM/SGLang servers. It doesn't help with vanilla Ollama — single-request architecture. If you're running local models through Ollama, you still get checkpointing and structured output enforcement, just not the batching speedup.

## What it's not

Not an agent framework. No tool-calling, no multi-step reasoning, no memory. If you need that, LangChain or Marvin is the right call — Ondine is deliberately narrower.

Not PandasAI either. That's conversational "talk to your data." This is production batch processing — you define the task, it executes it at scale.

The positioning that works for me: it's what you reach for the fifth time you've rewritten the same LLM batch loop. If you're on your first or second time, the boilerplate is probably fine. By the third or fourth, you'll want this.

## Try it

```shell
pip install ondine
```

```python
from ondine import QuickPipeline

result = QuickPipeline(
    source="your_data.csv",
    prompt="your task here",
    model="gpt-4o-mini",
    budget_limit=10.00,
).run()
```

Docs at docs.ondine.dev — quickstart runs in under 2 minutes. GitHub at github.com/ptimizeroracle/ondine, MIT licensed. I'm on issues if anything breaks.
