I processed a 158K-row product catalog through GPT-4 last year. Free-text descriptions in, structured attributes out — brand, category, material, quality score. The script ran over a weekend.
Monday morning: $400 gone, crashed at row 91K, no checkpoint. I restarted from zero.
That was the fourth time I'd written the same batch-processing boilerplate for an LLM job. Retry logic, rate-limit handling, checkpoint/resume, cost tracking, structured output validation. Different project each time, same 150 lines of glue code. I kept rewriting it because none of the existing tools quite fit — LangChain for agents, LlamaIndex for RAG, but nothing for "just run this prompt on every row of my DataFrame and give me columns back."
So I built the thing.
The actual problem
Here's what batch LLM processing looks like without a library:
```python
# The boilerplate you've probably written 3 times already
import openai, json, time

client = openai.OpenAI()
results = []
for i, row in enumerate(df.itertuples()):
    for attempt in range(3):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user",
                           "content": f"Extract brand and category from: {row.description}"}],
                response_format={"type": "json_object"},
            )
            results.append(json.loads(response.choices[0].message.content))
            break
        except Exception:
            if attempt == 2:
                results.append({"brand": None, "category": None})
            time.sleep(2 ** attempt)
# No checkpoint. No cost tracking. Fingers crossed it doesn't die at row 80K.
```
That's the minimal version — no checkpointing, no cost control, no schema enforcement. Add those and you're at 150+ lines before a single row of business logic. And you'll write it again on the next project because it's too tangled to extract cleanly.
Here's the same thing with Ondine:
```python
from ondine import QuickPipeline

result = QuickPipeline(
    source=df,
    prompt="Extract brand, category, and sentiment from: {description}",
    output_columns=["brand", "category", "sentiment"],
    model="gpt-4o-mini",
    budget_limit=25.00,
).run()

# result now has 3 new columns: brand, category, sentiment
print(result[["description", "brand", "category", "sentiment"]].head())
```
The `{description}` placeholder maps to your input column. `output_columns` tells Ondine exactly which new columns to create: it enforces that the LLM returns all three, and retries if any are missing. Checkpointing, retries, and cost tracking are all on by default.
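The placeholder mechanic is plain column-to-template substitution, and the output check is a key-presence test. A minimal sketch of both ideas (my simplification, not Ondine's actual code):

```python
def render_prompt(template: str, row: dict) -> str:
    # "{description}" in the template is filled from the matching input column
    return template.format(**row)

def missing_columns(parsed: dict, output_columns: list[str]) -> list[str]:
    # Any declared output column absent from the LLM response triggers a retry
    return [col for col in output_columns if col not in parsed]
```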
What I built into it (and why)
Checkpointing was first. Obviously, after the $400 crash. Ondine saves state to Parquet after every batch. Process dies at row 50K of 200K? Restart, pick up at 50K. I've recovered more runs with this than I want to admit.
Cost control came second. Before any API call, Ondine estimates the full run cost from token counts and current pricing. You set a hard dollar cap. It stops when it hits the cap — not "logs a warning." Stops. I needed something I could start before leaving the office.
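The estimate itself is simple arithmetic: projected tokens times per-token price, checked against the cap before anything runs. A sketch using illustrative prices and the rough ~4-chars-per-token heuristic (Ondine uses real tokenizers and current provider pricing, so treat every number here as a placeholder):

```python
# Illustrative prices only, roughly gpt-4o-mini class; not current rates
PRICE_PER_1M_INPUT = 0.15   # $ per 1M input tokens
PRICE_PER_1M_OUTPUT = 0.60  # $ per 1M output tokens

def estimate_cost(prompts, expected_output_tokens=50):
    # ~4 characters per token is a common rough heuristic
    input_tokens = sum(len(p) // 4 for p in prompts)
    output_tokens = expected_output_tokens * len(prompts)
    return (input_tokens * PRICE_PER_1M_INPUT
            + output_tokens * PRICE_PER_1M_OUTPUT) / 1_000_000

def check_budget(prompts, budget_limit):
    # Hard stop before the first API call, not a warning after the fact
    est = estimate_cost(prompts)
    if est > budget_limit:
        raise RuntimeError(f"Estimated ${est:.2f} exceeds budget ${budget_limit:.2f}")
    return est
```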
Structured output is the one people underestimate. You ask GPT to return JSON and 95% of responses are valid. Fine. But at 100K rows, 5% is 5,000 broken responses scattered through your results. Ondine enforces Pydantic schemas and auto-retries on bad output (up to 3x). Define the shape, it guarantees it.
```python
from pydantic import BaseModel
from ondine import Pipeline

class ProductAttributes(BaseModel):
    brand: str
    category: str
    sentiment: float  # 0.0 to 1.0, always a float, always

result = (
    Pipeline
    .from_csv(
        "products.csv",
        input_columns=["title", "description"],
        output_columns=["brand", "category", "sentiment"],
    )
    .with_prompt("From {title} and {description}, extract brand, category, and sentiment score")
    .with_model("gpt-4o-mini")
    .with_schema(ProductAttributes)
    .run()
)

# result["sentiment"] is always a float. result["brand"] is always a string.
# If the LLM returns garbage, Ondine retries up to 3x before marking the row failed.
```
Multi-row batching is where throughput gets interesting. Instead of one API call per row, Ondine packs N rows into one call. Same total tokens, a fraction of the per-request overhead. For 10K rows at batch_size=50, that's 200 calls instead of 10,000; the wall-time difference is significant on large runs.
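Mechanically, that means building one prompt over N rows and fanning the response back out. A rough sketch of the pack/unpack step (the numbered-item prompt format here is my invention, not Ondine's actual wire format):

```python
import json

def build_batch_prompt(rows):
    # Pack N rows into one prompt; ask for a JSON array, one object per row
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(rows))
    return (
        "For each numbered item, extract brand and category. "
        "Return a JSON array of objects with keys 'brand' and 'category', "
        "one per item, in order:\n" + numbered
    )

def split_batch_response(raw, expected):
    parsed = json.loads(raw)
    if len(parsed) != expected:
        # Length mismatch means the model dropped or merged rows; caller retries
        raise ValueError(f"expected {expected} results, got {len(parsed)}")
    return parsed
```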
The anti-hallucination layer came last, after I caught the model inventing brand names that weren't anywhere in the source text. A post-processing context store (Rust + SQLite + FTS5, ~3% overhead) checks each output against its input: it flags unsupported assertions and catches contradictions when the same row appears twice with different outputs. Not foolproof, but it catches the obvious cases.
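The core idea fits in a few lines. This is a deliberately crude Python stand-in for the FTS5-backed check (exact substring match only; no tokenization, stemming, or fuzziness, and `flag_unsupported` is my name, not Ondine's API):

```python
def flag_unsupported(output: dict, source_text: str) -> list[str]:
    # Flag extracted string fields whose value never appears in the source
    # text -- the "did the model invent this brand?" check
    haystack = source_text.lower()
    return [
        field for field, value in output.items()
        if isinstance(value, str) and value.lower() not in haystack
    ]
```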
Under the hood
Python asyncio runs the pipeline. Hot paths — context store, schema validation, checkpoint serialization — are Rust via PyO3 because Python was the bottleneck on large runs. LiteLLM handles provider switching so the same call works against 100+ models:
```python
model="gpt-4o-mini"                                        # OpenAI
model="groq/llama-3.1-70b-versatile"                       # Groq (fast and cheap)
model="ollama/mistral"                                     # local Ollama
model="mlx/mlx-community/Meta-Llama-3.1-8B-Instruct-4bit"  # Apple Silicon, no server
```
One thing worth knowing upfront: multi-row batching works great against cloud APIs and local vLLM/SGLang servers. It doesn't help with vanilla Ollama — single-request architecture. If you're running local models through Ollama, you still get checkpointing and structured output enforcement, just not the batching speedup.
What it's not
Not an agent framework. No tool-calling, no multi-step reasoning, no memory. If you need that, LangChain or Marvin is the right call — Ondine is deliberately narrower.
Not PandasAI either. That's conversational "talk to your data." This is production batch processing — you define the task, it executes it at scale.
The positioning that works for me: it's what you reach for the fifth time you've rewritten the same LLM batch loop. If you're on your first or second time, the boilerplate is probably fine. By the third or fourth, you'll want this.
Try it
```shell
pip install ondine
```

```python
from ondine import QuickPipeline

result = QuickPipeline(
    source="your_data.csv",
    prompt="your task here",
    model="gpt-4o-mini",
    budget_limit=10.00,
).run()
```
Docs at docs.ondine.dev — quickstart runs in under 2 minutes. GitHub at github.com/ptimizeroracle/ondine, MIT licensed. I'm on issues if anything breaks.