Murali Gour

We built columnar data ops for AI agents — here's why and how

If you've built an AI agent that touches real enterprise data, you've probably hit this wall.

Your agent pulls 2,000 records from Salesforce. Now what? The model can't reliably filter, sort, or group 2,000 rows inside its context window. You don't want to dump all of it as raw JSON. And spinning up a Python runtime just to run a pandas filter feels like overkill for what should be a simple operation.

This is the problem we kept running into at DataGrout. So we built Frame.

What is Frame?

Frame is a suite of columnar data operations exposed as MCP tools — callable directly by any AI agent without a Python runtime, extra infrastructure, or round-trips to an analytics API.

Here's what it looks like in practice. An agent receiving tabular records from a CRM can do this in a single workflow step:

frame.filter({ payload, where: [{ field: "status", op: "eq", value: "active" }] })
→ frame.sort({ payload: "$filter.records", by: [{ field: "revenue", dir: "desc" }] })
→ frame.slice({ payload: "$sort.records", offset: 0, limit: 50 })

No Python. No pandas. No external call. Pure deterministic output the agent can immediately act on.
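
To make the semantics of that chain concrete, here is a minimal plain-Python reference sketch of what a filter, sort, slice pipeline computes. It is not Frame's implementation: the function names frame_filter, frame_sort, and frame_slice and the sample records are hypothetical stand-ins, and only the eq operator is sketched.

```python
from typing import Any

Record = dict[str, Any]


def frame_filter(records: list[Record], where: list[dict[str, Any]]) -> list[Record]:
    """Keep rows that satisfy every condition (hypothetical stand-in, 'eq' only)."""
    for cond in where:
        if cond["op"] != "eq":
            raise ValueError(f"operator not sketched here: {cond['op']}")
    return [r for r in records if all(r.get(c["field"]) == c["value"] for c in where)]


def frame_sort(records: list[Record], by: list[dict[str, str]]) -> list[Record]:
    """Multi-column sort with a direction per field."""
    out = list(records)  # copy so the caller's list is untouched
    # Sort by the last key first; Python's sort is stable, so earlier keys take priority.
    for key in reversed(by):
        out.sort(key=lambda r, f=key["field"]: r[f], reverse=key["dir"] == "desc")
    return out


def frame_slice(records: list[Record], offset: int, limit: int) -> list[Record]:
    """Return one page of rows."""
    return records[offset:offset + limit]


# Made-up sample data for illustration only.
records = [
    {"name": "Acme", "status": "active", "revenue": 120},
    {"name": "Bolt", "status": "churned", "revenue": 300},
    {"name": "Cira", "status": "active", "revenue": 450},
    {"name": "Dune", "status": "active", "revenue": 80},
]

active = frame_filter(records, [{"field": "status", "op": "eq", "value": "active"}])
ranked = frame_sort(active, [{"field": "revenue", "dir": "desc"}])
top = frame_slice(ranked, offset=0, limit=2)
top_names = [r["name"] for r in top]
print(top_names)
```

Each step takes rows in and hands rows out, which is why the three calls can be chained without any glue logic in between.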

The full operation set

frame.filter — declarative row filtering with 10+ operators (eq, neq, gte, lte, contains, starts_with, is_null...)
frame.sort — multi-column sorting with per-field direction control
frame.group — aggregate by key, compute counts, sums, averages
frame.pivot — reshape rows into columns for cross-tab analysis
frame.join — merge two datasets on a shared key field
frame.slice — page or window over large record sets
frame.select — keep, drop, or rename columns in one pass
frame.pluck — extract one column into a flat array, dot-notation supported
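
As a rough illustration of two of the operations listed above, the sketch below shows what a dot-notation pluck and a group-with-aggregates could compute. The helper names pluck and group_sum, their parameters, and the sample deals are assumptions made for this example, not Frame's actual signatures.

```python
from collections import defaultdict
from typing import Any


def pluck(records: list[dict[str, Any]], path: str) -> list[Any]:
    """Extract one column into a flat list; a path like 'a.b' walks nested dicts."""
    out = []
    for row in records:
        value: Any = row
        for part in path.split("."):
            # A missing key or a non-dict value along the path yields None.
            value = value.get(part) if isinstance(value, dict) else None
        out.append(value)
    return out


def group_sum(records: list[dict[str, Any]], key: str, field: str) -> dict[Any, dict[str, Any]]:
    """Group rows by `key` and report the row count and the sum of `field` per group."""
    totals: dict[Any, dict[str, Any]] = defaultdict(lambda: {"count": 0, "sum": 0})
    for row in records:
        bucket = totals[row[key]]
        bucket["count"] += 1
        bucket["sum"] += row[field]
    return dict(totals)


# Made-up sample data for illustration only.
deals = [
    {"owner": {"name": "Ana"}, "region": "EMEA", "amount": 100},
    {"owner": {"name": "Ben"}, "region": "APAC", "amount": 250},
    {"owner": {"name": "Ana"}, "region": "EMEA", "amount": 50},
]

owner_names = pluck(deals, "owner.name")
by_region = group_sum(deals, "region", "amount")
print(owner_names)
print(by_region)
```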

Why deterministic matters

One of the core design decisions with Frame was making every operation pure and deterministic. No AI generation touches the data transformation layer. The agent decides what to do, Frame executes it exactly. This eliminates a whole class of hallucination risk that comes with asking an LLM to reshape data directly.
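
A small sketch of what "pure and deterministic" means in practice: the transformation returns new rows, leaves its input untouched, and gives the same answer every time it is called with the same arguments. The select_columns helper and the sample rows are hypothetical and stand in for any Frame-style operation.

```python
import copy
from typing import Any


def select_columns(records: list[dict[str, Any]], keep: list[str]) -> list[dict[str, Any]]:
    """Return new rows containing only the `keep` columns; the input is never mutated."""
    return [{k: row[k] for k in keep if k in row} for row in records]


# Made-up sample data for illustration only.
rows = [
    {"id": 1, "name": "Acme", "notes": "call back"},
    {"id": 2, "name": "Bolt", "notes": ""},
]
snapshot = copy.deepcopy(rows)

first = select_columns(rows, ["id", "name"])
second = select_columns(rows, ["id", "name"])

print(first == second)   # same input, same output
print(rows == snapshot)  # the input rows were not modified
```

Because the agent only supplies the arguments, any mistake it makes shows up in an inspectable request rather than in silently altered data.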

Handling large datasets

Frame accepts cache_ref outputs from previous tool calls, so agents can operate on large paginated datasets without retransmitting the full payload each time. This was critical for production workflows where datasets run into tens of thousands of rows.
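
The idea of passing a reference instead of the data can be sketched as follows. Everything here is an assumption made for illustration: the in-memory _CACHE store, the put and resolve helpers, the content-hash reference format, and the {"cache_ref": ...} payload shape are not taken from Frame's documentation.

```python
import hashlib
import json
from typing import Any

# Hypothetical server-side store: reference string -> full record list.
_CACHE: dict[str, list[dict[str, Any]]] = {}


def put(records: list[dict[str, Any]]) -> dict[str, str]:
    """Store records once and return a small reference the agent can pass around."""
    digest = hashlib.sha256(json.dumps(records, sort_keys=True).encode()).hexdigest()
    ref = digest[:12]  # shortened content hash; the real reference format is unknown
    _CACHE[ref] = records
    return {"cache_ref": ref}


def resolve(payload: Any) -> list[dict[str, Any]]:
    """Accept either inline records or a cache reference."""
    if isinstance(payload, dict) and "cache_ref" in payload:
        return _CACHE[payload["cache_ref"]]
    return payload


def slice_records(payload: Any, offset: int, limit: int) -> list[dict[str, Any]]:
    """A tool that works the same way on inline data and on references."""
    return resolve(payload)[offset:offset + limit]


big = [{"id": i} for i in range(20_000)]
ref = put(big)
page = slice_records(ref, offset=100, limit=3)

print(ref.keys())  # the agent only ever handles this small object
print(page)
```

The 20,000 rows are stored once; each later call moves only a short reference plus the page it asks for.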

How it composes

Frame tools chain together natively via flow.into inside DataGrout workflows. The output of frame.filter feeds directly into frame.sort without any manual wiring. This composability is what makes it genuinely useful in multi-step agent workflows rather than just as a standalone utility.
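
One way to picture this kind of chaining is a tiny runner that stores each step's output under the step's name and substitutes references such as "$filter.records" before calling the next tool. This is only a sketch under stated assumptions: the run and resolve_args helpers, the toy tools, and their argument names are hypothetical, and it does not describe how flow.into works internally.

```python
from typing import Any, Callable


def resolve_args(args: dict[str, Any], outputs: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Replace top-level "$step.field" strings with that field of an earlier step's output."""
    resolved = {}
    for key, value in args.items():
        if isinstance(value, str) and value.startswith("$"):
            step, field = value[1:].split(".", 1)
            value = outputs[step][field]
        resolved[key] = value
    return resolved


def run(steps: list[tuple[str, dict[str, Any]]],
        tools: dict[str, Callable[..., dict[str, Any]]]) -> dict[str, dict[str, Any]]:
    """Execute steps in order, keeping every output addressable by step name."""
    outputs: dict[str, dict[str, Any]] = {}
    for name, args in steps:
        outputs[name] = tools[name](**resolve_args(args, outputs))
    return outputs


# Toy tools standing in for real operations; each returns {"records": [...]}.
tools = {
    "filter": lambda payload, field, value: {
        "records": [r for r in payload if r[field] == value]
    },
    "sort": lambda payload, field: {
        "records": sorted(payload, key=lambda r: r[field], reverse=True)
    },
}

# Made-up sample data for illustration only.
rows = [
    {"name": "Acme", "status": "active", "revenue": 120},
    {"name": "Bolt", "status": "churned", "revenue": 300},
    {"name": "Cira", "status": "active", "revenue": 450},
]

steps = [
    ("filter", {"payload": rows, "field": "status", "value": "active"}),
    ("sort", {"payload": "$filter.records", "field": "revenue"}),
]

result = run(steps, tools)["sort"]["records"]
print([r["name"] for r in result])
```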

Where we are today

Frame is live at datagrout.ai/tools/frame, and we launched on Product Hunt today. We would appreciate your support if this is useful to you!
We're actively building out the operation set. What data operations are you missing in your agent workflows? Drop them in the comments — we're reading everything.