DEV Community

Hoshang Mehta

5 unsexy data things to get right to make AI work

Here are five unsexy but critical data things you need to get right before your analytics agent touches real customer data in databases, warehouses, and business apps.

Most teams start with the same assumptions: give the agent read-only database access, put a thin API in front of it, rely on RBAC or row-level security, and figure out monitoring later if something breaks. These approaches feel safe because they've worked for humans and services, but they weren't designed for autonomous systems that explore, retry, and operate at scale.

A few core things to consider:

  1. Isolation, not just permissions
    Agents shouldn't see raw tables. They need sandboxed, pre-defined views that already encode some level of joins, filters, and business logic. Safety has to exist before the query runs.

  2. Agent-aware access models
    Human IAM assumes intent. Agents don't have intent; they explore. Access needs hard boundaries: what can be queried, how often, with which parameters, and at what cost.

  3. Deterministic interfaces over free-form querying
    Unbounded SQL is a footgun. Structured tools with defined inputs reduce prompt injection, data leakage, and accidental over-querying.

  4. Cross-system context
    Most real questions span CRM + product + billing + support. Teams either overexpose everything or duplicate logic in brittle APIs. Neither scales.

  5. Observability as a first-class requirement
    You need to know what agents queried, what they returned, how long it took, and how much it cost, per agent and per workflow. Post-hoc logs aren't enough.
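
To make points 1 and 3 concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything in it is an assumption made for illustration: the invoices table, the agent_account_billing view, and the get_account_billing tool are hypothetical names, not part of any real system. The idea is that the agent never sends SQL. It calls a tool with validated inputs, and the tool reads only from a view that already drops sensitive columns and applies the business filters.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        account_id INTEGER,
        email TEXT,
        amount_cents INTEGER,
        status TEXT
    );
    INSERT INTO invoices VALUES
        (1, 10, 'a@example.com', 5000, 'paid'),
        (2, 10, 'a@example.com', 7000, 'open'),
        (3, 11, 'b@example.com', 1200, 'paid'),
        (4, 12, 'c@example.com', 900, 'void');

    -- The view is the only surface the tool reads from: no email column,
    -- void invoices filtered out, amounts pre-aggregated per account.
    CREATE VIEW agent_account_billing AS
        SELECT account_id, status,
               COUNT(*) AS invoice_count,
               SUM(amount_cents) AS total_cents
        FROM invoices
        WHERE status != 'void'
        GROUP BY account_id, status;
""")

ALLOWED_STATUSES = {"paid", "open"}
MAX_ROWS = 50
COLUMNS = ("account_id", "status", "invoice_count", "total_cents")


def get_account_billing(account_id: int, status: str, limit: int = 10) -> list[dict]:
    """Structured tool: fixed query shape, validated inputs, capped result size."""
    if isinstance(account_id, bool) or not isinstance(account_id, int):
        raise ValueError("account_id must be an integer")
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}")
    if not isinstance(limit, int) or not 1 <= limit <= MAX_ROWS:
        raise ValueError(f"limit must be between 1 and {MAX_ROWS}")
    # Values are bound as parameters, never interpolated into the SQL text.
    rows = conn.execute(
        "SELECT account_id, status, invoice_count, total_cents "
        "FROM agent_account_billing "
        "WHERE account_id = ? AND status = ? LIMIT ?",
        (account_id, status, limit),
    ).fetchall()
    return [dict(zip(COLUMNS, row)) for row in rows]
```

Because the query shape is fixed and the status value is checked against an allowlist, an injected string such as "void' OR 1=1" is rejected before any SQL runs, and the email column is unreachable no matter what the agent asks for.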

The biggest mistake I see teams make is treating security as an afterthought. Once your first agent is live, it's already too late to bolt it on.

This isn't about trusting models to behave. It's about designing a clear agent-to-data access layer upfront with guardrails that define what agents are allowed to see, query, and act on.
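
One way such a layer could look, as a rough Python sketch rather than a reference design: every tool call passes through a single choke point that checks a per-agent policy, enforces a call budget, caps the rows returned, and writes an audit record whether the call succeeded or not. The AgentPolicy and AccessLayer classes, the list_accounts tool, and the agent and workflow names are all hypothetical. Row counts and call counts stand in for cost here; a real deployment would likely track warehouse cost or bytes scanned instead.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    allowed_tools: frozenset
    max_calls_per_minute: int
    max_rows_per_call: int


@dataclass
class AccessLayer:
    policies: dict   # agent_id -> AgentPolicy
    tools: dict      # tool name -> callable returning a list of rows
    audit_log: list = field(default_factory=list)
    _calls: dict = field(default_factory=dict)  # agent_id -> recent call times

    def call(self, agent_id, workflow, tool, **params):
        start = time.monotonic()
        record = {"agent": agent_id, "workflow": workflow, "tool": tool,
                  "params": params, "allowed": False, "rows": 0}
        try:
            policy = self.policies.get(agent_id)
            if policy is None or tool not in policy.allowed_tools:
                raise PermissionError(f"{agent_id} may not call {tool}")
            # Sliding one-minute window over this agent's accepted calls.
            recent = [t for t in self._calls.get(agent_id, []) if start - t < 60]
            if len(recent) >= policy.max_calls_per_minute:
                raise PermissionError(f"{agent_id} exceeded its call budget")
            self._calls[agent_id] = recent + [start]
            rows = self.tools[tool](**params)[: policy.max_rows_per_call]
            record.update(allowed=True, rows=len(rows))
            return rows
        finally:
            # The audit record is written for denied and failed calls too.
            record["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
            self.audit_log.append(record)


# Hypothetical wiring: one agent, one tool, a budget of two calls per minute.
layer = AccessLayer(
    policies={"billing-agent": AgentPolicy(frozenset({"list_accounts"}), 2, 2)},
    tools={"list_accounts": lambda region: [{"region": region, "id": i} for i in range(5)]},
)
rows = layer.call("billing-agent", "monthly-report", "list_accounts", region="emea")
```

The tool in this example produces five rows, but the policy trims the result to two, and a third call inside the same minute raises PermissionError while still leaving an entry in audit_log.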

--

If you're thinking through how to get your data ready for AI, happy to share what we're learning while building Pylar
