John Dreic

Originally published at contextgate.ai

Fast, Efficient, and Confidently Delivered — But Wrong

Have you ever asked a simple question (“how many customers do we have?”) and gotten three different answers?

I have. Sales said one number, finance said another, and the founder pulled up a dashboard that disagreed with both. Nobody was wrong. They were each looking at a different system, with a slightly different definition. We’d been making decisions on those numbers for months.

I came across a thread on a BI sub the other week. An analyst had put two team dashboards side-by-side. Both had a “Revenue” column. The numbers didn’t match. He looked into it: “Two analysts had written two different calculations for ‘Revenue.’ One was gross. The other was net. Neither was wrong. They just never agreed on a single definition.”

That’s the boring version of the problem. The newer version is in another thread from r/analytics — three weeks old, 426 upvotes — where the OP’s CEO killed the BI tool and told everyone to “just ask Claude” for the numbers instead. Predictable result: “sales VP was pulling numbers and they didn’t match with finance. Claude was hallucinating retention figures because the underlying tables hadn’t been cleaned since 2022.”

Top comment, 216 upvotes, basically wrote the moral: “AI only works if the underlying data is already clean and the metrics are defined. If you skip that step, Claude just gives you confident nonsense faster.”

Different problem, same shape. Definitions drift. AI on top makes the drift faster.

So I built an agent that doesn’t get to drift.

How many enterprise customers do we have?

I seeded a workspace database with the kind of mess most companies actually have. Two tables. One called stripe_customers with billing data. One called hubspot_companies with the CRM view. Same business, same set of customers, different definitions of “enterprise.”

Then I asked the agent the simple question.

The agent's reply: 'We have 9 enterprise customers in Stripe and 9 enterprise customer companies in HubSpot. Reconciled by company name and domain, 8 companies match across both systems, with 1 Stripe-only and 1 HubSpot-only.' Followed by a metric table breaking down Stripe enterprise (9), Stripe active enterprise (8), HubSpot enterprise customers (9), Overlap (8), Stripe-only (1), HubSpot-only (1). Below it: the auditable SQL filters used and the matched / unmatched company list.
*Same question, six numbers. The agent surfaced all of them and explained why they disagree.*

It came back with six numbers, not one.

  • Stripe enterprise plan: 9
  • Stripe active + paying enterprise: 8
  • HubSpot lifecycle=customer + enterprise tier: 9
  • Match across both: 8
  • Stripe-only: 1 (a $0 enterprise trial — Stripe says they’re enterprise, HubSpot hasn’t tagged them as a customer yet)
  • HubSpot-only: 1 (a deal signed last Friday — HubSpot has them as customer, Stripe billing starts next cycle)

Both totals say nine. They count different nines. Nobody’s lying. Both numbers are defensible. The agent surfaced all of them and explained the gap in plain English: “sync or naming/domain mismatch rather than a count mismatch.”

I didn’t write that reconciliation logic. I didn’t write the SQL. I asked one question.
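For a feel of what that kind of reconciliation involves, here is a toy sketch in Python with SQLite. It is not the agent's SQL: the column names (`plan`, `status`, `mrr_cents`, `lifecycle`, `tier`) and the handful of rows are made up for illustration, and matching is done on an exact `domain` only.

```python
import sqlite3

# Hypothetical schema and toy rows; NOT the real workspace tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stripe_customers (domain TEXT, plan TEXT, status TEXT, mrr_cents INTEGER);
CREATE TABLE hubspot_companies (domain TEXT, lifecycle TEXT, tier TEXT);

INSERT INTO stripe_customers VALUES
  ('acme.com',    'enterprise', 'active',   500000),
  ('globex.com',  'enterprise', 'active',   750000),
  ('trial.io',    'enterprise', 'trialing', 0),
  ('smallco.dev', 'starter',    'active',   9900);

INSERT INTO hubspot_companies VALUES
  ('acme.com',   'customer', 'enterprise'),
  ('globex.com', 'customer', 'enterprise'),
  ('newdeal.co', 'customer', 'enterprise'),
  ('trial.io',   'lead',     'enterprise');
""")

# Three plausible definitions of "enterprise customer".
STRIPE_ENT = "SELECT domain FROM stripe_customers WHERE plan = 'enterprise'"
STRIPE_ACTIVE = STRIPE_ENT + " AND status = 'active' AND mrr_cents > 0"
HUBSPOT_ENT = (
    "SELECT domain FROM hubspot_companies "
    "WHERE lifecycle = 'customer' AND tier = 'enterprise'"
)


def count(sql: str) -> int:
    """Count the rows returned by a SELECT (or compound SELECT)."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]


metrics = {
    "stripe_enterprise": count(STRIPE_ENT),
    "stripe_active_paying": count(STRIPE_ACTIVE),
    "hubspot_enterprise_customers": count(HUBSPOT_ENT),
    "overlap": count(f"{STRIPE_ENT} INTERSECT {HUBSPOT_ENT}"),
    "stripe_only": count(f"{STRIPE_ENT} EXCEPT {HUBSPOT_ENT}"),
    "hubspot_only": count(f"{HUBSPOT_ENT} EXCEPT {STRIPE_ENT}"),
}

for name, value in metrics.items():
    print(f"{name}: {value}")
```

Even in this toy data, both systems report the same total while disagreeing about which companies are in it, which is why returning a single number would hide the problem.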

What you don’t see in a chat box

The thing nobody talks about with “ask the AI for the numbers” tools is what’s happening underneath. Most of them work in one of two ways:

The first kind stuffs your data into the prompt. Your customer table has 10,000 rows? You’re paying to read all of it on every question. And the LLM is a prediction engine, not a calculator: ask it to sum a thousand numbers and it may cheerfully invent the total.

The second kind wraps a chat box around your database. The AI never actually queries — it asks a translator, the translator guesses some SQL, you get a number back. You can’t see what was computed. When the number’s wrong, you can’t tell why.

The agent above doesn’t do either. It has actual SQL tools. It saw the schema and wrote the query itself. The query ran against the real database. The exact query got logged.
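A rough way to see the difference from the prompt-stuffing approach is to compare how much text each one has to move. The sketch below uses a made-up 10,000-row table and counts characters as a crude stand-in for tokens. The `mrr_cents` column and the generated values are assumptions, and real token counts depend on the model's tokenizer.

```python
import sqlite3

# Toy table with 10,000 synthetic rows (values are arbitrary, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stripe_customers (id INTEGER PRIMARY KEY, mrr_cents INTEGER)")
rows = [(i, 1000 + (i * 37) % 5000) for i in range(1, 10001)]
conn.executemany("INSERT INTO stripe_customers VALUES (?, ?)", rows)

# Approach 1: serialize the whole table into the prompt and ask the model to add it up.
prompt_dump = "\n".join(
    f"{customer_id},{mrr}"
    for customer_id, mrr in conn.execute("SELECT id, mrr_cents FROM stripe_customers")
)

# Approach 2: the database does the arithmetic; only the result goes back to the model.
query = "SELECT SUM(mrr_cents) FROM stripe_customers"
total = conn.execute(query).fetchone()[0]

print("characters sent when stuffing the table:", len(prompt_dump))
print("characters sent when querying:", len(query) + len(str(total)))
```

The first figure grows with the table; the second stays roughly constant, and the sum comes from the database engine rather than from a model's guess.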

A run-history view showing two consecutive 'Execute Sql' tool calls. Call 7 of 8 is marked with a red X (Failed). Call 8 of 8 is marked with a green check (Completed). The Request Arguments for call 8 are expanded, showing a long common-table-expression SQL query with WITH clauses for stripe_base, stripe_enterprise, hubspot_base and hubspot_enterprise, joined together with a FULL OUTER JOIN normalising company names and email domains. Below the query, the agent's plain-English summary begins: 'We have 9 enterprise customers in Stripe and 9 enterprise customer companies in HubSpot.'
*The exact SQL the agent wrote — in the audit log. First attempt failed (red); it self-corrected on the next try.*

That’s the agent’s actual SQL up there. A multi-step query joining the two tables together by matching company names and email domains. The agent worked it out on its own, in five sub-queries it wrote and ran. The first attempt failed; it self-corrected on the next try.
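The matching idea (normalise company names and email domains, then keep matched and unmatched rows from both sides) can be sketched in plain Python. This is an illustration, not the agent's query: the suffix list, the field names (`name`, `email`, `domain`) and the sample records are all assumptions.

```python
import re

# Assumed list of legal suffixes to ignore when comparing company names.
_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def normalize_name(name: str) -> str:
    """Lower-case, drop punctuation and common legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in _SUFFIXES)


def email_domain(value: str) -> str:
    """Domain part of an email (or a bare domain), without a leading 'www.'."""
    domain = value.lower().strip().rsplit("@", 1)[-1]
    return domain[4:] if domain.startswith("www.") else domain


def reconcile(stripe, hubspot):
    """Full-outer-join style match on domain first, then on normalized name."""
    by_domain = {email_domain(h["domain"]): h["name"] for h in hubspot}
    by_name = {normalize_name(h["name"]): h["name"] for h in hubspot}
    matched, stripe_only, seen = [], [], set()
    for s in stripe:
        hit = by_domain.get(email_domain(s["email"])) or by_name.get(normalize_name(s["name"]))
        if hit is None:
            stripe_only.append(s["name"])
        else:
            matched.append((s["name"], hit))
            seen.add(hit)
    hubspot_only = [h["name"] for h in hubspot if h["name"] not in seen]
    return matched, stripe_only, hubspot_only


# Hypothetical records for illustration only.
stripe = [
    {"name": "Acme, Inc.", "email": "billing@acme.com"},
    {"name": "Globex Corp", "email": "ap@globex-billing.com"},
    {"name": "Trial Co", "email": "ops@trial.io"},
]
hubspot = [
    {"name": "ACME", "domain": "www.acme.com"},
    {"name": "Globex", "domain": "globex.com"},
    {"name": "New Deal LLC", "domain": "newdeal.co"},
]

matched, stripe_only, hubspot_only = reconcile(stripe, hubspot)
print(matched, stripe_only, hubspot_only)
```

Falling back to the name when the domain differs is a judgment call: it catches cases like a separate billing domain, at the risk of false matches between companies with similar names.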

The numbers are real because the SQL is real. Cost stays bounded — you pay for the question and a small response, not for dumping the table in every time. And the audit log is the proof. Compliance, finance, anyone on the team can scroll the run history and see every SELECT … FROM … the agent ran. When sales and finance disagree next quarter, you don’t have a “whose number is right” meeting — you have a “show me the SQL” conversation that takes ninety seconds.
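A minimal sketch of what a read-only, logged SQL tool could look like, using Python's `sqlite3` module. This is not ContextGate's implementation: `make_readonly_tool`, `execute_sql` and the in-memory `audit_log` list are hypothetical names, and a real audit log would live in durable storage.

```python
import sqlite3
from datetime import datetime, timezone

audit_log = []  # stand-in for a durable audit store


def make_readonly_tool(conn: sqlite3.Connection):
    """Return an execute_sql function that logs every query and cannot write."""
    conn.execute("PRAGMA query_only = ON")  # SQLite rejects writes on this connection

    def execute_sql(query: str):
        entry = {
            "at": datetime.now(timezone.utc).isoformat(),
            "sql": query,
            "status": "completed",
            "error": None,
        }
        try:
            rows = conn.execute(query).fetchall()
        except sqlite3.Error as exc:
            # Failed attempts are logged too, so the retry is visible in the history.
            entry["status"], entry["error"] = "failed", str(exc)
            rows = None
        audit_log.append(entry)
        return rows

    return execute_sql


# Toy database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stripe_customers (name TEXT, plan TEXT)")
conn.execute("INSERT INTO stripe_customers VALUES ('Acme', 'enterprise')")
conn.commit()

run = make_readonly_tool(conn)
run("SELECT COUNT(*) FROM stripe_customer WHERE plan = 'enterprise'")   # typo in table name: fails
run("SELECT COUNT(*) FROM stripe_customers WHERE plan = 'enterprise'")  # corrected retry
run("DELETE FROM stripe_customers")                                     # write attempt: rejected

for entry in audit_log:
    print(entry["status"], "|", entry["sql"])
```

Logging the failed attempt alongside the successful one is what makes a "show me the SQL" conversation possible: the history shows what was tried, what ran, and that nothing was modified.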

Try it

Here’s the exact prompt I gave to the Workspace Assistant in ContextGate (that little robot icon on the bottom right) to build the whole thing for me.

Build me an agent that answers “how many enterprise customers do we have?” — it queries the workspace database directly so the numbers are real. When our stripe_customers and hubspot_companies tables disagree (they do), it should surface all the numbers and explain why in plain English. Read-only.

Click approve when it asks to set up the database tools, and you have your agent.
