DEV Community: Gaur Data

The WHERE clause you have to remember on every query

Gaur Data — Wed, 08 Jul 2026 09:01:34 +0000

In most embedded analytics, the only thing standing between two customers' data is a WHERE workspace_id = :me that your application code has to remember on every single query.

It holds for the first dashboard and the tenth. Then one query forgets the filter, and customer A is looking at customer B's revenue inside a tile that renders without complaint.

That is not an analytics bug. It is a data-leak security bug wearing an analytics costume. The output looks like a chart, so it gets reviewed like a chart, when it should be reviewed like an auth boundary.

The story

Here is an illustrative scenario. The company is invented; the mechanism is not, and I have watched the shape of this happen more than once.

A retail-analytics SaaS sells a dashboard to store chains. Each chain is a workspace. Inside a workspace, a regional manager drills into a single store: revenue today, orders this week, a trend line by day.

The isolation lives where it usually lives, in the API layer. Every handler pulls the workspace off the authenticated session and appends the filter before the query runs.

SELECT sum(revenue), sum(orders)
FROM store_daily_sales
WHERE workspace_id = $1   -- from the session, never the client
  AND store_id = $2
  AND day >= $3;

For a year this is fine. There are maybe fifteen endpoints. Code review catches the one time someone pastes a query without the workspace_id line. The team feels good about it, and they are not wrong to.

Then product ships the feature everyone asks for: "ask your data." A chat box on the dashboard. A regional manager types "which of my stores grew fastest last month" and gets an answer in plain language.

Under the hood, a model now writes the SQL.

Read that again, because it is the whole essay. The filter that used to live in fifteen hand-reviewed endpoints is now the responsibility of a language model generating novel SQL on every question, for every user, in production. You are trusting the model to remember WHERE workspace_id = :me on a query no human will read before it runs.

The dashboard tiles and the copilot are both software consumers. Neither of them can look at a result and think "wait, these numbers are too big, this must be someone else's chain." A human analyst might catch it. A tile just renders whatever it is handed, and a model answers fluently in the voice of confidence.

The insight

Isolation re-implemented per query is a breach waiting for the one query that forgets. That was already true when the queries were handwritten, when the same risk sat hidden behind a small, countable set of endpoints a person could review. The copilot just removed that cover.

When the consumer is software, "I'll remember the filter" stops being a plan. The dashboard tile cannot sanity-check its own scope. The model cannot either, and it fails in the most dangerous direction, by producing a plausible answer rather than an error. A missing WHERE does not throw an error. It silently returns the wrong rows.

The deeper problem is where the tenant boundary lives. Put it in the SQL and everything that writes SQL, including a model, has to remember it on every query.

How teams usually try to hold the line

There is an honest progression here, and each step is a real improvement that runs out of road.

Roll-your-own filter in application code. The WHERE workspace_id = :me pattern. The rule is correct, but it lives in application logic and gets re-typed on every query. Its safety is a function of discipline and how many surfaces you have. Add a surface that writes its own queries, like a copilot, and discipline is no longer enough, because there is no fixed set of queries to be disciplined about.

Database session-variable row-level security. Stronger, because now the database enforces the predicate instead of your handlers. This is a genuine step up. But it has sharp edges: under a connection pooler in transaction mode, a session variable set for one tenant can bleed into the next borrower of that connection unless you are meticulous about resetting it. And it only covers the SQL layer. It does nothing for the analytics API contract above it, and nothing for the model-driven interface that calls that API. Your isolation and your product surfaces are enforced in different places.

Generic embedded-dashboard tools with signed or locked parameters. These scope a human-facing dashboard: you sign the tenant into an embed token, the tool refuses to let the browser widen it. That is the right idea, and for a rendered dashboard it works. But it secures a dashboard, not a governed API and not an AI copilot. The moment your consumer is software that composes its own queries, a locked dashboard parameter is not the boundary anymore.

Each of these helps. Each of them enforces isolation at a different layer than where your next consumer will show up. That is the pattern worth naming: isolation keeps getting re-implemented per surface, and every new surface is a new place to forget.

The principle

Isolation belongs at the interface, declared once and enforced by the runtime, not re-added per query or per surface.

If the boundary is a property of the interface that every consumer calls, then the dashboard, the reporting API, and the copilot all inherit the same rule without any of them writing a WHERE. A caller that never writes the filter cannot forget the filter.

How Gaur does this

In Gaur, consumers do not send SQL. They call a contract, and the row-level security is part of the contract, so it is applied to every query the runtime generates, on every protocol, whether the caller is a dashboard tile or a model.

Two things make this hold up in a multi-tenant setting, and the distinction between them is the important part.

Auth context is bound to the API key when the key is created. It is immutable and workspace-level, for example workspace_id. The caller cannot change it, because it was fixed the moment the key was issued.

Request context is sent per call by your backend, derived from the authenticated session, and varies per end user, for example store_id. This is the piece the browser must never set. If an end user could put their own store_id on the request, they could read other stores. Your backend sets it from the session it already trusts; the client never touches it.

The rule itself is a boolean sql condition with {{ placeholder }} tokens, plus a parameters map that binds each token to either auth.<field> or request.<field>. Here is an illustrative contract. The retail data model is invented; the rls structure and its enforcement are the real, documented mechanism.

{
  "name": "store_sales",
  "sources": [{ "source_type": "model", "name": "store_daily_sales", "alias": "s" }],
  "dimensions": {
    "day": { "sql": "s.day", "type": "date", "semantic_type": "time", "time_grains": ["day","week","month"] }
  },
  "measures": {
    "revenue": { "sql": "sum(s.revenue)", "type": "number", "additivity": "additive" },
    "orders":  { "sql": "sum(s.orders)",  "type": "number", "additivity": "additive" }
  },
  "rls": {
    "sql": "s.workspace_id = {{ workspace_id }} AND s.store_id = {{ store_id }}",
    "parameters": {
      "workspace_id": "auth.workspace_id",
      "store_id": "request.store_id"
    }
  }
}

Both tokens are injected as bound SQL parameters, never string-interpolated into the query text. So the isolation predicate is not a place where injection lives, and the values arrive as parameters the caller did not get to write.

The same rule is enforced identically across the REST query API, the OpenAI-compatible chat endpoint, and MCP. That is the line that matters for the story above. When the team adds the copilot, the copilot talks to the chat endpoint, which calls the same contract, which carries the same rls block. The AI feature inherits the exact isolation the dashboard already had. Nothing needs to be kept in sync, because no caller, human or model, ever writes the filter at all.

Now the honest part. A contract does not stop you from writing the wrong rule. If you bind store_id to auth when it should be request, or you forget a clause, the runtime will faithfully enforce your mistake. What the contract changes is where the risk lives. It moves you from "every endpoint and every generated query must remember the filter" to "one rule, written once, tested once." You review the boundary in a single place instead of trusting every surface to re-add it. This does not guarantee correctness, and a wrong rule is still enforced faithfully. What it does is make the forgotten WHERE structurally hard to ship, a narrower and more honest promise.

Shameless plug

If you are shipping customer-facing analytics or an AI copilot over per-tenant data, the failure in this post is your default failure, not an edge case. Anywhere the consumer is software (an embedded dashboard, a reporting API, a tool-calling agent, a chat box over your business data) the tenant boundary cannot depend on every query remembering a WHERE.

Gaur lets you declare the isolation rule once on the contract and have the runtime enforce it identically across REST, chat, and MCP, so a new surface inherits the boundary instead of re-implementing it. It helps make the cross-tenant leak structurally hard to ship, not impossible to write.

If that is the boundary you are trying to hold, book a demo at gaur.run and bring your messiest multi-tenant case.

The customer_id that isn't a customer

Gaur Data — Mon, 29 Jun 2026 17:40:40 +0000

An error is the safe outcome. The dangerous one is a number.

When a text-to-SQL feature breaks the way you hope, it throws a missing column and dies on the screen. You see it, you fix it, nobody was misled.

The failure that should keep you up is the other one. The query runs, returns rows, and hands a real user a number that is wrong, at the moment the stakes are highest. Nothing errored. Just a plausible figure, formatted to two decimals, in front of the person least able to know it's off.

That gap, between an answer that is wrong and one that looks wrong, is the whole problem with pointing an LLM at a production database. It never shows up in the demo. The demo is built to avoid it.

What follows is not a hypothetical. It is a public dataset and a question with a known answer that an LLM gets confidently, catastrophically wrong. The data is the Olist Brazilian e-commerce dataset on Kaggle. You can reproduce every number here yourself.

"Nobody ever comes back"

What is our repeat-purchase rate? How many customers come back and buy again. That number separates a business that compounds from one that re-buys its whole revenue base every quarter.

You wire an "ask your data" box into your product. Someone types the question in plain English, and an LLM turns it into SQL and runs it. Two tables are in scope:

olist_customers_dataset          olist_orders_dataset
  customer_id                       order_id        (pk)
  customer_unique_id                customer_id
  customer_zip_code_prefix          order_status
  customer_city                     order_purchase_timestamp
  customer_state                    ...

Here is the SQL the LLM wrote. No human reviewed it; it ran straight against the database:

SELECT
    CASE WHEN order_count > 1 THEN 'repeat' ELSE 'one-time' END AS segment,
    COUNT(*) AS customers
FROM (
    SELECT c.customer_id, COUNT(DISTINCT o.order_id) AS order_count
    FROM olist_orders_dataset o
    JOIN olist_customers_dataset c ON c.customer_id = o.customer_id
    GROUP BY c.customer_id
) per_customer
GROUP BY segment;

The answer: one-time 99,441. Repeat 0. Repeat rate 0.00%.

Nobody ever comes back. Not one customer out of ninety-nine thousand has placed a second order.

That query, and that 0.00%, are the LLM's own output. Read it like a code reviewer: the join is valid, the aggregation is clean, COUNT(DISTINCT o.order_id) is exactly how you count a customer's orders. By the standard of "is this reasonable SQL," it is reasonable SQL.

It is also wrong, in the most decision-changing direction a retention number can be. And it's wrong for a reason the schema will never tell you.

The column that lies about what it is

Two real values from the customers table:

customer_id         06b8999e2fba1a1fbc88172c00ba8bc7
customer_unique_id  861eff4711a542e4b93843c6dd7febb0

Two 32-character hex strings, identical in form. Nothing signals which is the person and which is the receipt. If you had to guess which means "customer," you'd flip a coin. So does the LLM.

In Olist, customer_id is minted fresh for every order. It is an order key wearing the word "customer" in its name. The actual person, stable across orders, is customer_unique_id.

The customers table has exactly one row per customer_id: 99,441 rows, 99,441 ids. So when you group orders by customer_id, every group has exactly one order, by construction. Max orders per customer_id in the whole dataset: 1.

That is why the rate came back 0.00%. The query computed the repeat rate of orders pretending to be people, which is always exactly zero.

Change one token, group by the column that identifies a person, and the data flips:

    SELECT c.customer_unique_id, COUNT(DISTINCT o.order_id) AS order_count
    ...
    GROUP BY c.customer_unique_id   -- the only change: _unique_id

The truth: 96,096 real customers, 2,997 repeat buyers, a 3.12% repeat rate. One customer ordered 17 times.

Not a great retention number. But the distance between "3.12% come back" and "nobody comes back" is the distance between a business and a fire.

The simple count breaks too. COUNT(DISTINCT customer_id) says 99,441 customers; COUNT(DISTINCT customer_unique_id) says 96,096. The naive version overstates the base by 3,345 people, about 3.5%.

The count is wrong by a little. The retention answer is wrong by everything.

A bigger model is not the fix

The reflex is to reach for a better LLM. So I ran a small test. Not a benchmark, a sanity check.

Two open-source models, Llama 3.3 70B and Qwen 2.5 Coder 32B, both solid SQL writers, each asked the repeat-rate question several times, with only the raw schema to go on.

Both did the obvious thing: grouped by customer_id, reported 0.00% most of the time. The bigger model was not meaningfully safer than the smaller one.

A frontier model with the same schema and the same blind spot makes the same mistake with more fluency. The variable is not the model. It is what the model is allowed to know, and what it is allowed to do.

Why the demo never caught it

A demo tests questions a human already validated by eye. "Orders last month," fine. "Revenue by state," fine. Each one happened to dodge the customer_id versus customer_unique_id distinction.

The retention question was the first to force the choice. By then the only reader was someone who typed a sentence and could not see the SQL.

The LLM makes a modeling decision, which column is a customer, fresh on every query, with only the schema to go on. A column named customer_id does not announce that it is secretly an order key. That fact lives in someone's head, or nowhere.

The tooling makes it concrete: the interface here is an MCP server in raw-SQL mode, exposing a read-only execute_sql tool. No layer sits between the model's guess and the database. It runs the bad query and returns 0.00%, because running well-formed read-only SQL is its whole job.

Use the simplest abstraction the workload tolerates

None of this makes text-to-SQL a mistake. It makes it a choice with a domain.

Poking around Olist yourself on a Tuesday? Raw SQL over MCP is great. When the rate comes back 0.00%, you'll notice, because 0% is absurd, and you'll go find the _unique_id column. The blast radius is one person who can distrust the number.

Internal versus external is the obvious line, and it's the wrong one. The example above was internal, someone querying their own store, and the number could still walk into an investor update.

The real axis is who consumes the answer and whether they can sanity-check it. A human exploring squints at 0.00% and smells something wrong. Software does not. A dashboard tile, or an agent that reads the number and acts on it, has no instinct. "Looks plausible" is not a correctness check.

And 0.00% is the cruelest case: on a retention dashboard it doesn't look implausible. It looks like bad news. Nobody files a bug to report bad news.

So, the principle worth keeping even if you never touch Gaur: a confident wrong answer is what you get when you bring an exploration-grade abstraction to a consumption-grade workload. The mismatch is the bug, not the query engine.

Two layers: one helps, one enforces

This is the problem we work on at Gaur. Gaur (gaur.run) is a developer-first backend for data applications: connect your scattered sources, model them, and expose them to your app over REST, an OpenAI-compatible chat endpoint, and MCP, without standing up a warehouse and a data team first.

It gives you two ways to push back on the wrong answer above. They are not equally strong.

The cheap layer: tell the model what the schema can't

The schema failed to carry meaning, so carry it. In Gaur, every table, model, and contract holds author-written context that gets embedded and surfaced to chat and MCP. For the customers table, it says the one thing the column names hide:

## Field descriptions
- customer_id — A per-ORDER id, minted fresh for every order. NOT a person.
- customer_unique_id — The actual person. Stable across orders.

## Rules
- For any retention / repeat / per-customer metric, aggregate by
  customer_unique_id. Grouping by customer_id makes every customer look
  new, so the repeat rate computes as 0%. The true rate is 3.12%.

I re-ran the same test with that context in front of both models. The improvement was real: both reached for customer_unique_id much more often, and returned 3.12% where they had returned 0.00%.

But be precise, because "context fixes it" is the overclaim this essay argues against. Context helps. It does not enforce. The model still chooses; you've just made the right choice the obvious one. On the runs where it still picked customer_id, the answer was still 0.00%, still confident, still wrong.

The enforcing layer: take the choice away

The only way to make 0.00% unreachable is to stop exposing the column that produces it. That is what a contract does.

First, a model collapses orders to one row per real person:

-- Model: olist_customer_orders. Grain: one row per customer_unique_id.
SELECT c.customer_unique_id, count(DISTINCT o.order_id) AS order_count
FROM olist_orders_dataset o
JOIN olist_customers_dataset c ON c.customer_id = o.customer_id
GROUP BY c.customer_unique_id

Then a contract over that model exposes only declared measures and dimensions, all defined on customer_unique_id:

{
  "name": "customer_retention",
  "sources": [{ "source_type": "model", "name": "olist_customer_orders", "alias": "co" }],
  "measures": {
    "customers":        { "sql": "count(distinct co.customer_unique_id)", "type": "number" },
    "repeat_customers": { "sql": "sum(case when co.order_count > 1 then 1 else 0 end)", "type": "number" },
    "repeat_rate":      { "behavior": "derived", "numerator": "repeat_customers", "denominator": "customers" }
  }
}

The point: customer_id is not in this contract at all.

A consumer calling it, over REST, chat, or MCP, picks from declared measures and dimensions. It never names a column, never writes a GROUP BY, never sees that customer_id exists. It cannot reproduce 0.00%, because the building block is not on the shelf. The contract returns customers = 96,096, repeat_customers = 2,997, repeat_rate = 0.0312.

So I ran the test a third time. I published this contract to a contract-only consumer and pointed the same two models at it. Both returned 3.12% every time. There was no wrong column left to reach for.

One model still hallucinated, but it named a field the contract didn't define, and the contract rejected it. That is the behavior you want: the failure that errors, not the failure that lies.

Two honest caveats. The SQL-only demo from the earlier story doesn't expose contracts at all; that consumer's job is raw SQL, which is exactly why the model could reach the wrong column. Whether a caller can write SQL is a property of the consumer's query mode. Same platform, two postures.

And I won't claim contracts make wrong answers impossible. A measure that aggregates across the wrong join can still publish; you catch it by testing before you ship. What a contract changes is where the decision lives: the hard call, which column is a person, gets made once by someone who can reason about the schema, instead of re-guessed on every question by a model holding two indistinguishable hex strings.

Check it yourself

Don't take my word for it. Olist is a free download. Run the count both ways:

SELECT COUNT(DISTINCT customer_id)        FROM olist_customers_dataset;  -- 99,441
SELECT COUNT(DISTINCT customer_unique_id) FROM olist_customers_dataset;  -- 96,096

Then compute the repeat rate both ways and watch the second number contradict the first. One says nobody comes back. The other says 2,997 people did, one of them seventeen times. Nothing errors. Nothing warns you. The database hands you both answers with the same confidence, and only one is true.

The part we build (shameless plug)

That 0.00% is not special to retention or to Olist. It is the default failure for any LLM pointed at real business data: a support copilot answering questions about an account, an agent flow that reads a metric and then acts on it, customer-facing analytics, a reporting API. The query runs, the number is wrong, and software downstream believes it.

Contracts are how you put an LLM in front of that data without handing it a blank SQL prompt. The model composes answers from a fixed set of measures and dimensions instead of guessing columns. Row-level security scopes every caller the same way across REST, chat, and MCP. An undeclared field errors instead of lying. It doesn't make every answer right; it helps make the confident-wrong one structurally hard to ship.

If you're building a chatbot or an agent flow over your own business data and you don't want it shipping numbers you can't trust, that is exactly what we're building Gaur for. Book a demo at gaur.run.