<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gaur Data</title>
    <description>The latest articles on DEV Community by Gaur Data (@gaur_data_c6511af396c4bce).</description>
    <link>https://dev.to/gaur_data_c6511af396c4bce</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4007500%2F509d3655-243a-48ba-b514-76e9b21fcde1.png</url>
      <title>DEV Community: Gaur Data</title>
      <link>https://dev.to/gaur_data_c6511af396c4bce</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gaur_data_c6511af396c4bce"/>
    <language>en</language>
    <item>
      <title>The customer_id that isn't a customer</title>
      <dc:creator>Gaur Data</dc:creator>
      <pubDate>Mon, 29 Jun 2026 17:40:40 +0000</pubDate>
      <link>https://dev.to/gaur_data_c6511af396c4bce/the-customerid-that-isnt-a-customer-3a00</link>
      <guid>https://dev.to/gaur_data_c6511af396c4bce/the-customerid-that-isnt-a-customer-3a00</guid>
      <description>&lt;p&gt;An error is the safe outcome. The dangerous one is a number.&lt;/p&gt;

&lt;p&gt;When a text-to-SQL feature breaks the way you hope, it throws a missing column and dies on the screen. You see it, you fix it, nobody was misled.&lt;/p&gt;

&lt;p&gt;The failure that should keep you up is the other one. The query runs, returns rows, and hands a real user a number that is wrong, at the moment the stakes are highest. Nothing errored. Just a plausible figure, formatted to two decimals, in front of the person least able to know it's off.&lt;/p&gt;

&lt;p&gt;That gap, between an answer that is wrong and one that &lt;em&gt;looks&lt;/em&gt; wrong, is the whole problem with pointing an LLM at a production database. It never shows up in the demo. The demo is built to avoid it.&lt;/p&gt;

&lt;p&gt;What follows is not a hypothetical. It is a public dataset and a question with a known answer that an LLM gets confidently, catastrophically wrong. The data is the &lt;a href="https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce" rel="noopener noreferrer"&gt;Olist Brazilian e-commerce dataset&lt;/a&gt; on Kaggle. You can reproduce every number here yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Nobody ever comes back"
&lt;/h2&gt;

&lt;p&gt;What is our repeat-purchase rate? How many customers come back and buy again. That number separates a business that compounds from one that re-buys its whole revenue base every quarter.&lt;/p&gt;

&lt;p&gt;You wire an "ask your data" box into your product. Someone types the question in plain English, and an &lt;strong&gt;LLM turns it into SQL and runs it&lt;/strong&gt;. Two tables are in scope:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;olist_customers_dataset          olist_orders_dataset
  customer_id                       order_id        (pk)
  customer_unique_id                customer_id
  customer_zip_code_prefix          order_status
  customer_city                     order_purchase_timestamp
  customer_state                    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the SQL the LLM wrote. No human reviewed it; it ran straight against the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'repeat'&lt;/span&gt; &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'one-time'&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;segment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;olist_orders_dataset&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
    &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;olist_customers_dataset&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;per_customer&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;segment&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The answer: &lt;strong&gt;one-time 99,441. Repeat 0. Repeat rate 0.00%.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Nobody ever comes back. Not one customer out of ninety-nine thousand has placed a second order.&lt;/p&gt;

&lt;p&gt;That query, and that 0.00%, are the LLM's own output. Read it like a code reviewer: the join is valid, the aggregation is clean, &lt;code&gt;COUNT(DISTINCT o.order_id)&lt;/code&gt; is exactly how you count a customer's orders. By the standard of "is this reasonable SQL," it is reasonable SQL.&lt;/p&gt;

&lt;p&gt;It is also wrong, in the most decision-changing direction a retention number can be. And it's wrong for a reason the schema will never tell you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The column that lies about what it is
&lt;/h2&gt;

&lt;p&gt;Two real values from the customers table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;customer_id&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="mi"&gt;06&lt;/span&gt;&lt;span class="err"&gt;b&lt;/span&gt;&lt;span class="mi"&gt;8999e2&lt;/span&gt;&lt;span class="err"&gt;fba&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;fbc&lt;/span&gt;&lt;span class="mi"&gt;88172&lt;/span&gt;&lt;span class="err"&gt;c&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="err"&gt;ba&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="err"&gt;bc&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;customer_unique_id&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;861&lt;/span&gt;&lt;span class="err"&gt;eff&lt;/span&gt;&lt;span class="mi"&gt;4711&lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="mi"&gt;542e4&lt;/span&gt;&lt;span class="err"&gt;b&lt;/span&gt;&lt;span class="mi"&gt;93843&lt;/span&gt;&lt;span class="err"&gt;c&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="err"&gt;dd&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="err"&gt;febb&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two 32-character hex strings, identical in form. Nothing signals which is the person and which is the receipt. If you had to guess which means "customer," you'd flip a coin. So does the LLM.&lt;/p&gt;

&lt;p&gt;In Olist, &lt;code&gt;customer_id&lt;/code&gt; is minted fresh for every order. It is an order key wearing the word "customer" in its name. The actual person, stable across orders, is &lt;code&gt;customer_unique_id&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The customers table has exactly one row per &lt;code&gt;customer_id&lt;/code&gt;: 99,441 rows, 99,441 ids. So when you group orders by &lt;code&gt;customer_id&lt;/code&gt;, every group has exactly one order, by construction. Max orders per &lt;code&gt;customer_id&lt;/code&gt; in the whole dataset: 1.&lt;/p&gt;

&lt;p&gt;That is why the rate came back 0.00%. The query computed the repeat rate of orders pretending to be people, which is always exactly zero.&lt;/p&gt;

&lt;p&gt;Change one token, group by the column that identifies a person, and the data flips:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_unique_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_unique_id&lt;/span&gt;   &lt;span class="c1"&gt;-- the only change: _unique_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The truth: &lt;strong&gt;96,096 real customers, 2,997 repeat buyers, a 3.12% repeat rate.&lt;/strong&gt; One customer ordered 17 times.&lt;/p&gt;

&lt;p&gt;Not a great retention number. But the distance between "3.12% come back" and "nobody comes back" is the distance between a business and a fire.&lt;/p&gt;

&lt;p&gt;The simple count breaks too. &lt;code&gt;COUNT(DISTINCT customer_id)&lt;/code&gt; says 99,441 customers; &lt;code&gt;COUNT(DISTINCT customer_unique_id)&lt;/code&gt; says 96,096. The naive version overstates the base by 3,345 people, about 3.5%.&lt;/p&gt;

&lt;p&gt;The count is wrong by a little. The retention answer is wrong by everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  A bigger model is not the fix
&lt;/h2&gt;

&lt;p&gt;The reflex is to reach for a better LLM. So I ran a small test. Not a benchmark, a sanity check.&lt;/p&gt;

&lt;p&gt;Two open-source models, Llama 3.3 70B and Qwen 2.5 Coder 32B, both solid SQL writers, each asked the repeat-rate question several times, with only the raw schema to go on.&lt;/p&gt;

&lt;p&gt;Both did the obvious thing: grouped by &lt;code&gt;customer_id&lt;/code&gt;, reported 0.00% most of the time. The bigger model was not meaningfully safer than the smaller one.&lt;/p&gt;

&lt;p&gt;A frontier model with the same schema and the same blind spot makes the same mistake with more fluency. The variable is not the model. It is what the model is allowed to know, and what it is allowed to do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the demo never caught it
&lt;/h2&gt;

&lt;p&gt;A demo tests questions a human already validated by eye. "Orders last month," fine. "Revenue by state," fine. Each one happened to dodge the &lt;code&gt;customer_id&lt;/code&gt; versus &lt;code&gt;customer_unique_id&lt;/code&gt; distinction.&lt;/p&gt;

&lt;p&gt;The retention question was the first to force the choice. By then the only reader was someone who typed a sentence and could not see the SQL.&lt;/p&gt;

&lt;p&gt;The LLM makes a &lt;em&gt;modeling&lt;/em&gt; decision, which column is a customer, fresh on every query, with only the schema to go on. A column named &lt;code&gt;customer_id&lt;/code&gt; does not announce that it is secretly an order key. That fact lives in someone's head, or nowhere.&lt;/p&gt;

&lt;p&gt;The tooling makes it concrete: the interface here is an MCP server in raw-SQL mode, exposing a read-only &lt;code&gt;execute_sql&lt;/code&gt; tool. No layer sits between the model's guess and the database. It runs the bad query and returns 0.00%, because running well-formed read-only SQL is its whole job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use the simplest abstraction the workload tolerates
&lt;/h2&gt;

&lt;p&gt;None of this makes text-to-SQL a mistake. It makes it a choice with a domain.&lt;/p&gt;

&lt;p&gt;Poking around Olist yourself on a Tuesday? Raw SQL over MCP is great. When the rate comes back 0.00%, you'll notice, because 0% is absurd, and you'll go find the &lt;code&gt;_unique_id&lt;/code&gt; column. The blast radius is one person who can distrust the number.&lt;/p&gt;

&lt;p&gt;Internal versus external is the obvious line, and it's the wrong one. The example above was internal, someone querying their own store, and the number could still walk into an investor update.&lt;/p&gt;

&lt;p&gt;The real axis is who consumes the answer and whether they can sanity-check it. A human exploring squints at 0.00% and smells something wrong. Software does not. A dashboard tile, or an agent that reads the number and acts on it, has no instinct. "Looks plausible" is not a correctness check.&lt;/p&gt;

&lt;p&gt;And 0.00% is the cruelest case: on a retention dashboard it doesn't look implausible. It looks like bad news. Nobody files a bug to report bad news.&lt;/p&gt;

&lt;p&gt;So, the principle worth keeping even if you never touch Gaur: a confident wrong answer is what you get when you bring an exploration-grade abstraction to a consumption-grade workload. The mismatch is the bug, not the query engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two layers: one helps, one enforces
&lt;/h2&gt;

&lt;p&gt;This is the problem we work on at Gaur. Gaur (&lt;a href="https://gaur.run" rel="noopener noreferrer"&gt;gaur.run&lt;/a&gt;) is a developer-first backend for data applications: connect your scattered sources, model them, and expose them to your app over REST, an OpenAI-compatible chat endpoint, and MCP, without standing up a warehouse and a data team first.&lt;/p&gt;

&lt;p&gt;It gives you two ways to push back on the wrong answer above. They are not equally strong.&lt;/p&gt;

&lt;h3&gt;
  
  
  The cheap layer: tell the model what the schema can't
&lt;/h3&gt;

&lt;p&gt;The schema failed to carry meaning, so carry it. In Gaur, every table, model, and contract holds author-written context that gets embedded and surfaced to chat and MCP. For the customers table, it says the one thing the column names hide:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Field descriptions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; customer_id — A per-ORDER id, minted fresh for every order. NOT a person.
&lt;span class="p"&gt;-&lt;/span&gt; customer_unique_id — The actual person. Stable across orders.

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; For any retention / repeat / per-customer metric, aggregate by
  customer_unique_id. Grouping by customer_id makes every customer look
  new, so the repeat rate computes as 0%. The true rate is 3.12%.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I re-ran the same test with that context in front of both models. The improvement was real: both reached for &lt;code&gt;customer_unique_id&lt;/code&gt; much more often, and returned 3.12% where they had returned 0.00%.&lt;/p&gt;

&lt;p&gt;But be precise, because "context fixes it" is the overclaim this essay argues against. Context helps. It does not enforce. The model still chooses; you've just made the right choice the obvious one. On the runs where it still picked &lt;code&gt;customer_id&lt;/code&gt;, the answer was still 0.00%, still confident, still wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  The enforcing layer: take the choice away
&lt;/h3&gt;

&lt;p&gt;The only way to make 0.00% unreachable is to stop exposing the column that produces it. That is what a contract does.&lt;/p&gt;

&lt;p&gt;First, a model collapses orders to one row per real person:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Model: olist_customer_orders. Grain: one row per customer_unique_id.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_unique_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;olist_orders_dataset&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;olist_customers_dataset&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_unique_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then a contract over that model exposes only declared measures and dimensions, all defined on &lt;code&gt;customer_unique_id&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customer_retention"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"source_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"olist_customer_orders"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"alias"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"co"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"measures"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"customers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"sql"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"count(distinct co.customer_unique_id)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"number"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"repeat_customers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"sql"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sum(case when co.order_count &amp;gt; 1 then 1 else 0 end)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"number"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"repeat_rate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"behavior"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"derived"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"numerator"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"repeat_customers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"denominator"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customers"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point: &lt;strong&gt;&lt;code&gt;customer_id&lt;/code&gt; is not in this contract at all.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A consumer calling it, over REST, chat, or MCP, picks from declared measures and dimensions. It never names a column, never writes a &lt;code&gt;GROUP BY&lt;/code&gt;, never sees that &lt;code&gt;customer_id&lt;/code&gt; exists. It cannot reproduce 0.00%, because the building block is not on the shelf. The contract returns customers = 96,096, repeat_customers = 2,997, repeat_rate = 0.0312.&lt;/p&gt;

&lt;p&gt;So I ran the test a third time. I published this contract to a contract-only consumer and pointed the same two models at it. Both returned 3.12% every time. There was no wrong column left to reach for.&lt;/p&gt;

&lt;p&gt;One model still hallucinated, but it named a field the contract didn't define, and the contract rejected it. That is the behavior you want: the failure that errors, not the failure that lies.&lt;/p&gt;

&lt;p&gt;Two honest caveats. The SQL-only demo from the earlier story doesn't expose contracts at all; that consumer's job is raw SQL, which is exactly why the model could reach the wrong column. Whether a caller can write SQL is a property of the consumer's query mode. Same platform, two postures.&lt;/p&gt;

&lt;p&gt;And I won't claim contracts make wrong answers impossible. A measure that aggregates across the wrong join can still publish; you catch it by testing before you ship. What a contract changes is &lt;em&gt;where the decision lives&lt;/em&gt;: the hard call, which column is a person, gets made once by someone who can reason about the schema, instead of re-guessed on every question by a model holding two indistinguishable hex strings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check it yourself
&lt;/h2&gt;

&lt;p&gt;Don't take my word for it. Olist is a &lt;a href="https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce" rel="noopener noreferrer"&gt;free download&lt;/a&gt;. Run the count both ways:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;olist_customers_dataset&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- 99,441&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;customer_unique_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;olist_customers_dataset&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- 96,096&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then compute the repeat rate both ways and watch the second number contradict the first. One says nobody comes back. The other says 2,997 people did, one of them seventeen times. Nothing errors. Nothing warns you. The database hands you both answers with the same confidence, and only one is true.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part we build (shameless plug)
&lt;/h2&gt;

&lt;p&gt;That 0.00% is not special to retention or to Olist. It is the default failure for any LLM pointed at real business data: a support copilot answering questions about an account, an agent flow that reads a metric and then acts on it, customer-facing analytics, a reporting API. The query runs, the number is wrong, and software downstream believes it.&lt;/p&gt;

&lt;p&gt;Contracts are how you put an LLM in front of that data without handing it a blank SQL prompt. The model composes answers from a fixed set of measures and dimensions instead of guessing columns. Row-level security scopes every caller the same way across REST, chat, and MCP. An undeclared field errors instead of lying. It doesn't make every answer right; it helps make the confident-wrong one structurally hard to ship.&lt;/p&gt;

&lt;p&gt;If you're building a chatbot or an agent flow over your own business data and you don't want it shipping numbers you can't trust, that is exactly what we're building Gaur for. &lt;strong&gt;Book a demo at &lt;a href="https://gaur.run" rel="noopener noreferrer"&gt;gaur.run&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>analytics</category>
      <category>dataengineering</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
