<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ArisynData</title>
    <description>The latest articles on DEV Community by ArisynData (@arisyndata).</description>
    <link>https://dev.to/arisyndata</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3640833%2F877a0e7c-a59f-4ed3-baac-6bf99ad9c964.jpg</url>
      <title>DEV Community: ArisynData</title>
      <link>https://dev.to/arisyndata</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arisyndata"/>
    <language>en</language>
    <item>
      <title>Why Text-to-SQL Breaks When the Join Path Is Not Obvious</title>
      <dc:creator>ArisynData</dc:creator>
      <pubDate>Fri, 26 Jun 2026 13:05:00 +0000</pubDate>
      <link>https://dev.to/arisyndata/why-text-to-sql-breaks-when-the-join-path-is-not-obvious-3bk0</link>
      <guid>https://dev.to/arisyndata/why-text-to-sql-breaks-when-the-join-path-is-not-obvious-3bk0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fe2nz0mzcca3zs9l6yx2n.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fe2nz0mzcca3zs9l6yx2n.jpg" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most Text-to-SQL examples are too clean.&lt;/p&gt;

&lt;p&gt;They usually assume a simple schema, obvious table names, clear foreign keys, and a question that maps neatly to one or two tables. In that environment, generating SQL from natural language looks impressive.&lt;/p&gt;

&lt;p&gt;Enterprise databases are not like that.&lt;/p&gt;

&lt;p&gt;In real analytics work, the hard part is often not the SELECT clause. It is the join path.&lt;/p&gt;

&lt;h2&gt;
  
  
  The SQL Can Be Valid and Still Wrong
&lt;/h2&gt;

&lt;p&gt;Imagine a user asks:&lt;/p&gt;

&lt;p&gt;“Show revenue by customer for the last quarter.”&lt;/p&gt;

&lt;p&gt;A model may generate something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_revenue&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-01'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-04-01'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Technically, this looks fine.&lt;/p&gt;

&lt;p&gt;But in an enterprise environment, several things may be wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;customers&lt;/code&gt; may not be the approved customer master.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;orders.revenue&lt;/code&gt; may not be the finance-approved revenue field.&lt;/li&gt;
&lt;li&gt;Customer records may be duplicated across regions.&lt;/li&gt;
&lt;li&gt;Some orders may need to be excluded because they were adjusted later.&lt;/li&gt;
&lt;li&gt;The join may create duplication if there are multiple customer records per account.&lt;/li&gt;
&lt;li&gt;Last quarter may follow fiscal, not calendar, logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The database accepts the query.&lt;/p&gt;

&lt;p&gt;The dashboard renders.&lt;/p&gt;

&lt;p&gt;The answer is wrong.&lt;/p&gt;

&lt;p&gt;That is the uncomfortable part of enterprise Text-to-SQL.&lt;/p&gt;

&lt;p&gt;A syntactically valid query is not the same as a trusted query.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Join Paths Are Hard
&lt;/h2&gt;

&lt;p&gt;Join paths are obvious only when the data model is clean.&lt;/p&gt;

&lt;p&gt;In production systems, they are usually messy.&lt;/p&gt;

&lt;p&gt;You may have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Missing foreign keys&lt;/li&gt;
&lt;li&gt;Legacy tables&lt;/li&gt;
&lt;li&gt;Similar columns with different meanings&lt;/li&gt;
&lt;li&gt;One-to-many relationships that create fanout&lt;/li&gt;
&lt;li&gt;Historical snapshots&lt;/li&gt;
&lt;li&gt;Slowly changing dimensions&lt;/li&gt;
&lt;li&gt;Department-specific marts&lt;/li&gt;
&lt;li&gt;Fields reused for different purposes&lt;/li&gt;
&lt;li&gt;Business rules that exist only in old SQL reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even experienced engineers often need time to inspect the schema, check existing reports, ask someone in finance, and run sample queries before trusting a join.&lt;/p&gt;

&lt;p&gt;Now ask an AI model to do the same thing with only table names and column names.&lt;/p&gt;

&lt;p&gt;It will guess.&lt;/p&gt;

&lt;p&gt;Sometimes the guess will be right.&lt;/p&gt;

&lt;p&gt;Sometimes it will be dangerously plausible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fanout Problem
&lt;/h2&gt;

&lt;p&gt;One of the most common issues is fanout.&lt;/p&gt;

&lt;p&gt;Suppose you join orders to order_lines and then to shipments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;order_lines&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;shipments&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;line_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;line_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If one order line can have multiple shipments, order revenue may be counted multiple times.&lt;/p&gt;

&lt;p&gt;The SQL is valid.&lt;/p&gt;

&lt;p&gt;The join is valid.&lt;/p&gt;

&lt;p&gt;The result is not valid for revenue reporting.&lt;/p&gt;

&lt;p&gt;A human analyst may know to aggregate at the order level first, or use a shipment-adjusted revenue table, or avoid this path altogether.&lt;/p&gt;

&lt;p&gt;A model needs that knowledge in context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Metadata Alone Is Not Enough
&lt;/h2&gt;

&lt;p&gt;Metadata helps, but it does not solve the full problem.&lt;/p&gt;

&lt;p&gt;Column names can tell you that &lt;code&gt;customer_id&lt;/code&gt; relates to customers.&lt;/p&gt;

&lt;p&gt;They do not tell you whether this is the right customer relationship for financial reporting.&lt;/p&gt;

&lt;p&gt;Foreign keys can tell you that a relationship exists.&lt;/p&gt;

&lt;p&gt;They do not tell you whether the relationship is safe for aggregation.&lt;/p&gt;

&lt;p&gt;Descriptions can tell you what a table contains.&lt;/p&gt;

&lt;p&gt;They do not always explain historical exceptions.&lt;/p&gt;

&lt;p&gt;That is why enterprise Text-to-SQL needs more than schemas.&lt;/p&gt;

&lt;p&gt;It needs relationship context.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Relationship Context Should Include
&lt;/h2&gt;

&lt;p&gt;At minimum, an AI query system should know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Candidate join paths between tables&lt;/li&gt;
&lt;li&gt;Approved join paths for common business questions&lt;/li&gt;
&lt;li&gt;Relationship cardinality&lt;/li&gt;
&lt;li&gt;Known fanout risks&lt;/li&gt;
&lt;li&gt;Join confidence&lt;/li&gt;
&lt;li&gt;Source of relationship evidence&lt;/li&gt;
&lt;li&gt;Whether the relationship came from constraints, SQL history, naming patterns, dbt models, BI datasets, or human approval&lt;/li&gt;
&lt;li&gt;Which paths are rejected or deprecated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This changes the behavior of the system.&lt;/p&gt;

&lt;p&gt;Instead of simply generating SQL, it can reason about query safety.&lt;/p&gt;

&lt;p&gt;It can choose the trusted path.&lt;/p&gt;

&lt;p&gt;It can warn when no approved path exists.&lt;/p&gt;

&lt;p&gt;It can ask for clarification when multiple paths are possible.&lt;/p&gt;

&lt;p&gt;That is far more useful than blindly producing a query.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Better Pattern
&lt;/h2&gt;

&lt;p&gt;A more reliable Text-to-SQL system should work like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parse the user question.&lt;/li&gt;
&lt;li&gt;Identify business entities and metrics.&lt;/li&gt;
&lt;li&gt;Resolve semantic definitions.&lt;/li&gt;
&lt;li&gt;Retrieve candidate tables.&lt;/li&gt;
&lt;li&gt;Retrieve trusted relationship paths.&lt;/li&gt;
&lt;li&gt;Check join risks.&lt;/li&gt;
&lt;li&gt;Generate SQL using approved paths.&lt;/li&gt;
&lt;li&gt;Validate the SQL against semantic and relationship rules.&lt;/li&gt;
&lt;li&gt;Explain the assumptions behind the result.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key difference is step 5 and step 6.&lt;/p&gt;

&lt;p&gt;Many systems jump from semantic mapping directly to SQL generation.&lt;/p&gt;

&lt;p&gt;That is where errors enter.&lt;/p&gt;

&lt;p&gt;The missing layer is relationship intelligence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Text-to-SQL is not just a language translation problem.&lt;/p&gt;

&lt;p&gt;It is a context problem.&lt;/p&gt;

&lt;p&gt;Models are getting better at writing SQL.&lt;/p&gt;

&lt;p&gt;But enterprise analytics requires more than syntactically correct SQL.&lt;/p&gt;

&lt;p&gt;It requires knowing which joins are safe, which paths are trusted, and which assumptions should be checked before the query runs.&lt;/p&gt;

&lt;p&gt;Until that context exists, Text-to-SQL will continue to work well in demos and struggle in real companies.&lt;/p&gt;

</description>
      <category>texttosql</category>
      <category>ai</category>
      <category>dataengineering</category>
      <category>datagovernance</category>
    </item>
    <item>
      <title>Why AI Analytics Has a Knowledge Problem</title>
      <dc:creator>ArisynData</dc:creator>
      <pubDate>Wed, 24 Jun 2026 13:38:00 +0000</pubDate>
      <link>https://dev.to/arisyndata/why-ai-analytics-has-a-knowledge-problem-542h</link>
      <guid>https://dev.to/arisyndata/why-ai-analytics-has-a-knowledge-problem-542h</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fgtsrgtgeubz9htlla4uf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fgtsrgtgeubz9htlla4uf.jpg" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One thing I’ve noticed while working with enterprise analytics systems:&lt;/p&gt;

&lt;p&gt;The hardest problems are rarely technical.&lt;/p&gt;

&lt;p&gt;Most modern models can generate SQL.&lt;/p&gt;

&lt;p&gt;Most warehouses are well documented.&lt;/p&gt;

&lt;p&gt;Most organizations have catalogs and governance programs.&lt;/p&gt;

&lt;p&gt;Yet teams still depend heavily on a handful of experienced engineers.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because analytics depends on knowledge, not just data.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;A schema may tell you that three customer tables exist.&lt;/p&gt;

&lt;p&gt;It doesn’t tell you:&lt;/p&gt;

&lt;p&gt;· which one is authoritative&lt;/p&gt;

&lt;p&gt;· which one is historical&lt;/p&gt;

&lt;p&gt;· which one executive reporting relies on&lt;/p&gt;

&lt;p&gt;Experienced engineers know the difference.&lt;/p&gt;

&lt;p&gt;AI doesn’t.&lt;/p&gt;

&lt;p&gt;That’s why many enterprise analytics failures aren’t caused by bad SQL generation.&lt;/p&gt;

&lt;p&gt;They’re caused by missing organizational knowledge.&lt;/p&gt;

&lt;p&gt;As AI adoption accelerates, I think we’re going to spend less time talking about prompts and more time talking about knowledge infrastructure.&lt;/p&gt;

&lt;p&gt;Because models can’t use knowledge that organizations never captured.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sql</category>
      <category>data</category>
    </item>
    <item>
      <title>Why AI Keeps Generating Bad SQL Even When The Schema Is Correct</title>
      <dc:creator>ArisynData</dc:creator>
      <pubDate>Mon, 22 Jun 2026 13:32:00 +0000</pubDate>
      <link>https://dev.to/arisyndata/why-ai-keeps-generating-bad-sql-even-when-the-schema-is-correct-26fm</link>
      <guid>https://dev.to/arisyndata/why-ai-keeps-generating-bad-sql-even-when-the-schema-is-correct-26fm</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fopjqfojati03dceh5f2s.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fopjqfojati03dceh5f2s.jpg" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One thing I've noticed while testing Text-to-SQL systems:&lt;br&gt;
The schema is often fine.&lt;br&gt;
The model is often fine.&lt;br&gt;
The SQL is often syntactically correct.&lt;br&gt;
The answer is still wrong.&lt;br&gt;
Why?&lt;br&gt;
Because SQL generation isn't the hard part.&lt;br&gt;
Join selection is.&lt;br&gt;
Imagine a warehouse containing:&lt;br&gt;
orders&lt;br&gt;
customers&lt;br&gt;
subscriptions&lt;br&gt;
invoices&lt;br&gt;
accounts&lt;br&gt;
A model may know all five tables exist.&lt;br&gt;
The challenge is deciding:&lt;br&gt;
Which relationship path should be used?&lt;br&gt;
That's where many systems fail.&lt;br&gt;
Most Text-to-SQL architectures focus on:&lt;br&gt;
Schema → Prompt → SQL&lt;br&gt;
But production environments usually require:&lt;br&gt;
Schema&lt;br&gt;
↓&lt;br&gt;
Relationship Discovery&lt;br&gt;
↓&lt;br&gt;
Trusted Join Path&lt;br&gt;
↓&lt;br&gt;
Prompt&lt;br&gt;
↓&lt;br&gt;
SQL&lt;br&gt;
Without relationship context, the model is forced to guess.&lt;br&gt;
And enterprise analytics is a terrible place for guessing.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>data</category>
      <category>sql</category>
    </item>
    <item>
      <title>Why AI Analytics Is Becoming a Data Infrastructure Problem</title>
      <dc:creator>ArisynData</dc:creator>
      <pubDate>Wed, 17 Jun 2026 13:44:00 +0000</pubDate>
      <link>https://dev.to/arisyndata/why-ai-analytics-is-becoming-a-data-infrastructure-problem-1366</link>
      <guid>https://dev.to/arisyndata/why-ai-analytics-is-becoming-a-data-infrastructure-problem-1366</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6uufr15mrwfva2tqlnx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6uufr15mrwfva2tqlnx.jpg" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most AI analytics discussions focus on models.&lt;/p&gt;

&lt;p&gt;In practice, many failures come from the surrounding data environment.&lt;/p&gt;

&lt;p&gt;A model can understand natural language and generate SQL.&lt;/p&gt;

&lt;p&gt;That doesn't mean it understands your business.&lt;/p&gt;

&lt;p&gt;Enterprise environments contain:&lt;/p&gt;

&lt;p&gt;disconnected systems&lt;br&gt;
undocumented relationships&lt;br&gt;
conflicting metric definitions&lt;br&gt;
hidden business logic&lt;br&gt;
What AI actually needs is context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Business Question
↓
AI Agent
↓
Semantic Definitions
↓
Metadata
↓
Lineage
↓
Relationship Discovery
↓
Enterprise Data

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The question is no longer:&lt;/p&gt;

&lt;p&gt;"Can the model generate SQL?"&lt;/p&gt;

&lt;p&gt;It's:&lt;/p&gt;

&lt;p&gt;"Can the model understand the environment behind the SQL?"&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Why Text-to-SQL Needs Join Path Context, Not Just Schema</title>
      <dc:creator>ArisynData</dc:creator>
      <pubDate>Mon, 15 Jun 2026 13:25:00 +0000</pubDate>
      <link>https://dev.to/arisyndata/why-text-to-sql-needs-join-path-context-not-just-schema-27mk</link>
      <guid>https://dev.to/arisyndata/why-text-to-sql-needs-join-path-context-not-just-schema-27mk</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nwixijicssmxrvxm94c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nwixijicssmxrvxm94c.jpg" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A lot of Text-to-SQL examples start with something like this:&lt;/p&gt;

&lt;p&gt;Give the model the schema.&lt;br&gt;
Ask the user for a question.&lt;br&gt;
Generate SQL.&lt;br&gt;
Run it.&lt;/p&gt;

&lt;p&gt;That can work for demos.&lt;/p&gt;

&lt;p&gt;It gets much harder in real enterprise data.&lt;/p&gt;

&lt;p&gt;The problem is not always SQL syntax. The model may generate perfectly valid SQL. The query may run. The output may even look reasonable.&lt;/p&gt;

&lt;p&gt;The issue is whether the query used the right path through the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema does not tell the whole story&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A schema gives you tables, columns, types, and sometimes constraints.&lt;/p&gt;

&lt;p&gt;But it usually does not tell you enough about analytical intent.&lt;/p&gt;

&lt;p&gt;For example, imagine a question like:&lt;/p&gt;

&lt;p&gt;Show monthly revenue by customer segment.&lt;br&gt;
A model may see tables like:&lt;/p&gt;

&lt;p&gt;1）customers&lt;br&gt;
2）orders&lt;br&gt;
3）order_items&lt;br&gt;
4）payments&lt;br&gt;
5）invoices&lt;br&gt;
6）customer_segments&lt;br&gt;
It can probably generate a join.&lt;/p&gt;

&lt;p&gt;But which one?&lt;/p&gt;

&lt;p&gt;Should revenue come from orders, payments, invoices, or finance postings?&lt;/p&gt;

&lt;p&gt;Should customer segment be current segment or segment at the time of order?&lt;/p&gt;

&lt;p&gt;Should the query join customer directly to orders, or through account, membership, or billing entity?&lt;/p&gt;

&lt;p&gt;Will joining order items duplicate revenue?&lt;/p&gt;

&lt;p&gt;These are not syntax questions.&lt;/p&gt;

&lt;p&gt;They are relationship and grain questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Valid SQL is not the same as correct SQL&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where enterprise Text-to-SQL gets dangerous.&lt;/p&gt;

&lt;p&gt;A bad query does not always fail.&lt;/p&gt;

&lt;p&gt;Sometimes it returns a clean table and a nice chart.&lt;/p&gt;

&lt;p&gt;The model may choose a join path that looks obvious from column names but is not the business-approved path. Or it may join two fact tables directly and inflate the result. Or it may use a dimension table that is current-state when the metric needs historical context.&lt;/p&gt;

&lt;p&gt;No exception is thrown.&lt;/p&gt;

&lt;p&gt;The number is just wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What join path context should include&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A more reliable Text-to-SQL system needs relationship context before SQL generation.&lt;/p&gt;

&lt;p&gt;At minimum, join path context should include:&lt;/p&gt;

&lt;p&gt;candidate tables for the question&lt;br&gt;
approved table relationships&lt;br&gt;
join keys&lt;br&gt;
cardinality&lt;br&gt;
table grain&lt;br&gt;
preferred paths&lt;br&gt;
risky paths&lt;br&gt;
duplication risks&lt;br&gt;
validation checks&lt;br&gt;
This turns table relationships into something the model can actually use.&lt;/p&gt;

&lt;p&gt;Instead of asking the model to infer everything from names, the system gives it a query plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A better workflow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A schema-only workflow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User question
→ schema prompt
→ SQL generation
→ execution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A relationship-aware workflow looks more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User question
→ intent detection
→ semantic matching
→ candidate table selection
→ trusted join path retrieval
→ query plan
→ SQL generation
→ validation
→ answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That extra planning step is not just overhead.&lt;/p&gt;

&lt;p&gt;It is what prevents the system from producing plausible but wrong SQL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The analyst version of this&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Good analysts already think this way.&lt;/p&gt;

&lt;p&gt;Before writing a join, they ask:&lt;/p&gt;

&lt;p&gt;What is the grain of this table?&lt;br&gt;
Is this the source of truth?&lt;br&gt;
Will this join duplicate the metric?&lt;br&gt;
Is this relationship current-state or historical?&lt;br&gt;
Does this path match the business definition?&lt;br&gt;
Text-to-SQL systems need to capture more of that reasoning.&lt;/p&gt;

&lt;p&gt;Not as a long prompt full of vague instructions, but as structured context the system can retrieve and apply.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final thought&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Text-to-SQL is not just a translation problem.&lt;/p&gt;

&lt;p&gt;It is a data modeling problem, a governance problem, and a relationship-context problem.&lt;/p&gt;

&lt;p&gt;Models are getting better at writing SQL.&lt;/p&gt;

&lt;p&gt;But if the system does not know how the underlying data connects, better SQL generation just gives us wrong answers faster.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sql</category>
      <category>dataengineering</category>
      <category>texttosql</category>
    </item>
    <item>
      <title>Why Text-to-SQL Needs Table Relationship Discovery Before SQL Generation</title>
      <dc:creator>ArisynData</dc:creator>
      <pubDate>Thu, 11 Jun 2026 16:12:00 +0000</pubDate>
      <link>https://dev.to/arisyndata/why-text-to-sql-needs-table-relationship-discovery-before-sql-generation-2ilf</link>
      <guid>https://dev.to/arisyndata/why-text-to-sql-needs-table-relationship-discovery-before-sql-generation-2ilf</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05ljczrkg3feg7ei4zku.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05ljczrkg3feg7ei4zku.jpg" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most Text-to-SQL discussions start with language.&lt;/p&gt;

&lt;p&gt;Can the model understand the question?&lt;br&gt;
Can it generate valid SQL?&lt;br&gt;
Can it explain the result?&lt;/p&gt;

&lt;p&gt;Those are important.&lt;/p&gt;

&lt;p&gt;But in enterprise systems, the harder problem is often not language.&lt;/p&gt;

&lt;p&gt;It is table relationships.&lt;/p&gt;

&lt;p&gt;A model may understand the user’s question and still choose the wrong tables, the wrong join keys, or the wrong join path.&lt;/p&gt;

&lt;p&gt;The SQL may run.&lt;/p&gt;

&lt;p&gt;The answer may look reasonable.&lt;/p&gt;

&lt;p&gt;The result may still be wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A simple example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Suppose a user asks:&lt;/p&gt;

&lt;p&gt;What was revenue by customer segment last quarter?&lt;br&gt;
A model may retrieve tables that look relevant:&lt;/p&gt;

&lt;p&gt;customers&lt;br&gt;
customer_profiles&lt;br&gt;
customer_segments&lt;br&gt;
orders&lt;br&gt;
contracts&lt;br&gt;
invoices&lt;br&gt;
invoice_items&lt;br&gt;
revenue_snapshot&lt;/p&gt;

&lt;p&gt;Now the difficult part begins.&lt;/p&gt;

&lt;p&gt;Which customer table is authoritative?&lt;/p&gt;

&lt;p&gt;Should revenue come from invoices, contracts, orders, or a finance snapshot?&lt;/p&gt;

&lt;p&gt;Should customer segment be current or historical?&lt;/p&gt;

&lt;p&gt;How should invoice data connect to customer segment?&lt;/p&gt;

&lt;p&gt;Should the join go directly from invoices to customers, or through contracts, accounts, or a bridge table?&lt;/p&gt;

&lt;p&gt;These are not just semantic questions.&lt;/p&gt;

&lt;p&gt;They are relationship questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema retrieval is not enough&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A common Text-to-SQL flow looks like this:&lt;/p&gt;

&lt;p&gt;User question&lt;br&gt;
→ retrieve relevant schemas&lt;br&gt;
→ send schemas to LLM&lt;br&gt;
→ generate SQL&lt;br&gt;
→ execute query&lt;br&gt;
This can work for simple demos.&lt;/p&gt;

&lt;p&gt;It breaks down when the schema is large, old, inconsistent, or distributed across systems.&lt;/p&gt;

&lt;p&gt;Schema retrieval can tell the model that a column exists.&lt;/p&gt;

&lt;p&gt;It does not always tell the model whether that column should be used.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;customer.customer_id&lt;br&gt;
crm_account.account_id&lt;br&gt;
billing_customer.customer_ref&lt;br&gt;
customer_snapshot.customer_id&lt;br&gt;
customer_profile.customer_id&lt;br&gt;
All of these may look relevant.&lt;/p&gt;

&lt;p&gt;Only one may be right for the question.&lt;/p&gt;

&lt;p&gt;Or the correct answer may require more than one of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Foreign keys are helpful, but incomplete&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Declared foreign keys are useful when they exist.&lt;/p&gt;

&lt;p&gt;But many enterprise systems do not fully declare relationships in the database.&lt;/p&gt;

&lt;p&gt;Some relationships are enforced in application code. Some come from ETL logic. Some exist across systems. Some live only in BI models or analyst knowledge.&lt;/p&gt;

&lt;p&gt;Even when foreign keys exist, they may not define the right analytical path.&lt;/p&gt;

&lt;p&gt;A join can be technically valid and analytically wrong.&lt;/p&gt;

&lt;p&gt;That is why Text-to-SQL systems need a relationship-aware layer before SQL generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What relationship discovery should provide&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before generating SQL, the system should be able to build context like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "question": "revenue by customer segment last quarter",
  "candidate_tables": [
    "invoices",
    "contracts",
    "customers",
    "customer_segments"
  ],
  "preferred_join_path": [
    "invoices.contract_id -&amp;gt; contracts.contract_id",
    "contracts.customer_id -&amp;gt; customers.customer_id",
    "customers.segment_id -&amp;gt; customer_segments.segment_id"
  ],
  "grain_warning": "invoice detail level may duplicate revenue unless aggregated before joining",
  "time_logic": "use historical customer segment for period-based reporting",
  "confidence": "high"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This kind of context gives the LLM a much better foundation.&lt;/p&gt;

&lt;p&gt;Instead of guessing from names, the model receives evidence about how the data connects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A better Text-to-SQL workflow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A relationship-aware workflow looks more like this:&lt;/p&gt;

&lt;p&gt;User question&lt;br&gt;
→ intent understanding&lt;br&gt;
→ semantic mapping&lt;br&gt;
→ metadata retrieval&lt;br&gt;
→ table relationship discovery&lt;br&gt;
→ join path selection&lt;br&gt;
→ SQL generation&lt;br&gt;
→ SQL validation&lt;br&gt;
→ execution&lt;br&gt;
→ explanation&lt;br&gt;
The relationship discovery step should answer:&lt;/p&gt;

&lt;p&gt;Which tables are relevant?&lt;br&gt;
Which joins are possible?&lt;br&gt;
Which join path is preferred?&lt;br&gt;
What is the relationship strength?&lt;br&gt;
Could the join duplicate rows?&lt;br&gt;
Is there a bridge table?&lt;br&gt;
Are there multiple valid paths?&lt;br&gt;
Should the system ask a clarification question?&lt;br&gt;
This step reduces the chance that the model produces SQL that is syntactically correct but structurally wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What can go wrong without relationship context?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here are common failure modes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duplicate rows&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A one-to-many join multiplies records because the model does not understand data grain.&lt;/p&gt;

&lt;p&gt;SELECT&lt;br&gt;
  c.segment,&lt;br&gt;
  SUM(i.amount) AS revenue&lt;br&gt;
FROM invoices i&lt;br&gt;
JOIN customers c&lt;br&gt;
  ON i.customer_id = c.customer_id&lt;br&gt;
JOIN customer_contacts cc&lt;br&gt;
  ON c.customer_id = cc.customer_id&lt;br&gt;
GROUP BY c.segment;&lt;br&gt;
If each customer has multiple contacts, revenue may be over-counted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing records&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An inner join removes valid records because not all entities have a matching dimension row.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong table&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model chooses customer_profile when the approved reporting path requires customer_snapshot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong time logic&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model uses the customer’s current segment instead of the segment at the time of transaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False confidence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The worst case: the query runs, the chart looks good, and nobody notices the join path was wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Relationship discovery is not just for AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is useful beyond Text-to-SQL.&lt;/p&gt;

&lt;p&gt;Table relationship discovery also helps with:&lt;br&gt;
1）data warehouse modeling&lt;br&gt;
2）ETL and ELT design&lt;br&gt;
3）BI dataset preparation&lt;br&gt;
4）migration planning&lt;br&gt;
5）impact analysis&lt;br&gt;
6）data quality checks&lt;br&gt;
7）metadata documentation&lt;br&gt;
8）lineage and governance&lt;br&gt;
In many companies, relationship knowledge is scattered across SQL scripts, notebooks, dashboards, and senior analysts’ memory.&lt;/p&gt;

&lt;p&gt;AI systems need that knowledge to become structured and reusable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final thought&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Text-to-SQL is not only a translation problem.&lt;/p&gt;

&lt;p&gt;It is a data context problem.&lt;/p&gt;

&lt;p&gt;A model needs to know what the user means.&lt;/p&gt;

&lt;p&gt;It also needs to know how the data connects.&lt;/p&gt;

&lt;p&gt;That is why table relationship discovery should happen before SQL generation, not after the query fails.&lt;/p&gt;

&lt;p&gt;Better prompts help.&lt;/p&gt;

&lt;p&gt;Better relationship context matters more.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sql</category>
      <category>data</category>
    </item>
    <item>
      <title>Designing Relationship Context for Text-to-SQL Systems</title>
      <dc:creator>ArisynData</dc:creator>
      <pubDate>Mon, 08 Jun 2026 15:00:00 +0000</pubDate>
      <link>https://dev.to/arisyndata/designing-relationship-context-for-text-to-sql-systems-d4b</link>
      <guid>https://dev.to/arisyndata/designing-relationship-context-for-text-to-sql-systems-d4b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwza3bkogevahzzk120po.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwza3bkogevahzzk120po.jpg" alt=" " width="799" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A common Text-to-SQL architecture looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;user question -&amp;gt; schema context -&amp;gt; prompt -&amp;gt; SQL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This works well enough for demos.&lt;/p&gt;

&lt;p&gt;It is not enough for enterprise databases.&lt;/p&gt;

&lt;p&gt;In real production environments, the missing piece is often relationship&lt;br&gt;
context: structured information about how tables and fields connect,&lt;br&gt;
which join paths are trusted, and where ambiguity exists.&lt;/p&gt;

&lt;p&gt;A semantic layer can define what a business term means.&lt;/p&gt;

&lt;p&gt;But the SQL generator still needs to know how the physical data objects&lt;br&gt;
connect.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with schema-only context
&lt;/h2&gt;

&lt;p&gt;Many Text-to-SQL systems start by passing table names, column names, and&lt;br&gt;
a user question into a model.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;colgroup&gt;
&lt;col&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;
&lt;p&gt;Question:&lt;/p&gt;
&lt;p&gt;Which customers have not placed an order in the last 90 days?&lt;/p&gt;
&lt;p&gt;Schema:&lt;/p&gt;
&lt;p&gt;customers(id, name, email, signup_date)&lt;/p&gt;
&lt;p&gt;orders(id, customer_id, order_date, status, total_amount)&lt;/p&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a simple schema, this is fine.&lt;/p&gt;

&lt;p&gt;The relationship is obvious:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customers.id = orders.customer_id&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;But enterprise schemas are rarely that simple.&lt;/p&gt;

&lt;p&gt;You may have:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;colgroup&gt;
&lt;col&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;
&lt;p&gt;crm_account&lt;/p&gt;
&lt;p&gt;customer_profile&lt;/p&gt;
&lt;p&gt;customer_profile_v2&lt;/p&gt;
&lt;p&gt;sales_contract&lt;/p&gt;
&lt;p&gt;invoice_header&lt;/p&gt;
&lt;p&gt;invoice_line&lt;/p&gt;
&lt;p&gt;payment_record&lt;/p&gt;
&lt;p&gt;finance_revenue_snapshot&lt;/p&gt;
&lt;p&gt;region_mapping&lt;/p&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Some tables are authoritative.&lt;/p&gt;

&lt;p&gt;Some are historical.&lt;/p&gt;

&lt;p&gt;Some are replicated.&lt;/p&gt;

&lt;p&gt;Some are partially deprecated.&lt;/p&gt;

&lt;p&gt;Some relationships are declared as foreign keys.&lt;/p&gt;

&lt;p&gt;Many are not.&lt;/p&gt;

&lt;p&gt;A model can still generate SQL from this environment, but without the&lt;br&gt;
right relationship context, it may generate SQL that looks correct and&lt;br&gt;
returns the wrong answer.&lt;/p&gt;

&lt;p&gt;That is worse than an obvious error.&lt;/p&gt;

&lt;h2&gt;
  
  
  Semantic context is different from relationship context
&lt;/h2&gt;

&lt;p&gt;It is useful to separate two kinds of context.&lt;/p&gt;

&lt;p&gt;Semantic context answers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What does this business term mean?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Relationship context answers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;How do the relevant data objects connect?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, semantic context may define:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;colgroup&gt;
&lt;col&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;
&lt;p&gt;{&lt;/p&gt;
&lt;p&gt;"business_term": "revenue",&lt;/p&gt;
&lt;p&gt;"approved_metric": "recognized_revenue",&lt;/p&gt;
&lt;p&gt;"definition": "Revenue recognized according to finance reporting
rules",&lt;/p&gt;
&lt;p&gt;"owner": "Finance",&lt;/p&gt;
&lt;p&gt;"version": "2026.1"&lt;/p&gt;
&lt;p&gt;}&lt;/p&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That helps the model understand the business meaning.&lt;/p&gt;

&lt;p&gt;But it does not fully answer which tables to join.&lt;/p&gt;

&lt;p&gt;For that, the system needs relationship context.&lt;/p&gt;

&lt;h2&gt;
  
  
  What relationship context might look like
&lt;/h2&gt;

&lt;p&gt;Relationship context can be represented as structured input before SQL&lt;br&gt;
generation.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;colgroup&gt;
&lt;col&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;
&lt;p&gt;{&lt;/p&gt;
&lt;p&gt;"business_intent": "customers_without_recent_orders",&lt;/p&gt;
&lt;p&gt;"semantic_context": {&lt;/p&gt;
&lt;p&gt;"entity": "customer",&lt;/p&gt;
&lt;p&gt;"activity": "order",&lt;/p&gt;
&lt;p&gt;"time_window": "last_90_days"&lt;/p&gt;
&lt;p&gt;},&lt;/p&gt;
&lt;p&gt;"relationship_context": {&lt;/p&gt;
&lt;p&gt;"candidate_tables": [&lt;/p&gt;
&lt;p&gt;"customers",&lt;/p&gt;
&lt;p&gt;"orders",&lt;/p&gt;
&lt;p&gt;"order_items"&lt;/p&gt;
&lt;p&gt;],&lt;/p&gt;
&lt;p&gt;"preferred_path": [&lt;/p&gt;
&lt;p&gt;{&lt;/p&gt;
&lt;p&gt;"from": "customers.customer_id",&lt;/p&gt;
&lt;p&gt;"to": "orders.customer_id",&lt;/p&gt;
&lt;p&gt;"relationship_type": "customer_to_order",&lt;/p&gt;
&lt;p&gt;"confidence": 0.97&lt;/p&gt;
&lt;p&gt;}&lt;/p&gt;
&lt;p&gt;],&lt;/p&gt;
&lt;p&gt;"risk_flags": [&lt;/p&gt;
&lt;p&gt;"orders table contains cancelled orders",&lt;/p&gt;
&lt;p&gt;"order_date should be used instead of created_at"&lt;/p&gt;
&lt;p&gt;],&lt;/p&gt;
&lt;p&gt;"clarification_required": false&lt;/p&gt;
&lt;p&gt;}&lt;/p&gt;
&lt;p&gt;}&lt;/p&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now the model is not guessing from raw table names alone.&lt;/p&gt;

&lt;p&gt;It has a constrained context.&lt;/p&gt;

&lt;p&gt;It knows which tables matter.&lt;/p&gt;

&lt;p&gt;It knows how they connect.&lt;/p&gt;

&lt;p&gt;It knows which risks to consider.&lt;/p&gt;

&lt;h2&gt;
  
  
  A more reliable workflow
&lt;/h2&gt;

&lt;p&gt;A relationship-aware Text-to-SQL workflow might look like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;colgroup&gt;
&lt;col&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;
&lt;p&gt;Natural language question&lt;/p&gt;
&lt;p&gt;↓&lt;/p&gt;
&lt;p&gt;Intent and entity extraction&lt;/p&gt;
&lt;p&gt;↓&lt;/p&gt;
&lt;p&gt;Business semantic mapping&lt;/p&gt;
&lt;p&gt;↓&lt;/p&gt;
&lt;p&gt;Metadata retrieval&lt;/p&gt;
&lt;p&gt;↓&lt;/p&gt;
&lt;p&gt;Relationship context assembly&lt;/p&gt;
&lt;p&gt;↓&lt;/p&gt;
&lt;p&gt;Join path selection&lt;/p&gt;
&lt;p&gt;↓&lt;/p&gt;
&lt;p&gt;SQL generation&lt;/p&gt;
&lt;p&gt;↓&lt;/p&gt;
&lt;p&gt;Validation&lt;/p&gt;
&lt;p&gt;↓&lt;/p&gt;
&lt;p&gt;Execution&lt;/p&gt;
&lt;p&gt;↓&lt;/p&gt;
&lt;p&gt;Explanation and feedback&lt;/p&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The important point is that SQL generation happens after semantic and&lt;br&gt;
relationship context are assembled.&lt;/p&gt;

&lt;p&gt;This reduces the burden on the model.&lt;/p&gt;

&lt;p&gt;The model is still useful, but it is no longer responsible for inventing&lt;br&gt;
the entire data environment from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why prompts are not enough
&lt;/h2&gt;

&lt;p&gt;Prompt engineering helps.&lt;/p&gt;

&lt;p&gt;You can tell the model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;colgroup&gt;
&lt;col&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;
&lt;p&gt;Use the correct tables.&lt;/p&gt;
&lt;p&gt;Avoid ambiguous joins.&lt;/p&gt;
&lt;p&gt;Prefer approved metrics.&lt;/p&gt;
&lt;p&gt;Ask a clarifying question when necessary.&lt;/p&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are good instructions.&lt;/p&gt;

&lt;p&gt;But they are not a substitute for missing data knowledge.&lt;/p&gt;

&lt;p&gt;If the system does not know which table is authoritative, the&lt;br&gt;
instruction "use the correct table" is not enough.&lt;/p&gt;

&lt;p&gt;If the system does not know which join path is trusted, the instruction&lt;br&gt;
"avoid ambiguous joins" is not enough.&lt;/p&gt;

&lt;p&gt;The system needs a governed source of truth for metadata, business&lt;br&gt;
definitions, and table relationships.&lt;/p&gt;

&lt;p&gt;Otherwise, the model is forced to make a probabilistic decision in a&lt;br&gt;
place where the business expects determinism.&lt;/p&gt;

&lt;h2&gt;
  
  
  What should be included in relationship context?
&lt;/h2&gt;

&lt;p&gt;A practical relationship context layer may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;source table and target table&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;source field and target field&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;relationship type&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;confidence score&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;coverage ratio&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;uniqueness ratio&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;preferred join path&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;deprecated relationships&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;business meaning of the relationship&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;validation rules&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;known duplication risks&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;colgroup&gt;
&lt;col&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;
&lt;p&gt;{&lt;/p&gt;
&lt;p&gt;"relationship": {&lt;/p&gt;
&lt;p&gt;"source": "sales_contract.customer_id",&lt;/p&gt;
&lt;p&gt;"target": "customer_master.customer_id",&lt;/p&gt;
&lt;p&gt;"type": "many_to_one",&lt;/p&gt;
&lt;p&gt;"confidence": 0.94,&lt;/p&gt;
&lt;p&gt;"coverage": 0.98,&lt;/p&gt;
&lt;p&gt;"preferred": true,&lt;/p&gt;
&lt;p&gt;"notes": "Preferred customer relationship for sales reporting"&lt;/p&gt;
&lt;p&gt;}&lt;/p&gt;
&lt;p&gt;}&lt;/p&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This kind of context gives the SQL generation layer a much better&lt;br&gt;
starting point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Validation matters
&lt;/h2&gt;

&lt;p&gt;Even with relationship context, generated SQL should not be executed&lt;br&gt;
blindly.&lt;/p&gt;

&lt;p&gt;Validation should check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;whether referenced tables exist&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;whether join fields are resolvable&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;whether the query may duplicate records&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;whether the metric definition matches the approved semantic layer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;whether the user has permission to access the requested data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;whether ambiguity should trigger clarification&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, if there are two possible revenue definitions, the system&lt;br&gt;
should not guess.&lt;/p&gt;

&lt;p&gt;It should ask:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Do you mean booked revenue or recognized revenue?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A good Text-to-SQL system should know when not to answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical takeaway
&lt;/h2&gt;

&lt;p&gt;If you are building an enterprise Text-to-SQL system, do not start with&lt;br&gt;
the model alone.&lt;/p&gt;

&lt;p&gt;Start with these questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;What business terms does the user use?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Which metrics and dimensions are governed?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Which tables are authoritative?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Which join paths are trusted?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What should the system do when the question is ambiguous?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How will generated SQL be validated before execution?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How will user feedback improve future answers?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Only then should the model generate SQL.&lt;/p&gt;

&lt;p&gt;At Arisyn, this is one of the problems we are exploring: how to make&lt;br&gt;
enterprise Text-to-SQL more relationship-aware, governed, and&lt;br&gt;
explainable by design.&lt;/p&gt;

&lt;p&gt;The direction is simple:&lt;/p&gt;

&lt;p&gt;Better prompts help.&lt;/p&gt;

&lt;p&gt;Better context matters more.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sql</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Why Text-to-SQL Needs Relationship Context, Not Just Better Prompts</title>
      <dc:creator>ArisynData</dc:creator>
      <pubDate>Fri, 05 Jun 2026 15:00:00 +0000</pubDate>
      <link>https://dev.to/arisyndata/why-text-to-sql-needs-relationship-context-not-just-better-prompts-3k3j</link>
      <guid>https://dev.to/arisyndata/why-text-to-sql-needs-relationship-context-not-just-better-prompts-3k3j</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiep2t66q4our30jeb6m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiep2t66q4our30jeb6m.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Text-to-SQL systems are often explained as a language problem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User asks a question. Model generates SQL. Database returns results.&lt;br&gt;
That mental model is too simple for real enterprise data.&lt;br&gt;
In production, the hard part is rarely whether a model can write a &lt;code&gt;SELECT&lt;/code&gt; statement. The hard part is whether the system has enough context to choose the right tables, the right metrics, the right filters, and the right join path.&lt;br&gt;
This is where many demos break.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The clean demo problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A demo dataset usually&lt;br&gt;
 has a small number of well-named tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;customers
orders
order_items
products
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The relationships are obvious. The naming is consistent. The business logic is easy to infer.&lt;br&gt;
Enterprise databases are different.&lt;br&gt;
You may see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cust_master
customer_profile_v2
crm_account
sales_contract
invoice_record
invoice_detail
region_mapping
finance_revenue_snapshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Some relationships are declared. Many are not. Some columns look similar but mean different things. Some tables are current; others are historical, replicated, or partially deprecated.&lt;br&gt;
A language model can still generate SQL, but without the right context, it may generate SQL that looks reasonable and returns the wrong answer.&lt;br&gt;
That is more dangerous than an obvious error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The missing input: relationship context&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A better Text-to-SQL pipeline should not ask the LLM to infer everything from raw table names.&lt;br&gt;
It should provide structured context such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"business_term"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"candidate_tables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"crm_account"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customer_profile_v2"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"preferred_table"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"crm_account"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"relationships"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"crm_account.account_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sales_contract.account_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.96&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sales_contract.contract_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"invoice_record.contract_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.93&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the model is no longer guessing from names alone. It is generating SQL with context about semantics and relationships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A more reliable architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A relationship-aware Text-to-SQL workflow might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Natural language question
        ↓
Intent and entity extraction
        ↓
Business semantic mapping
        ↓
Candidate table discovery
        ↓
Relationship / join path discovery
        ↓
SQL generation
        ↓
Validation and execution
        ↓
Explanation and feedback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important point is that SQL generation happens after semantic and relationship context is assembled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why prompts are not enough&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Prompt engineering helps, but prompts cannot reliably solve missing data context.&lt;br&gt;
You can tell a model:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use the correct tables and avoid ambiguous joins.&lt;br&gt;
But that instruction does not tell it which table is correct.&lt;br&gt;
The system needs a governed source of truth for metadata, business definitions, and table relationships. Otherwise, the model is forced to make a probabilistic decision in a place where the business expects determinism.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Where relationship discovery fits&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Relationship discovery is the process of finding and managing how tables and fields connect, even when explicit foreign keys are missing or incomplete.&lt;br&gt;
Signals may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;database constraints&lt;/li&gt;
&lt;li&gt;naming patterns&lt;/li&gt;
&lt;li&gt;value overlap&lt;/li&gt;
&lt;li&gt;uniqueness and coverage&lt;/li&gt;
&lt;li&gt;co-occurrence&lt;/li&gt;
&lt;li&gt;business semantic mappings&lt;/li&gt;
&lt;li&gt;historical query patterns
No single signal is perfect. But combining them gives the SQL generation layer a much better starting point.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The practical takeaway&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you are building an enterprise Text-to-SQL system, do not start with the model alone.&lt;br&gt;
Start with these questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What business terms does the user use?&lt;/li&gt;
&lt;li&gt;Which metrics and dimensions are governed?&lt;/li&gt;
&lt;li&gt;Which tables are authoritative?&lt;/li&gt;
&lt;li&gt;Which join paths are trusted?&lt;/li&gt;
&lt;li&gt;What should the system do when the question is ambiguous?
Only then should the model generate SQL.
At Arisyn, we are exploring this problem through a semantic query engine backed by data relationship discovery. The direction we believe in is simple: enterprise Text-to-SQL should be relationship-aware, governed, and explainable by design.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>sql</category>
      <category>dataengineering</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
