<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ArisynData</title>
    <description>The latest articles on DEV Community by ArisynData (@arisyndata).</description>
    <link>https://dev.to/arisyndata</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3640833%2F877a0e7c-a59f-4ed3-baac-6bf99ad9c964.jpg</url>
      <title>DEV Community: ArisynData</title>
      <link>https://dev.to/arisyndata</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arisyndata"/>
    <language>en</language>
    <item>
      <title>Why Text-to-SQL Needs Relationship Context, Not Just Better Prompts</title>
      <dc:creator>ArisynData</dc:creator>
      <pubDate>Fri, 05 Jun 2026 15:00:00 +0000</pubDate>
      <link>https://dev.to/arisyndata/why-text-to-sql-needs-relationship-context-not-just-better-prompts-3k3j</link>
      <guid>https://dev.to/arisyndata/why-text-to-sql-needs-relationship-context-not-just-better-prompts-3k3j</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiep2t66q4our30jeb6m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiep2t66q4our30jeb6m.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Text-to-SQL systems are often explained as a language problem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User asks a question. Model generates SQL. Database returns results.&lt;br&gt;
That mental model is too simple for real enterprise data.&lt;br&gt;
In production, the hard part is rarely whether a model can write a &lt;code&gt;SELECT&lt;/code&gt; statement. The hard part is whether the system has enough context to choose the right tables, the right metrics, the right filters, and the right join path.&lt;br&gt;
This is where many demos break.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The clean demo problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A demo dataset usually&lt;br&gt;
 has a small number of well-named tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;customers
orders
order_items
products
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The relationships are obvious. The naming is consistent. The business logic is easy to infer.&lt;br&gt;
Enterprise databases are different.&lt;br&gt;
You may see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cust_master
customer_profile_v2
crm_account
sales_contract
invoice_record
invoice_detail
region_mapping
finance_revenue_snapshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Some relationships are declared. Many are not. Some columns look similar but mean different things. Some tables are current; others are historical, replicated, or partially deprecated.&lt;br&gt;
A language model can still generate SQL, but without the right context, it may generate SQL that looks reasonable and returns the wrong answer.&lt;br&gt;
That is more dangerous than an obvious error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The missing input: relationship context&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A better Text-to-SQL pipeline should not ask the LLM to infer everything from raw table names.&lt;br&gt;
It should provide structured context such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"business_term"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"candidate_tables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"crm_account"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customer_profile_v2"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"preferred_table"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"crm_account"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"relationships"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"crm_account.account_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sales_contract.account_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.96&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sales_contract.contract_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"invoice_record.contract_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.93&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the model is no longer guessing from names alone. It is generating SQL with context about semantics and relationships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A more reliable architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A relationship-aware Text-to-SQL workflow might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Natural language question
        ↓
Intent and entity extraction
        ↓
Business semantic mapping
        ↓
Candidate table discovery
        ↓
Relationship / join path discovery
        ↓
SQL generation
        ↓
Validation and execution
        ↓
Explanation and feedback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important point is that SQL generation happens after semantic and relationship context is assembled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why prompts are not enough&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Prompt engineering helps, but prompts cannot reliably solve missing data context.&lt;br&gt;
You can tell a model:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use the correct tables and avoid ambiguous joins.&lt;br&gt;
But that instruction does not tell it which table is correct.&lt;br&gt;
The system needs a governed source of truth for metadata, business definitions, and table relationships. Otherwise, the model is forced to make a probabilistic decision in a place where the business expects determinism.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Where relationship discovery fits&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Relationship discovery is the process of finding and managing how tables and fields connect, even when explicit foreign keys are missing or incomplete.&lt;br&gt;
Signals may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;database constraints&lt;/li&gt;
&lt;li&gt;naming patterns&lt;/li&gt;
&lt;li&gt;value overlap&lt;/li&gt;
&lt;li&gt;uniqueness and coverage&lt;/li&gt;
&lt;li&gt;co-occurrence&lt;/li&gt;
&lt;li&gt;business semantic mappings&lt;/li&gt;
&lt;li&gt;historical query patterns
No single signal is perfect. But combining them gives the SQL generation layer a much better starting point.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The practical takeaway&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you are building an enterprise Text-to-SQL system, do not start with the model alone.&lt;br&gt;
Start with these questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What business terms does the user use?&lt;/li&gt;
&lt;li&gt;Which metrics and dimensions are governed?&lt;/li&gt;
&lt;li&gt;Which tables are authoritative?&lt;/li&gt;
&lt;li&gt;Which join paths are trusted?&lt;/li&gt;
&lt;li&gt;What should the system do when the question is ambiguous?
Only then should the model generate SQL.
At Arisyn, we are exploring this problem through a semantic query engine backed by data relationship discovery. The direction we believe in is simple: enterprise Text-to-SQL should be relationship-aware, governed, and explainable by design.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>sql</category>
      <category>dataengineering</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
