Why Text-to-SQL Needs Relationship Context, Not Just Better Prompts

#ai #sql #dataengineering #llm

Text-to-SQL systems are often explained as a language problem:

User asks a question. Model generates SQL. Database returns results.
That mental model is too simple for real enterprise data.
In production, the hard part is rarely whether a model can write a SELECT statement. The hard part is whether the system has enough context to choose the right tables, the right metrics, the right filters, and the right join path.
This is where many demos break.

The clean demo problem

A demo dataset usually
has a small number of well-named tables:

customers
orders
order_items
products

The relationships are obvious. The naming is consistent. The business logic is easy to infer.
Enterprise databases are different.
You may see:

cust_master
customer_profile_v2
crm_account
sales_contract
invoice_record
invoice_detail
region_mapping
finance_revenue_snapshot

Some relationships are declared. Many are not. Some columns look similar but mean different things. Some tables are current; others are historical, replicated, or partially deprecated.
A language model can still generate SQL, but without the right context, it may generate SQL that looks reasonable and returns the wrong answer.
That is more dangerous than an obvious error.

The missing input: relationship context

A better Text-to-SQL pipeline should not ask the LLM to infer everything from raw table names.
It should provide structured context such as:

{
  "business_term": "customer",
  "candidate_tables": ["crm_account", "customer_profile_v2"],
  "preferred_table": "crm_account",
  "relationships": [
    {
      "from": "crm_account.account_id",
      "to": "sales_contract.account_id",
      "confidence": 0.96
    },
    {
      "from": "sales_contract.contract_id",
      "to": "invoice_record.contract_id",
      "confidence": 0.93
    }
  ]
}

Now the model is no longer guessing from names alone. It is generating SQL with context about semantics and relationships.

A more reliable architecture

A relationship-aware Text-to-SQL workflow might look like this:

Natural language question
        ↓
Intent and entity extraction
        ↓
Business semantic mapping
        ↓
Candidate table discovery
        ↓
Relationship / join path discovery
        ↓
SQL generation
        ↓
Validation and execution
        ↓
Explanation and feedback

The important point is that SQL generation happens after semantic and relationship context is assembled.

Why prompts are not enough

Prompt engineering helps, but prompts cannot reliably solve missing data context.
You can tell a model:

Use the correct tables and avoid ambiguous joins.
But that instruction does not tell it which table is correct.
The system needs a governed source of truth for metadata, business definitions, and table relationships. Otherwise, the model is forced to make a probabilistic decision in a place where the business expects determinism.

Where relationship discovery fits

Relationship discovery is the process of finding and managing how tables and fields connect, even when explicit foreign keys are missing or incomplete.
Signals may include:

database constraints
naming patterns
value overlap
uniqueness and coverage
co-occurrence
business semantic mappings
historical query patterns No single signal is perfect. But combining them gives the SQL generation layer a much better starting point.

The practical takeaway

If you are building an enterprise Text-to-SQL system, do not start with the model alone.
Start with these questions:

What business terms does the user use?
Which metrics and dimensions are governed?
Which tables are authoritative?
Which join paths are trusted?
What should the system do when the question is ambiguous? Only then should the model generate SQL. At Arisyn, we are exploring this problem through a semantic query engine backed by data relationship discovery. The direction we believe in is simple: enterprise Text-to-SQL should be relationship-aware, governed, and explainable by design.

DEV Community

Why Text-to-SQL Needs Relationship Context, Not Just Better Prompts

Top comments (0)