DEV Community: ArisynData

AI Didn't Break Enterprise Data Models. It Exposed Their Blind Spots.

ArisynData — Wed, 29 Jul 2026 06:19:24 +0000

Over the past few years, I've worked with several enterprise AI projects, especially those involving natural language querying and AI-powered analytics.

One pattern keeps showing up.

When an AI system returns the wrong answer, people usually blame the model.

"Maybe we need a larger LLM."

"Maybe the prompt needs more context."

"Maybe SQL generation isn't mature enough."

After digging into these projects, I came to a different conclusion.

In many cases, the model isn't the real problem.

The enterprise data model is.

Enterprise Data Models Were Never Designed for AI

For decades, enterprise databases have been optimized for applications.

Normalization reduces redundancy.

Indexes improve query performance.

Foreign keys maintain integrity.

Data warehouses organize information for reporting.

Everything makes sense because applications already know how the business works.

Business logic lives in source code, service layers, stored procedures, ETL pipelines, and developers' experience—not necessarily in the database itself.

Applications don't need the database to explain what a "customer" is.

Developers already know.

AI doesn't.

Schema Describes Structure, Not Meaning

Most AI systems start by reading metadata.

They can discover tables.

Columns.

Primary keys.

Sometimes foreign keys.

But metadata only tells AI how data is stored.

It doesn't explain what the data actually represents.

For example, imagine an enterprise with three different systems.

CRM stores customers.

ERP stores accounts.

The finance system stores billing entities.

To employees, these often represent the same business entity viewed from different business processes.

To AI, they are simply three unrelated tables.

Without additional business knowledge, every SQL statement becomes an educated guess.
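One way to make that shared identity explicit is a small crosswalk that records which physical table represents which business entity in each system. The sketch below is only an illustration: the table names, key columns, and the `tables_for` helper are hypothetical and not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityMapping:
    business_entity: str  # governed business name, e.g. "customer"
    system: str           # source system that holds this view of the entity
    table: str            # physical table (hypothetical name)
    key_column: str       # identifying column in that table (hypothetical)


# Hypothetical crosswalk for the CRM / ERP / finance example.
CROSSWALK = [
    EntityMapping("customer", "CRM", "crm.customers", "customer_id"),
    EntityMapping("customer", "ERP", "erp.accounts", "account_number"),
    EntityMapping("customer", "Finance", "fin.billing_entities", "billing_entity_id"),
]


def tables_for(entity: str) -> list[str]:
    """Return every physical table known to represent the given business entity."""
    return [m.table for m in CROSSWALK if m.business_entity == entity]


print(tables_for("customer"))
```

With a record like this available, the three tables stop being unrelated from the AI's point of view: they are three views of one entity, each with a known key.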

The Hardest Problem Isn't Writing SQL

Modern language models are surprisingly good at generating SQL.

Syntax errors have become much less common.

The bigger challenge appears earlier.

Before generating SQL, AI must answer questions like:

Which customer table should I use?
Which data source is considered authoritative?
Are these two entities actually the same customer?
Which relationship reflects real business rules?

These aren't SQL problems.

They're knowledge problems.

Business Knowledge Lives Outside the Database

One thing I find interesting is that enterprise knowledge is rarely stored where AI can access it.

Developers understand join paths.

Business analysts understand metric definitions.

Database administrators understand physical schemas.

Domain experts understand the business process.

Each group holds part of the knowledge.

Very little of it is represented explicitly in the data model itself.

Humans bridge these gaps naturally.

AI cannot.

AI Has Become a New Consumer of Enterprise Data

This is probably the biggest architectural change we're seeing.

For years, applications were the main consumers of enterprise databases.

Now AI is becoming another consumer.

Unlike the developers behind those applications, AI doesn't read source code.

It doesn't attend design meetings.

It doesn't ask senior developers which table is "correct."

It only sees what the enterprise has documented.

And many enterprises have documented far less than they assumed.

The Future Isn't About Bigger Models

Large language models will continue to improve.

They'll write better SQL.

Reason more effectively.

Handle longer contexts.

But none of these improvements automatically provide business knowledge.

If an enterprise hasn't clearly defined its business entities, trusted relationships, or business semantics, AI has no reliable foundation to reason from.

The model can infer.

It can estimate.

It can guess.

It cannot know.

Final Thoughts

I don't think AI is exposing weaknesses in language models.

It's exposing weaknesses in enterprise data architecture.

For years, our data models were built to support applications.

Today, they also need to support AI.

That doesn't necessarily mean redesigning every database.

But it does mean making business entities, relationships, and business semantics far more explicit than they have been in the past.

The smarter AI becomes, the more valuable well-structured enterprise knowledge will be.

Foreign Keys Aren't Enough: Why Enterprise AI Needs Relationship Discovery

ArisynData — Mon, 27 Jul 2026 03:15:19 +0000

Modern AI systems are surprisingly good at writing SQL.

Give an LLM a database schema, and it can often generate syntactically correct queries within seconds. With Retrieval-Augmented Generation (RAG), database metadata, and function calling, connecting AI to enterprise databases has become easier than ever.

Yet many enterprise AI projects encounter the same problem after deployment:

The SQL executes successfully, but the answer is still wrong.

This isn't usually a model problem.

It's a relationship problem.

The assumption most AI systems make

Most AI-powered database assistants follow roughly the same workflow:

User Question
      │
      ▼
Schema Retrieval
      │
      ▼
LLM Generates SQL
      │
      ▼
Database Execution
      │
      ▼
Answer

This workflow assumes something important:

The schema contains enough information for AI to understand how data is connected.

Unfortunately, enterprise databases rarely work that way.

Real enterprise databases rarely have complete foreign keys

In tutorials, relationships are simple.

Customer
    │
    ▼
Order
    │
    ▼
Invoice

Every relationship is explicitly defined.

Every foreign key exists.

Every table follows the same modeling standard.

Production systems look very different.

A company may have:

an ERP system
a CRM platform
a financial system
a manufacturing system
a data warehouse
several legacy databases

Each system was designed independently.

Relationships frequently exist without database constraints.

For example:

the same customer appears under different identifiers
order numbers are reused across systems
warehouse codes connect operational databases
contract IDs link finance and sales data
relationships rely on business rules rather than foreign keys

Applications understand these connections because developers encoded them years ago.

The database itself often does not.

Why column names are unreliable

A common strategy is to infer joins from matching field names.

For example:

customer_id

appears in two tables.

Therefore they must be related.

Sometimes that's correct.

Sometimes it's completely wrong.

Likewise,

customer_code

and

account_number

may represent exactly the same business entity even though their names are different.

Enterprise systems evolve over years.

Naming conventions change.

Systems are merged.

Columns are duplicated.

Temporary tables become permanent.

Column names alone cannot describe the real data model.

Relationships exist inside the data

One of the biggest misconceptions is that relationships only exist in metadata.

In reality, much stronger evidence often exists inside the data itself.

Imagine two tables:

Orders

and

Customers

Neither contains a foreign key.

However:

every customer ID appearing in Orders also appears in Customers
customer IDs are unique in Customers
the inclusion ratio remains stable over time

That combination provides strong evidence that the relationship is real.

Similar signals include:

distinct value overlap
inclusion ratios
uniqueness
composite key candidates
relationship direction
cardinality
value distribution

Unlike naming conventions, these signals come directly from the data.
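Two of these signals, inclusion ratio and uniqueness, are simple enough to sketch directly. The example below assumes the column values have already been loaded into Python lists; the function names, the sample values, and the 0.99 threshold are illustrative assumptions rather than a recommended standard.

```python
def inclusion_ratio(child_values, parent_values):
    """Share of distinct non-null child values that also appear in the parent column."""
    child = {v for v in child_values if v is not None}
    parent = {v for v in parent_values if v is not None}
    if not child:
        return 0.0
    return len(child & parent) / len(child)


def is_unique(values):
    """True if the non-null values contain no duplicates (a key candidate)."""
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))


# Hypothetical sample data: Orders.customer_id and Customers.customer_id.
orders_customer_id = [101, 102, 102, 103, None]
customers_id = [101, 102, 103, 104]

ratio = inclusion_ratio(orders_customer_id, customers_id)
# Treat the pair as a candidate foreign key only if nearly every child value
# is found in the parent and the parent column is unique.
candidate_fk = ratio >= 0.99 and is_unique(customers_id)
print(ratio, candidate_fk)
```

On a real warehouse the same measurements would normally be pushed down into SQL or computed on samples, and tracked over time to check that the ratio stays stable.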

Discovery is only the first step

Finding candidate relationships is valuable.

Trusting them automatically is dangerous.

Enterprise environments frequently contain multiple possible join paths.

For example:

Customer
   │
Order
   │
Invoice

may be valid.

So may:

Customer
   │
Contract
   │
Invoice

Technically, both queries execute.

Only one answers the business question correctly.

Relationship discovery should therefore be followed by relationship governance.

Organizations need to know:

which relationships were verified
where they came from
how reliable they are
which scenarios they support
which path should be preferred

Without that layer, AI still has to guess.
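A governance record for each relationship does not need to be elaborate. The following sketch stores the five facts listed above and picks a path for a given scenario; the field names, sample entries, and `pick_path` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    path: tuple          # ordered table names along the join path
    verified: bool       # has someone confirmed this relationship?
    source: str          # where the relationship came from
    confidence: float    # 0..1, how reliable it is believed to be
    scenarios: set = field(default_factory=set)  # scenarios it supports
    preferred: bool = False                      # preferred path for those scenarios


# Hypothetical registry entries.
REGISTRY = [
    Relationship(("customer", "order", "invoice"), True,
                 "data profiling + analyst review", 0.97, {"sales_reporting"}, True),
    Relationship(("customer", "contract", "invoice"), True,
                 "finance team", 0.95, {"contract_billing"}),
    Relationship(("customer", "invoice"), False, "column-name match", 0.60),
]


def pick_path(scenario):
    """Return the best verified path for a scenario, or None if nothing is trusted."""
    candidates = [r for r in REGISTRY if r.verified and scenario in r.scenarios]
    if not candidates:
        return None
    # Prefer explicitly preferred paths, then higher confidence.
    return max(candidates, key=lambda r: (r.preferred, r.confidence)).path


print(pick_path("sales_reporting"))
```

Returning `None` when no verified path exists is deliberate: it gives the caller a reason to stop or ask, instead of falling back to an unverified join.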

AI needs a relationship layer

Instead of discovering relationships every time SQL is generated, enterprises should treat relationship knowledge as reusable metadata.

An effective relationship layer can:

detect implicit relationships across databases
discover candidate joins from actual data
measure confidence using statistical evidence
preserve verified relationship paths
expose trusted relationships to AI systems
update relationship knowledge as data evolves

Once this layer exists, AI no longer starts from zero every time it answers a question.

It starts with knowledge accumulated by the organization.

Beyond SQL generation

Enterprise AI discussions often focus on better prompts, larger models, or more powerful agents.

Those improvements certainly matter.

But they cannot compensate for missing relationship knowledge.

A model cannot reliably choose a relationship that has never been made visible.

Better reasoning helps evaluate evidence.

It cannot invent enterprise knowledge that the organization has never captured.

As AI becomes a permanent part of enterprise software, relationship discovery is becoming more than a data engineering task.

It is becoming part of the infrastructure that enables AI to understand how enterprise data actually fits together.

Because in enterprise databases, the most important relationships often aren't missing.

They're simply invisible.

Designing a Production-Grade Text-to-SQL Pipeline

ArisynData — Mon, 13 Jul 2026 07:21:21 +0000

Text-to-SQL demos usually look like this:

Question → LLM → SQL → Result

That is fine for a controlled dataset.

It is not enough for a production system.

In a real warehouse, the model has to deal with duplicated concepts, undocumented joins, multiple date fields, inconsistent naming, and tables that were never designed for AI access.

The hard part is not generating SQL.

The hard part is building the context the model needs before generation.

Start with the question, not the schema

Take a simple request:

Show revenue by customer segment for last quarter.

A model can turn that into SQL quickly. But before it does, the system needs to answer a few basic questions:

Which revenue definition should be used?
Which date field represents the reporting period?
Is customer segment current or historical?
Which customer table is authoritative?
Which join path avoids duplicating revenue?

Those decisions should not be left to the model to guess.

A better pipeline resolves them before SQL generation.

A more realistic workflow

A production flow looks closer to this:

User Question
    ↓
Intent Parsing
    ↓
Semantic Mapping
    ↓
Metadata Retrieval
    ↓
Relationship Discovery
    ↓
Join Path Selection
    ↓
SQL Generation
    ↓
Validation
    ↓
Execution
    ↓
Explanation

Each step has a separate job.

1. Intent parsing

Extract the actual request:

{
  "metric": "revenue",
  "dimension": "customer_segment",
  "time_range": "last_quarter"
}

This is also where the system should detect ambiguity.

For example, “revenue” may refer to booked, invoiced, recognized, or paid revenue.
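A rough sketch of that ambiguity check, assuming the parsed intent is a dictionary shaped like the one above. The `AMBIGUOUS_TERMS` table and the `check_intent` function are hypothetical names used only for this example.

```python
# Business terms that have more than one governed meaning (illustrative).
AMBIGUOUS_TERMS = {
    "revenue": ["booked", "invoiced", "recognized", "paid"],
}


def check_intent(intent):
    """Return a clarification request if the metric is ambiguous, else pass the intent on."""
    metric = intent["metric"]
    variants = AMBIGUOUS_TERMS.get(metric)
    if variants:
        options = ", ".join(variants)
        return {
            "status": "needs_clarification",
            "question": f"Which {metric} do you mean: {options}?",
        }
    return {"status": "ok", "intent": intent}


result = check_intent(
    {"metric": "revenue", "dimension": "customer_segment", "time_range": "last_quarter"}
)
print(result["question"])
```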

2. Semantic mapping

Map the user’s language to governed business definitions.

{
  "metric": "recognized_revenue",
  "formula": "SUM(invoice_line.recognized_amount)",
  "time_field": "invoice_line.recognition_date"
}

This prevents the model from choosing fields based only on similar names.
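In code, semantic mapping can be as simple as a lookup against governed definitions that refuses to guess. This is a minimal sketch: the `SEMANTIC_LAYER` and `SYNONYMS` tables and the `resolve_metric` function are assumptions, and only the formula and time field come from the example above.

```python
# Governed metric definitions (the entry mirrors the example above).
SEMANTIC_LAYER = {
    "recognized_revenue": {
        "formula": "SUM(invoice_line.recognized_amount)",
        "time_field": "invoice_line.recognition_date",
    },
}

# Hypothetical synonyms that users might type.
SYNONYMS = {
    "recognized revenue": "recognized_revenue",
    "rev rec": "recognized_revenue",
}


def resolve_metric(term):
    """Map a user term to its governed definition, or fail loudly if there is none."""
    key = term.strip().lower()
    key = SYNONYMS.get(key, key)
    if key not in SEMANTIC_LAYER:
        raise KeyError(f"No governed definition for {term!r}")
    return {"metric": key, **SEMANTIC_LAYER[key]}


print(resolve_metric("Recognized Revenue"))
```

Raising an error for unknown terms keeps the decision out of the model's hands: a missing definition becomes a visible gap rather than a silent guess.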

3. Metadata retrieval

Retrieve only the relevant tables and columns.

Passing the entire warehouse schema into the prompt usually creates more noise than value.

The model should receive a narrow working set:

customer
customer_segment_history
invoice
invoice_line
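As a sketch of how that narrowing might work, the example below filters a toy warehouse schema down to the tables linked to the resolved concepts. The column names, the two unrelated tables, and the `TABLES_BY_CONCEPT` mapping are invented for illustration.

```python
# Toy warehouse schema: table -> columns (column names are hypothetical).
WAREHOUSE_SCHEMA = {
    "customer": ["customer_id", "name"],
    "customer_segment_history": ["customer_id", "segment", "valid_from", "valid_to"],
    "invoice": ["invoice_id", "customer_id"],
    "invoice_line": ["id", "invoice_id", "recognized_amount", "recognition_date"],
    "hr_employee": ["employee_id", "name"],
    "web_clickstream": ["event_id", "url"],
}

# Which tables each governed concept depends on (assumed mapping).
TABLES_BY_CONCEPT = {
    "recognized_revenue": {"invoice", "invoice_line"},
    "customer_segment": {"customer", "customer_segment_history"},
}


def working_set(concepts):
    """Return only the tables (with columns) needed for the given concepts."""
    wanted = set()
    for concept in concepts:
        wanted |= TABLES_BY_CONCEPT.get(concept, set())
    return {t: cols for t, cols in WAREHOUSE_SCHEMA.items() if t in wanted}


subset = working_set(["recognized_revenue", "customer_segment"])
print(list(subset))
```

Only this subset would be placed in the prompt; the unrelated tables never reach the model.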

4. Relationship discovery

This is where many systems remain weak.

Foreign keys are useful, but enterprise databases often have missing, incomplete, or misleading constraints.

A relationship layer should provide more than table names. It should include:

source and target columns
relationship direction
cardinality
confidence
known fanout risk
preferred usage

For example:

{
  "from": "invoice.customer_id",
  "to": "customer.customer_id",
  "cardinality": "many_to_one",
  "confidence": 0.98,
  "fanout_risk": false
}

5. Join path selection

There may be several valid paths between the same business entities.

The shortest path is not always the safest one.

A good system should prefer a path that matches the query grain and metric definition, not just one that happens to connect the tables.

For the revenue example, joining directly to a current customer table may produce a valid result but lose historical segment accuracy.

The correct path may require a segment history table and an effective-date condition.
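To make the effective-date condition concrete, here is a small runnable sketch using an in-memory SQLite database. The `valid_from` and `valid_to` columns and all sample rows are assumptions; the table names follow the working set above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE invoice_line (
    id INTEGER, customer_id INTEGER, recognized_amount REAL, recognition_date TEXT
);
CREATE TABLE customer_segment_history (
    customer_id INTEGER, segment TEXT, valid_from TEXT, valid_to TEXT
);
-- Customer 10 moved from SMB to Enterprise on 2026-03-01.
INSERT INTO invoice_line VALUES
    (1, 10, 100.0, '2026-01-15'),
    (2, 10,  50.0, '2026-03-20');
INSERT INTO customer_segment_history VALUES
    (10, 'SMB',        '2025-01-01', '2026-02-28'),
    (10, 'Enterprise', '2026-03-01', '9999-12-31');
""")

# Join each invoice line to the segment that was in effect on its recognition date.
rows = con.execute("""
SELECT h.segment, SUM(l.recognized_amount)
FROM invoice_line AS l
JOIN customer_segment_history AS h
  ON h.customer_id = l.customer_id
 AND l.recognition_date BETWEEN h.valid_from AND h.valid_to
GROUP BY h.segment
ORDER BY h.segment
""").fetchall()

print(rows)  # [('Enterprise', 50.0), ('SMB', 100.0)]
```

A join to a current-state customer table would have attributed all 150.0 to Enterprise. The date comparison works here because the dates are stored as ISO 8601 strings, which sort chronologically.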

Validation has to be explicit

SQL execution is not validation.

A query can run successfully and still return the wrong answer.

At minimum, the validation stage should check:

- Are the selected tables approved for this metric?
- Does the join path match the required grain?
- Can the join duplicate fact rows?
- Are filters applied to the correct date field?
- Are permissions respected?
- Is the aggregation consistent with the metric definition?

Some checks are static. Others require running a small test query.

For example, a join can be tested for row multiplication before the final query is executed.

-- For a many-to-one join, both counts should be equal.
SELECT
    COUNT(*) AS rows_after_join,
    COUNT(DISTINCT invoice_line.id) AS distinct_invoice_lines
FROM invoice_line
JOIN customer
  ON invoice_line.customer_id = customer.customer_id;

If the row count exceeds the distinct count, the join is multiplying fact rows and the pipeline should stop.
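The same check can be wrapped in a small helper and run before the final query. This is a sketch against an in-memory SQLite database; the `join_fans_out` function and the sample rows, including the duplicated customer, are hypothetical.

```python
import sqlite3


def join_fans_out(con, fact, fact_key, dim, join_col):
    """Return True if joining fact to dim yields more rows than distinct fact keys."""
    # Identifiers are interpolated directly, so only pass trusted, hard-coded names.
    sql = (
        f"SELECT COUNT(*), COUNT(DISTINCT {fact}.{fact_key}) "
        f"FROM {fact} JOIN {dim} ON {fact}.{join_col} = {dim}.{join_col}"
    )
    total, distinct = con.execute(sql).fetchone()
    return total > distinct


con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE invoice_line (id INTEGER, customer_id INTEGER);
CREATE TABLE customer (customer_id INTEGER, name TEXT);
INSERT INTO invoice_line VALUES (1, 10), (2, 10), (3, 11);
-- Customer 11 appears twice, so invoice line 3 will be duplicated by the join.
INSERT INTO customer VALUES (10, 'Acme'), (11, 'Globex'), (11, 'Globex (duplicate)');
""")

fanout = join_fans_out(con, "invoice_line", "id", "customer", "customer_id")
print(fanout)  # True
```

Note that this only detects row multiplication. An inner join can also drop fact rows that have no match, which needs a separate comparison against the unjoined row count.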

Clarification is part of the system

One of the most useful behaviors in Text-to-SQL is asking a question instead of generating a query.

Do you mean recognized revenue or invoiced revenue?

That is not a failure.

It is often the safest possible response.

A production system should know when the available semantic or relationship context is not strong enough to proceed.

Keep the reasoning visible

The final response should include more than the result.

A useful explanation might show:

Metric: Recognized Revenue
Time Field: recognition_date
Tables Used: invoice_line, customer_segment_history
Join Path: invoice_line.customer_id → customer_segment_history.customer_id
Validation: No fanout detected

That gives analysts a chance to review the logic and gives data teams something they can audit.

The model is not the whole pipeline

The LLM is still important. It can parse questions, generate SQL, explain results, and handle conversation.

But production reliability comes from the surrounding system:

Semantic context
+ Metadata
+ Trusted relationships
+ Validation
+ Governance

That is the difference between a query that looks reasonable and a query that can be trusted.

At Arisyn, we split those responsibilities across two layers: Semora handles business semantics, query reasoning, SQL generation, validation, and explanation, while IntaLink provides the table and field relationship context needed to choose safer data paths.

The SQL is generated near the end.

Most of the real work happens before it.

Enterprise Databases Were Built for Applications, Not AI

ArisynData — Fri, 10 Jul 2026 16:09:00 +0000

As engineers, we spend a lot of time talking about AI models.

Which model generates better SQL?

Which model reasons better?

Which one has the largest context window?

But after working with enterprise data, I've started to think we're looking in the wrong place.

Most enterprise databases were never designed for AI.

They were designed for applications.

## Applications Know the Rules. AI Doesn't.

A business application already knows where everything is.

If an order needs a customer record, the developer has already defined the relationship.

If a dashboard needs revenue, someone has already decided which calculation to use.

The application doesn't need to discover anything.

AI does.

When an LLM connects to an enterprise database, all it sees is hundreds of tables and thousands of columns.

It has no idea:

Which customer table is authoritative.
Which tables are safe to join.
Whether two IDs represent the same business entity.
Which revenue definition the business actually uses.

Generating SQL isn't the difficult part anymore.

Choosing the right data is.

## Schemas Describe Structure, Not Business Knowledge

Even well-designed databases have this problem.

A schema tells you that a table exists.

It doesn't tell you:

why it exists,
when it should be used,
or whether another table has replaced it over time.

The knowledge that engineers build up over years of maintaining a system rarely exists inside the database itself.

It's stored in documentation, meeting notes, old dashboards—or simply in someone's head.

That's exactly the information AI is missing.

## Two Things Make Enterprise Data More Understandable

In my experience, AI becomes much more reliable when two gaps are addressed.

First, data relationships.

AI needs to know how tables, fields, and business entities are connected, not just through foreign keys but through relationships that have been verified across real enterprise systems. Discovering and validating those relationships is the foundation of platforms like Arisyn-IntaLink.

Second, business semantics.

Even after the right data is found, AI still needs to understand what that data means. Shared metric definitions, business terminology, and governed semantic rules help ensure that "Revenue" or "Customer" means the same thing to everyone. That's exactly the role of a semantic layer such as Arisyn-Semora.

Relationships explain how data is connected.

Semantics explain what the data means.

AI needs both.

## Final Thoughts

I don't think enterprise AI is limited by SQL generation anymore.

The bigger challenge is helping AI understand enterprise data the way experienced engineers do.

The better we capture relationships and business semantics, the less AI has to guess.

And in enterprise systems, fewer guesses almost always lead to better decisions.

AI Agents Don't Need More Tables. They Need Better Relationships.

ArisynData — Wed, 08 Jul 2026 02:19:43 +0000

Most conversations about enterprise AI focus on models.

How smart they are.

How many tokens they support.

How well they generate SQL.

After working with enterprise data, I think we're paying attention to the wrong problem.

The Problem Isn't Finding Data

Enterprise AI usually has access to plenty of data.

Schemas.

Data catalogs.

Documentation.

Historical SQL.

Yet it still struggles with surprisingly simple business questions.

Why?

Because it doesn't understand how the data is connected.

A Simple Example

Imagine asking an AI agent:

"Show the top customers by revenue."

It scans the database and finds:

4 customer tables
3 revenue-related tables
multiple possible join keys

From the model's perspective, several SQL queries look perfectly valid.

Only one matches how the business actually defines revenue and customers.

The model can't infer that from table names alone.
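The size of the guess is easy to underestimate. As a rough illustration, assume there are two plausible join keys; the table and column names below are invented, and only the counts of four customer tables and three revenue tables come from the example.

```python
from itertools import product

# Hypothetical names; only the counts (4 and 3) come from the example above.
customer_tables = ["crm_customer", "erp_account", "billing_entity", "dw_dim_customer"]
revenue_tables = ["orders", "invoices", "gl_revenue"]
join_keys = ["customer_id", "account_number"]  # assumed: two plausible keys

# Every (customer table, revenue table, join key) combination is a query
# that could look valid from the schema alone.
candidates = list(product(customer_tables, revenue_tables, join_keys))
print(len(candidates))  # 24
```

Twenty-four candidate queries for a one-line question, before considering filters or date fields, and only one of them matches the business definition.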

Relationships Carry Business Knowledge

This is something I've started appreciating more over the past year.

The relationship between tables isn't just a technical detail.

It's business knowledge.

A trusted relationship tells you:

which table is the source of truth
which join path has been validated
which fields represent the same business entity
which datasets should never be joined together

Without that context, AI is forced to guess.

Sometimes it guesses correctly.

Sometimes it doesn't.

Bigger Models Don't Fix Missing Context

Every new model is better at reasoning.

That's great.

But reasoning only works when the underlying context is reliable.

If the relationships are ambiguous, a more capable model simply produces a more convincing wrong answer.

That's why many enterprise AI projects spend far more time validating results than generating them.

Final Thought

I'm becoming convinced that enterprise AI isn't just a language problem.

It's a data relationship problem.

Models will continue to improve.

The bigger opportunity is helping them understand how enterprise data actually fits together.

Because once AI understands relationships, everything else becomes much easier.

Stop Building AI Agents Like Standalone Applications

ArisynData — Tue, 07 Jul 2026 13:11:00 +0000

Over the past few months, I've experimented with quite a few enterprise AI projects.

One thing has become obvious.

Most teams are still building AI agents the same way they used to build web applications.

Every new use case becomes another agent.

Another prompt.

Another knowledge base.

Another API integration.

It works at first.

But it doesn't scale.

Every Agent Starts Solving the Same Problems

Imagine a company with ten AI agents.

One helps Sales.

One supports Finance.

Another assists HR.

Another generates weekly reports.

They look different from the outside, but internally they're solving many of the same problems.

Each needs:

authentication
permission control
business definitions
access to enterprise data
shared documents
tools
monitoring

Yet many teams implement these capabilities over and over again.

The result is duplicated logic that becomes harder to maintain every month.

We Already Solved This Problem in Software Engineering

Traditional applications rarely implement infrastructure from scratch anymore.

Authentication is shared.

Logging is shared.

Monitoring is shared.

Configuration is shared.

Developers focus on business logic because the platform provides the rest.

I think AI engineering is heading toward the same architecture.

Agents shouldn't own everything themselves.

They should consume shared platform capabilities.

What Should Live Outside the Agent?

When I look at enterprise AI systems, I increasingly think the agent should remain lightweight.

Instead of embedding everything inside prompts, I'd rather separate responsibilities.

For example:

Context Service

Responsible for business definitions, trusted datasets, and reusable organizational knowledge.

Tool Registry

A single place where agents discover available APIs, SQL tools, search services, and enterprise systems.

Permission Layer

Every agent follows the same access policies instead of implementing its own authorization rules.

Memory Service

Shared long-term memory instead of isolated conversation histories.

Observability

One dashboard to understand how agents are performing, what tools they're calling, and where failures occur.

None of these capabilities belong inside an individual agent.

They're platform concerns.
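To show what "platform concern" could mean in practice, here is a minimal sketch of a tool registry that also enforces a shared permission check. The `Tool` and `ToolRegistry` classes, the role names, and the sample tools are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    required_role: str  # role a caller must hold to use this tool


class ToolRegistry:
    """Single place where agents discover and call tools under one access policy."""

    def __init__(self):
        self._tools = {}

    def register(self, tool):
        self._tools[tool.name] = tool

    def discover(self, roles):
        """Return the names of tools the caller's roles are allowed to use."""
        return sorted(t.name for t in self._tools.values() if t.required_role in roles)

    def call(self, name, roles, payload):
        tool = self._tools[name]
        if tool.required_role not in roles:
            raise PermissionError(f"{name} requires role {tool.required_role!r}")
        return tool.run(payload)


registry = ToolRegistry()
registry.register(Tool("sql_query", lambda q: f"ran: {q}", "analyst"))
registry.register(Tool("payroll_lookup", lambda q: f"payroll: {q}", "hr"))

sales_agent_roles = {"analyst"}
print(registry.discover(sales_agent_roles))  # ['sql_query']
```

Because the permission check lives in the registry, a new agent gets the same access policy as every other agent without implementing any authorization logic of its own.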

Keep Agents Small

One lesson I've learned is that smaller agents are usually easier to improve.

When an agent focuses on a single responsibility, it's easier to test, debug, and replace.

The shared platform handles everything else.

Instead of creating increasingly complex prompts, we should be investing in better infrastructure.

The more reusable the platform becomes, the simpler every new agent is to build.

A Different Mental Model

I no longer think of an AI agent as an application.

I think of it as a runtime component.

It receives a task.

It requests context.

It discovers available tools.

It checks permissions.

It completes the work.

Most of the intelligence isn't inside the agent itself.

It's distributed across the platform supporting it.
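That mental model can be sketched as a function that owns almost nothing and receives every capability from the platform. The `run_agent` function and the three stub services below are hypothetical stand-ins, not a real framework API.

```python
def run_agent(task, get_context, discover_tools, is_allowed):
    """A thin agent: every capability is injected by the platform."""
    context = get_context(task["question"])  # request context
    tools = discover_tools()                 # discover available tools
    # Check permissions before touching any tool.
    usable = [t for t in tools if is_allowed(task["user"], t["name"])]
    if not usable:
        return {"status": "denied"}
    tool = usable[0]
    result = tool["run"](task["question"], context)  # complete the work
    return {"status": "done", "tool": tool["name"], "result": result}


# Hypothetical platform stubs standing in for real shared services.
def get_context(question):
    return {"revenue": "recognized_revenue"}


def discover_tools():
    return [{"name": "sql_query", "run": lambda q, ctx: f"{q} [metric={ctx['revenue']}]"}]


def is_allowed(user, tool_name):
    return (user, tool_name) in {("alice", "sql_query")}


outcome = run_agent(
    {"user": "alice", "question": "revenue by segment"},
    get_context, discover_tools, is_allowed,
)
print(outcome)
```

Swapping any of the three services changes the agent's behavior without touching the agent itself, which is the point of keeping it small.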

Final Thoughts

Right now, building an AI agent has become surprisingly easy.

Operating dozens—or eventually hundreds—of them inside an enterprise won't be.

The organizations that move fastest won't necessarily build more agents.

They'll build better platforms for those agents to run on.

To me, that's where enterprise AI engineering is heading next.

Stop Optimizing Your Data Platform for Dashboards

ArisynData — Fri, 03 Jul 2026 02:28:32 +0000

For years, the success of a data platform was measured by one thing:

How easily people could build dashboards.

Today, I think that's changing.

More and more enterprise data is being consumed by AI agents instead of analysts. That changes what a "good" data platform looks like.

The problem is that most data platforms were never designed for AI.

Dashboards Hide Complexity

When a human opens a dashboard, they already know a lot.

They know which KPIs Finance trusts.

They know which report leadership uses every Monday.

They know that two customer tables exist, but only one should be used.

Most of that knowledge never appears in the database.

Humans simply carry it with them.

AI doesn't.

AI Doesn't Want Charts

AI isn't looking at your dashboard.

It's looking at the underlying data.

If that data contains three revenue definitions, duplicated customer IDs, or five possible join paths, the AI has no way to know which one represents the business truth.

The model can generate SQL.

That doesn't mean it understands your business.

What Should a Modern Data Platform Provide?

Instead of focusing only on BI performance, I've started thinking about a different checklist.

Can the platform answer questions like these?

Which relationships between tables are actually trusted?
Which metric definition is the official one?
Which tables are deprecated?
Which joins are safe to reuse?
Which business terms mean the same thing?

These questions aren't about analytics.

They're about context.

A Small Example

Imagine asking an AI agent:

Show quarterly revenue by customer.

Finding the sales table is easy.

The difficult part is everything that comes next.

Which customer table?
Gross revenue or net revenue?
Calendar quarter or fiscal quarter?
Should internal transactions be excluded?
Which join path has already been validated?

Those decisions are usually made by experienced analysts.

AI needs that knowledge too.
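The calendar-versus-fiscal question alone can change the answer. The sketch below assumes a fiscal year that is labelled by the calendar year in which it ends, which is a common convention but not a universal one; the `fiscal_quarter` function and the April start month are illustrative.

```python
from datetime import date


def fiscal_quarter(d, fiscal_year_start_month=1):
    """Return (fiscal_year, quarter) for a date.

    Assumption: the fiscal year is labelled by the calendar year in which it ends.
    """
    months_into_year = (d.month - fiscal_year_start_month) % 12
    quarter = months_into_year // 3 + 1
    if fiscal_year_start_month == 1:
        year = d.year
    else:
        year = d.year + 1 if d.month >= fiscal_year_start_month else d.year
    return year, quarter


day = date(2026, 5, 15)
print(fiscal_quarter(day))     # (2026, 2) with a calendar year
print(fiscal_quarter(day, 4))  # (2027, 1) with a fiscal year starting in April
```

The same invoice date lands in Q2 or Q1 depending on a rule that appears nowhere in the schema.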

This Changes Platform Design

I don't think data platforms will disappear.

But I do think their priorities will change.

Traditional platforms optimized for:

dashboards
reports
SQL performance
human exploration

AI-native platforms will also need to optimize for:

trusted relationships
shared business definitions
governed metrics
reusable organizational knowledge

The data hasn't changed.

The consumer has.

My Take

Every major shift in software has introduced a new primary user.

Web browsers changed frontend development.

Mobile phones changed application design.

Cloud changed infrastructure.

I think AI agents are going to change enterprise data architecture in the same way.

If we're still building data platforms only for analysts, we're solving yesterday's problem.

The next generation of data platforms won't just help people understand data.

They'll help AI understand it too.

Building AI Agents That Can Actually Understand Enterprise Data

ArisynData — Wed, 01 Jul 2026 13:52:00 +0000

The hardest part of building an enterprise AI agent isn't reasoning. It's helping the agent understand your data before it starts reasoning.

Over the past year, I've built and tested several AI-powered analytics workflows.

One thing surprised me.

Getting an LLM to generate SQL isn't nearly as difficult as I expected.

Getting that SQL to reflect how the business actually works is.

That's where most enterprise AI agents quietly fail.

The Typical AI Agent Architecture

Most AI agent tutorials follow roughly the same pattern.

User
│
▼
AI Agent
│
▼
LLM
│
▼
SQL Generator
│
▼
Database

For demos, this works remarkably well.

Ask:

Show me the top-selling products this month.

The agent generates SQL.

The database responds.

Everyone is impressed.

Production is a different story.

The Real Problem Starts Before SQL

Imagine a business user asks:

Which strategic customers are growing the fastest this year?

Now the agent has to answer questions like:

Which customer table should I use?
Which revenue definition is approved?
Which customers were merged after acquisitions?
Which fiscal calendar applies?
Which joins are actually trusted?
Which historical records should be excluded?

None of these questions require a smarter language model.

They require business context.

SQL Is Easy. Context Is Hard.

Most enterprise databases weren't designed for AI.

They were designed for applications.

Over time they accumulate:

duplicated entities
inconsistent naming
legacy schemas
conflicting metrics
undocumented business rules

A human analyst usually learns these through experience.

An AI agent has no such experience.

If the context isn't available, the agent has no choice but to guess.

Why Prompt Engineering Doesn't Scale

One common solution is to keep expanding the system prompt.

Developers add:

table descriptions
metric definitions
join rules
business exceptions

Eventually the prompt becomes hundreds of lines long.

It works…

Until another team builds another AI agent.

Now every project maintains its own version of business knowledge.

A few months later, nobody knows which version is correct.

The problem isn't prompting.

The problem is architecture.

Think About How You Onboard a New Engineer

When a new data engineer joins your company, you don't hand them database credentials and expect them to understand everything.

You explain things like:

which dashboards leadership trusts
where official metrics come from
why certain tables shouldn't be used
which relationships are verified
how different systems connect

Only after learning that context can they contribute confidently.

AI agents need the same onboarding process.

The difference is that their onboarding has to be captured as reusable infrastructure.

What Every Enterprise AI Agent Should Know

In my experience, an enterprise AI agent should understand at least five things before it writes a single SQL query.

1. Business Definitions

What exactly does "active customer" mean?

Does "revenue" include returns?

Is "inventory" updated in real time?

Without shared definitions, different agents produce different answers.

2. Trusted Relationships

Enterprise databases often contain multiple ways to join the same datasets.

Some are technically valid.

Only one reflects how the business actually works.

The agent shouldn't discover this by trial and error.

3. Approved Metrics

Not every calculation is official.

Finance usually has one approved revenue metric.

Operations may have another.

The agent needs to know which one belongs to which scenario.

4. Governance Rules

Some tables are deprecated.

Some columns should never be queried directly.

Some users have row-level permissions.

Governance is part of reasoning.

5. Organizational Knowledge

This is the most overlooked category.

Every company has knowledge that exists only because experienced employees remember it.

For example:

"Don't use that table after 2023."

"Those records were duplicated during the migration."

Humans learn these informally.

AI doesn't.

The Missing Layer

Instead of embedding all this knowledge inside prompts, I think enterprises need a shared context layer.

Something like this:

Business Users
│
▼
AI Agents
│
▼
Shared Context Layer
│
├─ Business Definitions
├─ Trusted Relationships
├─ Approved Metrics
├─ Governance Rules
└─ Organizational Knowledge
│
▼
LLM
│
▼
Enterprise Data

Now every AI application learns from the same source of truth.

Not from a different prompt.
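As a very small sketch of such a layer, the class below holds definitions, approved metrics, and one governance rule in a single object that any agent can query. The class name, the sample definition of an active customer, the metric names, and the `legacy_orders` table are all hypothetical.

```python
from datetime import date


class SharedContext:
    """One place that several agents query instead of each carrying its own prompt."""

    def __init__(self):
        # Hypothetical business definition.
        self.definitions = {
            "active_customer": "at least one paid invoice in the last 90 days",
        }
        # Approved metric per (term, scenario); names are illustrative.
        self.approved_metrics = {
            ("revenue", "finance"): "recognized_revenue",
            ("revenue", "operations"): "booked_revenue",
        }
        # Table -> last date for which it is safe to use
        # (captures rules like "don't use that table after 2023").
        self.table_cutoffs = {"legacy_orders": date(2023, 12, 31)}

    def metric_for(self, term, scenario):
        return self.approved_metrics[(term, scenario)]

    def table_usable(self, table, as_of):
        cutoff = self.table_cutoffs.get(table)
        return cutoff is None or as_of <= cutoff


ctx = SharedContext()
print(ctx.metric_for("revenue", "finance"))
print(ctx.table_usable("legacy_orders", date(2024, 1, 1)))
```

In a real deployment this would sit behind a service with ownership and versioning, but the interface idea is the same: agents ask, they do not remember.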

A Better Way to Build AI Agents

I've started asking different questions when evaluating enterprise AI projects.

Instead of asking:

Which model are you using?

I ask:

Where does the agent get business definitions?
How are trusted joins managed?
Who owns metric definitions?
How does every AI application stay consistent?

Those answers usually tell me much more about whether the project will succeed.

Final Thoughts
The industry is spending a lot of time making AI agents better at reasoning.

That's important.

But reasoning only works when the information being reasoned about is trustworthy.

For enterprise AI, the challenge isn't simply building smarter agents.

It's building better data infrastructure for those agents.

Once that foundation exists, better models become an advantage.

Without it, every new model is just making more confident guesses.

We Stopped Improving Our AI Prompts. We Started Improving Our Data Instead.

ArisynData — Mon, 29 Jun 2026 17:30:00 +0000

Modern LLMs are already surprisingly good at generating SQL. The real challenge is giving them enough context to generate SQL that people actually trust.

The Question That Changed Our Approach

Like many teams working on enterprise AI, we spent a lot of time trying to improve prompts.

We experimented with different prompt templates.

We added more examples.

We adjusted temperatures.

We tried different models.

Sometimes the results improved.

Sometimes they didn't.

But one thing became obvious after working with real enterprise databases.

The model wasn't failing because it couldn't write SQL.

It was failing because it didn't understand the organization.

That realization completely changed how we approached enterprise AI.

Instead of asking:

"How can we make the model smarter?"

we started asking:

"How can we make our data easier for AI to understand?"

That turned out to be a much more interesting engineering problem.

Demo Databases Hide the Real Problem

Most AI SQL demos look fantastic.

A clean schema.

Simple foreign keys.

Consistent naming.

Twenty tables.

One customer table.

One orders table.

Everything joins naturally.

Ask:

Which customers spent the most last month?

The model generates SQL.

The SQL executes.

Everyone applauds.

Production systems don't look anything like that.

One customer table becomes twelve.

Revenue exists in multiple systems.

Finance has one definition.

Sales has another.

Marketing has a third.

Historical migrations leave duplicate entities everywhere.

Some relationships exist only because one senior engineer remembers them.

That's the environment enterprise AI actually operates in.

SQL Isn't the Hard Part Anymore

Five years ago, SQL generation itself was a research challenge.

Today, it's becoming a solved problem.

Give GPT-4, Claude, Gemini, or another modern LLM a reasonably organized schema, and it will usually generate valid SQL.

The bottleneck has moved.

Today's bottleneck looks more like this:

Which table should the model use?
Which definition is officially trusted?
Which join path is correct?
Which records should be excluded?
Which business rule overrides the default logic?

Those questions have nothing to do with SQL syntax.

They're questions about organizational knowledge.

We Started Looking at Failed Queries

One exercise turned out to be incredibly useful.

Instead of reviewing successful demos, we collected failed enterprise queries.

Not queries with syntax errors.

Queries that returned technically correct answers nobody trusted.

Patterns appeared almost immediately.

Pattern 1 — Multiple Sources of Truth

The database contained several revenue tables.

Each one existed for a legitimate reason.

Historical reporting.

Operational reporting.

Finance adjustments.

Regional systems.

The AI selected one.

Finance expected another.

Nobody considered the answer reliable.

Pattern 2 — Hidden Business Rules

The SQL was valid.

The numbers were wrong.

Why?

Because experienced analysts always excluded cancelled transactions after reconciliation.

That rule wasn't stored anywhere.

It simply lived inside institutional knowledge.

The AI had no way to discover it.
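
To make the gap concrete, here is a small sketch using an in-memory SQLite database. The table, its rows, and the rule text are hypothetical, and the rule is simplified to "exclude cancelled transactions".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount INTEGER, status TEXT);
    INSERT INTO transactions (amount, status) VALUES
        (100, 'settled'), (250, 'settled'), (80, 'cancelled');
""")

# The rule is recorded once as data instead of living in analysts' heads.
BUSINESS_RULES = {"transactions": "status <> 'cancelled'"}

# Valid SQL, but it includes the cancelled transaction.
naive = conn.execute("SELECT SUM(amount) FROM transactions").fetchone()[0]

# Same query with the recorded rule applied.
trusted = conn.execute(
    f"SELECT SUM(amount) FROM transactions WHERE {BUSINESS_RULES['transactions']}"
).fetchone()[0]

print(naive, trusted)  # 430 350
```

Both queries run without error. Only one of them matches what an experienced analyst would report.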

Pattern 3 — Relationship Ambiguity

The schema suggested one join.

Senior engineers always used another.

Not because the database required it.

Because years of production experience had proven it was safer.

Again, nothing in the schema explained that.

Pattern 4 — Business Language Doesn't Match Database Language

Users ask:

Active customers

The database stores:

customer_status = 7

Business users understand "active."

The database understands integers.

Someone has to bridge that gap.
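
The bridge can be as simple as a lookup from business phrases to stored codes. In this sketch the value 7 comes from the example above; the second entry and the `customers` table name are hypothetical.

```python
# Maps business vocabulary to the predicates the database understands.
TERM_TO_PREDICATE = {
    "active customers": "customer_status = 7",
    "churned customers": "customer_status = 3",  # hypothetical code
}


def to_predicate(phrase: str) -> str:
    """Translate a business phrase into its stored-code predicate."""
    return TERM_TO_PREDICATE[phrase.strip().lower()]


query = f"SELECT COUNT(*) FROM customers WHERE {to_predicate('Active customers')}"
```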

Better Prompts Didn't Solve These Problems

Our first instinct was exactly what most teams try.

Improve the prompt.

Add examples.

Explain business terminology.

Increase context length.

Eventually we realized we were embedding organizational knowledge inside prompts.

That doesn't scale.

Every application duplicates the same context.

Every prompt becomes longer.

Every update requires editing multiple systems.

Eventually prompts become documentation.

And documentation always drifts.

We Needed Shared Context Instead

The breakthrough came when we stopped thinking about prompts as the primary source of intelligence.

Instead, we began thinking about reusable context.

Things every AI application should understand before generating SQL.

Examples include:

trusted business definitions
approved metrics
validated relationships
preferred join paths
business terminology
governance rules

Instead of rebuilding this knowledge for every prompt, why not maintain it once?

That idea changed our engineering priorities.

The Architecture Started Looking Different

Originally our architecture looked familiar.


User

↓

LLM

↓

Database

Simple.

Elegant.

Wrong.

Eventually it evolved into something closer to this.


User

↓

AI Application

↓

Context Layer

• Semantic Definitions

• Relationship Discovery

• Business Metrics

• Governance Rules

↓

LLM

↓

Database

Notice something interesting.

The model didn't disappear.

It simply stopped carrying all the responsibility.

Context Is Becoming Infrastructure

I think many engineering teams still treat context as application logic.

Every AI assistant maintains its own prompts.

Its own examples.

Its own business rules.

Its own metadata.

That works when you have one assistant.

It becomes a maintenance nightmare when you have ten.

Or fifty.

Or hundreds.

Context shouldn't live inside applications.

It should live inside infrastructure.

Just like authentication.

Just like monitoring.

Just like APIs.

Shared.

Governed.

Reusable.

Data Engineering Is Quietly Changing

This has also changed how I think about data engineering.

Traditionally, data engineers focused on things like:

ingestion
transformation
storage
performance
orchestration

Those responsibilities still matter.

But AI introduces another responsibility.

Preparing data for machine reasoning.

That includes questions like:

Can AI understand this metric?

Can AI safely join these tables?

Can AI explain where this number came from?

Can AI distinguish between similar concepts?

Those weren't traditional data engineering problems.

They're becoming increasingly important now.

The New Question I Ask

Whenever someone tells me they're building enterprise AI, I no longer ask:

Which model are you using?

Instead I ask:

Where does your AI get its business context?

Sometimes the answer is:

"Our prompts."

Sometimes it's:

"Our documentation."

Sometimes nobody knows.

That's usually where the biggest opportunity exists.

What I Think Will Matter Over the Next Five Years

Every year models become better.

That's almost guaranteed.

But every enterprise is using roughly the same foundation models.

Competitive advantage probably won't come from choosing Model A instead of Model B.

It will come from something much harder to copy.

The quality of organizational context.

Companies that organize business knowledge into reusable infrastructure will build AI systems that are more reliable, easier to maintain, and far more scalable.

Companies that don't will continue fighting the same problems with increasingly sophisticated prompts.

Final Thoughts

Looking back, the biggest shift wasn't technical.

It was conceptual.

We stopped treating enterprise AI as a language problem.

We started treating it as a data infrastructure problem.

LLMs are becoming exceptional reasoning engines.

What they still lack is trusted organizational context.

And I increasingly believe that's where the next generation of enterprise engineering will focus.

Not writing better prompts.

Not switching models every six months.

But building data infrastructure that allows every AI application to reason from the same trusted foundation.

When that foundation exists, SQL generation becomes almost the easy part.

Without it, even the smartest model is simply making educated guesses.

Why Text-to-SQL Breaks When the Join Path Is Not Obvious

ArisynData — Fri, 26 Jun 2026 13:05:00 +0000

Most Text-to-SQL examples are too clean.

They usually assume a simple schema, obvious table names, clear foreign keys, and a question that maps neatly to one or two tables. In that environment, generating SQL from natural language looks impressive.

Enterprise databases are not like that.

In real analytics work, the hard part is often not the SELECT clause. It is the join path.

The SQL Can Be Valid and Still Wrong

Imagine a user asks:

“Show revenue by customer for the last quarter.”

A model may generate something like:

SELECT
  c.customer_name,
  SUM(o.revenue) AS total_revenue
FROM customers c
JOIN orders o
  ON c.customer_id = o.customer_id
WHERE o.order_date >= '2026-01-01'
  AND o.order_date < '2026-04-01'
GROUP BY c.customer_name;

Technically, this looks fine.

But in an enterprise environment, several things may be wrong:

customers may not be the approved customer master.
orders.revenue may not be the finance-approved revenue field.
Customer records may be duplicated across regions.
Some orders may need to be excluded because they were adjusted later.
The join may create duplication if there are multiple customer records per account.
Last quarter may follow fiscal, not calendar, logic.

The database accepts the query.

The dashboard renders.

The answer is wrong.

That is the uncomfortable part of enterprise Text-to-SQL.

A syntactically valid query is not the same as a trusted query.
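
The fiscal-versus-calendar point is easy to show. The sketch below computes the date range for "last quarter" given the month in which the fiscal year starts; the function name and its parameter are my own, not part of any library.

```python
from datetime import date


def last_quarter_range(today: date, fiscal_year_start_month: int = 1):
    """Return (start, end) of the most recently completed quarter.

    The range is half-open: start <= d < end.
    fiscal_year_start_month=1 gives calendar quarters.
    """
    def month_index_to_date(index: int) -> date:
        # index counts months since year 0, so 2026*12 is January 2026.
        return date(index // 12, index % 12 + 1, 1)

    # Months elapsed since the fiscal year began, in the range 0..11.
    offset = (today.month - fiscal_year_start_month) % 12
    # Month index of the first month of the current quarter.
    current_start = today.year * 12 + (today.month - 1) - (offset % 3)
    return month_index_to_date(current_start - 3), month_index_to_date(current_start)


# Calendar quarters reproduce the literal dates in the query above.
print(last_quarter_range(date(2026, 4, 15)))
# A fiscal year starting in February gives a different "last quarter".
print(last_quarter_range(date(2026, 4, 15), fiscal_year_start_month=2))
```

On the same day, the calendar answer is January through March 2026, while the February-start fiscal answer is November 2025 through January 2026. The model cannot know which one applies unless someone tells it.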

Why Join Paths Are Hard

Join paths are obvious only when the data model is clean.

In production systems, they are usually messy.

You may have:

Missing foreign keys
Legacy tables
Similar columns with different meanings
One-to-many relationships that create fanout
Historical snapshots
Slowly changing dimensions
Department-specific marts
Fields reused for different purposes
Business rules that exist only in old SQL reports

Even experienced engineers often need time to inspect the schema, check existing reports, ask someone in finance, and run sample queries before trusting a join.

Now ask an AI model to do the same thing with only table names and column names.

It will guess.

Sometimes the guess will be right.

Sometimes it will be dangerously plausible.

The Fanout Problem

One of the most common issues is fanout.

Suppose you join orders to order_lines and then to shipments.

SELECT
  o.customer_id,
  SUM(o.order_amount) AS revenue
FROM orders o
JOIN order_lines l
  ON o.order_id = l.order_id
JOIN shipments s
  ON l.line_id = s.line_id
GROUP BY o.customer_id;

If an order has several lines, or one order line has multiple shipments, the order-level amount is repeated on every joined row, so order revenue is counted multiple times.

The SQL is valid.

The join is valid.

The result is not valid for revenue reporting.

A human analyst may know to aggregate at the order level first, or use a shipment-adjusted revenue table, or avoid this path altogether.

A model needs that knowledge in context.
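
Here is a small reproduction of the fanout using an in-memory SQLite database. The rows are made up: one order worth 100, two order lines, three shipments. The safer query is only one possible fix, using `EXISTS` so that each shipped order contributes a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_amount INTEGER);
    CREATE TABLE order_lines (line_id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE shipments (shipment_id INTEGER PRIMARY KEY, line_id INTEGER);
    INSERT INTO orders VALUES (1, 42, 100);
    INSERT INTO order_lines VALUES (10, 1), (11, 1);
    INSERT INTO shipments VALUES (100, 10), (101, 10), (102, 11);
""")

# Same join shape as the query above: the order row is repeated per shipment.
fanned_out = conn.execute("""
    SELECT o.customer_id, SUM(o.order_amount)
    FROM orders o
    JOIN order_lines l ON o.order_id = l.order_id
    JOIN shipments s ON l.line_id = s.line_id
    GROUP BY o.customer_id
""").fetchall()

# One row per order: shipments are only used as an existence check.
safe = conn.execute("""
    SELECT o.customer_id, SUM(o.order_amount)
    FROM orders o
    WHERE EXISTS (
        SELECT 1
        FROM order_lines l
        JOIN shipments s ON l.line_id = s.line_id
        WHERE l.order_id = o.order_id
    )
    GROUP BY o.customer_id
""").fetchall()

print(fanned_out)  # [(42, 300)]
print(safe)        # [(42, 100)]
```

The first query triples the revenue without raising a single warning.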

Why Metadata Alone Is Not Enough

Metadata helps, but it does not solve the full problem.

Column names can tell you that customer_id relates to customers.

They do not tell you whether this is the right customer relationship for financial reporting.

Foreign keys can tell you that a relationship exists.

They do not tell you whether the relationship is safe for aggregation.

Descriptions can tell you what a table contains.

They do not always explain historical exceptions.

That is why enterprise Text-to-SQL needs more than schemas.

It needs relationship context.

What Relationship Context Should Include

At minimum, an AI query system should know:

Candidate join paths between tables
Approved join paths for common business questions
Relationship cardinality
Known fanout risks
Join confidence
Source of relationship evidence: constraints, SQL history, naming patterns, dbt models, BI datasets, or human approval
Which paths are rejected or deprecated

This changes the behavior of the system.

Instead of simply generating SQL, it can reason about query safety.

It can choose the trusted path.

It can warn when no approved path exists.

It can ask for clarification when multiple paths are possible.

That is far more useful than blindly producing a query.
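
A sketch of how that decision could look in code. The field names, the status values, and the three actions are my own illustration of the list above, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinPath:
    name: str
    cardinality: str   # e.g. "1:1" or "1:N"
    fanout_risk: bool
    confidence: float  # 0.0 to 1.0, hypothetical scoring
    evidence: str      # e.g. "constraint", "sql_history", "human_approval"
    status: str        # "approved", "candidate", or "rejected"


def decide(paths):
    """Return (action, paths): 'use', 'clarify', or 'warn'."""
    approved = [p for p in paths if p.status == "approved"]
    if len(approved) == 1:
        return "use", approved        # exactly one trusted path
    if len(approved) > 1:
        return "clarify", approved    # several trusted paths: ask the user
    candidates = [p for p in paths if p.status == "candidate"]
    return "warn", candidates         # nothing approved: warn, do not guess
```

Rejected paths never reach the model at all, which is usually safer than asking it to ignore them.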

A Better Pattern

A more reliable Text-to-SQL system should work like this:

1. Parse the user question.
2. Identify business entities and metrics.
3. Resolve semantic definitions.
4. Retrieve candidate tables.
5. Retrieve trusted relationship paths.
6. Check join risks.
7. Generate SQL using approved paths.
8. Validate the SQL against semantic and relationship rules.
9. Explain the assumptions behind the result.

The key difference lies in steps 5 and 6: retrieving trusted relationship paths and checking join risks.

Many systems jump from semantic mapping directly to SQL generation.

That is where errors enter.

The missing layer is relationship intelligence.
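
The flow can be sketched in a few lines. Everything here is hypothetical: the context contents, the table names, and the `generate_sql` callable, which stands in for the model call. Question parsing is skipped, and validation is reduced to a single string check.

```python
from typing import Callable

# Hypothetical context; in practice this would come from a shared store.
CONTEXT = {
    "metrics": {"revenue": {"table": "finance_revenue", "expr": "SUM(net_amount)"}},
    "entities": {"customer": {"table": "customer_master", "key": "customer_id"}},
    "approved_paths": {
        ("finance_revenue", "customer_master"): {
            "on": "finance_revenue.customer_id = customer_master.customer_id",
            "fanout_risk": False,
        },
    },
}


def plan_query(metric: str, entity: str, generate_sql: Callable[[dict], str]) -> dict:
    m = CONTEXT["metrics"][metric]    # resolve the metric definition
    e = CONTEXT["entities"][entity]   # resolve the entity's table
    path = CONTEXT["approved_paths"].get((m["table"], e["table"]))
    if path is None:                  # no trusted relationship path
        return {"status": "no_approved_path", "sql": None}
    if path["fanout_risk"]:           # join risk check
        return {"status": "needs_review", "sql": None}
    sql = generate_sql({"metric": m, "entity": e, "join": path["on"]})
    if path["on"] not in sql:         # simplistic validation
        return {"status": "validation_failed", "sql": sql}
    assumptions = [f"{metric} = {m['expr']}", f"join on {path['on']}"]
    return {"status": "ok", "sql": sql, "assumptions": assumptions}


def template_generator(plan: dict) -> str:
    """Stand-in for the model: fills a fixed template from the plan."""
    key = f"{plan['entity']['table']}.{plan['entity']['key']}"
    return (
        f"SELECT {key}, {plan['metric']['expr']} "
        f"FROM {plan['metric']['table']} "
        f"JOIN {plan['entity']['table']} ON {plan['join']} "
        f"GROUP BY {key}"
    )


result = plan_query("revenue", "customer", template_generator)
```

The generator only ever sees the approved join, and the result carries its assumptions with it.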

Final Thought

Text-to-SQL is not just a language translation problem.

It is a context problem.

Models are getting better at writing SQL.

But enterprise analytics requires more than syntactically correct SQL.

It requires knowing which joins are safe, which paths are trusted, and which assumptions should be checked before the query runs.

Until that context exists, Text-to-SQL will continue to work well in demos and struggle in real companies.

Why AI Analytics Has a Knowledge Problem

ArisynData — Wed, 24 Jun 2026 13:38:00 +0000

One thing I’ve noticed while working with enterprise analytics systems:

The hardest problems are rarely technical.

Most modern models can generate SQL.

Most warehouses are well documented.

Most organizations have catalogs and governance programs.

Yet teams still depend heavily on a handful of experienced engineers.

Why?

Because analytics depends on knowledge, not just data.

For example:

A schema may tell you that three customer tables exist.

It doesn’t tell you:

· which one is authoritative

· which one is historical

· which one executive reporting relies on

Experienced engineers know the difference.

AI doesn’t.

That’s why many enterprise analytics failures aren’t caused by bad SQL generation.

They’re caused by missing organizational knowledge.

As AI adoption accelerates, I think we’re going to spend less time talking about prompts and more time talking about knowledge infrastructure.

Because models can’t use knowledge that organizations never captured.

Why AI Keeps Generating Bad SQL Even When The Schema Is Correct

ArisynData — Mon, 22 Jun 2026 13:32:00 +0000

One thing I've noticed while testing Text-to-SQL systems:
The schema is often fine.
The model is often fine.
The SQL is often syntactically correct.
The answer is still wrong.
Why?
Because SQL generation isn't the hard part.
Join selection is.
Imagine a warehouse containing:
orders
customers
subscriptions
invoices
accounts
A model may know all five tables exist.
The challenge is deciding:
Which relationship path should be used?
That's where many systems fail.
Most Text-to-SQL architectures focus on:
Schema → Prompt → SQL
But production environments usually require:
Schema
↓
Relationship Discovery
↓
Trusted Join Path
↓
Prompt
↓
SQL
Without relationship context, the model is forced to guess.
And enterprise analytics is a terrible place for guessing.
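To see why the model has to guess, here is a small sketch that enumerates join paths between two of those tables. The relationships between the five tables are hypothetical; real edges would come from constraints, query history, or human review.

```python
# Hypothetical relationships between the five tables.
EDGES = {
    "customers": ["orders", "accounts", "subscriptions"],
    "orders": ["customers", "invoices"],
    "accounts": ["customers", "invoices"],
    "subscriptions": ["customers", "invoices"],
    "invoices": ["orders", "accounts", "subscriptions"],
}


def join_paths(start, goal, path=None):
    """Yield every simple path from start to goal, depth-first."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for neighbor in EDGES[start]:
        if neighbor not in path:  # never revisit a table
            yield from join_paths(neighbor, goal, path)


paths = list(join_paths("customers", "invoices"))
for p in paths:
    print(" -> ".join(p))
```

With these edges there are three ways to get from customers to invoices, and the schema alone says nothing about which one is trusted.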