DEV Community

ShannonData.AI

ShannonBase: The Lightweight Semantic Layer for Enterprise AI SQL

Star the repo
🧩 Submit PRs
🐞 Open Issues
💬 Join Discussion


Enterprise AI agents are entering a new phase.

The first generation of AI data products focused on a single capability:

Natural Language → SQL

And for a while, that worked surprisingly well.

Text2SQL and NL2SQL systems, along with many AI BI tools, showed that LLMs can generate SQL from plain English with impressive accuracy.

But once these systems entered real enterprises, a deeper problem emerged:

The hardest part of enterprise analytics is not SQL generation.

It’s business semantics.

That is exactly where ShannonBase is evolving differently.


The Problem with Traditional NL2SQL

Most current AI SQL systems follow the same architecture:

Natural Language
    ↓
LLM
    ↓
SQL

ShannonBase itself originally started from this direction.

Current ShannonBase capabilities already include:

  • Automatic schema metadata reading
  • Table/column comment understanding
  • Prompt-based SQL generation
  • SQL validation and retry repair
  • Fully-qualified table enforcement
  • Multi-schema/table scope control
  • Multi-model support:
    • DeepSeek
    • Qwen
    • Llama
    • OpenAI-compatible models
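
The "SQL validation and retry repair" item can be pictured as a small loop: compile the candidate SQL, and if the database rejects it, hand the error back to the model and try again. The following is only a minimal sketch of that idea, not ShannonBase's implementation. It assumes SQLite as a stand-in database, and `validate_sql`, `generate_with_repair`, and `fake_llm` are hypothetical names invented for illustration.

```python
import sqlite3
from typing import Callable, Optional


def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if the statement compiles, otherwise the error message."""
    try:
        # EXPLAIN makes SQLite compile the statement without running the query.
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


def generate_with_repair(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> str:
    """Ask the generator for SQL, feeding back the last error until it compiles."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        error = validate_sql(conn, sql)
        if error is None:
            return sql
    raise ValueError(f"no valid SQL after {max_attempts} attempts: {error}")


def fake_llm(question: str, previous_error: Optional[str]) -> str:
    """Hypothetical stand-in for a model call: wrong column first, fixed on retry."""
    if previous_error is None:
        return "SELECT SUM(amount) FROM orders"
    return "SELECT SUM(order_amount) FROM orders"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_amount REAL)")
repaired = generate_with_repair(conn, "total order amount", fake_llm)
print(repaired)
```

Note that this kind of check only catches SQL the database refuses to compile. A query that compiles but encodes the wrong business rule passes straight through.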

Architecturally, it is a strong:

Schema-aware NL2SQL system

This works well for simple analytical queries.

But enterprise environments are never simple.


Why Enterprise Analytics Breaks NL2SQL

1. Metrics Are Not Columns

In enterprise systems:

  • GMV
  • Revenue
  • Active Users
  • New Customers
  • Retention

…are almost never raw fields.

They are combinations of:

  • aggregation logic
  • filters
  • business rules
  • status conditions
  • time windows

For example:

GMV != SUM(order_amount)

It may require:

  • paid orders only
  • excluding refunds
  • excluding test users
  • timezone normalization
  • order status filtering

An LLM cannot reliably infer this from schema metadata alone.
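
To make the gap concrete, here is a small sketch comparing the naive sum with a rule-based GMV. The `orders` table, its columns, and the sample rows are illustrative assumptions, and SQLite is used only because it runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           order_amount REAL,
           order_status TEXT,
           is_test INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 100.0, "paid", 0),      # counts toward GMV
        (2, 50.0, "refunded", 0),   # excluded: refunded
        (3, 30.0, "paid", 1),       # excluded: test user
        (4, 20.0, "created", 0),    # excluded: never paid
    ],
)

# What a schema-only model is likely to write.
naive_gmv = conn.execute("SELECT SUM(order_amount) FROM orders").fetchone()[0]

# What the business actually means by GMV in this example.
governed_gmv = conn.execute(
    "SELECT SUM(order_amount) FROM orders "
    "WHERE order_status = 'paid' AND is_test = 0"
).fetchone()[0]

print(naive_gmv, governed_gmv)
```

Both queries are valid SQL against the same schema. Only the second one matches the business definition, and nothing in the column names says so.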


2. Time Semantics Are Business Logic

Users ask questions like:

  • “this month”
  • “year to date”
  • “QoQ”
  • “YoY”
  • “up to now”

But every enterprise defines time differently.

Examples:

  • fiscal calendars
  • delayed revenue recognition
  • T+1 ingestion
  • timezone cutoffs
  • business-day alignment

Pure NL2SQL systems usually hallucinate these rules.
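
As a rough illustration, the same phrase "year to date" resolves to different date ranges depending on rules that live outside the schema. The sketch below assumes a hypothetical helper `fiscal_ytd_range` with two made-up parameters: the month the fiscal year starts and the number of days of ingestion lag.

```python
from datetime import date, timedelta
from typing import Tuple


def fiscal_ytd_range(
    today: date, fiscal_start_month: int = 1, data_lag_days: int = 0
) -> Tuple[date, date]:
    """Return (start, end) for 'year to date' under the given business rules."""
    # With T+1 ingestion, the last complete day of data is yesterday.
    end = today - timedelta(days=data_lag_days)
    # The fiscal year containing `end` started in this year or the previous one.
    start_year = end.year if end.month >= fiscal_start_month else end.year - 1
    return date(start_year, fiscal_start_month, 1), end


today = date(2024, 5, 15)
calendar_ytd = fiscal_ytd_range(today)
fiscal_ytd = fiscal_ytd_range(today, fiscal_start_month=4)
fiscal_ytd_t1 = fiscal_ytd_range(today, fiscal_start_month=4, data_lag_days=1)

print(calendar_ytd, fiscal_ytd, fiscal_ytd_t1)
```

Three reasonable readings of one phrase give three different ranges. A model that only sees table and column names has no basis for choosing among them.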


3. Join Paths Become Unmanageable

Real enterprise databases contain:

  • hundreds or thousands of tables
  • historical schemas
  • bridge tables
  • denormalized layers
  • legacy data marts

LLMs frequently produce:

  • incorrect joins
  • missing joins
  • duplicated joins
  • unstable join paths

Even if the SQL is syntactically correct,
the business answer can still be wrong.
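
A classic instance is join fan-out: joining an order-level amount to a line-item table repeats the amount once per item. The sketch below uses made-up `orders` and `order_items` tables in SQLite to show a query that runs cleanly and still returns the wrong total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_amount REAL);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 100.0), (2, 40.0);
    -- Order 1 has two line items, order 2 has one.
    INSERT INTO order_items VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
    """
)

# Syntactically fine, but order 1's amount is counted once per line item.
fanned_out_total = conn.execute(
    "SELECT SUM(o.order_amount) "
    "FROM orders o JOIN order_items i ON i.order_id = o.id"
).fetchone()[0]

# The intended answer: each order counted once.
correct_total = conn.execute("SELECT SUM(order_amount) FROM orders").fetchone()[0]

print(fanned_out_total, correct_total)
```

No validator that only checks syntax will flag the first query, which is why join knowledge has to come from somewhere other than the SQL parser.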


4. Enterprise Schemas Are Messy

In reality:

  • naming conventions drift
  • comments are missing
  • tables become obsolete
  • duplicate datasets exist
  • historical migrations accumulate

Relying only on schema metadata inevitably creates hallucinations.

This is not a prompt engineering problem.

It is an architecture problem.


The Industry’s Response: Semantic Layers

To solve this, many companies moved toward semantic-layer architectures.

Examples include:

  • Cube Semantic Layer
  • LookML
  • Enterprise BI Semantic Engines

Their architecture looks like this:

Natural Language
    ↓
Semantic IR / LogicForm
    ↓
Semantic Runtime
    ↓
SQL / API / URL

These systems introduce:

  • metric registries
  • business glossaries
  • entity modeling
  • join graphs
  • semantic planners
  • query compilers
  • ontology systems

The idea is powerful:

Move business understanding out of the LLM.

But there is a major problem.


Why Most Semantic Layer Projects Fail

Full semantic systems are extremely heavy.

They often become:

  • expensive
  • slow to deploy
  • difficult to maintain
  • dependent on specialized teams

Eventually, many organizations realize:

They are rebuilding a BI Operating System.

This starts to resemble previous generations of:

  • data governance platforms
  • enterprise data middle platforms
  • metadata governance systems

Projects become multi-year initiatives.

Adoption slows.

Maintenance explodes.

Many eventually stall or fail entirely.


ShannonBase’s Direction: Lightweight Semantic Layer

ShannonBase believes there is a better path.

Not:

“Build a giant semantic operating system.”

But:

“Inject business semantics directly into the NL2SQL workflow.”

The new architecture becomes:

Natural Language
    +
Schema Metadata
    +
Business Semantic Context
    ↓
LLM
    ↓
SQL

This approach keeps the system:

  • lightweight
  • flexible
  • deployable
  • developer-friendly

While dramatically improving accuracy on enterprise queries.
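
In practice, the three inputs in the diagram can simply be combined into one prompt. The sketch below is one possible shape for that step, not ShannonBase's actual prompt format. The function name `build_prompt`, the section headings, and the sample lines are all assumptions made for illustration.

```python
from typing import List


def build_prompt(
    question: str, schema_lines: List[str], semantic_lines: List[str]
) -> str:
    """Combine schema metadata, business semantics, and the question into one prompt."""
    sections = [
        "### Schema metadata\n" + "\n".join(schema_lines),
        # Semantics are presented as rules the model must follow, not hints.
        "### Business semantics (authoritative, do not redefine)\n"
        + "\n".join(semantic_lines),
        "### Question\n" + question,
        "### Task\nWrite one SQL query using fully qualified table names.",
    ]
    return "\n\n".join(sections)


prompt = build_prompt(
    question="What was GMV last month?",
    schema_lines=[
        "shop.orders(id, user_id, order_amount, order_status, is_test, created_at)"
    ],
    semantic_lines=[
        "Metric GMV: SUM(order_amount) WHERE order_status = 'paid' AND is_test = false"
    ],
)
print(prompt)
```

The model still writes the SQL. The difference is that the GMV rule arrives as context instead of being inferred from column names.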


Introducing ShannonBase Lightweight Semantic Layer

Instead of forcing companies to build complex ontology systems,
ShannonBase introduces a simple but powerful concept:

System Semantic Tables

Example:

sys.nl_sql_semantics

Users can maintain semantic definitions directly inside the database.

This includes:

  • metric definitions
  • join relationships
  • business terminology
  • synonyms
  • time semantics
  • filtering rules
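
The post does not spell out the columns of `sys.nl_sql_semantics`, so the layout below is purely an assumption: one row per definition, with a `kind`, a `name`, a free-text `definition`, and optional comma-separated `synonyms`. SQLite is used as a stand-in, with an attached in-memory database named `sys` so that the qualified table name resolves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database under the name "sys" (SQLite stand-in only).
conn.execute("ATTACH DATABASE ':memory:' AS sys")

# Hypothetical column layout; the real table may look different.
conn.execute(
    """CREATE TABLE sys.nl_sql_semantics (
           kind       TEXT NOT NULL,   -- 'metric', 'term', 'join', ...
           name       TEXT NOT NULL,
           definition TEXT NOT NULL,
           synonyms   TEXT NOT NULL DEFAULT '',
           PRIMARY KEY (kind, name)
       )"""
)

conn.executemany(
    "INSERT INTO sys.nl_sql_semantics (kind, name, definition, synonyms) "
    "VALUES (?, ?, ?, ?)",
    [
        ("metric", "GMV",
         "SUM(order_amount) WHERE order_status = 'paid' AND is_test = false",
         "gross merchandise value"),
        ("term", "Active User",
         "Distinct users with at least one login event in the last 30 days",
         "active users"),
        ("join", "orders->users",
         "orders.user_id -> users.id; preferred join type: LEFT JOIN",
         ""),
    ],
)

rows = conn.execute(
    "SELECT kind, name FROM sys.nl_sql_semantics ORDER BY kind, name"
).fetchall()
print(rows)
```

Because the definitions are ordinary rows, they can be maintained with ordinary `INSERT` and `UPDATE` statements by whoever owns the metric.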

Example:

Metric: GMV
Definition:
SUM(order_amount)
WHERE order_status = 'paid'
AND is_test = false

Or:

Term: Active User
Definition:
Distinct users with at least one login event in the last 30 days

Or:

Join Rule:
orders.user_id -> users.id
Preferred Join Type: LEFT JOIN

The LLM no longer guesses business logic.

It retrieves semantic context before SQL generation.
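
How that retrieval works is not described here, so the following is only a naive sketch: match the question against each entry's name and synonyms, and return the definitions that hit. The table columns and the helper `retrieve_semantics` are assumptions for illustration, and SQLite again stands in for the database. A real system would likely use something more robust than substring matching.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sys")
# Hypothetical layout for the semantic table.
conn.execute(
    "CREATE TABLE sys.nl_sql_semantics "
    "(kind TEXT, name TEXT, definition TEXT, synonyms TEXT)"
)
conn.executemany(
    "INSERT INTO sys.nl_sql_semantics VALUES (?, ?, ?, ?)",
    [
        ("metric", "GMV",
         "SUM(order_amount) WHERE order_status = 'paid' AND is_test = false",
         "gross merchandise value"),
        ("term", "Active User",
         "Distinct users with at least one login event in the last 30 days",
         "active users"),
        ("join", "orders->users",
         "orders.user_id -> users.id; preferred join type: LEFT JOIN",
         ""),
    ],
)


def retrieve_semantics(
    conn: sqlite3.Connection, question: str
) -> List[Tuple[str, str, str]]:
    """Return (kind, name, definition) for entries whose name or synonym appears."""
    q = question.lower()
    hits = []
    cursor = conn.execute(
        "SELECT kind, name, definition, synonyms "
        "FROM sys.nl_sql_semantics ORDER BY kind, name"
    )
    for kind, name, definition, synonyms in cursor:
        terms = [name] + [s for s in synonyms.split(",") if s.strip()]
        # Case-insensitive substring match: simple, and prone to false positives.
        if any(t.strip().lower() in q for t in terms):
            hits.append((kind, name, definition))
    return hits


hits = retrieve_semantics(conn, "What was GMV from active users last month?")
for kind, name, definition in hits:
    print(f"{kind} {name}: {definition}")
```

The matched definitions are what gets placed in front of the model, so the question about GMV carries its own definition of GMV with it.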


Why This Matters

This architecture changes the role of the LLM.

Instead of asking the model to:

  • infer business definitions
  • invent joins
  • guess metrics

We let the model focus on what it does best:

  • reasoning
  • query composition
  • language understanding

And we externalize business truth into manageable semantic context.

This dramatically improves:

  • SQL stability
  • metric consistency
  • enterprise trust
  • explainability
  • maintainability

Without introducing a massive semantic platform.


The Key Insight

The future is probably not:

Pure NL2SQL

Nor:

Massive Semantic Operating Systems

The future is likely:

Lightweight Semantic-Augmented AI Data Agents

Systems that combine:

  • LLM flexibility
  • schema awareness
  • business semantic grounding

Without overwhelming operational complexity.

That is the direction ShannonBase is building toward.


Why Developers Like This Approach

ShannonBase keeps the developer experience simple.

You do not need to build:

  • ontology graphs
  • DSL compilers
  • semantic runtimes
  • massive metadata platforms

You only need to gradually define:

  • critical metrics
  • important joins
  • business vocabulary

This creates an incremental adoption model.

Teams can start small:

  • define GMV
  • define Active Users
  • define fiscal calendar rules

And improve accuracy immediately.

No “big bang” semantic migration required.


The Bigger Vision

AI Data Agents are evolving beyond SQL generation.

The next generation will require:

  • semantic grounding
  • business consistency
  • reliable execution
  • enterprise trust

But enterprises do not want another heavyweight platform.

ShannonBase aims to bridge that gap.

Not by replacing SQL.

Not by replacing BI systems.

But by making AI truly understand enterprise data semantics
in a lightweight, practical, deployable way.


Final Thoughts

The core insight behind ShannonBase is simple:

Enterprise analytics problems are rarely SQL problems.

They are semantic problems.

And solving semantics does not necessarily require building a massive semantic operating system.

Sometimes, a lightweight semantic layer is enough to unlock reliable AI analytics at scale.

That is the direction ShannonBase is pursuing.
