ShannonBase: The Lightweight Semantic Layer for Enterprise AI SQL

Enterprise AI agents are entering a new phase.
The first generation of AI data products focused on a single capability:
Natural Language → SQL
And for a while, that worked surprisingly well.
Text2SQL and NL2SQL systems, along with many AI BI tools, showed that LLMs can generate SQL from plain English, often with impressive accuracy.
But once these systems entered real enterprises, a deeper problem emerged:
The hardest part of enterprise analytics is not SQL generation.
It’s business semantics.
That is exactly where ShannonBase is evolving differently.
The Problem with Traditional NL2SQL
Most current AI SQL systems follow the same architecture:
Natural Language
↓
LLM
↓
SQL
ShannonBase itself originally started from this direction.
Current ShannonBase capabilities already include:
- Automatic schema metadata reading
- Table/column comment understanding
- Prompt-based SQL generation
- SQL validation and retry repair
- Fully-qualified table enforcement
- Multi-schema/table scope control
- Multi-model support:
  - DeepSeek
  - Qwen
  - Llama
  - OpenAI-compatible models
Architecturally, this makes it a solid schema-aware NL2SQL system.
This works well for simple analytical queries.
But enterprise environments are never simple.
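To make "SQL validation and retry repair" concrete, here is a minimal sketch of such a loop. It is not ShannonBase's implementation: it uses an in-memory SQLite database as a stand-in engine, and `validate_sql`, `generate_with_repair`, and the scripted `fake_generate` function (which plays the role of the LLM) are hypothetical names chosen for illustration.

```python
import sqlite3


def validate_sql(conn, sql):
    """Return None if the statement compiles, else the database error message."""
    try:
        # EXPLAIN makes SQLite prepare the statement without running the query itself.
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


def generate_with_repair(conn, question, generate, max_attempts=3):
    """Call `generate` until its SQL validates, feeding back the last error."""
    error = None
    for _ in range(max_attempts):
        sql = generate(question, error)
        error = validate_sql(conn, sql)
        if error is None:
            return sql
    raise ValueError(f"no valid SQL after {max_attempts} attempts: {error}")


# Demo: a scripted stand-in for the model (assumption, no real LLM is called).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_amount REAL)")
drafts = iter([
    "SELECT SUM(amount) FROM orders",        # wrong column name
    "SELECT SUM(order_amount) FROM orders",  # repaired draft
])
seen_errors = []


def fake_generate(question, previous_error):
    seen_errors.append(previous_error)
    return next(drafts)


final_sql = generate_with_repair(conn, "total order amount", fake_generate)
print(final_sql)
```

Note what this loop can and cannot catch: a wrong column name is rejected by the database, but a query that compiles and returns the wrong business number passes validation untouched.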
Why Enterprise Analytics Breaks NL2SQL
1. Metrics Are Not Columns
In enterprise systems:
- GMV
- Revenue
- Active Users
- New Customers
- Retention
…are almost never raw fields.
They are combinations of:
- aggregation logic
- filters
- business rules
- status conditions
- time windows
For example:
GMV != SUM(order_amount)
It may require:
- paid orders only
- excluding refunds
- excluding test users
- timezone normalization
- order status filtering
An LLM cannot reliably infer this from schema metadata alone.
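The gap between the column and the metric can be shown with a few rows. The sketch below uses an in-memory SQLite table whose layout (`order_status`, `is_test`) is an assumption made for illustration; the "business" rule covers only two of the conditions listed above (paid orders, no test users).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, order_amount REAL, "
    "order_status TEXT, is_test INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 100.0, "paid", 0),      # counts toward GMV
        (2, 50.0, "refunded", 0),   # excluded: refunded
        (3, 30.0, "paid", 1),       # excluded: test user
        (4, 20.0, "pending", 0),    # excluded: not paid
    ],
)

# What a schema-only model is likely to write.
naive_gmv = conn.execute("SELECT SUM(order_amount) FROM orders").fetchone()[0]

# What the (assumed) business definition requires.
business_gmv = conn.execute(
    "SELECT SUM(order_amount) FROM orders "
    "WHERE order_status = 'paid' AND is_test = 0"
).fetchone()[0]

print(naive_gmv, business_gmv)
```

Both queries are valid SQL against the same schema; only the second matches the definition, and nothing in the column names says so.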
2. Time Semantics Are Business Logic
Users ask questions like:
- “this month”
- “year to date”
- “QoQ”
- “YoY”
- “up to now”
But every enterprise defines time differently.
Examples:
- fiscal calendars
- delayed revenue recognition
- T+1 ingestion
- timezone cutoffs
- business-day alignment
Pure NL2SQL systems cannot see these rules in the schema, so they tend to guess them.
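As an illustration, "year to date" gives different date ranges depending on two of the rules above. The sketch below assumes a fiscal year that starts on the first day of a configurable month and an ingestion lag measured in whole days; `fiscal_year_start` and `year_to_date_range` are hypothetical helpers, not part of ShannonBase.

```python
from datetime import date, timedelta


def fiscal_year_start(day: date, fiscal_start_month: int = 1) -> date:
    """Return the first day of the fiscal year that contains `day`."""
    year = day.year if day.month >= fiscal_start_month else day.year - 1
    return date(year, fiscal_start_month, 1)


def year_to_date_range(today: date, fiscal_start_month: int = 1,
                       ingestion_lag_days: int = 0) -> tuple[date, date]:
    """Return (start, end) for "year to date" under the given business rules.

    With T+1 ingestion (ingestion_lag_days=1) the last complete day is
    yesterday, so the fiscal year is resolved from that day, not from today.
    """
    end = today - timedelta(days=ingestion_lag_days)
    return fiscal_year_start(end, fiscal_start_month), end


today = date(2024, 2, 15)
calendar_ytd = year_to_date_range(today)
fiscal_ytd = year_to_date_range(today, fiscal_start_month=4, ingestion_lag_days=1)
print(calendar_ytd)
print(fiscal_ytd)
```

For the same question on the same day, the calendar rule starts the range on 2024-01-01 while the April fiscal rule starts it on 2023-04-01, which is exactly the kind of difference a model cannot derive from table definitions.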
3. Join Paths Become Unmanageable
Real enterprise databases contain:
- hundreds or thousands of tables
- historical schemas
- bridge tables
- denormalized layers
- legacy data marts
LLMs frequently produce:
- incorrect joins
- missing joins
- duplicated joins
- unstable join paths
Even if the SQL is syntactically correct,
the business answer can still be wrong.
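A small example of a syntactically valid but wrong answer: joining a one-to-many table before aggregating repeats the parent rows. The tables below (`orders`, `order_items`) are assumptions made for illustration, run against in-memory SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_amount REAL)")
conn.execute(
    "CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(1, 1, "A"), (2, 1, "B"), (3, 2, "C")],  # order 1 has two items
)

# Correct: aggregate at the order grain.
correct_total = conn.execute("SELECT SUM(order_amount) FROM orders").fetchone()[0]

# Plausible but wrong: the join repeats order 1 once per item before SUM runs.
inflated_total = conn.execute(
    "SELECT SUM(o.order_amount) "
    "FROM orders o JOIN order_items i ON i.order_id = o.order_id"
).fetchone()[0]

print(correct_total, inflated_total)
```

The second query runs without error and passes any syntax check, yet it counts order 1 twice.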
4. Enterprise Schemas Are Messy
In reality:
- naming conventions drift
- comments are missing
- tables become obsolete
- duplicate datasets exist
- historical migrations accumulate
Relying only on schema metadata leaves the model to fill these gaps by guessing, which is where hallucinations come from.
This is not a prompt engineering problem.
It is an architecture problem.
The Industry’s Response: Semantic Layers
To solve this, many companies moved toward semantic-layer architectures.
Examples include:
- Cube Semantic Layer
- LookML
- Enterprise BI Semantic Engines
Their architecture looks like this:
Natural Language
↓
Semantic IR / LogicForm
↓
Semantic Runtime
↓
SQL / API / URL
These systems introduce:
- metric registries
- business glossaries
- entity modeling
- join graphs
- semantic planners
- query compilers
- ontology systems
The idea is powerful:
Move business understanding out of the LLM.
But there is a major problem.
Why Most Semantic Layer Projects Fail
Full semantic systems are extremely heavy.
They often become:
- expensive
- slow to deploy
- difficult to maintain
- dependent on specialized teams
Eventually, many organizations realize:
They are rebuilding a BI Operating System.
This starts to resemble previous generations of:
- data governance platforms
- enterprise data middle platforms
- metadata governance systems
Projects become multi-year initiatives.
Adoption slows.
Maintenance explodes.
Many eventually stall or fail entirely.
ShannonBase’s Direction: Lightweight Semantic Layer
ShannonBase believes there is a better path.
Not:
“Build a giant semantic operating system.”
But:
“Inject business semantics directly into the NL2SQL workflow.”
The new architecture becomes:
Natural Language
+
Schema Metadata
+
Business Semantic Context
↓
LLM
↓
SQL
This approach keeps the system:
- lightweight
- flexible
- deployable
- developer-friendly
It also aims to improve accuracy on enterprise queries, because the model receives business definitions instead of guessing them.
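One way to picture the "Natural Language + Schema Metadata + Business Semantic Context" step is as prompt assembly. The sketch below is an assumption about how such a prompt could be put together, not ShannonBase's actual prompt format; `build_prompt`, the section labels, and the `shop.orders` table are hypothetical.

```python
def build_prompt(question, schema_metadata, semantic_context):
    """Combine the three inputs into a single prompt string for the model."""
    sections = [
        "You translate business questions into SQL.",
        "Schema metadata:\n" + "\n".join(schema_metadata),
        "Business semantic context:\n" + "\n".join(semantic_context),
        "User question:\n" + question,
        "Use only the tables listed above and apply the business definitions exactly as written.",
    ]
    return "\n\n".join(sections)


prompt = build_prompt(
    question="What was GMV last month?",
    schema_metadata=[
        "shop.orders(order_id, order_amount, order_status, is_test, created_at)",
    ],
    semantic_context=[
        "GMV = SUM(order_amount) WHERE order_status = 'paid' AND is_test = false",
    ],
)
print(prompt)
```

The design point is that the pipeline stays a single LLM call: the semantic definitions travel as extra context rather than through a separate planner or compiler.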
Introducing ShannonBase Lightweight Semantic Layer
Instead of forcing companies to build complex ontology systems,
ShannonBase introduces a simple but powerful concept:
System Semantic Tables
Example:
sys.nl_sql_semantics
Users can maintain semantic definitions directly inside the database.
This includes:
- metric definitions
- join relationships
- business terminology
- synonyms
- time semantics
- filtering rules
Example:
Metric: GMV
Definition:
SUM(order_amount)
WHERE order_status = 'paid'
AND is_test = false
Or:
Term: Active User
Definition:
Distinct users with at least one login event in the last 30 days
Or:
Join Rule:
orders.user_id -> users.id
Preferred Join Type: LEFT JOIN
The LLM no longer has to guess business logic.
The relevant semantic context is retrieved and supplied to it before SQL generation.
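A minimal sketch of what a semantic table and its lookup could look like. The column layout (`kind`, `name`, `synonyms`, `definition`) is an assumption, since the actual structure of `sys.nl_sql_semantics` is not specified here. The example uses an in-memory SQLite table named `nl_sql_semantics` as a stand-in, and `retrieve_semantics` is a hypothetical helper that uses naive substring matching.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for sys.nl_sql_semantics; the columns are illustrative assumptions.
conn.execute(
    "CREATE TABLE nl_sql_semantics (kind TEXT, name TEXT, synonyms TEXT, definition TEXT)"
)
conn.executemany(
    "INSERT INTO nl_sql_semantics VALUES (?, ?, ?, ?)",
    [
        ("metric", "GMV", "gross merchandise value",
         "SUM(order_amount) WHERE order_status = 'paid' AND is_test = false"),
        ("term", "Active User", "active users,MAU",
         "Distinct users with at least one login event in the last 30 days"),
    ],
)


def retrieve_semantics(conn, question):
    """Return definitions whose name or synonym appears in the question."""
    text = question.lower()
    hits = []
    rows = conn.execute(
        "SELECT kind, name, synonyms, definition FROM nl_sql_semantics ORDER BY rowid"
    )
    for kind, name, synonyms, definition in rows:
        keys = [name] + [s for s in synonyms.split(",") if s.strip()]
        if any(key.strip().lower() in text for key in keys):
            hits.append(f"{kind}: {name} = {definition}")
    return hits


gmv_context = retrieve_semantics(conn, "What was GMV last month?")
user_context = retrieve_semantics(conn, "How many active users do we have?")
print(gmv_context)
print(user_context)
```

Substring matching is only the simplest possible retrieval strategy. Join rules would more naturally be selected by the tables in scope than by words in the question, and a real deployment might match on embeddings or a curated glossary instead.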
Why This Matters
This architecture changes the role of the LLM.
Instead of asking the model to:
- infer business definitions
- invent joins
- guess metrics
We let the model focus on what it does best:
- reasoning
- query composition
- language understanding
And we externalize business truth into manageable semantic context.
This is intended to improve:
- SQL stability
- metric consistency
- enterprise trust
- explainability
- maintainability
Without introducing a massive semantic platform.
The Key Insight
The future is probably not:
Pure NL2SQL
Nor:
Massive Semantic Operating Systems
The future is likely:
Lightweight Semantic-Augmented AI Data Agents
Systems that combine:
- LLM flexibility
- schema awareness
- business semantic grounding
Without overwhelming operational complexity.
That is the direction ShannonBase is building toward.
Why Developers Like This Approach
ShannonBase keeps the developer experience simple.
You do not need to build:
- ontology graphs
- DSL compilers
- semantic runtimes
- massive metadata platforms
You only need to gradually define:
- critical metrics
- important joins
- business vocabulary
This creates an incremental adoption model.
Teams can start small:
- define GMV
- define Active Users
- define fiscal calendar rules
And improve accuracy immediately.
No “big bang” semantic migration required.
The Bigger Vision
AI Data Agents are evolving beyond SQL generation.
The next generation will require:
- semantic grounding
- business consistency
- reliable execution
- enterprise trust
But enterprises do not want another heavyweight platform.
ShannonBase aims to bridge that gap.
Not by replacing SQL.
Not by replacing BI systems.
But by making AI truly understand enterprise data semantics
in a lightweight, practical, deployable way.
Final Thoughts
The core insight behind ShannonBase is simple:
Enterprise analytics problems are rarely SQL problems.
They are semantic problems.
And solving semantics does not necessarily require building a massive semantic operating system.
Sometimes, a lightweight semantic layer is enough to unlock reliable AI analytics at scale.
That is the direction ShannonBase is pursuing.