Daniel da Rosa

Why I Stopped Sending Data to LLMs: Introducing "Zero-Data Transport" Architecture

The Problem with "Chat with your Data"

Let's be honest: the standard approach to RAG (Retrieval-Augmented Generation) for structured data is broken.

You know the drill:

User asks a question -> You run a query -> You fetch 500 rows -> You stuff those 500 rows into the LLM context -> You pray it doesn't hallucinate (or go broke on token costs).

I realized this wasn't scalable for Enterprise ERPs with huge schemas. So, I decided to flip the script.

What if we never sent the data to the AI?

Meet ADA: The "Zero-Data Transport" Agent

I’ve been architecting ADA (Autonomous Data Agent), a system designed to solve the "Context Window" problem using a technique I call Zero-Data Transport.

The concept is simple but powerful: treat data context like a .zip file.
Instead of sending the raw data payload to the LLM, we send a Context ID. The data remains in the database/cache, and the AI only manipulates the logic, never the rows.

The Secret Sauce: CTE Injection Strategy

Here is the engineering breakthrough. When a user asks to "filter the previous results", instead of re-sending the data, we use Common Table Expressions (CTEs).

1. The "Zip" (Redis)

When a query runs, we save the SQL logic and metadata in Redis. We return the data to the user, but give the Agent only a Token ID (e.g., CTX_123).

The Zip Strategy Diagram
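To make the "zip" step concrete, here is a minimal sketch in Java. The class and method names are illustrative (not the actual ADA API), and a plain in-memory `Map` stands in for Redis; the real system would use a Redis SET with a TTL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the "zip" step: cache the executed SQL
// and hand the agent only an opaque token, never the rows.
public class ContextStore {
    private final Map<String, String> cache = new HashMap<>(); // stand-in for Redis
    private final AtomicLong counter = new AtomicLong();

    // Save the SQL logic and return a Context ID for the agent.
    public String zip(String originalSql) {
        String token = "CTX_" + counter.incrementAndGet();
        cache.put(token, originalSql);
        return token;
    }

    // Later, the orchestrator resolves the token back to the SQL.
    public String unzip(String token) {
        return cache.get(token);
    }
}
```

The agent's context now contains a few bytes (`CTX_123`) instead of 500 rows.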

2. The Logic (LLM)

The LLM receives a prompt like: "You have a virtual table called PREVIOUS_RESULT. User wants top 5. Write the SQL."

The LLM output is tiny and cheap:

SELECT * FROM PREVIOUS_RESULT ORDER BY total DESC FETCH FIRST 5 ROWS ONLY

3. The Injection (Backend)

My orchestrator (Java) pulls the original SQL from Redis and injects it into a CTE:

WITH PREVIOUS_RESULT AS (
    -- The original monster query from Redis is injected here
    SELECT ... FROM sales JOIN items ... WHERE year = 2024
)
SELECT * FROM PREVIOUS_RESULT ORDER BY total DESC FETCH FIRST 5 ROWS ONLY
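The injection step itself is simple string composition. A minimal sketch, assuming the original SQL has already been fetched from Redis by its token (the class name is illustrative, not the actual ADA orchestrator API):

```java
// Hypothetical sketch of the CTE injection step: wrap the cached
// "monster query" as a CTE and run the LLM's tiny fragment on top.
public class CteInjector {
    public static String inject(String originalSql, String llmSql) {
        return "WITH PREVIOUS_RESULT AS (\n"
             + "    " + originalSql + "\n"
             + ")\n"
             + llmSql;
    }
}
```

In production you would also validate that `llmSql` only references the `PREVIOUS_RESULT` alias before executing it.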

The result? Zero data transfer to the cloud, no raw values for the model to hallucinate (it never sees them), and roughly 95% token savings.

CTE Injection Flowchart

The Stack (Enterprise Grade)

To make this robust, I moved away from simple vector stores and built a Converged Architecture:

  • Oracle 23ai: Handles both Relational Data and Vector Embeddings in the same engine. No more sync lag.
  • Neo4j: Acts as the "GPS". It validates JOIN paths so the LLM doesn't invent relationships that don't exist.
  • Redis: The ephemeral memory (Session Store) for our Context IDs.
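The "GPS" role is worth illustrating. Before executing LLM-generated SQL, every JOIN edge is checked against a known schema graph. The sketch below is an assumption of mine, with an adjacency map standing in for Neo4j; the real system would run a Cypher path query instead.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of JOIN-path validation: an undirected schema
// graph records which tables are legitimately related, so the LLM
// cannot invent relationships that don't exist.
public class JoinValidator {
    private final Map<String, Set<String>> schemaGraph = new HashMap<>();

    // Register a known relationship between two tables (both directions).
    public void addRelation(String from, String to) {
        schemaGraph.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        schemaGraph.computeIfAbsent(to, k -> new HashSet<>()).add(from);
    }

    // A JOIN is allowed only if the edge exists in the schema graph.
    public boolean isValidJoin(String left, String right) {
        return schemaGraph.getOrDefault(left, Set.of()).contains(right);
    }
}
```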

ADA Architecture Stack

Why This Matters

We are moving from "Chatbots" to Agentic Engineering. By using Semantic Compression (mapping complex schemas to simple aliases) and CTE Injection, we turn fragile demos into robust, secure, and cheap enterprise software.

I'm currently implementing a Self-Optimization Layer where the system learns from "Context Misses" to create its own guardrails. But that's a topic for Part 2.

Self Optimization Loop

Does your RAG architecture handle 10k rows without breaking the bank? Let's discuss in the comments!

Top comments (2)

Carlo

If it's just a CRUD operation, then a SQL proxy mechanism can be used. But what if you need to extract a concept from the data, such as finding the author of an image or identifying which city a building is in?