Most organizations chasing AI transformation are looking in the wrong direction. The highest-value data isn't in the shiny new tool; it's buried in the systems you've been running for years.
Every few months, I sit across from a technical leader who tells me some version of the same story: "We've got GPT wired up for internal chat, but nobody's using it." Or: "We built a chatbot, but it just makes things up." Or my personal favorite: "We tried RAG, but the results were garbage."
And almost every time, the problem isn't the model. It's the plumbing.
I run a company called Sprinklenet AI, where we build and deploy multi-LLM platforms (primarily Knowledge Spaces, our RAG-based system) for government agencies and enterprise clients. Over the past two years, I've watched the industry fixate on model selection (GPT-4 vs. Claude vs. Gemini) while almost completely ignoring the far harder, far more valuable problem: getting AI systems reliably connected to the raw operational data that actually drives decisions.
The gold isn't in the model. It's in the basement: in the ERP logs, the CRM records, the SharePoint graveyards, the PostgreSQL tables that nobody has touched since 2019. And the organizations that figure out how to connect AI to that data, securely and at scale, are the ones that will win.
The RAG Gap Nobody Talks About
Retrieval-Augmented Generation has become the default architecture for enterprise AI, and for good reason. Instead of fine-tuning a model on your data (expensive, brittle, and a governance nightmare), you retrieve relevant documents at query time and inject them into the prompt context. The model generates answers grounded in your actual information.
In theory, this is elegant. In practice, most RAG implementations fail at the retrieval step, not the generation step.
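The retrieve-then-generate pattern is compact enough to sketch. Here's a minimal, toy version: `embed()` is a bag-of-words stand-in for a real embedding model, and the "vector store" is just a Python list, but the shape of the pipeline (embed the query, rank chunks by similarity, inject the top hits into the prompt) is the same one production systems follow.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in: real systems call an embedding model API here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every stored chunk against the query embedding; keep top k.
    qv = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Inject retrieved context into the prompt so the answer is grounded.
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

When this pipeline misbehaves in production, the fix is almost always in `retrieve()`, not in whichever model eventually consumes the prompt.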
Here's what I mean. A typical proof-of-concept goes like this: someone uploads a few PDFs into a vector store, builds a simple semantic search pipeline, and demos it to leadership. "Look, it can answer questions about our procurement policy!" Everyone applauds. Budget gets approved.
Then reality hits. The production system needs to pull from Salesforce, a legacy SQL database, an internal wiki, and six different file shares with overlapping and contradictory versions of the same document. The PDFs that worked great in the demo turn out to represent maybe 3% of the organization's actual knowledge. The other 97% lives in structured databases, transactional systems, and formats that don't neatly convert to text chunks.
This is the RAG gap: the distance between "we can do semantic search on a folder of documents" and "we can give our people AI-powered access to everything they need to make decisions." It's enormous, and closing it is mostly an engineering problem, not an AI problem.
Why Data Connectors Are the Real Moat
When we architect Knowledge Spaces deployments, we spend roughly 60% of our integration time on data connectors: the unglamorous middleware that pulls information from source systems, normalizes it, chunks it appropriately, generates vector embeddings, and keeps everything in sync.
We've built connectors for more than 15 different source systems: Salesforce, PostgreSQL, REST APIs with OAuth flows, file systems, cloud storage. Each one has its own authentication model, rate limits, data schema, and update patterns. And each one requires a different chunking strategy to produce embeddings that actually return relevant results during retrieval.
This is the part that doesn't make it into conference talks. Nobody gives a keynote about spending three weeks tuning chunk sizes for a PostgreSQL connector so that semantic search over transactional records returns meaningful results. But that's the work that separates a demo from a system people actually rely on.
A few hard-won lessons:
Chunk size matters more than model choice. I've seen teams agonize over whether to use GPT-4o or Claude 3.5 while their retrieval pipeline is returning irrelevant context because they're splitting documents at arbitrary 500-token boundaries. For structured data from relational databases, we typically chunk by logical record (one row or one transaction per chunk), with schema metadata preserved. For long-form documents, overlapping chunks of 800-1200 tokens with section-header context prepended tend to outperform naive splitting. The right strategy depends entirely on how your users actually query the data.
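Both strategies above are simple to sketch. In this hypothetical version, token counts are approximated by word counts (production code would use the model's tokenizer), but it shows the two ideas: overlapping document chunks with the section header prepended, and one-row-per-chunk for relational records with schema metadata kept inline.

```python
def chunk_with_headers(sections, size=1000, overlap=200):
    # sections: list of (header, text) pairs. Emits overlapping chunks,
    # each prefixed with its section header so retrieval keeps context.
    chunks = []
    step = size - overlap
    for header, text in sections:
        words = text.split()
        for start in range(0, max(len(words), 1), step):
            body = " ".join(words[start:start + size])
            if body:
                chunks.append(f"[{header}]\n{body}")
            if start + size >= len(words):
                break  # last window already covered the tail
    return chunks

def chunk_rows(rows, table, columns):
    # One logical record per chunk, with table and column names preserved
    # so the embedding carries schema context, not just bare values.
    return [
        f"table={table} " + " ".join(f"{c}={row[c]}" for c in columns)
        for row in rows
    ]
```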
Embeddings are not one-size-fits-all. Different embedding models perform differently depending on the domain and the nature of the queries. We've found that for government and defense use cases, where terminology is highly specific and acronym-dense, general-purpose embedding models underperform unless you prepend definitional context to chunks. Running a small evaluation set before committing to an embedding strategy saves weeks of debugging bad retrieval later.
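An evaluation set doesn't need to be elaborate. A few dozen real queries, each paired with the chunk you know should answer it, plus a recall-at-k scorer, is enough to compare embedding strategies before committing. A minimal sketch (the `retrieve_fn` signature is an assumption, standing in for whatever your pipeline exposes):

```python
def recall_at_k(eval_set, retrieve_fn, k=5):
    # eval_set: list of (query, relevant_chunk_id) pairs.
    # retrieve_fn: takes a query, returns chunk ids ranked by similarity.
    # Returns the fraction of queries whose known-good chunk lands in the
    # top k results -- a cheap proxy for "will users get good context?"
    hits = sum(1 for query, gold in eval_set if gold in retrieve_fn(query)[:k])
    return hits / len(eval_set)
```

Run the same eval set against each candidate embedding model and chunking strategy; the comparison usually settles the argument faster than any benchmark leaderboard.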
Freshness is a first-class concern. Static RAG (upload once, query forever) works for reference documents. It falls apart for operational data. If your sales team is asking the AI about pipeline status and your Salesforce connector last synced three days ago, trust evaporates immediately. We run incremental sync jobs on configurable schedules (hourly for transactional data, daily for documents) and surface last-sync timestamps in the UI so users know what they're working with.
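The incremental-sync loop itself is simple; the source-specific work lives behind two callables. This is a hypothetical skeleton, not our production connector: `fetch_since` and `upsert` stand in for whatever the source system and vector index expose, while the cursor bookkeeping and last-sync timestamp are the parts every connector shares.

```python
from datetime import datetime, timezone

class IncrementalSync:
    # Hypothetical connector skeleton. fetch_since(cursor) returns records
    # modified after the cursor; upsert(record) re-chunks/re-embeds one
    # record into the index.
    def __init__(self, fetch_since, upsert):
        self.fetch_since = fetch_since
        self.upsert = upsert
        self.last_sync = None  # surfaced in the UI so users can judge freshness

    def run(self):
        # First run pulls everything; later runs pull only the delta.
        cursor = self.last_sync or datetime.min.replace(tzinfo=timezone.utc)
        records = self.fetch_since(cursor)
        for record in records:
            self.upsert(record)
        self.last_sync = datetime.now(timezone.utc)
        return len(records)
```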
Security Isn't a Feature; It's the Foundation
Here's where enterprise RAG diverges most sharply from the open-source tutorials.
In a real deployment, especially in government, you can't just dump all your documents into a single vector store and let everyone query everything. That's a data spill waiting to happen. The AI system has to respect the same access controls that govern the source systems.
In Knowledge Spaces, we implement this through a four-tier RBAC hierarchy (Organization Owner, Admin, Contributor, Viewer) that controls not just who can query, but what data each query can retrieve against. When a user asks a question, the retrieval step filters the vector search results by that user's permissions before anything reaches the LLM. The model never sees data the user isn't authorized to access.
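The ordering is the whole point: filter by permissions first, then rank, so unauthorized chunks can never leak into the prompt context. A stripped-down sketch (similarity scores are assumed precomputed; the group-based ACL model is illustrative, not our exact schema):

```python
def retrieve_for_user(query_scores, acl_map, user_groups, k=3):
    # query_scores: {chunk_id: similarity score} for the current query.
    # acl_map: {chunk_id: set of groups allowed to read that chunk}.
    # The permission filter runs BEFORE ranking, so chunks the user
    # cannot read never reach the LLM, no matter how well they match.
    visible = {cid: score for cid, score in query_scores.items()
               if acl_map.get(cid, set()) & user_groups}
    return sorted(visible, key=visible.get, reverse=True)[:k]
```

The tempting shortcut (rank everything, then drop unauthorized results from the response) fails audits, because the restricted content has already influenced the model's answer.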
We also enforce SAML 2.0 SSO and support CAC/PKI authentication for defense clients, because if your AI platform has a separate login from everything else, your security team will (rightly) shut it down.
And then there's audit logging. We capture 64+ event types: every query, every retrieval, every model invocation, every document access. Not because we love logging, but because our government clients need to answer the question: "Who asked what, and what data informed the answer?" If you can't answer that question, you don't have a governed AI system. You have a liability.
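The mechanics don't need to be fancy: append-only, structured, timestamped records are enough to answer the who-asked-what question. A minimal sketch, with field names that are illustrative rather than our actual schema:

```python
import json
import time

def audit_event(event_type, user, detail):
    # Serialize one append-only audit record as a JSON line.
    # event_type: e.g. "query", "retrieval", "model_invocation".
    # detail: event-specific payload (question asked, chunk ids retrieved,
    # model used) -- the "what data informed the answer" half of the story.
    record = {"ts": time.time(), "type": event_type,
              "user": user, "detail": detail}
    return json.dumps(record, sort_keys=True)
```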
The Multi-LLM Reality
One more pattern I want to surface, because I think it's underappreciated: in production, you almost certainly need more than one model.
We currently orchestrate across models from OpenAI, Anthropic, Google, Groq, and xAI: 16+ foundation models with support for tool calling, streaming, and structured JSON output. Different models excel at different tasks. Some are better at precise factual extraction. Others handle nuanced summarization more gracefully. Some are fast and cheap enough for high-volume classification tasks. Others are worth the latency for complex analytical queries.
The point isn't to have options for the sake of options. It's that when you're connecting AI to diverse enterprise data sources, the queries that hit your system are diverse too. A procurement analyst asking "What were the top three cost overruns on Program X last quarter?" needs a different model behavior than a policy researcher asking "How does this draft regulation compare to FAR Part 15?" Routing queries to the right model, with guardrails that catch PII leakage, prompt injection, and off-topic responses regardless of which model is active, is table stakes for production deployment.
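At its simplest, a router is an ordered list of predicates mapped to models, with guardrails that run before any model is chosen. This sketch uses keyword predicates and placeholder model names for illustration; real routers typically classify intent with a small, cheap model instead.

```python
def route(query, guardrail_checks, routes, default="general-model"):
    # guardrail_checks: callables returning a problem description (or None)
    # -- these run regardless of which model would have been picked.
    # routes: ordered list of (predicate, model_name) pairs; first match wins.
    for check in guardrail_checks:
        problem = check(query)
        if problem:
            return ("blocked", problem)
    for predicate, model in routes:
        if predicate(query):
            return ("ok", model)
    return ("ok", default)
```

Because the guardrails sit in front of routing, adding a new model to the roster never weakens the safety posture.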
Start With the Basement
If I could give one piece of advice to a technical leader starting an enterprise AI initiative, it would be this: before you evaluate a single model, before you pick a vector database, before you write a line of prompt engineering, go inventory your data.
Map every system that holds information your people need to make decisions. Understand the access controls on each one. Document the update frequency. Figure out what's structured versus unstructured. Identify which sources overlap, which contradict each other, and which are authoritative.
Then build your RAG architecture around that map. Let the data topology drive the system design, not the other way around.
The organizations that get this right don't just get a better chatbot. They get something much more valuable: a single, governed, intelligent interface to their institutional knowledge. An interface that respects security boundaries, stays current with source systems, and gets smarter as more data flows through it.
The gold has been in the basement all along. You just need to build the stairs.
Jamie Thompson is the Founder and CEO of Sprinklenet AI, where he builds enterprise AI platforms for government and commercial clients. He writes weekly at newsletter.sprinklenet.com.