Hi everyone, I’m the creator of ShenDesk. For several years, I’ve been dedicated to building this secure, stable, and lightweight customer support system designed specifically for on-premises deployment. Seeing my product put into commercial use by real-world users is both deeply rewarding and the primary fuel for my work.
As we move into 2026, I feel this is a turning point. After years of iteration and trial-and-error—especially after last year’s intensive development push (you can see in our changelog that last year's updates exceeded the total of the previous few years combined)—I can now confidently say that ShenDesk has reached high-availability, enterprise-grade maturity. I want to sincerely thank the users who have supported me both professionally and personally throughout this journey.
For 2026, I’ve set a new goal: to document the development process of this system through a series of articles. This is partly to step back from the grind of pure coding and re-engage with the community to sharpen my writing and communication skills, but more importantly, I hope to find like-minded peers through this exchange of ideas.
In this series, I will not use AI to assist my writing, nor will I waste your time with long-winded introductions to basic concepts. Instead, I will focus entirely on the actual problems I encountered and my thought process in solving them. I’ll explain how I weighed different trade-offs and made the tough calls necessary to meet my core objectives: keeping the system secure, stable, reliable, lightweight, and fully deployable on-premises. These posts might be brief and the prose might be unpolished, so I appreciate your understanding.
If you’re interested in the product, feel free to check out the website: https://shendesk.com.
In this post, I will walk you through the architectural selection and thinking process behind the "Knowledge Base"—the core feature of ShenDesk's AI Intelligent Customer Service—and how I brought it to life.
Introduction
In the current wave of AI, no customer service software can afford to ignore it. Adding AI capabilities is no longer an option; it is a necessity. Generally, there are three ways to integrate AI into a customer support system:
- Fully Managed AI Cloud Services: Upload the user's knowledge base to an AI platform and call their APIs to handle visitor inquiries in the chat window.
- Self-Hosted AI Orchestration Platforms (e.g., Dify): Set up a dedicated platform like Dify and connect it to the support system via APIs.

  > Dify is an open-source LLM application development platform designed to simplify the creation and deployment of generative AI apps, providing a user-friendly interface for building production-grade AI applications.
- Fully Built-in AI Capabilities: Implement native vectorization and Retrieval-Augmented Generation (RAG) within the system, alongside direct support for calling open-source models.
Trade-offs and Challenges
Option 1 (Cloud APIs) is the most lightweight and easiest to implement. However, it directly conflicts with the goal of 100% private deployment and 100% data sovereignty (keeping data local). For data-sensitive sectors like government, finance, and insurance, this is a deal-breaker.
Option 2 (Self-hosting Dify) is far too "heavy" for many small teams. Most small businesses lack dedicated IT specialists. Deploying and maintaining such a complex stack is a high barrier to entry, not to mention the requirement for GPU-equipped servers for model inference.
Option 3 (Fully Native AI), while powerful, carries a high development cost and significantly complicates deployment for smaller teams. It typically requires additional database components to support vector searches and, again, necessitates GPU servers for local inference. The barrier to entry remains prohibitively high.
My Situation and Goals
Let’s go back to my core mission: developing a secure, stable, reliable, lightweight, and self-hostable customer service system.
I specifically highlighted "lightweight" because, over years of development, I’ve worked with countless small teams—some with only a handful of people, or even solo founders. Often, they just want to add a simple chat feature to their existing website or app to communicate with potential leads and close deals. When they see complex system requirements and technical jargon, they’re immediately discouraged. What they need is simplicity—pure and simple.
Furthermore, these small teams often operate under tight server and bandwidth budgets. I’ve seen many users deploy their support system directly on their existing web server or on a budget-tier cloud instance (like a 2vCPU / 4GB RAM machine bought during a sale). This represents the vast majority of my user base. To be clear, their limited budget doesn't mean they have a high tolerance for instability or security flaws.
This essentially rules out Option 2 and Option 3. Unless I’m willing to abandon the majority of my users—which I’m not—those paths are non-starters.
That leaves Option 1: using managed AI cloud services.
However, an AI chatbot is more than just a chat window connected to a model. The real goal is to have the AI communicate based on the user's own knowledge base (e.g., answering specific questions like "How do I place an order?" or "What is your delivery window?").
The "knowledge base" is the bridge. If I rely entirely on a managed platform, users are forced to upload all their documents to a public cloud. For many, this is a deal-breaker. In an era where data security is a top priority—especially in government, finance, and insurance—moving sensitive data to the cloud is an immediate "no-go."
This brings us to a critical requirement: The knowledge base must reside 100% within the user's local database.
The most viable compromise is this: When a visitor asks a question, the system first searches the local knowledge base, constructs a prompt with the retrieved information, and then calls an AI model via API. The difference here is that I’m calling a raw model (like Gemini/GPT) rather than hosting the entire knowledge base on a third-party platform.
Managing the Local Knowledge Base
The core of this strategy is straightforward: How do we build and manage a local knowledge base?
While a vector database is technically the superior choice, I have a non-negotiable constraint—the on-premises deployment must remain lightweight and cannot increase the user's operational burden.
Consequently, my initial architectural concept became: Local Database + Full-Text Search + Top-N Retrieval + Prompt Engineering.
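To make the concept concrete, here is a minimal sketch of that pipeline in Java. All class and field names are hypothetical, and a naive term-overlap scorer stands in for the real full-text engine; the point is the shape of the flow, not the scoring:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of: local search -> Top-N retrieval -> prompt construction.
// A naive term-overlap score stands in for a real full-text engine.
public class KnowledgeRetriever {
    public record Doc(String title, String body) {}

    private final List<Doc> docs = new ArrayList<>();

    public void add(Doc doc) { docs.add(doc); }

    // Score a document by counting occurrences of query terms in its body.
    private static int score(Doc doc, Set<String> terms) {
        int hits = 0;
        for (String word : doc.body().toLowerCase(Locale.ROOT).split("\\W+")) {
            if (terms.contains(word)) hits++;
        }
        return hits;
    }

    // Return the top-N matching documents for a visitor question.
    public List<Doc> topN(String question, int n) {
        Set<String> terms = new HashSet<>(
                Arrays.asList(question.toLowerCase(Locale.ROOT).split("\\W+")));
        return docs.stream()
                .filter(d -> score(d, terms) > 0)
                .sorted(Comparator.comparingInt((Doc d) -> score(d, terms)).reversed())
                .limit(n)
                .toList();
    }

    // Build the prompt sent to the remote model. Only this constructed
    // prompt leaves the server; the knowledge base itself stays local.
    public String buildPrompt(String question, int n) {
        StringBuilder sb = new StringBuilder("Answer using only the context below.\n\n");
        for (Doc d : topN(question, n)) {
            sb.append("## ").append(d.title()).append('\n')
              .append(d.body()).append("\n\n");
        }
        sb.append("Visitor question: ").append(question);
        return sb.toString();
    }
}
```

In the real system, `topN` is where the full-text engine plugs in; everything else stays the same.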
There are several mature solutions for full-text indexing:
- Elasticsearch: The de facto industry standard. Its distributed architecture natively supports massive clusters, sharding, and replication.
- OpenSearch: The open-source fork of Elasticsearch created by AWS following Elastic's licensing changes.
- Solr: A veteran choice with strengths in precise word segmentation and traditional text retrieval, though its distributed scalability lags behind ES. It’s rarely a first choice for new projects today.
- Others: Meilisearch (written in Rust), Typesense (written in C++), etc.
When I worked on large-scale corporate projects, we would default to Elasticsearch. Back then, we had big clients, multi-million dollar contracts, expansive server environments, and mature DevOps teams.
Now, I have none of those luxuries. The small teams I serve simply cannot afford to deploy a "heavy weapon" like Elasticsearch for their on-premises setup.
How do I resolve this dilemma? Looking at my current user base, their AI knowledge bases have a distinct characteristic: they are relatively small. If I told them my solution could query tens of thousands of documents across dozens of gigabytes in milliseconds, they would feel like I’m solving a problem they don't have.
Most small teams have a knowledge base consisting of only dozens of documents; reaching the hundreds is already rare. Let’s summarize the current constraints and requirements:
- 100% Data Sovereignty: Use a Local DB + Full-Text Search + Top-N Retrieval + Prompt Construction workflow, sending the final prompt to AI models like Gemini/GPT.
- Extreme Portability: It must be lightweight with zero additional deployment overhead. Being able to index and retrieve a few hundred documents is more than enough.
Ultimately, only one mature choice remained: Lucene.
Lucene requires no standalone service installation. It is embedded as a library directly in the main server application, with no external runtime dependencies. From the user's perspective during deployment, its presence is completely invisible.
In reality, Lucene is incredibly powerful. For collections ranging from 1 million to 10 million documents, query latency typically stays in the 10–50 ms range. Using it to manage dozens or hundreds of documents for small-scale users is, quite frankly, effortless.
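To give a sense of how little is involved, here is a minimal embedded-Lucene sketch (assuming a recent Lucene 9.x on the classpath; the field name and sample documents are my own, not ShenDesk's actual code). It indexes a couple of articles in memory and retrieves the Top-3 matches for a question:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class EmbeddedSearch {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory dir = new ByteBuffersDirectory(); // in-memory; use FSDirectory to persist

        // Index a couple of knowledge-base articles.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            for (String body : new String[] {
                    "To place an order, click the cart icon and follow checkout.",
                    "Our delivery window is 3 to 5 business days." }) {
                Document doc = new Document();
                doc.add(new TextField("body", body, Field.Store.YES));
                writer.addDocument(doc);
            }
        }

        // Retrieve the Top-3 matches for a visitor question.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs top = searcher.search(
                    new QueryParser("body", analyzer).parse("place an order"), 3);
            for (ScoreDoc sd : top.scoreDocs) {
                System.out.println(searcher.storedFields().document(sd.doc).get("body"));
            }
        }
    }
}
```

The retrieved bodies then feed straight into the prompt sent to the remote model.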
Conclusion
There is a wealth of excellent documentation and tutorials on how to use Lucene, so I won't rehash the basics here.
As I mentioned at the beginning, the goal of this series is to focus on the actual problems I encountered and the thought process behind solving them. I want to share how I weigh trade-offs and make the decisions necessary to achieve my core mission: building a secure, stable, reliable, and lightweight customer support system that truly belongs to the user.
My hope is that these articles serve as a candid chronicle of an independent developer’s journey. Thank you for your interest and support! :)