Hi everyone, I’m the creator of ShenDesk. For several years, I’ve been dedicated to building this secure, stable, and lightweight customer support system designed specifically for on-premises deployment. Seeing my product put into commercial use by real-world users is both deeply rewarding and the primary fuel for my work.
As we move into 2026, I feel this is a turning point. After years of iteration and trial-and-error—especially after last year’s intensive development push (you can see in our changelog that last year's updates exceeded the total of the previous few years combined)—I can now confidently say that ShenDesk has reached high-availability, enterprise-grade maturity. I want to sincerely thank the users who have supported me both professionally and personally throughout this journey.
For 2026, I’ve set a new goal: to document the development process of this system through a series of articles. This is partly to step back from the grind of pure coding and re-engage with the community to sharpen my writing and communication skills, but more importantly, I hope to find like-minded peers through this exchange of ideas.
In this series, I will not use AI to assist my writing, nor will I waste your time with long-winded introductions to basic concepts. Instead, I will focus entirely on the actual problems I encountered and my thought process in solving them. I’ll explain how I weighed different trade-offs and made the tough calls necessary to meet my core objectives: keeping the system secure, stable, reliable, lightweight, and fully deployable on-premises. These posts might be brief and the prose might be unpolished, so I appreciate your understanding.
If you’re interested in the product, feel free to check out the website: https://shendesk.com.
In this post, I will walk you through the architectural selection and thinking process behind the "Knowledge Base"—the core feature of ShenDesk's AI Intelligent Customer Service—and how I brought it to life.
Introduction
In the current wave of AI, no customer service software can afford to ignore it. Adding AI capabilities is no longer an option; it is a necessity. Generally, there are three ways to integrate AI into a customer support system:
- Fully Managed AI Cloud Services: Upload the user's knowledge base to an AI platform and call their APIs to handle visitor inquiries in the chat window.
- Self-Hosted AI Orchestration Platforms (e.g., Dify): Set up a dedicated platform like Dify and connect it to the support system via APIs.

  > Dify is an open-source LLM application development platform designed to simplify the creation and deployment of generative AI apps, providing a user-friendly interface for building production-grade AI applications.
- Fully Built-in AI Capabilities: Implement native vectorization and Retrieval-Augmented Generation (RAG) within the system, alongside direct support for calling open-source models.
Trade-offs and Challenges
Option 1 (Cloud APIs) is the most lightweight and easiest to implement. However, it directly conflicts with the goal of 100% private deployment and 100% data sovereignty (keeping data local). For data-sensitive sectors like government, finance, and insurance, this is a deal-breaker.
Option 2 (Self-hosting Dify) is far too "heavy" for many small teams. Most small businesses lack dedicated IT specialists. Deploying and maintaining such a complex stack is a high barrier to entry, not to mention the requirement for GPU-equipped servers for model inference.
Option 3 (Fully Native AI), while powerful, carries a high development cost and significantly complicates deployment for smaller teams. It typically requires additional database components to support vector searches and, again, necessitates GPU servers for local inference. The barrier to entry remains prohibitively high.
My Situation and Goals
Let’s go back to my core mission: developing a secure, stable, reliable, lightweight, and self-hostable customer service system.
I specifically highlighted "lightweight" because, over years of development, I’ve worked with countless small teams—some with only a handful of people, or even solo founders. Often, they just want to add a simple chat feature to their existing website or app to communicate with potential leads and close deals. When they see complex system requirements and technical jargon, they’re immediately discouraged. What they need is simplicity—pure and simple.
Furthermore, these small teams often operate under tight server and bandwidth budgets. I’ve seen many users deploy their support system directly on their existing web server or on a budget-tier cloud instance (like a 2vCPU / 4GB RAM machine bought during a sale). This represents the vast majority of my user base. To be clear, their limited budget doesn't mean they have a high tolerance for instability or security flaws.
This essentially rules out Option 2 and Option 3. Unless I’m willing to abandon the majority of my users—which I’m not—those paths are non-starters.
That leaves Option 1: using managed AI cloud services.
However, an AI chatbot is more than just a chat window connected to a model. The real goal is to have the AI communicate based on the user's own knowledge base (e.g., answering specific questions like "How do I place an order?" or "What is your delivery window?").
The "knowledge base" is the bridge. If I rely entirely on a managed platform, users are forced to upload all their documents to a public cloud. For many, this is a deal-breaker. In an era where data security is a top priority—especially in government, finance, and insurance—moving sensitive data to the cloud is an immediate "no-go."
This brings us to a critical requirement: The knowledge base must reside 100% within the user's local database.
The most viable compromise is this: When a visitor asks a question, the system first searches the local knowledge base, constructs a prompt with the retrieved information, and then calls an AI model via API. The difference here is that I’m calling a raw model (like Gemini/GPT) rather than hosting the entire knowledge base on a third-party platform.
Managing the Local Knowledge Base
The core of this strategy is straightforward: How do we build and manage a local knowledge base?
While a vector database is technically the superior choice, I have a non-negotiable constraint—the on-premises deployment must remain lightweight and cannot increase the user's operational burden.
Consequently, my initial architectural concept became: Local Database + Full-Text Search + Top-N Retrieval + Prompt Engineering.
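To make the concept concrete, here is a minimal sketch of that pipeline in Java. All class and field names are hypothetical, and a naive term-overlap scorer stands in for the real full-text engine; the point is the shape of the flow, not the scoring:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of: local search -> Top-N retrieval -> prompt construction.
// A naive term-overlap score stands in for a real full-text engine.
public class KnowledgeRetriever {
    public record Doc(String title, String body) {}

    private final List<Doc> docs = new ArrayList<>();

    public void add(Doc doc) { docs.add(doc); }

    // Score a document by counting occurrences of query terms in its body.
    private static int score(Doc doc, Set<String> terms) {
        int hits = 0;
        for (String word : doc.body().toLowerCase(Locale.ROOT).split("\\W+")) {
            if (terms.contains(word)) hits++;
        }
        return hits;
    }

    // Return the top-N matching documents for a visitor question.
    public List<Doc> topN(String question, int n) {
        Set<String> terms = new HashSet<>(
                Arrays.asList(question.toLowerCase(Locale.ROOT).split("\\W+")));
        return docs.stream()
                .filter(d -> score(d, terms) > 0)
                .sorted(Comparator.comparingInt((Doc d) -> score(d, terms)).reversed())
                .limit(n)
                .toList();
    }

    // Build the prompt sent to the remote model. Only this constructed
    // prompt leaves the server; the knowledge base itself stays local.
    public String buildPrompt(String question, int n) {
        StringBuilder sb = new StringBuilder("Answer using only the context below.\n\n");
        for (Doc d : topN(question, n)) {
            sb.append("## ").append(d.title()).append('\n')
              .append(d.body()).append("\n\n");
        }
        sb.append("Visitor question: ").append(question);
        return sb.toString();
    }
}
```

In the real system, `topN` is where the full-text engine plugs in; everything else stays the same.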
There are several mature solutions for full-text indexing:
- Elasticsearch: The de facto industry standard. Its distributed architecture natively supports massive clusters, sharding, and replication.
- OpenSearch: The open-source fork of Elasticsearch created by AWS following Elastic's licensing changes.
- Solr: A veteran choice with strengths in precise word segmentation and traditional text retrieval, though its distributed scalability lags behind ES. It’s rarely a first choice for new projects today.
- Others: Meilisearch (written in Rust), Typesense (written in C++), etc.
When I worked on large-scale corporate projects, we would default to Elasticsearch. Back then, we had big clients, multi-million dollar contracts, expansive server environments, and mature DevOps teams.
Now, I have none of those luxuries. The small teams I serve simply cannot afford to deploy a "heavy weapon" like Elasticsearch for their on-premises setup.
How do I resolve this dilemma? Looking at my current user base, their AI knowledge bases have a distinct characteristic: they are relatively small. If I told them my solution could query tens of thousands of documents across dozens of gigabytes in milliseconds, they would feel like I’m solving a problem they don't have.
Most small teams have a knowledge base consisting of only dozens of documents; reaching the hundreds is already rare. Let’s summarize the current constraints and requirements:
- 100% Data Sovereignty: Use a Local DB + Full-Text Search + Top-N Retrieval + Prompt Construction workflow, sending the final prompt to AI models like Gemini/GPT.
- Extreme Portability: It must be lightweight with zero additional deployment overhead. Being able to index and retrieve a few hundred documents is more than enough.
Ultimately, only one mature choice remained: Lucene.
Lucene requires no standalone service installation. It is embedded as a library directly in the main server application, with no external runtime dependencies. From the user's perspective during deployment, its presence is completely invisible.
In reality, Lucene is incredibly powerful. For collections ranging from 1 million to 10 million documents, query latency typically stays in the 10–50 ms range. Using it to manage dozens or hundreds of documents for small-scale users is, quite frankly, effortless.
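To give a sense of how little is involved, here is a minimal embedded-Lucene sketch (assuming a recent Lucene 9.x on the classpath; the field name and sample documents are my own, not ShenDesk's actual code). It indexes a couple of articles in memory and retrieves the Top-3 matches for a question:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class EmbeddedSearch {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory dir = new ByteBuffersDirectory(); // in-memory; use FSDirectory to persist

        // Index a couple of knowledge-base articles.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            for (String body : new String[] {
                    "To place an order, click the cart icon and follow checkout.",
                    "Our delivery window is 3 to 5 business days." }) {
                Document doc = new Document();
                doc.add(new TextField("body", body, Field.Store.YES));
                writer.addDocument(doc);
            }
        }

        // Retrieve the Top-3 matches for a visitor question.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs top = searcher.search(
                    new QueryParser("body", analyzer).parse("place an order"), 3);
            for (ScoreDoc sd : top.scoreDocs) {
                System.out.println(searcher.storedFields().document(sd.doc).get("body"));
            }
        }
    }
}
```

The retrieved bodies then feed straight into the prompt sent to the remote model.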
Conclusion
There is a wealth of excellent documentation and tutorials on how to use Lucene, so I won't rehash the basics here.
As I mentioned at the beginning, the goal of this series is to focus on the actual problems I encountered and the thought process behind solving them. I want to share how I weigh trade-offs and make the decisions necessary to achieve my core mission: building a secure, stable, reliable, and lightweight customer support system that truly belongs to the user.
My hope is that these articles serve as a candid chronicle of an independent developer’s journey. Thank you for your interest and support! :)