Most people think a customer support system is just a chat box.
It isn’t.
Behind that small floating widget lies a real-time messaging core that must deliver messages with low latency under unpredictable traffic. There is concurrency control to manage thousands — sometimes hundreds of thousands — of simultaneous connections. There is memory management under pressure, where inefficient allocation patterns can quietly introduce latency spikes. There is multi-tenant isolation to ensure that one customer’s workload never impacts another’s stability. There are deployment strategies that differ fundamentally between SaaS and on-premises environments. There are reliability guarantees that determine whether a message is delivered once, at least once, or lost under edge conditions. And beneath all of that, there are long-term architectural trade-offs that shape how the system evolves over years, not weeks.
What appears simple on the surface is, in reality, a layered system balancing performance, isolation, reliability, and maintainability. The interface may look minimal, but the engineering beneath it is anything but.
Over the past few years, I built a real-time customer support system called ShenDesk. It supports both SaaS and on-premises deployment, and it is designed around high concurrency, maintainability, and architectural clarity. What started as a minimal prototype gradually evolved into a production-grade system with a real messaging pipeline, tenant isolation, visitor tracking, extensible APIs, and a structure designed to adapt rather than accumulate accidental complexity.
This series is a technical record of that evolution — the decisions, the constraints, the revisions, and the lessons learned while building and refining a real-time infrastructure system from the ground up.
Why Write This Series?
Engineering transparency builds trust — especially in infrastructure-level software.
There are many products in the customer support space, including established platforms such as Zendesk and Intercom. They are mature systems, backed by significant resources, and refined through years of iteration. From the outside, they present clean interfaces and well-designed user experiences.
What often remains invisible, however, is the engineering reasoning that makes such systems viable at scale. The UI may appear straightforward, but the infrastructure beneath it must operate continuously under load, tolerate bursts of traffic, isolate tenants reliably, and evolve without collapsing under its own complexity. Those design choices — the trade-offs, constraints, and revisions — rarely surface in public discussions.
How do you design a real-time messaging core in .NET that remains stable under sustained concurrency? How do you manage six-figure WebSocket connection counts without introducing latency spikes? How do you reduce GC pressure in a system where allocation patterns directly influence responsiveness? What does multi-tenant isolation look like beyond theoretical diagrams? And what architectural compromises emerge when enterprise clients require on-premises deployment instead of pure SaaS delivery?
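To make the GC-pressure question concrete, here is a hedged sketch (illustrative only, not ShenDesk's actual code; the `FrameCodec` name and shape are invented for this example): renting buffers from the shared `ArrayPool` instead of allocating a fresh `byte[]` per message keeps short-lived allocations out of the garbage collector's way, which matters when allocation spikes translate into pause-induced latency.

```csharp
using System;
using System.Buffers;
using System.Text;

// Sketch: encode an outbound message into a pooled buffer.
// Renting and returning avoids a per-message heap allocation,
// one of the allocation patterns that quietly creates GC pressure
// in a high-throughput messaging path.
static class FrameCodec
{
    public static void Send(string message, Action<ReadOnlyMemory<byte>> write)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(
            Encoding.UTF8.GetMaxByteCount(message.Length));
        try
        {
            int written = Encoding.UTF8.GetBytes(message, buffer);
            write(buffer.AsMemory(0, written));
        }
        finally
        {
            // Always return the buffer, even if the write callback throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

The trade-off is lifetime discipline: a pooled buffer must not escape the scope in which it was rented, which is exactly the kind of constraint that shapes the design of a messaging pipeline.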
These questions are not answered by feature lists. They are answered through architectural thinking shaped by constraints.
This series will explore those questions from an engineering perspective. It will examine decisions that seemed reasonable at the time but later required revision. It will discuss trade-offs between simplicity and scalability, between abstraction and performance, and between immediate delivery and long-term maintainability.
It is not a marketing campaign, and it is not a step-by-step tutorial that hands out production-ready code. Instead, it is a structured exploration of architectural decisions, constraints encountered in practice, and lessons learned while building and evolving a real-time system over time.
What This Series Will Cover
This series will follow the system’s evolution from its earliest assumptions to its more mature architectural form. It will begin with the fundamental question of why building a customer support system from scratch made sense at all — and what constraints justified that decision. From there, it will move through the transition from a minimal prototype to a production-grade architecture, examining how early simplicity gradually gave way to structural rigor.
The discussions will cover the design of a real-time messaging core in .NET and the practical realities of handling high-concurrency WebSocket connections under sustained load. They will explore memory allocation patterns and GC behavior in latency-sensitive environments, not as theoretical topics but as forces that directly influence user experience. Stability mechanisms such as backpressure, queue management, and failure isolation will be examined in the context of preventing cascading breakdowns.
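As a hedged illustration of the backpressure and queue-management ideas mentioned above (a sketch under assumed names, not ShenDesk's implementation), a bounded channel in .NET can cap a per-connection outbound queue and apply an explicit policy when a slow consumer falls behind:

```csharp
using System.Collections.Generic;
using System.Threading.Channels;

// Sketch: a bounded per-connection outbound queue. When the
// consumer (e.g. the WebSocket writer loop) falls behind, the
// bounded channel enforces a policy instead of letting the queue
// grow without limit and starve the rest of the process.
class OutboundQueue
{
    private readonly Channel<string> _channel;

    public OutboundQueue(int capacity)
    {
        _channel = Channel.CreateBounded<string>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,
            // Drop the oldest message rather than block the producer;
            // Wait or DropWrite are alternative policies that trade
            // producer latency against message loss.
            FullMode = BoundedChannelFullMode.DropOldest
        });
    }

    public bool TryEnqueue(string message) => _channel.Writer.TryWrite(message);

    public IAsyncEnumerable<string> ReadAllAsync() => _channel.Reader.ReadAllAsync();
}
```

The interesting decision is not the channel itself but the `FullMode`: choosing what happens at the limit is precisely the kind of failure-isolation trade-off later articles will examine.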
As the system expanded, architectural concerns shifted toward multi-tenant isolation, deployment flexibility, and reliability guarantees. Supporting both SaaS and on-premises environments introduced different operational constraints and forced explicit decisions about configurability, automation, and environmental assumptions. Messaging reliability models — and the compromises between theoretical guarantees and operational practicality — became central considerations.
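One way to picture logical tenant isolation, as a deliberately simplified sketch (the types and names here are invented for illustration, not taken from ShenDesk): key every lookup by a tenant identifier so that cross-tenant access cannot even be expressed by accident.

```csharp
using System.Collections.Concurrent;

// Sketch: connections are partitioned per tenant, and the registry
// exposes no API that reaches across tenants. Isolation becomes a
// structural property of the interface rather than a runtime check.
readonly record struct TenantId(string Value);

class ConnectionRegistry
{
    private readonly ConcurrentDictionary<TenantId,
        ConcurrentDictionary<string, object>> _byTenant = new();

    public void Register(TenantId tenant, string connectionId, object connection) =>
        _byTenant.GetOrAdd(tenant, _ => new ConcurrentDictionary<string, object>())
                 .TryAdd(connectionId, connection);

    public bool TryGet(TenantId tenant, string connectionId, out object? connection)
    {
        connection = null;
        return _byTenant.TryGetValue(tenant, out var conns)
            && conns.TryGetValue(connectionId, out connection);
    }
}
```

Physical isolation (separate processes, databases, or deployments per tenant) makes stronger guarantees at higher operational cost; the logical-versus-physical trade-off gets its own article in the list below.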
Later articles will reflect on rewriting core components when early design decisions proved insufficient, designing extensible APIs without over-engineering abstractions, and integrating AI capabilities into a latency-sensitive, real-time pipeline without destabilizing the underlying system.
Each article will focus on engineering reasoning rather than surface-level features. Where appropriate, I will discuss trade-offs, rejected alternatives, and architectural compromises — because real systems are shaped more by constraints than by ideals. Architectural diagrams may suggest clarity, but clarity often emerges only after revision.
This series is not organized around what a product can do. It is organized around how a system survives growth, load, and time.
The planned topics include:
- Why Build a Customer Support System from Scratch?
- From MVP to Production Architecture
- Designing a Real-Time Messaging Core in .NET
- Handling High-Concurrency WebSocket Connections
- Memory Optimization and GC Pressure in Real-Time Systems
- Backpressure and Stability in Messaging Pipelines
- Multi-Tenant Architecture: Logical vs Physical Isolation
- Designing for On-Premises Deployment
- Message Reliability and Delivery Guarantees
- Lessons from Rewriting Core Components
- Open API Design for Extensibility
- Integrating AI into a Real-Time Support System
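To preview the reliability topic in concrete terms, here is a hedged sketch of an at-least-once delivery pattern (illustrative only; `AtLeastOnceSender` and its shape are invented for this example): the sender retains each message until the client acknowledges its ID, and redelivers anything still pending after a reconnect. Duplicates are possible under this model, so receivers must deduplicate by message ID.

```csharp
using System;
using System.Collections.Concurrent;

// Sketch: at-least-once delivery. A message stays in the pending
// set until acknowledged; unacknowledged messages are resent on
// reconnect. The cost of "never lost" is "possibly duplicated".
class AtLeastOnceSender
{
    private readonly ConcurrentDictionary<Guid, string> _pending = new();

    public Guid Send(string payload, Action<Guid, string> transmit)
    {
        var id = Guid.NewGuid();
        _pending[id] = payload;   // retain until acknowledged
        transmit(id, payload);
        return id;
    }

    public bool Acknowledge(Guid id) => _pending.TryRemove(id, out _);

    // On reconnect, resend everything not yet acknowledged.
    public void Redeliver(Action<Guid, string> transmit)
    {
        foreach (var (id, payload) in _pending)
            transmit(id, payload);
    }
}
```

At-most-once inverts the compromise (no duplicates, but possible loss), and exactly-once requires coordination that is rarely worth its operational cost in a chat pipeline; the dedicated article will weigh these options.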
Scope and Boundaries
This series will focus on architecture, design thinking, and engineering lessons drawn from building and evolving a real-time system in production. The emphasis will be on reasoning: why certain decisions were made, what constraints shaped them, and how those decisions held up over time.
It will not disclose sensitive security details, customer data, deployment-specific configurations, or internal proprietary implementation mechanisms. Real systems operate within operational contexts that cannot — and should not — be exposed publicly. Stability and security are not theoretical concerns; they are practical responsibilities.
Where examples are provided, they will be abstracted to illustrate patterns rather than reveal internal structure. Benchmarks may be discussed in directional terms rather than exact production numbers. Architectural diagrams, if included, will represent conceptual models rather than literal infrastructure layouts.
The intention is to share thinking, not to expose operational risk. Engineering transparency does not require publishing every line of code or every deployment detail. It requires clarity about the decisions that shape a system and honesty about the trade-offs that accompany them.
Boundaries are not limitations of the discussion; they are part of responsible engineering practice.
Who This Series Is For
This series may be useful if you are building real-time systems where latency, concurrency, and reliability are not abstract concerns but daily engineering constraints. It may resonate with you if you are designing SaaS infrastructure and finding that architectural decisions rarely remain isolated — they ripple outward into deployment models, tenant isolation, operational complexity, and long-term maintainability.
If you are exploring multi-tenant architecture beyond theoretical diagrams, or working with WebSocket-heavy workloads where connection lifecycle management and memory behavior directly affect system stability, the topics discussed here may align with the problems you are facing. Likewise, if you care about designing systems that remain understandable and adaptable after years of iteration — not just weeks of development — this series is written with that perspective in mind.
This is not content optimized for speed of consumption. It assumes a willingness to think in terms of trade-offs rather than absolutes, and in terms of evolution rather than quick solutions. The focus is not on feature comparison, but on structural reasoning. It is not about building something that works once, but about building something that continues to work as complexity grows.
If you are simply looking for a quick “how to build a chat app in 10 minutes” guide, this will not be that. There are many excellent resources for rapid prototyping. This series is concerned with what happens after the prototype begins to encounter real traffic, real constraints, and real operational responsibility.
A Long-Term Engineering Record
Software systems evolve.
They accumulate complexity as features are added and edge cases surface. They encounter unexpected bottlenecks that were invisible in early designs. They force you to revisit assumptions that once seemed entirely reasonable. What appears clean in an architectural diagram often becomes layered and negotiated in production reality.
No system remains identical to its first version. Concurrency models are adjusted. Data structures are replaced. Message pipelines are rewritten. Deployment workflows are simplified — and sometimes complicated again — as new requirements emerge. Over time, the challenge is no longer just making the system work, but ensuring that it continues to work without collapsing under accumulated decisions.
This series is an attempt to document that journey — not as a polished success story, but as an ongoing engineering process. It will include revisions, structural corrections, and moments where initial confidence gave way to deeper understanding. Stability is rarely achieved through a single correct decision; it is usually the result of iteration, constraint, and refinement.
In infrastructure-level SaaS systems, evolution is not optional. Real traffic exposes hidden weaknesses. Real customers introduce requirements that reshape architecture. Real operational responsibility changes how trade-offs are evaluated. Over time, engineering becomes less about adding capabilities and more about preserving clarity while adapting to change.
If you are interested in real-time architecture, distributed design trade-offs, or the realities of building and maintaining systems that must operate continuously, you may find value in following this record.
Closing Notes
If you find these discussions useful, I welcome your thoughts, corrections, and alternative perspectives. Engineering benefits from dialogue, and thoughtful disagreement often sharpens architectural clarity. Thank you in advance to those who choose to follow this series and engage with it seriously.
ShenDesk is not a conceptual project. It is a real product, used in production environments by real users. The ideas discussed here are grounded in practical implementation rather than theoretical modeling. If you are curious about the product itself, you can find more information on the ShenDesk website.
Conversations around system design are rarely complete within a single article. If you are building similar infrastructure, facing comparable trade-offs, or simply interested in discussing real-time architecture in depth, I would be glad to connect and exchange ideas.
Engineering is rarely a solitary endeavor — even when the system is built by a single developer.


Top comments (5)
Wow! You’re managing such a complex system and incorporating many concepts into it. You really have great skills. Good luck on your journey! 😄
Thank you so much for your kind words!
ShenDesk has been a long-term engineering journey for me. Building a real-time system from scratch forced me to think deeply about concurrency, memory management, multi-tenant architecture, and production stability — not just features.
I’m glad the technical depth resonated with you. I’ll continue sharing more details about the architecture and trade-offs behind it. If there’s any specific area you’d like me to dive deeper into, I’d love to hear your thoughts.
Thanks again for the encouragement!
Thank you! Looking forward to your upcoming posts. 😀
This is the kind of architecture writeup that rarely gets published — the prologue where you explain why the constraints exist before showing how you solved them.
The multi-tenancy isolation point is particularly interesting. That's exactly the kind of requirement that AI coding tools struggle with. An AI can generate a perfectly functional messaging system, but getting the tenant isolation right requires understanding cross-cutting concerns that span the entire system. AI tends to optimize locally — it writes great code for one module but misses the global invariants.
I'd be curious to see how you approach AI-assisted development on a system like this, if at all. My experience with similarly complex architectures is that AI works best when the module boundaries are crystal clear and the interfaces are explicit. The more implicit global state the system has, the more likely AI is to introduce subtle cross-tenant bugs.
Looking forward to the next parts of the series.
Thank you for such a thoughtful comment — I completely agree with your perspective on AI-assisted development.
Your point about AI optimizing locally while missing global invariants is exactly my experience. In a system like ShenDesk, especially with strict multi-tenancy isolation, the real complexity lives in cross-cutting constraints and architectural boundaries — not in individual functions or modules. That kind of global reasoning still requires human control.
I have started using AI in my workflow, but in a very constrained way. I design and control the overall architecture, tenant boundaries, core abstractions, and main execution flows myself. AI only assists in scenarios where both the input and output are extremely explicit — typically well-defined, self-contained code snippets. In those cases, it significantly improves efficiency, and the correctness is easy to verify because the boundaries are clear and testable.
In other words, I treat AI as a precision tool, not an architect.
Thank you again for the encouragement and for engaging so deeply with the ideas. I truly appreciate the support, and I’ll continue writing this series with the same level of transparency about the engineering decisions behind the system. 😊