Building distributed systems is hard. Understanding their behavior under load is
even harder. What if you could design, configure, and stress-test a distributed
system architecture without spinning up a single server? That's exactly what I
built—a fully interactive distributed system simulator that runs entirely in
your browser.
The Problem: Learning System Design is Abstract
When preparing for system design interviews or architecting real-world systems,
we often sketch boxes and arrows on whiteboards. "Put a load balancer here, add
a cache there, maybe throw in some message queues for async processing." But how
do you really know if your design will handle 10,000 requests per second? What
happens when a database replica fails? Where will the bottlenecks appear?
Traditional approaches rely on one of the following:
- Production experience: Learn by breaking things in prod (not recommended)
- Complex local setups: Docker compose files with dozens of services
- Mental simulation: "I think this should work..."
I wanted something better—a visual, interactive playground where you can build
realistic distributed systems and watch them respond to load in real-time.
The Solution: A Browser-Based Simulation Engine
I built a System Design Simulator using React, TypeScript, and React Flow.
It lets you:
- Drag-and-drop components (load balancers, databases, caches, message queues, etc.)
- Configure their capacity, latency, failure rates, and replication
- Connect them with edges that define read/write/async traffic flow
- Run realistic simulations and observe metrics, bottlenecks, and costs
Architecture: Simulating Reality
Component Library (25+ Components)
The simulator includes authentic distributed system components, each with
realistic behavior.
The Simulation Engine: How It Works
The heart of the system is a tick-based simulation engine that processes
requests through your architecture 10 times per second.
Request Flow Processing
Here's what happens each tick (100ms):
// Note: currentTime, the request counters, latencies, and costs are engine
// state declared outside this excerpt.
function simulationTick(nodes, edges, tickDuration = 0.1) {
  // 1. Get all client nodes (traffic sources)
  const clients = nodes.filter((n) => n.data.componentType === "client");
  // 2. Build connection graph from edges
  const graph = buildConnectionGraph(edges);
  // 3. Check circuit breaker recovery times
  checkCircuitBreakerRecovery(nodes, currentTime);
  // 4. For each client, generate requests
  for (const client of clients) {
    // Scale by tickDuration (0.1 s) instead of a hard-coded divisor of 10
    const requestsPerTick = client.data.rps * tickDuration;
    for (let i = 0; i < requestsPerTick; i++) {
      // Determine request type (read/write/async)
      const type = determineRequestType(client.data);
      // Process request through the system
      const result = processRequest(client, type, graph, nodes);
      // Track metrics
      if (result.success) {
        successfulRequests++;
        latencies.push(result.latency);
      } else {
        failedRequests++;
      }
    }
  }
  // 5. Calculate bottlenecks, costs, and health states
  return aggregateMetrics(nodes, latencies, costs);
}
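One detail worth calling out: the per-tick request count is a fraction of the client's RPS, so it is not always a whole number. At 15 RPS and a 100ms tick the loop bound is 1.5, which means the `for` loop runs twice every tick and the client effectively sends 20 RPS. Below is a minimal sketch of one way to handle that, assuming a hypothetical `createRequestPacer` helper (not part of the simulator) that carries the fractional remainder into the next tick:

```typescript
// Hypothetical helper (not from the simulator): converts a requests-per-second
// rate into a whole number of requests per tick, carrying the fractional
// remainder into the next tick so the long-run rate stays correct.
function createRequestPacer(tickDuration: number = 0.1): (rps: number) => number {
  let carry = 0;
  return (rps: number): number => {
    const exact = rps * tickDuration + carry;
    // Small epsilon so float drift (e.g. 1.9999999999) does not drop a request
    const whole = Math.floor(exact + 1e-9);
    carry = exact - whole;
    return whole;
  };
}

// Ten 100 ms ticks at 15 RPS: the pacer alternates between 1 and 2 requests.
const nextRequestCount = createRequestPacer(0.1);
let emitted = 0;
for (let tick = 0; tick < 10; tick++) {
  emitted += nextRequestCount(15);
}
console.log(emitted);
```

Over one simulated second the total comes out to 15 requests rather than 20, so throughput and cost figures stay aligned with the configured rate.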
Component Processing
Each component type has specialized logic. For example, a cache implements
realistic caching behavior:
// Cache hit/miss logic
if (requestType === "read") {
  // Use ?? so a configured hit rate of 0 is respected (|| would fall back to 0.8)
  const hitRate = cacheData.hitRate ?? 0.8;
  if (Math.random() < hitRate) {
    // Cache hit - return immediately with cache latency
    return { success: true, latency: cacheData.latency };
  }
  // Cache miss - continue to backend
}
// Write modes
if (requestType === "write") {
  if (cacheMode === "write-through") {
    // Synchronous write to cache AND backend
    totalLatency += cacheLatency + backendLatency;
  } else if (cacheMode === "write-behind") {
    // Async write to backend
    totalLatency += cacheLatency;
    queueBackgroundWrite();
  } else if (cacheMode === "cache-aside") {
    // Just invalidate cache, let app write to backend
    totalLatency += backendLatency;
  }
}
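The hit/miss branch above implies a simple expected-latency formula for reads. Here is a rough sketch, assuming a miss pays for the cache lookup plus the backend call (the function and parameter names are illustrative, not taken from the simulator):

```typescript
// Illustrative only: assumes a miss costs the cache lookup plus the backend call.
function expectedReadLatency(
  hitRate: number,
  cacheLatencyMs: number,
  backendLatencyMs: number
): number {
  const missRate = 1 - hitRate;
  return hitRate * cacheLatencyMs + missRate * (cacheLatencyMs + backendLatencyMs);
}

// 75% hit rate, 4 ms cache, 40 ms backend: 0.75 * 4 + 0.25 * 44
console.log(expectedReadLatency(0.75, 4, 40));
```

This is handy as a sanity check: if the simulated average read latency is far from this number, either the hit rate or one of the latencies is not configured the way you think.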
Realistic Failure Handling
The simulator models resilience patterns used in production systems:
Circuit Breaker Pattern:
// Check if component is available.
// Returns false while the breaker is open, the truthy string "half-open" once
// the recovery time has elapsed (so callers can treat the next request as a
// trial), and a boolean otherwise.
function isNodeAvailable(node, currentTime) {
  const {
    circuitBreakerState,
    circuitBreakerLastTrip,
    circuitBreakerRecoveryTime,
  } = node.data;
  if (circuitBreakerState === "open") {
    // Circuit is open - check if recovery time passed
    if (currentTime - circuitBreakerLastTrip < circuitBreakerRecoveryTime) {
      return false; // Still open
    }
    // Move to half-open state
    return "half-open";
  }
  return circuitBreakerState !== "unhealthy";
}
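The availability check above only reads the breaker state. For context, here is a fuller sketch of the closed → open → half-open cycle. The failure threshold, the recovery time, and the `recordSuccess`/`recordFailure` hooks are assumptions made for illustration and do not appear in the simulator code shown here:

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface Breaker {
  state: BreakerState;
  failures: number; // consecutive failures seen so far
  lastTrip: number; // timestamp (ms) of the most recent trip
}

const FAILURE_THRESHOLD = 3; // assumed value
const RECOVERY_TIME_MS = 5000; // assumed value

function createBreaker(): Breaker {
  return { state: "closed", failures: 0, lastTrip: 0 };
}

// Decide whether a request may pass; moves open -> half-open after recovery.
function allowRequest(b: Breaker, now: number): boolean {
  if (b.state === "open") {
    if (now - b.lastTrip < RECOVERY_TIME_MS) return false;
    b.state = "half-open"; // let a trial request through
  }
  return true;
}

// A failed trial request, or too many consecutive failures, trips the breaker.
function recordFailure(b: Breaker, now: number): void {
  b.failures += 1;
  if (b.state === "half-open" || b.failures >= FAILURE_THRESHOLD) {
    b.state = "open";
    b.lastTrip = now;
  }
}

// Any success closes the breaker and clears the failure count.
function recordSuccess(b: Breaker): void {
  b.failures = 0;
  b.state = "closed";
}
```

The key design point is that a failure in the half-open state reopens the circuit immediately, without waiting for the threshold again, so a still-broken node is not flooded with traffic.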
Automatic Retry with Fallback:
// If primary node fails, try replicas
if (!success && retryEnabled) {
  const fallbackNodes = findFallbackNodes(failedNode, graph);
  for (const fallback of fallbackNodes) {
    // Pass currentTime so an open breaker is checked against its recovery time
    if (isNodeAvailable(fallback, currentTime)) {
      const retryResult = processRequest(fallback, requestType, graph, nodes);
      if (retryResult.success) {
        return {
          ...retryResult,
          retried: true,
          latency: retryResult.latency + retryLatency,
        };
      }
    }
  }
}
Smart Routing and Load Balancing
The routing system filters edges based on request types:
- all: Any request can flow through
- read-only: Only read requests (cache/replica reads)
- write-only: Only writes (primary database writes)
- async-only: Only async requests (message queues)
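The edge filter described in this list can be sketched as a small lookup table. The `trafficType` field name and the `Edge` shape are assumptions for illustration; the simulator's real edge data may be structured differently:

```typescript
type RequestType = "read" | "write" | "async";
type EdgeTrafficType = "all" | "read-only" | "write-only" | "async-only";

// Hypothetical edge shape; `trafficType` is an assumed field name.
interface Edge {
  source: string;
  target: string;
  trafficType: EdgeTrafficType;
}

// Which request types each edge type lets through.
const ALLOWED: Record<EdgeTrafficType, RequestType[]> = {
  "all": ["read", "write", "async"],
  "read-only": ["read"],
  "write-only": ["write"],
  "async-only": ["async"],
};

function edgeAllows(edge: Edge, requestType: RequestType): boolean {
  return ALLOWED[edge.trafficType].some((t) => t === requestType);
}

// Outgoing edges of a node that a given request type may follow.
function outgoingEdges(edges: Edge[], nodeId: string, requestType: RequestType): Edge[] {
  return edges.filter((e) => e.source === nodeId && edgeAllows(e, requestType));
}

const demoEdges: Edge[] = [
  { source: "app", target: "cache", trafficType: "read-only" },
  { source: "app", target: "primary-db", trafficType: "write-only" },
  { source: "app", target: "queue", trafficType: "async-only" },
  { source: "app", target: "logger", trafficType: "all" },
];
console.log(outgoingEdges(demoEdges, "app", "read").map((e) => e.target));
```

In this example a read leaving `app` can reach the cache and the logger, while a write can reach the primary database and the logger.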
Load balancer algorithms are simulated as well:
function selectNextNode(algorithm, availableNodes) {
  // Nothing to route to (reduce would throw on an empty array)
  if (availableNodes.length === 0) return undefined;
  switch (algorithm) {
    case "round-robin":
      // requestCount is a running request counter kept outside this excerpt
      return availableNodes[requestCount % availableNodes.length];
    case "least-connections":
      return availableNodes.reduce((min, node) =>
        node.data.currentLoad < min.data.currentLoad ? node : min
      );
    case "random":
      return availableNodes[
        Math.floor(Math.random() * availableNodes.length)
      ];
  }
}
Real-Time Cost Calculation
Every component has realistic AWS-like pricing:
- Per-request costs: API Gateway, Lambda, CDN
- Per-hour costs: EC2 instances, containers, databases
- Per-GB costs: Storage, data transfer, logs
function calculateComponentCost(node, tickDuration, requestsThisTick) {
  // Missing prices default to 0 so the total never becomes NaN
  const { costPerRequest = 0, costPerHour = 0, costPerGB = 0 } = node.data;
  let totalCost = 0;
  // Request-based costs
  totalCost += requestsThisTick * costPerRequest;
  // Compute costs (hourly costs converted to per-tick; tickDuration is in seconds)
  const replicas = node.data.replicas || 1;
  totalCost += (costPerHour * replicas / 3600) * tickDuration;
  // Storage costs (component-specific)
  if (node.data.componentType === "database") {
    const dataGB = (requestsThisTick * 0.001) / 1024 / 1024;
    totalCost += dataGB * costPerGB;
  }
  return totalCost;
}
Key Features
1. Visual Architecture Builder
Drag components from the palette, connect them with edges, and see your
architecture come to life. React Flow provides smooth, interactive node
manipulation.
2. Granular Configuration
Every component is configurable:
- Capacity: Requests per second it can handle
- Latency: Response time in milliseconds
- Failure Rate: Probability of random failures
- Replicas: Number of instances for scaling
- Component-specific settings: Cache hit rates, queue sizes, routing algorithms, etc.
3. Real-Time Metrics Dashboard
Watch live metrics update 10 times per second:
- Throughput: Successful requests/second
- Latency: Average and 99th percentile (P99)
- Error Rate: Failed requests percentage
- Request Breakdown: Read/write/async counts
- Resilience Metrics: Retries, circuit breaker trips, failovers
4. Bottleneck Detection
The simulator automatically identifies bottlenecks and suggests fixes:
- "Web Server 1 is overloaded (125% capacity). Consider adding replicas or increasing capacity."
- "Cache has low hit rate (45%). Consider increasing cache size or adjusting TTL."
- "Message Queue is at 90% capacity. Consider adding consumers."
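Messages like these can be derived from a simple utilization ratio. Below is a minimal sketch, assuming utilization is observed requests per second divided by `capacity * replicas` and that the warning threshold is 90% (the `NodeLoad` shape and `detectBottlenecks` name are hypothetical):

```typescript
// Hypothetical shape for illustration; not the simulator's node type.
interface NodeLoad {
  name: string;
  requestsPerSecond: number;
  capacity: number; // requests per second one replica can handle
  replicas: number;
}

function detectBottlenecks(nodes: NodeLoad[], warnAt: number = 0.9): string[] {
  const messages: string[] = [];
  for (const node of nodes) {
    const utilization = node.requestsPerSecond / (node.capacity * node.replicas);
    const pct = Math.round(utilization * 100);
    if (utilization > 1) {
      messages.push(
        `${node.name} is overloaded (${pct}% capacity). Consider adding replicas or increasing capacity.`
      );
    } else if (utilization >= warnAt) {
      messages.push(`${node.name} is at ${pct}% capacity.`);
    }
  }
  return messages;
}

const report = detectBottlenecks([
  { name: "Web Server 1", requestsPerSecond: 1250, capacity: 500, replicas: 2 },
  { name: "Message Queue", requestsPerSecond: 900, capacity: 1000, replicas: 1 },
  { name: "Cache", requestsPerSecond: 500, capacity: 1000, replicas: 1 },
]);
console.log(report.join("\n"));
```

Only the web server and the message queue are reported here; the cache sits at 50% utilization and stays below the threshold.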
5. Cost Analysis
See real-time cost breakdown:
- Total cost per second/hour/month
- Cost per component
- Cost optimization suggestions
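Projecting a per-tick cost to the per-second, per-hour, and per-month totals is plain arithmetic. Here is a sketch, assuming a 100ms tick and an average month of 730 hours (both the `projectCost` name and the 730-hour figure are assumptions, not values from the simulator):

```typescript
const SECONDS_PER_HOUR = 3600;
const HOURS_PER_MONTH = 730; // assumed average month (8760 hours / 12)

interface CostProjection {
  perSecond: number;
  perHour: number;
  perMonth: number;
}

// Scales the cost accrued in one tick up to longer time windows,
// assuming load stays constant.
function projectCost(costThisTick: number, tickDuration: number = 0.1): CostProjection {
  const perSecond = costThisTick / tickDuration;
  const perHour = perSecond * SECONDS_PER_HOUR;
  const perMonth = perHour * HOURS_PER_MONTH;
  return { perSecond, perHour, perMonth };
}

const projection = projectCost(0.00001);
console.log(projection.perHour.toFixed(2), projection.perMonth.toFixed(2));
```

The projection assumes the current load holds steady, so it is best read as "what this traffic level would cost if sustained" rather than as a forecast.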
6. Scenario Management
Save and load architecture scenarios:
- E-commerce Platform: Load balancer → API Gateway → App Servers → Cache → Database
- Social Media Feed: CDN → API Gateway → Cache → Message Queue → Stream Processor
- Video Streaming: CDN → Object Storage with multi-region failover
Technical Challenges and Solutions
Challenge 1: Request Path Tracking
Problem: How do you track a request as it flows through multiple components
without infinite loops?
Solution: Visited set and path tracking:
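const visited = new Set<string>();
const path: string[] = [];
// currentNode starts at the node that received the request.
// Stop at a dead end (no next node) or as soon as a node repeats (a cycle).
while (currentNode && !visited.has(currentNode.id)) {
  visited.add(currentNode.id);
  path.push(currentNode.id);
  // Process current node; stop early if it fails
  const result = processComponent(currentNode, requestType);
  if (!result.success) break;
  // Get next node based on graph edges
  currentNode = selectNextNode(graph, currentNode);
}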
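To see the visited-set guard in action on a graph that actually contains a cycle, here is a self-contained sketch. It simplifies routing by always following the first outgoing edge, and `buildGraph` and `tracePath` are illustrative names rather than the simulator's functions:

```typescript
type Graph = Map<string, string[]>;

// Adjacency list: source node id -> target node ids.
function buildGraph(edges: Array<{ source: string; target: string }>): Graph {
  const graph: Graph = new Map();
  for (const edge of edges) {
    const targets: string[] = graph.get(edge.source) ?? [];
    targets.push(edge.target);
    graph.set(edge.source, targets);
  }
  return graph;
}

// Follows edges from startId until a dead end or a repeated node.
function tracePath(graph: Graph, startId: string): string[] {
  const visited = new Set<string>();
  const path: string[] = [];
  let current: string | undefined = startId;
  while (current !== undefined && !visited.has(current)) {
    visited.add(current);
    path.push(current);
    const targets: string[] = graph.get(current) ?? [];
    current = targets[0]; // simplification: always take the first edge
  }
  return path;
}

const cyclic = buildGraph([
  { source: "client", target: "lb" },
  { source: "lb", target: "app" },
  { source: "app", target: "cache" },
  { source: "cache", target: "app" }, // cycle back to app
]);
console.log(tracePath(cyclic, "client").join(" -> "));
```

The trace stops after `cache` because the next hop, `app`, is already in the visited set, so the loop terminates even though the graph has a cycle.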
Challenge 2: Realistic Load Distribution
Problem: How do you simulate realistic load when you have replicas and load
balancers?
Solution: Track current load per node and implement accurate load balancing
algorithms:
// Check capacity before accepting the request
const maxCapacity = node.data.capacity * (node.data.replicas || 1);
if ((node.data.currentLoad || 0) >= maxCapacity) {
  return { success: false, reason: "capacity_exceeded" };
}
// Request accepted: update load
node.data.currentLoad = (node.data.currentLoad || 0) + 1;
// Reset load counts each tick
afterTick(() => {
  nodes.forEach((n) => n.data.currentLoad = 0);
});
Challenge 3: P99 Latency Calculation
Problem: How do you efficiently calculate the 99th percentile latency?
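Solution: Collect all latency samples in an array, sort a copy, and read off the
99th percentile using the nearest-rank method:
function calculateP99Latency(latencies: number[]): number {
  if (latencies.length === 0) return 0;
  const sorted = [...latencies].sort((a, b) => a - b);
  // Nearest rank: with 100 samples this picks the 99th value (index 98),
  // whereas Math.floor(n * 0.99) would pick the maximum
  const p99Index = Math.ceil(sorted.length * 0.99) - 1;
  return sorted[p99Index];
}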
Challenge 4: State Management Complexity
Problem: Managing simulation state, metrics history, node updates, and UI
state in React.
Solution: Zustand store with clean separation of concerns:
import { create } from "zustand";
// Single source of truth for entire application
const useSystemDesignStore = create<StoreState>((set) => ({
  // Architecture state
  nodes: [],
  edges: [],
  // Simulation state
  simulation: { isRunning: false, isPaused: false },
  metricsHistory: [],
  // Actions
  startSimulation: () => {
    // One tick every 100ms
    const interval = setInterval(() => {
      const result = runSimulationTick();
      set((state) => ({
        nodes: result.updatedNodes,
        metricsHistory: [...state.metricsHistory, result.metrics],
      }));
    }, 100);
    // Spread the previous simulation state so isPaused is not dropped
    set((state) => ({
      simulation: { ...state.simulation, isRunning: true, interval },
    }));
  },
}));
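One thing to watch with this store shape: at 10 ticks per second, `metricsHistory` gains 600 entries per minute and never shrinks. A small sketch of one way to bound it, assuming a hypothetical `appendCapped` helper and a cap chosen by the caller (neither is part of the original store):

```typescript
// Hypothetical helper: returns a new array with `item` appended, keeping only
// the most recent `maxLength` entries. The input array is not mutated.
function appendCapped<T>(history: T[], item: T, maxLength: number): T[] {
  const next = history.concat([item]);
  return next.length > maxLength ? next.slice(next.length - maxLength) : next;
}

// Keep only the last 3 samples while appending 5.
let samples: number[] = [];
for (let i = 1; i <= 5; i++) {
  samples = appendCapped(samples, i, 3);
}
console.log(samples);
```

Inside the tick callback this would stand in for the array spread, for example `metricsHistory: appendCapped(state.metricsHistory, result.metrics, 600)` to keep roughly the last minute of data.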
What I Learned
1. Distributed Systems Are Complex
Building even a simulation of distributed systems highlighted how many failure
modes exist:
- Network partitions
- Cascading failures
- Thundering herd problems
- Split-brain scenarios
- Data consistency vs availability tradeoffs
2. React Flow is Powerful
React Flow made building the node-based interface straightforward. Custom node
types, edge routing, and viewport controls work beautifully.
3. Performance Matters in Simulations
Initially, I processed requests one-by-one sequentially. For 1000 RPS, this
meant 100 requests per tick—slow! I optimized by:
- Batching similar operations
- Using Set for O(1) visited checks
- Avoiding deep cloning where possible
- Memoizing expensive calculations
4. Real-World Patterns Are Nuanced
Implementing patterns like circuit breakers and retry logic taught me:
- Circuit breakers need states: closed → open → half-open
- Retries need exponential backoff and jitter
- Health checks must distinguish degraded vs unhealthy
- Failover requires careful replica selection
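As an illustration of the backoff point in this list, here is a minimal sketch of exponential backoff with "full jitter", where the delay is drawn uniformly between 0 and the capped exponential value. The base delay, the cap, and the `backoffDelayMs` name are assumptions; the retry code shown earlier in this post uses a fixed `retryLatency` instead:

```typescript
// Delay before retry number `attempt` (0-based).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 100,
  capMs: number = 5000,
  random: () => number = Math.random
): number {
  // Exponential growth: base, 2x base, 4x base, ... up to the cap
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt));
  // Full jitter: pick uniformly in [0, exponential)
  return random() * exponential;
}

for (let attempt = 0; attempt < 4; attempt++) {
  console.log(`attempt ${attempt}: wait ${backoffDelayMs(attempt).toFixed(0)} ms`);
}
```

The jitter matters because clients that failed at the same moment would otherwise all retry at the same moment, recreating the spike that caused the failure.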
Try It Yourself
The best way to learn distributed systems is to build (and break) them. With
this simulator, you can:
- Test if your architecture handles Black Friday traffic
- Experiment with different caching strategies
- Compare costs between serverless and container-based architectures
- Understand why Netflix uses chaos engineering
- Prepare for system design interviews with realistic scenarios
Technical Stack
- React 19: UI framework
- TypeScript: Type safety and developer experience
- React Flow: Node-based interface
- Zustand: Lightweight state management
- Vite: Lightning-fast build tool
- Lucide Icons: Beautiful, consistent icons
Key Takeaways
Building this simulator taught me more about distributed systems than reading
dozens of blog posts. When you implement the actual behavior—request routing,
load balancing, failure handling, cost calculation—the concepts solidify.
For learners: This shows that you don't need a data center to understand
distributed systems. Browser simulations can teach fundamental concepts.
For architects: Prototyping architectures in a visual simulator before
building them can reveal bottlenecks and save costly mistakes.
For interviewers/interviewees: This demonstrates deep understanding of
distributed systems concepts through implementation, not just theory.
Conclusion
Distributed systems don't have to be abstract boxes on a whiteboard. With modern
web technologies, we can build interactive, visual simulations that make these
complex systems tangible and understandable.
Whether you're learning system design, preparing for interviews, or architecting
real systems, having a playground to experiment in is invaluable. And the best
part? It all runs in your browser—no Docker, no Kubernetes, no cloud bills.
Now go build something, break it, then build it better. That's how you truly
learn distributed systems.
Questions or ideas? Drop them in the comments below. I'd love to hear about
the architectures you build or features you'd like to see.