code_sherpa

Posted on Mar 6

How I simulated a Distributed System in the Browser

#webdev #distributedsystems #react #systemdesign

Building distributed systems is hard. Understanding their behavior under load is
even harder. What if you could design, configure, and stress-test a distributed
system architecture without spinning up a single server? That's exactly what I
built—a fully interactive distributed system simulator that runs entirely in
your browser.

The Problem: Learning System Design is Abstract

When preparing for system design interviews or architecting real-world systems,
we often sketch boxes and arrows on whiteboards. "Put a load balancer here, add
a cache there, maybe throw in some message queues for async processing." But how
do you really know if your design will handle 10,000 requests per second? What
happens when a database replica fails? Where will the bottlenecks appear?

Traditional approaches require either:

Production experience: Learn by breaking things in prod (not recommended)
Complex local setups: Docker Compose files with dozens of services
Mental simulation: "I think this should work..."

I wanted something better: a visual, interactive playground where you can build
realistic distributed systems and watch them respond to load in real time.

The Solution: A Browser-Based Simulation Engine

I built a System Design Simulator using React, TypeScript, and React Flow.
It lets you:

Drag-and-drop components (load balancers, databases, caches, message queues, etc.)
Configure their capacity, latency, failure rates, and replication
Connect them with edges that define read/write/async traffic flow
Run realistic simulations and observe metrics, bottlenecks, and costs

Architecture: Simulating Reality

Component Library (25+ Components)

The simulator includes authentic distributed system components, each with
realistic behavior.

The Simulation Engine: How It Works

The heart of the system is a tick-based simulation engine that processes
requests through your architecture 10 times per second.

Request Flow Processing

Here's what happens each tick (100ms):

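function simulationTick(nodes, edges, tickDuration = 0.1) {
    // Per-tick state, declared locally so every tick starts from zero
    let successfulRequests = 0;
    let failedRequests = 0;
    const latencies = [];
    const currentTime = Date.now(); // assumes breaker timestamps are in ms

    // 1. Get all client nodes (traffic sources)
    const clients = nodes.filter((n) => n.data.componentType === "client");

    // 2. Build connection graph from edges
    const graph = buildConnectionGraph(edges);

    // 3. Check circuit breaker recovery times
    checkCircuitBreakerRecovery(nodes, currentTime);

    // 4. For each client, generate requests
    for (const client of clients) {
        // Scale by tick length rather than a hard-coded 10 ticks per second;
        // rounding keeps fractional rates from adding a stray iteration
        const requestsPerTick = Math.round(client.data.rps * tickDuration);

        for (let i = 0; i < requestsPerTick; i++) {
            // Determine request type (read/write/async)
            const type = determineRequestType(client.data);

            // Process request through the system
            const result = processRequest(client, type, graph, nodes);

            // Track metrics
            if (result.success) {
                successfulRequests++;
                latencies.push(result.latency);
            } else {
                failedRequests++;
            }
        }
    }

    // 5. Calculate bottlenecks, costs, and health states
    // Per-node cost for this tick, using the load counted during the tick
    const costs = nodes.map((n) =>
        calculateComponentCost(n, tickDuration, n.data.currentLoad || 0)
    );
    return aggregateMetrics(nodes, latencies, costs, {
        successfulRequests,
        failedRequests,
    });
}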
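
The tick function leans on buildConnectionGraph, which isn't shown in this post. Here is a minimal sketch of what such a helper could look like, assuming React Flow style edges with source and target ids. The SimEdge shape and its trafficType field are assumptions for illustration, not the simulator's actual types.

```typescript
// Hypothetical edge shape: React Flow edges carry `id`, `source` and `target`;
// the `trafficType` field is an assumption made for this sketch.
interface SimEdge {
  id: string;
  source: string;
  target: string;
  trafficType?: "all" | "read-only" | "write-only" | "async-only";
}

type ConnectionGraph = Map<string, SimEdge[]>;

// Group outgoing edges by source node id so each hop is a single map lookup.
function buildConnectionGraph(edges: SimEdge[]): ConnectionGraph {
  const graph: ConnectionGraph = new Map();
  for (const edge of edges) {
    const outgoing = graph.get(edge.source) ?? [];
    outgoing.push(edge);
    graph.set(edge.source, outgoing);
  }
  return graph;
}

// Example: one client, one load balancer, two web servers.
const demoGraph = buildConnectionGraph([
  { id: "e1", source: "client", target: "lb" },
  { id: "e2", source: "lb", target: "web-1" },
  { id: "e3", source: "lb", target: "web-2" },
]);
```

Building the map once per tick keeps the per-request work down to lookups, which matters when the inner loop runs hundreds of times per tick.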

Component Processing

Each component type has specialized logic. For example, a cache implements
realistic caching behavior:

// Cache hit/miss logic
if (requestType === "read") {
    // ?? keeps an explicit hit rate of 0; || would silently replace it with 0.8
    const hitRate = cacheData.hitRate ?? 0.8;
    if (Math.random() < hitRate) {
        // Cache hit - return immediately with cache latency
        return { success: true, latency: cacheData.latency };
    }
    // Cache miss - continue to backend
}

// Write modes
if (requestType === "write") {
    if (cacheMode === "write-through") {
        // Synchronous write to cache AND backend
        totalLatency += cacheLatency + backendLatency;
    } else if (cacheMode === "write-behind") {
        // Async write to backend
        totalLatency += cacheLatency;
        queueBackgroundWrite();
    } else if (cacheMode === "cache-aside") {
        // Just invalidate cache, let app write to backend
        totalLatency += backendLatency;
    }
}
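
Because hits are sampled at random, the average read latency settles on a weighted sum of the hit and miss paths. The sketch below works that out, under the assumption that a miss pays for the cache lookup and then the backend call (the snippet above doesn't say whether the lookup is counted on a miss). The function name and the numbers are illustrative only.

```typescript
// Expected read latency for a service sitting behind a cache.
// Assumption: a miss pays the cache lookup and then the backend call.
function expectedReadLatency(
  hitRate: number,
  cacheLatencyMs: number,
  backendLatencyMs: number,
): number {
  const missRate = 1 - hitRate;
  return hitRate * cacheLatencyMs + missRate * (cacheLatencyMs + backendLatencyMs);
}

// 80% hits, 2 ms cache, 50 ms backend:
// 0.8 * 2 + 0.2 * (2 + 50) = 1.6 + 10.4, so about 12 ms on average
const avgReadMs = expectedReadLatency(0.8, 2, 50);
```

This is a handy sanity check: if the simulated average drifts far from this figure over many ticks, the hit/miss sampling is probably wrong.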

Realistic Failure Handling

The simulator models resilience patterns used in production systems:

Circuit Breaker Pattern:

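// Check if component is available.
// Returns false, true, or "half-open" (truthy: let a trial request through).
function isNodeAvailable(node, currentTime) {
    const {
        circuitBreakerState,
        circuitBreakerLastTrip,
        circuitBreakerRecoveryTime,
        health, // assumed field: health status tracked separately from breaker state
    } = node.data;

    if (circuitBreakerState === "open") {
        // Circuit is open - check if recovery time passed
        if (currentTime - circuitBreakerLastTrip < circuitBreakerRecoveryTime) {
            return false; // Still open
        }
        // Recovery window elapsed - move to half-open state
        return "half-open";
    }

    // Breaker states are closed/open/half-open, so "unhealthy" is checked
    // against the node's health status rather than the breaker state
    return health !== "unhealthy";
}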
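
The availability check reads the breaker state but doesn't show how the state changes. Below is a small sketch of the closed → open → half-open transitions, written as pure functions. The Breaker shape, the failure threshold, and the recovery time are hypothetical values chosen for the example, not the simulator's configuration.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface Breaker {
  state: BreakerState;
  consecutiveFailures: number;
  lastTripMs: number;
}

// Hypothetical tuning values for this sketch.
const FAILURE_THRESHOLD = 5;
const RECOVERY_TIME_MS = 3000;

// Returns the breaker after observing one request outcome at time `nowMs`.
function recordOutcome(b: Breaker, success: boolean, nowMs: number): Breaker {
  if (success) {
    // Any success (including a half-open trial request) closes the circuit.
    return { state: "closed", consecutiveFailures: 0, lastTripMs: b.lastTripMs };
  }
  const failures = b.consecutiveFailures + 1;
  // A failed half-open trial re-opens at once; otherwise trip at the threshold.
  if (b.state === "half-open" || failures >= FAILURE_THRESHOLD) {
    return { state: "open", consecutiveFailures: failures, lastTripMs: nowMs };
  }
  return { ...b, consecutiveFailures: failures };
}

// Moves an open breaker to half-open once the recovery window has elapsed.
function refreshBreaker(b: Breaker, nowMs: number): Breaker {
  if (b.state === "open" && nowMs - b.lastTripMs >= RECOVERY_TIME_MS) {
    return { ...b, state: "half-open" };
  }
  return b;
}
```

Keeping the transitions pure makes them easy to unit test and lets the store swap in the new breaker object without mutating node data.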

Automatic Retry with Fallback:

// If primary node fails, try replicas
if (!success && retryEnabled) {
    const fallbackNodes = findFallbackNodes(failedNode, graph);

    for (const fallback of fallbackNodes) {
        // Pass currentTime so an open breaker's recovery window is actually checked
        if (isNodeAvailable(fallback, currentTime)) {
            const retryResult = processRequest(fallback, requestType, graph, nodes);
            if (retryResult.success) {
                return {
                    ...retryResult,
                    retried: true,
                    latency: retryResult.latency + retryLatency,
                };
            }
        }
    }
}

Smart Routing and Load Balancing

The routing system filters edges based on request types:

all: Any request can flow through
read-only: Only read requests (cache/replica reads)
write-only: Only writes (primary database writes)
async-only: Only async requests (message queues)
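
The four edge types above boil down to a small predicate. Here's a sketch of how that filter could be written; the names edgeAllows and routableEdges and the filter field are made up for this example.

```typescript
type RequestType = "read" | "write" | "async";
type EdgeFilter = "all" | "read-only" | "write-only" | "async-only";

// Which request type each restricted edge accepts; "all" is handled separately.
const ACCEPTS: Record<Exclude<EdgeFilter, "all">, RequestType> = {
  "read-only": "read",
  "write-only": "write",
  "async-only": "async",
};

function edgeAllows(filter: EdgeFilter, request: RequestType): boolean {
  return filter === "all" || ACCEPTS[filter] === request;
}

// Keep only the edges a given request may travel along.
function routableEdges<E extends { filter: EdgeFilter }>(
  edges: E[],
  request: RequestType,
): E[] {
  return edges.filter((e) => edgeAllows(e.filter, request));
}
```

With this in place, a write leaving an app server would only see the edge to the primary database, while reads could fan out to the cache or replicas.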

Load balancer algorithms are modeled as simple selection functions:

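let requestCount = 0; // running counter used by round-robin

function selectNextNode(algorithm, availableNodes) {
    // reduce() throws on an empty array, so bail out early
    if (availableNodes.length === 0) return undefined;

    switch (algorithm) {
        case "round-robin":
            return availableNodes[requestCount++ % availableNodes.length];

        case "least-connections":
            return availableNodes.reduce((min, node) =>
                node.data.currentLoad < min.data.currentLoad ? node : min
            );

        case "random":
        default:
            // Unknown algorithms fall back to random selection
            return availableNodes[
                Math.floor(Math.random() * availableNodes.length)
            ];
    }
}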

Real-Time Cost Calculation

Every component has realistic AWS-like pricing:

Per-request costs: API Gateway, Lambda, CDN
Per-hour costs: EC2 instances, containers, databases
Per-GB costs: Storage, data transfer, logs

function calculateComponentCost(node, tickDuration, requestsThisTick) {
    // Default missing prices to 0 so one unset field can't turn the total into NaN
    const { costPerRequest = 0, costPerHour = 0, costPerGB = 0 } = node.data;

    let totalCost = 0;

    // Request-based costs
    totalCost += requestsThisTick * costPerRequest;

    // Compute costs (hourly costs converted to per-tick)
    const replicas = node.data.replicas || 1;
    totalCost += (costPerHour * replicas / 3600) * tickDuration;

    // Storage costs (component-specific)
    if (node.data.componentType === "database") {
        const dataGB = (requestsThisTick * 0.001) / 1024 / 1024;
        totalCost += dataGB * costPerGB;
    }

    return totalCost;
}

Key Features

1. Visual Architecture Builder

Drag components from the palette, connect them with edges, and see your
architecture come to life. React Flow provides smooth, interactive node
manipulation.

2. Granular Configuration

Every component is configurable:

Capacity: Requests per second it can handle
Latency: Response time in milliseconds
Failure Rate: Probability of random failures
Replicas: Number of instances for scaling
Component-specific settings: Cache hit rates, queue sizes, routing algorithms, etc.

3. Real-Time Metrics Dashboard

Watch live metrics update 10 times per second:

Throughput: Successful requests/second
Latency: Average and P99
Error Rate: Failed requests percentage
Request Breakdown: Read/write/async counts
Resilience Metrics: Retries, circuit breaker trips, failovers

4. Bottleneck Detection

The simulator automatically identifies bottlenecks and suggests fixes:

"Web Server 1 is overloaded (125% capacity). Consider adding replicas or increasing capacity."
"Cache has low hit rate (45%). Consider increasing cache size or adjusting TTL."
"Message Queue is at 90% capacity. Consider adding consumers."

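As a rough idea of how warnings like these could be produced, here's a sketch that compares observed load against total capacity. The NodeLoad shape and the 90% warning threshold are assumptions for the example, not the simulator's actual rules.

```typescript
interface NodeLoad {
  label: string;
  requestsPerSecond: number; // observed load
  capacity: number;          // requests/second per replica
  replicas: number;
}

// Hypothetical threshold: warn once a node reaches 90% utilization.
const WARN_UTILIZATION = 0.9;

function findBottlenecks(nodes: NodeLoad[]): string[] {
  const messages: string[] = [];
  for (const n of nodes) {
    const utilization = n.requestsPerSecond / (n.capacity * n.replicas);
    const pct = Math.round(utilization * 100);
    if (utilization > 1) {
      messages.push(
        `${n.label} is overloaded (${pct}% capacity). Consider adding replicas or increasing capacity.`,
      );
    } else if (utilization >= WARN_UTILIZATION) {
      messages.push(`${n.label} is at ${pct}% capacity.`);
    }
  }
  return messages;
}

// 1250 rps against 2 replicas of 500 rps each is 125% utilization.
const bottleneckMessages = findBottlenecks([
  { label: "Web Server 1", requestsPerSecond: 1250, capacity: 500, replicas: 2 },
  { label: "Database", requestsPerSecond: 200, capacity: 1000, replicas: 1 },
]);
```

Note that this only works if offered load is counted even for rejected requests; otherwise utilization can never read above 100%.
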
5. Cost Analysis

See real-time cost breakdown:

Total cost per second/hour/month
Cost per component
Cost optimization suggestions
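
Turning a per-tick cost into per-second, per-hour, and per-month figures is just unit conversion. A small sketch, assuming the 100 ms tick used throughout this post and a 730-hour month (8760 hours / 12, my assumption); projectCost is a hypothetical helper name.

```typescript
const TICK_SECONDS = 0.1;      // the 100 ms tick described in this post
const SECONDS_PER_HOUR = 3600;
const HOURS_PER_MONTH = 730;   // assumption: average month, 8760 h / 12

interface CostProjection {
  perSecond: number;
  perHour: number;
  perMonth: number;
}

function projectCost(costPerTick: number): CostProjection {
  const perSecond = costPerTick / TICK_SECONDS;
  const perHour = perSecond * SECONDS_PER_HOUR;
  return { perSecond, perHour, perMonth: perHour * HOURS_PER_MONTH };
}

// A node billed at $0.36/hour costs 0.36 / 3600 * 0.1 = $0.00001 per tick,
// which projects back to roughly $0.36/hour and $262.80/month.
const projection = projectCost(0.00001);
```

Projecting from a single tick is noisy when traffic varies, so averaging the per-tick cost over a window before projecting would give steadier numbers.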

6. Scenario Management

Save and load architecture scenarios:

E-commerce Platform: Load balancer → API Gateway → App Servers → Cache → Database
Social Media Feed: CDN → API Gateway → Cache → Message Queue → Stream Processor
Video Streaming: CDN → Object Storage with multi-region failover

Technical Challenges and Solutions

Challenge 1: Request Path Tracking

Problem: How do you track a request as it flows through multiple components
without infinite loops?

Solution: Visited set and path tracking:

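const visited = new Set<string>();
const path: string[] = [];

// Stop at a dead end (no next node) or when a cycle leads back to a visited node
while (currentNode && !visited.has(currentNode.id)) {
    visited.add(currentNode.id);
    path.push(currentNode.id);

    // Process current node; stop walking the path if this hop fails
    const result = processComponent(currentNode, requestType);
    if (!result.success) break;

    // Get next node based on graph edges (undefined when there are none)
    currentNode = selectNextNode(graph, currentNode);
}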

Challenge 2: Realistic Load Distribution

Problem: How do you simulate realistic load when you have replicas and load
balancers?

Solution: Track current load per node and implement accurate load balancing
algorithms:

// Record offered load (rejected requests count too, so utilization can exceed 100%)
node.data.currentLoad = (node.data.currentLoad || 0) + 1;

// Reject once this tick's load exceeds capacity.
// Capacity is per second but load resets every tick, so scale by tickDuration.
const replicas = node.data.replicas || 1;
const maxPerTick = node.data.capacity * replicas * tickDuration;
if (node.data.currentLoad > maxPerTick) {
    return { success: false, reason: "capacity_exceeded" };
}

// Reset load counts each tick
afterTick(() => {
    nodes.forEach((n) => n.data.currentLoad = 0);
});

Challenge 3: P99 Latency Calculation

Problem: How do you efficiently calculate the 99th percentile latency?

Solution: Collect all latency samples in an array and calculate percentiles:

function calculateP99Latency(latencies: number[]): number {
    if (latencies.length === 0) return 0;

    const sorted = [...latencies].sort((a, b) => a - b);
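    // Nearest-rank method: smallest sample with at least 99% of samples at or below it.
    // (Math.floor(n * 0.99) would return the maximum when n = 100.)
    const p99Index = Math.ceil(sorted.length * 0.99) - 1;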

    return sorted[p99Index];
}
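
One caveat with collecting every sample is that the array grows for as long as the simulation runs. A possible variation, sketched below, keeps only the most recent samples in a fixed-size ring buffer and computes percentiles over that window. LatencyWindow is a hypothetical class for illustration, not part of the simulator.

```typescript
// Fixed-size window of the most recent latency samples (ring buffer).
class LatencyWindow {
  private readonly samples: number[] = [];
  private readonly maxSamples: number;
  private next = 0;

  constructor(maxSamples: number) {
    this.maxSamples = maxSamples;
  }

  add(latencyMs: number): void {
    if (this.samples.length < this.maxSamples) {
      this.samples.push(latencyMs);
    } else {
      // Overwrite the oldest sample once the window is full.
      this.samples[this.next] = latencyMs;
    }
    this.next = (this.next + 1) % this.maxSamples;
  }

  // Nearest-rank percentile over the current window; p in (0, 100].
  percentile(p: number): number {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.max(rank, 1) - 1] ?? 0;
  }
}
```

The sort still costs O(n log n) per call, but n is now bounded by the window size, so both memory and per-tick work stay constant over a long run.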

Challenge 4: State Management Complexity

Problem: Managing simulation state, metrics history, node updates, and UI
state in React.

Solution: Zustand store with clean separation of concerns:

import { create } from "zustand";

// Single source of truth for entire application
const useSystemDesignStore = create<StoreState>((set, get) => ({
    // Architecture state
    nodes: [],
    edges: [],

    // Simulation state
    simulation: { isRunning: false, isPaused: false },
    metricsHistory: [],

    // Actions
    startSimulation: () => {
        // Ignore repeated starts so intervals don't stack up
        if (get().simulation.isRunning) return;

        const interval = setInterval(() => {
            const result = runSimulationTick();
            set((state) => ({
                nodes: result.updatedNodes,
                metricsHistory: [...state.metricsHistory, result.metrics],
            }));
        }, 100);

        // Merge into the existing simulation state so isPaused is kept
        set((state) => ({
            simulation: { ...state.simulation, isRunning: true, interval },
        }));
    },
}));

What I Learned

1. Distributed Systems Are Complex

Building even a simulation of distributed systems highlighted how many failure
modes exist:

Network partitions
Cascading failures
Thundering herd problems
Split-brain scenarios
Data consistency vs availability tradeoffs

2. React Flow is Powerful

React Flow made building the node-based interface straightforward. Custom node
types, edge routing, and viewport controls work beautifully.

3. Performance Matters in Simulations

Initially, I processed requests one at a time. At 1000 RPS that is 100 requests
per 100 ms tick, which was too slow. I optimized by:

Batching similar operations
Using Set for O(1) visited checks
Avoiding deep cloning where possible
Memoizing expensive calculations

4. Real-World Patterns Are Nuanced

Implementing patterns like circuit breakers and retry logic taught me:

Circuit breakers need states: closed → open → half-open
Retries need exponential backoff and jitter
Health checks must distinguish degraded vs unhealthy
Failover requires careful replica selection
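
To make the backoff point concrete, here's a small sketch of exponential backoff with "full jitter", where each delay is a random value between zero and a capped exponential. The default base and cap are hypothetical, and the random source is injectable so the function can be tested deterministically.

```typescript
// Delay before retry number `attempt` (0-based), using "full jitter":
// a random value between 0 and the capped exponential delay.
function backoffDelayMs(
  attempt: number,
  baseMs = 100,   // hypothetical defaults for this sketch
  maxMs = 5000,
  random: () => number = Math.random,
): number {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  return random() * exponential;
}

// Upper bounds for the first few attempts with the defaults:
// attempt 0 → 100 ms, attempt 1 → 200 ms, attempt 2 → 400 ms, capped at 5000 ms.
```

The jitter is what prevents a thundering herd: without it, every client that failed at the same moment retries at the same moment too.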

Try It Yourself

The best way to learn distributed systems is to build (and break) them. With
this simulator, you can:

Test if your architecture handles Black Friday traffic
Experiment with different caching strategies
Compare costs between serverless and container-based architectures
Understand why Netflix uses chaos engineering
Prepare for system design interviews with realistic scenarios

Technical Stack

React 19: UI framework
TypeScript: Type safety and developer experience
React Flow: Node-based interface
Zustand: Lightweight state management
Vite: Lightning-fast build tool
Lucide Icons: Beautiful, consistent icons

Key Takeaways

Building this simulator taught me more about distributed systems than reading
dozens of blog posts. When you implement the actual behavior—request routing,
load balancing, failure handling, cost calculation—the concepts solidify.

For learners: This shows that you don't need a data center to understand
distributed systems. Browser simulations can teach fundamental concepts.

For architects: Prototyping architectures in a visual simulator before
building them can reveal bottlenecks and save costly mistakes.

For interviewers/interviewees: This demonstrates deep understanding of
distributed systems concepts through implementation, not just theory.

Conclusion

Distributed systems don't have to be abstract boxes on a whiteboard. With modern
web technologies, we can build interactive, visual simulations that make these
complex systems tangible and understandable.

Whether you're learning system design, preparing for interviews, or architecting
real systems, having a playground to experiment in is invaluable. And the best
part? It all runs in your browser—no Docker, no Kubernetes, no cloud bills.

Now go build something, break it, then build it better. That's how you truly
learn distributed systems.

system-design-sim.dev

Questions or ideas? Drop them in the comments below. I'd love to hear about
the architectures you build or features you'd like to see.

DEV Community