DEV Community

code_sherpa

How I simulated a Distributed System in the Browser

Building distributed systems is hard. Understanding their behavior under load is
even harder. What if you could design, configure, and stress-test a distributed
system architecture without spinning up a single server? That's exactly what I
built—a fully interactive distributed system simulator that runs entirely in
your browser.

The Problem: Learning System Design is Abstract

When preparing for system design interviews or architecting real-world systems,
we often sketch boxes and arrows on whiteboards. "Put a load balancer here, add
a cache there, maybe throw in some message queues for async processing." But how
do you really know if your design will handle 10,000 requests per second? What
happens when a database replica fails? Where will the bottlenecks appear?

Traditional approaches require either:

  • Production experience: Learn by breaking things in prod (not recommended)
  • Complex local setups: Docker compose files with dozens of services
  • Mental simulation: "I think this should work..."

I wanted something better—a visual, interactive playground where you can build
realistic distributed systems and watch them respond to load in real-time.

The Solution: A Browser-Based Simulation Engine

I built a System Design Simulator using React, TypeScript, and React Flow.
It lets you:

  • Drag-and-drop components (load balancers, databases, caches, message queues, etc.)
  • Configure their capacity, latency, failure rates, and replication
  • Connect them with edges that define read/write/async traffic flow
  • Run realistic simulations and observe metrics, bottlenecks, and costs

Architecture: Simulating Reality

Component Library (25+ Components)

The simulator includes authentic distributed system components, each with
realistic behavior.

The Simulation Engine: How It Works

The heart of the system is a tick-based simulation engine that processes
requests through your architecture 10 times per second.

Request Flow Processing

Here's what happens each tick (100ms):

// successfulRequests, failedRequests, latencies, and costs are per-tick
// accumulators declared outside this excerpt.
function simulationTick(nodes, edges, currentTime, tickDuration = 0.1) {
    // 1. Get all client nodes (traffic sources)
    const clients = nodes.filter((n) => n.data.componentType === "client");

    // 2. Build connection graph from edges
    const graph = buildConnectionGraph(edges);

    // 3. Check circuit breaker recovery times
    checkCircuitBreakerRecovery(nodes, currentTime);

    // 4. For each client, generate requests
    for (const client of clients) {
        // rps is per second, so scale it by the tick length (0.1s => rps / 10)
        const requestsPerTick = client.data.rps * tickDuration;

        for (let i = 0; i < requestsPerTick; i++) {
            // Determine request type (read/write/async)
            const type = determineRequestType(client.data);

            // Process request through the system
            const result = processRequest(client, type, graph, nodes);

            // Track metrics
            if (result.success) {
                successfulRequests++;
                latencies.push(result.latency);
            } else {
                failedRequests++;
            }
        }
    }

    // 5. Calculate bottlenecks, costs, and health states
    return aggregateMetrics(nodes, latencies, costs);
}
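
One detail the loop above glosses over: `rps` times the tick length is rarely a whole number (25 RPS at a 100ms tick is 2.5 requests per tick). Here is a minimal sketch of one way to handle that by carrying the fractional remainder from tick to tick. The `createRequestBudget` helper is hypothetical and not part of the simulator's code.

```typescript
// Hypothetical helper: turns a per-second rate into whole requests per tick,
// carrying the fractional remainder forward so the long-run average holds.
function createRequestBudget(tickDuration = 0.1): (rps: number) => number {
    let carry = 0;
    return (rps: number): number => {
        carry += rps * tickDuration;
        const whole = Math.floor(carry);
        carry -= whole;
        return whole;
    };
}

const takeRequests = createRequestBudget(0.1);
const perTick: number[] = [];
for (let tick = 0; tick < 4; tick++) {
    perTick.push(takeRequests(25));
}
// perTick alternates between 2 and 3, adding up to 10 requests over four ticks
```

With rates that are not exactly representable in floating point, the carry can drift by a tiny amount, which is fine for a simulation but worth knowing about.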

Component Processing

Each component type has specialized logic. For example, a cache implements
realistic caching behavior:

// Cache hit/miss logic
if (requestType === "read") {
    // Use ?? rather than || so a configured hit rate of 0 is respected
    const hitRate = cacheData.hitRate ?? 0.8;
    if (Math.random() < hitRate) {
        // Cache hit - return immediately with cache latency
        return { success: true, latency: cacheData.latency };
    }
    // Cache miss - continue to backend
}

// Write modes
if (requestType === "write") {
    if (cacheMode === "write-through") {
        // Synchronous write to cache AND backend
        totalLatency += cacheLatency + backendLatency;
    } else if (cacheMode === "write-behind") {
        // Async write to backend
        totalLatency += cacheLatency;
        queueBackgroundWrite();
    } else if (cacheMode === "cache-aside") {
        // Just invalidate cache, let app write to backend
        totalLatency += backendLatency;
    }
}
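
To make the latency effect of each mode concrete, here is a small self-contained sketch. The function names are hypothetical, and `expectedReadLatency` assumes that a cache miss pays for the cache lookup plus the backend call, which the excerpt above does not confirm.

```typescript
type CacheMode = "write-through" | "write-behind" | "cache-aside";

// Latency (ms) the caller observes for a single write under each mode.
function writeLatency(mode: CacheMode, cacheLatency: number, backendLatency: number): number {
    switch (mode) {
        case "write-through":
            return cacheLatency + backendLatency; // waits for both writes
        case "write-behind":
            return cacheLatency; // backend write happens in the background
        case "cache-aside":
            return backendLatency; // app writes to the backend directly
    }
}

// Average read latency (ms), assuming a miss costs cache lookup + backend call.
function expectedReadLatency(hitRate: number, cacheLatency: number, backendLatency: number): number {
    return cacheLatency + (1 - hitRate) * backendLatency;
}

const throughMs = writeLatency("write-through", 2, 40); // 42
const behindMs = writeLatency("write-behind", 2, 40);   // 2
const avgReadMs = expectedReadLatency(0.75, 2, 40);     // 12
```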

Realistic Failure Handling

The simulator models two common resilience patterns:

Circuit Breaker Pattern:

// Check if component is available
function isNodeAvailable(node, currentTime) {
    const {
        circuitBreakerState,
        circuitBreakerLastTrip,
        circuitBreakerRecoveryTime,
    } = node.data;

    if (circuitBreakerState === "open") {
        // Circuit is open - check if recovery time passed
        if (currentTime - circuitBreakerLastTrip < circuitBreakerRecoveryTime) {
            return false; // Still open
        }
        // Move to half-open state
        return "half-open";
    }

    return circuitBreakerState !== "unhealthy";
}
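
The availability check above only reads the breaker state. A rough sketch of the full closed → open → half-open cycle could look like the following. The threshold, the recovery time, and every name here are assumptions for illustration, not the simulator's actual values.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface Breaker {
    state: BreakerState;
    failures: number;
    lastTrip: number; // timestamp (ms) of the most recent trip
}

const FAILURE_THRESHOLD = 3;   // assumed value
const RECOVERY_TIME_MS = 5000; // assumed value

// Decide whether a request may be sent; moves open -> half-open after recovery.
function canAttempt(b: Breaker, now: number): boolean {
    if (b.state === "open" && now - b.lastTrip >= RECOVERY_TIME_MS) {
        b.state = "half-open"; // let one trial request through
    }
    return b.state !== "open";
}

// Record the outcome of a request and update the breaker.
function recordResult(b: Breaker, success: boolean, now: number): void {
    if (success) {
        b.state = "closed";
        b.failures = 0;
        return;
    }
    b.failures += 1;
    // A failed trial request, or too many failures while closed, trips the breaker
    if (b.state === "half-open" || b.failures >= FAILURE_THRESHOLD) {
        b.state = "open";
        b.lastTrip = now;
    }
}

const breaker: Breaker = { state: "closed", failures: 0, lastTrip: 0 };
for (let i = 0; i < 3; i++) recordResult(breaker, false, 0);
const blockedEarly = canAttempt(breaker, 1000); // false: still inside the recovery window
const allowedLater = canAttempt(breaker, 5000); // true: breaker is now half-open
```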

Automatic Retry with Fallback:

// If primary node fails, try replicas
if (!success && retryEnabled) {
    const fallbackNodes = findFallbackNodes(failedNode, graph);

    for (const fallback of fallbackNodes) {
        if (isNodeAvailable(fallback, currentTime)) {
            const retryResult = processRequest(fallback, requestType, graph, nodes);
            if (retryResult.success) {
                return {
                    ...retryResult,
                    retried: true,
                    latency: retryResult.latency + retryLatency,
                };
            }
        }
    }
}

Smart Routing and Load Balancing

The routing system filters edges based on request types:

  • all: Any request can flow through
  • read-only: Only read requests (cache/replica reads)
  • write-only: Only writes (primary database writes)
  • async-only: Only async requests (message queues)
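
As a sketch of how that filtering could work: the `source` and `target` fields match how React Flow describes edges, while `trafficType` and the function names are hypothetical stand-ins for however the simulator stores this.

```typescript
type RequestType = "read" | "write" | "async";
type TrafficType = "all" | "read-only" | "write-only" | "async-only";

interface SimEdge {
    source: string;
    target: string;
    trafficType: TrafficType; // hypothetical field name
}

// True when a request of the given type may travel along the edge.
function edgeAllows(edge: SimEdge, request: RequestType): boolean {
    switch (edge.trafficType) {
        case "all":
            return true;
        case "read-only":
            return request === "read";
        case "write-only":
            return request === "write";
        case "async-only":
            return request === "async";
    }
}

// Ids of the nodes a request can reach from `from` in one hop.
function nextTargets(edges: SimEdge[], from: string, request: RequestType): string[] {
    return edges
        .filter((e) => e.source === from && edgeAllows(e, request))
        .map((e) => e.target);
}

const sampleEdges: SimEdge[] = [
    { source: "api", target: "cache", trafficType: "read-only" },
    { source: "api", target: "db", trafficType: "write-only" },
    { source: "api", target: "queue", trafficType: "async-only" },
    { source: "api", target: "logger", trafficType: "all" },
];
const readTargets = nextTargets(sampleEdges, "api", "read"); // ["cache", "logger"]
```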

The simulator models three load balancer algorithms:

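function selectNextNode(algorithm, availableNodes) {
    // No healthy targets: let the caller treat this as a failed request
    if (availableNodes.length === 0) return undefined;

    switch (algorithm) {
        case "round-robin":
            // requestCount is a running counter kept outside this function
            return availableNodes[requestCount % availableNodes.length];

        case "least-connections":
            return availableNodes.reduce((min, node) =>
                node.data.currentLoad < min.data.currentLoad ? node : min
            );

        case "random":
            return availableNodes[
                Math.floor(Math.random() * availableNodes.length)
            ];

        default:
            // Unknown algorithm: fall back to the first available node
            return availableNodes[0];
    }
}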
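
Round-robin needs a counter that survives between calls. One way to keep that state local is a closure, sketched below. `createRoundRobin`, `leastLoaded`, and the `LbNode` shape are illustrative names rather than the simulator's own.

```typescript
interface LbNode {
    id: string;
    currentLoad: number;
}

// Returns a selector that remembers its position between calls.
function createRoundRobin(): (nodes: LbNode[]) => LbNode | undefined {
    let next = 0;
    return (nodes: LbNode[]): LbNode | undefined => {
        if (nodes.length === 0) return undefined;
        const chosen = nodes[next % nodes.length];
        next += 1;
        return chosen;
    };
}

// Picks the node with the lowest current load (first one wins ties).
function leastLoaded(nodes: LbNode[]): LbNode | undefined {
    if (nodes.length === 0) return undefined;
    return nodes.reduce((min, n) => (n.currentLoad < min.currentLoad ? n : min));
}

const pool: LbNode[] = [
    { id: "a", currentLoad: 5 },
    { id: "b", currentLoad: 2 },
    { id: "c", currentLoad: 9 },
];
const pick = createRoundRobin();
const order = [pick(pool), pick(pool), pick(pool), pick(pool)].map((n) => n?.id);
// order is ["a", "b", "c", "a"]; leastLoaded(pool) is node "b"
```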

Real-Time Cost Calculation

Every component has realistic AWS-like pricing:

  • Per-request costs: API Gateway, Lambda, CDN
  • Per-hour costs: EC2 instances, containers, databases
  • Per-GB costs: Storage, data transfer, logs

function calculateComponentCost(node, tickDuration, requestsThisTick) {
    // Default missing prices to 0 so the total never becomes NaN
    const { costPerRequest = 0, costPerHour = 0, costPerGB = 0 } = node.data;

    let totalCost = 0;

    // Request-based costs
    totalCost += requestsThisTick * costPerRequest;

    // Compute costs (hourly costs converted to per-tick)
    const replicas = node.data.replicas || 1;
    totalCost += (costPerHour * replicas / 3600) * tickDuration;

    // Storage costs (component-specific)
    if (node.data.componentType === "database") {
        const dataGB = (requestsThisTick * 0.001) / 1024 / 1024;
        totalCost += dataGB * costPerGB;
    }

    return totalCost;
}

Key Features

1. Visual Architecture Builder

Drag components from the palette, connect them with edges, and see your
architecture come to life. React Flow provides smooth, interactive node
manipulation.

2. Granular Configuration

Every component is configurable:

  • Capacity: Requests per second it can handle
  • Latency: Response time in milliseconds
  • Failure Rate: Probability of random failures
  • Replicas: Number of instances for scaling
  • Component-specific settings: Cache hit rates, queue sizes, routing algorithms, etc.

3. Real-Time Metrics Dashboard

Watch live metrics update 10 times per second:

  • Throughput: Successful requests/second
  • Latency: Average and P99 (99th percentile)
  • Error Rate: Failed requests percentage
  • Request Breakdown: Read/write/async counts
  • Resilience Metrics: Retries, circuit breaker trips, failovers

4. Bottleneck Detection

The simulator automatically identifies bottlenecks and suggests fixes:

  • "Web Server 1 is overloaded (125% capacity). Consider adding replicas or increasing capacity."
  • "Cache has low hit rate (45%). Consider increasing cache size or adjusting TTL."
  • "Message Queue is at 90% capacity. Consider adding consumers."
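
A check like this boils down to comparing observed load against configured capacity. Here is a simplified sketch that produces messages in the same spirit. The `NodeStats` shape and the 90% warning threshold are assumptions, not the simulator's actual rules.

```typescript
interface NodeStats {
    label: string;
    capacity: number;    // requests per second, per replica
    replicas: number;
    observedRps: number; // load measured during the simulation
}

// Returns one human-readable message per node that is near or over capacity.
function findBottlenecks(stats: NodeStats[], warnAt = 0.9): string[] {
    const messages: string[] = [];
    for (const s of stats) {
        const utilization = s.observedRps / (s.capacity * s.replicas);
        const pct = Math.round(utilization * 100);
        if (utilization > 1) {
            messages.push(
                `${s.label} is overloaded (${pct}% capacity). Consider adding replicas or increasing capacity.`
            );
        } else if (utilization >= warnAt) {
            messages.push(`${s.label} is at ${pct}% capacity.`);
        }
    }
    return messages;
}

const warnings = findBottlenecks([
    { label: "Web Server 1", capacity: 1000, replicas: 2, observedRps: 2500 },
    { label: "Message Queue", capacity: 1000, replicas: 1, observedRps: 900 },
    { label: "Cache", capacity: 1000, replicas: 1, observedRps: 300 },
]);
// Two warnings: Web Server 1 (125%) and Message Queue (90%); Cache is fine
```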

5. Cost Analysis

See real-time cost breakdown:

  • Total cost per second/hour/month
  • Cost per component
  • Cost optimization suggestions
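
The per-second, per-hour, and per-month figures are the same number at different scales. A quick sketch of the projection follows; the 730-hour month is my assumption (365 × 24 / 12), and `projectCost` is a hypothetical name.

```typescript
const SECONDS_PER_HOUR = 3600;
const HOURS_PER_MONTH = 730; // assumed average month: 365 * 24 / 12

interface CostProjection {
    perSecond: number;
    perHour: number;
    perMonth: number;
}

// Scales a measured per-second cost up to hourly and monthly figures.
function projectCost(costPerSecond: number): CostProjection {
    const perHour = costPerSecond * SECONDS_PER_HOUR;
    return {
        perSecond: costPerSecond,
        perHour,
        perMonth: perHour * HOURS_PER_MONTH,
    };
}

// An architecture costing $0.001 per second comes to roughly $3.60 per hour
// and roughly $2,628 per month.
const projected = projectCost(0.001);
```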

6. Scenario Management

Save and load architecture scenarios:

  • E-commerce Platform: Load balancer → API Gateway → App Servers → Cache → Database
  • Social Media Feed: CDN → API Gateway → Cache → Message Queue → Stream Processor
  • Video Streaming: CDN → Object Storage with multi-region failover

Technical Challenges and Solutions

Challenge 1: Request Path Tracking

Problem: How do you track a request as it flows through multiple components
without infinite loops?

Solution: Visited set and path tracking:

const visited = new Set<string>();
const path: string[] = [];

// Stop at a dead end (no next node) or when a node repeats (a cycle)
while (currentNode && !visited.has(currentNode.id)) {
    visited.add(currentNode.id);
    path.push(currentNode.id);

    // Process current node
    const result = processComponent(currentNode, requestType);

    // Get next nodes based on graph edges
    currentNode = selectNextNode(graph, currentNode);
}
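
Here is the same idea as a standalone function you can run. It is a simplified sketch: it always follows the first outgoing edge, and the `Graph` type and `tracePath` name are made up for this example.

```typescript
type Graph = Map<string, string[]>; // node id -> downstream node ids

// Walks the graph from `start`, stopping at a dead end or at the first repeat.
function tracePath(graph: Graph, start: string): { path: string[]; cycleAt?: string } {
    const visited = new Set<string>();
    const path: string[] = [];
    let current: string | undefined = start;

    while (current !== undefined) {
        if (visited.has(current)) {
            return { path, cycleAt: current }; // loop detected, stop here
        }
        visited.add(current);
        path.push(current);
        // Follow the first outgoing edge; a real router would choose among them
        current = graph.get(current)?.[0];
    }
    return { path };
}

const looped: Graph = new Map<string, string[]>([
    ["lb", ["app"]],
    ["app", ["cache"]],
    ["cache", ["app"]], // misconfigured edge pointing back upstream
]);
const trace = tracePath(looped, "lb");
// trace.path is ["lb", "app", "cache"] and trace.cycleAt is "app"
```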

Challenge 2: Realistic Load Distribution

Problem: How do you simulate realistic load when you have replicas and load
balancers?

Solution: Track current load per node and implement accurate load balancing
algorithms:

// Check capacity before accepting the request. Capacity is per second but
// load is counted per tick, so scale it by the tick length (0.1s).
const replicas = node.data.replicas || 1;
const maxPerTick = node.data.capacity * replicas * tickDuration;
if ((node.data.currentLoad || 0) >= maxPerTick) {
    return { success: false, reason: "capacity_exceeded" };
}

// Count the accepted request
node.data.currentLoad = (node.data.currentLoad || 0) + 1;

// Reset load counts each tick
afterTick(() => {
    nodes.forEach((n) => {
        n.data.currentLoad = 0;
    });
});

Challenge 3: P99 Latency Calculation

Problem: How do you efficiently calculate the 99th percentile latency?

Solution: Collect all latency samples in an array and calculate percentiles:

function calculateP99Latency(latencies: number[]): number {
    if (latencies.length === 0) return 0;

    const sorted = [...latencies].sort((a, b) => a - b);
    // Nearest-rank method: with 100 samples this picks the 99th value, not the max
    const p99Index = Math.ceil(sorted.length * 0.99) - 1;

    return sorted[p99Index];
}

Challenge 4: State Management Complexity

Problem: Managing simulation state, metrics history, node updates, and UI
state in React.

Solution: Zustand store with clean separation of concerns:

import { create } from "zustand";

// Single source of truth for entire application
const useSystemDesignStore = create<StoreState>((set) => ({
    // Architecture state
    nodes: [],
    edges: [],

    // Simulation state
    simulation: { isRunning: false, isPaused: false },
    metricsHistory: [],

    // Actions
    startSimulation: () => {
        const interval = setInterval(() => {
            const result = runSimulationTick();
            set((state) => ({
                nodes: result.updatedNodes,
                metricsHistory: [...state.metricsHistory, result.metrics],
            }));
        }, 100);

        // Keep isPaused in the object so the simulation state shape stays intact
        set({ simulation: { isRunning: true, isPaused: false, interval } });
    },
}));
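
The excerpt starts an interval but does not show how it is stopped. Below is a dependency-free sketch of the start/stop bookkeeping. In the real store this logic would live in Zustand actions; `createSimulationControl` is a hypothetical name used only here.

```typescript
interface SimulationControl {
    start(): void;
    stop(): void;
    isRunning(): boolean;
}

// Wraps setInterval so that start is idempotent and stop always cleans up.
function createSimulationControl(onTick: () => void, tickMs = 100): SimulationControl {
    let handle: ReturnType<typeof setInterval> | undefined;
    return {
        start(): void {
            if (handle !== undefined) return; // already running: don't stack intervals
            handle = setInterval(onTick, tickMs);
        },
        stop(): void {
            if (handle === undefined) return;
            clearInterval(handle);
            handle = undefined;
        },
        isRunning: (): boolean => handle !== undefined,
    };
}

let ticks = 0;
const control = createSimulationControl(() => {
    ticks += 1;
});
control.start();
control.start(); // second call is a no-op
const runningAfterStart = control.isRunning();
control.stop();
const runningAfterStop = control.isRunning();
```

Guarding `start` matters because two overlapping intervals would double the effective tick rate without any visible error.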

What I Learned

1. Distributed Systems Are Complex

Building even a simulation of distributed systems highlighted how many failure
modes exist:

  • Network partitions
  • Cascading failures
  • Thundering herd problems
  • Split-brain scenarios
  • Data consistency vs availability tradeoffs

2. React Flow is Powerful

React Flow made building the node-based interface straightforward. Custom node
types, edge routing, and viewport controls work beautifully.

3. Performance Matters in Simulations

Initially, I processed requests one at a time. At 1000 RPS, that meant 100
requests per tick, which was slow. I optimized by:

  • Batching similar operations
  • Using Set for O(1) visited checks
  • Avoiding deep cloning where possible
  • Memoizing expensive calculations

4. Real-World Patterns Are Nuanced

Implementing patterns like circuit breakers and retry logic taught me:

  • Circuit breakers need states: closed → open → half-open
  • Retries need exponential backoff and jitter
  • Health checks must distinguish degraded vs unhealthy
  • Failover requires careful replica selection
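
For the backoff point, here is a minimal sketch of exponential backoff with "full jitter", where the delay is picked uniformly between zero and an exponentially growing ceiling. The base and cap values are arbitrary examples, and `backoffDelay` is not taken from the simulator.

```typescript
// Delay (ms) before retry number `attempt` (0-based), using full jitter.
// `random` is injectable so the behavior can be checked deterministically.
function backoffDelay(
    attempt: number,
    baseMs = 100,
    capMs = 5000,
    random: () => number = Math.random
): number {
    const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
    return random() * ceiling; // uniform in [0, ceiling)
}

// With a fixed "random" value of 0.5 the ceilings are easy to see:
const half = (): number => 0.5;
const delays = [0, 1, 2, 3, 10].map((attempt) => backoffDelay(attempt, 100, 5000, half));
// delays is [50, 100, 200, 400, 2500]: the last one is limited by the 5000ms cap
```

The jitter spreads retries from many clients over time, which helps avoid the thundering herd problem mentioned earlier.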

Try It Yourself

The best way to learn distributed systems is to build (and break) them. With
this simulator, you can:

  • Test if your architecture handles Black Friday traffic
  • Experiment with different caching strategies
  • Compare costs between serverless and container-based architectures
  • Understand why Netflix uses chaos engineering
  • Prepare for system design interviews with realistic scenarios

Technical Stack

  • React 19: UI framework
  • TypeScript: Type safety and developer experience
  • React Flow: Node-based interface
  • Zustand: Lightweight state management
  • Vite: Lightning-fast build tool
  • Lucide Icons: Beautiful, consistent icons

Key Takeaways

Building this simulator taught me more about distributed systems than reading
dozens of blog posts. When you implement the actual behavior—request routing,
load balancing, failure handling, cost calculation—the concepts solidify.

For learners: This shows that you don't need a data center to understand
distributed systems. Browser simulations can teach fundamental concepts.

For architects: Prototyping architectures in a visual simulator before
building them can reveal bottlenecks and save costly mistakes.

For interviewers/interviewees: This demonstrates deep understanding of
distributed systems concepts through implementation, not just theory.

Conclusion

Distributed systems don't have to be abstract boxes on a whiteboard. With modern
web technologies, we can build interactive, visual simulations that make these
complex systems tangible and understandable.

Whether you're learning system design, preparing for interviews, or architecting
real systems, having a playground to experiment in is invaluable. And the best
part? It all runs in your browser—no Docker, no Kubernetes, no cloud bills.

Now go build something, break it, then build it better. That's how you truly
learn distributed systems.

Questions or ideas? Drop them in the comments below. I'd love to hear about
the architectures you build or features you'd like to see.
