Himanshu Gupta

Posted on Jul 2

System Design: Designing LeetCode at Scale

#ai #leetcode #systemdesign #distributedsystems

LeetCode looks simple on the surface—browse coding problems, write code, and get instant feedback.

But behind that simple interface lies a distributed system capable of handling millions of developers, hundreds of code submissions per second, and massive traffic spikes during coding contests.

In this article, we'll design a scalable version of LeetCode from scratch, covering everything from functional requirements to database choices, code execution, and leaderboard architecture.

Problem Statement

Design an online coding platform similar to LeetCode that allows users to:

Browse coding problems
Read problem statements
Write code in multiple programming languages
Submit solutions
Receive execution results within seconds
Participate in weekly coding contests
View live leaderboards

Functional Requirements

Our system should support the following features:

1. Browse Coding Problems

Users should be able to:

Search problems
Filter by difficulty
Filter by tags
Sort by popularity
Paginate results

2. View a Problem

Each problem contains:

Title
Description
Constraints
Sample Input
Sample Output
Hidden Test Cases (stored server-side, never shown to users)
Supported Languages

3. Submit Solutions

Users should be able to:

Select a programming language
Write code
Submit code
Receive verdicts such as:
Accepted
Wrong Answer
Runtime Error
Compilation Error
Time Limit Exceeded (TLE)
Memory Limit Exceeded (MLE)
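To make the verdicts concrete, here is a minimal sketch of how a judge might map one sandbox run to a verdict. The `RunResult` shape, the limit parameters, and the whitespace-tolerant output comparison are all assumptions for illustration. The order of checks matters: a process killed for exceeding a limit usually also exits abnormally, so limits are checked before the exit code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunResult:
    """Outcome of running one test case in the sandbox (hypothetical shape)."""
    compiled: bool
    exit_code: int
    time_ms: int
    memory_kb: int
    stdout: str


def judge_test(result: RunResult, expected: str, time_limit_ms: int, memory_limit_kb: int) -> str:
    if not result.compiled:
        return "Compilation Error"
    # Resource limits first: a killed process also has a non-zero exit code.
    if result.time_ms > time_limit_ms:
        return "Time Limit Exceeded"
    if result.memory_kb > memory_limit_kb:
        return "Memory Limit Exceeded"
    if result.exit_code != 0:
        return "Runtime Error"
    # Compare outputs line by line, ignoring trailing whitespace.
    got = [line.rstrip() for line in result.stdout.rstrip().splitlines()]
    want = [line.rstrip() for line in expected.rstrip().splitlines()]
    return "Accepted" if got == want else "Wrong Answer"


def judge_submission(results: List[RunResult], expected: List[str],
                     time_limit_ms: int, memory_limit_kb: int) -> str:
    # The submission's verdict is the first non-Accepted test verdict.
    for result, want in zip(results, expected):
        verdict = judge_test(result, want, time_limit_ms, memory_limit_kb)
        if verdict != "Accepted":
            return verdict
    return "Accepted"
```

Stopping at the first failing test also saves worker time, since the remaining hidden tests never need to run.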

4. Contest Support

Support weekly and bi-weekly contests with:

Contest registration
Live rankings
Score calculation
Real-time leaderboard updates
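Score calculation needs a precise rule. The sketch below assumes a LeetCode-style rule: a user's score is the sum of points for solved problems, and the penalty is the time of the last Accepted submission plus 5 minutes for each wrong attempt on problems that were eventually solved. The 5-minute constant and the class names are assumptions, not a confirmed specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PENALTY_SECONDS = 5 * 60  # assumed: 5-minute penalty per wrong attempt


@dataclass
class ProblemState:
    points: int
    wrong_attempts: int = 0
    accepted_at: Optional[int] = None  # seconds since contest start


@dataclass
class Standing:
    problems: Dict[str, ProblemState] = field(default_factory=dict)

    def record(self, problem_id: str, points: int, accepted: bool, t: int) -> None:
        state = self.problems.setdefault(problem_id, ProblemState(points))
        if state.accepted_at is not None:
            return  # submissions after the first Accepted change nothing
        if accepted:
            state.accepted_at = t
        else:
            state.wrong_attempts += 1

    def solved(self) -> List[ProblemState]:
        return [p for p in self.problems.values() if p.accepted_at is not None]

    def score(self) -> int:
        return sum(p.points for p in self.solved())

    def penalty(self) -> int:
        # Last Accepted time plus 5 minutes per wrong attempt on solved problems.
        solved = self.solved()
        if not solved:
            return 0
        last_accepted = max(p.accepted_at for p in solved)
        wrong = sum(p.wrong_attempts for p in solved)
        return last_accepted + wrong * PENALTY_SECONDS


def rank(standings: Dict[str, Standing]) -> List[str]:
    # Higher score first; equal scores are ordered by lower penalty.
    return sorted(standings, key=lambda u: (-standings[u].score(), standings[u].penalty()))
```

Wrong attempts on unsolved problems are ignored, so users are not punished for trying a hard problem they never finish.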

Out of Scope

To keep the design focused, we'll ignore:

Authentication & Authorization
User Profiles
Payments
Notifications
Recommendation Engine
Analytics

Non-Functional Requirements

A production-grade coding platform must satisfy several quality attributes.

High Availability

During contests, thousands of users submit code simultaneously.

The platform should remain available even if individual services fail.

Low Latency

Users expect results within 2–5 seconds after submitting code.

Long delays lead to poor user experience.

Scalability

The system should support:

Millions of users
Thousands of concurrent submissions
Massive traffic spikes during contests

Horizontal scaling is preferred.

Security

Running arbitrary user code is extremely dangerous.

The execution environment must be:

Sandboxed
Isolated
Resource limited
Protected against malicious code

Fault Tolerance

No service should become a single point of failure.

The platform should continue operating even when individual components fail.

Capacity Estimation

Assume:

10 Million registered users
2 Million daily active users
100K concurrent users during contests
20K code submissions per minute

These numbers influence our infrastructure choices.
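A quick back-of-the-envelope check shows how the submission rate translates into execution capacity. Little's law says the average number of submissions being executed at once equals the arrival rate times the average execution time. The 2-second average execution time and 70% target utilization used below are assumptions, not figures from the estimate above.

```python
import math
from typing import Tuple


def execution_capacity(submissions_per_minute: float, avg_exec_seconds: float,
                       target_utilization: float = 0.7) -> Tuple[float, int]:
    """Return (submissions per second, sandbox slots needed)."""
    rate_per_sec = submissions_per_minute / 60
    # Little's law: concurrent work L = arrival rate (lambda) * time in system (W).
    busy_slots = rate_per_sec * avg_exec_seconds
    # Leave headroom so queues do not grow without bound during bursts.
    return rate_per_sec, math.ceil(busy_slots / target_utilization)


rate, slots = execution_capacity(20_000, avg_exec_seconds=2.0)
```

With these assumptions, 20K submissions per minute is about 333 submissions per second, which needs roughly 953 sandbox slots. Since each worker node can host several sandboxes, the number of worker machines can be much smaller than the number of slots.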

Core Entities

Problem

ProblemID
Title
Difficulty
Tags
Statement
Constraints
Hidden Test Cases

User

UserID
Username
Rating
Contest Rank

Submission

SubmissionID
ProblemID
UserID
Language
Code
Status
Execution Time
Memory Usage
Timestamp

Contest

ContestID
Start Time
End Time
Problems

Leaderboard

ContestID
UserID
Score
Penalty
Rank
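The entities above can be sketched as Python dataclasses. Field names follow the lists above; the `Difficulty` enum, the millisecond and kilobyte units, and storing hidden test cases as references rather than inline data are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class Difficulty(Enum):
    EASY = "easy"
    MEDIUM = "medium"
    HARD = "hard"


@dataclass
class Problem:
    problem_id: str
    title: str
    difficulty: Difficulty
    tags: List[str]
    statement: str
    constraints: str
    # Assumed: references only; large test inputs/outputs live in separate storage.
    hidden_test_case_ids: List[str]


@dataclass
class Submission:
    submission_id: str
    problem_id: str
    user_id: str
    language: str
    code: str
    status: str = "Queued"
    execution_time_ms: Optional[int] = None  # filled in once judging finishes
    memory_usage_kb: Optional[int] = None
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class LeaderboardEntry:
    contest_id: str
    user_id: str
    score: int
    penalty: int
    rank: Optional[int] = None  # derived from score and penalty
```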

REST APIs

Get Problems

GET /problems?page=1&difficulty=medium

Returns paginated problem lists.
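The query parameters map directly onto the browse requirements (search, difficulty and tag filters, popularity sort, pagination). The in-memory sketch below shows that logic; the `popularity` field is a hypothetical metric, and a real service would push the filtering and sorting down into indexed database queries.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProblemSummary:
    problem_id: int
    title: str
    difficulty: str
    tags: List[str]
    popularity: int  # hypothetical, e.g. number of accepted submissions


def list_problems(problems: List[ProblemSummary], page: int = 1, page_size: int = 20,
                  difficulty: Optional[str] = None, tag: Optional[str] = None,
                  search: Optional[str] = None) -> Dict:
    items = problems
    if difficulty is not None:
        items = [p for p in items if p.difficulty == difficulty]
    if tag is not None:
        items = [p for p in items if tag in p.tags]
    if search:
        needle = search.lower()
        items = [p for p in items if needle in p.title.lower()]
    # Most popular first; problem_id breaks ties so page boundaries are stable.
    items = sorted(items, key=lambda p: (-p.popularity, p.problem_id))
    start = (page - 1) * page_size
    return {"page": page, "total": len(items), "items": items[start:start + page_size]}
```

The deterministic tie-breaker matters: without it, two problems with equal popularity could swap between requests and appear on two pages or on none.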

Get Problem Details

GET /problems/{problemId}

Returns complete problem information.

Submit Solution

POST /problems/{problemId}/submit

Request:

{
  "language":"Java",
  "code":"..."
}

Response:

{
  "submissionId":"12345",
  "status":"Queued"
}
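The submit endpoint does little work itself: it validates the request, stores the submission, enqueues it, and immediately returns the ID with status "Queued". In this sketch, Python's `queue.Queue` and a dict stand in for the message broker and the submissions table; the supported-language set and the HTTP 202 status code are assumptions.

```python
import queue
import uuid
from typing import Dict, Tuple

SUPPORTED_LANGUAGES = {"Java", "Python", "C++"}  # hypothetical set

submission_queue = queue.Queue()  # stand-in for Kafka / RabbitMQ / SQS
submissions = {}                  # stand-in for the submissions table


def submit(problem_id: str, body: Dict) -> Tuple[int, Dict]:
    language = body.get("language")
    code = body.get("code")
    if language not in SUPPORTED_LANGUAGES:
        return 400, {"error": f"unsupported language: {language}"}
    if not code:
        return 400, {"error": "code must not be empty"}
    submission_id = str(uuid.uuid4())
    # Persist first so a worker never dequeues an ID it cannot look up.
    submissions[submission_id] = {"submissionId": submission_id, "problemId": problem_id,
                                  "language": language, "code": code, "status": "Queued"}
    submission_queue.put({"submissionId": submission_id})
    return 202, {"submissionId": submission_id, "status": "Queued"}
```

Only the ID goes on the queue; workers fetch the code from storage, which keeps queue messages small.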

Submission Status

GET /submissions/{submissionId}

Returns the current status:

Queued
Running
A final verdict (Accepted, Wrong Answer, Runtime Error, and so on)
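Because clients poll this endpoint, status changes should follow a small, explicit state machine. The sketch below is one possible set of transitions; allowing "Running" to return to "Queued" (for retries after a worker crash) is a design assumption.

```python
from typing import Dict, Set

FINAL_VERDICTS = {"Accepted", "Wrong Answer", "Runtime Error", "Compilation Error",
                  "Time Limit Exceeded", "Memory Limit Exceeded"}

ALLOWED: Dict[str, Set[str]] = {
    "Queued": {"Running"},
    # Assumed: Running -> Queued when a worker dies and the job is retried.
    "Running": FINAL_VERDICTS | {"Queued"},
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new


def is_final(status: str) -> bool:
    # Clients can stop polling once this returns True.
    return status in FINAL_VERDICTS
```

Final verdicts have no outgoing transitions, so a late or duplicate worker message cannot overwrite a result the user has already seen.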

Contest Leaderboard

GET /contests/{contestId}/leaderboard

Returns the ranked list of participants with their scores and penalties.

High-Level Architecture

                Users
                  │
            Load Balancer
                  │
             API Gateway
                  │
     ┌────────────┴─────────────┐
     │                          │
Problem Service         Submission Service
     │                          │
     │                    Message Queue
     │                          │
     │                  Code Execution Workers
     │                          │
     │                    Sandbox Containers
     │                          │
Database                Result Service
     │                          │
     └──────────────┬───────────┘
                    │
              Leaderboard Service

Why Use a Message Queue?

Code execution is time-consuming.

Instead of executing code synchronously:

User → API → Execute Code → Response

we enqueue submissions and process them asynchronously:

User → API → Submission Queue → Execution Workers → Result

Benefits:

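Better scalability, since workers scale independently of the API
Retries for failed executions
Load balancing across workers
No API timeouts, since the request returns immediately with a submission ID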

Popular choices:

Kafka
RabbitMQ
Amazon SQS
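The retry benefit depends on separating infrastructure failures (a crashed sandbox or node) from user errors (a Runtime Error is a valid verdict, not something to retry). The sketch below drains a `queue.Queue` standing in for the broker; the retry budget of 3, the `InfrastructureError` class, and the "Internal Error" status are hypothetical. A real worker would block on the queue instead of returning when it is empty.

```python
import queue
from typing import Callable, Dict

MAX_ATTEMPTS = 3  # assumed retry budget


class InfrastructureError(Exception):
    """Sandbox or node failure, as opposed to a bug in the user's code."""


def process(job_queue: queue.Queue, execute: Callable[[Dict], str], results: Dict[str, str]) -> None:
    """Drain the queue, retrying only infrastructure failures."""
    while True:
        try:
            job = job_queue.get_nowait()
        except queue.Empty:
            return
        try:
            results[job["submissionId"]] = execute(job)
        except InfrastructureError:
            job["attempts"] = job.get("attempts", 0) + 1
            if job["attempts"] < MAX_ATTEMPTS:
                job_queue.put(job)  # re-enqueue so another worker can try
            else:
                results[job["submissionId"]] = "Internal Error"  # hypothetical status
        finally:
            job_queue.task_done()
```

Most brokers deliver messages at least once, so a job may occasionally run twice; writing results keyed by submission ID, as here, makes a duplicate run harmless.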

Code Execution Environment

This is the most critical component.

Running user code directly on servers is unsafe.

Instead:

Spin up isolated Docker containers
Apply CPU and memory limits
Restrict network access
Destroy the container after execution
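The four steps above map onto standard `docker run` flags. This sketch only builds the argument list; the image name `judge-python:3.12`, the mount path, and every limit value are example assumptions to be tuned per language and problem.

```python
from typing import List, Sequence


def sandbox_command(name: str, image: str, workdir: str, cmd: Sequence[str],
                    memory_mb: int = 256, cpus: float = 1.0, pids: int = 64) -> List[str]:
    """Build a `docker run` argv for one submission (limits are example values)."""
    return [
        "docker", "run",
        "--name", name,                      # lets the runner `docker kill` it by name
        "--rm",                              # remove the container when it exits
        "--network", "none",                 # no network access
        f"--memory={memory_mb}m",            # hard memory cap
        f"--memory-swap={memory_mb}m",       # equal to --memory, so no extra swap
        f"--cpus={cpus}",                    # CPU quota
        f"--pids-limit={pids}",              # caps process count (fork bombs)
        "--read-only",                       # read-only root filesystem
        "--tmpfs", "/tmp",                   # small writable scratch space
        "--cap-drop=ALL",                    # drop all Linux capabilities
        "--security-opt=no-new-privileges",  # block privilege escalation via setuid
        "-v", f"{workdir}:/sandbox:ro",      # user code mounted read-only
        "-w", "/sandbox",
        image, *cmd,
    ]
```

The list can be passed to `subprocess.run(argv, capture_output=True, timeout=...)`. A timeout that kills only the `docker` CLI process may leave the container running, which is why the sketch names the container so the runner can also `docker kill` it.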

Benefits:

Security
Isolation
Cost efficiency
Fast startup compared to Virtual Machines

Why Containers Instead of Virtual Machines?

| Docker Containers | Virtual Machines |
|---|---|
| Lightweight | Heavy |
| Fast startup | Slow boot time |
| Better resource utilization | Higher resource usage |
| Lower cost | Higher cost |
| Easy horizontal scaling | More difficult to scale |

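For online judges, containers offer a good balance between performance and isolation. Because containers share the host kernel, however, their isolation is weaker than a VM's, which is why hardened options such as Firecracker microVMs are also worth considering.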

Database Design

The platform stores different types of data.

Relational Database

Good for:

Users
Contests
Rankings
Transactions

Examples:

PostgreSQL
MySQL

NoSQL Database

Ideal for:

Problems
Test Cases
Submissions
Execution Logs

Examples:

DynamoDB
MongoDB
Cassandra

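Submissions grow rapidly and are rarely updated once judged, which suits horizontally partitioned NoSQL stores. Problem metadata rarely changes, so a flexible document model combined with caching works well for it.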

Caching

Popular problems are read far more often than they are updated.

We can cache:

Problem details
Test metadata
Contest information
Leaderboards

Using Redis significantly reduces database load and improves response times.
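A common way to use Redis here is the cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache with a TTL. The `TTLCache` class below is an in-process stand-in for Redis so the sketch runs on its own; with redis-py the equivalent calls would be `r.get(key)` and `r.set(key, value, ex=ttl_s)` on serialized values. The key format and 300-second TTL are assumptions.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for Redis GET / SET with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        self._data[key] = (self._clock() + ttl_s, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)  # call on problem update to invalidate


def get_problem(problem_id: int, cache: TTLCache,
                load_from_db: Callable[[int], Any], ttl_s: float = 300) -> Any:
    key = f"problem:{problem_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                   # cache hit
    problem = load_from_db(problem_id)  # cache miss: read through to the database
    cache.set(key, problem, ttl_s)
    return problem
```

The TTL bounds how stale a problem can get, and deleting the key on update keeps edits visible immediately.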

Live Leaderboard

Leaderboards are updated frequently during contests.

Instead of recalculating rankings after every submission:

Update scores asynchronously
Store rankings in Redis Sorted Sets
Push updates using WebSockets

This provides near real-time rankings with minimal latency.
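A Redis sorted set orders members by a single numeric score, but rankings depend on both score and penalty. One common trick is to pack both into one number so that more points always wins and, among equal points, a smaller penalty ranks higher. The class below mimics `ZADD`, `ZREVRANGE`, and `ZREVRANK` in memory; with redis-py the calls would be `r.zadd(key, {user_id: score})`, `r.zrevrange(key, 0, n - 1)`, and `r.zrevrank(key, user_id)`. The penalty bound of 10,000,000 seconds is an assumption.

```python
from typing import Dict, List

PENALTY_RANGE = 10_000_000  # assumed: penalty in seconds always below this


def composite_score(points: int, penalty_s: int) -> int:
    # Redis scores are doubles, so keep results below 2**53 to stay exact.
    if not 0 <= penalty_s < PENALTY_RANGE:
        raise ValueError("penalty out of range")
    return points * PENALTY_RANGE + (PENALTY_RANGE - 1 - penalty_s)


class Leaderboard:
    """Mimics ZADD / ZREVRANGE / ZREVRANK on one sorted set."""

    def __init__(self) -> None:
        self.scores: Dict[str, int] = {}

    def update(self, user_id: str, points: int, penalty_s: int) -> None:  # ZADD
        self.scores[user_id] = composite_score(points, penalty_s)

    def top(self, n: int) -> List[str]:  # ZREVRANGE key 0 n-1
        # Like ZREVRANGE, equal scores fall back to descending member order.
        ranked = sorted(self.scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        return [user for user, _ in ranked[:n]]

    def rank(self, user_id: str) -> int:  # ZREVRANK (0-based)
        return self.top(len(self.scores)).index(user_id)
```

Each Accepted submission then costs a single `ZADD` (O(log N) in Redis), instead of a full re-sort of the contest.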

Scaling Code Execution

Execution workers should scale independently.

Worker count should follow demand, for example:

Normal day: 10 workers
Contest: 200 workers

A container orchestrator such as Kubernetes can scale workers automatically based on queue length.
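Scaling on queue length boils down to a simple rule: divide the outstanding work by how much each worker should carry, then clamp to a minimum and maximum. This is similar in spirit to what a Kubernetes autoscaler computes from an external queue metric. The target of 5 jobs per worker is an assumption; the 10 and 200 bounds reuse the example figures above.

```python
import math


def desired_workers(queue_length: int, in_flight: int, target_per_worker: int = 5,
                    min_workers: int = 10, max_workers: int = 200) -> int:
    """Worker count for the current backlog, clamped to [min_workers, max_workers]."""
    backlog = queue_length + in_flight  # waiting jobs plus jobs being executed
    wanted = math.ceil(backlog / target_per_worker)
    return max(min_workers, min(max_workers, wanted))
```

The floor keeps capacity warm for sudden bursts, and the ceiling protects the cluster (and the budget) if the queue is flooded.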

Bottlenecks

Potential bottlenecks include:

Code execution workers
Message queue backlog
Database write throughput
Leaderboard updates
Large contest traffic spikes

Each component should be independently scalable.

Trade-offs

Every system design involves trade-offs.

Availability vs Consistency

We prioritize Availability.

A leaderboard or submission result that is a second or two out of date is acceptable.

A platform outage during a contest is not.

This makes Eventual Consistency a practical choice for leaderboard updates.

Future Improvements

The platform can be extended with:

AI-powered code review
Code similarity detection
Plagiarism detection
Custom test cases
Interview mode
Company-specific problem sets
Multi-region deployment
Distributed execution clusters

Final Thoughts

Designing a platform like LeetCode is far more than storing coding problems.

It involves secure code execution, scalable infrastructure, asynchronous processing, distributed caching, and real-time leaderboards—all while maintaining low latency and high availability.

A robust design balances performance, security, and scalability to ensure developers receive instant feedback, even during the busiest coding contests.

As online coding platforms continue to evolve, incorporating AI-assisted code analysis, distributed execution environments, and intelligent recommendations will make these systems even more powerful.

How would you design the code execution engine? Would you choose Docker, Firecracker microVMs, or another sandboxing technology? Share your thoughts in the comments!

#systemdesign #leetcode #backend #softwarearchitecture #distributedsystems #microservices #cloud #programming #developers #coding