
Himanshu Gupta


System Design: Designing LeetCode at Scale

LeetCode looks simple on the surface—browse coding problems, write code, and get instant feedback.

But behind that simple interface lies a distributed system capable of handling millions of developers, tens of thousands of code submissions per minute, and massive traffic spikes during coding contests.

In this article, we'll design a scalable version of LeetCode from scratch, covering everything from functional requirements to database choices, code execution, and leaderboard architecture.


Problem Statement

Design an online coding platform similar to LeetCode that allows users to:

  • Browse coding problems
  • Read problem statements
  • Write code in multiple programming languages
  • Submit solutions
  • Receive execution results within seconds
  • Participate in weekly coding contests
  • View live leaderboards

Functional Requirements

Our system should support the following features:

1. Browse Coding Problems

Users should be able to:

  • Search problems
  • Filter by difficulty
  • Filter by tags
  • Sort by popularity
  • Page through results

2. View a Problem

Each problem contains:

  • Title
  • Description
  • Constraints
  • Sample Input
  • Sample Output
  • Hidden Test Cases
  • Supported Languages

3. Submit Solutions

Users should be able to:

  • Select a programming language
  • Write code
  • Submit code
  • Receive verdicts such as:
      • Accepted
      • Wrong Answer
      • Runtime Error
      • Compilation Error
      • Time Limit Exceeded (TLE)
      • Memory Limit Exceeded (MLE)
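To make the verdicts concrete, here is a minimal sketch of how a judge might map one sandbox run to one of the verdicts listed above. The `RunResult` fields, the order of the checks, and the whitespace-tolerant comparison are all assumptions made for illustration; real judges differ in the details.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunResult:
    """Hypothetical summary of one sandbox run (field names are assumptions)."""
    compiled: bool
    timed_out: bool
    memory_exceeded: bool
    exit_code: int
    stdout: str


def normalize(text: str) -> List[str]:
    # Assumption: trailing whitespace and trailing blank lines are ignored.
    return [line.rstrip() for line in text.strip().splitlines()]


def verdict(result: RunResult, expected_output: str) -> str:
    """Check failure modes first, then compare the output."""
    if not result.compiled:
        return "Compilation Error"
    if result.timed_out:
        return "Time Limit Exceeded"
    if result.memory_exceeded:
        return "Memory Limit Exceeded"
    if result.exit_code != 0:
        return "Runtime Error"
    if normalize(result.stdout) == normalize(expected_output):
        return "Accepted"
    return "Wrong Answer"
```

A submission is typically run against every hidden test case, and the first non-Accepted verdict is the one reported.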


4. Contest Support

Support weekly and bi-weekly contests with:

  • Contest registration
  • Live rankings
  • Score calculation
  • Real-time leaderboard updates

Out of Scope

To keep the design focused, we'll ignore:

  • Authentication & Authorization
  • User Profiles
  • Payments
  • Notifications
  • Recommendation Engine
  • Analytics

Non-Functional Requirements

A production-grade coding platform must satisfy several quality attributes.

High Availability

During contests, thousands of users submit code simultaneously.

The platform should remain available even if individual services fail.


Low Latency

Users expect results within 2–5 seconds after submitting code.

Long delays lead to poor user experience.


Scalability

The system should support:

  • Millions of users
  • Thousands of concurrent submissions
  • Massive traffic spikes during contests

Horizontal scaling is preferred.


Security

Running arbitrary user code is extremely dangerous.

The execution environment must be:

  • Sandboxed
  • Isolated
  • Resource limited
  • Protected against malicious code

Fault Tolerance

No service should become a single point of failure.

The platform should continue operating even when individual components fail.


Capacity Estimation

Assume:

  • 10 Million registered users
  • 2 Million daily active users
  • 100K concurrent users during contests
  • 20K code submissions per minute

These numbers influence our infrastructure choices.
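As a rough back-of-the-envelope check, the sketch below turns the submission rate into a per-second figure and an estimate of how many sandboxes are busy at once. The 3-second average execution time is an assumption (picked from the middle of the 2–5 second latency target), not a measured number.

```python
import math

# Figure taken from the estimates above.
SUBMISSIONS_PER_MINUTE = 20_000

# Assumption: each run occupies a sandbox for about 3 seconds on average.
AVG_EXECUTION_SECONDS = 3.0


def submissions_per_second(per_minute: int) -> float:
    return per_minute / 60


def concurrent_sandboxes(per_minute: int, avg_seconds: float) -> int:
    """Little's law: items in flight = arrival rate x time in the system."""
    return math.ceil(per_minute * avg_seconds / 60)


rate = submissions_per_second(SUBMISSIONS_PER_MINUTE)
slots = concurrent_sandboxes(SUBMISSIONS_PER_MINUTE, AVG_EXECUTION_SECONDS)
print(f"about {rate:.0f} submissions/s, about {slots} sandboxes busy at once")
```

Under these assumptions the platform needs on the order of a thousand concurrent sandbox slots at peak, spread across however many worker machines host them.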


Core Entities

Problem

ProblemID
Title
Difficulty
Tags
Statement
Constraints
Hidden Test Cases

User

UserID
Username
Rating
Contest Rank

Submission

SubmissionID
ProblemID
UserID
Language
Code
Status
Execution Time
Memory Usage
Timestamp

Contest

ContestID
Start Time
End Time
Problems

Leaderboard

ContestID
UserID
Score
Penalty
Rank
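As an illustration, the Submission entity could be modeled like this. It is only a sketch: the field types, the millisecond and kilobyte units, and the `Status` enum (which folds the in-progress states and the final verdicts into one type) are assumptions, not part of the entity list above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Status(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    ACCEPTED = "Accepted"
    WRONG_ANSWER = "Wrong Answer"
    RUNTIME_ERROR = "Runtime Error"
    COMPILATION_ERROR = "Compilation Error"
    TIME_LIMIT_EXCEEDED = "Time Limit Exceeded"
    MEMORY_LIMIT_EXCEEDED = "Memory Limit Exceeded"


@dataclass
class Submission:
    submission_id: str
    problem_id: str
    user_id: str
    language: str
    code: str
    status: Status = Status.QUEUED
    # Unknown until the run finishes; units are an assumption.
    execution_time_ms: Optional[int] = None
    memory_usage_kb: Optional[int] = None
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def is_final(self) -> bool:
        """True once the submission has a verdict rather than a progress state."""
        return self.status not in (Status.QUEUED, Status.RUNNING)
```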

REST APIs

Get Problems

GET /problems?page=1&difficulty=medium

Returns paginated problem lists.
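A minimal sketch of what the handler behind this endpoint might do, using an in-memory list in place of the database. The `ProblemSummary` type, the `page_size` default, and the response field names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProblemSummary:
    problem_id: int
    title: str
    difficulty: str


def list_problems(
    problems: List[ProblemSummary],
    page: int = 1,
    page_size: int = 20,
    difficulty: Optional[str] = None,
) -> Dict[str, object]:
    """Filter by difficulty, then return one page plus the total match count."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    matches = [
        p for p in problems
        if difficulty is None or p.difficulty == difficulty
    ]
    start = (page - 1) * page_size
    return {
        "page": page,
        "pageSize": page_size,
        "total": len(matches),
        "items": matches[start:start + page_size],
    }
```

Offset-based paging like this is the simplest option; for very deep pages, a cursor based on the last seen problem ID usually performs better.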


Get Problem Details

GET /problems/{problemId}

Returns complete problem information.


Submit Solution

POST /problems/{problemId}/submit

Request:

{
  "language":"Java",
  "code":"..."
}

Response:

{
  "submissionId":"12345",
  "status":"Queued"
}

Submission Status

GET /submissions/{submissionId}

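Returns the current status, which is either an in-progress state or a final verdict:

  • Queued
  • Running
  • Accepted
  • Wrong Answer
  • Runtime Error
  • Compilation Error
  • Time Limit Exceeded (TLE)
  • Memory Limit Exceeded (MLE)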

Contest Leaderboard

GET /contests/{contestId}/leaderboard

High-Level Architecture

                Users
                  │
            Load Balancer
                  │
             API Gateway
                  │
     ┌────────────┴─────────────┐
     │                          │
Problem Service         Submission Service
     │                          │
     │                    Message Queue
     │                          │
     │                  Code Execution Workers
     │                          │
     │                    Sandbox Containers
     │                          │
Database                Result Service
     │                          │
     └──────────────┬───────────┘
                    │
              Leaderboard Service

Why Use a Message Queue?

Code execution is time-consuming.

Executing code synchronously ties up the API request for the whole run:

User → API → Execute Code → Response

Instead, the API enqueues the submission and returns immediately, and workers pick it up from the queue:

User → API → Submission Queue → Execution Workers → Result

Benefits:

  • Better scalability
  • Retry failed executions
  • Load balancing
  • No API timeout

Popular choices:

  • Kafka
  • RabbitMQ
  • Amazon SQS
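The enqueue-then-process flow can be sketched with Python's standard `queue` module standing in for Kafka, RabbitMQ, or SQS. Everything here is illustrative: `run_in_sandbox` is a hypothetical placeholder for the real judge, the retry budget of 3 is an assumption, and "Internal Error" is an extra status invented for runs that keep failing for infrastructure reasons.

```python
import queue
import threading
import uuid
from typing import Callable, Dict

MAX_ATTEMPTS = 3  # assumption: retry budget per submission

submissions: Dict[str, str] = {}   # submission ID -> status (stand-in for the DB)
submission_queue: queue.Queue = queue.Queue()  # stand-in for the message broker


def submit(problem_id: str, language: str, code: str) -> dict:
    """API handler: record the submission, enqueue it, and return at once."""
    submission_id = uuid.uuid4().hex
    submissions[submission_id] = "Queued"
    submission_queue.put({
        "id": submission_id, "problem": problem_id,
        "language": language, "code": code, "attempt": 1,
    })
    return {"submissionId": submission_id, "status": "Queued"}


def run_in_sandbox(job: dict) -> str:
    """Hypothetical judge call; a real worker would launch a sandbox here."""
    return "Accepted"


def worker(execute: Callable[[dict], str] = run_in_sandbox) -> None:
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = submission_queue.get()
        if job is None:
            submission_queue.task_done()
            return
        submissions[job["id"]] = "Running"
        try:
            submissions[job["id"]] = execute(job)
        except Exception:
            # Infrastructure failure (not a user-code verdict): retry or give up.
            if job["attempt"] < MAX_ATTEMPTS:
                job["attempt"] += 1
                submissions[job["id"]] = "Queued"
                submission_queue.put(job)
            else:
                submissions[job["id"]] = "Internal Error"
        finally:
            submission_queue.task_done()
```

To try it, start `worker` in a `threading.Thread`, call `submit(...)`, wait with `submission_queue.join()`, and read the status back from `submissions`. The API call itself never waits on execution, which is why long runs cannot time it out.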

Code Execution Environment

This is the most critical component.

Running user code directly on servers is unsafe.

Instead:

  • Spin up isolated Docker containers
  • Apply CPU and memory limits
  • Restrict network access
  • Destroy the container after execution

Benefits:

  • Security
  • Isolation
  • Cost efficiency
  • Fast startup compared to Virtual Machines
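To show what "isolated and resource limited" can look like in practice, here is a sketch that builds a `docker run` command line with standard Docker CLI flags. The image name, the `/sandbox` mount point, and the default limits are assumptions chosen for illustration; the function only builds the argument list and does not start a container.

```python
from typing import List


def sandbox_command(
    image: str,
    host_dir: str,
    run_cmd: List[str],
    memory_mb: int = 256,
    cpus: float = 1.0,
    pids_limit: int = 64,
) -> List[str]:
    """Build a locked-down `docker run` invocation for one submission."""
    return [
        "docker", "run",
        "--rm",                                 # destroy the container afterwards
        "--network", "none",                    # no network access
        "--memory", f"{memory_mb}m",            # hard memory cap
        "--memory-swap", f"{memory_mb}m",       # same value: no extra swap
        "--cpus", str(cpus),                    # CPU quota
        "--pids-limit", str(pids_limit),        # blunt fork bombs
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop Linux capabilities
        "--security-opt", "no-new-privileges",
        "--volume", f"{host_dir}:/sandbox:ro",  # user code, mounted read-only
        "--workdir", "/sandbox",
        image,
        *run_cmd,
    ]


# Hypothetical image and path, for illustration only.
cmd = sandbox_command("judge-python:latest", "/tmp/sub-12345", ["python3", "main.py"])
print(" ".join(cmd))
```

A wall-clock time limit still has to be enforced by the worker that launches the container, since none of these flags stops a program that simply sleeps.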

Why Containers Instead of Virtual Machines?

Docker Containers            | Virtual Machines
-----------------------------|------------------------
Lightweight                  | Heavy
Fast startup                 | Slow boot time
Better resource utilization  | Higher resource usage
Lower cost                   | Higher cost
Easy horizontal scaling      | More difficult to scale

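For online judges, containers offer a practical balance between performance and isolation. Because containers share the host kernel, their isolation is weaker than a VM's, so they are usually hardened further (dropped capabilities, syscall filtering, no network) or swapped for microVMs when stronger isolation is required.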


Database Design

The platform stores different types of data.

Relational Database

Good for:

  • Users
  • Contests
  • Rankings
  • Transactions

Examples:

  • PostgreSQL
  • MySQL

NoSQL Database

Ideal for:

  • Problems
  • Test Cases
  • Submissions
  • Execution Logs

Examples:

  • DynamoDB
  • MongoDB
  • Cassandra

Problem metadata rarely changes, while submissions are append-heavy and grow rapidly, so a horizontally partitioned NoSQL store suits both access patterns.


Caching

Popular problems are read far more often than they are updated.

We can cache:

  • Problem details
  • Test metadata
  • Contest information
  • Leaderboards

Using Redis significantly reduces database load and improves response times.
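A common way to apply this is the cache-aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache with an expiry. The sketch below uses a small in-memory class as a stand-in for Redis so it runs on its own; the `problem:{id}` key format and the 300-second TTL are assumptions.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a Redis GET / SET-with-expiry pair."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def get_problem(
    problem_id: str,
    cache: TTLCache,
    load_from_db: Callable[[str], dict],
    ttl_seconds: float = 300.0,
) -> dict:
    """Cache-aside read: cache hit, or database read followed by a cache fill."""
    key = f"problem:{problem_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    problem = load_from_db(problem_id)
    cache.set(key, problem, ttl_seconds)
    return problem
```

Because problems are rarely edited, a TTL of a few minutes keeps stale reads short-lived without any explicit invalidation; an edit path could also delete the key directly.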


Live Leaderboard

Leaderboards are updated frequently during contests.

Instead of recalculating rankings after every submission:

  • Update scores asynchronously
  • Store rankings in Redis Sorted Sets
  • Push updates using WebSockets

This provides near real-time rankings with minimal latency.
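The sorted-set idea can be sketched without a Redis server. The class below is an in-memory stand-in whose methods roughly correspond to the Redis commands named in the comments. Folding score and penalty into a single number, and the `PENALTY_CAP` bound, are assumptions made so that one sorted-set score can express "higher score first, lower penalty breaks ties".

```python
from typing import Dict, List

PENALTY_CAP = 1_000_000  # assumption: a penalty is always below this value


def composite(score: int, penalty: int) -> int:
    """Fold (score, penalty) into one sortable number; higher is better."""
    if not 0 <= penalty < PENALTY_CAP:
        raise ValueError("penalty out of range")
    return score * PENALTY_CAP + (PENALTY_CAP - 1 - penalty)


class Leaderboard:
    """In-memory stand-in for one Redis sorted set per contest."""

    def __init__(self) -> None:
        self._members: Dict[str, int] = {}

    def record(self, user_id: str, score: int, penalty: int) -> None:
        # Roughly: ZADD leaderboard:<contest> <composite> <user_id>
        self._members[user_id] = composite(score, penalty)

    def top(self, n: int) -> List[str]:
        # Roughly: ZREVRANGE leaderboard:<contest> 0 n-1
        ranked = sorted(self._members.items(), key=lambda kv: (-kv[1], kv[0]))
        return [user for user, _ in ranked[:n]]

    def rank(self, user_id: str) -> int:
        # Roughly: ZREVRANK + 1 (here, tied users share a rank)
        mine = self._members[user_id]
        return 1 + sum(1 for value in self._members.values() if value > mine)
```

This stand-in re-sorts on every read, which is fine for a demo. A real sorted set keeps members ordered, so inserts and rank lookups stay logarithmic even with 100K participants. Redis stores scores as doubles, so the composite value should stay below 2^53 to remain exact.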


Scaling Code Execution

Execution workers should scale independently.

The worker pool should grow and shrink with demand, for example:

Normal day: 10 workers
Contest: 200 workers

Using Kubernetes or container orchestration allows automatic scaling based on queue length.
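The scaling rule itself can be very small. This sketch computes a desired worker count from the queue length and clamps it to the 10 and 200 worker figures used above; the target of 5 queued jobs per worker is an assumption that would need tuning against real execution times.

```python
import math

MIN_WORKERS = 10             # "normal day" figure from above
MAX_WORKERS = 200            # contest figure from above
TARGET_JOBS_PER_WORKER = 5   # assumption: acceptable backlog per worker


def desired_workers(queue_length: int) -> int:
    """Scale with the backlog, clamped to the allowed range."""
    needed = math.ceil(queue_length / TARGET_JOBS_PER_WORKER)
    return max(MIN_WORKERS, min(MAX_WORKERS, needed))


for backlog in (0, 400, 5000):
    print(backlog, "queued ->", desired_workers(backlog), "workers")
```

In practice the orchestrator's autoscaler would evaluate a rule like this periodically, usually with a cooldown so the pool does not thrash when the backlog hovers around a threshold.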


Bottlenecks

Potential bottlenecks include:

  • Code execution workers
  • Message queue backlog
  • Database write throughput
  • Leaderboard updates
  • Large contest traffic spikes

Each component should be independently scalable.


Trade-offs

Every system design involves trade-offs.

Availability vs Consistency

We prioritize Availability.

A submission result delayed by a second is acceptable.

A platform outage during a contest is not.

This makes eventual consistency a practical choice for leaderboard updates: rankings may lag briefly, but they converge once pending results are applied.


Future Improvements

The platform can be extended with:

  • AI-powered code review
  • Code similarity detection
  • Plagiarism detection
  • Custom test cases
  • Interview mode
  • Company-specific problem sets
  • Multi-region deployment
  • Distributed execution clusters

Final Thoughts

Designing a platform like LeetCode is far more than storing coding problems.

It involves secure code execution, scalable infrastructure, asynchronous processing, distributed caching, and real-time leaderboards—all while maintaining low latency and high availability.

A robust design balances performance, security, and scalability to ensure developers receive instant feedback, even during the busiest coding contests.

As online coding platforms continue to evolve, incorporating AI-assisted code analysis, distributed execution environments, and intelligent recommendations will make these systems even more powerful.


How would you design the code execution engine? Would you choose Docker, Firecracker microVMs, or another sandboxing technology? Share your thoughts in the comments!

#systemdesign #leetcode #backend #softwarearchitecture #distributedsystems #microservices #cloud #programming #developers #coding
