LeetCode looks simple on the surface—browse coding problems, write code, and get instant feedback.
But behind that simple interface lies a distributed system capable of handling millions of developers, hundreds of code submissions per second, and massive traffic spikes during coding contests.
In this article, we'll design a scalable version of LeetCode from scratch, covering everything from functional requirements to database choices, code execution, and leaderboard architecture.
Problem Statement
Design an online coding platform similar to LeetCode that allows users to:
- Browse coding problems
- Read problem statements
- Write code in multiple programming languages
- Submit solutions
- Receive execution results within seconds
- Participate in weekly coding contests
- View live leaderboards
Functional Requirements
Our system should support the following features:
1. Browse Coding Problems
Users should be able to:
- Search problems
- Filter by difficulty
- Filter by tags
- Sort by popularity
- Page through results
2. View a Problem
Each problem contains:
- Title
- Description
- Constraints
- Sample Input
- Sample Output
- Hidden Test Cases (stored server-side and never shown to users)
- Supported Languages
3. Submit Solutions
Users should be able to:
- Select a programming language
- Write code
- Submit code
- Receive verdicts such as:
  - Accepted
  - Wrong Answer
  - Runtime Error
  - Compilation Error
  - Time Limit Exceeded (TLE)
  - Memory Limit Exceeded (MLE)
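To make the verdicts concrete, here is a minimal Python sketch of how a judge might map the outcome of one run to one of them. The `RunResult` fields, the `judge` function, and the order of the checks are assumptions for illustration, not a description of how LeetCode does it.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ACCEPTED = "Accepted"
    WRONG_ANSWER = "Wrong Answer"
    RUNTIME_ERROR = "Runtime Error"
    COMPILATION_ERROR = "Compilation Error"
    TIME_LIMIT_EXCEEDED = "Time Limit Exceeded"
    MEMORY_LIMIT_EXCEEDED = "Memory Limit Exceeded"


@dataclass
class RunResult:
    """Hypothetical summary of one sandboxed run against one test case."""
    compiled: bool
    exit_code: int
    elapsed_ms: int
    peak_memory_kb: int
    stdout: str


def judge(result: RunResult, expected: str,
          time_limit_ms: int, memory_limit_kb: int) -> Verdict:
    # Order matters: code that failed to compile never ran, and a run killed
    # for exceeding a limit usually also exits with a non-zero code.
    if not result.compiled:
        return Verdict.COMPILATION_ERROR
    if result.elapsed_ms > time_limit_ms:
        return Verdict.TIME_LIMIT_EXCEEDED
    if result.peak_memory_kb > memory_limit_kb:
        return Verdict.MEMORY_LIMIT_EXCEEDED
    if result.exit_code != 0:
        return Verdict.RUNTIME_ERROR
    # Assumption: trailing whitespace is ignored when comparing output.
    if result.stdout.rstrip() != expected.rstrip():
        return Verdict.WRONG_ANSWER
    return Verdict.ACCEPTED
```

A submission would typically be Accepted only if every hidden test case yields `Verdict.ACCEPTED`; otherwise the first failing verdict is reported.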
4. Contest Support
Support weekly and bi-weekly contests with:
- Contest registration
- Live rankings
- Score calculation
- Real-time leaderboard updates
Out of Scope
To keep the design focused, we'll ignore:
- Authentication & Authorization
- User Profiles
- Payments
- Notifications
- Recommendation Engine
- Analytics
Non-Functional Requirements
A production-grade coding platform must satisfy several quality attributes.
High Availability
During contests, thousands of users submit code simultaneously.
The platform should remain available even if individual services fail.
Low Latency
Users expect results within 2–5 seconds after submitting code.
Long delays lead to poor user experience.
Scalability
The system should support:
- Millions of users
- Thousands of concurrent submissions
- Massive traffic spikes during contests
Horizontal scaling is preferred.
Security
Running arbitrary user code is extremely dangerous.
The execution environment must be:
- Sandboxed
- Isolated
- Resource limited
- Protected against malicious code
Fault Tolerance
No service should become a single point of failure.
The platform should continue operating even when individual components fail.
Capacity Estimation
Assume:
- 10 Million registered users
- 2 Million daily active users
- 100K concurrent users during contests
- 20K code submissions per minute
These numbers influence our infrastructure choices.
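As a rough back-of-the-envelope check, the submission rate can be turned into a worker count. The sketch below assumes an average execution time of 3 seconds (inside the 2–5 second latency target) and 5 concurrent sandboxes per worker; both figures are assumptions, not measurements.

```python
import math

SUBMISSIONS_PER_MINUTE = 20_000   # from the estimate above
AVG_EXECUTION_SECONDS = 3.0       # assumption: within the 2-5 s latency target
SANDBOXES_PER_WORKER = 5          # assumption: concurrent runs one worker can host

submissions_per_second = SUBMISSIONS_PER_MINUTE / 60

# Little's law: items in flight = arrival rate x time spent in the system.
in_flight = SUBMISSIONS_PER_MINUTE * AVG_EXECUTION_SECONDS / 60

# Workers needed if every sandbox slot is busy all the time (no headroom).
workers_needed = math.ceil(in_flight / SANDBOXES_PER_WORKER)

print(f"{submissions_per_second:.0f} submissions/s")
print(f"{in_flight:.0f} executions in flight")
print(f"{workers_needed} workers at full utilization")
```

Under these assumptions the peak works out to about 333 submissions per second, roughly 1,000 executions in flight, and 200 workers. A real deployment would add headroom on top of that for retries and uneven load.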
Core Entities
Problem
ProblemID
Title
Difficulty
Tags
Statement
Constraints
Hidden Test Cases
User
UserID
Username
Rating
Contest Rank
Submission
SubmissionID
ProblemID
UserID
Language
Code
Status
Execution Time
Memory Usage
Timestamp
Contest
ContestID
Start Time
End Time
Problems
Leaderboard
ContestID
UserID
Score
Penalty
Rank
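Two of these entities could be modeled roughly as follows. This is a sketch only: the field types, the units (`_ms`, `_kb`), and the idea of storing hidden test cases as object-storage keys are assumptions layered on top of the attribute lists above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Problem:
    problem_id: str
    title: str
    difficulty: str                 # e.g. "easy", "medium", "hard"
    tags: list[str]
    statement: str
    constraints: str
    # Assumption: test cases live in object storage and are referenced by key.
    hidden_test_case_keys: list[str] = field(default_factory=list)


@dataclass
class Submission:
    submission_id: str
    problem_id: str
    user_id: str
    language: str
    code: str
    status: str = "Queued"                    # updated as the run progresses
    execution_time_ms: Optional[int] = None   # unknown until the run finishes
    memory_usage_kb: Optional[int] = None
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
```

Keeping the execution metrics optional reflects the asynchronous flow: a submission record exists as soon as it is queued, well before any result is available.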
REST APIs
Get Problems
GET /problems?page=1&difficulty=medium
Returns paginated problem lists.
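The filtering and pagination behind this endpoint can be sketched in a few lines of Python. The in-memory `PROBLEMS` list, the `list_problems` function, and the response shape are hypothetical; a real service would push the filter and the page window down into the database query.

```python
from typing import Optional

# Hypothetical sample data standing in for the problem store.
PROBLEMS = [
    {"id": 1, "title": "Problem A", "difficulty": "easy", "tags": ["array"]},
    {"id": 2, "title": "Problem B", "difficulty": "medium", "tags": ["graph"]},
    {"id": 3, "title": "Problem C", "difficulty": "medium", "tags": ["array"]},
    {"id": 4, "title": "Problem D", "difficulty": "hard", "tags": ["dp"]},
    {"id": 5, "title": "Problem E", "difficulty": "medium", "tags": ["dp"]},
]


def list_problems(problems: list[dict], page: int = 1, page_size: int = 2,
                  difficulty: Optional[str] = None,
                  tag: Optional[str] = None) -> dict:
    """Filter by difficulty and tag, then return one page of results."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    matches = [
        p for p in problems
        if (difficulty is None or p["difficulty"] == difficulty)
        and (tag is None or tag in p["tags"])
    ]
    start = (page - 1) * page_size          # offset-based pagination
    return {
        "page": page,
        "total": len(matches),
        "items": matches[start:start + page_size],
    }
```

Offset pagination is the simplest option and matches the `page=1` query parameter. For very deep pages, cursor-based pagination keyed on the last seen ID tends to scale better.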
Get Problem Details
GET /problems/{problemId}
Returns complete problem information.
Submit Solution
POST /problems/{problemId}/submit
Request:
{
  "language": "Java",
  "code": "..."
}
Response:
{
  "submissionId": "12345",
  "status": "Queued"
}
Submission Status
GET /submissions/{submissionId}
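Returns the current status, which is one of:
- Queued
- Running
- Accepted
- Wrong Answer
- Runtime Error
- Compilation Error
- Time Limit Exceeded
- Memory Limit Exceeded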
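Because the submit call returns immediately with a queued status, the client has to poll this endpoint until a final verdict arrives. A minimal sketch of that loop follows; `fetch_status` is a hypothetical callable wrapping `GET /submissions/{submissionId}`, and the interval and wait budget are assumed values.

```python
import time
from typing import Callable

IN_PROGRESS = {"Queued", "Running"}


def poll_submission(
    fetch_status: Callable[[str], str],
    submission_id: str,
    interval_s: float = 0.5,
    max_wait_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the submission leaves Queued/Running or the budget runs out."""
    waited = 0.0
    while True:
        status = fetch_status(submission_id)
        if status not in IN_PROGRESS:
            return status                   # final verdict
        if waited >= max_wait_s:
            raise TimeoutError(
                f"submission {submission_id} still {status} after {waited:.1f}s"
            )
        sleep(interval_s)
        waited += interval_s
```

The `sleep` parameter is injectable so the loop can be tested without real delays. Pushing results over WebSockets would avoid polling altogether, at the cost of holding a connection open per user.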
Contest Leaderboard
GET /contests/{contestId}/leaderboard
High-Level Architecture
                   Users
                     │
               Load Balancer
                     │
                API Gateway
                     │
        ┌────────────┴────────────┐
        │                         │
 Problem Service         Submission Service
        │                         │
        │                   Message Queue
        │                         │
        │              Code Execution Workers
        │                         │
        │                Sandbox Containers
        │                         │
    Database               Result Service
        │                         │
        └────────────┬────────────┘
                     │
            Leaderboard Service
Why Use a Message Queue?
Code execution is time-consuming.
Instead of executing code synchronously:
User
↓
API
↓
Execute Code
↓
Response
we enqueue submissions and let workers pick them up asynchronously:
User
↓
API
↓
Submission Queue
↓
Execution Workers
↓
Result
Benefits:
- Better scalability
- Retry failed executions
- Load balancing
- No API timeouts, because the request returns as soon as the submission is queued
Popular choices:
- Kafka
- RabbitMQ
- Amazon SQS
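The enqueue-and-retry flow can be illustrated with Python's standard `queue` and `threading` modules standing in for a real broker. Everything here is a simplified assumption: `execute` is a placeholder for the sandboxed run, `MAX_ATTEMPTS` is an arbitrary retry budget, and "Internal Error" is a hypothetical status for jobs that exhaust their retries.

```python
import queue
import threading

MAX_ATTEMPTS = 3                          # assumption: retry budget per submission
submission_queue: queue.Queue = queue.Queue()
results: dict[str, str] = {}              # submission_id -> status (stand-in store)
results_lock = threading.Lock()


def execute(job: dict) -> str:
    """Hypothetical stand-in for running the code in a sandbox."""
    if job["fail_first"] and job["attempts"] == 1:
        raise RuntimeError("simulated worker crash")
    return "Accepted"


def submit(submission_id: str, fail_first: bool = False) -> dict:
    """API side: record the submission as queued and return immediately."""
    with results_lock:
        results[submission_id] = "Queued"
    submission_queue.put(
        {"id": submission_id, "attempts": 0, "fail_first": fail_first}
    )
    return {"submissionId": submission_id, "status": "Queued"}


def worker() -> None:
    """Worker side: pull jobs, execute them, and retry failed executions."""
    while True:
        job = submission_queue.get()
        try:
            if job is None:               # sentinel: shut this worker down
                return
            job["attempts"] += 1
            try:
                status = execute(job)
            except RuntimeError:
                if job["attempts"] < MAX_ATTEMPTS:
                    submission_queue.put(job)   # re-enqueue for another attempt
                    continue
                status = "Internal Error"       # hypothetical terminal status
            with results_lock:
                results[job["id"]] = status
        finally:
            submission_queue.task_done()
```

Starting a few `threading.Thread(target=worker)` instances and calling `submit(...)` shows the key property: the API call returns at once, while a failed execution is retried in the background without the user resubmitting. A production broker would add durability and acknowledgements that this in-process queue does not provide.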
Code Execution Environment
This is the most critical component.
Running user code directly on servers is unsafe.
Instead:
- Spin up isolated Docker containers
- Apply CPU and memory limits
- Restrict network access
- Destroy the container after execution
Benefits:
- Security
- Isolation
- Cost efficiency
- Fast startup compared to Virtual Machines
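The steps above map fairly directly onto `docker run` flags. The helper below only builds the command line; the function name, the default limits, and the `/sandbox` mount point are assumptions, and the image and run command are supplied by the caller.

```python
def build_sandbox_command(image: str, source_dir: str, run_cmd: list[str],
                          memory_mb: int = 256, cpus: float = 1.0,
                          max_processes: int = 64) -> list[str]:
    """Build a `docker run` invocation for one untrusted submission."""
    return [
        "docker", "run",
        "--rm",                                   # destroy the container afterwards
        "--network", "none",                      # no network access
        "--memory", f"{memory_mb}m",              # memory limit
        "--cpus", str(cpus),                      # CPU limit
        "--pids-limit", str(max_processes),       # guard against fork bombs
        "--read-only",                            # read-only root filesystem
        "--cap-drop", "ALL",                      # drop Linux capabilities
        "--security-opt", "no-new-privileges",    # block privilege escalation
        "--volume", f"{source_dir}:/sandbox:ro",  # mount the code read-only
        "--workdir", "/sandbox",
        image,
        *run_cmd,
    ]
```

For example, `build_sandbox_command("python:3.12-slim", "/tmp/sub-12345", ["python", "main.py"])` yields a command that a worker could pass to `subprocess.run`. These flags do not enforce a wall-clock limit, so the worker would still need its own timeout and a way to stop a container that overruns it.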
Why Containers Instead of Virtual Machines?
| Docker Containers | Virtual Machines |
|---|---|
| Lightweight | Heavy |
| Fast startup | Slow boot time |
| Better resource utilization | Higher resource usage |
| Lower cost | Higher cost |
| Easy horizontal scaling | More difficult to scale |
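For online judges, containers offer a good balance between performance and isolation. Because containers share the host kernel, their isolation is weaker than a VM's, so the sandbox should be hardened (for example with seccomp profiles and dropped capabilities) or swapped for a microVM such as Firecracker where stronger isolation is required.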
Database Design
The platform stores different types of data.
Relational Database
Good for:
- Users
- Contests
- Rankings
- Transactions
Examples:
- PostgreSQL
- MySQL
NoSQL Database
Ideal for:
- Problems
- Test Cases
- Submissions
- Execution Logs
Examples:
- DynamoDB
- MongoDB
- Cassandra
Problem metadata rarely changes, while submissions are append-heavy and grow rapidly, so a horizontally partitioned NoSQL store fits these access patterns well.
Caching
Popular problems are read far more often than they are updated.
We can cache:
- Problem details
- Test metadata
- Contest information
- Leaderboards
Using Redis significantly reduces database load and improves response times.
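A common way to use such a cache is the cache-aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache. The sketch below uses a small in-memory `TTLCache` class as a stand-in for Redis; the class, the `get_problem` helper, and the TTL are all assumptions for illustration.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]        # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def get_problem(problem_id: str, cache: TTLCache,
                load_from_db: Callable[[str], dict]) -> dict:
    """Cache-aside read: cache first, then database, then fill the cache."""
    cached = cache.get(problem_id)
    if cached is not None:
        return cached
    problem = load_from_db(problem_id)
    cache.set(problem_id, problem)
    return problem
```

Because problems change rarely, a fairly long TTL is usually safe. When a problem is edited, deleting its cache key avoids serving stale content until the TTL expires.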
Live Leaderboard
Leaderboards are updated frequently during contests.
Instead of recalculating rankings after every submission:
- Update scores asynchronously
- Store rankings in Redis Sorted Sets
- Push updates using WebSockets
This provides near real-time rankings with minimal latency.
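A sorted set orders members by a single numeric score, so score and penalty have to be folded into one number. The pure-Python sketch below mimics that idea without a Redis dependency; the `Leaderboard` class, the `PENALTY_CAP` constant, and the tie-breaking rule are assumptions.

```python
PENALTY_CAP = 1_000_000   # assumption: penalty (in seconds) always stays below this


def composite_score(score: int, penalty: int) -> int:
    """Higher score wins; among equal scores, lower penalty wins."""
    return score * PENALTY_CAP - penalty


class Leaderboard:
    """In-memory stand-in for one contest's sorted set."""

    def __init__(self) -> None:
        self._scores: dict[str, int] = {}

    def update(self, user_id: str, score: int, penalty: int) -> None:
        # With Redis this would be a single ZADD on the contest's key.
        self._scores[user_id] = composite_score(score, penalty)

    def top(self, n: int) -> list[tuple[int, str]]:
        # Sort by composite score descending, then user ID for stable ties.
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [(rank, user) for rank, (user, _) in enumerate(ranked[:n], start=1)]
```

This version re-sorts on every read, which is fine for a sketch but not for a live contest. A real sorted set keeps members ordered on insert, so both updates and top-N reads stay cheap.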
Scaling Code Execution
Execution workers should scale independently.
During contests:
- Normal day: 10 workers
- Contest: 200 workers
Using Kubernetes or another container orchestrator allows workers to scale automatically based on queue length.
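The scaling rule itself can be very simple. Here is a sketch of how an autoscaler might turn queue length into a worker count; the target of 5 queued jobs per worker is an assumption, while the floor and ceiling reuse the 10 and 200 worker figures from above.

```python
import math


def desired_workers(queue_length: int, jobs_per_worker: int = 5,
                    min_workers: int = 10, max_workers: int = 200) -> int:
    """Scale workers with the backlog, clamped to a floor and a ceiling."""
    needed = math.ceil(queue_length / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

With these defaults, an empty queue keeps the 10-worker baseline, a backlog of 500 asks for 100 workers, and anything beyond 1,000 queued jobs is capped at 200. The cap protects the budget, but it also means the queue will grow if a spike exceeds what 200 workers can drain.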
Bottlenecks
Potential bottlenecks include:
- Code execution workers
- Message queue backlog
- Database write throughput
- Leaderboard updates
- Large contest traffic spikes
Each component should be independently scalable.
Trade-offs
Every system design involves trade-offs.
Availability vs Consistency
We prioritize Availability.
A submission result delayed by a second is acceptable.
A platform outage during a contest is not.
This makes Eventual Consistency a practical choice for leaderboard updates.
Future Improvements
The platform can be extended with:
- AI-powered code review
- Code similarity detection
- Plagiarism detection
- Custom test cases
- Interview mode
- Company-specific problem sets
- Multi-region deployment
- Distributed execution clusters
Final Thoughts
Designing a platform like LeetCode is far more than storing coding problems.
It involves secure code execution, scalable infrastructure, asynchronous processing, distributed caching, and real-time leaderboards—all while maintaining low latency and high availability.
A robust design balances performance, security, and scalability to ensure developers receive instant feedback, even during the busiest coding contests.
As online coding platforms continue to evolve, incorporating AI-assisted code analysis, distributed execution environments, and intelligent recommendations will make these systems even more powerful.
How would you design the code execution engine? Would you choose Docker, Firecracker microVMs, or another sandboxing technology? Share your thoughts in the comments!