TL;DR
I designed and built CodeNova, a scalable coding interview platform architected for 10K+ concurrent users, with three AI-powered features: a video avatar tutor, an algorithm visualizer, and a collaborative whiteboard. This is a deep dive into the system architecture and the design decisions behind it.
What is CodeNova?
CodeNova is an AI-enhanced coding interview platform designed for scalability and learning. Core features include:
- 155+ problems across multiple difficulty levels
- 10+ programming languages with sandboxed execution
- AI video tutor with realistic avatar and natural voice
- Automatic algorithm visualization for any code
- Real-time collaborative whiteboard for mock interviews
- Contest leaderboards with analytics
Scale: Built to handle 10,000 concurrent users, 1,000 submissions/minute, with 99.9% uptime.
High-Level Architecture
System Overview
The architecture follows a microservices-ready design with clear separation of concerns across 6 layers:
Layer 1: Client (Browser)
        ↓
Layer 2: CDN & Load Balancing (CloudFlare + Nginx)
        ↓
Layer 3: Application Tier (Next.js + Express + Socket.io)
        ↓
Layer 4: Data Tier (MongoDB + Redis + PostgreSQL)
        ↓
Layer 5: Queue Layer (BullMQ)
        ↓
Layer 6: Workers & External Services (Judge0, Gemini AI, ElevenLabs, ANAM)
Three Unique Features - Architecture Breakdown
1. AI Video Avatar Tutor
The Challenge:
How do you provide personalized video explanations to thousands of users without hiring human tutors?
The Solution: Three-Stage Pipeline
User Question → Gemini AI (text gen) → ElevenLabs (TTS) → ANAM AI (avatar) → Cached Video
Architecture Decisions:
Decision 1: Why Three Separate Services?
- Gemini AI - Best at generating educational content
- ElevenLabs - Most natural-sounding TTS (better than AWS Polly)
- ANAM AI - Realistic lip-sync (alternatives: D-ID, Synthesia)
Trade-off: Higher complexity but better quality. Users prefer natural voice over robotic TTS.
Decision 2: Caching Strategy
- Problem: Generating avatar videos takes 30 seconds per request
- Solution: Redis cache with 24-hour TTL for common questions
- Result: 70% cache hit rate significantly reduces generation load
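To make the caching decision concrete, here is a minimal cache-aside sketch in TypeScript. It assumes ioredis and a hypothetical generateAvatarVideo() helper that wraps the three-stage pipeline; the key scheme is illustrative.

```typescript
import Redis from "ioredis";
import { createHash } from "crypto";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const ONE_DAY_SECONDS = 24 * 60 * 60;

// Hypothetical wrapper around the Gemini -> ElevenLabs -> ANAM pipeline (~30s).
async function generateAvatarVideo(question: string): Promise<string> {
  return "https://cdn.example.com/videos/placeholder.mp4";
}

export async function getAvatarVideo(question: string): Promise<string> {
  // Normalize the question so minor wording differences map to the same key.
  const key =
    "avatar:" +
    createHash("sha256").update(question.trim().toLowerCase()).digest("hex");

  const cached = await redis.get(key);
  if (cached) return cached; // roughly 70% of requests end here

  const videoUrl = await generateAvatarVideo(question); // the slow path
  await redis.set(key, videoUrl, "EX", ONE_DAY_SECONDS); // 24-hour TTL
  return videoUrl;
}
```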
Decision 3: Async Processing
- Why: 30-second generation time blocks API
- How: BullMQ job queue
- Benefit: User sees loading screen, gets notification when ready
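A rough sketch of what the BullMQ side could look like. The queue name, job payload, retry policy, and the notification hook are illustrative assumptions, not the exact implementation.

```typescript
import { Queue, Worker } from "bullmq";

const connection = { host: "127.0.0.1", port: 6379 };

// Producer side: the API route only enqueues the job and returns immediately.
export const avatarQueue = new Queue("avatar-generation", { connection });

export async function enqueueAvatarJob(userId: string, question: string) {
  const job = await avatarQueue.add(
    "generate",
    { userId, question },
    { attempts: 3, backoff: { type: "exponential", delay: 5_000 } }
  );
  return job.id; // client shows a loading screen and waits for a notification
}

// Worker side: runs in a separate process and performs the ~30s pipeline.
const worker = new Worker(
  "avatar-generation",
  async (job) => {
    const { userId, question } = job.data as { userId: string; question: string };
    // const videoUrl = await generateAvatarVideo(question); // hypothetical helper
    // notifyUser(userId, videoUrl);                         // e.g. via Socket.io
  },
  { connection, concurrency: 5 }
);

worker.on("failed", (job, err) => {
  console.error(`Avatar job ${job?.id} failed:`, err.message);
});
```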
2. AI-Powered Algorithm Visualizer
The Challenge:
Traditional visualizers need manual step creation for each algorithm. How do you support ANY algorithm without manual work?
The Solution: AI-Generated Visualization Steps
User Code → Gemini AI (analyze) → JSON Steps (generate) → Canvas Renderer (frontend) → Interactive Visualization
Architecture Decisions:
Decision 1: Why AI Over Templates?
- Template approach: 155+ algorithms × manually authored steps = months of work
- AI approach: Gemini analyzes ANY code automatically
- Trade-off: API dependency vs. automatic generation at scale
Decision 2: Where to Render?
- Server-side rendering: High CPU usage, poor UX
- Client-side (Canvas API): Better performance, lower server load
- Chosen: Client-side with JSON steps from server
Decision 3: Data Format
Gemini returns structured JSON:
Step format:
- Description (plain English)
- Array state at this step
- Elements to highlight
- Comparison pointers
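One possible TypeScript shape for a single step, matching the fields above. The field names are assumptions for illustration, not the exact schema Gemini is prompted to return.

```typescript
// Shape of one visualization step consumed by the Canvas renderer.
interface VisualizationStep {
  description: string;                  // plain-English explanation of this step
  array: number[];                      // array state at this step
  highlights: number[];                 // indices to highlight
  pointers: { i?: number; j?: number }; // comparison pointers
}

// Example: one step of bubble sort comparing indices 0 and 1.
const step: VisualizationStep = {
  description: "Compare 5 and 3; 5 > 3, so swap them.",
  array: [3, 5, 8, 1],
  highlights: [0, 1],
  pointers: { i: 0, j: 1 },
};
```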
Supported Algorithms:
- Sorting: Bubble, Merge, Quick, Heap, Insertion
- Searching: Binary, Linear, DFS, BFS
- Data Structures: Stack, Queue, Trees, Graphs
- DP: Fibonacci, Knapsack, LCS with table visualization
3. Collaborative Whiteboard
The Challenge:
Enable real-time drawing for multiple users in mock interviews.
The Solution: WebSocket + Pub/Sub Architecture
User A draws → Socket.io Server → Redis Pub/Sub → All Users in Room
                      ↓
               MongoDB (persist)
Architecture Decisions:
Decision 1: WebSocket vs. Polling?
- Polling: Simple but wasteful (10K users polling every 5 seconds ≈ 2K requests/sec)
- WebSocket: Persistent connection, instant updates
- Chosen: Socket.io for fallback support (WebSocket → long polling)
Decision 2: How to Scale WebSockets Across Multiple Servers?
- Problem: User A on Server 1, User B on Server 2
- Solution: Redis Pub/Sub for cross-server communication
How it works:
- Server 1 publishes draw event to Redis
- Server 2 subscribes and receives event
- Server 2 sends to User B via WebSocket
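A minimal sketch of this pattern using the standard @socket.io/redis-adapter, which handles the Redis publish/subscribe plumbing for cross-server rooms. Event names, room names, and the port are illustrative.

```typescript
import { createServer } from "http";
import { Server } from "socket.io";
import { createAdapter } from "@socket.io/redis-adapter";
import { createClient } from "redis";

const httpServer = createServer();
const io = new Server(httpServer);

// Two Redis connections: one to publish, one to subscribe.
const pubClient = createClient({ url: process.env.REDIS_URL });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);
io.adapter(createAdapter(pubClient, subClient));

io.on("connection", (socket) => {
  socket.on("join-room", (roomId: string) => socket.join(roomId));

  // Broadcast a draw event to everyone else in the room; the Redis adapter
  // forwards it to rooms whose members are connected to other servers.
  socket.on("draw", ({ roomId, element }: { roomId: string; element: unknown }) => {
    socket.to(roomId).emit("draw", element);
  });
});

httpServer.listen(4000);
```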
Decision 3: Persistence Strategy
- Approach 1: Save on every draw → Too many DB writes
- Approach 2: Save on disconnect → Lose data if server crashes
- Chosen: Auto-save every 5 seconds to MongoDB
- Recovery: Load from DB on reconnect
Data Model:
WhiteboardSession {
  sessionId: unique identifier
  problemId: which problem is being discussed
  participants: array of user IDs with roles
  elements: Excalidraw drawing data
  createdAt, updatedAt
}
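A rough Mongoose equivalent of this model, plus the 5-second auto-save loop. Field types, the role enum, and the helper names are assumptions for illustration.

```typescript
import { Schema, model } from "mongoose";

const whiteboardSessionSchema = new Schema(
  {
    sessionId: { type: String, required: true, unique: true, index: true },
    problemId: { type: Schema.Types.ObjectId, ref: "Problem" },
    participants: [
      {
        userId: { type: Schema.Types.ObjectId, ref: "User" },
        role: { type: String, enum: ["interviewer", "candidate"] }, // assumed roles
      },
    ],
    elements: { type: Schema.Types.Mixed, default: [] }, // Excalidraw scene data
  },
  { timestamps: true } // adds createdAt / updatedAt
);

export const WhiteboardSession = model("WhiteboardSession", whiteboardSessionSchema);

// Auto-save loop: flush the in-memory board state to MongoDB every 5 seconds.
export function startAutosave(sessionId: string, getElements: () => unknown[]) {
  return setInterval(async () => {
    await WhiteboardSession.updateOne(
      { sessionId },
      { $set: { elements: getElements() } }
    );
  }, 5_000);
}
```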
Security Architecture - Defense in Depth
6 Layers of Security
Layer 1: Network Perimeter
- CloudFlare DDoS protection (unmetered)
- Rate limiting: 1000 requests/minute per IP
- TLS 1.3 encryption
Layer 2: Load Balancer (Nginx)
- Per-user rate limiting (100 req/min)
- Request size limits (10 MB max)
- Header validation & sanitization
Layer 3: Authentication & Authorization
- JWT tokens: HS256 algorithm, 7-day expiry
- Session validation: Every request checks Redis
- RBAC: User vs Admin permissions
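A minimal Express middleware sketch for Layer 3, assuming jsonwebtoken and ioredis. The Redis key format and error messages are illustrative.

```typescript
import jwt from "jsonwebtoken";
import Redis from "ioredis";
import type { Request, Response, NextFunction } from "express";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

export async function requireAuth(req: Request, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.replace("Bearer ", "");
  if (!token) {
    res.status(401).json({ error: "Missing token" });
    return;
  }

  try {
    // HS256 signature + expiry check (tokens issued with a 7-day expiry).
    const payload = jwt.verify(token, process.env.JWT_SECRET!, {
      algorithms: ["HS256"],
    }) as { userId: string };

    // The session must also exist in Redis, so logout/revocation is immediate.
    const session = await redis.get(`session:${payload.userId}`); // assumed key format
    if (!session) {
      res.status(401).json({ error: "Session expired" });
      return;
    }

    (req as Request & { userId: string }).userId = payload.userId;
    next();
  } catch {
    res.status(401).json({ error: "Invalid token" });
  }
}
```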
Layer 4: Input Validation
- Code size limit: 10 KB (prevents DoS)
- Forbidden pattern detection:
  - require('child_process')
  - import subprocess
  - Runtime.getRuntime().exec()
  - system()
  - eval()
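A small sketch of how this validation layer could look. The regex list mirrors the patterns above but is illustrative; a real blocklist would be per-language and more thorough.

```typescript
// Illustrative blocklist; entries correspond to the patterns listed above.
const FORBIDDEN_PATTERNS: RegExp[] = [
  /require\(['"]child_process['"]\)/, // Node.js process spawning
  /import\s+subprocess/,              // Python subprocess
  /Runtime\.getRuntime\(\)\.exec/,    // Java shell exec
  /\bsystem\s*\(/,                    // system() calls
  /\beval\s*\(/,                      // dynamic code evaluation
];

const MAX_CODE_SIZE = 10 * 1024; // 10 KB limit

export function validateSubmission(code: string): { ok: boolean; reason?: string } {
  if (Buffer.byteLength(code, "utf8") > MAX_CODE_SIZE) {
    return { ok: false, reason: "Code exceeds 10 KB limit" };
  }
  const hit = FORBIDDEN_PATTERNS.find((pattern) => pattern.test(code));
  if (hit) {
    return { ok: false, reason: `Forbidden pattern detected: ${hit}` };
  }
  return { ok: true };
}
```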
Layer 5: Code Execution Sandbox (Judge0)
- Docker isolation: Each submission in separate container
- Resource limits:
- CPU time: 2 seconds max
- Memory: 256 MB max
- Processes: 30 max
- Network: Completely disabled
- Filesystem: Read-only (except /tmp)
- Seccomp profiles: Block dangerous syscalls
Layer 6: Data Security
- Encryption at rest: AES-256
- Password hashing: Bcrypt (10 rounds)
- Secrets: AWS Secrets Manager
- Database backups: Daily full + 6h incremental
Why 6 Layers?
If an attacker bypasses one layer, 5 more remain. Single points of failure = bad.
Scalability: Handling 10,000 Concurrent Users
Horizontal Scaling Strategy
Kubernetes HPA (Horizontal Pod Autoscaler):
Configuration:
- Min replicas: 3 (high availability)
- Max replicas: 20 (resource management)
- Scale up: CPU > 70% OR Memory > 80%
- Scale down: CPU < 40% for 5 minutes
Why Kubernetes?
- Auto-healing (pod crashes → restart)
- Rolling updates (zero downtime deploys)
- Resource management (CPU/memory limits)
- Service discovery (automatic DNS)
Database Scaling Strategy
MongoDB (Primary Database):
Architecture: Replica Set (PSS)
- 1 Primary (us-east-1) → All writes
- 1 Secondary (us-west-1) → Read queries
- 1 Secondary (eu-west-1) → Read queries
Read Preference: secondaryPreferred (40% load on each secondary)
Write Concern: majority (data safety)
Future: Shard when > 10M documents
Shard Key: { userId: "hashed" } for even distribution
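For reference, the replica-set settings above map onto a MongoDB connection string roughly like this; the hostnames are placeholders, not the real deployment.

```typescript
import mongoose from "mongoose";

// Placeholder hostnames for the three replica-set members described above.
const uri =
  "mongodb://db-use1.internal:27017,db-usw1.internal:27017,db-euw1.internal:27017/codenova" +
  "?replicaSet=rs0&readPreference=secondaryPreferred&w=majority";

await mongoose.connect(uri);

// Reads (problem listings, submission history) are routed to a secondary when
// one is available; writes go to the primary and wait for majority acknowledgement.
```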
PostgreSQL (Analytics):
Architecture: Master-Replica
- Master: All writes (metrics, logs)
- Replica 1: Analytics queries
- Replica 2: Reporting dashboards
Extension: TimescaleDB for time-series optimization
Use case: User activity over time, submission trends
Redis (Cache & Pub/Sub):
Architecture: Cluster (3 nodes)
- Node 1: Master (cache + sessions)
- Node 2: Replica (failover)
- Node 3: Replica (failover)
Persistence: RDB snapshots (5 min) + AOF
Max Memory: 4 GB
Eviction Policy: allkeys-lru (least recently used)
Worker Scaling
BullMQ Queue Configuration:
Code Execution Queue:
- Min workers: 5
- Max workers: 50
- Concurrency: 10 jobs per worker
- Scale trigger: Queue depth > 100
AI Avatar Queue:
- Min workers: 2
- Max workers: 20
- Concurrency: 5 jobs per worker
- Scale trigger: Queue depth > 50
Visualizer Queue:
- Min workers: 2
- Max workers: 15
- Concurrency: 5 jobs per worker
Math Check:
Peak load: 1,000 submissions/minute ≈ 16.7 submissions/second
Average execution time: 2 seconds
Required concurrent workers: 16.7 submissions/sec × 2 sec ≈ 34
Configured max: 50 workers
Headroom: 50 - 34 = 16 workers (≈47% buffer) ✅
Data Architecture Decisions
Why MongoDB for Primary DB?
Pros:
- ✅ Flexible schema (problems have varying test cases)
- ✅ Horizontal scaling with sharding
- ✅ Rich query language (filter by difficulty, tags, companies)
- ✅ Replica sets for HA
Cons:
- ❌ Weaker transactions (multi-document transactions only arrived in 4.0)
- ❌ Larger storage footprint
Use Cases:
- Problems collection (155+ documents)
- Submissions collection (millions of documents)
- Whiteboard sessions
Why PostgreSQL for Analytics?
Pros:
- ✅ ACID transactions
- ✅ Complex joins for user analytics
- ✅ TimescaleDB for time-series optimization
- ✅ Better for aggregations
Use Cases:
- Submission analytics (success rate over time)
- User activity logs
- Leaderboard snapshots
Why Redis?
Pros:
- ✅ Sub-millisecond latency
- ✅ Sorted Sets for leaderboards (O(log N) operations)
- ✅ Pub/Sub for WebSocket scaling
- ✅ Built-in TTL for sessions
Use Cases:
- Session storage (7-day TTL)
- Problem caching (1-hour TTL)
- Leaderboard (Redis Sorted Set)
- WebSocket pub/sub
Leaderboard Implementation:
Data Structure: Redis Sorted Set
Command: ZADD leaderboard:contest123 <score> <userId>
Retrieve Top 100: ZREVRANGE leaderboard:contest123 0 99 WITHSCORES
Time Complexity: O(log N)
Handles: 10K users polling every 5 seconds ≈ 2K QPS with ease
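The same operations in TypeScript with ioredis; the key naming is illustrative.

```typescript
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Add or update a user's score for a contest (ZADD keeps the set ordered by score).
export async function recordScore(contestId: string, userId: string, score: number) {
  await redis.zadd(`leaderboard:${contestId}`, score, userId);
}

// Fetch the top N entries, highest score first, with their scores.
export async function topN(contestId: string, n = 100) {
  const flat = await redis.zrevrange(`leaderboard:${contestId}`, 0, n - 1, "WITHSCORES");
  const entries: { userId: string; score: number }[] = [];
  for (let i = 0; i < flat.length; i += 2) {
    entries.push({ userId: flat[i], score: Number(flat[i + 1]) });
  }
  return entries;
}
```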
Architecture Decisions Explained
Decision 1: Why BullMQ Over AWS SQS?
Comparison:
| Feature | BullMQ (Redis) | AWS SQS |
|---|---|---|
| Latency | < 10ms | 50-100ms |
| Priority Queues | ✅ Native | ❌ Separate queues |
| Retry Logic | ✅ Built-in | Manual |
| Local Dev | ✅ Easy | ❌ Need AWS account |
| Infrastructure | Uses existing Redis | Additional service |
Chosen: BullMQ for lower latency and simpler infrastructure.
Decision 2: Why Socket.io Over Native WebSocket?
Socket.io Advantages:
- ✅ Automatic fallback (WebSocket → long polling)
- ✅ Reconnection logic built-in
- ✅ Room-based messaging
- ✅ Cross-platform (web + mobile)
Trade-off: Slightly larger bundle size, but better compatibility.
Decision 3: Why Next.js Over Pure React?
Next.js Benefits:
- ✅ Server-side rendering (better SEO)
- ✅ API routes (no separate Express for simple endpoints)
- ✅ Image optimization
- ✅ Automatic code splitting
Use Case: Problem listing page needs SEO for Google.
Decision 4: Why Separate PostgreSQL for Analytics?
Why Not Just MongoDB?
- MongoDB aggregations are slower for complex queries
- PostgreSQL better for JOINs (users + submissions + problems)
- TimescaleDB optimizes time-series queries (activity over time)
Trade-off: More complexity (2 databases) but better performance.
Performance Metrics
Achieved SLA:
- ✅ Code execution: < 3s (p95)
- ✅ Page load: < 2s
- ✅ API latency: < 500ms (p95)
- ✅ WebSocket latency: < 100ms
- ✅ Cache hit rate: > 70%
- ✅ Uptime: 99.9% (43 minutes downtime/month allowed)
How We Measure:
- Prometheus for metrics collection
- Grafana for dashboards
- Sentry for error tracking
- ELK Stack for log aggregation
Key Learnings
1. Async Processing is Non-Negotiable
Early Mistake:
I initially tried synchronous code execution. When 1000 submissions/minute hit, API servers timed out.
Solution:
BullMQ job queue with auto-scaling workers. Now:
- API responds instantly with "submitted"
- Worker processes in background
- WebSocket notifies user when done
2. Caching is Critical for Performance
Without Caching:
- Every problem fetch → MongoDB query
- Every avatar question → 30-second generation time
With Caching:
- 85% problem queries served from Redis
- 70% avatar videos served from cache
- Result: 80% reduction in MongoDB load, instant response for cached queries
3. Security in Layers, Not Walls
Wrong Approach:
"If our firewall is strong, we're safe."
Right Approach:
6 layers of defense. If one fails, 5 remain.
Example: Even if an attacker bypasses rate limiting (Layers 1-2), they still hit:
- JWT validation (Layer 3)
- Input sanitization (Layer 4)
- Docker sandbox (Layer 5)
4. Monitor Before You Scale
Built Monitoring First:
- Prometheus metrics from day one
- Grafana dashboards before launch
- Sentry error tracking in alpha
Why? You can't optimize what you can't measure. Without metrics, scaling is guesswork.
Future Improvements
Technical Debt to Address
1. Self-host Judge0
   - Current: Using the Judge0 API
   - Plan: Docker on Kubernetes for better control
   - Benefit: More flexibility in resource allocation
2. Multi-region Deployment
   - Current: Single region (us-east-1)
   - Issue: High latency for Asia/Europe users
   - Plan: CloudFlare Workers + edge caching
3. Database Sharding
   - Current: Single MongoDB replica set
   - Trigger: When > 10M submissions
   - Strategy: Shard by userId (hashed)
4. GraphQL API
   - Current: REST with over-fetching
   - Benefit: Reduce data transfer by an estimated 40%
Questions I'd Ask Myself in a System Design Interview
Q: Why not use AWS Lambda for code execution?
A: Lambda has a 15-minute timeout, and cold starts add latency. Judge0 in Docker gives consistent performance and better resource limits.
Q: Why MongoDB AND PostgreSQL? Why not just one?
A: Different workloads. MongoDB excels at flexible schemas and horizontal scaling. PostgreSQL excels at complex analytics. Multi-database is common in microservices.
Q: How do you prevent one user from DDoSing your platform?
A: Rate limiting at 3 levels - CloudFlare (per IP), Nginx (per user), Application (per API endpoint). Plus BullMQ queue prevents worker overload.
Q: What happens if Redis goes down?
A: 3-node cluster with automatic failover. If all nodes fail: Sessions lost (users re-login), cache miss (MongoDB serves requests), WebSocket disconnects (auto-reconnect). Not ideal, but platform stays up.
Q: Why 99.9% uptime and not 99.99%?
A: Trade-off between availability and complexity. 99.9% = 43 min/month downtime (acceptable for coding practice). 99.99% requires multi-region deployment with significantly more infrastructure complexity.
Recommended Reading
If you're designing a similar system:
Books:
- "Designing Data-Intensive Applications" by Martin Kleppmann
- "System Design Interview" by Alex Xu
Conclusion
Building CodeNova taught me that good architecture is about trade-offs, not perfection.
Key Takeaways:
- Async everything - Queues are your friend
- Cache aggressively - Improves performance and reduces load
- Security in layers - Defense in depth
- Measure first, optimize second - Metrics before scaling
The architecture diagram isn't just boxes and arrows - it represents:
- Hundreds of hours of research
- Dozens of failed experiments
- Lessons from production incidents
If I were to start over, I'd:
- Build monitoring first (kept this)
- Use queues from day one (learned this the hard way)
- Start with fewer databases (added PostgreSQL later)
- Not self-host initially (buy before build)
Discussion
How would you design this differently?
Would you use:
- Serverless (Lambda) instead of Kubernetes?
- GraphQL instead of REST?
- DynamoDB instead of MongoDB?
- Different AI providers?
Drop your thoughts in the comments!
I'm especially interested in:
- Better ways to optimize AI response generation
- Better ways to scale WebSockets
- Alternative code execution sandboxes
Built with ❤️ and lots of ☕ by Bhupesh Chikara
