I've run online exam systems before.
For a middle school with around 1,000 students, I needed a database server with 4 CPUs and 8GB RAM, plus an app server with 2 CPUs and 8GB RAM. That's a lot of hardware just for 1,000 users.
The system worked, but I kept wondering — what if I had 10,000 students? Would I need 10x the resources?
So I built two versions of an exam submission system and threw 10,000 concurrent users at them. One used direct database inserts. One used a message queue.
Direct insert failed 28% of requests. Queue-based succeeded 99.7% of the time — using way less hardware.
The Problem
If you've run an online exam system, you know how this goes. Testing with 10 users works perfectly. QA with 50 concurrent users is fine. Then exam day hits with 1,000+ students and you're just hoping the server survives.
Running exams for around 1,000 middle school students taught me that you need serious hardware just to handle everyone hitting the system at the same time. They all log in together. They all submit together. And when it lags, they all refresh together.
Here's what actually happened during one of our exams:
The moment the exam started, all 1,000 students tried to log in simultaneously. But it wasn't just them — we had no idea who else was hitting the system. Curious parents checking if the portal was up? Random people who found the URL? Someone trying to probe for vulnerabilities? All of it piled on top of our legitimate traffic.
The result? CPU spiked to 80%. RAM hit 80%. The login requests were stacking up faster than the server could process them.
I had to make an embarrassing call: ask teachers to stagger logins by class, waiting 10-15 minutes between groups. Imagine telling anxious students "please wait, the system can't handle everyone at once." Not a great look.
And that was with "only" 1,000 users on a 4 CPU / 8GB database server.
But here's what kept me up at night: What happens at 10,000 users?
At that scale, the API receives roughly 3,000 requests per second. Meanwhile, PostgreSQL ships with max_connections = 100 by default, and most application-side pools are sized to match.
Do the math: 3,000 requests per second competing for 100 database connections.
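That arithmetic can be made concrete. The ~50ms-per-insert figure below is an assumption for illustration, not a measurement from the test:

```go
package main

import "fmt"

func main() {
	const (
		arrivalRate   = 3000.0 // requests per second at peak
		poolSize      = 100.0  // max concurrent DB connections
		insertLatency = 0.050  // seconds per insert (assumed ~50ms under load)
	)

	// Each connection serves 1/insertLatency inserts per second, so the
	// pool caps total throughput no matter how fast the CPU is.
	maxThroughput := poolSize / insertLatency

	// Everything arriving above that rate piles up in a growing backlog.
	backlogPerSec := arrivalRate - maxThroughput

	fmt.Printf("pool throughput: %.0f req/s\n", maxThroughput) // 2000 req/s
	fmt.Printf("backlog growth:  %.0f req/s\n", backlogPerSec) // 1000 req/s
}
```

At those assumed numbers the pool tops out at 2,000 req/s, so the backlog grows by 1,000 requests every second until timeouts start.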
I had to see this for myself.
The Test Setup
I built two identical exam submission APIs in Go:
| Component | Direct Insert (Port 8080) | Queue-Based (Port 8081) |
|---|---|---|
| API Server | Go + Fiber | Go + Fiber |
| Database | PostgreSQL (1 CPU, 512MB) | PostgreSQL (1 CPU, 512MB) |
| Message Queue | None | NATS JetStream |
| Workers | None | 10 concurrent workers |
Notice the database specs: 1 CPU, 512MB RAM. My real exam system needed 4 CPUs and 8GB RAM for just 1,000 students. I wanted to see if architecture choices could make up for limited hardware.
Load Test Configuration
- Concurrent Users: 10,000 students
- Peak Load: 3,000 requests/second
- Duration: 5 minutes
- Tool: k6 load testing
Same traffic. Same database. Two different architectures.
Approach 1: Direct Database Insert
The straightforward approach. User submits → API writes to database → Return response.
3,000 requests fighting for 100 database connections. The other 2,900? They wait. And wait. And eventually... timeout.
Approach 2: Queue-Based Architecture
The API doesn't talk to the database at all. It just publishes a message to a queue and responds immediately. Workers consume messages in the background at whatever pace the database can handle.
The queue acts as a shock absorber. Traffic spikes hit the queue, not your database.
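The shock-absorber idea can be sketched with nothing more than Go channels: a burst of 1,000 "submissions" lands instantly in a buffered channel, while a small fixed pool of workers drains it at its own pace. The numbers here are illustrative, not from the test:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	const burst = 1000

	// The buffered channel plays the role of the queue:
	// it absorbs the whole spike without blocking producers.
	queue := make(chan int, burst)
	for i := 0; i < burst; i++ {
		queue <- i // "submit" returns immediately
	}
	close(queue)

	// A fixed pool of 10 workers drains the queue at a pace
	// the database could sustain.
	var processed int64
	var wg sync.WaitGroup
	for w := 0; w < 10; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue {
				atomic.AddInt64(&processed, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println("processed:", processed) // processed: 1000
}
```

The producers never wait on the consumers; only the drain rate, not the spike, touches the "database" side.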
Results
I expected the queue to win. I didn't expect it to be a blowout.
Response Time Comparison
| Metric | Direct Insert | Queue-Based | Improvement |
|---|---|---|---|
| Average | 487ms | 8ms | 60x faster |
| P50 (median) | 392ms | 6ms | 65x faster |
| P95 | 1,245ms | 18ms | 69x faster |
| P99 | 2,891ms | 34ms | 85x faster |
| Maximum | 8,432ms | 127ms | 66x faster |
Success Rate Comparison
| Approach | Successful | Failed | Success Rate |
|---|---|---|---|
| Direct Insert | 12,459 | 4,831 | 72.1% |
| Queue-Based | 17,234 | 56 | 99.7% |
With direct insert, 28% of exam submissions failed. In a real exam scenario, that's potentially thousands of students who need to resubmit, file complaints, or worse — lose their work entirely.
Database Resource Usage
| Metric | Direct Insert | Queue-Based |
|---|---|---|
| DB Connections | 98-100 (maxed out) | 12-18 |
| CPU Usage | 95-100% | 35-60% |
| Query Queue | Backed up | Smooth |
The direct approach pushed the database to its absolute limit. The queue approach? The database barely noticed.
Why Did Direct Insert Fail?
Analyzing the 28% failure rate, here's the breakdown:
| Error Type | Percentage | What Happened |
|---|---|---|
| Connection Timeout | 52% | Connection pool exhausted, requests waited > 10s |
| Database Deadlock | 23% | Too many concurrent writes, lock contention |
| Connection Refused | 18% | Max connections reached, new ones rejected |
| Context Canceled | 7% | Users gave up waiting |
The connection pool was the bottleneck. No matter how fast your database is, it can only handle so many concurrent connections.
The User Experience Difference
Direct Insert: The Frustrating Experience
User clicks "Submit"
→ Spinner...
→ Still waiting...
→ 3 seconds pass...
→ "Connection timeout" error
(28% of the time)
Average wait: 487ms (but often 1-3 seconds, sometimes 8+ seconds)
Queue-Based: The Smooth Experience
User clicks "Submit"
→ "Submission received!"
(Processing happens in background)
(User continues immediately)
Average wait: 8ms (consistently fast)
What This Costs in Practice
My original setup for ~1,000 users:
- Database: 4 CPU, 8GB RAM
- App Server: 2 CPU, 8GB RAM
- Monthly cost: around $200-300
- Stress level during exams: high
The queue-based test with 10,000 users:
- Database: 1 CPU, 512MB RAM
- Everything else: minimal
- Success rate: 99.7%
- The database didn't even notice the load
That's 10x the users with way less hardware.
Business Impact Comparison
| Aspect | Direct Insert | Queue-Based |
|---|---|---|
| User Errors | 28% see failures | 0.3% see failures |
| Infrastructure | Need bigger DB for spikes | Smaller DB handles more |
| Exam Day Stress | Engineers on-call | System stays calm |
| Scalability | Linear cost increase | Efficient scaling |
When to Use What
Queues aren't always the answer.
Direct insert works fine when traffic is low and predictable, you need data in the database immediately (like real-time dashboards), or you're doing mostly reads with occasional writes.
Queues make sense when you have traffic spikes, lots of concurrent writes, users who care about response time, or you can't afford to lose data. The queue gives you a buffer — it absorbs the spike and lets workers process at whatever pace the database can handle.
Think exam submissions, checkout during sales, ticket purchases, push notifications — anything where everyone hits the system at once.
How the Queue Architecture Works
Here's how it works under the hood:
1. API Layer (Fast Response)
The API receives the exam submission and publishes it to NATS JetStream. No database call here — just a message publish (~2ms).
// Sketch: Fiber handler publishing via a nats.go JetStream context `js`
func submitExam(c *fiber.Ctx) error {
	var submission Submission
	if err := c.BodyParser(&submission); err != nil {
		return fiber.ErrBadRequest
	}
	payload, err := json.Marshal(submission)
	if err != nil {
		return fiber.ErrBadRequest
	}
	// Publish to the queue (durable, ~2ms); no database call here
	if _, err := js.Publish("exam.submissions", payload); err != nil {
		return fiber.ErrServiceUnavailable
	}
	// Respond to the user immediately
	return c.JSON(fiber.Map{"status": "accepted", "id": submission.ID})
}
2. Message Queue (Shock Absorber)
NATS JetStream stores messages durably. Even if workers are slow, messages aren't lost. The queue absorbs traffic spikes and releases them at a steady pace.
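With the nats.go client, that durability comes from declaring a stream backed by file storage. A minimal configuration might look like the following; the stream name and retention policy are my own choices, not taken from the test repo:

// Assumes an established *nats.Conn named nc.
js, err := nc.JetStream()
if err != nil {
	log.Fatal(err)
}
_, err = js.AddStream(&nats.StreamConfig{
	Name:      "EXAMS",
	Subjects:  []string{"exam.submissions"},
	Storage:   nats.FileStorage,      // survive broker restarts
	Retention: nats.WorkQueuePolicy,  // delete each message once acked
})

With WorkQueuePolicy, a message stays on disk until a worker acknowledges it, which is exactly the "slow workers can't lose data" guarantee described above.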
3. Workers (Controlled Processing)
10 concurrent workers pull messages from the queue. Each worker processes in batches of 100 records — reducing database round trips.
// Sketch: JetStream pull subscription `sub`, DB handle `db`
func worker(sub *nats.Subscription, db *pgxpool.Pool) {
	for {
		// Pull up to 100 messages per round trip
		msgs, err := sub.Fetch(100, nats.MaxWait(2*time.Second))
		if err != nil {
			continue // timeout or transient error; just retry
		}
		submissions := parseAll(msgs)
		// Single batched INSERT for the whole chunk
		if err := batchInsert(db, submissions); err != nil {
			continue // leave unacked; JetStream redelivers
		}
		for _, msg := range msgs {
			msg.Ack()
		}
	}
}
4. Database (Happy and Healthy)
Instead of 3,000 concurrent connections, the database sees a steady stream from 10 workers. CPU stays at 35-60%. No deadlocks. No timeouts.
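Batching is what keeps the round trips low: instead of 100 single-row INSERTs, each worker issues one multi-row statement. Here's a sketch of building the numbered-placeholder SQL; the helper name is mine, not from the repo:

```go
package main

import (
	"fmt"
	"strings"
)

// batchInsertSQL builds one multi-row INSERT with numbered
// placeholders ($1, $2, ...) for n rows of the given columns.
func batchInsertSQL(table string, cols []string, n int) string {
	rows := make([]string, n)
	arg := 1
	for i := 0; i < n; i++ {
		ph := make([]string, len(cols))
		for j := range cols {
			ph[j] = fmt.Sprintf("$%d", arg)
			arg++
		}
		rows[i] = "(" + strings.Join(ph, ", ") + ")"
	}
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		table, strings.Join(cols, ", "), strings.Join(rows, ", "))
}

func main() {
	fmt.Println(batchInsertSQL("submissions", []string{"student_id", "answer"}, 2))
	// INSERT INTO submissions (student_id, answer) VALUES ($1, $2), ($3, $4)
}
```

One statement per 100 records means the database does one parse, one plan, and one commit per batch instead of 100 of each.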
What I Learned
Connection pools have hard limits. Doesn't matter how fast your database is — 3,000 requests can't share 100 connections without waiting. I learned this the hard way.
Queues trade immediate consistency for reliability. The user gets a fast "accepted" response, but the data isn't in the database yet. For most write-heavy workloads, that tradeoff is worth it.
The performance difference is huge. Not 2x or 3x — we're talking 60-85x faster response times.
Queues handle spikes gracefully. During a traffic spike, the queue just grows. Nothing crashes. When traffic drops, workers catch up. No 2 AM pages.
Architecture matters more than hardware. My 1,000-user system needed 4 CPU and 8GB RAM. This queue-based test handled 10,000 users on 1 CPU and 512MB.
Wish I'd known this earlier. Could've saved money and stress.
Wrapping Up
Queues aren't always the answer. But for traffic spikes and heavy writes? Worth considering.
Looking back at my exam systems with their beefy servers, I wish I'd tried this sooner. Usually you trade performance for cost — this gave me both.
Next time you're building something write-heavy, ask yourself: what happens when everyone hits it at once?
Code is on GitHub: jufianto/blog-resource/exam-app-queue
Have you dealt with database bottlenecks under load? I'd like to hear about it.

