---
title: "Ktor at 50K Connections: Coroutines vs Virtual Threads on a $20 VPS"
published: true
description: "A hands-on walkthrough benchmarking Ktor coroutines against JVM virtual threads at 50K concurrent connections — and why a single VPS beats Kubernetes for early-stage startups."
tags: kotlin, architecture, performance, cloud
canonical_url: https://blog.mvpfactory.co/ktor-50k-connections-coroutines-vs-virtual-threads
---
## What You Will Learn
In this workshop, we will benchmark two high-concurrency models available in Ktor — **Kotlin coroutines** and **JDK 21 virtual threads (Project Loom)** — serving 50K concurrent connections on a single 4-core VPS. By the end, you will have a working Ktor configuration, understand the memory and concurrency tradeoffs between both models, and know exactly when a $20 VPS stops being enough.
Let me show you a pattern I use in every project that saves real money at the early stage.
## Prerequisites
- JDK 21+ installed
- Ktor 2.3.x project (Netty engine)
- A VPS with at least 4 vCPU and 8 GB RAM (Hetzner CPX31 or equivalent)
- Familiarity with Kotlin coroutines basics
## Step 1: Tune the OS
Before touching application code, raise the OS limits. This is the actual bottleneck most people miss.
bash
ulimit -n 131072
sysctl -w net.core.somaxconn=65535
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
Without these, your server will reject connections long before Ktor or the JVM become the limiting factor.
## Step 2: Configure Your Ktor Server
Here is the minimal setup to get this working:
kotlin
embeddedServer(Netty, port = 8080) {
install(ContentNegotiation) { json() }
install(Compression) { gzip() }
}.start(wait = true)
Launch with ZGC for predictable low-latency GC:
plaintext
-XX:+UseZGC -XX:MaxRAMPercentage=75.0
Netty's default pipeline handles high connection counts once OS limits are raised. If you need an explicit ceiling, add a `ChannelHandler` that tracks active connections and rejects beyond your threshold — there is no built-in Ktor property for this.
## Step 3: Understand the Memory Difference
All figures come from k6 load tests against a Hetzner CPX31 (4 vCPU AMD, 8 GB RAM, Ubuntu 22.04). The test held 50K concurrent WebSocket connections idle with a keep-alive ping every 30 seconds, plus sustained 2K RPS of JSON GET requests. Memory measured as RSS via `ps` and heap via JMX after 10-minute steady state with ZGC.
| Metric | Coroutines | Virtual Threads | Platform Threads |
|---|---|---|---|
| Stack memory per task | ~256 bytes–few KB | ~1–2 KB | ~1 MB |
| 50K idle connections (RSS) | ~1.2 GB | ~2.5 GB | ~50 GB (infeasible) |
| Context switch cost | Continuation resume (ns) | Carrier mount/unmount (μs) | Full OS switch (μs) |
Coroutines win because they are compiler-transformed state machines on the heap — no stack frames until they resume. Virtual threads carry a growable stack: far lighter than platform threads, but measurably heavier than a suspended coroutine.
## Step 4: Use Structured Concurrency
kotlin
get("/dashboard/{userId}") {
val (profile, metrics) = coroutineScope {
val p = async { userService.getProfile(userId) }
val m = async { analyticsService.getMetrics(userId) }
p.await() to m.await()
}
call.respond(DashboardResponse(profile, metrics))
}
If the client disconnects, both coroutines cancel automatically. With virtual threads, you wire up `ExecutorService` shutdown logic manually. JEP 453 (Structured Concurrency) aims to close this gap, but it is still in preview as of JDK 23.
## Gotchas
**The thread-pinning trap.** The docs do not mention this, but `synchronized` blocks pin virtual threads to carrier threads. Kotlin generates `synchronized` in places you might not expect: `lazy` delegates, certain `companion object` initializations, and some coroutine internals.
kotlin
// BAD: This PINS a virtual thread to its carrier
val cached: ExpensiveResource by lazy {
loadFromDatabase() // synchronized under the hood
}
// GOOD: Suspend-aware double-checked lock
private val mutex = Mutex()
private var cached: ExpensiveResource? = null
suspend fun getResource(): ExpensiveResource {
cached?.let { return it }
return mutex.withLock {
cached ?: loadFromDatabase().also { cached = it }
}
}
Here is the gotcha that will save you hours: if you do use virtual threads, run staging with `-Djdk.tracePinnedThreads=short` to detect pinning before production.
**Over-engineering infrastructure.** A $20 Hetzner VPS handles 50K connections. GKE Autopilot runs ~$150/month, EKS ~$190/month — a 7–10x cost difference. For a typical mobile backend (REST + JSON, health checks, push notification registration), the single VPS handles the load cleanly under 100K DAU. On a single node, use systemd watchdog for auto-restart, blue-green deploys via nginx upstream switching, and journald + Vector for log shipping.
**Ignoring your own health.** Seriously — during long benchmark sessions and deploy marathons, I keep [HealthyDesk](https://play.google.com/store/apps/details?id=com.healthydesk) running for break reminders and guided desk exercises. Your server handles 50K connections; your spine should not have to.
## Conclusion
Default to Ktor coroutines over virtual threads. Memory efficiency is measurably better, structured concurrency is built in, and you sidestep thread-pinning bugs. Delay Kubernetes until you actually exceed single-node capacity — measure your real DAU first, not what you hope it will be.
If you do use virtual threads, audit every `lazy`, `companion object`, and lock in your codebase. Replace with `Mutex`-guarded patterns and trace pinned threads in staging.
**Further reading:** [JEP 453: Structured Concurrency (Preview)](https://openjdk.org/jeps/453) tracks how virtual threads are closing the gap that coroutines handle today.
Top comments (0)