If you've written async Rust, you've typed #[tokio::main] dozens of times. You've stared at it. You've trusted it. You've never once questioned what it actually does — because it works, and we don't interrogate things that work.
Until they don't. And then you're at 2am, runtime panicking, no idea why, Googling "tokio block_on nested" like a person who has made some choices.
This post is for that version of you. We're tearing open the black box — how Tokio's runtime is structured, what happens when you await something, how the scheduler keeps your CPU cores busy, and what that macro expands into. By the end, async Rust should feel a lot less like voodoo and a lot more like something you actually understand.
The Problem with Threads (Or: Why We Can't Have Nice Things)
Let's start with why async exists at all. OS threads are the classic way to handle concurrency, and they're genuinely great. The problem is they're also kind of... needy.
// Spawning OS threads — simple, familiar, a little reckless
use std::thread;
fn main() {
let handles: Vec<_> = (0..10_000)
.map(|i| {
thread::spawn(move || {
thread::sleep(std::time::Duration::from_secs(1));
println!("Thread {} done", i);
})
})
.collect();
for h in handles {
h.join().unwrap();
}
}
Go ahead, run it. On most Linux systems, 10,000 threads will burn through 80–160 MB of RAM just for stacks — and that's before they do a single useful thing. The kernel also has to context-switch between all of them: saving registers, flushing caches, pretending each thread is the most important thing in the world. It adds up fast.
Each thread is basically a golden retriever. Full of energy, needs its own space, demands constant attention, and if you have 10,000 of them your house (server) is destroyed.
Now the async version:
// Spawning Tokio tasks — same result, dramatically less chaos
#[tokio::main]
async fn main() {
let handles: Vec<_> = (0..10_000)
.map(|i| {
tokio::spawn(async move {
tokio::time::sleep(std::time::Duration::from_secs(1)).await;
println!("Task {} done", i);
})
})
.collect();
for h in handles {
h.await.unwrap();
}
}
Same result. But Tokio tasks aren't OS threads — they're tiny state machines living entirely in user space. A task waiting on I/O consumes almost nothing. No kernel involvement, no stack sitting idle eating RAM, no context switching. It's like replacing 10,000 golden retrievers with 10,000 cats — they'll all just sit there doing nothing until something interesting happens, and that's perfectly fine.
The trade-off: async is more complex to understand. Worth it? For I/O-bound workloads, absolutely.
How Tokio Actually Works
The Reactor Model (A.K.A. "Don't call us, we'll call you")
Tokio is built on a reactor pattern. Instead of a thread sitting there blocking on an I/O call like an anxious person refreshing their email, Tokio registers interest with the OS and parks the thread. When data actually arrives, the OS taps Tokio on the shoulder, and Tokio wakes the right task.
The OS-level mechanism differs by platform:
- Linux: epoll
- macOS/BSD: kqueue
- Windows: IOCP (I/O Completion Ports)
Tokio wraps all of this via the mio crate, so you get one clean async API regardless of platform. You don't need to think about any of this — but it's satisfying to know it exists.
Here's the rough mental model:
Your async tasks
│
▼
Tokio Scheduler ◄──── wakes tasks when ready
│
▼
I/O Driver (mio)
│
▼
OS (epoll / kqueue / IOCP)
│
▼
Network, Files, Timers
When a task awaits a network read that isn't ready yet, it doesn't spin. It tells the I/O driver "wake me when this socket has data," and the thread moves on to run something else. This is how a single thread can juggle thousands of concurrent connections without breaking a sweat.
The Future Trait and the Poll Mechanism (The "Are We There Yet?" of Programming)
Every async fn in Rust compiles down to a type that implements the Future trait. You can think of a Future as a state machine that Tokio pokes repeatedly until it's done.
pub trait Future {
type Output;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
Poll has two variants:
- Poll::Ready(value) — done! Here's your value.
- Poll::Pending — not yet, come back later.
Tokio is essentially a machine that calls poll() on tasks and handles the Pending responses politely instead of panicking. But here's the key question: how does Tokio know when "later" is? That's what the Waker is for.
The Context passed to poll carries a Waker — basically a callback. When your future can't make progress (waiting on a TCP read, for example), it clones the Waker and stores it somewhere safe. When the OS eventually signals that data has arrived, Tokio uses that stored Waker to re-queue the task. It's like leaving your number at a restaurant instead of standing in the doorway blocking everyone.
Let's demystify this completely by hand-rolling a minimal future:
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};
use std::thread;
use std::time::Duration;
// A future that resolves after a background thread finishes some "work"
// (in this case, sleeping, which is relatable)
struct DelayedValue {
value: Arc<Mutex<Option<u32>>>,
waker: Arc<Mutex<Option<Waker>>>,
}
impl DelayedValue {
fn new(delay: Duration, value: u32) -> Self {
let shared_value = Arc::new(Mutex::new(None));
let shared_waker: Arc<Mutex<Option<Waker>>> = Arc::new(Mutex::new(None));
let v = shared_value.clone();
let w = shared_waker.clone();
thread::spawn(move || {
thread::sleep(delay);
*v.lock().unwrap() = Some(value);
// "Hey Tokio, I'm ready now" — this is the entire magic
if let Some(waker) = w.lock().unwrap().take() {
waker.wake();
}
});
DelayedValue {
value: shared_value,
waker: shared_waker,
}
}
}
impl Future for DelayedValue {
type Output = u32;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
let value = self.value.lock().unwrap();
if let Some(v) = *value {
Poll::Ready(v) // Done!
} else {
// Leave our number with the waker so we get called back
*self.waker.lock().unwrap() = Some(cx.waker().clone());
Poll::Pending // "Not yet, but I promise I'll call"
}
}
}
#[tokio::main]
async fn main() {
let result = DelayedValue::new(Duration::from_millis(100), 42).await;
println!("Got: {}", result); // Got: 42
}
That's the whole trick. Return Pending, store the waker, call waker.wake() when ready. No magic, no wizardry — just a very disciplined callback system. Every tokio::time::sleep, every TcpStream::read, every channel.recv() is doing exactly this under the hood.
The Task Scheduler and Work Stealing (Robin Hood, but for CPU Cores)
Tokio's multi-threaded runtime spawns one worker thread per logical CPU core. Each worker has its own local task queue. When you call tokio::spawn(...), the new task lands in the spawning thread's queue.
Now here's where it gets elegant: what happens when one thread's queue is empty while another is drowning in work? Work stealing. An idle thread reaches into a busy thread's queue and takes tasks from the back — quietly, without asking, like borrowing someone's chips at the office.
Thread 0 queue: [Task A, Task B, Task C] ← very busy
Thread 1 queue: [] ← bored, about to steal
Thread 2 queue: [Task D]
Thread 3 queue: [Task E, Task F]
After stealing:
Thread 0: [Task A, Task B]
Thread 1: [Task C] ← got something to do now
This is completely invisible to you as a developer, which is the point. You don't partition work across threads. You don't think about which core handles which connection. You just spawn tasks, and Tokio distributes the load. It's the async equivalent of "it just works" — and in this case it actually does.
Unwrapping #[tokio::main] (Spoiler: It's Not That Deep)
Let's look at what the macro actually expands to. Install cargo-expand and try it yourself:
cargo install cargo-expand
cargo expand
You write this:
#[tokio::main]
async fn main() {
println!("Hello from async!");
}
And the macro generates roughly this:
fn main() {
tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.unwrap()
.block_on(async {
println!("Hello from async!");
})
}
That's it. Genuinely. That's the whole macro. It builds a multi_thread runtime, enables all the drivers (I/O, time), and calls .block_on() which runs your async main and blocks the OS thread until it completes.
block_on is the bridge between the synchronous world (your process entry point) and the async world. Everything inside the closure is async. Everything outside is normal, boring, synchronous Rust. The macro just spares you from writing the boilerplate every time — which is kind, because nobody wants to write that every time.
Runtime Flavours — Pick Your Fighter
Tokio ships two runtime flavours out of the box, plus a builder for when you want to get weird about it.
multi_thread — The Default
// This:
#[tokio::main]
async fn main() { /* ... */ }
// Is the same as this:
#[tokio::main(flavor = "multi_thread")]
async fn main() { /* ... */ }

// And you can pin the thread count explicitly:
#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() { /* ... */ }
// (worker_threads defaults to your logical CPU count if you omit it)
Multiple worker threads, work-stealing scheduler, I/O tasks running in parallel. This is what you want for servers, gRPC services, anything handling concurrent I/O.
The default worker_threads count equals your number of logical CPU cores. On a 4-core machine you get 4 workers. On a 32-core beast you get 32. You can override it, but the default is usually sensible.
When to use it: Almost always. If you're writing a service that handles requests, use this.
current_thread — The Minimalist
#[tokio::main(flavor = "current_thread")]
async fn main() {
// Everything runs on this one thread. Just this one. Cooperatively.
}
No parallelism. Tasks take turns on a single thread. It's cheaper to set up, has zero synchronization overhead, and works in environments that simply don't support multiple threads — like WebAssembly.
Here's the gotcha that will ruin your day if you forget it: any blocking call blocks everything. With multi_thread, one frozen thread leaves the others running. With current_thread, if one task decides to call std::fs::read_to_string synchronously, every single other task in your runtime grinds to a halt and stares at the ceiling until it finishes. Every. Single. One.
// current_thread + blocking call = everything stops
#[tokio::main(flavor = "current_thread")]
async fn main() {
tokio::spawn(async {
std::thread::sleep(std::time::Duration::from_secs(5)); // 💀 blocks all tasks
});
// This won't run for 5 seconds. You wanted async. You did not get async.
tokio::spawn(async {
println!("I thought I'd run concurrently... I was wrong");
});
}
When to use it: CLI tools, scripts, tests, WASM, embedded. Anywhere you want async APIs without the overhead of a thread pool.
Runtime::Builder — For When You Want Full Control
Sometimes the macro isn't enough. You want to name your threads so they show up nicely in htop. You want a specific stack size. You want lifecycle hooks. This is where Runtime::Builder comes in:
use tokio::runtime::Builder;
fn main() {
let runtime = Builder::new_multi_thread()
.worker_threads(8) // Don't let Tokio decide
.thread_name("my-worker") // Show up nicely in profilers
.thread_stack_size(3 * 1024 * 1024) // 3 MB instead of 8 MB — save some RAM
.on_thread_start(|| println!("Worker online"))
.on_thread_stop(|| println!("Worker going home"))
.enable_io()
.enable_time()
.build()
.unwrap();
runtime.block_on(async {
// Your app here
});
}
A particularly powerful pattern is running two separate runtimes — one for I/O tasks and one for CPU-heavy work. Think of it like having a front-of-house team handling requests and a kitchen team doing the actual cooking. You don't want them stealing each other's threads:
use tokio::runtime::Builder;
use std::thread;
fn main() {
// Front of house: handles all the async I/O
let io_runtime = Builder::new_multi_thread()
.worker_threads(16)
.thread_name("io-worker")
.enable_all()
.build()
.unwrap();
// Kitchen: CPU-intensive work, fewer threads (match physical cores)
let cpu_runtime = Builder::new_multi_thread()
.worker_threads(4)
.thread_name("cpu-worker")
.build()
.unwrap();
let io_handle = io_runtime.handle().clone();
thread::spawn(move || {
cpu_runtime.block_on(async move {
// Heavy compute here — dispatch I/O back to the other runtime
io_handle.spawn(async { /* network calls, DB queries, etc. */ });
});
});
io_runtime.block_on(async {
// Main entry point
});
}
This is the kind of thing you reach for when building a database engine, a media transcoder, or anything where "just use one runtime" stops being good enough.
Async vs. Threads: An Honest Side-by-Side
Let's put both approaches next to each other for the same job: handling 1,000 concurrent "connections" that each wait 100ms.
With OS threads:
use std::thread;
use std::time::Duration;
fn handle_connection(id: u32) {
// The whole thread just... sits here. Blocking. Thinking about its life choices.
thread::sleep(Duration::from_millis(100));
println!("Connection {} handled", id);
}
fn main() {
let handles: Vec<_> = (0..1_000)
.map(|i| thread::spawn(move || handle_connection(i)))
.collect();
for h in handles {
h.join().unwrap();
}
}
// ~1000 threads. Heavy. Memory-hungry. Gets the job done, but at a cost.
With Tokio:
use tokio::time::{sleep, Duration};
async fn handle_connection(id: u32) {
// Yields control here — other tasks run while we wait
sleep(Duration::from_millis(100)).await;
println!("Connection {} handled", id);
}
#[tokio::main]
async fn main() {
let handles: Vec<_> = (0..1_000)
.map(|i| tokio::spawn(handle_connection(i)))
.collect();
for h in handles {
h.await.unwrap();
}
}
// ~1000 tasks. Tiny footprint. Scales to 100,000 without breaking a sweat.
Both complete in about 100ms. The difference is in what they consume getting there.
When Threads Are Actually the Right Answer
I want to be honest with you: async is not always better. For CPU-bound work — compressing files, running cryptography, processing images — async can actively hurt you. A heavy computation holds a thread and can't yield, which means it's blocking the scheduler from running other tasks on that thread. You've accidentally built a traffic jam.
For CPU work, the right tools are:
- tokio::task::spawn_blocking — ships the work to a dedicated blocking thread pool, away from your async workers
- rayon — a data-parallelism library built specifically for this
// ❌ Don't do this — you're holding up the async scheduler
tokio::spawn(async {
let result = crunch_massive_dataset(); // Blocks the thread. Other tasks: 😤
result
});
// ✅ Do this instead
tokio::spawn(async {
let result = tokio::task::spawn_blocking(|| {
crunch_massive_dataset() // Runs on a separate blocking thread pool
}).await.unwrap();
result
});
spawn_blocking is your escape hatch. Use it whenever you need to call synchronous, potentially slow code from inside an async context.
The Mental Model, All Together
Here's the complete picture of what happens every time you await something:
1. You write async fn — Rust compiles it into a state machine implementing Future
2. tokio::spawn drops that future into the scheduler's task queue
3. A worker thread picks it up and calls poll()
4. If the future needs to wait (I/O, timer, channel), it stores a Waker and returns Poll::Pending
5. The I/O driver watches the underlying OS event (epoll/kqueue/IOCP)
6. When the event fires, the driver calls waker.wake(), re-queuing the task
7. A worker thread (could be a different one — work stealing!) picks it up and polls again
8. Repeat until Poll::Ready — task is complete, handle is resolved
And here's your quick reference for picking the right tool:
| Scenario | Reach For |
|---|---|
| Concurrent I/O (HTTP, sockets, DB) | tokio::spawn + async fn |
| CPU-bound work (compute, encoding) | spawn_blocking or rayon |
| "I understand async but don't want it right now" | std::thread |
| Single-threaded env / WASM / CLI | current_thread flavor |
| Fine-grained runtime control | Runtime::Builder |
| Simple async binary | #[tokio::main] and don't overthink it |
Wrapping Up
Tokio isn't magic. It's a well-engineered scheduler sitting on top of OS primitives you already know. #[tokio::main] is just Runtime::new().block_on(...). Tasks are futures being polled by worker threads. The work-stealing scheduler is why you don't have to think about load balancing. And the difference between current_thread and multi_thread is the difference between a fine Tuesday and a very confusing production incident.
Understanding this doesn't mean you'll reimplement Tokio — please don't. But it does mean that next time something goes sideways in your async code, you'll have a mental model to debug against instead of just staring at the error and hoping it fixes itself.
(It won't fix itself. It never does.)
Next up in this series: Shared State in Rust — Mutex, RwLock, and ArcSwap. What they are, when to use each, and how they actually perform under real contention. Spoiler: the answer is not always "just use a Mutex."