author: Sospeter Kinyanjui
intro
For a long time, the go-to tool for scalable I/O on Linux was epoll, an I/O event notification facility that tells an application which file descriptors are ready for reading or writing. epoll first appeared in Linux kernel 2.5.44 in 2002 and gained mainstream adoption with the 2.6 series in 2003. It follows a readiness model built on three system calls: epoll_create, epoll_ctl and epoll_wait. The kernel notifies the application when a descriptor is ready, and only then does the application issue the actual read or write. The cost of epoll_wait depends on the number of ready descriptors rather than the number being watched, which is why it is often described as O(1): its speed stays roughly the same whether you're watching 10 connections or 10,000. But system calls are the only way for an application to ask the kernel for access to the machine's resources, and with epoll every ready connection still needs its own read or write call on top of the wait. That is the syscall tax problem: an expensive switch between user mode and kernel mode for every operation on every connection. It was not until 2019, with Linux 5.1, that io_uring, a Linux kernel interface for asynchronous I/O, arrived.
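To make the readiness model concrete, here is a minimal sketch in C that watches a pipe with epoll. The helper name epoll_pipe_demo is made up for this post, and the pipe simply stands in for a network connection. Notice that getting one message across takes a wait call plus a separate read call.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper for illustration: writes msg into a pipe, waits for
   readiness with epoll, then reads the data back into out.
   Returns the number of bytes read, or -1 on error. */
long epoll_pipe_demo(const char *msg, char *out, size_t out_len)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;

    int epfd = epoll_create1(0);                /* create the epoll instance */
    if (epfd < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;                        /* "tell me when it is readable" */
    ev.data.fd = pipefd[0];

    long result = -1;
    size_t len = strlen(msg);
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
        write(pipefd[1], msg, len) == (ssize_t)len) {
        struct epoll_event ready;
        /* One syscall to learn that the descriptor is ready... */
        int n = epoll_wait(epfd, &ready, 1, 1000);
        /* ...and another syscall to actually move the data. */
        if (n == 1 && (ready.events & EPOLLIN))
            result = (long)read(ready.data.fd, out, out_len);
    }

    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

Calling epoll_pipe_demo("ping", buf, sizeof buf) should copy the four bytes into buf. With thousands of connections, the read after every wakeup is where the syscall tax adds up.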
First, let's define asynchronous execution: it is the ability of an application to start a long-running task and stay responsive instead of waiting for that task to finish. This makes better use of the machine's resources. Because epoll is event-driven and only reports readiness, the read and write calls themselves are still synchronous, so the asynchronous behavior it gives a program can be considered an illusion. io_uring aims to make fewer system calls by batching multiple read/write requests and submitting them with a single call. As a result, multiple read and write operations do not have to wait for each other.
definition and implementation
Technically, io_uring exposes three calls: io_uring_setup(2), io_uring_enter(2) and io_uring_register(2).
A call to io_uring_setup initializes the io_uring context by creating the submission queue (SQ) and completion queue (CQ), and returns a file descriptor to the application. It takes the requested number of entries and a parameters structure, which the kernel fills in with what the application needs to locate the primary data structures, the rings: their sizes and the offsets of head, tail, ring_mask and ring_entries. As you can imagine, a ring is consumed in FIFO order: the first entry placed on the submission queue is the first one the kernel picks up, although completions can be posted to the completion queue in a different order than the requests were submitted. The CQ is structured similarly. The call sets up memory shared between the kernel and the application, sized according to the requested entries, which the application maps with mmap. It comprises a section for Submission Queue Entries (SQEs) and one for Completion Queue Entries (CQEs). For SQEs the application is the only writer and the kernel only reads. For CQEs the kernel is the only writer and the application only reads. To keep things performant, each ring follows single-producer/single-consumer logic, which avoids locking between the two sides.
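The head, tail and ring_mask bookkeeping is easier to see in a toy version. The sketch below is not the kernel's code. It is a small single-producer/single-consumer ring in plain C11, with made-up names (spsc_ring, ring_push, ring_pop), that follows the same idea: one side only ever advances tail, the other only ever advances head, and the mask turns an ever-growing counter into a slot index.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES 8u                     /* must be a power of two */
#define RING_MASK    (RING_ENTRIES - 1u)

/* Illustrative ring, not io_uring's real layout. */
struct spsc_ring {
    _Atomic uint32_t head;                  /* advanced only by the consumer */
    _Atomic uint32_t tail;                  /* advanced only by the producer */
    uint64_t slots[RING_ENTRIES];
};

/* Producer side: returns 1 on success, 0 if the ring is full. */
int ring_push(struct spsc_ring *r, uint64_t value)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_ENTRIES)
        return 0;                           /* no free slot */
    r->slots[tail & RING_MASK] = value;     /* fill the slot first... */
    /* ...then publish it by moving tail, so the consumer never sees
       a half-written entry. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

/* Consumer side: returns 1 and stores the oldest entry in *value,
   or returns 0 if the ring is empty. */
int ring_pop(struct spsc_ring *r, uint64_t *value)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return 0;                           /* nothing to consume */
    *value = r->slots[head & RING_MASK];
    /* Release the slot back to the producer by moving head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}
```

In io_uring terms, the application plays the producer on the SQ and the consumer on the CQ, while the kernel plays the opposite roles. Because each counter has exactly one writer, the two sides need ordering guarantees on head and tail but no lock.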
A call to io_uring_enter is the engine starter for the whole operation. It tells the kernel: "I've put some SQEs in the ring; go process them." Its C signature looks like this:
#include <linux/io_uring.h>
int io_uring_enter(unsigned int fd, unsigned int to_submit,
unsigned int min_complete, unsigned int flags,
sigset_t *sig);
If io_uring_enter is the engine starter, io_uring_register is the "VIP pass" for your data. It is a system call used to pre-register resources (like memory buffers or files) with the kernel. By doing this, you remove the overhead of the kernel having to "look up" or "map" these resources every time you perform an I/O operation. With those three pieces, asynchronous programming on Linux has never been the same again.
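To see setup and enter working together, here is a rough sketch that talks to the kernel through raw system calls, with no liburing. It submits a single no-op request (IORING_OP_NOP) and reads its completion. The names uring_nop_roundtrip and map_region are made up for this post, and the sketch assumes kernel headers recent enough to define IORING_FEAT_SINGLE_MMAP. It returns -1 where io_uring is unavailable, which is common inside containers and sandboxes. io_uring_register is left out to keep it short.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/io_uring.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Map one of the regions exposed through the io_uring file descriptor. */
static void *map_region(int fd, size_t len, off_t offset)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_POPULATE, fd, offset);
}

/* Hypothetical helper (not part of any library): submits one no-op request
   and returns the user_data read back from its CQE (42 here), or -1 if
   io_uring is unavailable or any step fails. */
long long uring_nop_roundtrip(void)
{
    struct io_uring_params p;
    memset(&p, 0, sizeof p);
    int fd = (int)syscall(SYS_io_uring_setup, 4, &p);   /* ask for 4 SQ entries */
    if (fd < 0)
        return -1;

    /* Region sizes follow from the offsets the kernel filled into p. */
    size_t sq_len = p.sq_off.array + p.sq_entries * sizeof(unsigned);
    size_t cq_len = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
    size_t sqes_len = p.sq_entries * sizeof(struct io_uring_sqe);
    int single = (p.features & IORING_FEAT_SINGLE_MMAP) != 0;
    if (single && cq_len > sq_len)
        sq_len = cq_len;                    /* both rings share one mapping */

    char *sq = (char *)map_region(fd, sq_len, IORING_OFF_SQ_RING);
    char *cq = single ? sq : (char *)map_region(fd, cq_len, IORING_OFF_CQ_RING);
    struct io_uring_sqe *sqes =
        (struct io_uring_sqe *)map_region(fd, sqes_len, IORING_OFF_SQES);
    long long result = -1;

    if (sq != MAP_FAILED && cq != MAP_FAILED && sqes != MAP_FAILED) {
        unsigned *sq_tail = (unsigned *)(sq + p.sq_off.tail);
        unsigned *sq_mask = (unsigned *)(sq + p.sq_off.ring_mask);
        unsigned *sq_array = (unsigned *)(sq + p.sq_off.array);
        unsigned *cq_head = (unsigned *)(cq + p.cq_off.head);
        unsigned *cq_tail = (unsigned *)(cq + p.cq_off.tail);
        unsigned *cq_mask = (unsigned *)(cq + p.cq_off.ring_mask);
        struct io_uring_cqe *cqes = (struct io_uring_cqe *)(cq + p.cq_off.cqes);

        /* Producer side: fill an SQE, then publish it by advancing the tail. */
        unsigned tail = *sq_tail;
        unsigned idx = tail & *sq_mask;
        memset(&sqes[idx], 0, sizeof sqes[idx]);
        sqes[idx].opcode = IORING_OP_NOP;
        sqes[idx].user_data = 42;           /* tag echoed back in the CQE */
        sq_array[idx] = idx;
        __atomic_store_n(sq_tail, tail + 1, __ATOMIC_RELEASE);

        /* One syscall: submit 1 SQE and wait for at least 1 completion.
           The raw syscall takes a trailing size argument for the signal mask. */
        long ret = syscall(SYS_io_uring_enter, fd, 1, 1,
                           IORING_ENTER_GETEVENTS, NULL, 0);

        /* Consumer side: read the CQE, then release it by advancing the head. */
        unsigned head = *cq_head;
        if (ret == 1 && head != __atomic_load_n(cq_tail, __ATOMIC_ACQUIRE)) {
            struct io_uring_cqe *cqe = &cqes[head & *cq_mask];
            if (cqe->res == 0)
                result = (long long)cqe->user_data;
            __atomic_store_n(cq_head, head + 1, __ATOMIC_RELEASE);
        }
    }

    if (sqes != MAP_FAILED) munmap(sqes, sqes_len);
    if (!single && cq != MAP_FAILED) munmap(cq, cq_len);
    if (sq != MAP_FAILED) munmap(sq, sq_len);
    close(fd);
    return result;
}
```

Where io_uring is available, uring_nop_roundtrip() should hand back the 42 stored in user_data. A real program would queue many SQEs before the single io_uring_enter call, and that is where the batching pays off.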
There are existing C wrappers around these kernel-level calls, and liburing does a great job. But there is another player when it comes to leveraging io_uring's capabilities for building asynchronous applications: Rust. Just imagine that: it resolves the fundamental conflict by offering completion-based I/O (the whole concept of io_uring) with memory safety. In io_uring, the kernel takes control of your memory buffers to perform I/O asynchronously. This creates a "danger zone" where both the kernel and your application could try to access the same memory at once. When an API takes ownership of the buffer for the duration of the operation, the Rust compiler ensures your code cannot even touch that buffer until the kernel returns it with a Completion Queue Entry (CQE). Existing Rust wrappers for this Linux kernel I/O API include tokio's io-uring, tokio-uring and glommio, among others.
there's more
Implementing a completion-based mechanism for I/O at the kernel level is useful in today's real world, and not just on Linux. Windows has had such a technology for a long time: it's called I/O completion ports (IOCP). Even though IOCP is asynchronous, it still requires a system call for every request, which makes the 'tax' a little higher than io_uring's. On macOS, the standard way of doing this is the kqueue API, which is still readiness-based. You have to call kevent to find out what is ready, then make separate system calls to actually perform the I/O. It suffers from the same syscall overhead that io_uring was designed to kill. That makes io_uring a truly asynchronous system call API on Linux.
But here's the thing about finding technological solutions: you don't have to solve everything. Someone I looked up to once told me, "Be the best cog, but keep in mind you're not the only one in the machine." From my own point of view, the real problem lies in creating an asynchronous runtime that is both cross-platform and completion-based. Such a technology exists: compio, a Rust framework for asynchronous I/O. Two things make it less than perfect: it arguably doesn't offer zero-cost abstractions, and it uses fixed buffers (part of io_uring's design), which are immutable references. Most of the Rust ecosystem is built on top of the std::io::Read and std::io::Write traits, which take mutable references. Compio? It primarily emphasizes owning buffers rather than borrowing them. As we've seen, that aligns with io_uring's completion-based model, but it poses a real problem for integration with the rest of the ecosystem.
But again, like I said, we just have to be here, implementing one solution at a time and believing in ourselves even when it seems impossible. Until next time: peace, focus, desire.
You can check out my other blog posts.
Follow me on:
GitHub: https://github.com/Sospeter-M-Kinyanjui
LinkedIn: https://www.linkedin.com/in/sospeter-kinyanjui-6a091b2a5/
