io_uring is a new asynchronous I/O API for Linux created by Jens Axboe from Facebook.
It aims to provide an API without the limitations of similar interfaces:
- read(2)/write(2) are synchronous
- aio_read(3)/aio_write(3) provide asynchronous functionality, but only support files opened with O_DIRECT or in unbuffered mode
- select(2)/poll(2)/epoll(7) work well with sockets but do not behave as expected with regular files (always “ready”)
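The "always ready" behaviour of regular files is easy to observe. The snippet below is a small sketch in Python (chosen only for brevity, it is not part of io_uring), assuming a Linux system. The helper name is_readable is made up for this example. It asks poll(2) whether a descriptor is readable, first for an empty temporary file and then for a socket pair before and after data is sent.

```python
import select
import socket
import tempfile


def is_readable(fd, timeout_ms=0):
    """Ask poll(2) whether fd is readable right now (hypothetical helper)."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return any(events & select.POLLIN for _fd, events in poller.poll(timeout_ms))


# A regular file: poll reports it as readable even though it is empty.
with tempfile.TemporaryFile() as f:
    file_ready = is_readable(f.fileno())

# A socket: readable only once the peer has actually sent something.
a, b = socket.socketpair()
with a, b:
    socket_ready_before = is_readable(a.fileno())
    b.send(b"x")
    socket_ready_after = is_readable(a.fileno())

print(file_ready, socket_ready_before, socket_ready_after)
```

Since the file is reported as ready regardless of whether a read would hit the disk, readiness notification cannot be used to avoid blocking on file I/O.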
To have a more consistent API across file descriptors (sockets and regular files) we can use libuv (will probably explore it in the future) or liburing/io_uring (the star of the show).
How does it work?
As the name suggests, it uses ring buffers as the main interface for kernel-user space communication.
There are two ring buffers, one for submission of requests (submission queue or SQ) and the other that informs you about completion of those requests (completion queue or CQ).
These ring buffers are shared between kernel and user space.
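To make the ring buffer idea concrete, here is a toy model in Python. It is only a sketch: the real rings live in memory shared with the kernel, while this Ring class (a name invented for this example) is a plain user-space object. It mirrors the general scheme of a power-of-two number of slots, a tail index advanced by the producer, a head index advanced by the consumer, and a mask that turns an index into a slot position.

```python
class Ring:
    """Fixed-size single-producer/single-consumer ring (toy model)."""

    def __init__(self, entries):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.slots = [None] * entries
        self.mask = entries - 1
        self.head = 0  # next slot to consume, advanced only by the consumer
        self.tail = 0  # next slot to fill, advanced only by the producer

    def __len__(self):
        return self.tail - self.head

    def push(self, item):
        """Producer side: returns False when the ring is full."""
        if len(self) == len(self.slots):
            return False
        self.slots[self.tail & self.mask] = item
        self.tail += 1
        return True

    def pop(self):
        """Consumer side: returns None when the ring is empty."""
        if self.head == self.tail:
            return None
        item = self.slots[self.head & self.mask]
        self.head += 1
        return item


ring = Ring(4)
pushed = [ring.push(n) for n in range(5)]  # the fifth push does not fit
first = ring.pop()                         # frees one slot
wrapped = ring.push(99)                    # reuses slot 0 via the mask
print(pushed, first, wrapped)
```

For the submission queue the application plays the producer and the kernel the consumer. For the completion queue the roles are swapped.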
- Set the ring buffers up with io_uring_setup() and then map them into user space with two mmap(2) calls
- Create a submission queue entry (SQE) describing what operation you want to perform (read or write a file, accept client connections, etc.) and add it to the SQ
- Call the io_uring_enter() syscall to signal that SQEs are ready to be processed
  - Multiple SQEs can be added before making the syscall
  - io_uring_enter() can also wait for requests to be processed by the kernel before it returns, so you know you’re ready to read off the completion queue for results
- Requests are processed by the kernel and completion queue events (CQEs) are added to the CQ
- Read CQEs off the head of the completion queue ring buffer. There is one CQE corresponding to each SQE and it contains the status of that particular request
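The steps above can be sketched as a user-space simulation. This is not the real interface: ToyRing, Sqe, Cqe and their methods are hypothetical names for this example, and the "kernel" is just a Python method. What it does borrow from io_uring is the shape of the flow (queue several SQEs, make one enter call, then reap one CQE per SQE), a user_data tag that ties each CQE back to its SQE, and a res field that is non-negative on success and a negated errno on failure.

```python
import errno
import tempfile
from collections import deque, namedtuple

# Heavily simplified stand-ins for the real SQE/CQE structures.
Sqe = namedtuple("Sqe", "opcode path user_data")
Cqe = namedtuple("Cqe", "user_data res")


class ToyRing:
    def __init__(self):
        self.sq = deque()  # submission queue
        self.cq = deque()  # completion queue

    def queue(self, sqe):
        """Add an SQE to the SQ; nothing is processed yet."""
        self.sq.append(sqe)

    def enter(self):
        """Stand-in for io_uring_enter(): consume every SQE, post one CQE each."""
        submitted = 0
        while self.sq:
            sqe = self.sq.popleft()
            self.cq.append(Cqe(sqe.user_data, self._execute(sqe)))
            submitted += 1
        return submitted

    def _execute(self, sqe):
        # res: bytes read on success, negated errno on failure.
        try:
            if sqe.opcode == "read":
                with open(sqe.path, "rb") as f:
                    return len(f.read())
            return -errno.EINVAL
        except OSError as e:
            return -e.errno

    def reap(self):
        """Read CQEs off the head of the CQ."""
        while self.cq:
            yield self.cq.popleft()


with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"hello")
    tmp.flush()
    ring = ToyRing()
    ring.queue(Sqe("read", tmp.name, user_data=1))
    ring.queue(Sqe("read", tmp.name + ".missing", user_data=2))
    submitted = ring.enter()  # one call covers both SQEs
    results = list(ring.reap())

print(submitted, results)
```

The second request fails, and the failure only shows up as a negative res in its CQE. The enter call itself still succeeds.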
Ordering in the CQ may not correspond to the request order in the SQ. Requests can be processed in parallel, and their results are added to the CQ as they become available. This is done for performance reasons: if one file is on an HDD and another on an SSD, we don’t want the HDD request to block the faster SSD request.
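The effect can be imitated with a thread pool. This is a sketch with made-up delays standing in for a slow and a fast device, and fake_io and completion_order are names invented for this example. Each request carries a tag, playing the role that user_data plays in io_uring, so results can be matched to requests even though they arrive in a different order.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_io(device, delay_s):
    """Pretend to perform I/O on a device with the given latency."""
    time.sleep(delay_s)
    return device


def completion_order(requests):
    """Submit all requests at once; return their tags in completion order."""
    with ThreadPoolExecutor(max_workers=len(requests)) as pool:
        futures = {
            pool.submit(fake_io, device, delay_s): tag
            for tag, (device, delay_s) in enumerate(requests)
        }
        return [futures[done] for done in as_completed(futures)]


# Tag 0 is the slow "HDD" request, tag 1 the fast "SSD" request.
order = completion_order([("hdd", 0.3), ("ssd", 0.0)])
print(order)
```

The slow request was submitted first but finishes last, which is why code consuming the CQ should rely on the tag rather than on position.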
There is a polling mode available, in which the kernel polls for new entries in the submission queue. This avoids the syscall overhead of calling io_uring_enter() every time you submit entries for processing.
Because of the shared ring buffers between the kernel and user space, io_uring can be a zero-copy system.
How to use it?
Most sources indicate that the kernel interface was adopted in Linux kernel version 5.1. But from what I saw in the Linux git tree, linux/io_uring is only present in Linux 6.0 (does anyone know where it might be declared in previous versions?).
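A quick way to check whether a machine is at least on the 5.1 baseline is to compare the running kernel release against it. The helper below is a sketch, and kernel_at_least is a name invented for this example. It only parses the leading "major.minor" of the release string, and a passing check is just a coarse hint, since io_uring can still be disabled or restricted by the system configuration.

```python
import platform
import re


def kernel_at_least(release, major, minor):
    """Compare the leading 'major.minor' of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


# platform.release() returns something like "5.15.0-91-generic" on Linux.
print(kernel_at_least(platform.release(), 5, 1))
```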
There is also a liburing library that provides an API to interact with the kernel interface easily from userspace.
I will eventually try to interact with io_uring using Go, so keep an eye on future articles if that interests you.