DEV Community

Discussion on: Linux buffered write latency

 
Frits Hoogland

IO is a fascinating topic :-D

There is an important caveat you touched upon: direct IO.

The concept of direct IO is to prevent an IO (read or write) from being managed by an operating system cache layer, and instead to read from or write to the block device directly. There are many different reasons why you would want that. For the sake of this answer, the relevant difference from buffered IO is that direct IO doesn't require a page in the page cache to be allocated, and is therefore exempt from dirty page throttling. In fact, direct IO bypasses that layer of code in the kernel entirely.

That means you can have a situation where lots of processes writing via buffered IO get throttled, because the kernel must prevent memory from being flooded with dirty pages, whilst another process doing direct IO happily keeps performing IO.
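To see how close a system is to that throttling, you can watch the amount of dirty memory. The sketch below is only an illustration, not something from the article: it assumes a Linux system that exposes the `Dirty` and `Writeback` fields in `/proc/meminfo`, and the function names `parse_meminfo` and `dirty_summary` are made up for this example.

```python
import os


def parse_meminfo(text):
    """Return {field: value} for lines of the form 'Name:   123 kB'."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines that have a colon and a leading integer value.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def dirty_summary(meminfo_text):
    """Pick the fields that matter for buffered write throttling (None if absent)."""
    info = parse_meminfo(meminfo_text)
    return {key: info.get(key) for key in ("MemTotal", "Dirty", "Writeback")}


if __name__ == "__main__":
    path = "/proc/meminfo"  # Linux only; other systems simply print nothing
    if os.path.exists(path):
        with open(path) as handle:
            print(dirty_summary(handle.read()))
```

Running this repeatedly while buffered writers are busy should show `Dirty` growing; a process doing direct IO does not add to it.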

But direct IO requires careful thought: it is not a silver bullet, it's a double-edged sword. With buffered IO, when performing low levels of writes with enough available memory, writes complete at really low latency, because a write is a memory copy (this is simplified), not a memory-to-disk copy. If you enable direct IO, every write has to be written to the block device 'physically', and thus will always incur the latency of the block device.
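A rough way to feel that difference is to time one buffered write and one direct write of the same block. This is a minimal sketch under assumptions, not a benchmark: the 4096 byte alignment is a guess that fits many devices but not all, `timed_write` is a name made up for this example, and a single write is far too noisy to draw conclusions from. It uses `os.O_DIRECT`, which Python only exposes on platforms that have it, and returns `None` when the filesystem rejects direct IO.

```python
import mmap
import os
import tempfile
import time

BLOCK = 4096  # assumed alignment; the real requirement depends on device and filesystem


def timed_write(path, direct):
    """Write one aligned block to path and return the elapsed seconds.

    Returns None when O_DIRECT is unavailable or rejected by the filesystem.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if direct:
        if not hasattr(os, "O_DIRECT"):
            return None
        flags |= os.O_DIRECT
    # An anonymous mmap is page aligned, which O_DIRECT generally requires.
    buf = mmap.mmap(-1, BLOCK)
    buf.write(b"x" * BLOCK)
    try:
        fd = os.open(path, flags, 0o600)
    except OSError:
        buf.close()
        return None
    try:
        start = time.perf_counter()
        os.write(fd, buf)
        return time.perf_counter() - start
    except OSError:
        return None
    finally:
        os.close(fd)
        buf.close()


if __name__ == "__main__":
    # The default temp directory may be memory backed; point this at a real disk to compare.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "probe.bin")
        for label, direct in (("buffered", False), ("direct", True)):
            print(label, timed_write(target, direct))
```

On a disk-backed filesystem I would expect the buffered write to return quickly, since it only copies into memory, while the direct write waits for the device.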

Like I explained in the article, Linux does not really have a concept of a page cache, but instead stores different types of pages with variable limits, and applies special rules for dirty pages. In fact, buffered IO competes for memory just like applications allocating memory. If cached pages get a higher touch count than other pages, such as memory-mapped pages from the executable of a daemon, and memory allocation reaches a certain limit, these lesser-used pages can be 'swapped'.

This is a reason why you might see unexplained swap usage on a carefully tuned server when it performs backups: a backup copies all datafiles into a backup file, allocating pages for it, and when the backup file is then copied to its backup location, these backup file pages are touched again and thus given higher priority. That might cause some never-used pages of executables, such as bootstrap code, to be swapped to disk.
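If you want to check which processes ended up with pages in swap after such a backup, the `VmSwap` field in `/proc/<pid>/status` is one place to look. The following is a small sketch, assuming a Linux `/proc` filesystem; the helper names `vm_swap_kb` and `swapped_processes` are hypothetical and only serve this example.

```python
import os


def vm_swap_kb(status_text):
    """Return the VmSwap value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            fields = line.split()
            if len(fields) >= 2 and fields[1].isdigit():
                return int(fields[1])
    return None


def swapped_processes(proc_root="/proc"):
    """Return [(pid, name, kB)] for processes with swapped pages, largest first."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # the process exited or its status is not readable
        swap = vm_swap_kb(text)
        if swap:
            # The first line of the status file has the form "Name:<tab>command".
            name = text.splitlines()[0].split(":", 1)[-1].strip()
            found.append((int(entry), name, swap))
    return sorted(found, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for pid, name, swap in swapped_processes():
        print(pid, name, swap, "kB")
```

Comparing the output before and after a backup run gives a hint whether the backup pushed pages of otherwise idle processes out to swap.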