amir

Posted on May 22

How I Analyzed the Linux Kernel's Deadliest Logic Bug: A Deep Dive into Dirty Pipe (CVE-2022-0847)

#linux #security #kernel #c

As developers, we often think of kernel exploits as highly complex assembly-level wizardry, heap grooming, or race-condition battles. But recently, I decided to sit down, pull up the Linux kernel source code, and trace the infamous Dirty Pipe vulnerability, CVE-2022-0847, line by line.

What I found was mind-blowing: a single uninitialized struct member in the pipe code path used by splice() allowed an unprivileged local user to write into read-only files through the Page Cache.

No race conditions.

No classic memory corruption.

No heap spraying.

Just one stale flag in a reused kernel structure.

This is my technical post-mortem and step-by-step code analysis of how this elegant logic bug worked.

The Conceptual Backstory: Page Cache, Pipes, and `splice()`

Before looking at the buggy code, we need to understand the three Linux kernel mechanisms that collided to create Dirty Pipe:

The Page Cache
Pipe buffers
The splice() system call

1. The Page Cache: RAM as a Disk Mirror

To avoid slow disk reads, Linux keeps recently accessed file data in memory. This memory-backed representation is called the Page Cache.

When multiple processes read the same file, for example /etc/passwd, the kernel does not necessarily load separate copies for every process. Instead, it can map those processes to the same physical memory page that represents the file's cached content.

Normally, write access to a file is checked when the file is opened, and a process that maps a file privately and then writes to that mapping is protected by the kernel's Copy-on-Write mechanism:

The original page remains unchanged.
A private copy is created.
The process writes to that private copy.
The read-only backing file remains safe.

That is the expected contract.

Dirty Pipe broke that contract.
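The Copy-on-Write contract described above can be observed from user space. The following is a minimal sketch, assuming a Linux system with a writable /tmp; the function name and the scratch file template are my own illustrative choices. It writes through a private mapping and then checks that the file itself still holds the original bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a write through a MAP_PRIVATE mapping left the file's own
 * contents untouched, 0 if the file changed, -1 if the setup failed.
 */
int private_write_leaves_file_intact(void)
{
    char path[] = "/tmp/cow-demo-XXXXXX";   /* hypothetical scratch file */
    const char original[] = "original";
    const char modified[] = "modified";
    char readback[sizeof original] = {0};
    int result = -1;

    assert(sizeof original == sizeof modified);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                           /* the open fd keeps the file alive */

    if (write(fd, original, sizeof original) == (ssize_t)sizeof original) {
        char *map = mmap(NULL, sizeof original, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            /* This store lands in a private copy of the page, not the file. */
            memcpy(map, modified, sizeof modified);

            /* Read the file through the normal read path and compare. */
            if (pread(fd, readback, sizeof readback, 0) == (ssize_t)sizeof readback)
                result = memcmp(readback, original, sizeof original) == 0;
            munmap(map, sizeof original);
        }
    }
    close(fd);
    return result;
}
```

On a healthy kernel I would expect this to return 1: the mapping sees the new bytes, while every other reader of the file still sees the old ones.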

2. The Pipe Buffer

In Linux, a pipe is implemented as a circular ring of buffers represented internally by struct pipe_inode_info.

Each slot in that ring is a struct pipe_buffer, defined in include/linux/pipe_fs_i.h:

struct pipe_buffer {
    struct page *page;
    unsigned int offset, len;
    const struct pipe_buf_operations *ops;
    unsigned int flags; // <-- the field that matters here
    unsigned long private;
};

The important field is:

unsigned int flags;

When data is written to a pipe, the kernel may allocate page-sized buffers, usually 4 KB. If the write does not fill the whole page, the kernel can mark that buffer as mergeable by setting:

PIPE_BUF_FLAG_CAN_MERGE

That flag tells the kernel:

New writes may be appended into the remaining space of this existing pipe buffer instead of allocating a new one.

That behavior is perfectly valid for normal anonymous pipe pages.

The problem appears when a pipe buffer stops pointing to a normal anonymous pipe page and starts pointing to a page from the Page Cache.
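To make the merge decision concrete, here is a toy user-space model of it. This is not kernel code: `toy_buf`, `toy_try_merge`, and the flag value are hypothetical stand-ins, and the real pipe write path does considerably more. The sketch only shows the idea that a set flag plus enough free space is what allows an append into the existing page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE      4096u
#define TOY_FLAG_CAN_MERGE 0x10u   /* stand-in for PIPE_BUF_FLAG_CAN_MERGE */

/* Simplified stand-in for struct pipe_buffer with an inline page. */
struct toy_buf {
    unsigned char page[TOY_PAGE_SIZE];
    unsigned int offset, len;
    unsigned int flags;
};

/*
 * Try to append n bytes into the space left in `last`.
 * Returns the number of bytes merged, or 0 if the buffer is not
 * mergeable or the data does not fit.
 */
size_t toy_try_merge(struct toy_buf *last, const void *data, size_t n)
{
    assert(last->offset + last->len <= TOY_PAGE_SIZE);

    size_t used = last->offset + last->len;
    if (!(last->flags & TOY_FLAG_CAN_MERGE) || n > TOY_PAGE_SIZE - used)
        return 0;

    /* The append goes straight into whatever page the buffer refers to. */
    memcpy(last->page + used, data, n);
    last->len += (unsigned int)n;
    return n;
}
```

The key observation is that the decision depends only on the flag and the remaining space. Nothing in this check asks where the page came from.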

3. The `splice()` Syscall: Zero-Copy Magic

The splice() system call is a Linux performance optimization. It moves data between file descriptors and pipes without copying data back and forth through user space.

Instead of doing this:

file -> kernel buffer -> user space -> kernel pipe buffer

splice() can do something closer to this:

file page cache -> pipe buffer reference

That is powerful because it avoids unnecessary copying.

But it also means a pipe buffer can reference a page that belongs to the Page Cache of a file.

Internally, one of the relevant functions is:

copy_page_to_iter_pipe()

This function creates a pipe buffer that references the page containing file data.

That is where the bug lived.
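For readers who have not used splice() before, this is what ordinary, legitimate use looks like. It is a small sketch, assuming Linux with glibc; the helper name `splice_file_head` is my own. It moves the first bytes of a file into a fresh pipe and reads them back out, with no intermediate user-space copy on the file-to-pipe leg.

```c
#define _GNU_SOURCE            /* splice() and loff_t are Linux-specific */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Splice up to `len` bytes from the start of `path` into a new pipe, then
 * read them from the pipe into `out`. Returns the number of bytes read,
 * 0 for an empty file, or -1 on error. `len` should be small enough to fit
 * in the pipe (one page is a safe choice).
 */
ssize_t splice_file_head(const char *path, char *out, size_t len)
{
    assert(path != NULL && out != NULL);

    int p[2];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (pipe(p) < 0) {
        close(fd);
        return -1;
    }

    loff_t off = 0;
    /* The kernel attaches file pages to the pipe instead of copying them. */
    ssize_t got = splice(fd, &off, p[1], NULL, len, 0);
    if (got > 0)
        got = read(p[0], out, (size_t)got);

    close(p[0]);
    close(p[1]);
    close(fd);
    return got;
}
```

Note that the file is only ever opened with O_RDONLY. That is all splice() needs on the input side, which is part of why the stale flag mattered so much.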

Digging Into the Code: The Bug in `lib/iov_iter.c`

When splice() is used to map file data into a pipe, the kernel executes code similar to this vulnerable version of copy_page_to_iter_pipe():

static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
                                     struct iov_iter *i)
{
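    struct pipe_inode_info *pipe = i->pipe;
    unsigned int head = i->head;              // next ring slot to fill
    unsigned int mask = pipe->ring_size - 1;  // ring size is a power of two
    struct pipe_buffer *buf = &pipe->bufs[head & mask];
    // ... validation steps elided ...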

    buf->ops = &page_cache_pipe_buf_ops;
    get_page(page);
    buf->page = page;
    buf->offset = offset;
    buf->len = bytes;
    // What is missing here?

    pipe->head = head + 1;
    return bytes;
}

The missing line is the entire bug:

buf->flags = 0;

buf->flags was never initialized or cleared.

Because pipes are implemented as circular rings, the kernel reuses old pipe_buffer structures. If a previous operation left PIPE_BUF_FLAG_CAN_MERGE set, that stale flag could remain active when the same buffer slot was reused for Page Cache-backed file data.

That means a buffer referencing a read-only file page could accidentally still look mergeable.

That is the core of Dirty Pipe.
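The slot-reuse problem can be reproduced in a harmless user-space simulation. Everything below is a hypothetical model (`sim_ring`, `sim_slot`, the slot count, and the flag value are my own stand-ins), and it touches no real pipes or files. It only demonstrates how a flag written by one producer survives into a slot that a different producer later claims without resetting it.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_RING_SLOTS     16u     /* assumed ring size; must be a power of two */
#define SIM_FLAG_CAN_MERGE 0x10u   /* stand-in for PIPE_BUF_FLAG_CAN_MERGE */

struct sim_slot { const void *page; unsigned int flags; };
struct sim_ring { struct sim_slot slots[SIM_RING_SLOTS]; unsigned int head; };

/* Model of a normal pipe write: claim the next slot and mark it mergeable. */
void sim_anon_write(struct sim_ring *r, const void *anon_page)
{
    assert((SIM_RING_SLOTS & (SIM_RING_SLOTS - 1)) == 0);
    struct sim_slot *s = &r->slots[r->head++ & (SIM_RING_SLOTS - 1)];
    s->page = anon_page;
    s->flags = SIM_FLAG_CAN_MERGE;
}

/*
 * Model of the splice-style attach. With clear_flags == false this mirrors
 * the pre-fix behavior (flags left as found); with true it mirrors the fix.
 * Returns true if the file-backed slot ends up looking mergeable.
 */
bool sim_attach_file_page(struct sim_ring *r, const void *file_page, bool clear_flags)
{
    struct sim_slot *s = &r->slots[r->head++ & (SIM_RING_SLOTS - 1)];
    if (clear_flags)
        s->flags = 0;
    s->page = file_page;
    return (s->flags & SIM_FLAG_CAN_MERGE) != 0;
}
```

If every slot has been used once by `sim_anon_write`, the next `sim_attach_file_page(..., false)` call wraps around to a used slot and reports a mergeable file page. Passing `true` makes the same call report a clean slot.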

The Intersection of Two Commits

One thing I found especially interesting is that Dirty Pipe was not born from one obviously dangerous commit.

It came from the interaction of two separate changes:

1. Commit `241699cd72a8` — October 2016

This introduced the new pipe-backed iov_iter subsystem and added copy_page_to_iter_pipe().

The function did not initialize buf->flags.

At that time, this was not immediately exploitable because the dangerous merge flag did not exist yet.

2. Commit `f6dd975583bd` — May 2020

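This changed how the kernel decides whether a pipe buffer is mergeable: instead of inferring it from the buffer's ops pointer, it introduced the per-buffer flag PIPE_BUF_FLAG_CAN_MERGE.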

Suddenly, an old uninitialized field became security-critical.

That is the scary engineering lesson:

A harmless-looking initialization bug can become a critical vulnerability years later when another subsystem evolves.

Step-by-Step: How the Exploit Mechanics Worked

At a high level, the exploit forced the kernel into a bad state:

Prepare a pipe so all its internal buffer slots have PIPE_BUF_FLAG_CAN_MERGE set.
Drain the pipe so it becomes logically empty.
Use splice() to attach a read-only file's Page Cache page to a reused pipe buffer.
Because buf->flags was not cleared, the stale merge flag remains.
A later write to the pipe is merged into the Page Cache page.

The result: the in-memory cached representation of a read-only file is modified.

The disk file itself is not directly overwritten. The modification happens in the Page Cache.

Stage 1: Polluting the Pipe Buffers

The first step is to fill the pipe. This causes the kernel to allocate pipe buffers and mark them mergeable.

A simplified version looks like this:

int p[2];
pipe(p);

int capacity = fcntl(p[1], F_GETPIPE_SZ);
char dummy = 'A';

for (int r = capacity; r > 0; ) {
    int n = r > sizeof(dummy) ? sizeof(dummy) : r;
    write(p[1], &dummy, n);
    r -= n;
}

After this stage, the internal pipe buffer slots have been used and may contain PIPE_BUF_FLAG_CAN_MERGE.

Stage 2: Draining the Pipe

Next, the pipe is drained:

for (int r = capacity; r > 0; ) {
    int n = r > sizeof(dummy) ? sizeof(dummy) : r;
    read(p[0], &dummy, n);
    r -= n;
}

Now the pipe is logically empty.

But the kernel's internal pipe_buffer metadata is still there, ready to be reused.

The stale flags may still exist in those reused slots.

Stage 3: Splicing File Data into the Pipe

Then splice() is used to move data from a target file into the pipe without copying it through user space:

int fd = open("/path/to/read-only-file", O_RDONLY);
loff_t offset = 0;

splice(fd, &offset, p[1], NULL, 1, 0);

Behind the scenes, the kernel creates a pipe buffer that references the file's Page Cache page.

But because buf->flags was not cleared, the buffer may still have the old merge flag.

Now we have a dangerous state:

pipe_buffer.page  -> file Page Cache page
pipe_buffer.flags -> PIPE_BUF_FLAG_CAN_MERGE

That should never happen.

Stage 4: Writing into the Pipe

A subsequent write to the pipe is then treated as mergeable.

The kernel thinks it is appending data into a normal anonymous pipe page.

In reality, the buffer points to a file-backed Page Cache page.

So the write lands inside the cached file page.

That is why Dirty Pipe could modify the in-memory contents of files that the attacker should not have been able to write.

Why Dirty Pipe Was So Dangerous

Dirty Pipe was terrifying because it was not a fragile exploit.

No Race Condition

Dirty COW, CVE-2016-5195, depended on winning a race condition. Dirty Pipe did not.

There was no timing window to win.

No Classic Memory Corruption

This was not a buffer overflow or heap corruption bug.

The kernel was following its own logic, but that logic was operating on stale state.

High Reliability

Once the vulnerable state was created, the behavior was deterministic.

Page Cache Impact

The modification happened in memory through the Page Cache. That means the on-disk file might remain unchanged, but programs reading the file could observe the modified cached version.
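One defensive idea that follows from this is to compare what the Page Cache returns with what the storage device returns. The sketch below is only an illustration under several assumptions: the filesystem supports O_DIRECT, a 4096-byte alignment is sufficient for it, and the file is not being modified legitimately while the check runs. The function name is hypothetical.

```c
#define _GNU_SOURCE            /* O_DIRECT is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 4096             /* assumed to satisfy O_DIRECT alignment */

/*
 * Compare a buffered read (served from the Page Cache) with an O_DIRECT
 * read (which bypasses it). Returns 0 if they agree, 1 if they differ,
 * and -1 if the check could not be performed.
 */
int cached_differs_from_disk(const char *path)
{
    assert(path != NULL);

    void *cbuf = NULL, *dbuf = NULL;
    int result = -1;
    int cfd = open(path, O_RDONLY);
    int dfd = open(path, O_RDONLY | O_DIRECT);

    if (cfd < 0 || dfd < 0)
        goto out;
    if (posix_memalign(&cbuf, CHUNK, CHUNK) || posix_memalign(&dbuf, CHUNK, CHUNK))
        goto out;

    for (;;) {
        ssize_t c = read(cfd, cbuf, CHUNK);
        ssize_t d = read(dfd, dbuf, CHUNK);
        if (c < 0 || d < 0) { result = -1; break; }
        if (c != d || memcmp(cbuf, dbuf, (size_t)c) != 0) { result = 1; break; }
        if (c == 0) { result = 0; break; }   /* both streams hit EOF together */
    }
out:
    free(cbuf);
    free(dbuf);
    if (cfd >= 0) close(cfd);
    if (dfd >= 0) close(dfd);
    return result;
}
```

A result of 1 is a reason to investigate, not proof of compromise, and a result of -1 simply means this particular check is not available on that filesystem.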

Dirty Pipe vs Dirty COW

Dirty Pipe and Dirty COW are often compared because both involve unexpected writes related to file-backed memory.

But the exploit style is very different.

| Feature | Dirty COW | Dirty Pipe |
| --- | --- | --- |
| CVE | CVE-2016-5195 | CVE-2022-0847 |
| Bug type | Race condition | Uninitialized/stale state logic bug |
| Reliability | Timing-dependent | Highly deterministic |
| Main mechanism | Copy-on-Write race | Stale `PIPE_BUF_FLAG_CAN_MERGE` |
| Kernel area | Memory management | Pipes, Page Cache, `splice()` |

Dirty Pipe is a great reminder that not all dangerous vulnerabilities look like obvious memory corruption.

Sometimes the bug is just one field that was not reset.

The Upstream Fix

The fix was surprisingly small.

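In the patched version, the kernel explicitly clears the flags when creating a new pipe buffer. The upstream patch made the same one-line change in push_pipe() as well; the hunk for copy_page_to_iter_pipe() looks like this: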

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b0e0acdf96c15e..6dd5330f7a9957 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -414,6 +414,7 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
         return 0;

     buf->ops = &page_cache_pipe_buf_ops;
+    buf->flags = 0;
     get_page(page);
     buf->page = page;
     buf->offset = offset;

One line.

One field.

A huge security impact.

Key Developer Takeaways

Analyzing Dirty Pipe gave me a stronger appreciation for defensive engineering in low-level systems.

1. Always Initialize Reused Structures

If a structure is reused, every stateful field should be explicitly initialized.

Relying on previous state is dangerous.

In kernel code, stale state is not just a bug. It can become a privilege escalation.
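One way to make this habit hard to forget in C is to reset the whole structure in a single assignment instead of field by field. This is not how the kernel fix was written; it is a general pattern sketched with a hypothetical `conn_slot` structure. A compound literal with designated initializers zeroes every member that is not named, so a field added later cannot silently inherit old state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure that gets recycled between uses. */
struct conn_slot {
    void *payload;
    size_t len;
    unsigned int flags;
    unsigned long private_data;
};

/*
 * Prepare a recycled slot for a new payload. Members not listed in the
 * compound literal (flags, private_data, and anything added later) are
 * zero-initialized by the language rules.
 */
void conn_slot_reuse(struct conn_slot *slot, void *payload, size_t len)
{
    assert(slot != NULL);
    *slot = (struct conn_slot){ .payload = payload, .len = len };
}
```

The trade-off is a few extra stores per reuse, which may or may not be acceptable on a hot path. Where it is not, an explicit per-field reset plus a comment listing every field is the fallback.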

2. Flags Are Security Boundaries

A single bit can completely change how the kernel interprets memory.

PIPE_BUF_FLAG_CAN_MERGE looked like a performance optimization flag, but in the wrong context it became a security boundary bypass.

3. Subsystem Interactions Matter

The original missing initialization existed for years.

It became dangerous only after another feature introduced a new meaning for the stale field.

This is why reviewing only the changed file is not enough.

When adding new flags, modes, or state transitions, we should audit every path that creates, recycles, or reuses the structure.

4. Logic Bugs Can Be More Reliable Than Memory Corruption

Dirty Pipe was not powerful because it crashed the kernel or corrupted random memory.

It was powerful because the kernel's internal state machine became logically inconsistent.

That kind of bug can be easier to exploit and harder to detect.

5. Defensive Coding Is Not Optional in Systems Programming

In application code, forgetting to initialize a field may cause a weird UI bug or a failed request.

In kernel code, it may let an unprivileged user modify read-only file content.

That difference is why explicit initialization, careful invariants, and subsystem-level reviews are essential.

Exploit Discussion: Why I Will Not Weaponize It Here

At this point, it is tempting to drop a full copy-paste exploit and call the analysis complete.

Dirty Pipe is not just an academic bug. It is a real local privilege escalation vulnerability that can be used to modify sensitive files, abuse SUID binaries, and turn limited local execution into root-level impact on vulnerable systems.

So instead of publishing a weaponized exploit, I prefer to focus on the part that actually matters for experienced engineers: understanding the primitive, validating exposure safely, and reducing the blast radius.

The important idea is this:

Dirty Pipe gives an attacker a write primitive into the Page Cache under very specific conditions.

That is enough to explain the risk without handing someone a ready-made privilege escalation chain.

Safe Validation: How to Check Exposure Without Exploiting the Machine

The first thing I would check is the running kernel version.

uname -a
uname -r

Dirty Pipe affected Linux kernel versions starting from 5.8 and was fixed in patched kernel releases such as:

5.16.11
5.15.25
5.10.102

The exact package version depends on the distribution, because vendors often backport security fixes without changing the upstream kernel version in an obvious way.

That is why I do not rely only on uname -r in production. I also check the distribution security advisories and installed kernel changelog.

On Debian or Ubuntu-based systems:

apt list --installed | grep linux-image
apt changelog linux-image-$(uname -r)

On RHEL, Rocky, AlmaLinux, or Fedora-based systems:

rpm -q kernel
rpm -q --changelog kernel | grep -i -A 5 'CVE-2022-0847'

The goal here is not to exploit the host.

The goal is to answer one operational question:

Is this system running a kernel package that contains the Dirty Pipe fix?
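As a first-pass triage helper, the version ranges above can be encoded in a few lines. This sketch only understands upstream-style release strings and knows nothing about vendor backports, so a `true` result means "go read the distribution advisory", not "this host is vulnerable". The function name is hypothetical, and treating the 5.8, 5.9, and 5.11 through 5.14 branches as affected is my assumption, since no fixed release for them is listed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Returns true if an upstream-style release string such as "5.15.24"
 * falls in the range that lacked the Dirty Pipe fix. Unparseable input
 * returns false, meaning "no claim", not "safe".
 */
bool upstream_release_looks_affected(const char *release)
{
    assert(release != NULL);

    unsigned int major = 0, minor = 0, patch = 0;
    if (sscanf(release, "%u.%u.%u", &major, &minor, &patch) < 2)
        return false;

    /* The bug became exploitable in 5.8; 5.17 and later shipped with the fix. */
    if (major != 5 || minor < 8)
        return false;

    switch (minor) {
    case 10: return patch < 102;   /* fixed in 5.10.102 */
    case 15: return patch < 25;    /* fixed in 5.15.25  */
    case 16: return patch < 11;    /* fixed in 5.16.11  */
    default: return minor < 16;    /* assumption: other 5.8+ branches unfixed */
    }
}
```

Feeding it the output of `uname -r` works for the parsing, but a distribution string like `5.15.0-91-generic` will be flagged even when the vendor has backported the fix, which is exactly why the changelog check matters.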

Mitigation: The Real Fix Is a Kernel Update

There is no clever application-level patch that fully fixes Dirty Pipe.

The bug lives in the kernel.

So the primary mitigation is simple:

sudo apt update
sudo apt full-upgrade
sudo reboot

Or on RHEL-like systems:

sudo dnf update kernel
sudo reboot

After rebooting, always verify the active kernel:

uname -r

Installing a fixed kernel is not enough if the machine is still booted into the vulnerable one.

This is a common production mistake: the package is patched, the vulnerability scanner looks cleaner, but the running kernel is still old because nobody rebooted the host.

Reducing the Attack Surface

Dirty Pipe requires local code execution.

That local execution can come from many places:

an SSH account
a compromised web application
a CI/CD runner
an untrusted container workload
a shared development server
a low-privileged service user

So while patching is the real fix, reducing local execution paths is still important.

A few practical checks I usually care about:

# Users with interactive shells
grep -E '/(bash|sh|zsh)$' /etc/passwd

# Users with sudo-like access
getent group sudo
getent group wheel

# Regular (non-system) accounts, assuming regular UIDs start at 1000
awk -F: '$3 >= 1000 { print $1, $3, $6, $7 }' /etc/passwd

If a user does not need shell access, remove it.

sudo usermod -s /usr/sbin/nologin username

If an old account should no longer authenticate, lock it.

sudo passwd -l username

None of this replaces patching.

But it reduces the number of places an attacker can start from.

Containers: Do Not Forget the Host Kernel

One of the most important operational lessons from Dirty Pipe is that containers do not bring their own kernel.

A container shares the host kernel.

So if the host kernel is vulnerable, a containerized workload may still be dangerous, especially when combined with weak isolation, excessive capabilities, or sensitive host mounts.

For production workloads, I would avoid patterns like this unless there is a very strong reason:

docker run --privileged ...

A safer baseline looks more like this:

docker run \
  --read-only \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  image-name

Also be careful with host mounts:

-v /:/host
-v /etc:/host/etc
-v /var/run/docker.sock:/var/run/docker.sock

Those mounts can turn a local container compromise into a much more serious host-level problem.

Dirty Pipe is a kernel bug, but real incidents usually happen through chains.

The kernel bug is one link.

Bad container isolation can be another.

Monitoring Sensitive Files

Dirty Pipe modifies data through the Page Cache, which makes the behavior unusual.

Still, sensitive files are the obvious places defenders should care about:

/etc/passwd
/etc/shadow
/etc/group
/etc/sudoers
/root/.ssh/authorized_keys

On Linux, auditd can help monitor write attempts and metadata changes:

sudo auditctl -w /etc/passwd -p wa -k passwd_changes
sudo auditctl -w /etc/shadow -p wa -k shadow_changes
sudo auditctl -w /etc/group -p wa -k group_changes
sudo auditctl -w /etc/sudoers -p wa -k sudoers_changes

Then search the audit logs:

sudo ausearch -k passwd_changes
sudo ausearch -k shadow_changes
sudo ausearch -k group_changes
sudo ausearch -k sudoers_changes

For file integrity monitoring, tools like AIDE can also help:

sudo apt install aide
sudo aideinit
sudo cp /var/lib/aide/aide.db.new /var/lib/aide/aide.db
sudo aide --check

This is not a perfect Dirty Pipe detector: the attacker only ever opens the target file read-only, so a watch on write access may never fire.

But it is part of a healthy defensive baseline.

My Practical Takeaway for Security Engineers

When I look at Dirty Pipe from a defender's perspective, I do not think the lesson is "learn the exploit and move on."

The lesson is broader:

patch kernels quickly
reboot after kernel updates
reduce local shell access
avoid over-privileged containers
monitor sensitive identity and privilege files
review code paths that recycle stateful structures

The exploit is interesting.

But the engineering lesson is more valuable.

A single stale flag inside a reused kernel structure broke one of the assumptions Linux users rely on every day:

read-only files should not be writable by an unprivileged process.

That is the kind of bug that reminds me why low-level systems programming requires paranoia, not just correctness.

Final Thoughts

Dirty Pipe is one of those vulnerabilities that looks almost too simple after you understand it.

A stale flag survived inside a reused pipe buffer.

That pipe buffer was later pointed at a Page Cache page.

The kernel trusted the stale flag.

And that was enough.

For me, the most important lesson is this:

Security bugs often live at the boundaries between correct subsystems.

The Page Cache was doing its job.

Pipes were doing their job.

splice() was doing its job.

But the transition between those systems carried stale state, and that stale state broke the security model.

That is why kernel engineering is so fascinating — and so unforgiving.
