From Research PoC to Redteam Toolkit: Hardening CVE-2026-31431 for Production Operations
Introduction
On April 29, 2026, Theori and Xint disclosed CVE-2026-31431 — a local privilege escalation vulnerability in the Linux kernel's AF_ALG crypto subsystem. Their research, published at copy.fail, demonstrated a novel page-cache mutation primitive: by abusing the authencesn AEAD template's in-place optimization combined with splice(), an attacker could overwrite cached pages of a setuid binary without ever modifying the on-disk inode.
The original proof-of-concept was written in Python — excellent for research demonstration, but impractical for real-world redteam operations where Python is rarely available on target servers and the tool's footprint must be minimal.
Tony Gies quickly produced a baseline C port using nolibc, which solved the deployment problem but remained a research tool at heart.
This article documents our work extending that foundation into a production-grade redteam toolkit — adding operational security, anti-forensics, automatic target discovery, fileless payload delivery, and cross-platform build infrastructure. We share the architectural decisions, trade-offs, and defensive takeaways from this effort.
The Gap Between Research and Operations
Why Python PoCs Don't Survive First Contact
| Research Requirement | Operational Reality |
|---|---|
| Python 3.8+ available | Servers run minimal images; no Python |
| pip install dependencies | Airgapped networks; no package manager |
| 50+ MB with libraries | Binary must be < 100 KB for covert deployment |
| Run once, observe output | Must survive for weeks with minimal interaction |
| Clean environment | EDR, SIEM, AppArmor, SELinux actively hunting |
| Manual target selection | Operator may not know which setuid binary exists |
The baseline C port solved the deployment size problem (~2 KB payload), but lacked:
- Operational control: How does an operator trigger execution remotely?
- Stealth: How do we hide from ps, top, and EDR process monitoring?
- Cleanup: How do we remove forensic artifacts after exploitation?
- Resilience: What happens if the C2 server is down?
- Cross-platform support: Cloud targets run ARM64, not just x86_64.
Architecture Overview
Our toolkit is organized into nine modules spanning four layers:
    ┌──────────────────────────────────────────────────────────────┐
    │                   ORCHESTRATOR (exploit.c)                   │
    │  Coordinates all modules in a 7-step pipeline:               │
    │  Hide → Discover → Prepare → Verify → Exploit → Cleanup →    │
    │  Deliver                                                     │
    └───────────────────────────────┬──────────────────────────────┘
          ┌────────────┬────────────┼────────────┬────────────┐
          ▼            ▼            ▼            ▼            ▼
    ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
    │  patch   │ │  target  │ │   anti   │ │  stage1  │ │  memfd   │
    │  chunk   │ │discovery │ │forensics │ │ delivery │ │   exec   │
    └─────┬────┘ └─────┬────┘ └─────┬────┘ └─────┬────┘ └─────┬────┘
          └────────────┴────────────┼────────────┴────────────┘
                       ┌────────────┴────────────┐
                       ▼                         ▼
               ┌──────────────┐          ┌──────────────┐
               │ proc_hide    │          │ sleep_jitter │
               │ signal       │          │ stage2 C2    │
               │ trigger      │          │ implant      │
               └──────────────┘          └──────────────┘
Module Responsibilities
| Module | File(s) | Core Function |
|---|---|---|
| Exploit Primitive | patch_chunk.c/h | AF_ALG/splice page cache mutation with socket reuse, parallel writes, and verification |
| Target Discovery | target_discovery.c/h | Auto-scan and score setuid binaries; MAC-aware selection |
| Anti-Forensics | anti_forensics.c/h | Cache dropping, timestamp restoration, self-destruction |
| Stage-1 Delivery | stage1.c/h | Fileless payload fetch via HTTP/HTTPS/DNS/embedded |
| Stage-2 C2 | stage2_template.c/h | Reverse shell with reconnect, jitter, signal control |
| memfd Execution | memfd_exec.c/h | Anonymous file execution with cloaking and decryption |
| Process Hiding | proc_hide.c/h | argv/cmdline/comm masquerading |
| Signal Control | signal_trigger.c/h | Operator-triggered execution with zero-CPU waiting |
| Sleep Jitter | sleep_jitter.c/h | Random delays with uniform/triangular/exponential distributions |
| Vulnerability Checker | vulnerable.c | Non-destructive kernel susceptibility test |
Module Deep Dives
1. Hardened Exploit Primitive: patch_chunk.c
The original baseline opened a fresh AF_ALG socket for every 4-byte window. Our implementation reduces the syscall footprint by ~60% through socket reuse:
// Original: socket() + bind() + setsockopt() + accept() per chunk
// Ours: accept() per chunk; ctrl socket reused across all chunks
int ctrl = -1, op = -1;
for (off_t off = 0; off < len; off += 4) {
patch_chunk(fd, off, window, &ctrl, &op); // ctrl reused
}
Key improvements:
- Atomic verification: After each write, mmap() + memcmp() confirms the mutation landed. If page cache was reclaimed (rare under load), auto-retry with 1ms backoff.
- Parallel writes: fork() distributes chunks across up to 16 CPU cores. A 50 KB payload drops from ~12 seconds to ~800ms on modern hardware.
- Granular error codes: 0 = verified success, 1 = kernel patched (operation rejected), -1 = fatal error.
- Zero heap allocations: All buffers on stack; no malloc/free jitter for EDR to hook.
2. Automatic Target Discovery: target_discovery.c
Manually specifying /usr/bin/su fails when:
- The target uses sudo instead of su
- AppArmor blocks su but not pkexec
- The binary is in /usr/local/bin or a snap package
Our scanner operates in three phases:
Phase 1: Check 18 priority targets (su, sudo, passwd, pkexec, mount, ping...)
Phase 2: Scan standard directories (/usr/bin, /bin, /usr/sbin...)
Phase 3: Deep scan (/usr/lib, /opt) if aggressive mode enabled
Each candidate receives a composite score:
score = setuid_root(1000) + setuid_user(500)
      + small_size_bonus(200 per KB under 100KB)
      + no_apparmor(300) - apparmor_enforced(500)
      + no_selinux(200) - selinux_enforced(400)
      + standard_path(100)
This automatically deprioritizes binaries under active MAC enforcement — reducing the chance of an exploit that "works" but immediately triggers an EDR alert.
3. Fileless Execution: memfd_exec.c
The memfd_create(2) syscall creates an anonymous file existing only in RAM. Combined with fexecve(3), this enables zero-disk execution:
int mfd = memfd_create("kworker", MFD_CLOEXEC);
write(mfd, payload, len);
lseek(mfd, 0, SEEK_SET);
fexecve(mfd, argv, envp); // Never touches filesystem
Cloaking: The memfd name appears in /proc/$pid/fd/ as memfd:kworker — indistinguishable from legitimate kernel worker threads to casual inspection.
Fork-and-forget: A double-fork sequence creates an orphan process adopted by init (PPID=1), severing the parent-child relationship visible in process trees:
pid_t child = fork();
if (child == 0) {
pid_t grandchild = fork();
if (grandchild == 0) {
setsid();
fexecve(mfd, argv, envp);
}
_exit(0); // Intermediate dies, grandchild orphaned
}
waitpid(child, NULL, 0); // Original parent exits cleanly
4. Anti-Forensics: anti_forensics.c
The page cache mutation is unique among LPE techniques: the on-disk inode is never modified. However, mutated pages in RAM are still forensic artifacts. Our cleanup sequence:
| Step | Technique | Target |
|---|---|---|
| 1 | posix_fadvise(POSIX_FADV_DONTNEED) | Per-file page cache eviction |
| 2 | echo 3 > /proc/sys/vm/drop_caches | Global cache drop (post-root) |
| 3 | utimensat() timestomp | Restore original atime/mtime |
| 4 | Self-destruct | Overwrite dropper binary with zeros |
| 5 | Memory wipe | volatile zeroing of keys, C2 addresses |
Timestomp is critical: splice() reads the target file, which may update atime. Restoring the original timestamp prevents EDR heuristics from flagging "setuid binary accessed at unusual time."
5. Signal-Based Operator Control: signal_trigger.c
Traditional implants use polling loops (sleep(1); check_flag();), consuming CPU and standing out in EDR telemetry. We use sigsuspend() for zero-CPU waiting:
// Process state: S (sleeping, interruptible)
// CPU usage: 0.0%
// EDR sees: normal idle daemon
while (!trigger_received) {
sigsuspend(&wait_mask); // Returns only on signal
}
Operational modes:
| Mode | Behavior | Use Case |
|---|---|---|
| trigger_oneshot() | Sleep → execute → exit | Hit-and-run assessment |
| trigger_daemon() | Sleep → execute → loop | Persistent long-term implant |
| trigger_auto() | Sleep with timeout fallback | Unattended deployment |
Operator commands:
kill -USR1 $PID # Execute now
kill -USR2 $PID # Request status (no execution)
kill -TERM $PID # Graceful shutdown with cleanup
6. Sleep Jitter: sleep_jitter.c
Regular reconnect intervals (every 600 seconds exactly) trigger beaconing detection in SIEM. We implement three statistical distributions:
| Distribution | Pattern | Detection Evasion |
|---|---|---|
| Uniform | Equal probability across range | Basic jitter |
| Triangular | Cluster around mean | Mimics "normal" random traffic |
| Exponential | Mostly short, occasional long | Breaks time-based correlation |
Drift compensation maintains the average interval despite jitter — ensuring a 10-minute target doesn't drift to 5 or 20 minutes over hours of operation.
RNG backends (in order of preference): getrandom(2), /dev/urandom, rdtsc fallback. Rejection sampling eliminates modulo bias.
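Seen from the defender's side, the beaconing detection mentioned at the start of this section largely comes down to how regular the gaps between connections are. The following is a minimal sketch in C, assuming connection timestamps in seconds and a caller-chosen threshold. The names interval_cv_squared and looks_like_beacon, and the threshold parameter, are hypothetical and not part of the toolkit or of any SIEM product.

```c
#include <stddef.h>

/* Squared coefficient of variation (variance / mean^2) of the gaps between
 * consecutive timestamps. Returns -1.0 when fewer than three timestamps are
 * given or the mean gap is not positive. */
double interval_cv_squared(const double *ts, size_t n)
{
    double mean = 0.0, var = 0.0;
    size_t i;

    if (ts == NULL || n < 3)
        return -1.0;
    for (i = 1; i < n; i++)
        mean += ts[i] - ts[i - 1];
    mean /= (double)(n - 1);
    if (mean <= 0.0)
        return -1.0;
    for (i = 1; i < n; i++) {
        double d = (ts[i] - ts[i - 1]) - mean;
        var += d * d;
    }
    var /= (double)(n - 1);
    return var / (mean * mean);
}

/* Flags a series as beacon-like when its gaps are nearly constant.
 * max_cv_squared is a hypothetical tuning knob, not a recommended value. */
int looks_like_beacon(const double *ts, size_t n, double max_cv_squared)
{
    double cv2 = interval_cv_squared(ts, n);
    return cv2 >= 0.0 && cv2 <= max_cv_squared;
}
```

A fixed 600-second interval yields a value of zero and is flagged at any threshold. Jittered traffic raises the value, which is why a single statistic like this is usually combined with other signals such as destination rarity and payload size.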
Build System: Cross-Platform Static Binaries
Why Static Linking Matters
Dynamic binaries fail when:
- Target lacks libc.so.6 (Alpine Linux uses musl)
- LD_LIBRARY_PATH is sanitized
- EDR hooks dlopen() or ld.so
Our Makefile supports four toolchain strategies:
# Standard: glibc static (portable, ~2 MB)
make redteam
# Tiny: musl static (~50-100 KB, no glibc dependency)
make musl-static
# Modern: zig cross-compile (no toolchain installation)
make cross-zig-arm64
# Traditional: GNU cross toolchain
make cross-arm64 CROSS_COMPILE=aarch64-linux-gnu-
Supported Architectures
| Architecture | Typical Target |
|---|---|
| x86_64 | On-premise servers, workstations |
| ARM64 | AWS/Azure/GCP cloud instances |
| RISC-V | Embedded, experimental cloud |
| ARM HF | IoT devices, Raspberry Pi |
Operational Security Considerations
What We Can Hide
| Artifact | Technique | Effectiveness |
|---|---|---|
| Command line | overwrite_argv() | High (visible in /proc/$pid/cmdline) |
| Process name | prctl(PR_SET_NAME) | High (visible in ps, top) |
| Parent relationship | Double-fork | High (PPID=1, init) |
| Binary on disk | Self-destruct | High (zeroed before exec) |
| Page cache | fadvise(DONTNEED) | Medium (may be reclaimed naturally) |
| Network connections | DNS beaconing, jitter | Medium (reduces correlation) |
What We Cannot Hide (Kernel-Enforced)
| Artifact | Why Visible | Mitigation |
|---|---|---|
| /proc/$pid/exe | Kernel-maintained symlink | Use memfd (shows as (deleted)) |
| PID number | Kernel-assigned | None without rootkit |
| /proc/$pid/status | Kernel-generated | None from userspace |
| AF_ALG socket creation | Syscall traceable | Minimize via socket reuse |
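Because the kernel maintains /proc/$pid/exe, a monitoring agent can use it to spot processes whose image came from an anonymous memory file. The sketch below assumes the link target for such a process begins with "/memfd:", which is how Linux names memfd_create(2) files. The helper names exe_link_is_memfd and read_exe_link are hypothetical, and reading another user's link requires sufficient privileges.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a /proc/<pid>/exe link target starts with "/memfd:",
 * meaning the process image is backed by a memfd, else 0. */
int exe_link_is_memfd(const char *target)
{
    static const char prefix[] = "/memfd:";
    return target != NULL &&
           strncmp(target, prefix, sizeof(prefix) - 1) == 0;
}

/* Reads the /proc/<pid>/exe link into buf as a NUL-terminated string.
 * Returns 0 on success, -1 on failure (process gone, permission denied,
 * or a kernel thread, which has no executable link). */
int read_exe_link(pid_t pid, char *buf, size_t buflen)
{
    char path[64];
    ssize_t n;

    if (buf == NULL || buflen == 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/%ld/exe", (long)pid);
    n = readlink(path, buf, buflen - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}
```

A scanner would call read_exe_link() for each numeric entry under /proc and pass the result to exe_link_is_memfd(). Some legitimate software also executes from memfds, so a hit is a lead to investigate rather than proof of compromise.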
Defensive Detection Opportunities
For blue teams, this toolkit reveals several detection vectors:
- AF_ALG + splice() correlation: eBPF programs can trace this specific combination — rare in legitimate workloads.
- memfd_create with suspicious names: While memfd:kworker blends in, the memfd_create syscall itself is uncommon for non-browser processes.
- Bracketed process names in userspace: Kernel threads don't have userspace memory maps; checking /proc/$pid/maps reveals the masquerade.
- DNS beaconing: Regular TXT queries or A-record lookups to a single domain, especially with jittered intervals.
- Page cache integrity: Kernel modules or hypervisors can verify setuid binary cache pages against on-disk hashes.
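The bracketed-name check from the list above can be sketched as follows. This is an illustrative example rather than a complete detector: it assumes the caller has already read the first string of /proc/$pid/cmdline, and the function names are hypothetical. Reading another process's maps file requires ptrace-level access, so the sketch reports -1 when it cannot look.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Returns 1 if name is wrapped in square brackets, the way ps(1) renders
 * kernel threads such as "[kworker/0:1]", else 0. */
int name_is_bracketed(const char *name)
{
    size_t len = name ? strlen(name) : 0;
    return len >= 2 && name[0] == '[' && name[len - 1] == ']';
}

/* Returns 1 if /proc/<pid>/maps contains at least one byte, 0 if it is
 * empty (as it is for real kernel threads), -1 if it cannot be opened. */
int pid_has_user_maps(pid_t pid)
{
    char path[64];
    FILE *f;
    int c;

    snprintf(path, sizeof(path), "/proc/%ld/maps", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    c = fgetc(f);
    fclose(f);
    return c != EOF;
}

/* A bracketed name combined with userspace mappings is the mismatch:
 * a genuine kernel thread has no mappings to list. */
int looks_like_kthread_masquerade(const char *name, int has_user_maps)
{
    return name_is_bracketed(name) && has_user_maps == 1;
}
```

For example, looks_like_kthread_masquerade(argv0, pid_has_user_maps(pid)) returns 1 for a userspace process that has renamed itself to look like a kernel worker, and 0 for an ordinary daemon or a real kernel thread.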
Defensive Takeaways
Immediate Mitigations
- Patch the kernel: Upgrade to Linux >= 6.14 with commit a664bf3d603d, or apply your distribution's backport.
- Enable MAC enforcement: AppArmor and SELinux profiles on setuid binaries significantly raise the exploitation bar.
- Monitor AF_ALG: The authencesn template is rarely used legitimately; audit its usage via auditd or eBPF.
- Verify page cache: Periodic integrity checks on cached setuid pages can detect in-memory mutation.
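One way to approximate the page cache check from userspace is to compare a normal buffered read, which is served from the page cache, with an O_DIRECT read of the same file, which is intended to bypass it. The sketch below is a hedged illustration, not a hardened tool. It assumes the filesystem honors O_DIRECT, that 4096-byte alignment is sufficient, and that the file is not being modified during the comparison. The function name compare_cache_to_disk is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* exposes O_DIRECT on glibc */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0               /* unavailable: the check reports -1 */
#endif

#define CHUNK 4096               /* assumed to satisfy O_DIRECT alignment */

/* Compares the buffered (page cache) view of a file with an O_DIRECT read.
 * Returns 0 if they match, 1 if they differ, -1 if the check could not run. */
int compare_cache_to_disk(const char *path)
{
    int cached = -1, direct = -1, result = -1;
    void *a = NULL, *b = NULL;
    struct stat st;
    off_t off;

    if (O_DIRECT == 0)
        return -1;
    cached = open(path, O_RDONLY);
    direct = open(path, O_RDONLY | O_DIRECT);
    if (cached < 0 || direct < 0 || fstat(cached, &st) != 0)
        goto out;
    if (posix_memalign(&a, CHUNK, CHUNK) != 0) { a = NULL; goto out; }
    if (posix_memalign(&b, CHUNK, CHUNK) != 0) { b = NULL; goto out; }

    result = 0;
    for (off = 0; off < st.st_size; off += CHUNK) {
        off_t left = st.st_size - off;
        ssize_t want = left < CHUNK ? (ssize_t)left : CHUNK;

        if (pread(cached, a, CHUNK, off) != want ||
            pread(direct, b, CHUNK, off) != want) {
            result = -1;         /* short read or I/O error */
            break;
        }
        if (memcmp(a, b, (size_t)want) != 0) {
            result = 1;          /* cached bytes differ from on-disk bytes */
            break;
        }
    }
out:
    free(a);
    free(b);
    if (cached >= 0) close(cached);
    if (direct >= 0) close(direct);
    return result;
}
```

A periodic job could run compare_cache_to_disk() over the system's setuid binaries and alert on a return value of 1. A result of -1 means the check was inconclusive (for example, the filesystem rejected O_DIRECT) and should not be read as a clean bill of health.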
Long-Term Architectural Changes
The root cause — treating splice'd file pages as writable crypto destinations — suggests a broader principle: input and output buffers in kernel crypto paths should never alias. Future kernel designs should enforce separate scatterlists for source and destination, even when "in-place" optimization seems safe.
Credits and Acknowledgments
This work builds directly on the research and code of others:
- Theori (Jinoh Kang, Yonghwi Jin, Seunghyun Lee) and Xint: Original vulnerability discovery, disclosure, and the Python proof-of-concept at copy.fail.
- Tony Gies: Baseline C port (tgies/copy-fail-c) using nolibc, providing the foundational cross-platform syscall wrappers.
- Linux kernel developers: memfd_create(2), fexecve(3), and the nolibc header-only libc alternative.
- musl libc and Zig projects: Toolchains enabling tiny, portable static binaries.
Our contributions are strictly the operational hardening layer: anti-forensics, stealth, automatic targeting, and build infrastructure. The core vulnerability research belongs entirely to Theori and Xint.
Repository and License
- Repository: https://github.com/toxy4ny/copy-fail-exploit-on-c-redteam
- License: Dual LGPL-2.1-or-later / MIT
- Original PoC: theori-io/copy-fail-CVE-2026-31431
- Baseline C Port: tgies/copy-fail-c
Disclaimer
This software is provided solely for authorized security research and authorized penetration testing. The authors assume no liability for misuse. Always obtain explicit written permission before testing systems you do not own.
If you discover indicators of compromise matching this toolkit's behavior on your systems:
- Apply the kernel patch (commit a664bf3d603d or distribution backport)
- Review /var/log/audit/ and EDR telemetry for AF_ALG anomalies
- Verify integrity of setuid binary page caches
Have you adapted research tools for production redteam operations? What operational challenges did you encounter? Share your experiences in the comments.