James Lee

Linux Performance Tuning: CPU, Memory, I/O & Network

Linux performance optimization covers four major subsystems:

  • CPU: compute-intensive workloads (Nginx, Node.js, math/batch processing)
  • Memory: database workloads (MySQL) with heavy memory and storage consumption
  • I/O: disk-bound applications with heavy read/write
  • Network: high-throughput web services

Section 1: CPU Performance

CPU is the most critical subsystem — responsible for all computation. Modern production servers use multi-core CPUs based on SMP (Symmetric Multiprocessing) architecture. In practice, CPU utilization is often below 5%, meaning significant resource waste.

CPU Cache Hierarchy

# lscpu
L1d cache:   32K    ← L1 data cache (static, per-core)
L1i cache:   32K    ← L1 instruction cache (static, per-core)
L2 cache:    256K   ← dynamic, typically per-core on modern CPUs
L3 cache:    8192K  ← dynamic, shared across all cores
  • L1 cache: static cache, split into data (L1d) and instruction (L1i) caches, private to each core
  • L2 / L3 cache: dynamic caches; L2 is typically private to each core, while L3 is shared across all cores

CPU Affinity

In SMP systems, the Linux scheduler may run the same thread on different cores across time slices. Since each core has its own private cache (L1, and usually L2), migrating a thread invalidates its warm cache state: the thread's data must be reloaded into the new core's cache, degrading performance.

CPU affinity pins a process to a specific core, maximizing cache hit rate:

# Pin process 73890 to CPU core 0
taskset -pc 0 73890
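You can confirm the pinning took effect by reading the affinity back. A quick sketch, reusing the article's example PID (73890 is illustrative):

```shell
# Read the current affinity list before changing it
taskset -pc 73890        # e.g. "pid 73890's current affinity list: 0-3"

# Pin to core 0, then verify the change
taskset -pc 0 73890
taskset -pc 73890        # should now report core 0 only
```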

NUMA (Non-Uniform Memory Access)

taskset alone doesn't guarantee local memory allocation. For NUMA architectures, use numactl:

NUMA topology:
┌──────────────┐     ┌──────────────┐
│  CPU Node 0  │     │  CPU Node 1  │
│  Local RAM   │     │  Local RAM   │
│  (fast)      │     │  (fast)      │
└──────┬───────┘     └──────┬───────┘
       │  remote access (slower)  │
       └──────────────────────────┘
# View current NUMA configuration
numactl --show

# Bind program to specific NUMA node
numactl --cpunodebind=0 --membind=0 ./myapp

⚠️ Database servers should NOT use NUMA by default. If required, start the DB with numactl --interleave=all to avoid memory hotspots.

CPU Scheduling Policies

Real-time scheduling (priority 1–99, higher = more urgent):

  • SCHED_FIFO: static priority; once running, holds the CPU until a higher-priority task arrives or it yields
  • SCHED_RR: round-robin with time slices; an expired slice goes to the end of the queue, so it is fair among equal-priority tasks

General scheduling (priority 100–139, lower number = higher priority):

  • SCHED_OTHER: the default; priority is determined by nice and counter values, and the least recently scheduled task runs next
  • SCHED_BATCH: for batch processing
  • SCHED_IDLE: for very low-priority background tasks

# Adjust process priority with nice (-20 to 19, lower = higher priority)
renice 5 <pid>

# Modify real-time scheduling priority
chrt -r -p 50 <pid>
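The valid priority ranges vary by policy, and chrt can print what the running kernel accepts before you pick a number:

```shell
# List the priority range each scheduling policy accepts on this kernel
chrt -m
# Typical output:
#   SCHED_OTHER min/max priority : 0/0
#   SCHED_FIFO min/max priority  : 1/99
#   SCHED_RR min/max priority    : 1/99
```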

Context Switches

The Linux kernel treats each core as an independent processor, and a single core may multiplex anywhere from a handful to tens of thousands of threads. Each thread gets a time slice; when it expires or the thread is preempted, a context switch occurs.

The more context switches, the heavier the kernel scheduling overhead.

Run Queue

Each CPU has a run queue. A thread is either sleeping (blocked on I/O) or runnable (waiting for CPU time).

load = currently running threads + threads in run queue

Example: 2 cores, 2 running + 4 queued → load = 6
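The load arithmetic above can be checked directly in the shell; with 2 cores, a per-core load above 1 means threads are waiting for CPU time:

```shell
# load = currently running threads + threads waiting in the run queue
running=2
queued=4
cores=2

load=$((running + queued))
per_core=$((load / cores))
echo "load=$load per_core=$per_core"   # load=6 per_core=3
```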

CPU Performance Targets

Healthy CPU metrics:
┌─────────────────────────────────────────┐
│  us (user)    60% – 70%                 │
│  sy (system)  30% – 35%                 │
│  id (idle)     0% –  5%                 │
│  run queue    ≤ 4 per core (ideal)      │
└─────────────────────────────────────────┘
# Monitor with vmstat (1-second intervals, 5 samples)
vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs  us  sy  id  wa
 3  0 1150840 271628 260684 5530984  0    0     2     1    0    0  22   4  73   0
 5  0 1150840 270264 260684 5531032  0    0     0     0  5873 6085 13  13  73   0

High in (interrupts) and cs (context switches) indicate the kernel is constantly switching processes and servicing hardware requests.

Bind Interrupts to a Specific CPU

# smp_affinity takes a hex CPU bitmask (bit N = core N)
# Bind IRQ 19 to CPU core 2 (mask 0x04)
echo 4 > /proc/irq/19/smp_affinity

# Binding a busy device's interrupts (e.g. the NIC handling TCP traffic) to a
# single CPU reduces scheduler interference on the remaining cores
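Because smp_affinity is a hexadecimal bitmask rather than a core number, it is easy to compute the mask for any core; a small sketch (IRQ 19 is the article's example):

```shell
# Bit N of the mask selects core N
core=2
mask=$(printf '%x' $((1 << core)))
echo "$mask"    # 4

# Apply it (requires root)
# echo "$mask" > /proc/irq/19/smp_affinity
```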

Section 2: Memory Performance

Linux uses Virtual Memory Management (VMM): writes go to the filesystem cache in memory first, then flush to disk lazily. This is why free memory appears low after a Linux system has been running for a while: most of it is consumed by cache and buffers.

Optimization goal: reduce disk writes, improve write efficiency.

Dirty Data Flush Policy

# Start background writeback when dirty data exceeds 10% of physical memory
echo 10 > /proc/sys/vm/dirty_background_ratio

# Flush dirty data that has been in memory longer than 2000 centiseconds (20 s)
echo 2000 > /proc/sys/vm/dirty_expire_centisecs

⚠️ Tune carefully — these settings have a large impact on I/O performance.
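Before changing these, it is worth recording the current values so they can be restored; reading them requires no root:

```shell
# Read the current writeback thresholds
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_ratio              # hard limit: writers are throttled above this
cat /proc/sys/vm/dirty_expire_centisecs
```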

Swap Tuning

When physical memory is insufficient, Linux uses LRU to swap out cold pages to disk, and swap in when needed.

# 0 = prefer physical memory; 100 = aggressively use swap
echo 10 > /proc/sys/vm/swappiness   # recommended for production

Minimize swap usage in production. For Redis, the Redis documentation actually recommends enabling memory overcommit so that fork-based background saves do not fail under memory pressure:

echo 1 > /proc/sys/vm/overcommit_memory

Reclaiming Memory

sync
echo 3 > /proc/sys/vm/drop_caches
# 1 = drop page cache
# 2 = drop reclaimable slab objects (dentries and inodes)
# 3 = drop both

Huge Pages

Large page sizes reduce TLB misses and page table overhead:

grep -i huge /proc/meminfo
# AnonHugePages: transparent huge pages (auto-managed)
# Hugepagesize:  2048 kB (standard huge page size)

# Manually set huge page count
sysctl vm.nr_hugepages=20

On 32-bit x86 (without PAE) a huge page is 4 MB; on x86-64 it is 2 MB, with 1 GB "gigantic" pages also available.
Larger pages = less page-table overhead but more internal fragmentation.
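The overhead reduction is easy to quantify: mapping the same region with 2 MB pages needs far fewer page-table entries (and TLB slots) than with 4 KB pages:

```shell
# Page-table entries needed to map 1 GiB of memory
echo $(( (1 << 30) / 4096 ))         # 4 KiB pages: 262144 entries
echo $(( (1 << 30) / (2 << 20) ))    # 2 MiB huge pages: 512 entries
```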

Page Faults

MPF (Major Page Fault):  data not in cache → read from disk (expensive)
MnPF (Minor Page Fault): data found in buffer cache → no disk I/O (cheap)
# First run: mostly MPF (cold cache)
/usr/bin/time -v ./myapp

# Second run: mostly MnPF (warm cache)
/usr/bin/time -v ./myapp
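Fault counters can also be read for an already-running process; on Linux, procps ps exposes them as min_flt and maj_flt (shown here for the current shell):

```shell
# Cumulative minor and major page faults for the current shell
ps -o pid,min_flt,maj_flt,comm -p $$
```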

The File Buffer Cache continuously grows to reduce MPF and increase MnPF, until the kernel needs to reclaim memory for other processes. Low free memory ≠ memory pressure — Linux intentionally uses free memory for caching.


Section 3: Disk I/O Performance

The I/O subsystem is typically the slowest part of a Linux system — both due to physical distance from the CPU and mechanical/electrical constraints. Minimize disk I/O wherever possible.

I/O Scheduler

cat /sys/block/sda/queue/scheduler
# noop  anticipatory  deadline  [cfq]
  • CFQ (default): Completely Fair Queuing; up to 8 requests per time slice, and it idles briefly waiting for more I/O from the same process. Best for general workloads.
  • Deadline: every request must be served before its deadline. Best for databases and latency-sensitive workloads.
  • noop: no reordering; simple FIFO. Best for SSDs and VMs.
  • anticipatory: deprecated; was suited to write-heavy, read-light workloads.

(On modern kernels using blk-mq, the equivalents are bfq, mq-deadline, and none.)

I/O Priority

# Set process I/O priority (1=realtime, 2=best-effort, 3=idle)
ionice -c1 -p <pid>   # highest I/O priority for pid

Page Size & Block Size

Linux kernel accesses disk I/O in pages (typically 4KB):

/usr/bin/time -v date   # shows page size info

Choose filesystem block size (and consider huge pages) based on your application's I/O pattern (large sequential vs. small random); the kernel page size itself is fixed by the architecture.
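Both values can be inspected directly; a quick sketch (the numbers shown are typical, not guaranteed):

```shell
# Kernel page size in bytes
getconf PAGESIZE        # typically 4096

# Block size reported by the root filesystem
stat -fc %s /           # e.g. 4096
```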

DMA (Direct Memory Access)

DMA allows hardware to transfer data directly to/from memory without CPU involvement:

Without DMA: disk → CPU → memory  (CPU occupied during transfer)
With DMA:    disk ──────▶ memory  (CPU free during transfer)

DMA transfer lifecycle: Request → Acknowledge → Transfer → Complete

Writing Data Back to Disk

# Force an immediate flush
fsync(fd)   # syscall: flush one file's dirty pages to disk
sync()      # syscall (also available as the `sync` command): flush system-wide

# If neither is called, the kernel's writeback (flusher) threads run periodically

Useful I/O Monitoring Commands

iotop      # per-process I/O usage
lsof       # list all open files and file descriptors
iostat     # disk I/O statistics
vmstat     # combined system stats including I/O

Section 4: Network Performance

For web applications, network performance is critical. Potential bottlenecks include: application response time, Linux network subsystem, NIC, and bandwidth.

NIC Settings

# Check NIC speed and whether it is in full-duplex mode
ethtool eth0

# Increase MTU for high-bandwidth (≥1 Gbps) networks (jumbo frames;
# every device on the path must support the larger MTU)
ip link set dev eth0 mtu 9000

TCP Buffer Tuning

# TCP read buffer: min / default / max (bytes)
sysctl -w net.ipv4.tcp_rmem="4096 87380 8388608"

# TCP write buffer: min / default / max (bytes)
sysctl -w net.ipv4.tcp_wmem="4096 87380 8388608"
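The current limits live in /proc and can be read without root, which is useful for recording a baseline before tuning:

```shell
# min / default / max, in bytes
cat /proc/sys/net/ipv4/tcp_rmem
cat /proc/sys/net/ipv4/tcp_wmem
```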

TCP Window Scaling

# Disable window scaling for a fixed, predictable window size
# (similar to setting JVM -Xms = -Xmx for predictability)
# ⚠️ Note: this caps the TCP window at 64 KB, which hurts throughput on
# high-bandwidth or high-latency links; leave scaling enabled there
sysctl -w net.ipv4.tcp_window_scaling=0

TCP Connection Reuse

Reusing sockets stuck in TIME_WAIT avoids local port exhaustion on busy web servers:

sysctl -w net.ipv4.tcp_tw_reuse=1

⚠️ tcp_tw_recycle is unsafe for clients behind NAT and was removed entirely in kernel 4.12; do not enable it on modern systems.

Keepalive Timeout

# Release idle persistent connections sooner (the default is 7200 seconds)
sysctl -w net.ipv4.tcp_keepalive_time=1800   # 1800 seconds = 30 minutes

SYN Backlog (DoS Protection)

# Max length of queue for TCP connections not yet ESTABLISHED
# Prevents server crash under SYN flood / DoS attacks
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
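A complementary defense worth pairing with the backlog limit is SYN cookies, which let the kernel answer connection attempts statelessly once the backlog overflows:

```shell
# Answer SYNs statelessly when the backlog overflows (standard SYN-flood defense)
sysctl -w net.ipv4.tcp_syncookies=1
```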

Disable Unnecessary Protocols

# Disable ICMP broadcast responses to reduce noise
sysctl -w net.ipv4.icmp_echo_ignore_broadcasts=1

Bind Network Interrupts to One CPU

# Bind NIC interrupt to CPU core 1 (reduces scheduler interference)
echo 02 > /proc/irq/<nic_irq>/smp_affinity

Recommended Monitoring Toolset

  • htop: interactive process and CPU monitor
  • vmstat: CPU, memory, swap, I/O, context switches
  • iotop: per-process disk I/O
  • sar: historical system activity reports
  • strace: trace system calls for a process
  • iftop: real-time network bandwidth by connection
  • ss: socket statistics (faster replacement for netstat)
  • lsof: list open files and file descriptors
  • ethtool: NIC diagnostics and settings
  • mtr: combined traceroute + ping for network diagnosis
