Hasan ÇALIŞIR
Why Purging Nginx Cache Is Only Half the Job (And How I Built the Other Half)

If you're self-hosting WordPress behind Nginx with caching, you've probably relied on plugins to automatically purge your cache.

📝 Post updated → 🗑 cache wiped → ✅ done.

Except it's not done. The cache is now cold. The next visitor hits your server with a full uncached PHP + DB round trip and pays the latency penalty — the exact problem caching was supposed to solve.

Most Nginx cache plugins only purge — they leave the cache cold. I wanted something that could fix that — which eventually led me to build NPP (Nginx Cache Purge Preload), a plugin that preloads your Nginx cache so visitors always hit a cached page.

But before we get to NPP, here’s the problem that almost every WordPress + Nginx setup silently suffers from.


The Problem No One Was Solving

When I set up Nginx caching on my WordPress sites, the workflow looked like this:

  1. Publish or update a post
  2. Plugin purges the relevant cache entries
  3. First real visitor triggers a full PHP + DB round trip to rebuild the cache
  4. Everyone after that gets the cached version

Step 3 is the silent performance hole. On a busy site it barely matters. On a blog, a portfolio, a WooCommerce store — that first cold response after every update is exactly the experience your visitors shouldn't be getting.

I wanted something that inverted step 3: preload the cache immediately after purging, before any visitor arrives.

That one missing piece is where the journey toward NPP started. Everything else — Redis sync, Cloudflare APO integration, WooCommerce hooks, the concurrent lock system — was built around making that loop airtight.


The Full Cache Lifecycle

After realizing that a simple purge left the cache cold, I had to map out the entire lifecycle — from a post update to the moment a visitor finally gets a cached page. Understanding this end-to-end flow was key to figuring out where things broke and where I could intervene.

Here’s what the system needed to handle — and eventually what NPP manages:

NPP Full Lifecycle — post update → purge → lock → preload → sync → cache HIT

Caption: Post updated → 3-layer purge → atomic lock → wget preload → Cloudflare + Redis sync → visitor gets a cache HIT, not a cold response.
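To make that loop concrete, here is a minimal orchestration sketch in Python. It is illustrative only: NPP itself is PHP, and every callable here (acquire_lock, purge, preload, sync_edges) is a hypothetical injected stub, not the plugin's API.

```python
def run_lifecycle(url, acquire_lock, release_lock, purge, preload, sync_edges):
    """Purge -> lock -> preload -> sync, so the next visitor gets a cache HIT."""
    if not acquire_lock("purge"):
        return "skipped"            # another purge is already in flight
    try:
        purge(url)                  # remove the stale entry server-side
    finally:
        release_lock("purge")       # the lock guards the purge itself
    preload(url)                    # re-warm the entry before any visitor arrives
    sync_edges(url)                 # mirror to edge/object caches (APO, Redis)
    return "warm"
```

The key ordering property: the purge is serialized behind a lock, but the preload only starts once the purge has fully finished, so it never warms a half-deleted directory.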


The 3-Layer Purge Strategy

Once I realized purging alone wasn’t enough, I had to figure out how to reliably remove cache entries without breaking anything. I ended up designing a three-path system that tries the fastest method first, then falls back only when necessary.

This is true server-side cache purging, not application-level cache clearing. For single-URL purges, NPP's purge engine tries three paths in order and stops at the first success.

NPP 3-Layer Purge Decision Tree — HTTP → URL Index → Recursive Scan

Caption: Fast-Path 1: HTTP via ngx_cache_purge module (atomic, fastest). Fast-Path 2: URL→filepath index lookup (direct file delete, no scan). Fallback: recursive filesystem scan (walks entire cache dir, reads each file's cache key header).

Fast-Path 1 — HTTP Purge (optional)

If HTTP Purge is enabled and the ngx_cache_purge Nginx module is detected, NPP sends an HTTP request to the module's purge endpoint. On HTTP 200 the filesystem is never touched. On any other response NPP falls through automatically.

Fast-Path 2 — URL Index lookup

NPP maintains a persistent URL→filepath index built during Preload All. If the URL is found and the file still exists, NPP deletes it directly — no directory scan needed. The index grows incrementally: every successful single-page purge writes its resolved path back, so over time nearly all single-page purges skip the scan entirely.

Fallback — Recursive filesystem scan

If neither fast-path succeeds, NPP walks the entire Nginx cache directory, reads each file's cache key header, and deletes the matching entry. This is the original workflow and remains the safe fallback for all environments.
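The fastest-first fallback chain can be sketched as a small function. This is a sketch with injected stand-ins (http_purge, index_lookup, delete_file, recursive_scan are made-up names), not NPP's actual code:

```python
def purge_url(url, http_purge, index_lookup, delete_file, recursive_scan):
    # Fast-Path 1: ngx_cache_purge HTTP endpoint, if the module is present.
    # Any non-200 response falls through automatically.
    if http_purge is not None and http_purge(url) == 200:
        return "http"
    # Fast-Path 2: persistent URL -> filepath index built during Preload All.
    path = index_lookup(url)
    if path is not None and delete_file(path):
        return "index"
    # Fallback: walk the cache dir, matching each file's cache key header.
    recursive_scan(url)
    return "scan"
```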

Purge All is different. It always uses filesystem operations — recursively removing the entire cache directory. HTTP Purge does not apply to Purge All. If Cloudflare APO Sync or Redis Object Cache Sync is enabled, those are triggered after the filesystem purge completes.


How Preloading Actually Works

After solving purge reliability, the next challenge hit me: how to warm the cache automatically, without slowing down the site or hitting PHP limits. I needed something that could crawl all URLs and populate cache entries immediately after a purge.

I ended up building a preload engine that uses wget to request each URL and force Nginx to store it. A PID file tracks the running process, and a REST endpoint (/nppp_nginx_cache/v2/preload-progress) streams real-time progress to the WordPress dashboard — which URL is being crawled, how many 404s have occurred, server load, and elapsed time.

Why wget instead of a pure PHP crawler? That choice was critical. A PHP-based crawler would run inside a PHP-FPM worker, bound by max_execution_time and memory_limit, and it would block a worker slot for the entire crawl. wget runs as an independent OS process — outside PHP’s memory space and execution timer, and without holding a worker slot hostage. That independence also made the PID-based Preload Watchdog and the safexec privilege-drop model possible.
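A rough Python equivalent of the spawn-and-track idea: launch the crawler as an independent OS process and record its PID in a file, so a later check (or the watchdog) can tell whether the run is still alive. Command, paths, and function names here are illustrative.

```python
import os
import subprocess

def start_preload(cmd, pid_file):
    """Spawn cmd as a detached OS process and record its PID."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    with open(pid_file, "w") as f:
        f.write(str(proc.pid))
    return proc

def preload_running(pid_file):
    """True if the PID in pid_file still refers to a live process."""
    try:
        pid = int(open(pid_file).read())
        os.kill(pid, 0)        # signal 0: existence check only, sends nothing
        return True
    except (FileNotFoundError, ValueError, ProcessLookupError):
        return False
```

Because the child is a separate process, it is bound by OS limits only, not by PHP's max_execution_time or memory_limit, which is the whole point of shelling out to wget.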


The Vary Header Trap (The Silent Cache Miss Problem)

Just when I thought the preload engine had solved everything, I hit a subtle trap: even with NPP preloading running, real visitors were still hitting cache misses. Why?

When PHP has zlib.output_compression = On, it adds a Vary: Accept-Encoding response header. Nginx's cache engine then performs a two-step lookup: it first resolves the main cache file via MD5(cache_key) as normal, reads the Vary header stored inside it, then computes a secondary variant hash from the actual Accept-Encoding value in the request. This variant hash becomes the filename of a completely separate cache file — not an appendage to the existing key. Result: one independent cache file per encoding variant.

  • NPP's preloader sends Accept-Encoding: identity → cache file with hash abc123
  • Real browser sends Accept-Encoding: gzip → different hash def456

Result: CACHE MISS. The preloaded cache is never served to real visitors.
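A toy model makes the collision obvious. This is not Nginx's real key layout, just the shape of the two-step lookup: when a Vary header is stored, the served file is selected by a secondary hash derived from the request's actual Accept-Encoding value.

```python
import hashlib

def cache_file(cache_key, accept_encoding, varies=True):
    """Toy model: return the filename Nginx would serve from."""
    main = hashlib.md5(cache_key.encode()).hexdigest()
    if not varies:                       # fastcgi_ignore_headers Vary in effect
        return main                      # one file, matched by every client
    # Secondary variant hash: a completely separate file per encoding value
    return hashlib.md5((main + accept_encoding).encode()).hexdigest()

preloader = cache_file("GET/post/1", "identity")   # NPP's wget request
browser   = cache_file("GET/post/1", "gzip")       # a real visitor
fixed     = cache_file("GET/post/1", "gzip", varies=False)
```

With `varies=False` every client resolves to the same file, which is exactly what the fix below achieves.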

Vary Header Trap — Before & After Fix

Caption: Left (broken): NPP preloads with Accept-Encoding: identity, browser uses gzip — two different variant hashes, two separate cache files, preloaded entry never matched. Right (fixed): fastcgi_ignore_headers Vary strips the variant — one cache file, always matched.

The Fix — Three Required Changes

Step 1 — Disable PHP-level compression:

; php.ini
zlib.output_compression = Off

Step 2 — Strip Accept-Encoding before it reaches PHP, and ignore Vary during cache key resolution:

# Inside your Nginx PHP fastcgi location block
fastcgi_param         HTTP_ACCEPT_ENCODING  "";
fastcgi_ignore_headers Vary;

Step 3 — Let Nginx handle all compression at the http {} level:

# nginx.conf http block
gzip on;
gzip_vary on;
gzip_types text/plain text/css application/json application/javascript;

gzip_vary on adds Vary: Accept-Encoding to served responses — but this fires after the cache lookup, not before. The cache key is already resolved by then. It does not affect cache file creation.

Why fastcgi_ignore_headers Vary is safe here: Without it, Nginx would risk serving gzip content to clients that can't decompress it. But since you've disabled PHP compression AND Nginx now handles gzip via gzip_types, every response is already in the correct encoding. Suppressing the header variant has no downside.

This applies equally to all Nginx cache types — use proxy_ignore_headers Vary or uwsgi_ignore_headers Vary accordingly.


The Concurrent Purge Lock

Just when I thought purge and preload were working smoothly, a new problem hit. On a WordPress site with multiple admins, automated deploys, WP-Cron jobs, and REST API triggers all potentially firing at once, purge operations can collide. Two simultaneous purge operations walking the same cache directory can leave it in a partially-deleted state, corrupt the index, or cause the preload that follows to warm stale entries.

I needed a way to make purge operations atomic. NPP solves this with a purge lock built on WP_Upgrader::create_lock(). This is an atomic INSERT IGNORE into wp_options — the database engine guarantees exactly one winner when two processes race simultaneously.

The lock is scoped by operation type with context-aware TTLs:

Context   Operation            TTL    Why
single    Single-page purge    180s   Walks entire cache dir file-by-file — slow on large caches or NAS
all       Purge All            60s    Kernel handles recursion — fast even on huge caches
premium   Advanced tab purge   60s    Deletes a single pre-located file — pure crash-safety margin

The TTL is a crash-recovery value, not an operation timeout. Under normal conditions, the lock is always released immediately via finally. The TTL only matters if a PHP process crashes mid-purge and orphans the lock.

The preload engine also calls nppp_is_purge_lock_held() before starting — it aborts early rather than warming a cache directory that's actively being deleted.
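The exactly-one-winner guarantee comes from the database, not from PHP. The same idea can be demonstrated with any store that enforces a unique key; here is a sketch using SQLite (table and column names are illustrative; WordPress uses an atomic insert into wp_options):

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE locks (name TEXT PRIMARY KEY, expires REAL)")

def create_lock(name, ttl):
    """Atomic lock acquisition: the unique key guarantees exactly one winner."""
    now = time.time()
    # Reap an orphaned lock whose TTL (crash-recovery value) has passed
    db.execute("DELETE FROM locks WHERE name = ? AND expires < ?", (name, now))
    cur = db.execute("INSERT OR IGNORE INTO locks VALUES (?, ?)", (name, now + ttl))
    db.commit()
    return cur.rowcount == 1        # 1 row inserted means we won the race

def release_lock(name):
    db.execute("DELETE FROM locks WHERE name = ?", (name,))
    db.commit()
```

When two processes race on the same name, the unique constraint lets exactly one INSERT succeed; the loser sees rowcount 0 and backs off.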


The Preload Watchdog

Thought I was done? Ha! Enter the next curveball. Post-preload tasks — building the URL→filepath index, sending the completion email, triggering the mobile preload — are normally handled by WP-Cron. WP-Cron depends on visitor traffic to fire.

On a fully-cached site, no visitor may hit the server after preloading finishes (Nginx serves everything, PHP never runs). This means post-preload tasks can be delayed indefinitely, or never run at all.

The solution: the Preload Watchdog. It's a background process that starts with each preload cycle, watches the PID file, and fires post-preload tasks the exact moment the wget process exits — no visitor required. If a Purge All cancels the preload mid-run, the watchdog is also stopped so it doesn't trigger tasks for a cancelled cycle.
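The core of a watchdog like this is tiny: poll the PID, and invoke a completion callback once the process is gone. A minimal sketch (illustrative, not NPP's implementation, which runs as a separate shell process):

```python
import os
import threading
import time

def watchdog(pid, on_exit, poll=0.05):
    """Fire on_exit() as soon as the watched PID disappears."""
    def watch():
        while True:
            try:
                os.kill(pid, 0)          # existence check only, sends no signal
            except ProcessLookupError:
                on_exit()                # build index, send email, mobile preload
                return
            time.sleep(poll)
    t = threading.Thread(target=watch, daemon=True)
    t.start()
    return t
```

The important property is that completion is detected by the process exiting, not by a visitor-dependent cron tick.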


Redis Object Cache Sync: Bidirectional, Without Infinite Loops

Once purge and preload were solid, the next challenge was keeping multiple caches in sync. If you're running Redis Object Cache alongside Nginx cache, you have two independent caches that can get out of sync. NPP handles both directions.

Redis ↔ Nginx Bidirectional Sync with Loop Prevention

Caption: Direction 1: NPP Purge All → wp_cache_flush() (clears Redis so rebuilds use fresh DB data). Direction 2: redis_object_cache_flush → nppp_purge_callback() (Redis flush triggers full Nginx purge). Loop prevention: NPPP_REDIS_FLUSH_ORIGIN global flag is set before the cascade and checked at each direction's entry point.

NPP Purge → Redis Flush: After every successful Purge All, NPP calls wp_cache_flush(). This ensures PHP regenerates fresh data from the database when rebuilding cache entries during preload.

Redis Flush → NPP Purge: When the Redis Object Cache drop-in fires redis_object_cache_flush (dashboard flush, WP-CLI wp cache flush, or any plugin calling wp_cache_flush()), NPP automatically purges all Nginx cache entries.

Loop prevention: Direction 1 triggers Direction 2, which would trigger Direction 1 again, forever. NPP breaks the cycle with a $GLOBALS['NPPP_REDIS_FLUSH_ORIGIN'] flag set before the cascade and checked at both entry points. There's also a guard that auto-disables the Redis sync toggle if Redis goes away at runtime, keeping the UI consistent without manual intervention.
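The flag mechanism reduces to a few lines. Names mirror the ones in the post (the PHP global NPPP_REDIS_FLUSH_ORIGIN), but the code is an illustrative Python model, not the plugin's source:

```python
state = {"origin_flag": False, "redis_flushes": 0, "nginx_purges": 0}

def nppp_purge_all():                  # Direction 1: NPP Purge All
    state["nginx_purges"] += 1
    state["origin_flag"] = True        # mark: this cascade started from NPP
    try:
        wp_cache_flush()
    finally:
        state["origin_flag"] = False

def wp_cache_flush():                  # any Redis flush fires the drop-in hook
    state["redis_flushes"] += 1
    redis_object_cache_flush()

def redis_object_cache_flush():        # Direction 2: Redis flush -> Nginx purge
    if state["origin_flag"]:           # cascade started from NPP: stop here
        return
    nppp_purge_all()
```

Without the flag check, Direction 2 would call back into Direction 1 forever; with it, each cascade terminates after one pass.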


Cloudflare APO Sync

If you're using Cloudflare APO (Automatic Platform Optimization) for WordPress, your edge cache runs independently of your Nginx origin cache. By default, purging Nginx does nothing to Cloudflare's cached copies.

NPP's Cloudflare APO integration mirrors every purge action to the Cloudflare layer automatically, using the same hooks that trigger Nginx cache purge. IDN (Internationalized Domain Names) are normalized to ASCII before comparison, so sites on non-Latin TLDs work correctly.
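The IDN step is worth a quick illustration. NPP does this in PHP; here Python's built-in idna codec shows the equivalence it relies on: a Unicode hostname and its punycode (ASCII) form compare equal only after normalization.

```python
def normalize_host(host):
    """Normalize a hostname to lowercase ASCII (punycode) form."""
    return host.encode("idna").decode("ascii").lower()

# A Unicode hostname and its already-punycoded form normalize identically,
# so zone comparison works for non-Latin TLDs and domains.
unicode_form  = normalize_host("münchen.example")
punycode_form = normalize_host("xn--mnchen-3ya.example")
```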


WooCommerce: The Stock Change Problem

WooCommerce stock updates are a special case. When an order is placed and stock quantity drops, WooCommerce writes directly to the database without going through wp_update_post(). This means transition_post_status — what most cache plugins listen to — never fires.

NPP hooks into WooCommerce's own stock events instead:

  • woocommerce_product_set_stock / woocommerce_variation_set_stock — quantity changes
  • woocommerce_product_set_stock_status / woocommerce_variation_set_stock_status — instock ↔ outofstock ↔ onbackorder
  • woocommerce_order_status_cancelled — WooCommerce restores stock on cancellation, affected product pages need a refresh

For variations, the purge resolves to the parent product ID (the public-facing URL). There's also deduplication logic that prevents double-purging during a manual product save where both save_post and stock hooks fire in the same request chain.
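Both behaviors (parent resolution and same-request dedup) fit in a few lines. A hypothetical Python model; the real logic lives in NPP's PHP hook callbacks:

```python
purged_this_request = set()   # reset at the start of each request in reality
purge_log = []

def purge_once(product_id, parent_id=None):
    """Purge a product URL at most once per request chain."""
    target = parent_id if parent_id is not None else product_id
    if target in purged_this_request:
        return False               # save_post + stock hook: second call deduped
    purged_this_request.add(target)
    purge_log.append(target)       # stand-in for the real Nginx purge call
    return True
```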


Percent-Encoded URL Cache Misses (Non-ASCII Sites)

Another subtle trap popped up with non-ASCII URLs. Nginx cache is case-sensitive. For URLs with non-ASCII characters like /product/水滴轮锻碳/, the percent-encoding can be uppercase (%E6%B0%B4) or lowercase (%e6%b0%b4) depending on the client or proxy. Nginx sees these as different cache keys — preloaded with one case, visitor arrives with the other → CACHE MISS.

NPP solved this with an optional libnpp_norm.so library (loaded via LD_PRELOAD) that normalizes percent-encoded HTTP request lines during preloading to ensure consistent cache keys. This pairs with safexec (covered below).
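The normalization itself is simple: fold every percent-escape to a single case so both spellings produce the same cache key. The real library does this in C inside wget's request path; the sketch below is Python, and the choice of uppercase is my assumption for illustration (what matters is consistency, not which case wins):

```python
import re

def normalize_percent_encoding(path):
    """Fold all %XX escapes to one case so cache keys are consistent."""
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), path)
```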


Permission Architecture

Then came a Linux classic: file permissions. Just when I thought things were under control, Linux decided to remind me who’s boss. In many Linux setups, WEBSERVER-USER (nginx / www-data) creates cache files and PHP-FPM-USER runs WordPress. These are different users with different filesystem permissions. PHP-FPM can't write to cache files owned by nginx.

Permission Architecture — WEBSERVER-USER vs PHP-FPM-USER with bindfs FUSE mount

Caption: Nginx (www-data) creates cache files in /dev/shm/fastcgi-cache/. PHP-FPM (psauxit) can't write there. bindfs creates a FUSE mount at /dev/shm/fastcgi-cache-mnt/ where PHP-FPM has full access. NPP points to the mount path, not the original. Docker setups use a shared volume instead.

To tame this mess, I shipped install.sh — a bash script that automatically detects PHP-FPM-USER and Nginx cache paths, creates the bindfs FUSE mount, and registers an npp-wordpress systemd service to keep the mount persistent across reboots.

# One-liner setup (monolithic server)
sudo bash -c "$(curl -Ss https://psaux-it.github.io/install.sh)"

install.sh is for monolithic servers only — where Nginx, PHP-FPM, and WordPress run on the same host. For Docker-based setups, use the dedicated Docker Compose environment linked below.


Security: safexec + libnpp_norm.so

By this point, NPP could purge, preload, handle Vary headers, permission traps, percent-encoded URLs, race conditions… basically everything I’d dreamed of. But then came the classic “oh no” moment: shell_exec and proc_open running wget during preload were an open invitation for chaos.

Enter CVE-2025-6213 — a real eye-opener. Suddenly, all the unsanitized shell_exec calls in WordPress cache plugins weren’t just theoretical hazards anymore. Arbitrary command execution? Yep, that was a thing.

So after lots of late nights, a few cups of questionable coffee, and some frantic Googling, I solved it properly. And thus, safexec was born — a hardened little C binary sitting between PHP and the shell, like a tiny, ruthless bouncer for your preload process.

safexec

NPP ships safexec — a hardened C binary installed with SUID permissions that sits between PHP and the shell. It enforces strict controls over which commands can execute, drops privileges before exec, and keeps the preload process fully isolated from the WordPress/PHP-FPM context. Combined with libnpp_norm.so, it also handles percent-encoded URL normalization (described above).

What safexec enforces:

  • Strict allowlist — only wget, curl, and a small set of known-safe binaries can run
  • Absolute path pinning — tool resolved to a trusted system dir, argv rewritten before exec
  • Privilege drop — drops to nobody; falls back to PHP-FPM user; aborts if still euid==0
  • Environment wipe — clearenv() + trusted PATH only + umask(077) + PR_SET_DUMPABLE(0)
  • Process isolation — own cgroup v2 subtree (nppp.<pid>) on Linux; rlimits fallback
  • PR_SET_NO_NEW_PRIVS(1) — child can never regain privileges after exec

# Install safexec (one-liner)
curl -fsSL https://psaux-it.github.io/install-safexec.sh | sudo sh

# Or via package (Debian/Ubuntu amd64)
wget https://github.com/psaux-it/nginx-fastcgi-cache-purge-and-preload/releases/download/v2.1.5/safexec_1.9.5-1_amd64.deb
sudo apt install ./safexec_1.9.5-1_amd64.deb

Packages are available for Debian/Ubuntu (.deb), RHEL/Fedora/Rocky (.rpm), and Alpine (.apk) — grab them from the Releases page with SHA256 checksums included.

safexec is optional — NPP falls back to running as the PHP-FPM user if not installed — but strongly recommended for all production environments.
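Two of safexec's checks, the allowlist and absolute-path pinning, can be modeled in a few lines. This is a toy Python model only: the real safexec is a SUID C binary that additionally drops privileges, wipes the environment, and isolates the child in its own cgroup, none of which is shown here.

```python
import os

ALLOWLIST = {"wget", "curl"}            # only known-safe tools may run
DEFAULT_TRUSTED_DIRS = ["/usr/bin", "/bin"]

def resolve_command(argv, trusted_dirs=None):
    """Reject non-allowlisted tools; pin argv[0] to a trusted absolute path."""
    dirs = trusted_dirs if trusted_dirs is not None else DEFAULT_TRUSTED_DIRS
    name = os.path.basename(argv[0])    # caller-supplied paths are ignored
    if name not in ALLOWLIST:
        raise PermissionError(f"{name}: not in allowlist")
    for d in dirs:
        candidate = os.path.join(d, name)
        if os.access(candidate, os.X_OK):
            return [candidate] + argv[1:]   # argv rewritten before exec
    raise FileNotFoundError(name)
```

Note how the caller's path is discarded entirely: even `./wget` or `/tmp/wget` resolves to the trusted system copy or fails.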


Bootstrap Architecture: Zero Cost on 99% of Requests

Loading heavy PHP code on every request — including shell_exec/proc_open handlers and WP_Filesystem recursive operations — would be both a performance and a security liability. So NPP stays completely dormant on unauthenticated requests.

// Entry point gate (simplified from the actual source)
add_action('init', function() {
    if (!is_admin()) return;            // not an admin page → dormant
    if (!is_user_logged_in()) return;   // not logged in → dormant

    if (current_user_can('manage_options')) {
        nppp_load_bootstrap();          // full UI access
        return;
    }
    // Non-admin with custom purge capability:
    // load bootstrap only when auto-purge is active
    if (current_user_can('nppp_purge_cache')) {
        nppp_load_bootstrap();          // auto-purge hook only, no settings UI
    }
});

REST API endpoints and WP-Cron events follow the same principle: narrow execution gates, minimal footprint, fully isolated processes. The result is a plugin that’s nearly invisible.


Resources

📦 WordPress.org: https://wordpress.org/plugins/fastcgi-cache-purge-and-preload-nginx/

🐙 GitHub: https://github.com/psaux-it/nginx-fastcgi-cache-purge-and-preload

🛡️ safexec: https://github.com/psaux-it/nginx-fastcgi-cache-purge-and-preload/tree/main/safexec

🐳 Docker: https://github.com/psaux-it/wordpress-nginx-cache-docker


The journey doesn’t stop here — I’m happy to dive into setup quirks and the hidden corners of NPP that made this project both tricky and fun. There’s a lot under the hood, and for anyone curious, I’m eager to walk through the details.
