This article was originally published on Jo4 Blog.
You deleted the URL. Redis says it's gone. The database confirms it. But users click the link and still get redirected to the destination. cf-cache-status: HIT. Cloudflare is happily serving a cached copy that nobody told it to forget.
The Problem
We run a URL shortener behind Cloudflare CDN. For performance, we cache redirect responses at the edge with a 2-hour TTL. This means popular short URLs resolve in under 50ms globally without touching our origin.
The issue surfaced when a customer deleted a short URL and then clicked it to verify. Still working. They tried again 30 minutes later. Still working. They opened a support ticket.
Same story with URL updates. A user changed the destination from https://old-site.com to https://new-site.com. The short URL kept redirecting to the old destination. OG metadata updates had the same problem — social cards showed stale titles and images because the HTML page was cached at the edge.
Three distinct mutations, all broken:
- Delete URL — link stays alive via CDN cache
- Update URL — old destination served from CDN, old metadata in Redis
- Admin force-expire — neither Redis nor CDN gets invalidated
The Root Cause: Layered Caching, Partial Invalidation
Our caching architecture has two layers:
User Request → Cloudflare CDN (edge cache) → Spring Boot API → Redis (app cache) → PostgreSQL
When a URL is deleted, the service correctly invalidated the Redis cache. But it never told Cloudflare. The CDN layer continued serving stale responses until the TTL expired naturally.
The update path was worse. UrlService.updateUrl() wrote the new destination to the database but invalidated neither Redis nor Cloudflare. Reads hit Redis first, got the old cached value, and never saw the database update.
Admin operations were the worst. AdminService.forceExpireUrl() and AdminService.deleteUrl() updated the database directly and skipped both cache layers entirely. Admin code had been written as direct repository calls, bypassing the service-layer cache invalidation that regular user operations went through.
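To make the failure mode concrete, here is a toy model of the read path using plain in-memory maps. It is only a sketch: the class name, the method names, and the use of maps as stand-ins for Cloudflare, Redis, and PostgreSQL are all assumptions for illustration, not the production code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of the read path: edge cache -> app cache -> database. */
class LayeredCacheDemo {
    // All three maps are stand-ins: edge = Cloudflare, app = Redis, db = PostgreSQL.
    final Map<String, String> edge = new HashMap<>();
    final Map<String, String> app = new HashMap<>();
    final Map<String, String> db = new HashMap<>();

    /** Resolves a short code, filling each cache layer on the way back out. */
    Optional<String> resolve(String code) {
        String hit = edge.get(code);
        if (hit != null) {
            return Optional.of(hit); // edge HIT: the origin is never consulted
        }
        String target = app.get(code);
        if (target == null) {
            target = db.get(code);
            if (target == null) {
                return Optional.empty();
            }
            app.put(code, target);
        }
        edge.put(code, target);
        return Optional.of(target);
    }

    /** The buggy delete: clears the database and app cache, but not the edge. */
    void deleteWithoutEdgePurge(String code) {
        db.remove(code);
        app.remove(code);
    }

    /** The fixed delete: also purges the edge copy. */
    void deleteWithEdgePurge(String code) {
        deleteWithoutEdgePurge(code);
        edge.remove(code);
    }

    public static void main(String[] args) {
        LayeredCacheDemo demo = new LayeredCacheDemo();
        demo.db.put("abc", "https://old-site.com");
        demo.resolve("abc");                     // warms both caches
        demo.deleteWithoutEdgePurge("abc");
        System.out.println(demo.resolve("abc")); // Optional[https://old-site.com]
        demo.deleteWithEdgePurge("abc");
        System.out.println(demo.resolve("abc")); // Optional.empty
    }
}
```

The first lookup after the buggy delete still succeeds because the edge answers before the origin is ever asked, which matches the cf-cache-status: HIT we saw in production.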
The Fix: Purge Both Layers on Every Mutation
Step 1: Add purgeUrls() to CloudflareService
Cloudflare exposes POST /zones/{zone_id}/purge_cache with a {"files": [...]} body. We wrapped it in a service method:
@Async
public void purgeUrls(List<String> urls) {
    if (!isEnabled() || urls == null || urls.isEmpty()) {
        return;
    }
    // Cloudflare API allows max 30 URLs per purge request
    List<List<String>> batches = partition(urls, 30);
    for (List<String> batch : batches) {
        purgeUrlBatch(batch);
    }
}

private void purgeUrlBatch(List<String> urls) {
    String endpoint = CLOUDFLARE_API_BASE + "/zones/" + zoneId + "/purge_cache";
    try {
        HttpHeaders headers = createHeaders();
        Map<String, Object> body = Map.of("files", urls);
        HttpEntity<Map<String, Object>> request = new HttpEntity<>(body, headers);
        ResponseEntity<CloudflareResponse> response = restTemplate.exchange(
            endpoint, HttpMethod.POST, request, CloudflareResponse.class
        );
        if (response.getBody() != null && response.getBody().isSuccess()) {
            log.info("Purged {} URL(s) from Cloudflare cache", urls.size());
        } else {
            log.warn("Cloudflare cache purge returned non-success for URLs: {}", urls);
        }
    } catch (Exception e) {
        log.warn("Failed to purge Cloudflare cache: {}", e.getMessage());
    }
}
Two design decisions here:
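@Async (fire-and-forget). A CDN purge should never block the user operation. If Cloudflare is slow or down, the delete/update still completes without waiting, and the cache expires via TTL as a fallback. Note that Spring only honors @Async when async support is enabled (for example with @EnableAsync) and the method is called through the bean proxy, so purgeUrls() has to be invoked from another bean, not from inside CloudflareService itself.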
Batched in groups of 30. Cloudflare's API limits purge requests to 30 URLs per call. A single short URL can produce up to 3 cacheable URLs (UI page, API endpoint, custom domain), so this limit matters for bulk operations.
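The partition() helper called from purgeUrls() is not shown above. It could be Guava's Lists.partition or a small hand-rolled method; the following is a minimal JDK-only sketch of the latter, where the class name PurgeBatches is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PurgeBatches {
    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 65; i++) {
            urls.add("https://example.com/a/" + i);
        }
        for (List<String> batch : partition(urls, 30)) {
            System.out.println(batch.size()); // prints 30, 30, 5
        }
    }
}
```

With up to 3 cacheable URLs per short URL, a bulk operation over 22 short URLs produces up to 66 entries, which this splits into three purge requests.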
Step 2: Build the List of Cacheable URLs
Each short URL can be cached under multiple paths. We need to purge all of them:
private void addCacheableUrls(List<String> urls, String shortUrl, String customDomain) {
    // UI page (HTML with OG tags, served via Cloudflare CDN)
    urls.add(uiHost + "/a/" + shortUrl);
    // API endpoint (JSON, also cached by Cloudflare)
    urls.add(apiHost + "/api/v1/public/a/" + shortUrl);
    // Custom domain URL (if configured)
    if (StringUtils.isNotBlank(customDomain)) {
        urls.add("https://" + customDomain + "/a/" + shortUrl);
    }
}
For updates where the short URL or custom domain itself changed, we purge both old and new URLs:
private void purgeCloudflareCache(String shortUrl, String customDomain,
                                  String oldShortUrl, String oldCustomDomain) {
    List<String> urlsToPurge = new ArrayList<>();
    addCacheableUrls(urlsToPurge, shortUrl, customDomain);
    if (oldShortUrl != null && !oldShortUrl.equals(shortUrl)) {
        addCacheableUrls(urlsToPurge, oldShortUrl, oldCustomDomain);
    } else if (oldCustomDomain != null && !oldCustomDomain.equals(customDomain)) {
        urlsToPurge.add("https://" + oldCustomDomain + "/a/" + shortUrl);
    }
    if (!urlsToPurge.isEmpty()) {
        cloudflareService.purgeUrls(urlsToPurge);
    }
}
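As a worked example of what ends up in the purge list, the sketch below restates the same old-versus-new branching as a pure function so the result can be printed. The hosts are placeholders (the real uiHost and apiHost values are not given here), the class and method names are hypothetical, and two small liberties are taken: the list is collected through an ordered set to drop duplicates, and a blank old custom domain is skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PurgeTargets {
    // Placeholder hosts, assumed for illustration only.
    static final String UI_HOST = "https://ui.example.com";
    static final String API_HOST = "https://api.example.com";

    static void addCacheableUrls(Set<String> urls, String shortUrl, String customDomain) {
        urls.add(UI_HOST + "/a/" + shortUrl);
        urls.add(API_HOST + "/api/v1/public/a/" + shortUrl);
        if (customDomain != null && !customDomain.isBlank()) {
            urls.add("https://" + customDomain + "/a/" + shortUrl);
        }
    }

    /** Collects current URLs plus any old URLs that the update left behind. */
    static List<String> urlsToPurge(String shortUrl, String customDomain,
                                    String oldShortUrl, String oldCustomDomain) {
        Set<String> urls = new LinkedHashSet<>();
        addCacheableUrls(urls, shortUrl, customDomain);
        if (oldShortUrl != null && !oldShortUrl.equals(shortUrl)) {
            // Short code changed: every old path is now orphaned.
            addCacheableUrls(urls, oldShortUrl, oldCustomDomain);
        } else if (oldCustomDomain != null && !oldCustomDomain.isBlank()
                && !oldCustomDomain.equals(customDomain)) {
            // Only the custom domain changed: purge the old domain's copy.
            urls.add("https://" + oldCustomDomain + "/a/" + shortUrl);
        }
        return new ArrayList<>(urls);
    }

    public static void main(String[] args) {
        // Only the custom domain changed: old.example.org -> new.example.org
        urlsToPurge("abc", "new.example.org", "abc", "old.example.org")
                .forEach(System.out::println);
    }
}
```

For that call the list has four entries: the UI page, the API endpoint, the new custom-domain URL, and the old custom-domain URL.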
Step 3: Wire Into Every Mutation Path
This is where the original bug lived. We had to audit every code path that mutates URL state:
UrlService (user-facing operations):
- updateUrl(): added Redis invalidation + Cloudflare purge
- deleteUrl(): already had Redis invalidation, added Cloudflare purge
AdminService (admin operations):
- forceExpireUrl(): added both Redis + Cloudflare invalidation
- deleteUrl(): added both Redis + Cloudflare invalidation
- refreshMetadata(): added Cloudflare purge (OG tags changed)
The admin fix required a dedicated helper since admin code was calling repositories directly:
private void invalidateUrlCaches(UrlEntity url) {
    urlCacheService.invalidateCache(url.getShortUrl());
    List<String> urlsToPurge = new ArrayList<>();
    urlsToPurge.add(uiHost + "/a/" + url.getShortUrl());
    urlsToPurge.add(apiHost + "/api/v1/public/a/" + url.getShortUrl());
    if (StringUtils.isNotBlank(url.getCustomDomain())) {
        urlsToPurge.add("https://" + url.getCustomDomain() + "/a/" + url.getShortUrl());
    }
    cloudflareService.purgeUrls(urlsToPurge);
}
One method. Both cache layers. Called from every admin mutation.
Why @Async Is the Right Call
CDN purge is a network call to Cloudflare's API. It adds 100-300ms of latency. If we made it synchronous:
- User deletes a URL — waits an extra 200ms for Cloudflare confirmation
- Cloudflare API is down — user's delete fails or hangs
- Bulk operations — each URL adds another round-trip
With @Async, the user operation completes immediately. The purge runs on a background thread pool. If it fails, the cached entry expires via TTL (2 hours at most), so staleness is bounded and the user never waits on Cloudflare.
The tradeoff: there's a brief window (milliseconds to seconds) where the CDN might still serve stale content after an update. For a URL shortener, this is acceptable. For something like financial data, you'd want synchronous purge with error handling.
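The fire-and-forget behavior can be illustrated without Spring. The sketch below uses CompletableFuture on a plain executor as a stand-in for @Async; it is not how Spring implements it, and the class name, method names, and the latch that simulates a slow Cloudflare call are all assumptions for the demo.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

class FireAndForgetPurge {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    /**
     * Runs the purge on a background thread and returns immediately.
     * Failures are logged and swallowed, so the returned future never
     * completes exceptionally.
     */
    CompletableFuture<Void> purgeAsync(Runnable purgeCall) {
        return CompletableFuture.runAsync(purgeCall, pool)
                .exceptionally(e -> {
                    System.err.println("Purge failed, relying on TTL: " + e);
                    return null;
                });
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        FireAndForgetPurge purger = new FireAndForgetPurge();
        CountDownLatch slowApi = new CountDownLatch(1);
        AtomicBoolean purged = new AtomicBoolean(false);

        CompletableFuture<Void> pending = purger.purgeAsync(() -> {
            try {
                slowApi.await(); // simulates a slow Cloudflare round-trip
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            purged.set(true);
        });

        // The caller gets here without waiting for the purge.
        System.out.println("delete returned, purged=" + purged.get());
        slowApi.countDown();
        pending.join();
        System.out.println("purge finished, purged=" + purged.get());
        purger.shutdown();
    }
}
```

The gap between the two print statements is the stale window described above: the delete has already returned, but the edge copy is still there until the background purge lands.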
Lessons Learned
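1. Cache invalidation has layers. If your architecture has CDN → Redis → Database, every mutation has to invalidate every layer, all the way out to the edge. Clearing Redis doesn't help if Cloudflare is still serving cached responses, because most requests never reach your app server when the CDN has a hit. Order matters too: clear Redis before purging the CDN, otherwise the edge can refill from a stale app-cache entry.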
2. Admin code is a blind spot. Admin operations often bypass service-layer abstractions. They call repositories directly for flexibility, but that means they skip whatever cache invalidation the service layer provides. Audit every mutation path, not just the user-facing ones.
3. Fire-and-forget is correct for CDN purge. Don't block user operations on external API calls. Use @Async, log failures, and rely on TTL expiration as your safety net. The worst case is stale content for a bounded time window.
4. Enumerate all cacheable URLs. A single logical resource can exist at multiple CDN URLs. Miss one and you have a partial purge. Our short URLs have three: the UI page, the API endpoint, and the custom domain variant. All three need purging.
Ever been bitten by a stale CDN cache hiding a "deleted" resource? What's your cache invalidation strategy?
Building jo4.io - URL shortener with analytics, custom domains, and team workspaces.