Post-Mortem: Why My Blog Cover Images Silently Failed to Restore from S3

#python #django #aws #api

Six blog posts on my portfolio had broken cover images for longer than I'd like to admit. The images were in S3. The management command to restore them had been written and run. And yet — nothing. Blank covers, every time.

Here's the full breakdown of what went wrong, why, and how it got fixed.

The Setup

My portfolio backend is a Cookiecutter Django project running on a DigitalOcean droplet with Docker Compose. Blog post cover images are stored on AWS S3. The cover field on the BlogPost model is an ImageField that stores a relative path like blog_posts/deploy.webp — Django's S3 storage backend handles prepending the media/ prefix and building the full URL.

When I migrated away from using Hashnode as a headless CMS and imported all posts into my own Django backend, the cover images came along as CUID-based filenames (e.g. blog_posts/cmoxrumae00ms2em7bje5at07.png). Files with those CUID names don't exist in S3. The actual files were uploaded separately with descriptive names like deploy.webp, manual.webp, sortinghashnode.webp.

To fix this, Gemini wrote restore_covers.py, a management command with two matching strategies:

1. Exact CUID match: looks for blog_posts/{cuid}.webp etc. in S3
2. Fuzzy slug match: tokenizes the post slug and looks for S3 filenames with overlapping keywords
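
For illustration, the exact-match strategy can be sketched as generating candidate storage keys from the path stored in the DB. This is not the command's actual code: exact_candidates is a hypothetical helper, and the extension list is an assumption, since only .webp is named above.

```python
from pathlib import PurePosixPath

# Assumed extension list; only ".webp" is confirmed by the description above.
CANDIDATE_EXTENSIONS = (".webp", ".png", ".jpg", ".jpeg")


def exact_candidates(cover_name, extensions=CANDIDATE_EXTENSIONS):
    """Return storage keys that keep the stored file's stem but vary the extension."""
    path = PurePosixPath(cover_name)
    return [str(path.with_suffix(ext)) for ext in extensions]


candidates = exact_candidates("blog_posts/cmoxrumae00ms2em7bje5at07.png")
print(candidates[0])  # blog_posts/cmoxrumae00ms2em7bje5at07.webp
```

Each candidate would then be checked against storage, and the first one that exists wins.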

The command was run. Six posts still had broken covers.

Root Cause 1: The Fuzzy Matcher Couldn't Tokenize Concatenated Filenames

The S3 filenames are lowercase concatenated words: sortinghashnode.webp, trackingpage.webp, postmortem.webp. The fuzzy matcher works by calling get_keywords() on both the post slug and each S3 filename, then computing the set intersection.

Here's the problem. get_keywords() uses re.findall(r"[a-zA-Z0-9]+", text) to tokenize. Applied to a filename:

get_keywords("sortinghashnode.webp")
# → {"sortinghashnode"}   ← one token

And for the post slug:

get_keywords("sorting-hashnode-series-posts-how-to-display-the-latest-post-first")
# → {"sorting", "hashnode", "series", "posts", "display", "latest", "first"}

The intersection of {"sortinghashnode"} and {"sorting", "hashnode", ...} is empty. Score = 0. No match.

The same failure applied to every concatenated filename in the bucket. trackingpage.webp couldn't match a slug containing tracking and page. postmortem.webp couldn't match a slug containing mortem (since post is a stop word). None of them scored above 0.
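
The mismatch is easy to reproduce in isolation. The sketch below is a minimal stand-in for get_keywords(), not the command's actual code: the regex is the one quoted above, but the lowercasing and the STOP_WORDS set are assumptions chosen so the output matches the examples in this post.

```python
import re

# Assumed stop words; the real list in restore_covers.py is not shown in this post.
STOP_WORDS = {"how", "to", "the", "a", "in", "on", "with", "via", "post",
              "webp", "png", "jpg", "jpeg"}


def get_keywords(text):
    """Split on non-alphanumeric characters and drop stop words."""
    tokens = re.findall(r"[a-zA-Z0-9]+", text.lower())
    return {token for token in tokens if token not in STOP_WORDS}


slug_keywords = get_keywords(
    "sorting-hashnode-series-posts-how-to-display-the-latest-post-first"
)
file_keywords = get_keywords("sortinghashnode.webp")

# Plain set intersection: no token is shared, so the score is 0.
score = len(slug_keywords & file_keywords)
print(score)  # 0
```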

The fix was to replace the plain set intersection with substring containment, enforcing a minimum token length of 4 characters to prevent false positives from short words:

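min_token_len = 4  # shorter tokens must match exactly
overlap = set()
for pk in post_keywords:
    for fk in file_keywords:
        # Exact match, or one token contained in the other when both are long enough
        if pk == fk or (
            len(pk) >= min_token_len
            and len(fk) >= min_token_len
            and (pk in fk or fk in pk)
        ):
            overlap.add(pk)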

Now "sorting" in "sortinghashnode" evaluates to True, so "sorting" is added to overlap and the score goes up by 1. The same holds for "hashnode". With a score of 2 instead of 0, the correct file gets matched.
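
To see the effect end to end, here is a self-contained sketch that wraps the loop in a function and ranks a few candidate files. The names substring_overlap and best_match are hypothetical, the keyword sets are written out by hand, and scoring by overlap size is an assumption about how the command ranks files.

```python
def substring_overlap(post_keywords, file_keywords, min_token_len=4):
    """Return post keywords that equal, contain, or are contained in a file keyword."""
    overlap = set()
    for pk in post_keywords:
        for fk in file_keywords:
            long_enough = len(pk) >= min_token_len and len(fk) >= min_token_len
            if pk == fk or (long_enough and (pk in fk or fk in pk)):
                overlap.add(pk)
                break  # one hit per post keyword is enough
    return overlap


def best_match(post_keywords, keywords_by_file):
    """Pick the filename with the largest overlap, or None if nothing overlaps."""
    scored = {
        name: len(substring_overlap(post_keywords, file_keywords))
        for name, file_keywords in keywords_by_file.items()
    }
    name, score = max(scored.items(), key=lambda item: item[1], default=(None, 0))
    return name if score > 0 else None


post_keywords = {"sorting", "hashnode", "series", "posts", "display", "latest", "first"}
keywords_by_file = {
    "deploy.webp": {"deploy"},
    "postmortem.webp": {"postmortem"},
    "sortinghashnode.webp": {"sortinghashnode"},
}
print(best_match(post_keywords, keywords_by_file))  # sortinghashnode.webp
```

Substring containment is more permissive than exact matching: a common four-letter keyword such as "page" would match any filename that contains it. In a larger bucket it is worth reviewing the proposed matches before writing them to the DB.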

Root Cause 2: No AWS Credentials in the Local Environment

The second reason the command silently failed: it was run via docker compose -f docker-compose.local.yml, which loads .envs/.local/.django. That file has no AWS credentials.

When storage.listdir("blog_posts") is called with no S3 credentials, one of two things happens: the call raises an exception that a broad except Exception swallows, or it returns an empty list because the local filesystem storage backend is active instead of S3. The command's output showed:

Could not list storage directory directly: ...
Fuzzy matching won't be available.

But the overall command still exited 0 with a summary that made it look like it ran fine. With zero files in storage_files, the fuzzy loop had nothing to iterate over — so every post hit the "no existing file found in storage" branch.

The management command is designed to run in the production container, where .envs/.production/.django already has the correct AWS credentials wired up.
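
A small preflight check would have caught this before any matching was attempted. The sketch below is one way to do it, under assumptions: missing_aws_vars is a hypothetical helper, and the variable names follow Cookiecutter Django's production settings, so they should be checked against the project's own settings module.

```python
import os

# Assumed variable names, following Cookiecutter Django's production settings;
# verify them against the settings module your project actually uses.
REQUIRED_AWS_VARS = (
    "DJANGO_AWS_ACCESS_KEY_ID",
    "DJANGO_AWS_SECRET_ACCESS_KEY",
    "DJANGO_AWS_STORAGE_BUCKET_NAME",
)


def missing_aws_vars(environ=None, required=REQUIRED_AWS_VARS):
    """Return the required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# Illustrative local environment with no AWS settings at all.
local_env = {"DJANGO_SETTINGS_MODULE": "config.settings.local"}
missing = missing_aws_vars(local_env)
if missing:
    print("Refusing to run, missing: " + ", ".join(missing))
```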

Root Cause 3: One Image Was Never Uploaded to S3

Even with both fixes above, the post "How I Fixed the Hashnode GraphQL API Stale Cache Bug (Stellate CDN)" would still have a broken cover — because no matching file exists in S3 at all. The DB had blog_posts/cmlyqj0cc006627lvguola3gg.png and no file with a descriptive name was ever uploaded for it.

This one requires a manual upload via Django admin.

The Fix

For the five posts with known S3 matches, I wrote a Django data migration that directly sets the correct cover paths:

COVER_FIXES = {
    "how-to-manually-backup-wordpress-sites-via-ssh": "blog_posts/manual.webp",
    "deploying-cookiecutter-django-on-a-digitalocean-droplet-ubuntu-24-04-lts": "blog_posts/deploy.webp",
    "post-mortem-the-march-2026-axios-supply-chain-attack": "blog_posts/postmortem.webp",
    "sorting-hashnode-series-posts-how-to-display-the-latest-post-first": "blog_posts/sortinghashnode.webp",
    "tracking-page-views-in-a-react-spa-with-google-analytics-4": "blog_posts/trackingpage.webp",
}

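def fix_covers(apps, schema_editor):
    # Use the historical model from the app registry rather than importing it directly.
    BlogPost = apps.get_model("blogs", "BlogPost")
    for slug, cover_path in COVER_FIXES.items():
        # filter().update() touches nothing if the slug is absent, so reruns are harmless.
        BlogPost.objects.filter(slug=slug).update(cover=cover_path)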

This runs automatically on python manage.py migrate during the next production deploy — no manual SSH step needed.

For the fuzzy matcher, the substring containment fix was patched into restore_covers.py so future runs work correctly for any similarly named files.

For the sixth post, a manual image upload to Django admin is the remaining action item.

What I'd Do Differently

The real issue is that the restore command's failure mode was too quiet. It logged "fuzzy matching won't be available" but still printed a clean summary with zeroes in the "could not restore" column for cases where the file list was empty. That made it look successful.

A better design: if storage.listdir() fails entirely, the command should exit early with a non-zero code rather than continuing with no files to match against. Silently succeeding at nothing is worse than loudly failing at something.
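
A minimal sketch of that fail-fast behaviour, kept free of Django imports so it runs on its own. StorageListingError, list_cover_files, and EmptyStorage are hypothetical names. The only thing assumed about the storage object is that listdir() returns a (directories, files) pair, as Django storage backends do. Inside a real management command, raising django.core.management.base.CommandError instead would make manage.py exit with a non-zero status.

```python
class StorageListingError(RuntimeError):
    """Raised when the storage directory cannot be listed or is empty."""


def list_cover_files(storage, prefix="blog_posts"):
    """Return filenames under prefix, raising instead of returning nothing."""
    try:
        # Django storage backends return a (directories, files) pair from listdir().
        _dirs, files = storage.listdir(prefix)
    except Exception as exc:
        raise StorageListingError(f"Could not list {prefix!r}: {exc}") from exc
    if not files:
        raise StorageListingError(
            f"No files found under {prefix!r}; is the right storage backend active?"
        )
    return list(files)


class EmptyStorage:
    """Stand-in for a backend that lists nothing (hypothetical, for the demo)."""

    def listdir(self, path):
        return [], []


try:
    list_cover_files(EmptyStorage())
except StorageListingError as exc:
    print(f"aborting: {exc}")
```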

The slug-to-filename mismatch was also a predictable problem from the start. The files were uploaded manually with short descriptive names, but the DB records came from Hashnode with long CUIDs. A mapping file (even a simple JSON dict of slug → filename) would have made the restore command deterministic instead of relying on fuzzy heuristics.
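
Such a mapping could be as small as the sketch below. The JSON is inlined to keep the example self-contained (in practice it would live in a file next to the command), and resolve_cover is a hypothetical helper, not part of restore_covers.py.

```python
import json

# Illustrative mapping of slug to filename; only two entries shown.
MAPPING_JSON = """
{
  "how-to-manually-backup-wordpress-sites-via-ssh": "manual.webp",
  "sorting-hashnode-series-posts-how-to-display-the-latest-post-first": "sortinghashnode.webp"
}
"""


def resolve_cover(slug, mapping, prefix="blog_posts"):
    """Return the storage path for a slug, or None when the slug is unmapped."""
    filename = mapping.get(slug)
    return f"{prefix}/{filename}" if filename else None


mapping = json.loads(MAPPING_JSON)
print(resolve_cover("how-to-manually-backup-wordpress-sites-via-ssh", mapping))
# blog_posts/manual.webp
```

An unmapped slug returns None, which gives the command an explicit list of posts that still need a manual upload instead of a guess.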