freerave

I Hardened a Rust Media Upload API with Magic Bytes, Atomic Quotas, and Race Condition Fixes (Part 3)

How we built a production-grade Cloudflare R2 upload pipeline in Rust — with layered security, an atomic quota system, and a zero-trust file validation strategy.

In Part 2 of this series, we enforced Strict Separation and Fail-Fast validation to prevent silent scheduling failures, and we added background auto-refresh for OAuth tokens.

But our users still needed to attach images to their scheduled posts. And the moment you let users upload files to your server, you inherit one of the hardest problems in backend security: you can never trust what a user sends you.

Here is a deep dive into how we built the POST /v1/media/upload endpoint in Rust — with layered file validation, Cloudflare R2 storage, and an atomic quota system that closes a subtle but critical race condition.


The Problem: File Uploads Are a Security Minefield

Allowing file uploads without proper validation is one of the most common vectors for server compromise. The naive approach — checking the file extension or trusting the Content-Type header — is dangerously insufficient.

A malicious user can trivially rename exploit.php to photo.jpg and send it with Content-Type: image/jpeg. A server that trusts that header will store an executable PHP file in what it thinks is an image directory.

We needed a Zero-Trust validation pipeline. The rule: don't believe anything the client tells you. Verify everything from the raw binary.


Layer 1: Authentication & Body Size at the Router

Before a single byte of the file payload is even parsed, two guards run at the Axum router level.

The first is our existing Bearer JWT middleware — no valid token, no entry. The second is a global body size limit declared on the router itself:

// src/main.rs

use axum::{extract::DefaultBodyLimit, routing::post, Router};

let app = Router::new()
    .route("/v1/media/upload", post(routes::media::upload_media))
    // ... other routes ...
    .layer(DefaultBodyLimit::max(10 * 1024 * 1024)); // 10 MB hard cap on request bodies

This DefaultBodyLimit layer caps how many bytes Axum's body extractors will read, so an oversized request is rejected at the framework level before our handler buffers the whole file in memory. It is the outermost wall.


Layer 2: Magic Bytes Validation (Zero-Trust File Identity)

Inside the handler, we apply a second, stricter file size check (5 MB) and then the core of our zero-trust strategy: magic bytes validation.
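
As a rough illustration of the handler-level size check, here is a minimal sketch that assumes the file field has already been read into a byte buffer. The constant `MAX_FILE_BYTES` and the function `check_file_size` are hypothetical names chosen for this example, not taken from the project's source.

```rust
/// Per-file cap enforced inside the handler (5 MB, as described above).
/// The name is an assumption made for this sketch.
const MAX_FILE_BYTES: usize = 5 * 1024 * 1024;

/// Returns Ok(()) when the buffer is within the per-file limit,
/// otherwise an error message describing why it was rejected.
fn check_file_size(buffer: &[u8]) -> Result<(), String> {
    if buffer.len() > MAX_FILE_BYTES {
        return Err(format!(
            "file is {} bytes, limit is {} bytes",
            buffer.len(),
            MAX_FILE_BYTES
        ));
    }
    Ok(())
}
```

Keeping this limit below the 10 MB router cap leaves headroom for multipart framing and any other form fields in the same request.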

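Every image format we accept begins with a known binary signature, called a "magic number," in the first few bytes of the file. JPEG files start with FF D8 FF. PNG files start with 89 50 4E 47 (the first four bytes of an eight-byte signature). A file without the right signature cannot be a valid image of that type. The reverse does not hold: an attacker can prepend valid magic bytes to arbitrary data, so this check verifies the header, not the whole file.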

Here is our implementation:

// src/routes/media.rs

// Allowed MIME types, their leading magic bytes, and the extension we assign
const ALLOWED_TYPES: &[(&str, &[&[u8]], &str)] = &[
    ("image/jpeg", &[&[0xff, 0xd8, 0xff]], "jpg"),
    ("image/png",  &[&[0x89, 0x50, 0x4e, 0x47]], "png"),
    ("image/webp", &[&[0x52, 0x49, 0x46, 0x46]], "webp"), // RIFF container header
    ("image/gif",  &[&[0x47, 0x49, 0x46, 0x38]], "gif"),
];

fn validate_magic_bytes(buffer: &[u8], mime_type: &str) -> Option<&'static str> {
    for (allowed_mime, magics, ext) in ALLOWED_TYPES {
        if *allowed_mime != mime_type {
            continue;
        }
        // RIFF is shared with WAV and AVI, so a WebP file must also
        // carry the "WEBP" tag at bytes 8..12.
        if *allowed_mime == "image/webp" && buffer.get(8..12) != Some(&b"WEBP"[..]) {
            return None;
        }
        if magics.iter().any(|magic| buffer.starts_with(*magic)) {
            return Some(*ext); // Return the verified extension
        }
    }
    None // Unknown MIME type, or the binary doesn't match the claimed type: reject
}

This function does two things at once. First, it checks that the claimed Content-Type is on our allowlist. Second, it verifies that the leading bytes of the actual content match the claimed format. A renamed PHP script or executable fails this check because its first bytes do not match FF D8 FF.
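
To make the behaviour concrete, here is a condensed, self-contained variant of the same idea. It is a sketch rather than the project's code: the table is trimmed to JPEG and PNG, and the names `SIGNATURES` and `verified_extension` are hypothetical.

```rust
/// (claimed MIME type, leading magic bytes, extension assigned on success).
/// Trimmed to two formats to keep the sketch short.
const SIGNATURES: &[(&str, &[u8], &str)] = &[
    ("image/jpeg", &[0xff, 0xd8, 0xff], "jpg"),
    ("image/png", &[0x89, 0x50, 0x4e, 0x47], "png"),
];

/// Returns the server-chosen extension only when the claimed MIME type is
/// allowed AND the buffer starts with that format's signature.
fn verified_extension(buffer: &[u8], claimed_mime: &str) -> Option<&'static str> {
    SIGNATURES
        .iter()
        .find(|(mime, magic, _)| *mime == claimed_mime && buffer.starts_with(magic))
        .map(|(_, _, ext)| *ext)
}
```

Under these assumptions, a PHP script sent as image/jpeg is rejected, and so is a real JPEG header sent with a mismatched or unlisted MIME type. A payload that merely begins with the right bytes would still pass, which is why this check is one layer and not the whole defence.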


Layer 3: UUID-Based Storage Keys (Path Traversal Prevention)

Even after validating the file content, we never use the client-supplied filename to construct the storage key. A filename like ../../../etc/passwd — known as a Path Traversal attack — could theoretically escape the intended storage directory.

Our solution is to discard the original filename entirely and generate a random UUID as the storage key:

// The raw client filename is intentionally discarded.
// UUID-based key — fully immune to path traversal.
let file_name = format!("dotsuite/scheduled_posts/{}.{}", Uuid::new_v4(), ext);

The extension comes from our validate_magic_bytes function — not from the client. The full storage path is entirely server-generated. The user's filename never touches the storage layer.
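
One way to make that guarantee visible in the code is to give the key builder no access to the client filename at all. The sketch below is an illustration under assumptions: `storage_key` is a hypothetical helper, and the `u128` parameter stands in for the random UUID so the example needs only the standard library.

```rust
/// Builds the object key from server-controlled inputs only.
/// `object_id` stands in for the random UUID used in the article; the
/// client-supplied filename is deliberately not a parameter.
fn storage_key(object_id: u128, verified_ext: &str) -> String {
    format!(
        "dotsuite/scheduled_posts/{:032x}.{}",
        object_id, verified_ext
    )
}
```

Because the only string input is the extension returned by the magic bytes check, a name such as ../../../etc/passwd has no path into the key.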


Layer 4: Atomic Quota Enforcement (Closing the Race Condition)

This is where it gets subtle. Our initial quota check looked reasonable:

// ❌ The naive (broken) approach
let current = user.images_used; // Read from DB
if current >= image_quota {
    return Err(quota_exceeded_error);
}
// ... upload the file ...
db.increment_images_used(user_id).await; // Write to DB

The flaw: there is a window between the read and the write. If two upload requests arrive simultaneously from the same user whose quota counter is at limit - 1, both will read current < limit, both will pass the check, and both will upload — incrementing the counter to limit + 1. The quota is silently bypassed.

We had already solved this exact pattern in our schedule_post route using MongoDB's find_one_and_update — an atomic operation that combines the check and the increment in a single database command. We applied the same fix here:

// src/routes/media.rs

// ── Atomic quota check + slot reservation ────────────────────────────────
// find_one_and_update eliminates the race condition: the check and the
// increment are a single atomic MongoDB operation, not two separate ones.
let quota_filter = if user.tier == Tier::Free {
    mongodb::bson::doc! {
        "_id": user_id,
        "$expr": { "$lt": ["$images_used", image_quota as i64] }
    }
} else {
    // Paid tiers have no hard cap — we still track for analytics.
    mongodb::bson::doc! { "_id": user_id }
};

let reserved = users_col
    .find_one_and_update(
        quota_filter,
        mongodb::bson::doc! { "$inc": { "images_used": 1 } },
    )
    .await?;

if reserved.is_none() {
    return Err(AppError::Forbidden(format!(
        "Image upload quota reached ({}/{} uploads). Upgrade to Basic for unlimited uploads.",
        user.images_used, image_quota
    )));
}

The database becomes the single source of truth. No two concurrent requests can both pass the quota gate because MongoDB guarantees the atomicity of findOneAndUpdate at the document level.
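
The same check-and-increment idea can be illustrated without a database. The sketch below is an in-process analogy, not the MongoDB code: it uses a standard-library `AtomicU32` with `fetch_update` so that the comparison and the increment are one indivisible step. The function names `try_reserve` and `concurrent_reservations` are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Atomically reserves one slot if `used < quota`.
/// Returns true when a slot was reserved, false when the quota is full.
fn try_reserve(used: &AtomicU32, quota: u32) -> bool {
    used.fetch_update(Ordering::SeqCst, Ordering::SeqCst, |current| {
        if current < quota {
            Some(current + 1) // check and increment happen as one step
        } else {
            None // quota reached: leave the counter untouched
        }
    })
    .is_ok()
}

/// Fires `requests` concurrent reservations against a counter that starts
/// at `start`, and returns how many of them succeeded.
fn concurrent_reservations(start: u32, quota: u32, requests: usize) -> usize {
    let used = Arc::new(AtomicU32::new(start));
    let handles: Vec<_> = (0..requests)
        .map(|_| {
            let used = Arc::clone(&used);
            thread::spawn(move || try_reserve(&used, quota))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .filter(|reserved| *reserved)
        .count()
}
```

With the counter at limit - 1, any number of simultaneous callers produces exactly one successful reservation, which is the property the filtered find_one_and_update provides at the document level.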


Layer 5: The R2 Upload & Quota Rollback

After the quota slot is reserved, we upload to Cloudflare R2 via the AWS S3-compatible SDK. But there is one more edge case: what if the R2 upload fails after we have already incremented the quota counter? The slot was reserved but no file was stored — the user's quota is penalised for a failure that wasn't their fault.

To handle this cleanly, we perform a rollback on R2 failure:

if let Err(e) = upload_result {
    tracing::error!("Failed to upload to R2: {:?}", e);

    // Rollback: return the slot so the quota isn't wasted.
    let rollback = users_col
        .update_one(
            mongodb::bson::doc! { "_id": user_id },
            mongodb::bson::doc! { "$inc": { "images_used": -1i32 } },
        )
        .await;

    if let Err(rb_err) = rollback {
        tracing::error!(
            "Failed to rollback images_used for user {}: {}",
            user_id, rb_err
        );
    }

    return Err(AppError::Internal(anyhow::anyhow!(
        "Failed to upload media to cloud storage"
    )));
}

The rollback is best-effort. If it fails, we log the error (the counter is then left one too high), but we do not surface the rollback failure to the client.
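
The reserve-then-roll-back flow can be sketched in isolation as follows. This is a simplified in-memory model under assumptions: an `AtomicU32` stands in for the images_used field, a closure stands in for the R2 call, and `upload_with_reservation` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Reserves a slot, runs `upload`, and returns the slot if the upload fails.
fn upload_with_reservation<F>(used: &AtomicU32, quota: u32, upload: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String>,
{
    // Step 1: reserve before the expensive operation.
    let reserved = used
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| {
            if n < quota {
                Some(n + 1)
            } else {
                None
            }
        })
        .is_ok();
    if !reserved {
        return Err("quota reached".to_string());
    }

    // Step 2: attempt the upload; on failure, give the slot back.
    if let Err(e) = upload() {
        used.fetch_sub(1, Ordering::SeqCst);
        return Err(format!("upload failed: {}", e));
    }
    Ok(())
}
```

In this model a failed upload leaves the counter where it started, so the caller can retry without losing a slot.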


The Complete Security Stack

Here is what runs on every single upload request, in order:

| Layer | Mechanism | Rejects |
| --- | --- | --- |
| 1 | Bearer JWT middleware | Unauthenticated requests |
| 2 | DefaultBodyLimit (10 MB) | Oversized requests before parsing |
| 3 | Handler check (5 MB) | Files within the body but over the per-file limit |
| 4 | Magic bytes validation | Wrong MIME type, renamed executables, corrupted files |
| 5 | Atomic find_one_and_update | Quota-exceeded requests (race-condition-free) |
| 6 | UUID storage key | Path traversal attacks |
| 7 | R2 upload + rollback | Storage failures without wasting quota |

None of these layers is sufficient alone. Together, they form a defence-in-depth pipeline where each layer assumes the previous one could have been bypassed.


Conclusion

Building a production-grade file upload endpoint is deceptively complex. The surface-level logic — receive file, save to storage — takes an hour. The hardening takes days.

The three biggest lessons from this build:

  1. Never trust Content-Type. A header is a claim, not a proof. Always read the raw binary signature of the file.

  2. A quota check and a quota increment must be one atomic operation. Two database calls — even milliseconds apart — create a race window that determined users will find and exploit.

  3. Reserve resources before the expensive operation, and roll back on failure. Incrementing the quota counter before the R2 upload, with a decrement on failure, is generally safer than incrementing after a success that might never be recorded.

The POST /v1/media/upload endpoint is now a vault. In Part 4, we will build the Next.js scheduling UI that calls it.

(If you haven't read the previous deep dives, check out the full Ship on Schedule Series.)
