System Design EP: 12 - Why Your Database Hates Your Images: A Guide to BLOBs

#systemdesign #blobs #aws

Ummm, BLOBS. Everyone has heard that term somewhere sometime across their development journey, but what are those?

In system design, a BLOB (Binary Large Object) is any unstructured data that doesn't fit neatly into a database row. Unlike integers or strings, the database engine doesn't "understand" this data,it just sees a massive chunk of bytes.

Examples: User avatars (JPEG), video files (MP4), PDFs, Application Logs, Backup files.

Size Range: From a few KBs (thumbnails) to TBs (genome sequencing data).

So basically storing mp4 file directly as a file is difficult, how to store it say in MySQL but it becomes easier when we store in Blobs using 0 & 1s.

When looking for the "definitive" resource on Blobs (Binary Large Objects) in System Design, most engineers end up piecing together chapters from different sources.

Since a single perfect article is hard to find, I have synthesized the industry-standard approach below—essentially the article you are looking for—followed by links to the best external deep-dives.

The Great Debate: Database vs. Object Storage

The most common interview question and real-world decision is: "Should I store user uploads in my SQL database or on disk?"

Option A: Storing BLOBs in the Database

You store the image directly in a column (e.g., VARBINARY in MySQL or BYTEA in PostgreSQL).

Pros: ACID compliance (the image and the user profile update commit together), easy backups (one dump file).
Cons:
Performance Suicide: Databases are optimized for small, random reads/writes. Reading a 10MB image consumes the same I/O as reading thousands of user rows.
Cost: Block storage (SSD for DBs) is 3-5x more expensive than Object Storage.
Scalability: You cannot easily cache database responses at the edge (CDN) compared to static URLs.

Option B: Storing BLOBs in Object Storage (The Standard)

You store the actual file in a service like Amazon S3, Google Cloud Storage, or Azure Blob, and store only the reference URL in your database.

Pros: Infinite scalability, cheaper storage tiers, built-in redundancy, easy integration with CDNs.
Cons: Loose consistency (you might delete the database row but forget to delete the S3 file, creating "orphan" data).

The Verdict: 99% of the time, use Object Storage (Option B). Only use Option A if your files are tiny (<20KB), strictly transactional, and security is paramount (e.g., encryption keys or sensitive legal docs that must never leak via a public URL).

The Architecture Pattern

The standard design pattern for handling blobs involves decoupling metadata from storage.

The Workflow:

Client Request: User uploads a profile picture.
API Gateway: Authenticates the user.
Presigned URL: Instead of uploading to your server (which blocks your server's threads), your backend generates a "Presigned URL" from S3. This authorizes the client to upload directly to the bucket for a limited time.
Direct Upload: Client uploads the binary data directly to S3/Blob Storage.
Confirmation: S3 returns a success code. The client notifies your backend: "Upload Complete."
Metadata Save: Your backend saves the file path (e.g., s3://my-bucket/users/123/avatar.png) into the SQL database users table.