Ummm, BLOBS. Everyone has heard that term somewhere sometime across their development journey, but what are those?
In system design, a BLOB (Binary Large Object) is any unstructured data that doesn't fit neatly into a database row. Unlike integers or strings, the database engine doesn't "understand" this data,it just sees a massive chunk of bytes.
Examples: User avatars (JPEG), video files (MP4), PDFs, Application Logs, Backup files.
Size Range: From a few KBs (thumbnails) to TBs (genome sequencing data).
So basically storing mp4 file directly as a file is difficult, how to store it say in MySQL but it becomes easier when we store in Blobs using 0 & 1s.
When looking for the "definitive" resource on Blobs (Binary Large Objects) in System Design, most engineers end up piecing together chapters from different sources.
Since a single perfect article is hard to find, I have synthesized the industry-standard approach below—essentially the article you are looking for—followed by links to the best external deep-dives.
The Great Debate: Database vs. Object Storage
The most common interview question and real-world decision is: "Should I store user uploads in my SQL database or on disk?"
Option A: Storing BLOBs in the Database
You store the image directly in a column (e.g., VARBINARY in MySQL or BYTEA in PostgreSQL).
- Pros: ACID compliance (the image and the user profile update commit together), easy backups (one dump file).
- Cons:
- Performance Suicide: Databases are optimized for small, random reads/writes. Reading a 10MB image consumes the same I/O as reading thousands of user rows.
- Cost: Block storage (SSD for DBs) is 3-5x more expensive than Object Storage.
- Scalability: You cannot easily cache database responses at the edge (CDN) compared to static URLs.
Option B: Storing BLOBs in Object Storage (The Standard)
You store the actual file in a service like Amazon S3, Google Cloud Storage, or Azure Blob, and store only the reference URL in your database.
- Pros: Infinite scalability, cheaper storage tiers, built-in redundancy, easy integration with CDNs.
- Cons: Loose consistency (you might delete the database row but forget to delete the S3 file, creating "orphan" data).
The Verdict: 99% of the time, use Object Storage (Option B). Only use Option A if your files are tiny (<20KB), strictly transactional, and security is paramount (e.g., encryption keys or sensitive legal docs that must never leak via a public URL).
The Architecture Pattern
The standard design pattern for handling blobs involves decoupling metadata from storage.
The Workflow:
- Client Request: User uploads a profile picture.
- API Gateway: Authenticates the user.
- Presigned URL: Instead of uploading to your server (which blocks your server's threads), your backend generates a "Presigned URL" from S3. This authorizes the client to upload directly to the bucket for a limited time.
- Direct Upload: Client uploads the binary data directly to S3/Blob Storage.
- Confirmation: S3 returns a success code. The client notifies your backend: "Upload Complete."
-
Metadata Save: Your backend saves the file path (e.g.,
s3://my-bucket/users/123/avatar.png) into the SQL databaseuserstable.
Top comments (0)