DEV Community: Krishna Kanth Latya

System Design: How a Simple App Grows from 10 Users to 10 Million Users

Krishna Kanth Latya — Sat, 20 Jun 2026 21:01:57 +0000

Building an application is easy. Building one that survives success is the real challenge.

Most applications start with a minimalist architecture: a single server, a database, and a handful of users. Everything feels fast, reliable, and inexpensive.

Then growth happens. A marketing campaign goes viral, or a popular influencer mentions your product. Suddenly, thousands of users arrive simultaneously. What worked perfectly yesterday starts failing today.

So, how do companies scale seamlessly from a few users to millions? Let’s walk through that architectural journey step-by-step.

The Evolutionary Journey of Scaling

Stage 1: The Startup Phase

Imagine you launch a new platform called RecipeShare, where users upload and share cooking recipes. At launch, your architecture is as simple as it gets:

The application server handles everything: authentication, recipe uploads, search, notifications, and database queries.

Why this works: It is cheap, easy to deploy, simple to debug, and requires minimal operational overhead. For your first few hundred users, this monolithic setup is more than enough.

Stage 2: Database Separation

As traffic grows, the single machine begins to struggle. The application logic and the database start competing for the same hardware resources: CPU, memory, disk I/O, and network bandwidth.
The first major optimization is splitting them onto dedicated hardware:

The Benefit: Independent scaling, optimized server configurations, and improved reliability. Application traffic spikes no longer directly starve the database of resources.

Stage 3: Horizontal Scaling (Multiple App Servers)

Your platform gains traction. Thousands of users are now uploading recipes simultaneously, and that single application server becomes a glaring bottleneck. Instead of buying a bigger server (Vertical Scaling), we scale horizontally by adding more servers and introducing a Load Balancer:

The Benefit: High throughput and high availability. If one application server crashes, the load balancer automatically reroutes traffic to the healthy ones, so no individual application server is a single point of failure.
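To make the rerouting concrete, here is a minimal sketch of round-robin balancing with health tracking. The class name, the server names, and the mark_down/mark_up methods are illustrative assumptions; real load balancers discover failures through periodic health checks rather than manual calls.

```python
class RoundRobinBalancer:
    """Rotate through servers in order, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy application servers")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_round = [lb.pick() for _ in range(3)]

lb.mark_down("app-2")  # simulate a crash detected by a health check
after_failure = [lb.pick() for _ in range(4)]
```

After app-2 is marked down, traffic keeps flowing to the remaining servers without any change on the client side.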

Stage 4: Caching Popular Data

In most applications, and especially on a recipe platform, users read data far more frequently than they write it. Popular recipes, user profiles, and trending categories are requested repeatedly, making constant database queries incredibly expensive.
We introduce an in-memory cache layer (like Redis or Memcached):

The Benefit: Sub-millisecond response times and drastically reduced database load. A well-implemented cache layer can frequently deflect over 80% of read traffic away from your database.
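A common way to use such a cache is the cache-aside pattern: check the cache first, and only query the database on a miss. The sketch below uses a plain dictionary as a stand-in for Redis or Memcached. The CacheAside class, the fetch_recipe function, and the 60-second TTL are assumptions made for illustration.

```python
import time


class CacheAside:
    """Serve reads from memory; fall back to the loader on a miss or expiry."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the database and repopulate.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call this after a write so readers do not see stale data.
        self._entries.pop(key, None)


db_calls = []


def fetch_recipe(recipe_id):
    # Hypothetical stand-in for a real database query.
    db_calls.append(recipe_id)
    return {"id": recipe_id, "title": f"Recipe {recipe_id}"}


cache = CacheAside(fetch_recipe, ttl_seconds=60.0)
for _ in range(5):
    cache.get(42)
```

Five reads of the same recipe reach the database only once; the other four are served from memory.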

Stage 5: Decoupling Media Content (Object Storage)

Users love uploading high-resolution food photos and cooking videos. Storing these massive binary blobs directly inside a relational database is inefficient and tanks database performance.

Instead, we offload media to Object Storage (like AWS S3) and store only the lightweight metadata in the database:

The Benefit: Smaller database backups, cheaper storage costs, and faster data processing.
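The split between bytes and metadata might look like the following sketch. Two dictionaries stand in for the object storage bucket and the database table, and the key layout (photos/recipe_id/sha256) is an assumption, not a requirement of any particular storage service.

```python
import hashlib

object_store = {}   # stand-in for a bucket: key -> raw bytes
recipes_table = {}  # stand-in for a database table: recipe_id -> row


def save_recipe_photo(recipe_id, filename, data):
    """Put the bytes in object storage and only a small row in the database."""
    digest = hashlib.sha256(data).hexdigest()
    key = f"photos/{recipe_id}/{digest}"
    object_store[key] = data
    recipes_table[recipe_id] = {
        "filename": filename,
        "size_bytes": len(data),
        "storage_key": key,  # pointer to the blob, not the blob itself
    }
    return key


photo_bytes = b"\xff\xd8fake-jpeg-bytes"
photo_key = save_recipe_photo(7, "lasagna.jpg", photo_bytes)
```

The database row stays a few dozen bytes no matter how large the photo or video is.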

Stage 6: Edge Computing via CDNs

As RecipeShare expands internationally, users in India, Europe, and North America begin accessing the platform. Because your main servers reside in one location, international users suffer from high latency, slow image loads, and buffering videos.

To fix this, we introduce a Content Delivery Network (CDN):

The Benefit: Static assets (images, videos, CSS) are cached globally at edge locations close to the user, resulting in blazing-fast load times and dramatically reduced origin bandwidth costs.

Stage 7: Transitioning to a Stateless Architecture

Initially, user sessions (like login states) might be stored in the memory of individual application servers. This creates “sticky session” dependencies; if User A’s next request hits a different server, they are suddenly logged out.

To scale horizontally without friction, we move session data to a shared, centralized store:

The Benefit: The application layer becomes entirely stateless. Any server can handle any request from any user, making autoscaling and zero-downtime deployments trivial.

The “Sticky” Problem:
The Waiter Example: Imagine you order a meal from Waiter 1, who writes it down in his own personal notepad. A few minutes later, you want to change your order, but Waiter 1 is busy, so Waiter 2 steps up. Because the data is trapped in Waiter 1’s pocket, Waiter 2 has no idea who you are or what you ordered. You are stuck waiting for Waiter 1.
The Stateless Solution:
The Waiter Example: To fix this, the kitchen installs a giant digital whiteboard in the center of the room. Now, when Waiter 1 takes your order, he instantly writes it on the shared whiteboard. If you need a modification later, Waiter 2 can step up, look at the board, instantly see your order history, and handle the change seamlessly.
Why it Matters:
The Takeaway: Because the waiters keep no secrets in their own pockets, they are completely stateless. Any waiter can serve any customer at any given second.
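In code, the shared whiteboard is a session store that every application server talks to. The sketch below keeps it in a dictionary for simplicity; in production it would typically live in something like Redis. The class names and the token format are illustrative assumptions.

```python
import secrets


class SharedSessionStore:
    """The 'whiteboard': one place where all servers read and write sessions."""

    def __init__(self):
        self._sessions = {}

    def create(self, user_id):
        token = secrets.token_hex(16)
        self._sessions[token] = {"user_id": user_id}
        return token

    def lookup(self, token):
        return self._sessions.get(token)


class AppServer:
    """A 'waiter' that keeps no session data of its own."""

    def __init__(self, name, sessions):
        self.name = name
        self._sessions = sessions

    def login(self, user_id):
        return self._sessions.create(user_id)

    def whoami(self, token):
        session = self._sessions.lookup(token)
        return session["user_id"] if session else None


store = SharedSessionStore()
server_a = AppServer("app-1", store)
server_b = AppServer("app-2", store)

token = server_a.login("user-a")       # login handled by one server
seen_by_b = server_b.whoami(token)     # next request lands on another
```

Because neither server holds the session in its own memory, either one can be restarted or replaced without logging anyone out.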

Stage 8: Database Replication (Read Replicas)

Even with caching, your primary database is starting to sweat from sheer volume. Since the vast majority of database traffic is still reads, we can clone the database using a primary-replica topology (historically called Master-Slave):

The Benefit: Massive read scalability. Write operations go strictly to the primary database, which asynchronously replicates data to the read replicas that handle all user browsing.
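The routing rule and its main side effect, replication lag, can be sketched as follows. Dictionaries stand in for the databases, and the explicit replicate() call is an assumption used to make the asynchronous step visible; a real database ships its changes continuously in the background.

```python
import itertools


class ReplicatedDatabase:
    """Writes go to the primary; reads rotate across the replicas."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # changes not yet applied to the replicas
        self._rr = itertools.cycle(range(replica_count))

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Apply the pending changes to every replica, in write order.
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()

    def read(self, key):
        return self.replicas[next(self._rr)].get(key)


db = ReplicatedDatabase(replica_count=2)
db.write("recipe:1", "Lasagna")
stale = db.read("recipe:1")   # replication has not caught up yet
db.replicate()
fresh = db.read("recipe:1")
```

The first read returns nothing because the replica has not received the write yet. This brief staleness is the usual trade-off of asynchronous replication, so reads that must see the latest write are often sent to the primary.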

Stage 9: Asynchronous Processing (Message Queues)

As the app matures, new background tasks are introduced: sending welcome emails, transcoding raw videos into multiple resolutions, and updating search indexes. Forcing a user to wait for these tasks to finish before sending an HTTP response causes severe lag.

We introduce a Message Queue (like RabbitMQ or Kafka) to handle things asynchronously:

The Benefit: The user receives an instant success response, while heavy, time-consuming tasks are processed reliably in the background by worker nodes.
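The pattern can be sketched with Python's standard library, using queue.Queue as a stand-in for RabbitMQ or Kafka and a thread as the worker node. The job types and the handle_upload function are hypothetical names chosen for this example.

```python
import queue
import threading

jobs = queue.Queue()
processed = []


def worker():
    """Background worker: pull jobs off the queue until told to stop."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value used to shut the worker down
            jobs.task_done()
            break
        processed.append(f"{job['type']}:{job['recipe_id']}")
        jobs.task_done()


def handle_upload(recipe_id):
    """Request handler: enqueue the slow work and respond immediately."""
    jobs.put({"type": "transcode_video", "recipe_id": recipe_id})
    jobs.put({"type": "update_search_index", "recipe_id": recipe_id})
    return {"status": "accepted", "recipe_id": recipe_id}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_upload(101)  # returns without waiting for the worker

jobs.join()      # demo only: wait until the queue is drained
jobs.put(None)   # ask the worker to exit
thread.join()
```

The handler's response does not depend on how long transcoding or indexing takes; those jobs simply wait in the queue until a worker is free.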

Stage 10: Database Sharding

You’ve hit millions of users. Even with read replicas, the sheer volume of data writes is overwhelming a single primary database. It’s time to partition the data horizontally across multiple databases using Sharding:

The Benefit: Horizontal scalability for both data volume and write traffic. Each database shard handles only a fraction of the global dataset, transforming your platform into a true distributed system.

📚 The Library Example
Imagine you run a library and keep a list of every single book in a single notebook.
As your collection grows to millions of books, the notebook becomes so thick that it takes minutes just to turn a page. Even worse, if ten readers want to look up a book at the same time, they all have to fight over that one notebook. And if someone spills coffee on it, your entire library catalog is destroyed.
In system design, this is a monolithic database bottleneck. A single database instance cannot handle a massive volume of simultaneous reads and writes.
📖 The Sharded Solution
To fix this, you rip the pages out of that giant notebook and split them across three completely separate, smaller notebooks placed at different desks:

Notebook A (Shard A): Stores only books starting with A to I.

Notebook B (Shard B): Stores only books starting with J to R.

Notebook C (Shard C): Stores only books starting with S to Z.

When a reader looks for a "Lasagna Recipe", they bypass the others and go straight to Notebook B. Desks A and C remain completely quiet and free of traffic.

Why this is a game-changer for your app
The Takeaway: The books are sorted by their first letter, and this sorting rule is your Shard Key. Because the notebooks are completely independent, you can keep adding desks as your collection grows. Best of all, if Desk A catches fire, readers using Notebooks B and C can continue finding their books uninterrupted.
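The notebook rule translates directly into a routing function. The sketch below shows the range-based rule from the library example, plus a hash-based alternative that is often used when keys are not evenly spread across the alphabet. The shard names and function names are assumptions for illustration.

```python
import hashlib


def range_shard(title):
    """Route by first letter, mirroring Notebooks A, B, and C."""
    cleaned = title.strip()
    if not cleaned:
        raise ValueError("title must not be empty")
    first = cleaned[0].upper()
    if "A" <= first <= "I":
        return "shard-a"
    if "J" <= first <= "R":
        return "shard-b"
    if "S" <= first <= "Z":
        return "shard-c"
    raise ValueError(f"no shard covers titles starting with {first!r}")


def hash_shard(key, shard_count):
    """Route by a stable hash so keys spread evenly across shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


lasagna_shard = range_shard("Lasagna Recipe")
user_shard = hash_shard("user-12345", shard_count=4)
```

Range sharding keeps neighboring keys together, which helps range queries but can overload one shard if many titles share a letter. Hash sharding spreads load more evenly at the cost of scattering related keys.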

Stage 11: Multi-Region Deployment

What happens if an entire cloud data center suffers a major blackout? A localized infrastructure outage could take your global platform completely offline.

The final frontier of scaling is deploying your architecture across multiple geographic regions:

The Benefit: High disaster recovery capability, near-perfect uptime, and localized compliance/performance for global users.

🌍 The “Cross-Ocean Flight” Problem
📚 The Library Example
Your sharded three-notebook library system in New York is a massive success.
But now, you have millions of readers living in London.
Every time a reader in London wants to look up a recipe, they must send a letter across the Atlantic Ocean to New York and wait for a response. The process is incredibly slow. Worse, if a massive storm knocks out power to the New York library, the entire global system goes dark.

In system design, this is cross-continent latency and a single-region point of failure.

🏛️ The Multi-Region Solution
📚 The Library Example
To solve this problem, you open an identical twin library in London (Region: EU-WEST).
You give the London library its own matching set of three sharded notebooks (A, B, and C) so British readers can look up recipes locally in milliseconds.
To keep both libraries identical, you hire an assistant whose only job is to continuously send copies of new book entries across the ocean (Cross-Region Replication), ensuring the New York and London notebooks stay closely synchronized, with only a short delay while copies are in transit.
💡 The Takeaway
At the front door, a receptionist (Geo-DNS) checks where you are coming from and directs you to the closest library building.

Because you now have two identical, self-sustaining libraries, if the New York building completely floods, the receptionist simply routes every reader to the London library.

Your global business never closes for a single second.
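The receptionist's decision can be sketched as a small routing function: pick the lowest-latency region that is currently healthy. The latency table, location names, and region names below are invented for illustration; a real Geo-DNS service derives them from the client's network location and from health checks.

```python
# Assumed round-trip latencies in milliseconds from each client location.
REGION_LATENCY_MS = {
    "new-york-reader": {"new-york": 10, "london": 80},
    "london-reader": {"new-york": 80, "london": 10},
}


def route(client_location, healthy_regions):
    """Return the closest region that is still up."""
    candidates = {
        region: latency
        for region, latency in REGION_LATENCY_MS[client_location].items()
        if region in healthy_regions
    }
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


normal = route("london-reader", {"new-york", "london"})
failover = route("new-york-reader", {"london"})  # New York is offline
```

In normal operation each reader is sent to the nearby region; during an outage the same function falls back to whichever region remains.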

Stage 12: Observability (Monitor Everything)

Junior engineers focus entirely on scaling components; seasoned engineers focus on visibility. You cannot optimize what you do not measure. A massive distributed system needs robust monitoring for:

CPU and Memory utilization
Application error rates
P99 Request latency
Database query execution speeds
Message queue depths

A complex distributed system without monitoring is like driving a high-performance sports car in the dark without a dashboard.
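P99 latency is worth a quick worked example, because averages hide the slow requests that users actually notice. The sketch below uses the nearest-rank method on invented sample data; monitoring systems typically compute percentiles from histograms instead of raw samples.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the value at or below which pct% of samples fall."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# 98 fast requests and 2 slow ones (milliseconds, made-up data).
latencies_ms = [12] * 98 + [250, 900]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)
```

Here the median is 12 ms and the mean is still under 25 ms, while the P99 is 250 ms. A dashboard that only shows the average would miss the tail entirely.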

Stage 13: The API Gateway & Reverse Proxy Layer

As your multi-region backend grows more complex, exposing each service's internal URL directly to client applications (Web, Mobile) creates tight coupling, security vulnerabilities, and client-side configuration nightmares.

We introduce a dedicated API Gateway (e.g., Kong, AWS API Gateway, Envoy) as the single entry point for all client traffic within a region.

The Analogy: Instead of having library visitors wander directly into the back rooms to find specific managers for billing, book requests, or complaints, you place a highly trained Concierge Desk at the front lobby. The concierge takes your request, verifies your library card, and routes you to the exact desk you need.
The Technical Shift: The Load Balancer feeds directly into the API Gateway. The Gateway handles cross-cutting concerns like:
Centralized Authentication & JWT Verification: Validating users before they hit downstream application layers.
Rate Limiting / Throttling: Preventing malicious API clients or scripts from overwhelming individual application nodes.
Request Routing & Path Rewriting: Translating a clean external route like /v1/recipes to internal service endpoints.
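Rate limiting is commonly implemented with a token bucket: each client gets a bucket that refills at a fixed rate, and every request spends one token. The sketch below is one possible version, not the algorithm of any specific gateway product. Timestamps are passed in explicitly so the behavior is easy to follow, and the capacity and refill rate are arbitrary example values.

```python
class TokenBucket:
    """Allow short bursts up to `capacity`, then `refill_per_second` sustained."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.updated_at = now

    def allow(self, now):
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the gateway would answer with HTTP 429 here


bucket = TokenBucket(capacity=3, refill_per_second=1.0)
burst = [bucket.allow(now=0.0) for _ in range(5)]  # five requests at once
later = bucket.allow(now=2.0)                      # two seconds later
```

The first three requests in the burst pass, the next two are rejected, and the client is admitted again once the bucket has refilled.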

Stage 14: Microservices Architecture (Domain Decomposition)

Up until this point, your application servers (even when multiplied horizontally across regions) are still running the entire codebase as a single monolith. A tiny bug in the notification code can crash the server and take down recipe browsing. Furthermore, scaling the whole monolith just to handle a spike in cooking video processing wastes massive computing resources.

We pull the monolithic codebase apart into completely autonomous, loosely coupled, domain-specific Microservices (e.g., Recipe Service, User Profile Service, Billing Service).

The Analogy: Your library has grown so large that the single team of general clerks is completely overwhelmed. You break the team down into specialized departments. You now have a Procurement Department, a Filing Department, and a Customer Accounts Department. Each department works independently, operates in its own dedicated room, and uses its own set of tools.
The Technical Shift: Each microservice runs on its own isolated compute cluster (containerized via Docker/Kubernetes) and scales independently based on its specific load characteristics.
Microservices talk to each other via lightweight, language-agnostic network protocols like gRPC or asynchronous events via the Message Queue.

Stage 15: Polyglot Persistence (The Right Database for the Right Job)

In the earlier stages, we relied purely on standard relational databases (SQL). However, as features diversify, a rigid tabular database structure becomes highly inefficient for complex operations like full-text search indexing, relational social graphs (e.g., user follows, likes), or high-frequency trending leaderboards.

We transition from a single database type to a Polyglot Persistence model, matching individual microservices to the ideal storage engine type.

The Analogy: In your library, you no longer try to force every single piece of information into identical cardboard folders. You store historical books on open shelves, visitor login times on a digital swipe-card log, and the master index in a lightning-fast alphabetical card catalog drawer.
Relational DB (PostgreSQL/MySQL): Retained for core transactions, user billing, and structured user profiles.
NoSQL Document Store (MongoDB/DynamoDB): Used for flexible, unstructured recipe metadata and reviews.
Search Engine (Elasticsearch/OpenSearch): Used to power fuzzy text search, autocomplete, and complex ingredient filters.
Graph Database (Neo4j): Used to map out social graphs, tracking follower connections and personalized recipe recommendations.

The Golden Rule of System Design

Scaling isn’t about jumping straight to microservices, database sharding, or complex container orchestration on day one. The best architectures evolve organically.

The Success Blueprint

Build simply.
Measure performance and identify real bottlenecks.
Fix the single largest bottleneck.
Automate the operation.
Repeat.

System Design: What Actually Happens When You Upload a File to Google Drive?

Krishna Kanth Latya — Mon, 15 Jun 2026 20:27:04 +0000

Uploading a file to Google Drive feels simple. You select a file, click Upload, watch a progress bar move, and moments later the file appears in your Drive.

But behind this seemingly simple action lies a highly distributed system designed to handle millions of users, billions of files, and exabytes of data while remaining reliable, scalable, and fault-tolerant.

In this article, we'll explore what actually happens behind the scenes when you upload a file to Google Drive and how the system is designed to operate at global scale.

The Problem

At first glance, uploading a file appears straightforward:

User
  │
  ▼
Upload API
  │
  ▼
Storage

For a small application, this architecture might work. However, Google Drive operates at an entirely different scale. Users upload everything from small images to massive video files and backups that can be hundreds of gigabytes in size. At the same time, millions of users may be uploading files concurrently from different parts of the world.

This creates several challenges:

Large file uploads can take hours
Network connections may disconnect midway
Millions of uploads must be handled simultaneously
Uploaded data must remain accurate and uncorrupted
Hardware failures should never cause data loss
Storage must scale to billions of files
Users expect fast and seamless uploads

A simple upload server cannot solve these problems.

High-Level Solution

Instead of uploading an entire file at once, Google Drive breaks the file into smaller chunks. These chunks are uploaded independently, validated, temporarily stored, and later assembled into the final file.

Each upload is tracked through an upload session, allowing interrupted uploads to resume from where they stopped rather than starting over.

Once the upload is complete, the file is stored in Google's distributed storage infrastructure and replicated across multiple locations to ensure durability and availability.

Meanwhile, background services generate thumbnails, scan for viruses, extract metadata, and prepare previews without delaying the user experience.

Let's walk through the complete upload journey.

Step 1: User Authentication

Before an upload begins, Google must verify the user's identity. The Google Drive client sends an access token obtained during login.

The authentication service verifies:

User identity
Storage quota
Account permissions
Upload authorization

Only after successful verification can the upload proceed. This prevents unauthorized users from consuming storage resources.

Step 2: Upload Session Creation

Google does not immediately start receiving file data. Instead, it first creates an upload session. The upload session acts as a tracking record for the entire upload process.

It stores information such as:

User ID
File name
Upload status
Uploaded chunks
Remaining chunks

This session becomes extremely important if the upload gets interrupted.
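A session record along these lines could be modeled as shown below. This is only a sketch of the idea: the field names follow the list above, while the 8 MiB chunk size, the session ID format, and the helper function are assumptions rather than details of Google's implementation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UploadSession:
    user_id: str
    file_name: str
    total_chunks: int
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    uploaded_chunks: set = field(default_factory=set)
    status: str = "in_progress"

    def record_chunk(self, chunk_number):
        """Mark a chunk (numbered from 1) as received."""
        self.uploaded_chunks.add(chunk_number)
        if len(self.uploaded_chunks) == self.total_chunks:
            self.status = "complete"

    def remaining_chunks(self):
        return [n for n in range(1, self.total_chunks + 1)
                if n not in self.uploaded_chunks]


def create_session(user_id, file_name, file_size, chunk_size):
    # Integer ceiling division: a final partial chunk still counts as a chunk.
    total = (file_size + chunk_size - 1) // chunk_size
    return UploadSession(user_id, file_name, total)


session = create_session("user1", "vacation.mp4",
                         file_size=5 * 1024**3, chunk_size=8 * 1024**2)
session.record_chunk(1)
```

With these assumed sizes, a 5 GiB file becomes 640 chunks, and the session always knows which ones are still outstanding.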

Step 3: File Chunking

Uploading large files as a single request is inefficient and risky. Instead, Google splits files into smaller chunks.

Example: 5 GB File

Chunk 1
Chunk 2
Chunk 3
Chunk 4
...
Chunk N

Chunking provides several advantages:

Faster Recovery

If a single chunk fails:

Retry Chunk 52

instead of:

Retry Entire 5 GB File

Parallel Uploads

Multiple chunks can be uploaded simultaneously. This significantly improves upload performance.

Chunk 1 ──►
Chunk 2 ──►
Chunk 3 ──►
Chunk 4 ──►
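Both ideas, splitting and parallel upload, fit in a few lines. In this sketch upload_chunk is a hypothetical stand-in for the network call, and the tiny 4-byte chunk size is used only to keep the example readable.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Cut a byte string into fixed-size pieces; the last one may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_chunk(index, chunk):
    # Hypothetical stand-in for an HTTP request that sends one chunk.
    return index, len(chunk)


file_bytes = b"0123456789"
chunks = split_into_chunks(file_bytes, chunk_size=4)

# Send several chunks at the same time; map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(upload_chunk, range(len(chunks)), chunks))
```

If one call fails, only that chunk needs to be retried, and the others are unaffected.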

Step 4: API Gateway and Load Balancing

Every upload request first reaches Google's edge infrastructure.

Responsibilities include:

Request routing
Authentication validation
Rate limiting
Traffic management
DDoS protection

Instead of a single upload server handling all traffic, requests are distributed across thousands of upload servers. This allows Google Drive to support millions of concurrent uploads.

Step 5: Chunk Verification

Data can become corrupted during transmission. To ensure integrity, every uploaded chunk is validated using checksums.

Common verification methods include:

SHA-256
CRC32C

If verification fails:

Chunk Rejected

The client simply uploads the chunk again. This guarantees that the stored data exactly matches the original file.
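The check itself is simple: the client sends a checksum alongside each chunk, and the server recomputes it over the bytes it received. The sketch below uses SHA-256 from Python's standard library. The function names are illustrative, and how the checksum travels with the request is left out.

```python
import hashlib


def checksum(chunk):
    """Hex-encoded SHA-256 digest of the chunk's bytes."""
    return hashlib.sha256(chunk).hexdigest()


def accept_chunk(received_bytes, expected_checksum):
    """Server side: accept the chunk only if the bytes hash to the expected value."""
    return checksum(received_bytes) == expected_checksum


original = b"chunk 52 of vacation.mp4"
sent_checksum = checksum(original)            # computed by the client

ok = accept_chunk(original, sent_checksum)
corrupted = accept_chunk(b"chunk 52 of vacation.mp3", sent_checksum)
```

Even a single changed byte produces a different digest, so the corrupted chunk is rejected and the client uploads it again.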

Step 6: Temporary Chunk Storage

Successfully verified chunks are stored temporarily. At this stage, the file does not yet exist as a complete object. Google stores each chunk independently while tracking progress through the upload session.

This design enables:

Upload recovery
Parallel uploads
Efficient retries

Step 7: Resumable Uploads

One of the most important features of Google Drive is resumable uploads.

Imagine a network failure during upload. Without upload sessions, the user would need to start over.

Instead, Google checks the upload session:

Uploaded Chunks:
1 ✓
2 ✓
3 ✓
4 ✓
...
400 ✓

When connectivity returns:

Resume From Chunk 401

rather than:

Resume From Chunk 1

This dramatically improves reliability and user experience.
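Working out where to resume is a matter of comparing the session's record with the full chunk list. Because chunks can arrive in parallel and out of order, the sketch below returns every missing chunk instead of just a single resume point. The total of 1000 chunks is an assumed figure for the example.

```python
def chunks_to_resume(total_chunks, uploaded):
    """Return the chunk numbers (starting at 1) that still need uploading."""
    return [n for n in range(1, total_chunks + 1) if n not in uploaded]


# The connection dropped after chunks 1..400 were confirmed.
confirmed = set(range(1, 401))
todo = chunks_to_resume(total_chunks=1000, uploaded=confirmed)

# With parallel uploads there can be gaps in the middle.
gaps = chunks_to_resume(total_chunks=5, uploaded={1, 2, 4})
```

In the first case the client picks up at chunk 401; in the second it only needs to send chunks 3 and 5.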

Step 8: File Assembly Service

After all chunks arrive successfully, Google assembles them into a complete file. The assembly service ensures chunks are combined in the correct order to reconstruct the original file.

Step 9: Metadata Service

A file consists of two parts:

File Content

The actual bytes of the file.

Metadata

Information about the file.

{
  "fileId": "xyz123",
  "name": "vacation.mp4",
  "owner": "user1",
  "size": "5GB"
}

Metadata is stored separately because it allows Google Drive to provide:

Search
Sharing
Folder navigation
Permission management

without scanning the actual file contents.

Step 10: Distributed Object Storage & Metadata Mapping

Once all chunks are successfully uploaded and verified, the system logically assembles the file. Instead of physically gluing the chunks back together onto a single hard drive, the system creates a metadata map (a recipe showing how the chunks fit together) and distributes the individual chunks across Google's storage infrastructure.

Instead of keeping the data on one machine:

Storage Node A holds Chunk 1
Storage Node B holds Chunk 2
Storage Node C holds Chunk 3

Benefits include:

Horizontal scalability: No single server runs out of disk space.
Faster access: Users can download different chunks in parallel from multiple servers simultaneously.
Storage efficiency: Allows Google Drive to manage billions of massive files without bottlenecking individual hardware units.
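The metadata map can be pictured as an ordered list of entries, each saying which node holds which chunk. The following sketch spreads chunks across three in-memory "nodes" and rebuilds the file from the map. The node names, the round-robin placement, and the use of a SHA-256 digest as the chunk ID are assumptions for illustration, not a description of Google's internal storage layer.

```python
import hashlib

storage_nodes = {"node-a": {}, "node-b": {}, "node-c": {}}


def store_file(data, chunk_size):
    """Scatter chunks across nodes and return the map needed to find them."""
    node_names = sorted(storage_nodes)
    manifest = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        node = node_names[index % len(node_names)]  # simple round-robin placement
        storage_nodes[node][chunk_id] = chunk
        manifest.append({"index": index, "node": node, "chunk_id": chunk_id})
    return manifest


def read_file(manifest):
    """Follow the map in order to rebuild the original bytes."""
    ordered = sorted(manifest, key=lambda entry: entry["index"])
    return b"".join(storage_nodes[e["node"]][e["chunk_id"]] for e in ordered)


original_file = b"vacation video bytes " * 3
file_map = store_file(original_file, chunk_size=16)
restored = read_file(file_map)
```

The file never exists as one contiguous object on a single machine; the map is what makes the scattered chunks a file.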

Step 11: Chunk Replication for Durability

Hardware failures happen constantly in large-scale systems. To prevent data loss, the system doesn't just store those distributed chunks once — it immediately creates identical redundant copies of each chunk across different physical locations.

The system clones the chunks across isolated zones:

Copy 1 of all chunks → Data Center A (e.g., Oregon)

Copy 2 of all chunks → Data Center B (e.g., Iowa)

Copy 3 of all chunks → Data Center C (e.g., Belgium)

If a specific server node crashes, a rack loses power, or an entire data center goes offline due to a natural disaster, the file remains fully intact and accessible from another region.

This geographic replication strategy ensures near-perfect data durability.
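The effect of keeping three copies can be shown with a small sketch: write every chunk to all zones, and on read try the zones in turn until one answers. The zone names reuse the examples above, and the offline parameter is an assumption that simulates outages.

```python
ZONES = ["oregon", "iowa", "belgium"]
replicas = {zone: {} for zone in ZONES}


def write_chunk(chunk_id, data):
    """Store an identical copy of the chunk in every zone."""
    for zone in ZONES:
        replicas[zone][chunk_id] = data


def read_chunk(chunk_id, offline=frozenset()):
    """Return the chunk from the first zone that is reachable and has it."""
    for zone in ZONES:
        if zone in offline:
            continue
        if chunk_id in replicas[zone]:
            return zone, replicas[zone][chunk_id]
    raise LookupError(f"chunk {chunk_id!r} is unavailable in every zone")


write_chunk("chunk-1", b"first 8 MiB of the file")

served_from, data = read_chunk("chunk-1", offline={"oregon", "iowa"})
```

Even with two of the three zones down, the chunk is still served from the remaining one. Data becomes unavailable only if every zone holding a copy is unreachable at the same time.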

Step 12: Background Processing

The upload may be complete, but additional work still needs to happen.

Google typically performs:

Virus scanning
Thumbnail generation
Search indexing
OCR processing
Video transcoding
Preview generation

Instead of blocking the upload, these tasks run asynchronously in the background. As a result, users gain access to their files quickly while additional processing continues behind the scenes.

Conclusion

A file upload may look simple on the surface, but behind the scenes it involves a sophisticated distributed system. This architecture enables Google Drive to provide a fast, reliable, and scalable experience while handling billions of files across the globe.

The next time you drag a file into Google Drive, remember that behind a simple progress bar is a massive distributed system working together to ensure your data is uploaded safely and reliably.