Harshit Singh

Posted on Oct 27 • Edited on Oct 29

System Design of YouTube: A Detailed Deep Dive into the Video Giant

#webdev #java #systemdesign #witttedtech

🏗 High-Level Design (HLD) of YouTube

The high-level design of YouTube is a distributed, large-scale architecture that supports several billion users, millions of video uploads, and hundreds of millions of searches per day. YouTube deals with challenges of scale, real-time video streaming, data processing, and distributed search.

Core High-Level Components

Content Delivery Network (CDN)
- Why: YouTube uses CDNs to reduce latency and improve performance by caching video content closer to users. A user in Tokyo should ideally stream videos from a CDN node in Japan rather than waiting for data from U.S. data centers.
- How it works: CDN nodes (edge servers) cache videos based on user proximity and demand. YouTube is served from Google's own edge network rather than a third-party CDN. Commercial CDNs like Akamai and Cloudflare are alternatives, but they wouldn't provide the same level of deep integration as Google's own infrastructure.
- Why Not Alternatives: Building a proprietary CDN makes sense for Google (owner of YouTube) because of its scale, and it allows for more cost-effective management. Commercial CDNs could be used, but their cost and inefficiencies at YouTube's scale would make them impractical.
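
To make the caching idea concrete, here is a minimal sketch of one edge node as a least-recently-used (LRU) cache with an origin fallback. The `EdgeCache` class, the `origin` function, and the capacity of 2 are hypothetical names and values chosen for illustration; real edge servers use far more elaborate admission and eviction policies.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Tiny LRU cache standing in for one CDN edge node (hypothetical)."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self._store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, video_id: str) -> bytes:
        if video_id in self._store:
            # Cache hit: mark the entry as most recently used.
            self._store.move_to_end(video_id)
            self.hits += 1
            return self._store[video_id]
        # Cache miss: go back to the origin and keep a local copy.
        self.misses += 1
        data = self.fetch_from_origin(video_id)
        self._store[video_id] = data
        if len(self._store) > self.capacity:
            # Evict the least recently used entry.
            self._store.popitem(last=False)
        return data


def origin(video_id: str) -> bytes:
    # Placeholder for a slow fetch from a distant data center.
    return f"bytes-of-{video_id}".encode()


tokyo_edge = EdgeCache(capacity=2, fetch_from_origin=origin)
for vid in ["a", "b", "a", "c", "b"]:
    tokyo_edge.get(vid)
```

Only the second request for `"a"` is served locally here. Popular videos stay hot at the edge, while rarely watched ones fall back to the origin.
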
Video Upload and Processing Service
- Why: Videos uploaded by users need to be stored, processed, and transcoded into various formats (240p, 480p, 720p, 1080p, 4K) to accommodate different user bandwidths.
- How it works:
  - Upload: Users upload video data in chunks (multi-part upload), using the Google Cloud Storage API. This avoids timeouts for large files and allows resuming uploads after failures.
  - Transcoding: This design uses FFmpeg (a widely used multimedia processing framework) to transcode videos into multiple resolutions. Each uploaded video is converted into a standardized format for efficient playback across different devices.
  - Why Not Alternatives: FFmpeg is widely used for video processing because it supports virtually every multimedia format and is highly efficient. Alternatives like GStreamer exist, but FFmpeg's broad codec support and mature command-line tooling make it the simpler choice for batch transcoding.
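
As a rough illustration of the transcoding step, the sketch below builds (but does not run) one FFmpeg command per target resolution. The `LADDER` bitrates, the output naming scheme, and the helper names are assumptions made for this example, not YouTube's actual encoding settings.

```python
from typing import Dict, List

# Illustrative rendition ladder: target height -> video bitrate.
# The bitrates are assumptions for this sketch, not YouTube's settings.
LADDER: Dict[int, str] = {240: "400k", 480: "1000k", 720: "2500k", 1080: "5000k"}


def ffmpeg_command(src: str, height: int, bitrate: str) -> List[str]:
    """Build (but do not run) an FFmpeg command for one rendition."""
    out = f"{src.rsplit('.', 1)[0]}_{height}p.mp4"
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",  # keep aspect ratio, force an even width
        "-c:v", "libx264", "-b:v", bitrate,
        "-c:a", "aac",
        out,
    ]


def plan_renditions(src: str, source_height: int) -> List[List[str]]:
    """Skip rungs taller than the source so nothing is upscaled."""
    return [
        ffmpeg_command(src, height, rate)
        for height, rate in sorted(LADDER.items())
        if height <= source_height
    ]


commands = plan_renditions("upload.mov", source_height=720)
```

Each command list could be handed to `subprocess.run` on a worker machine. Keeping the planning separate from the execution makes the ladder easy to test without having FFmpeg installed.
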
Storage (Video and Metadata Storage)
- Video Storage: YouTube stores video data in a distributed object storage system using Google Cloud Storage (GCS). GCS offers durability, high availability, and cost efficiency with multi-region support.
- Why Not Alternatives: YouTube could use other object storage services like Amazon S3 or Azure Blob Storage, but it opts for GCS due to seamless integration with its other infrastructure (networking, CDN, and processing), which simplifies scaling and management at YouTube's scale.
- Metadata Storage: Metadata (video titles, descriptions, tags) is stored in Bigtable, a NoSQL database developed by Google.
  - Why Bigtable? It’s optimized for low-latency, high-throughput operations, which is essential for fast reads/writes of video metadata. At YouTube's scale (petabytes of metadata), relational databases would have a hard time handling the volume, and NoSQL is a better fit.
  - Why Not Other NoSQL DBs? Alternatives like Cassandra or DynamoDB could theoretically be used, but Bigtable integrates tightly with Google's ecosystem, allowing superior performance for internal services.
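
Bigtable keeps rows sorted by row key, so key design decides which lookups are cheap. The sketch below imitates that with an in-memory table. The `SortedKeyTable` class, the `channel#video` key layout, and the `meta:` column names are hypothetical choices for illustration, not YouTube's schema or the Bigtable client API.

```python
import bisect
from typing import Dict, List, Tuple


class SortedKeyTable:
    """In-memory stand-in for a wide-column table with sorted row keys."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, row_key: str, columns: Dict[str, str]) -> None:
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)  # keep keys in sorted order
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key: str) -> Dict[str, str]:
        return self._rows[row_key]

    def scan_prefix(self, prefix: str) -> List[Tuple[str, Dict[str, str]]]:
        """Return every row whose key starts with `prefix`, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # sorted keys: no later key can match
            out.append((key, self._rows[key]))
        return out


table = SortedKeyTable()
table.put("chan42#vid001", {"meta:title": "Intro to CDNs", "meta:tags": "cdn,network"})
table.put("chan42#vid002", {"meta:title": "Transcoding 101"})
table.put("chan77#vid001", {"meta:title": "Cooking pasta"})

channel_rows = table.scan_prefix("chan42#")
```

With this key layout, fetching one video is a point lookup and listing a channel's videos is a contiguous range scan, which is the access pattern sorted-key stores handle well.
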
Content Search Service
- Why: Users need to search across millions of videos efficiently, so YouTube requires a search engine capable of full-text search and ranking results based on relevance, popularity, and personalization.
- How it works: This design uses Elasticsearch for the search service. It serves here as a representative open-source choice rather than a confirmed part of YouTube's stack.
  - Elasticsearch is used for indexing video metadata (titles, descriptions, tags). It's distributed, supports multi-node clusters, and is designed to handle real-time, large-scale search operations.
- Why Not Alternatives: Alternatives like Solr exist, but Elasticsearch is chosen for its ease of scaling, better support for distributed architectures, and powerful query capabilities. It can also be run on GCP alongside the other components.
Recommendation System
- Why: The recommendation system is the secret sauce of YouTube, providing personalized video suggestions to keep users engaged.
- How it works:
  - It uses machine learning models (like collaborative filtering, deep learning, and matrix factorization techniques) trained on user data: watch history, likes, search behavior, and demographics.
  - Data pipelines are built using Apache Spark and Flink, with TensorFlow models running in production to provide real-time recommendations.
- Why Not Alternatives: The choice of TensorFlow (Google’s own ML framework) over something like PyTorch is strategic. TensorFlow’s deep integration with GCP infrastructure makes it ideal for scalable ML workloads.
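
Collaborative filtering, one of the techniques named above, can be sketched in a few lines. The example below is item-based: two videos are similar when the same users watched both. The `HISTORY` data and function names are made up for illustration, and a production system would use learned models rather than this direct computation.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Set

# Hypothetical watch history: user -> set of watched video IDs.
HISTORY: Dict[str, Set[str]] = {
    "u1": {"v1", "v2", "v3"},
    "u2": {"v1", "v2"},
    "u3": {"v2", "v3", "v4"},
    "u4": {"v4"},
}


def item_similarity(history: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Cosine similarity between videos, based on who watched them."""
    viewers: Dict[str, Set[str]] = defaultdict(set)
    for user, videos in history.items():
        for video in videos:
            viewers[video].add(user)
    sim: Dict[str, Dict[str, float]] = defaultdict(dict)
    for a in viewers:
        for b in viewers:
            if a != b:
                overlap = len(viewers[a] & viewers[b])
                if overlap:
                    sim[a][b] = overlap / sqrt(len(viewers[a]) * len(viewers[b]))
    return sim


def recommend(user: str, history: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Score unseen videos by their similarity to what the user watched."""
    sim = item_similarity(history)
    seen = history[user]
    scores: Dict[str, float] = defaultdict(float)
    for watched in seen:
        for candidate, score in sim[watched].items():
            if candidate not in seen:
                scores[candidate] += score
    # Highest score first; ties broken by video ID for determinism.
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


suggestions = recommend("u2", HISTORY)
```

For `u2`, who watched `v1` and `v2`, the sketch ranks `v3` ahead of `v4` because `v3` shares more viewers with both watched videos.
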
API Gateway
- Why: YouTube needs to expose a set of well-defined APIs to clients (web, mobile apps, third-party integrations). These APIs need to route requests to the appropriate microservices (video, search, recommendations, etc.).
- How it works: Google’s API Gateway handles routing, authentication, and rate-limiting. It connects clients to backend services while ensuring that the system remains modular and scalable.
- Why Not Alternatives: Google’s API Gateway is the obvious choice here because it offers built-in integration with GCP services, better scalability, and easier security management.
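
The routing and rate-limiting roles of a gateway can be sketched as follows. This is a toy model with a token bucket per client and prefix-based routing. The `Gateway` and `TokenBucket` classes, the route table, and the limits are hypothetical, and authentication is left out to keep the example short.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float]):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gateway:
    """Routes by path prefix and rate-limits each client separately."""

    def __init__(self, routes: Dict[str, Callable[[str], str]], rate: float = 5.0,
                 burst: int = 10, clock: Callable[[], float] = time.monotonic):
        self.routes = routes
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def handle(self, client_id: str, path: str) -> str:
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.rate, self.burst, self.clock)
        if not self.buckets[client_id].allow():
            return "429 Too Many Requests"
        for prefix, service in self.routes.items():
            if path.startswith(prefix):
                return service(path)
        return "404 Not Found"


fake_now = [0.0]  # controllable clock so the demo is deterministic
gateway = Gateway(
    routes={
        "/search": lambda p: f"search-service handled {p}",
        "/videos": lambda p: f"video-service handled {p}",
    },
    rate=1.0, burst=2, clock=lambda: fake_now[0],
)
responses = [gateway.handle("alice", "/search?q=cats") for _ in range(3)]
```

With a burst of 2 and no time passing, the third request from the same client is rejected. Once the clock advances, the bucket refills and requests flow again.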

🖼 HLD Diagram with Core Components

Here’s a more detailed High-Level Design diagram for YouTube:


  [Clients (Web, Mobile)] --> [API Gateway] --> [Load Balancer]
                                    |                       |
                      [Search Service]        [User Service]      [Video Upload/Transcoding]
                                      |                       |
                                [Recommendation Service]       [CDN]
                                    |                        |
                           [Bigtable for Metadata]      [Google Cloud Storage for Video]

🛠 Low-Level Design (LLD) of YouTube: Deep Dive into Core Services

Now, we’ll delve deeper into each of the core services that power YouTube, the challenges they address, and the design decisions behind them.

1. Video Upload Flow and Processing

The video upload flow involves multiple stages, from upload to processing and serving. Here’s how it works:

Upload Flow:

- Client uploads a video in chunks (multi-part) to YouTube's Upload Service.
- The Upload Service stores the raw chunks temporarily in Google Cloud Storage.
- Once all chunks are uploaded, a message is sent to a Kafka message queue, which triggers video processing.
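
The upload steps above can be sketched as a resumable, chunked transfer. Everything named here is hypothetical: `UploadSession` stands in for server-side state, a plain list stands in for the Kafka topic, and the 4-byte chunk size is tiny on purpose so the behavior is easy to follow.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # bytes; real uploads would use chunks of several megabytes


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[Tuple[int, bytes]]:
    """Number each chunk so the server can tell which ones it has."""
    return [(i, data[start:start + size])
            for i, start in enumerate(range(0, len(data), size))]


class UploadSession:
    """Hypothetical server-side session that tracks received chunks."""

    def __init__(self, total_chunks: int):
        self.total_chunks = total_chunks
        self.received: Dict[int, bytes] = {}

    def put_chunk(self, index: int, payload: bytes) -> None:
        self.received[index] = payload  # re-sending a chunk is harmless

    def missing(self) -> List[int]:
        return [i for i in range(self.total_chunks) if i not in self.received]

    def assemble(self) -> bytes:
        if self.missing():
            raise ValueError("upload incomplete")
        return b"".join(self.received[i] for i in range(self.total_chunks))


def finish(session: UploadSession, video_id: str, queue: List[dict]) -> bool:
    """Publish a processing message only once every chunk has arrived."""
    if session.missing():
        return False
    queue.append({"event": "upload_complete", "video_id": video_id})
    return True


video = b"0123456789"
chunks = split_into_chunks(video)
session = UploadSession(total_chunks=len(chunks))
transcode_queue: List[dict] = []  # stand-in for a message queue topic

# First attempt: the connection drops before chunk 1 arrives.
for index, payload in chunks:
    if index != 1:
        session.put_chunk(index, payload)
first_try = finish(session, "abc123", transcode_queue)

# Resume: ask the server what is missing and resend only that.
for index in session.missing():
    session.put_chunk(index, chunks[index][1])
second_try = finish(session, "abc123", transcode_queue)
```

The client resends only the missing chunk instead of the whole file, and the processing message is published exactly once, after the upload is complete.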

Video Processing (Transcoding):

- Transcoding Pipelines take the raw uploaded video and convert it into multiple resolutions. This is critical for delivering video based on varying internet speeds and devices.
- Video transcoding workers process multiple jobs in parallel. These workers are stateless and scale horizontally.
- After transcoding, the processed video is stored in Google Cloud Storage, with a reference to the video ID stored in Bigtable.
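
A small sketch of the stateless, parallel worker idea is shown below. The `transcode` function is a stub that only returns the path a real encoder would write to, and the `metadata_index` dict stands in for the metadata table. The names and the path layout are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple

RESOLUTIONS = ["240p", "480p", "720p", "1080p"]


def transcode(job: Tuple[str, str]) -> Tuple[str, str]:
    """Stateless stub: a real worker would invoke an encoder here."""
    video_id, resolution = job
    return resolution, f"videos/{video_id}/{resolution}.mp4"


def process_upload(video_id: str, workers: int = 4) -> Dict[str, str]:
    """Fan one upload out into one job per resolution and collect the results."""
    jobs = [(video_id, resolution) for resolution in RESOLUTIONS]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transcode, jobs))
    return dict(results)


# Stand-in for the metadata store: video ID -> rendition paths.
metadata_index: Dict[str, Dict[str, str]] = {}
metadata_index["abc123"] = process_upload("abc123")
```

Because each job carries everything it needs, any worker can pick up any job. Scaling out then means adding workers, not coordinating state between them.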

Advantages of this flow:

- Fault Tolerance: If any video chunk fails to upload or transcode, the system can retry without reprocessing the entire video.
- Parallelism: Video transcoding is parallelized across multiple machines, improving throughput.
- Scalability: Google Cloud Storage is inherently scalable, capable of handling YouTube's petabyte-scale storage needs.

2. Video Streaming Architecture

Streaming Flow:

- Client Request: A user requests to play a video by clicking on it.
- Load Balancer: The request is sent to the Load Balancer, which determines the best backend node to serve the request.
- Content Delivery: The CDN (Google's Edge Network) handles delivering the actual video stream to the user. The edge server closest to the user serves the video.
- Adaptive Bitrate Streaming: YouTube uses MPEG-DASH or HLS for streaming. These protocols support adaptive bitrate streaming, which adjusts video quality in real-time based on the user's network conditions.
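
Adaptive bitrate selection happens on the client: the player measures throughput and requests the next segment at a quality it can sustain. The sketch below shows one simple policy. The `RENDITIONS` ladder, the 0.8 safety factor, and the smoothing weight are illustrative assumptions, not values from the DASH or HLS specifications.

```python
from typing import List, Tuple

# Illustrative ladder: (height in pixels, bitrate in kbit/s).
RENDITIONS: List[Tuple[int, int]] = [(240, 400), (480, 1000), (720, 2500), (1080, 5000)]


def smooth(samples: List[float], alpha: float = 0.5) -> float:
    """Exponentially weighted average so one bad sample does not cause a switch."""
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def pick_rendition(throughput_kbps: float, safety: float = 0.8) -> int:
    """Choose the tallest rendition whose bitrate fits within a safety margin."""
    budget = throughput_kbps * safety
    best = RENDITIONS[0][0]  # always fall back to the lowest quality
    for height, bitrate in RENDITIONS:
        if bitrate <= budget:
            best = height
    return best


steady = pick_rendition(3500)                    # budget 2800 kbit/s
degraded = pick_rendition(smooth([4000, 2000]))  # estimate 3000, budget 2400
```

Here a steady 3500 kbit/s connection gets 720p, while a connection that just dropped from 4000 to 2000 kbit/s steps down to 480p before the buffer runs dry.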

Why Not Alternatives:

MPEG-DASH and HLS are the standard protocols for high-quality video streaming. They allow seamless switching between video resolutions, minimizing buffering.

3. Search Architecture

Search Flow:

- Client Request: A search request is sent to the Search Service through the API Gateway.
- Elasticsearch: The query is executed against the Elasticsearch index, which contains metadata for millions of videos.
- Ranking and Relevance: Search results are ranked based on factors like video relevance, popularity, and personalization data (watch history, subscriptions).
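
The search flow can be illustrated with a toy inverted index and a ranking function that blends term matches with popularity. The `VIDEOS` data, the 0.1 popularity weight, and the scoring formula are assumptions for this sketch. A real search engine uses far richer relevance signals, and personalization is omitted here.

```python
from collections import defaultdict
from math import log
from typing import Dict, List, Set

# Hypothetical catalogue: video ID -> metadata.
VIDEOS = {
    "v1": {"title": "python tutorial for beginners", "views": 1_000_000},
    "v2": {"title": "advanced python tricks", "views": 50_000},
    "v3": {"title": "cooking pasta tutorial", "views": 2_000_000},
}


def build_index(videos: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Inverted index: term -> set of video IDs whose title contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for video_id, doc in videos.items():
        for term in doc["title"].lower().split():
            index[term].add(video_id)
    return index


def search(query: str, videos: Dict[str, dict], index: Dict[str, Set[str]]) -> List[str]:
    scores: Dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        for video_id in index.get(term, ()):
            scores[video_id] += 1.0  # relevance: one point per matching term
    for video_id in scores:
        # Popularity: a small boost that grows with the log of the view count.
        scores[video_id] += 0.1 * log(1 + videos[video_id]["views"], 10)
    return sorted(scores, key=lambda v: (-scores[v], v))


index = build_index(VIDEOS)
results = search("python tutorial", VIDEOS, index)
```

`v1` matches both query terms and comes first. `v3` and `v2` each match one term, so the popularity boost decides their order.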

Elasticsearch Design:

- The index is sharded across multiple Elasticsearch nodes, allowing horizontal scalability.
- Shards are replicated to ensure high availability.
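
Sharding and replication can be sketched as two small functions: one maps a document to a shard by hashing its ID, the other places each shard's primary and replica on different nodes. The shard count, node names, and placement rule are hypothetical and much simpler than a real cluster's allocation logic.

```python
import hashlib
from typing import List

NUM_SHARDS = 4
NODES = ["node-a", "node-b", "node-c"]


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash of the document ID, reduced to a shard number."""
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % num_shards


def placement(shard: int, nodes: List[str] = NODES, replicas: int = 1) -> List[str]:
    """Primary first, then replicas on the following nodes, never the same node."""
    primary = shard % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(replicas + 1)]


layout = {shard: placement(shard) for shard in range(NUM_SHARDS)}
```

Because the hash is stable, every query for a given video ID goes to the same shard. Since a shard's copies live on different nodes, losing one node leaves a copy of every shard available.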

Why Elasticsearch over Solr:

- Scalability and distributed search are better supported in Elasticsearch.
- Elasticsearch integrates with the rest of the Elastic Stack, such as Kibana for real-time monitoring.

🔄 Modernizing YouTube's Architecture with New Tech

If YouTube were to modernize its system using the latest technologies, here are some improvements they could make:

1. Microservices with Service Mesh

YouTube could leverage Istio or Linkerd to implement a service mesh. This would help manage microservice communication, improve security, and monitor service performance better than traditional RPC mechanisms.

2. GraphQL for APIs

Instead of REST APIs, YouTube could adopt GraphQL for flexible and efficient querying. This would allow clients (mobile/web) to retrieve exactly the data they need, minimizing over-fetching or under-fetching.
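
The core idea of GraphQL, that the client names the fields it wants, can be imitated in plain Python. This is not a GraphQL implementation: the `select` helper and the `VIDEO` record are hypothetical, and the nested dict plays the role of a query's selection set.

```python
from typing import Any, Dict

# Hypothetical full record, as a backend service might return it.
VIDEO: Dict[str, Any] = {
    "id": "v1",
    "title": "Intro to CDNs",
    "description": "How edge caching works",
    "stats": {"views": 1200, "likes": 85},
    "channel": {"id": "c9", "name": "NetTalks", "subscribers": 40000},
}


def select(record: Dict[str, Any], selection: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the requested fields; a nested dict selects sub-fields."""
    out: Dict[str, Any] = {}
    for field, sub in selection.items():
        value = record[field]
        out[field] = select(value, sub) if isinstance(sub, dict) else value
    return out


# A mobile client rendering a video card asks for three fields only.
card = select(VIDEO, {"title": True, "stats": {"views": True}, "channel": {"name": True}})
```

The response carries exactly what the card needs, which is the over-fetching fix described above. A REST endpoint would typically return the whole record or require a second call for the channel name.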

3. Real-Time Recommendations with Kafka Streams

YouTube’s recommendation engine could evolve to use Kafka Streams for real-time processing of user events (likes, watch behavior, etc.), which would lead to more dynamic and personalized recommendations.
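
Stream processing of user events usually boils down to windowed aggregation. The sketch below keeps a sliding-window count of watch events per video in plain Python. It is a stand-in for what a stream processor would do, not Kafka Streams code. The `TrendingWindow` class is hypothetical and assumes events arrive in timestamp order.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class TrendingWindow:
    """Counts events per video over the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()  # (timestamp, video_id)
        self.counts: Counter = Counter()

    def record(self, timestamp: float, video_id: str) -> None:
        self.events.append((timestamp, video_id))
        self.counts[video_id] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop events that have slid out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_video = self.events.popleft()
            self.counts[old_video] -= 1
            if self.counts[old_video] == 0:
                del self.counts[old_video]

    def top(self, n: int = 1) -> List[str]:
        return [video for video, _ in self.counts.most_common(n)]


trending = TrendingWindow(window_seconds=60)
for ts, vid in [(0, "v1"), (10, "v1"), (30, "v2"), (65, "v2"), (75, "v2")]:
    trending.record(ts, vid)
```

By the last event, both `v1` views have aged out of the 60-second window, so `v2` is the only trending video. A recommendation service could read such counts to react within seconds instead of waiting for a batch job.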

4. Cloud-Native Infrastructure

Kubernetes grew out of Borg, the cluster manager Google runs internally, so container orchestration is already familiar ground. Deeper adoption of Kubernetes-style tooling could allow for better management of containerized microservices, auto-scaling, and self-healing features.

Conclusion: In-Depth Recap

We've explored the high-level and low-level designs of YouTube, diving into the technical choices behind each component, like why Google Cloud Storage and Bigtable are used for scalability, how Elasticsearch can support video search, and why FFmpeg is a strong fit for transcoding. We've also discussed potential modern improvements to YouTube's architecture using service mesh, GraphQL, and real-time streaming.

YouTube’s architecture is a brilliant example of solving challenges related to scale, latency, and availability using the right combination of tools and infrastructure.

Top comments (12)

Chhavi Sachdeva • Nov 7

Can you share some insight about the actual architecture of YouTube? I found multiple architectures, each with slight variations in the placement of CDNs.
Also, does YouTube follow a monolithic, microservices, or hybrid architecture?
(A slight confusion over this)

Harshit Singh • Nov 9

There are indeed several different system design posts on the internet, each saying something of its own, and that can be confusing. The reason is that the system is upgraded regularly to keep up with the daily increase in users and to adopt the latest technology for a better user experience. To answer your question: "YouTube primarily follows a microservices architecture, although it has evolved from a more monolithic structure in its earlier days. The microservices model allows for independent scaling, development, and deployment of individual components (e.g., user management, video encoding, content recommendations)."
"Some aspects may seem hybrid because certain legacy or tightly integrated systems might still resemble monolithic behavior. This mix helps facilitate the migration and maintenance of core services."
On the use of CDNs: "YouTube uses Google's global CDN, part of their extensive edge network. This network caches content closer to users, reducing latency and improving load times. The placement of CDNs ensures that videos are distributed efficiently, handling requests through the nearest edge location to the user."
"Typically, CDNs are positioned at strategic locations across different regions, serving as intermediaries between the user and YouTube's primary data centers. These CDNs cache popular and frequently accessed videos to minimize the load on origin servers."
"While static video content is cached in CDNs, dynamic aspects like live streaming still route through the main servers before being cached at the edge for subsequent viewers."

For DATA MANAGEMENT : "Google Bigtable and Spanner are utilized for their high-throughput and globally consistent data management capabilities, handling everything from user data to video metadata and recommendations."
"The architecture uses Maglev, Google's load balancer, which helps distribute traffic efficiently across global data centers."
For MICROSERVICES COMMUNICATION: "gRPC and REST APIs are used."
For COMMUNICATION PROTOCOL: "QUIC and HTTP/2 are often used for faster, more reliable video transmission."

Hope this clears your doubts. If you still have any questions, feel free to ask or connect with me on any social media platform. I will try my best to explain everything to the best of my knowledge. @sachdeva_2303

Chhavi Sachdeva • Nov 10

Really grateful for this response.

Harshit Singh • Oct 27

Do share your feedback.

Aryan Pro • Oct 27

Thanks for sharing this. Well written.
Somehow you managed to touch on everything, but there is still a lot left over.
A lot of the terms were new to me. But we cannot explain or read everything. Wondering how they manage to handle this data flow and design implementation.

Harshit Singh • Oct 27

I tried to provide a detailed overview. If there is anything in particular you would like to know about, do mention the topic or question and I'll try to explain it in the next post.

Nitesh-nkj • Oct 28

How does YouTube live streaming work?

Harshit Singh • Oct 29

I have posted an article with the explanation "How Youtube's Live Streaming and Content Delivery Works". You can check it out : dev.to/wittedtech-by-harshit/unvei...

Zidan • Oct 28

How about their frontend? It seems to be a mix of SSR and SPA to me. Can you shed some light on it?

Harshit Singh • Oct 29

@zidan_ba82bf8632fb0c70223 You're absolutely correct! YouTube's frontend is indeed a hybrid approach that combines both Server-Side Rendering (SSR) and Single Page Application (SPA) characteristics. This blend provides YouTube with a fast and interactive user experience, optimized for both SEO and performance. I think you'll find this article of mine useful: dev.to/wittedtech-by-harshit/insid...

Sasikumar Pallekonda • Oct 29

Well written, best part I liked was the choices at each component.

Harshit Singh • Oct 29

Thanks for the feedback sir.😊
