🐦 How Would You Design Twitter? (Plus: Threads vs Processes, Choosing Databases, and Unique ID Generation)

In this deep-dive post, we explore system design insights, foundational CS concepts, and architecture patterns from real-world use cases. Let’s unpack 👇

💡 Interview Essential: Process vs Thread

Understanding the difference between processes and threads is a must-have for any backend or systems engineer.

🔹 A Program is just a passive set of instructions on disk.

🔹 A Process is a program in execution: it is loaded into memory and owns its own resources (address space, open files, and at least one thread).

🔹 A Thread is the smallest unit of execution, running within a process. Each thread has its own stack and registers, while all threads in a process share its memory and other resources.

Key differences:

🔹 Processes are isolated; threads run within the same memory space.
🔹 Context switching is heavier between processes than between threads of the same process, because the address space has to be switched too.
🔹 Threads allow faster communication but require careful synchronization.
🔹 Creating processes is resource-intensive; threads are lightweight.
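
To make the differences above concrete, here is a minimal Python sketch. It is an illustration only: the function names are made up for this post, and the child process is simply a second Python interpreter started with subprocess. Threads update one shared counter (so they need a lock), while the child process gets its own memory and can only hand a result back through an explicit channel, here its standard output.

```python
import subprocess
import sys
import threading


def count_with_threads(num_threads=4, increments=10_000):
    """Threads share the same memory, so they can all update one counter."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            # Without the lock, concurrent read-modify-write could lose updates.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


def increment_in_child_process(value):
    """A child process has its own memory: data goes in via argv, out via a pipe."""
    code = "import sys; v = int(sys.argv[1]); v += 1; print(v)"
    result = subprocess.run(
        [sys.executable, "-c", code, str(value)],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    print("threads counted:", count_with_threads())
    parent_value = 10
    print("child returned:", increment_in_child_process(parent_value))
    print("parent still has:", parent_value)
```

Starting the child interpreter is noticeably slower than starting a thread, which is the "resource-intensive vs lightweight" trade-off in practice.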

💬 Over to you:
1️⃣ How do coroutines differ from threads in languages like Go or Python?
2️⃣ How would you list all running processes in Linux?

🛠️ System Design Interview: Design Twitter

Based on a 2013 Twitter tech talk, here’s how a tweet travels through Twitter’s architecture:

The Life of a Tweet
1️⃣ Tweet comes in via the Write API
2️⃣ Routed to the Fanout service
3️⃣ Fanout writes the tweet into each follower's home timeline in the Redis cache
4️⃣ Timeline service locates the relevant Redis shard
5️⃣ User pulls the timeline via the Timeline service
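
The write path above is often described as fanout-on-write. Here is a rough Python sketch of the idea, not Twitter's actual code: plain in-memory dictionaries stand in for Redis, and the class names, the follower map, and the timeline cap are all hypothetical.

```python
from collections import defaultdict, deque


class TimelineStore:
    """In-memory stand-in for the Redis timeline cache (assumption)."""

    def __init__(self, limit=800):
        # Each home timeline keeps only the newest `limit` tweet IDs.
        self._timelines = defaultdict(lambda: deque(maxlen=limit))

    def push(self, user_id, tweet_id):
        # Newest first; the oldest entry falls off the right end when full.
        self._timelines[user_id].appendleft(tweet_id)

    def read(self, user_id, count=20):
        return list(self._timelines.get(user_id, ()))[:count]


class FanoutService:
    """Copies each new tweet ID into every follower's home timeline."""

    def __init__(self, followers, store):
        self._followers = followers  # author -> list of follower IDs
        self._store = store

    def deliver(self, author_id, tweet_id):
        for follower_id in self._followers.get(author_id, ()):
            self._store.push(follower_id, tweet_id)


def post_tweet(fanout, author_id, tweet_id):
    """Write API entry point: accept the tweet, then hand it to fanout."""
    fanout.deliver(author_id, tweet_id)


if __name__ == "__main__":
    store = TimelineStore()
    fanout = FanoutService({"alice": ["bob", "carol"]}, store)
    post_tweet(fanout, "alice", 1)
    post_tweet(fanout, "alice", 2)
    print(store.read("bob"))
```

The design choice here is to pay the cost at write time so that reading a timeline is a single cheap lookup. The flip side is that one tweet from an account with many followers turns into many writes.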

Search & Discovery

🔹 Ingester: Tokenizes tweets for indexing
🔹 Earlybird: Stores the searchable index
🔹 Blender: Builds search and discovery timelines
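
To see how these three roles fit together, here is a toy Python sketch. It is only an analogy under simple assumptions: a regex tokenizer plays the Ingester, a small inverted index plays an Earlybird shard, and a merge function plays the Blender. All names are invented for illustration, and it assumes larger tweet IDs are newer.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Ingester role: split a tweet into lowercase tokens."""
    return re.findall(r"[a-z0-9#@_]+", text.lower())


class InvertedIndex:
    """Earlybird role: maps each token to the set of tweet IDs containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, tweet_id, text):
        for token in tokenize(text):
            self._postings[token].add(tweet_id)

    def search(self, query):
        tokens = tokenize(query)
        if not tokens:
            return set()
        # A tweet matches only if it contains every query token.
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


def blend(shards, query, limit=10):
    """Blender role: query every shard and merge the hits, newest first."""
    hits = set()
    for shard in shards:
        hits |= shard.search(query)
    return sorted(hits, reverse=True)[:limit]


if __name__ == "__main__":
    shard_a, shard_b = InvertedIndex(), InvertedIndex()
    shard_a.add(1, "Designing Twitter at scale")
    shard_b.add(2, "Twitter scale lessons")
    shard_a.add(3, "Redis timelines")
    print(blend([shard_a, shard_b], "twitter scale"))
```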

Push Compute

🔹 HTTP Push
🔹 Mobile Push

🔍 Note: Based on Twitter’s 2013 architecture — still valuable for understanding scalable social media backends. Original Talk

💬 What are the architecture differences between LinkedIn and Twitter? How do their use cases influence design?

🧩 Choosing the Right Database – A Visual Guide

Databases are not one-size-fits-all. Always choose the right DB for the workload.

Common types:

🔹 Relational (SQL) – Great for structured data and ACID compliance
🔹 Key-Value / In-Memory – Speed first (e.g., Redis)
🔹 Time Series – Optimized for time-stamped data
🔹 Document / JSON – Flexible schema (e.g., MongoDB)
🔹 Graph – Best for relationships (e.g., Neo4j)
🔹 Blob / Text Search / Geospatial / Ledger – Specialized needs
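
As a small illustration of why the workload matters, the Python sketch below contrasts two of the types above. Treat it as a toy under stated assumptions: a plain dict stands in for a key-value store such as Redis, SQLite (from the standard library) stands in for a relational database, and the table and function names are made up.

```python
import sqlite3

# Key-value style: one lookup by key, no schema, no multi-row guarantees.
session_cache = {"session:42": "alice"}


def open_bank():
    """Relational style: structured rows in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
    except ValueError:
        return False
    return True


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

The dict lookup is as fast as it gets, but only the relational side guarantees that a failed transfer leaves both balances untouched.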

💬 Which databases have you used? How did they perform for your workload?

Thanks to Satish Chandra Gupta for the visual inspiration!

🔐 Unique ID Generator – A Must for Scalable Systems

Large-scale systems like Facebook, Twitter, and LinkedIn need unique IDs that meet tough requirements:

🔹 Globally unique
🔹 Roughly time-sorted
🔹 Numeric-only
🔹 64-bit
🔹 Low-latency & scalable

Think of this as the backbone of tweet IDs, post IDs, and user IDs. The implementation details vary, but the goal remains the same: fast, distributed, and conflict-free identity.
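
One common way to meet these requirements is a Snowflake-style layout. The Python sketch below assumes a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit sequence number packed into one 64-bit integer. The custom epoch, class name, and injectable clock are illustrative choices for this post, not any company's production implementation.

```python
import threading
import time

# Assumption: an arbitrary custom epoch in milliseconds (September 2020).
EPOCH_MS = 1_600_000_000_000
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeLikeGenerator:
    """Packs timestamp | worker ID | sequence into one integer.

    The result fits in 64 bits as long as the timestamp fits in 41 bits.
    """

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )


if __name__ == "__main__":
    generator = SnowflakeLikeGenerator(worker_id=7)
    print([generator.next_id() for _ in range(3)])
```

Because the timestamp sits in the high bits, IDs from one worker sort by creation time, and the worker ID keeps machines from colliding without any coordination between them.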

💬 What kind of ID generation strategies have you used (UUIDs, Snowflake, etc.)?

Let’s keep learning from real-world architectures and build scalable, resilient systems together!