Akshat Jain

Posted on • Originally published at Medium

Unique ID Generator: Twitter Snowflake Design

In large distributed systems, something that looks simple on the surface — generating a unique ID — becomes surprisingly hard.

When millions of users are posting, liking, commenting, and messaging at the same time across thousands of servers, how do you generate IDs without conflicts, delays, or bottlenecks?

This is the problem Twitter faced.

Their solution is called Snowflake.

This article takes a deep, theoretical, and beginner-friendly look at Twitter Snowflake:

  • Why it exists
  • How it works internally
  • How its bit-level design enables scale
  • Why it’s still widely used today

The Problem: Why ID Generation Is Hard in Distributed Systems

In a single-machine system, generating IDs is easy:

  • Use an auto-increment integer
  • Use a database sequence

But in distributed systems, things break quickly.

Traditional Approaches and Their Problems

1. Central Database Auto-Increment

  • All services ask one database for IDs
  • Becomes a single point of failure
  • Doesn’t scale

2. UUIDs (Universally Unique Identifiers)

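  • Very low collision chance

But:

  • 128 bits long, twice the size of a 64-bit integer, and hard to read
  • Not time sortable (true of random, version 4 UUIDs)
  • Poor index performance in databases, because random values scatter inserts across the index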

3. Coordination-Based Systems

  • Use locks or consensus
  • Adds latency
  • Reduces throughput

Twitter needed something better.

What Is Twitter Snowflake?

Twitter Snowflake is a distributed unique ID generator that creates 64-bit integers with special properties.

Each generated ID is:

  • Globally unique: no collisions across machines or data centers
  • Time sortable: newer IDs are larger than older ones (exactly on one machine, approximately across machines, since clocks differ slightly)
  • High performance: up to 4096 IDs per millisecond per machine
  • Decentralized: no central coordination required while generating IDs

This makes Snowflake ideal for large-scale systems like Twitter.

High-Level Idea Behind Snowflake

Snowflake embeds time and machine information directly into the ID itself.

Instead of storing metadata separately, the ID is the metadata.

At a high level:

  • Part of the ID represents time
  • Part represents which machine generated it
  • Part represents a counter for that millisecond

As long as every machine has a distinct machine ID, this design guarantees uniqueness without communication between servers.

Snowflake ID Structure (64-bit Layout)

Snowflake uses a fixed 64-bit integer.

Each bit has a purpose.

Standard Snowflake Bit Allocation

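  • Sign: 1 bit (bit 63), always 0
  • Timestamp: 41 bits (bits 22 to 62), milliseconds since a custom epoch
  • Machine ID: 10 bits (bits 12 to 21)
  • Sequence: 12 bits (bits 0 to 11)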
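The standard layout (1 sign bit, 41 timestamp bits, 10 machine ID bits, 12 sequence bits) can be written down as a handful of constants. This is a minimal sketch; the constant names are illustrative and not taken from Twitter’s code.

```python
# Field widths in bits, following the standard layout described in this article.
SIGN_BITS = 1
TIMESTAMP_BITS = 41
MACHINE_ID_BITS = 10
SEQUENCE_BITS = 12

# How far each field is shifted left so that the fields never overlap.
MACHINE_ID_SHIFT = SEQUENCE_BITS                    # 12
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_ID_BITS   # 22

# Largest value each field can hold.
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1       # 4095
MAX_MACHINE_ID = (1 << MACHINE_ID_BITS) - 1   # 1023
MAX_TIMESTAMP = (1 << TIMESTAMP_BITS) - 1     # 2199023255551

total_bits = SIGN_BITS + TIMESTAMP_BITS + MACHINE_ID_BITS + SEQUENCE_BITS
print(total_bits)  # 64
```

Deriving the shifts from the widths keeps the fields consistent if you ever rebalance the layout, for example trading sequence bits for more machine ID bits.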

Deep Explanation of Each Component

i. Sign Bit (1 bit)

  • Always set to 0
  • Ensures the ID is positive
  • Allows compatibility with signed 64-bit integers

This bit is unused but reserved.
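To see why leaving the top bit at 0 matters, here is a small check (a sketch, using Python’s struct module as a stand-in for any signed 64-bit type such as a Java long or a SQL BIGINT): even with every other field at its maximum, the ID still fits.

```python
import struct

# Fill every field with its maximum value: 41 timestamp bits,
# 10 machine ID bits, 12 sequence bits.
largest_id = (((1 << 41) - 1) << 22) | (((1 << 10) - 1) << 12) | ((1 << 12) - 1)

# With the sign bit left at 0, the largest possible ID is exactly
# the largest signed 64-bit integer.
print(largest_id == 2**63 - 1)  # True

# struct's "q" format is a signed 64-bit integer; packing succeeds
# because the value is in range.
packed = struct.pack(">q", largest_id)
print(len(packed))  # 8
```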

ii. Timestamp (41 bits)

The timestamp stores:

Milliseconds elapsed since a custom epoch

Why Not Unix Epoch?

  • Unix epoch starts at 1970
  • Wastes bits storing old time
  • Custom epoch starts closer to system creation

Capacity

  • 41 bits can represent 2^41 ≈ 2.2 trillion milliseconds, or about 69 years

This is more than enough for long-running systems.
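The capacity figures can be checked with a few lines of arithmetic. The sketch below assumes a custom epoch of Jan 1, 2023 (UTC), the same one used in this article’s pseudocode, and works out when the 41-bit timestamp would run out.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_BITS = 41
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25

capacity_ms = 1 << TIMESTAMP_BITS
capacity_years = capacity_ms / MS_PER_YEAR
print(capacity_ms)               # 2199023255552
print(round(capacity_years, 1))  # 69.7

# Assumed custom epoch for illustration: Jan 1, 2023 00:00:00 UTC.
epoch = datetime(2023, 1, 1, tzinfo=timezone.utc)
exhausted_at = epoch + timedelta(milliseconds=capacity_ms)
print(exhausted_at.year)  # 2092
```

With the Unix epoch instead, the same 41 bits would run out around 2039, which is why a recent custom epoch buys extra decades.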

iii. Machine ID (10 bits)

This identifies which machine generated the ID.

  • 10 bits → 1024 unique machines
  • Often split internally into a data center ID and a worker ID (5 bits each in Twitter’s original layout)

This guarantees uniqueness across servers.
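One way to build the 10-bit machine ID from two smaller IDs is sketched below. It assumes a 5/5 split between data center and worker; the function names are made up for this example.

```python
DATACENTER_BITS = 5
WORKER_BITS = 5


def make_machine_id(datacenter_id: int, worker_id: int) -> int:
    """Pack a data center ID and a worker ID into one 10-bit machine ID."""
    if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
        raise ValueError("datacenter_id must be in 0..31")
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id must be in 0..31")
    return (datacenter_id << WORKER_BITS) | worker_id


def split_machine_id(machine_id: int) -> tuple:
    """Recover (datacenter_id, worker_id) from a 10-bit machine ID."""
    return machine_id >> WORKER_BITS, machine_id & ((1 << WORKER_BITS) - 1)


machine_id = make_machine_id(datacenter_id=3, worker_id=17)
print(machine_id)                    # 113
print(split_machine_id(machine_id))  # (3, 17)
```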

iv. Sequence Number (12 bits)

This handles multiple IDs in the same millisecond.

  • 12 bits → 4096 IDs per millisecond per machine
  • Reset every new millisecond

This is what gives Snowflake its massive throughput.
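To put numbers on that throughput, here is the arithmetic as a short sketch. These are theoretical ceilings implied by the bit widths, not measured figures.

```python
SEQUENCE_BITS = 12
MACHINE_ID_BITS = 10

ids_per_ms = 1 << SEQUENCE_BITS       # 4096 per machine per millisecond
ids_per_second = ids_per_ms * 1000    # per machine per second
machines = 1 << MACHINE_ID_BITS       # 1024 machines
cluster_ids_per_second = ids_per_second * machines

print(ids_per_second)          # 4096000
print(cluster_ids_per_second)  # 4194304000
```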

How Snowflake Works (Step by Step)

Let’s walk through the algorithm logically.

Step 1: Get Current Time

  • Current timestamp in milliseconds

Step 2: Compare with Last Timestamp

  • If new millisecond → reset sequence
  • If same millisecond → increment sequence
  • If clock moved backwards → error or wait

Step 3: Handle Sequence Overflow

  • If the sequence would pass 4095 (all 4096 values for this millisecond are used), wait until the next millisecond

Step 4: Assemble the ID

  • Shift timestamp left
  • Shift machine ID
  • Add sequence
  • Combine using bitwise OR

Step 5: Return ID

  • Single 64-bit integer
  • Unique and sortable
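The assembly in Step 4 can be tried out with concrete numbers. The sketch below uses hand-picked values and an assumed custom epoch of Jan 1, 2023 (UTC); compose_id is a name chosen for this example.

```python
EPOCH_MS = 1672531200000  # Jan 1, 2023 00:00:00 UTC (assumed custom epoch)


def compose_id(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Combine the three fields into one Snowflake-style 64-bit ID."""
    elapsed = timestamp_ms - EPOCH_MS
    return (elapsed << 22) | (machine_id << 12) | sequence


# Hand-picked example: exactly 1000 ms after the epoch, machine 5,
# third ID issued in that millisecond (sequence 2).
snowflake = compose_id(EPOCH_MS + 1000, machine_id=5, sequence=2)
print(snowflake)       # 4194324482
print(bin(snowflake))  # the low 22 bits hold machine ID and sequence
```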

Snowflake Pseudocode (Theoretical Implementation)

import threading
import time


class SnowflakeGenerator:
    def __init__(self, machine_id):
        self.machine_id = machine_id & 0x3FF  # keep only the low 10 bits
        self.sequence = 0
        self.last_timestamp = -1
        self.lock = threading.Lock()
        self.epoch = 1672531200000  # custom epoch: Jan 1, 2023 00:00:00 UTC, in ms

    def _current_millis(self):
        return int(time.time() * 1000)

    def _wait_next_millis(self, last_ts):
        # Busy-wait until the clock moves past last_ts
        ts = self._current_millis()
        while ts <= last_ts:
            ts = self._current_millis()
        return ts

    def next_id(self):
        # Everything that reads or updates shared state stays inside the lock
        with self.lock:
            ts = self._current_millis()

            if ts < self.last_timestamp:
                raise Exception("Clock moved backwards!")

            if ts == self.last_timestamp:
                # Same millisecond: bump the 12-bit sequence
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # All 4096 values used; move on to the next millisecond
                    ts = self._wait_next_millis(self.last_timestamp)
            else:
                # New millisecond: start the sequence over
                self.sequence = 0

            self.last_timestamp = ts

            return ((ts - self.epoch) << 22) | (self.machine_id << 12) | self.sequence


Why Bit Shifting Matters (Theory)

Snowflake relies heavily on bit manipulation.

Why Shift Left?

  • Shifting moves values into their bit positions
  • Prevents overlap between fields
  • Makes decoding possible

Example:

  • Timestamp occupies highest bits
  • Machine ID sits in the middle
  • Sequence stays at the bottom

This design is compact, fast, and deterministic.
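Because each field lives in a fixed bit range, decoding is just the reverse: shift right and mask. A minimal sketch, assuming the same Jan 1, 2023 custom epoch as this article’s pseudocode (decode_id is a name invented here):

```python
EPOCH_MS = 1672531200000  # assumed custom epoch: Jan 1, 2023 00:00:00 UTC


def decode_id(snowflake: int) -> dict:
    """Split a Snowflake-style ID back into its three fields."""
    return {
        "timestamp_ms": (snowflake >> 22) + EPOCH_MS,  # top 41 bits
        "machine_id": (snowflake >> 12) & 0x3FF,       # middle 10 bits
        "sequence": snowflake & 0xFFF,                 # bottom 12 bits
    }


# Build an ID by hand: 1000 ms after the epoch, machine 5, sequence 2.
snowflake = (1000 << 22) | (5 << 12) | 2
fields = decode_id(snowflake)
print(fields)
# {'timestamp_ms': 1672531201000, 'machine_id': 5, 'sequence': 2}
```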

Advantages of Snowflake

i. Scalability

  • Up to 4096 IDs per millisecond per machine
  • Linear scaling with machines

ii. No Central Coordination

  • Machines generate IDs independently
  • No network calls

iii. Time Ordering

  • IDs sort naturally by creation time
  • Great for databases and logs
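A quick way to convince yourself of the ordering property: build a few IDs in creation order, shuffle them, and sort numerically. This is a sketch with made-up field values.

```python
import random


def compose_id(elapsed_ms: int, machine_id: int, sequence: int) -> int:
    return (elapsed_ms << 22) | (machine_id << 12) | sequence


# (elapsed_ms, machine_id, sequence), listed in creation order.
events = [(100, 7, 0), (100, 7, 1), (101, 2, 0), (250, 7, 0)]
ids = [compose_id(*event) for event in events]

shuffled = ids[:]
random.shuffle(shuffled)
print(sorted(shuffled) == ids)  # True: numeric order matches creation order
```

Note the limits of this guarantee. Two IDs created in the same millisecond on different machines sort by machine ID, not by which one was really first, and clocks on different machines are never perfectly in sync.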

iv. Storage Efficiency

  • 64-bit integers
  • Half the size of a 128-bit UUID, and faster to compare and index
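The size difference is easy to measure. A small sketch using only the standard library:

```python
import struct
import uuid

snowflake = (1000 << 22) | (5 << 12) | 2        # hand-built example ID
snowflake_bytes = struct.pack(">q", snowflake)  # signed 64-bit, big-endian
uuid_value = uuid.uuid4()

print(len(snowflake_bytes))   # 8
print(len(uuid_value.bytes))  # 16
print(len(str(uuid_value)))   # 36 characters in canonical text form
```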

Limitations and Edge Cases

i. Clock Rollback

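  • If the system clock goes backward (for example, after a time-sync correction), a machine can see a timestamp it has already used
  • This can break ordering and, without a guard, produce duplicate IDs
  • The generator must detect it and either wait or refuse to issue IDs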
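One possible policy is to absorb small rollbacks by waiting and to fail loudly on large ones. The sketch below is one way to do that, not how Twitter’s implementation handles it; MAX_BACKWARD_MS, ClockMovedBackwards, and safe_timestamp are names and values invented for this example.

```python
import time

MAX_BACKWARD_MS = 5  # hypothetical tolerance; tune for your environment


class ClockMovedBackwards(Exception):
    pass


def safe_timestamp(last_ts: int, now_ms=lambda: int(time.time() * 1000)) -> int:
    """Return a timestamp that is never smaller than last_ts.

    Small rollbacks are absorbed by waiting; large ones raise.
    """
    ts = now_ms()
    if ts >= last_ts:
        return ts
    drift = last_ts - ts
    if drift > MAX_BACKWARD_MS:
        raise ClockMovedBackwards(f"clock is {drift} ms behind the last issued ID")
    # Small rollback: sleep in 1 ms steps until the clock catches up.
    while ts < last_ts:
        time.sleep(0.001)
        ts = now_ms()
    return ts


# Simulated clock that starts 2 ms behind and then catches up.
fake_clock = iter([98, 99, 100])
print(safe_timestamp(100, now_ms=lambda: next(fake_clock)))  # 100
```

Passing the clock in as a parameter is only there to make the behavior testable without touching the real system time.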

ii. Machine ID Management

  • Every machine ID must be unique across the cluster
  • Requires configuration or coordination
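The simplest form of configuration is a value set per machine at deploy time. Here is a minimal sketch that reads it from an environment variable; the variable name SNOWFLAKE_MACHINE_ID is hypothetical, chosen for this example.

```python
import os

MAX_MACHINE_ID = (1 << 10) - 1  # 1023


def machine_id_from_env(var_name: str = "SNOWFLAKE_MACHINE_ID") -> int:
    """Read and validate the machine ID from an environment variable."""
    raw = os.environ.get(var_name)
    if raw is None:
        raise RuntimeError(f"{var_name} is not set")
    machine_id = int(raw)
    if not 0 <= machine_id <= MAX_MACHINE_ID:
        raise ValueError(f"{var_name} must be in 0..{MAX_MACHINE_ID}, got {machine_id}")
    return machine_id


os.environ["SNOWFLAKE_MACHINE_ID"] = "42"  # normally set by deployment tooling
print(machine_id_from_env())  # 42
```

Rejecting out-of-range values, rather than masking them down to 10 bits, avoids two machines silently ending up with the same ID.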

iii. Fixed Bit Limits

  • 1024 machines max (by default)
  • Design must be adjusted for larger clusters

Variants and Extensions

Many systems customize Snowflake:

  • Add data center ID
  • Use base62 encoding for shorter strings
  • Use logical clocks
  • Combine with Zookeeper / Redis
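As an illustration of the base62 variant, here is a small encoder and decoder. The alphabet order (digits, then uppercase, then lowercase) is an assumption; real systems pick their own.

```python
import string

# 62 symbols: 0-9, A-Z, a-z (this ordering is an assumption)
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(n: int) -> str:
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


largest_id = 2**63 - 1
encoded = base62_encode(largest_id)
print(len(str(largest_id)), len(encoded))    # 19 11
print(base62_decode(encoded) == largest_id)  # True
```

So even the largest possible ID shrinks from 19 decimal digits to 11 characters, which is handy for URLs.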

Popular systems inspired by Snowflake:

  • Instagram
  • Discord
  • Flake IDs

When Should You Use Snowflake?

Use Snowflake when:

  • You need distributed ID generation
  • You want time-orderable IDs
  • You need high throughput
  • You want database-friendly keys

Avoid it when:

  • You don’t control clocks
  • You need unpredictable, cryptographically random IDs (Snowflake IDs reveal their creation time and are easy to guess)

Final Thoughts

Twitter Snowflake is a beautiful example of systems thinking.

It shows how bit-level design, time-based logic, and distributed systems constraints can come together into a simple but powerful solution.

Understanding Snowflake doesn’t just teach ID generation; it teaches how to think like a systems engineer.
