Akshat Jain

Posted on Mar 5 • Originally published at Medium

Unique ID Generator: Twitter Snowflake Design

#distributedsystems #identity #snowflake #unique

In large distributed systems, something that looks simple on the surface — generating a unique ID — becomes surprisingly hard.

When millions of users are posting, liking, commenting, and messaging at the same time across thousands of servers, how do you generate IDs without conflicts, delays, or bottlenecks?

This is the problem Twitter faced.

Their solution is called Snowflake.

This article takes a deep, theoretical, and beginner-friendly look at Twitter Snowflake:

Why it exists
How it works internally
How its bit-level design enables scale
Why it’s still widely used today

The Problem: Why ID Generation Is Hard in Distributed Systems

On a single machine, generating IDs is easy:

Use an auto-increment integer
Use a database sequence

But in distributed systems, things break quickly.

Traditional Approaches and Their Problems

1. Central Database Auto-Increment

All services ask one database for IDs
Becomes a single point of failure
Doesn’t scale

2. UUIDs (Universally Unique Identifiers)

Very low collision chance

But:

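Long (128 bits) and hard to read
Not time sortable in the common random (version 4) form
Random values scatter inserts across a database index, which hurts index performance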
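To make the size and ordering points concrete, here is a small sketch using Python's standard `uuid` module. It only looks at random (version 4) UUIDs, and the variable names are just for illustration.

```python
import uuid

# Generate a few random (version 4) UUIDs in order of creation.
generated = [uuid.uuid4() for _ in range(5)]

# A UUID is 128 bits: twice the width of a 64-bit integer key.
uuid_bits = len(generated[0].bytes) * 8

# Its canonical text form is 36 characters (32 hex digits plus 4 hyphens).
text_length = len(str(generated[0]))

# Sorting random UUIDs by value gives an order unrelated to creation order,
# because the value carries no usable timestamp.
sorted_by_value = sorted(generated, key=lambda u: u.int)

print(uuid_bits, text_length)
print(sorted_by_value == generated)  # usually False for random UUIDs
```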

3. Coordination-Based Systems

Use locks or consensus
Adds latency
Reduces throughput

Twitter needed something better.

What Is Twitter Snowflake?

Twitter Snowflake is a distributed unique ID generator that creates 64-bit integers with special properties.

Each generated ID is:

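✅ Globally unique: no collisions across machines or data centers, as long as every machine has a distinct machine ID
✅ Time sortable: newer IDs are larger than older ones, up to clock skew between machines and ties within the same millisecond
✅ High performance: up to 4096 IDs per millisecond per machine
✅ Decentralized: no coordination needed while generating IDs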

This makes Snowflake ideal for large-scale systems like Twitter.

High-Level Idea Behind Snowflake

Snowflake embeds time and machine information directly into the ID itself.

Instead of storing metadata separately, the ID is the metadata.

At a high level:

Part of the ID represents time
Part represents which machine generated it
Part represents a counter for that millisecond

This design guarantees uniqueness without communication between servers at generation time, provided each server has its own machine ID.

Snowflake ID Structure (64-bit Layout)

Snowflake uses a fixed 64-bit integer.

Each bit has a purpose.

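Standard Snowflake Bit Allocation

From the most significant bit to the least significant:

1 bit: sign (always 0)
41 bits: timestamp (milliseconds since a custom epoch)
10 bits: machine ID
12 bits: sequence number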

Deep Explanation of Each Component

i. Sign Bit (1 bit)

Always set to 0
Ensures the ID is positive
Allows compatibility with signed 64-bit integers

This bit is unused but reserved.
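As a quick check of the compatibility claim, the sketch below uses Python's `struct` module to pack values as signed 64-bit integers. The constant name is an assumption made for this example.

```python
import struct

# Largest value a Snowflake ID can take while the sign bit stays 0:
# the remaining 63 bits are all ones.
MAX_SNOWFLAKE_ID = (1 << 63) - 1

# ">q" packs a signed 64-bit big-endian integer. This succeeds because
# the top bit is clear.
packed = struct.pack(">q", MAX_SNOWFLAKE_ID)
print(len(packed))

# With the top bit set, the value no longer fits in a signed 64-bit field.
try:
    struct.pack(">q", 1 << 63)
    fits_when_sign_bit_set = True
except struct.error:
    fits_when_sign_bit_set = False

print(fits_when_sign_bit_set)
```

Languages and databases whose 64-bit integer type is signed can therefore store any Snowflake ID as a non-negative number.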

ii. Timestamp (41 bits)

The timestamp stores:

Milliseconds elapsed since a custom epoch

Why Not Unix Epoch?

Unix epoch starts at 1970
Wastes bits storing old time
Custom epoch starts closer to system creation

Capacity

41 bits can represent:
~2.2 trillion milliseconds
~69 years

This is more than enough for long-running systems.
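The capacity figures can be reproduced with a few lines of arithmetic. This sketch assumes the Jan 1, 2023 custom epoch used in the pseudocode later in this article; a different epoch only shifts the end date.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_BITS = 41

# Number of distinct millisecond values a 41-bit field can hold.
max_millis = 1 << TIMESTAMP_BITS

# Convert to years using the average Gregorian year length.
MILLIS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25
years = max_millis / MILLIS_PER_YEAR

# Assumed custom epoch (Jan 1, 2023 UTC).
epoch = datetime(2023, 1, 1, tzinfo=timezone.utc)
exhausted_at = epoch + timedelta(milliseconds=max_millis)

print(max_millis)         # 2199023255552
print(round(years, 1))    # 69.7
print(exhausted_at.year)  # 2092
```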

iii. Machine ID (10 bits)

This identifies which machine generated the ID.

10 bits → 1024 unique machines
Often split internally:
Data center ID
Worker ID

As long as no two machines share a machine ID, this guarantees uniqueness across servers.
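One way to picture the internal split is to pack a data center ID and a worker ID into the 10-bit field. The 5/5 split and the function names below are assumptions for illustration; a deployment can divide the 10 bits differently.

```python
from typing import Tuple

DATACENTER_BITS = 5  # assumed split: 5 bits for the data center
WORKER_BITS = 5      # and 5 bits for the worker


def make_machine_id(datacenter_id: int, worker_id: int) -> int:
    """Pack a data center ID and a worker ID into one 10-bit machine ID."""
    if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
        raise ValueError("datacenter_id out of range")
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    return (datacenter_id << WORKER_BITS) | worker_id


def split_machine_id(machine_id: int) -> Tuple[int, int]:
    """Recover (datacenter_id, worker_id) from a packed machine ID."""
    return machine_id >> WORKER_BITS, machine_id & ((1 << WORKER_BITS) - 1)


machine_id = make_machine_id(3, 17)
print(machine_id)                    # 113
print(split_machine_id(machine_id))  # (3, 17)
```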

iv. Sequence Number (12 bits)

This handles multiple IDs in the same millisecond.

12 bits → 4096 IDs per millisecond per machine
Reset every new millisecond

This is what gives Snowflake its massive throughput.
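The throughput ceiling follows directly from the bit widths. The numbers below are theoretical upper bounds under the standard layout, not measured performance.

```python
SEQUENCE_BITS = 12
MACHINE_BITS = 10

# Distinct sequence values available within one millisecond.
ids_per_ms = 1 << SEQUENCE_BITS

# Upper bound for a single machine over one second.
ids_per_second_per_machine = ids_per_ms * 1000

# Upper bound across every possible machine ID.
max_machines = 1 << MACHINE_BITS
ids_per_second_cluster = ids_per_second_per_machine * max_machines

print(ids_per_ms)                  # 4096
print(ids_per_second_per_machine)  # 4096000
print(ids_per_second_cluster)      # 4194304000
```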

How Snowflake Works (Step by Step)

Let’s walk through the algorithm logically.

Step 1: Get Current Time

Current timestamp in milliseconds

Step 2: Compare with Last Timestamp

If new millisecond → reset sequence
If same millisecond → increment sequence
If clock moved backwards → error or wait

Step 3: Handle Sequence Overflow

If the sequence would pass 4095 (all 4096 values for this millisecond are used)
Wait until the next millisecond and restart the sequence at 0
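Steps 2 and 3 can be written as a small pure function that takes the previous state and the current time. This is a sketch: the `advance` name and the `must_wait` flag are assumptions for this example, chosen so the logic can be checked without a real clock.

```python
MAX_SEQUENCE = 4095  # largest value a 12-bit sequence can hold


def advance(last_ts, sequence, now):
    """Return (timestamp, sequence, must_wait) for the next ID.

    must_wait is True when the sequence is exhausted and the caller
    should wait for the next millisecond before trying again.
    """
    if now < last_ts:
        # Clock moved backwards: refuse rather than risk duplicates.
        raise RuntimeError("clock moved backwards")
    if now > last_ts:
        # New millisecond: the sequence starts over.
        return now, 0, False
    if sequence < MAX_SEQUENCE:
        # Same millisecond: take the next sequence value.
        return now, sequence + 1, False
    # Same millisecond and no sequence values left.
    return now, sequence, True


print(advance(100, 5, 101))     # (101, 0, False)
print(advance(100, 5, 100))     # (100, 6, False)
print(advance(100, 4095, 100))  # (100, 4095, True)
```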

Step 4: Assemble the ID

Shift the timestamp (minus the custom epoch) left by 22 bits
Shift the machine ID left by 12 bits
Leave the sequence in the lowest 12 bits
Combine the three using bitwise OR

Step 5: Return ID

Single 64-bit integer
Unique and sortable
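The assembly step on its own looks roughly like this. The helper name and the range checks are assumptions added for the example; the shift amounts follow from the 41/10/12 layout.

```python
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12
MACHINE_SHIFT = SEQUENCE_BITS                   # 12
TIMESTAMP_SHIFT = MACHINE_BITS + SEQUENCE_BITS  # 22


def assemble_id(millis_since_epoch, machine_id, sequence):
    """Combine the three fields into one 64-bit Snowflake-style ID."""
    if not 0 <= millis_since_epoch < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id does not fit in 10 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence does not fit in 12 bits")
    return (
        (millis_since_epoch << TIMESTAMP_SHIFT)
        | (machine_id << MACHINE_SHIFT)
        | sequence
    )


# 1 << 22 is 4194304, 1 << 12 is 4096, plus a sequence of 1.
print(assemble_id(1, 1, 1))  # 4198401

# With every field at its maximum, all 63 usable bits are set.
print(assemble_id((1 << 41) - 1, 1023, 4095) == (1 << 63) - 1)  # True
```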

Snowflake Pseudocode (Theoretical Implementation)

import threading
import time


class SnowflakeGenerator:
    def __init__(self, machine_id):
        self.machine_id = machine_id & 0x3FF  # keep only the low 10 bits
        self.sequence = 0
        self.last_timestamp = -1
        self.lock = threading.Lock()
        self.epoch = 1672531200000  # custom epoch in ms (Jan 1, 2023 UTC)

    def _current_millis(self):
        return int(time.time() * 1000)

    def _wait_next_millis(self, last_ts):
        # Spin until the clock moves past last_ts.
        ts = self._current_millis()
        while ts <= last_ts:
            ts = self._current_millis()
        return ts

    def next_id(self):
        # Hold the lock for the whole read-modify-write of the generator
        # state, so concurrent callers never see the same sequence value.
        with self.lock:
            ts = self._current_millis()

            if ts < self.last_timestamp:
                raise Exception("Clock moved backwards!")

            if ts == self.last_timestamp:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # All 4096 values used; wait for the next millisecond.
                    ts = self._wait_next_millis(self.last_timestamp)
            else:
                self.sequence = 0

            self.last_timestamp = ts

            return ((ts - self.epoch) << 22) | (self.machine_id << 12) | self.sequence

Why Bit Shifting Matters (Theory)

Snowflake relies heavily on bit manipulation.

Why Shift Left?

Shifting moves values into their bit positions
Prevents overlap between fields
Makes decoding possible

Example:

Timestamp occupies highest bits
Machine ID sits in the middle
Sequence stays at the bottom

This design is compact, fast, and deterministic.
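Because the fields never overlap, decoding is just the reverse: shift right and mask. A minimal sketch, assuming the standard 41/10/12 layout (the function name is made up for this example):

```python
def decode_id(snowflake_id):
    """Split an ID into (millis_since_epoch, machine_id, sequence)."""
    sequence = snowflake_id & 0xFFF            # lowest 12 bits
    machine_id = (snowflake_id >> 12) & 0x3FF  # next 10 bits
    millis_since_epoch = snowflake_id >> 22    # remaining high bits
    return millis_since_epoch, machine_id, sequence


# Build an ID by hand, then take it apart again.
example_id = (123456 << 22) | (42 << 12) | 7
print(decode_id(example_id))  # (123456, 42, 7)
```

Adding the custom epoch back to the first value gives the wall-clock time at which the ID was created.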

Advantages of Snowflake

i. Scalability

Thousands of IDs per millisecond
Linear scaling with machines

ii. No Central Coordination

Machines generate IDs independently
No network calls

iii. Time Ordering

IDs sort naturally by creation time
Great for databases and logs
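The ordering property can be seen by sorting a handful of hand-built IDs as plain integers. The field values below are invented for the example.

```python
def make_id(millis, machine_id, sequence):
    """Assemble an ID using the standard 41/10/12 layout."""
    return (millis << 22) | (machine_id << 12) | sequence


# (millis since epoch, machine ID, sequence), listed out of time order.
parts = [(1002, 5, 0), (1000, 900, 3), (1001, 1, 4095), (1000, 2, 0)]
ids = [make_id(*p) for p in parts]

# Sorting the raw integers orders them by timestamp first.
ordered = sorted(ids)
timestamps = [i >> 22 for i in ordered]
machines = [(i >> 12) & 0x3FF for i in ordered]

print(timestamps)  # [1000, 1000, 1001, 1002]
print(machines)    # [2, 900, 1, 5]
```

Note the two IDs from millisecond 1000: they are ordered by machine ID, not by which one was actually created first, so the ordering is only precise down to the millisecond.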

iv. Storage Efficiency

64-bit integers
Smaller and faster than UUIDs

Limitations and Edge Cases

i. Clock Rollback

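If the system clock goes backward (for example after an NTP correction)
Can break ordering and, if unhandled, produce duplicate IDs
Must be detected and handled, typically by waiting or raising an error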
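One possible way to handle rollback is to wait out small backward jumps and fail on large ones. This is a sketch, not how any particular implementation does it: the 5 ms tolerance, the function name, and the injectable `now_fn` and `sleep_fn` parameters are all assumptions for this example.

```python
import time

MAX_BACKWARD_MS = 5  # assumed tolerance; tune for your environment


def wait_out_rollback(last_ts, now_fn, sleep_fn=time.sleep):
    """Return a timestamp >= last_ts, waiting through small rollbacks.

    Raises RuntimeError if the clock is behind by more than
    MAX_BACKWARD_MS milliseconds.
    """
    now = now_fn()
    if now >= last_ts:
        return now
    drift = last_ts - now
    if drift > MAX_BACKWARD_MS:
        raise RuntimeError(f"clock moved backwards by {drift} ms")
    # Sleep for the missing time, then poll until the clock catches up.
    sleep_fn(drift / 1000)
    now = now_fn()
    while now < last_ts:
        now = now_fn()
    return now


# Demo with a fake clock that reads 98, then 99, then 100.
fake_clock = iter([98, 99, 100])
print(wait_out_rollback(100, lambda: next(fake_clock), sleep_fn=lambda s: None))  # 100
```

Passing the clock in as a function keeps the logic testable without touching the real system time.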

ii. Machine ID Management

Each machine ID must be unique across the whole fleet
Requires configuration or coordination
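The simplest form of configuration is a per-host setting read at startup. The sketch below assumes a hypothetical environment variable named `SNOWFLAKE_MACHINE_ID`; the name and the helper function are invented for this example.

```python
import os

ENV_VAR = "SNOWFLAKE_MACHINE_ID"  # hypothetical variable name
MAX_MACHINE_ID = 1023             # largest value that fits in 10 bits


def load_machine_id(env=None):
    """Read and validate the machine ID from an environment mapping."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None:
        raise RuntimeError(f"{ENV_VAR} is not set")
    machine_id = int(raw)
    if not 0 <= machine_id <= MAX_MACHINE_ID:
        raise ValueError(f"{ENV_VAR} must be between 0 and {MAX_MACHINE_ID}")
    return machine_id


print(load_machine_id({ENV_VAR: "42"}))  # 42
```

Rejecting out-of-range values, rather than masking them down to 10 bits, avoids two machines silently ending up with the same ID.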

iii. Fixed Bit Limits

1024 machines max (by default)
Design must be adjusted for larger clusters

Variants and Extensions

Many systems customize Snowflake:

Add data center ID
Use base62 encoding for shorter strings
Use logical clocks
Combine with Zookeeper / Redis
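As an example of the base62 variant, here is a minimal encoder and decoder. The alphabet order (digits, then uppercase, then lowercase) is an assumption; systems differ on this, and both sides must agree on it.

```python
import string

# 62 characters: 0-9, A-Z, a-z (assumed ordering).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(n):
    """Encode a non-negative integer as a base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(s):
    """Decode a base62 string back into an integer."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


largest_id = (1 << 63) - 1
encoded = base62_encode(largest_id)
print(len(str(largest_id)), len(encoded))    # 19 11
print(base62_decode(encoded) == largest_id)  # True
```

Even the largest possible ID shrinks from 19 decimal digits to 11 characters, which is handy for URLs.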

Popular systems inspired by Snowflake:

Instagram
Discord
Flake IDs

When Should You Use Snowflake?

Use Snowflake when:

You need distributed ID generation
You want time-orderable IDs
You need high throughput
You want database-friendly keys

Avoid it when:

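You don't control the clocks on your machines
You need unpredictable (cryptographically random) IDs, since Snowflake IDs reveal their creation time and are easy to guess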

Final Thoughts

Twitter Snowflake is a beautiful example of systems thinking.

It shows how:

Bit-level design
Time-based logic
Distributed systems constraints

can come together into a simple but powerful solution.

Understanding Snowflake doesn’t just teach ID generation — it teaches how to think like a system engineer.

DEV Community