Lucas Lomeu

Posted on Feb 10

Building a Scalable URL Shortener

#systemdesign #architecture #java #performance

I thought a URL shortener was just a simple CRUD app. It turned out to be the exact opposite, so here’s how I designed and built mine.

The Problem

Let's understand what we're building:

What it does:

Convert https://website.com/very/long/url into short.com/aB3x
Redirect users from short URL back to the original
Handle 1,000+ writes/sec and 10,000+ reads/sec
Never generate duplicate short codes

What makes it hard:

IDs must be globally unique - no collisions
Read-heavy workload - 10:1 read/write ratio
Must scale horizontally

The Math Behind It

Let's do some napkin math to understand scale:

100 million URLs/day / 86,400 seconds per day = ~1,160 writes/sec
10:1 read ratio - 1,160 writes/sec x 10 = ~11,600 reads/sec
Over 10 years - 100M URLs/day x 365 days x 10 years = ~365 billion URLs
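
To double-check the napkin math, here's a minimal sketch of the same arithmetic in Java. The class and constant names are illustrative only and are not part of the project.

```java
/** Back-of-the-envelope estimates. All names here are illustrative. */
final class NapkinMath {
    static final long URLS_PER_DAY = 100_000_000L;      // assumed daily write volume
    static final long SECONDS_PER_DAY = 24L * 60 * 60;  // 86,400
    static final int READS_PER_WRITE = 10;              // 10:1 read/write ratio

    /** Average writes per second, rounded down. */
    static long writesPerSecond() {
        return URLS_PER_DAY / SECONDS_PER_DAY;
    }

    /** Average reads per second derived from the read/write ratio. */
    static long readsPerSecond() {
        return writesPerSecond() * READS_PER_WRITE;
    }

    /** Total URLs created over the given number of 365-day years. */
    static long totalUrls(int years) {
        return URLS_PER_DAY * 365L * years;
    }

    public static void main(String[] args) {
        System.out.println("writes/sec: " + writesPerSecond());
        System.out.println("reads/sec: " + readsPerSecond());
        System.out.println("URLs in 10 years: " + totalUrls(10));
    }
}
```

Running it gives 1157 writes/sec, 11570 reads/sec, and 365000000000 URLs over 10 years, which matches the rounded figures above.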

Architecture: Hexagonal Pattern

I chose Hexagonal Architecture because it keeps the business logic independent from infrastructure. This makes it easier to swap databases, frameworks, or transport layers without touching the core of the application. It also aligns with what I’ve been studying lately, since I plan to dive deeper into this architecture and eventually write an article about it.
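
To make the idea concrete, here's a minimal ports-and-adapters sketch. The type names mirror the file layout listed later in this article, but every method signature and the in-memory adapter are assumptions made for illustration; the real project may look different.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Outbound port: the core depends on this interface, never on JDBC directly.
// Method names are hypothetical.
interface UrlRepository {
    long nextId();
    void save(String shortCode, String longUrl);
    Optional<String> findLongUrl(String shortCode);
}

// Outbound port for turning a numeric ID into a short code.
interface ShortCodeGeneratorPort {
    String generate(long id);
}

// Core service: pure orchestration, no infrastructure imports.
final class UrlService {
    private final UrlRepository repository;
    private final ShortCodeGeneratorPort generator;

    UrlService(UrlRepository repository, ShortCodeGeneratorPort generator) {
        this.repository = repository;
        this.generator = generator;
    }

    String shorten(String longUrl) {
        long id = repository.nextId();
        String shortCode = generator.generate(id);
        repository.save(shortCode, longUrl);
        return shortCode;
    }

    Optional<String> resolve(String shortCode) {
        return repository.findLongUrl(shortCode);
    }
}

// Adapter: an in-memory stand-in that could be swapped for a JDBC adapter.
final class InMemoryUrlRepository implements UrlRepository {
    private final AtomicLong sequence = new AtomicLong();
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    @Override
    public long nextId() {
        return sequence.incrementAndGet();
    }

    @Override
    public void save(String shortCode, String longUrl) {
        if (rows.putIfAbsent(shortCode, longUrl) != null) {
            throw new IllegalStateException("Duplicate short code: " + shortCode);
        }
    }

    @Override
    public Optional<String> findLongUrl(String shortCode) {
        return Optional.ofNullable(rows.get(shortCode));
    }
}
```

Because UrlService only sees the two interfaces, swapping the in-memory adapter for a PostgreSQL one means writing a new UrlRepository implementation and changing the wiring, nothing else.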

Database Choice: PostgreSQL

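Strong consistency – The UNIQUE constraint on short_code rejects duplicates, even under concurrent writes
ACID transactions – Each insert either commits fully or leaves no row behind; sequence values are never handed out twice, although a rolled-back insert can leave a gap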
Simple operations – Around 99% of the queries are direct key lookups

Schema Design

CREATE SEQUENCE urls_id_seq;

CREATE TABLE urls (
  id BIGINT PRIMARY KEY DEFAULT nextval('urls_id_seq'),
  short_code VARCHAR(10) NOT NULL UNIQUE,
  long_url TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- The UNIQUE constraint on short_code already creates a unique index,
-- so a separate CREATE UNIQUE INDEX would only duplicate it.

Why this works:

id is the source of truth (sequence-generated)
short_code has a unique index for fast O(log n) lookups
No index on long_url (expensive, rarely queried)

The Core Problem: ID Generation

Generating short codes sounds simple, but in practice you need IDs that:
1. Are globally unique (no collisions)
2. Stay short (7 characters or fewer)
3. Don't require coordination between application servers

Strategy: Sequence + Base62 Encoding

Here's the actual implementation:

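public class ShortCodeGenerator implements ShortCodeGeneratorPort {
    // Digits first, then uppercase, then lowercase: 62 symbols in total.
    private static final String BASE62 =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    @Override
    public String generate(long id) {
        // Sequence values start at 1, so zero or negative IDs point to a bug upstream.
        if (id <= 0) {
            throw new IllegalArgumentException("ID must be positive");
        }

        StringBuilder sb = new StringBuilder();
        long n = id;

        // Peel off the least significant Base62 digit on each iteration.
        while (n > 0) {
            int remainder = (int) (n % 62);
            sb.append(BASE62.charAt(remainder));
            n /= 62;
        }

        // Digits were appended least significant first, so reverse them.
        return sb.reverse().toString();
    }
}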

Why this is collision-free:

The PostgreSQL sequence never hands out the same ID twice (1, 2, 3, ...)
Base62 encoding is a one-to-one mapping: unique ID -> unique code
If ID1 != ID2, then Base62(ID1) != Base62(ID2)
No retry logic needed
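
One way to convince yourself of the one-to-one property is to write the inverse function and check that it round-trips. The sketch below repeats the encoding logic from the generator so that it is self-contained; the Base62Codec class and its decode method are hypothetical additions, not something taken from the repository.

```java
/** Illustrative Base62 encoder/decoder pair; decode is a hypothetical addition. */
final class Base62Codec {
    private static final String BASE62 =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    /** Same algorithm as the generator: repeated division by 62. */
    static String encode(long id) {
        if (id <= 0) {
            throw new IllegalArgumentException("ID must be positive");
        }
        StringBuilder sb = new StringBuilder();
        for (long n = id; n > 0; n /= 62) {
            sb.append(BASE62.charAt((int) (n % 62)));
        }
        return sb.reverse().toString();
    }

    /** Inverse of encode: reads the code as a base-62 number, most significant digit first. */
    static long decode(String code) {
        if (code == null || code.isEmpty()) {
            throw new IllegalArgumentException("Code must not be empty");
        }
        long id = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = BASE62.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("Not a Base62 character: " + code.charAt(i));
            }
            // The exact-arithmetic helpers throw ArithmeticException instead of silently overflowing.
            id = Math.addExact(Math.multiplyExact(id, 62L), digit);
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(encode(100));   // 1c
        System.out.println(decode("1c"));  // 100
    }
}
```

Since decode(encode(id)) returns the original id for every positive id, two different IDs can never share a code.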

Capacity Analysis

Length	Possible URLs	Time to exhaust at 1k/sec
5 chars	916 million	~10.6 days
6 chars	56.8 billion	~1.8 years
7 chars	3.52 trillion	~112 years

With 7 characters, we’re set for a while. We only need about 365 billion.
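
The capacity numbers are easy to recompute. Here's a small sketch, assuming a constant rate of 1,000 writes/sec and 365-day years; the class and method names are made up for this example.

```java
/** Illustrative capacity calculator for fixed-length Base62 codes. */
final class CodeCapacity {
    static final long SECONDS_PER_YEAR = 365L * 24 * 60 * 60; // 31,536,000

    /** Number of distinct Base62 strings of exactly the given length (62^length). */
    static long codesOfLength(int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            total = Math.multiplyExact(total, 62L); // throws on overflow
        }
        return total;
    }

    /** Years until the given number of codes is used up at a constant write rate. */
    static double yearsToExhaust(long codes, long writesPerSecond) {
        return (double) codes / writesPerSecond / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        for (int length = 5; length <= 7; length++) {
            long codes = codesOfLength(length);
            System.out.printf("%d chars: %,d codes, %.2f years at 1k/sec%n",
                length, codes, yearsToExhaust(codes, 1_000));
        }
    }
}
```

At 1,000 writes/sec, 7 characters last roughly 112 years, and 62^7 (about 3.52 trillion) sits comfortably above the 365 billion URLs estimated earlier.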

Caching

At ~10k reads/sec, PostgreSQL was fine. But I knew that the moment traffic spiked, the DB would become the bottleneck.

What surprised me the most was how visible the change became after adding a cache.

The Problem

Every request:
1. App -> PostgreSQL: "SELECT * FROM urls WHERE short_code = 'aB3x'"
2. PostgreSQL -> Disk: Read from index + table
3. PostgreSQL -> App: Return result
4. App -> Client: 302 redirect

Latency: 5-10ms per request

Even with indexes, disk I/O is expensive, and most users keep hitting the same short URLs.

Solution: Cache‑Aside with Redis

The pattern is simple:

1. Try Redis first
2. On a miss -> query the database
3. Store in Redis for next time

This gave me the best of both worlds: resilience (if Redis dies, the system still works) and speed (most reads never touch the DB).
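
The three steps can be sketched as below. To stay self-contained, this example does not use a real Redis client: UrlCache is a hypothetical interface, InMemoryUrlCache stands in for Redis, and the database is just a lookup function. A cache failure is treated as a miss, which is what keeps the system working when the cache is down.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache abstraction; a real adapter would wrap a Redis client. */
interface UrlCache {
    Optional<String> get(String shortCode);
    void put(String shortCode, String longUrl);
}

/** In-memory stand-in for Redis, for illustration only (no TTL, no eviction). */
final class InMemoryUrlCache implements UrlCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<String> get(String shortCode) {
        return Optional.ofNullable(entries.get(shortCode));
    }

    @Override
    public void put(String shortCode, String longUrl) {
        entries.put(shortCode, longUrl);
    }
}

final class CacheAsideResolver {
    private final UrlCache cache;
    private final Function<String, Optional<String>> database; // stand-in for the repository

    CacheAsideResolver(UrlCache cache, Function<String, Optional<String>> database) {
        this.cache = cache;
        this.database = database;
    }

    Optional<String> resolve(String shortCode) {
        // 1. Try the cache first; a failing cache is treated as a miss.
        try {
            Optional<String> cached = cache.get(shortCode);
            if (cached.isPresent()) {
                return cached;
            }
        } catch (RuntimeException cacheUnavailable) {
            // Fall through to the database.
        }

        // 2. On a miss, query the database.
        Optional<String> fromDb = database.apply(shortCode);

        // 3. Store the result for next time; failures here must not break the request.
        fromDb.ifPresent(longUrl -> {
            try {
                cache.put(shortCode, longUrl);
            } catch (RuntimeException cacheUnavailable) {
                // Serving the redirect matters more than populating the cache.
            }
        });
        return fromDb;
    }
}
```

A production version would also set an expiry on cached entries and decide whether to cache misses; both are left out here to keep the pattern visible.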

API Design

Keep it simple:

Create Short URL

POST /api/v1/shorten
Content-Type: application/json

{
  "url": "https://website.com/very/long/url"
}

Response:

{
  "shortCode": "aB3x"
}

Redirect

GET /aB3x
GET /api/v1/aB3x

Returns 302 Found with Location: https://website.com/very/long/url

Why 302 instead of 301?

301 (permanent) gets cached by browsers -> no click tracking
302 (temporary) hits your server every time -> you can measure clicks or any other metric

For a URL shortener with analytics, 302 is the right choice.
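
Here's a rough sketch of the redirect endpoint using the JDK's built-in com.sun.net.httpserver package. This is an assumption made to keep the example dependency-free; the project's actual UrlController may use a different HTTP stack. The RedirectSketch class, its Response holder, and the in-memory map of URLs are all hypothetical.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Map;

final class RedirectSketch {
    /** Status code plus optional Location header value. */
    static final class Response {
        final int status;
        final String location; // null when there is nothing to redirect to

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Pure decision logic: 302 for known codes, 404 for unknown ones, 405 for non-GET. */
    static Response decide(String method, String path, Map<String, String> urls) {
        if (!"GET".equals(method)) {
            return new Response(405, null);
        }
        String shortCode = path.startsWith("/") ? path.substring(1) : path;
        String longUrl = urls.get(shortCode);
        return longUrl == null ? new Response(404, null) : new Response(302, longUrl);
    }

    /** Starts a server that answers GET /{shortCode}; pass port 0 to pick a free port. */
    static HttpServer start(int port, Map<String, String> urls) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> handle(exchange, urls));
        server.start();
        return server;
    }

    private static void handle(HttpExchange exchange, Map<String, String> urls) throws IOException {
        try {
            Response response = decide(
                exchange.getRequestMethod(), exchange.getRequestURI().getPath(), urls);
            if (response.location != null) {
                exchange.getResponseHeaders().set("Location", response.location);
            }
            exchange.sendResponseHeaders(response.status, -1); // -1 means no response body
        } finally {
            exchange.close();
        }
    }
}
```

Keeping the status decision in a pure function makes the 302 behavior easy to unit test without opening a socket, and click counting could be added in handle before the headers are sent.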

Code Structure

Here's the actual file structure:

src/main/java/org/lomeu/
├── application/
│   ├── service/
│   │   └── UrlService.java         (use case orchestration)
│   └── port/
│       ├── in/
│       │   └── UrlUseCase.java     (inbound port)
│       └── out/
│           ├── UrlRepository.java  (outbound port)
│           └── ShortCodeGeneratorPort.java
│
├── domain/
│   └── Url.java                    (pure domain model)
│
├── infrastructure/
│   ├── http/
│   │   └── UrlController.java      (HTTP adapter)
│   ├── database/
│   │   ├── Database.java           (connection pool)
│   │   └── JdbcUrlRepository.java  (persistence adapter)
│   └── generator/
│       └── ShortCodeGenerator.java (Base62 implementation)
│
├── config/
│   └── AppConfig.java
│
└── Main.java                        (entry point + shutdown hook)

The complete implementation is at github.com/lucaslomeu/url-shortener.

DEV Community
