DEV Community

Lucas Lomeu

Building a Scalable URL Shortener

I thought a URL shortener was literally a simple CRUD. It turned out to be the exact opposite, so here’s how I designed and built mine.

The Problem

Let's understand what we're building:

What it does:

  • Convert https://website.com/very/long/url into short.com/aB3x
  • Redirect users from the short URL back to the original
  • Handle 1,000+ writes/sec and 10,000+ reads/sec
  • Never generate duplicate short codes

What makes it hard:

  • IDs must be globally unique - no collisions
  • Read-heavy workload - 10:1 read/write ratio
  • Must scale horizontally

The Math Behind It

Let's do some napkin math to understand scale:

  • 100 million URLs/day / (24 hours x 3,600 sec/hour) = ~1,160 writes/sec
  • 10:1 read ratio - 1,160 writes/sec x 10 = ~11,600 reads/sec
  • Over 10 years - 100M x 365 days x 10 years = ~365 billion URLs
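
To make the arithmetic easy to re-run with different inputs, here is a minimal sketch of the same estimate. The class and method names are my own for illustration. The exact results land slightly below the rounded figures quoted above.

```java
// Back-of-the-envelope traffic estimate using the inputs from the list above.
final class NapkinMath {
    static final long URLS_PER_DAY = 100_000_000L;
    static final long SECONDS_PER_DAY = 24L * 3600L; // 86,400 seconds
    static final int READS_PER_WRITE = 10;           // 10:1 read/write ratio

    // Average writes per second, rounded to the nearest whole number.
    static long writesPerSecond() {
        return Math.round((double) URLS_PER_DAY / SECONDS_PER_DAY);
    }

    // Reads per second derived from the read/write ratio.
    static long readsPerSecond() {
        return writesPerSecond() * READS_PER_WRITE;
    }

    // Total URLs created over the given number of 365-day years.
    static long urlsOverYears(int years) {
        return URLS_PER_DAY * 365L * years;
    }

    public static void main(String[] args) {
        System.out.println(writesPerSecond());  // 1157
        System.out.println(readsPerSecond());   // 11570
        System.out.println(urlsOverYears(10));  // 365000000000
    }
}
```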

Architecture: Hexagonal Pattern

I chose Hexagonal Architecture because it keeps the business logic independent from infrastructure. This makes it easier to swap databases, frameworks, or transport layers without touching the core of the application. It also aligns with what I’ve been studying lately, since I plan to dive deeper into this architecture and eventually write an article about it.
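
As a rough illustration of what "independent from infrastructure" means in code, here is a minimal sketch of a port, a core use case, and an adapter. The names `UrlStorePort`, `ShortenUrlSketch`, and `InMemoryUrlStore` are hypothetical and are not the classes from the repository. The base-36 encoding is only a placeholder to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

// Outbound port (hypothetical name): the core sees only this interface.
interface UrlStorePort {
    long nextId();
    void save(String shortCode, String longUrl);
    Optional<String> findLongUrl(String shortCode);
}

// Core use case: no JDBC, HTTP, or framework types appear here.
final class ShortenUrlSketch {
    private final UrlStorePort store;

    ShortenUrlSketch(UrlStorePort store) {
        this.store = store;
    }

    String shorten(String longUrl) {
        long id = store.nextId();
        // Placeholder encoding (base 36) to keep the sketch short.
        String shortCode = Long.toString(id, 36);
        store.save(shortCode, longUrl);
        return shortCode;
    }

    Optional<String> resolve(String shortCode) {
        return store.findLongUrl(shortCode);
    }
}

// Adapter: an in-memory stand-in. A JDBC adapter would implement the same port.
final class InMemoryUrlStore implements UrlStorePort {
    private final AtomicLong sequence = new AtomicLong();
    private final Map<String, String> rows = new HashMap<>();

    @Override
    public long nextId() {
        return sequence.incrementAndGet();
    }

    @Override
    public void save(String shortCode, String longUrl) {
        rows.put(shortCode, longUrl);
    }

    @Override
    public Optional<String> findLongUrl(String shortCode) {
        return Optional.ofNullable(rows.get(shortCode));
    }
}
```

Swapping the storage then means writing another class that implements the port. The use case itself does not change.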

Hexagonal Architecture


Database Choice: PostgreSQL

  • Strong consistency - Ensures there are no duplicate short codes, even during network partitions
  • ACID transactions - ID generation and insertion happen atomically
  • Simple operations - Around 99% of the queries are direct key lookups

Schema Design

CREATE SEQUENCE urls_id_seq;

CREATE TABLE urls (
  id BIGINT PRIMARY KEY DEFAULT nextval('urls_id_seq'),
  short_code VARCHAR(10) NOT NULL UNIQUE,
  long_url TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- No separate CREATE INDEX is needed: the UNIQUE constraint on short_code
-- already creates a unique index.

Why this works:

  • id is the source of truth (sequence-generated)
  • short_code has unique index for fast O(log n) lookups
  • No index on long_url (expensive, rarely queried)

The Core Problem: ID Generation

Generating short codes sounds simple, but in practice you need IDs that:
1. Are globally unique (no collisions)
2. Stay short (7 characters or fewer)
3. Don't require coordination between application servers (the database sequence is the only shared component)

Strategy: Sequence + Base62 Encoding

Here's the actual implementation:

public class ShortCodeGenerator implements ShortCodeGeneratorPort {
    private static final String BASE62 =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    @Override
    public String generate(long id) {
        if (id <= 0) {
            throw new IllegalArgumentException("ID must be positive");
        }

        StringBuilder sb = new StringBuilder();
        long n = id;

        while (n > 0) {
            int remainder = (int) (n % 62);
            sb.append(BASE62.charAt(remainder));
            n /= 62;
        }

        return sb.reverse().toString();
    }
}

Why this is collision-free:

  • PostgreSQL sequence guarantees unique IDs (1, 2, 3, ...)
  • Base62 encoding is a one-to-one mapping for positive IDs: unique ID -> unique code
  • If ID1 != ID2, then Base62(ID1) != Base62(ID2)
  • No retry logic needed
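
One way to convince yourself of the one-to-one property is to write the inverse and check the round trip. The sketch below reuses the alphabet and encoding loop from the generator above. The `decode` method and the class name `Base62RoundTrip` are my additions for illustration, and `decode` does not guard against overflow on very long inputs.

```java
final class Base62RoundTrip {
    private static final String BASE62 =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Same algorithm as the generator above: repeated division by 62.
    static String encode(long id) {
        if (id <= 0) {
            throw new IllegalArgumentException("ID must be positive");
        }
        StringBuilder sb = new StringBuilder();
        for (long n = id; n > 0; n /= 62) {
            sb.append(BASE62.charAt((int) (n % 62)));
        }
        return sb.reverse().toString();
    }

    // Inverse mapping: read the code as a base-62 number, most significant digit first.
    static long decode(String code) {
        long n = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = BASE62.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid character: " + code.charAt(i));
            }
            n = n * 62 + digit;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(encode(61));            // z
        System.out.println(encode(62));            // 10
        System.out.println(decode(encode(12345))); // 12345
    }
}
```

Because `decode(encode(id))` returns the original ID, two different IDs can never share a code.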

Capacity Analysis

Length    Possible URLs    Time to exhaust at 1k writes/sec
5 chars   916 million      ~10.6 days
6 chars   56.8 billion     ~1.8 years
7 chars   3.52 trillion    ~112 years

With 7 characters, we’re set for a while: 3.52 trillion codes is almost ten times the ~365 billion we need over 10 years.
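
A quick way to check the capacity arithmetic is to compute 62^n directly and divide by the write rate. This is a minimal sketch, with class and method names chosen for illustration and a 365-day year assumed.

```java
final class CapacityEstimate {
    private static final double SECONDS_PER_YEAR = 365.0 * 24 * 3600;

    // Number of distinct Base62 codes of exactly the given length (62^length).
    static long codesOfLength(int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            total = Math.multiplyExact(total, 62L); // throws instead of silently overflowing
        }
        return total;
    }

    // Years until all codes of that length are used at a constant write rate.
    static double yearsToExhaust(int length, long writesPerSecond) {
        double seconds = (double) codesOfLength(length) / writesPerSecond;
        return seconds / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println(codesOfLength(5)); // 916132832
        System.out.println(codesOfLength(6)); // 56800235584
        System.out.println(codesOfLength(7)); // 3521614606208
        System.out.printf("%.1f years%n", yearsToExhaust(7, 1_000));
    }
}
```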


Caching

At ~10k reads/sec, PostgreSQL was fine, but I knew that the moment traffic spiked, the database would become the bottleneck.

What surprised me the most was how visible the change became after adding a cache.

The Problem

Every request:
1. App -> PostgreSQL: "SELECT * FROM urls WHERE short_code = 'aB3x'"
2. PostgreSQL -> Disk: Read from index + table
3. PostgreSQL -> App: Return result
4. App -> Client: 302 redirect

Latency: 5-10ms per request

Even with indexes, disk I/O is expensive, and most requests keep hitting the same short URLs.

Solution: Cache‑Aside with Redis

The pattern is simple:

1. Try Redis first
2. On a miss -> query the database
3. Store in Redis for next time

This gave me the best of both worlds: resilience (if Redis dies, the system still works) and speed (most reads never touch the DB).
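
The three steps above can be sketched as follows. To keep the example self-contained, a plain `Map` stands in for Redis and a `Function` stands in for the PostgreSQL lookup. Both are assumptions for illustration, as is the class name `CacheAsideResolver`. A real Redis client would replace the map calls, and its failures are what the `try`/`catch` blocks are meant to absorb.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class CacheAsideResolver {
    private final Map<String, String> cache;                   // stand-in for Redis
    private final Function<String, Optional<String>> database; // stand-in for the SQL lookup

    CacheAsideResolver(Map<String, String> cache,
                       Function<String, Optional<String>> database) {
        this.cache = cache;
        this.database = database;
    }

    Optional<String> resolve(String shortCode) {
        // 1. Try the cache first. A cache failure must not fail the request.
        try {
            String cached = cache.get(shortCode);
            if (cached != null) {
                return Optional.of(cached);
            }
        } catch (RuntimeException e) {
            // Cache unavailable: fall through to the database.
        }

        // 2. On a miss, query the database.
        Optional<String> fromDb = database.apply(shortCode);

        // 3. Store the result for next time (best effort).
        fromDb.ifPresent(longUrl -> {
            try {
                cache.put(shortCode, longUrl);
            } catch (RuntimeException e) {
                // Ignore cache write failures. The database remains the source of truth.
            }
        });
        return fromDb;
    }
}
```

In a real deployment you would probably also set an expiry on cached entries so that rarely used codes do not stay in memory forever.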


API Design

Keep it simple:

Create Short URL

POST /api/v1/shorten
Content-Type: application/json

{
  "url": "https://website.com/very/long/url"
}

Response:

{
  "shortCode": "aB3x"
}

Redirect

GET /aB3x
GET /api/v1/aB3x

Returns 302 Found with Location: https://website.com/very/long/url

Why 302 instead of 301?

  • 301 (permanent) gets cached by browsers -> no click tracking
  • 302 (temporary) hits your server every time -> you can measure clicks or any other metric

For a URL shortener with analytics, 302 is the right choice.
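
To show what the redirect looks like on the wire, here is a minimal sketch built on the JDK's built-in `com.sun.net.httpserver.HttpServer`. It is not the `UrlController` from the repository. The class name, the in-memory map, and port 8080 are assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Map;

final class RedirectServerSketch {
    // Extracts the short code from either "/aB3x" or "/api/v1/aB3x".
    static String shortCodeFromPath(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Starts a server that answers known codes with 302 + Location, others with 404.
    static HttpServer start(int port, Map<String, String> urls) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            String code = shortCodeFromPath(exchange.getRequestURI().getPath());
            String longUrl = urls.get(code);
            if (longUrl == null) {
                exchange.sendResponseHeaders(404, -1); // -1 means no response body
            } else {
                exchange.getResponseHeaders().set("Location", longUrl);
                exchange.sendResponseHeaders(302, -1);
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical data. A real service would look the code up in the cache or database.
        start(8080, Map.of("aB3x", "https://website.com/very/long/url"));
    }
}
```

With the server running, `curl -i http://localhost:8080/aB3x` should show a `302` status line and the `Location` header.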


Code Structure

Here's the actual file structure:

src/main/java/org/lomeu/
├── application/
│   ├── service/
│   │   └── UrlService.java         (use case orchestration)
│   └── port/
│       ├── in/
│       │   └── UrlUseCase.java     (inbound port)
│       └── out/
│           ├── UrlRepository.java  (outbound port)
│           └── ShortCodeGeneratorPort.java
│
├── domain/
│   └── Url.java                    (pure domain model)
│
├── infrastructure/
│   ├── http/
│   │   └── UrlController.java      (HTTP adapter)
│   ├── database/
│   │   ├── Database.java           (connection pool)
│   │   └── JdbcUrlRepository.java  (persistence adapter)
│   └── generator/
│       └── ShortCodeGenerator.java (Base62 implementation)
│
├── config/
│   └── AppConfig.java
│
└── Main.java                       (entry point + shutdown hook)

The complete implementation is available at github.com/lucaslomeu/url-shortener.
