I thought a URL shortener was literally a simple CRUD. It turned out to be the exact opposite, so here’s how I designed and built mine.
The Problem
Let's understand what we're building:
What it does:
- Convert `https://website.com/very/long/url` into `short.com/aB3x`
- Redirect users from the short URL back to the original
- Handle 1,000+ writes/sec and 10,000+ reads/sec
- Never generate duplicate short codes
What makes it hard:
- IDs must be globally unique - no collisions
- Read-heavy workload - 10:1 read/write ratio
- Must scale horizontally
The Math Behind It
Let's do some napkin math to understand scale:
- 100 million URLs/day ÷ 86,400 seconds/day = ~1,160 writes/sec
- 10:1 read ratio - 1,160 writes/sec x 10 = ~11,600 reads/sec
- Over 10 years - 100M/day x 365 days x 10 years = ~365 billion URLs
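The same estimates in runnable form (a quick sanity check; the class and method names are mine, not from the repo):

```java
public class NapkinMath {
    // 24 h x 3600 s = 86,400 seconds in a day
    static final long SECONDS_PER_DAY = 24L * 3600L;

    static long writesPerSec(long urlsPerDay) {
        return urlsPerDay / SECONDS_PER_DAY;     // ~1,157 for 100M/day
    }

    static long readsPerSec(long urlsPerDay) {
        return writesPerSec(urlsPerDay) * 10;    // 10:1 read/write ratio
    }

    static long totalUrls(long urlsPerDay, int years) {
        return urlsPerDay * 365L * years;        // ~365 billion over 10 years
    }

    public static void main(String[] args) {
        System.out.println(writesPerSec(100_000_000L));  // 1157
        System.out.println(readsPerSec(100_000_000L));   // 11570
        System.out.println(totalUrls(100_000_000L, 10)); // 365000000000
    }
}
```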
Architecture: Hexagonal Pattern
I chose Hexagonal Architecture because it keeps the business logic independent from infrastructure. This makes it easier to swap databases, frameworks, or transport layers without touching the core of the application. It also aligns with what I’ve been studying lately, since I plan to dive deeper into this architecture and eventually write an article about it.
Database Choice: PostgreSQL
- Strong consistency – ensures no duplicate short codes, even during network partitions
- ACID transactions – ID generation and insertion happen atomically
- Simple operations – around 99% of queries are direct key lookups
Schema Design
CREATE SEQUENCE urls_id_seq;
CREATE TABLE urls (
id BIGINT PRIMARY KEY DEFAULT nextval('urls_id_seq'),
short_code VARCHAR(10) NOT NULL UNIQUE,
long_url TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Note: the UNIQUE constraint on short_code already creates a unique index in PostgreSQL, so no separate CREATE UNIQUE INDEX statement is needed.
Why this works:
- `id` is the source of truth (sequence-generated)
- `short_code` has a unique index (from the UNIQUE constraint) for fast O(log n) lookups
- No index on `long_url` (expensive to index, rarely queried)
The Core Problem: ID Generation
Generating short codes sounds simple, but in practice you need IDs that:
1. Are globally unique (no collisions)
2. Stay short (7 characters or less)
3. Don't require coordination between servers
Strategy: Sequence + Base62 Encoding
Here's the actual implementation:
public class ShortCodeGenerator implements ShortCodeGeneratorPort {

    private static final String BASE62 =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    @Override
    public String generate(long id) {
        if (id <= 0) {
            throw new IllegalArgumentException("ID must be positive");
        }
        StringBuilder sb = new StringBuilder();
        long n = id;
        while (n > 0) {
            int remainder = (int) (n % 62); // least-significant Base62 digit
            sb.append(BASE62.charAt(remainder));
            n /= 62;
        }
        return sb.reverse().toString();     // digits were produced in reverse order
    }
}
Why this is collision-free:
- PostgreSQL sequence guarantees unique IDs (1, 2, 3, ...)
- Base62 is a bijective function: unique ID -> unique code
- If ID1 != ID2, then Base62(ID1) != Base62(ID2)
- No retry logic needed
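One way to see the bijection concretely is to write the inverse function. The decoder below is not part of the repo; it's a companion sketch that round-trips any positive ID through encode and back:

```java
public class Base62Codec {
    private static final String BASE62 =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Same algorithm as ShortCodeGenerator.generate
    public static String encode(long id) {
        if (id <= 0) {
            throw new IllegalArgumentException("ID must be positive");
        }
        StringBuilder sb = new StringBuilder();
        for (long n = id; n > 0; n /= 62) {
            sb.append(BASE62.charAt((int) (n % 62)));
        }
        return sb.reverse().toString();
    }

    // Inverse mapping: interprets the code as a base-62 number
    public static long decode(String code) {
        long n = 0;
        for (char c : code.toCharArray()) {
            n = n * 62 + BASE62.indexOf(c);
        }
        return n;
    }
}
```

Since decode(encode(id)) == id for every positive ID, two different IDs can never share a code.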
Capacity Analysis
| Length | Possible URLs | Lifetime at 1k URLs/sec |
|---|---|---|
| 5 chars | 916 million | ~10 days |
| 6 chars | 56.8 billion | ~1.8 years |
| 7 chars | 3.52 trillion | ~111 years |
With 7 characters, we’re set for a while. We only need about 365 billion.
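The capacity column is just powers of 62, which a few lines can verify (class name is illustrative):

```java
public class Base62Capacity {
    // 62^length distinct codes of exactly `length` characters
    static long codes(int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            total *= 62;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(codes(5)); // 916132832
        System.out.println(codes(7)); // 3521614606208 - covers ~365 billion easily
    }
}
```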
Caching
At ~10k reads/sec, PostgreSQL was fine. But I knew that the moment traffic spiked, the DB would become the bottleneck.
What surprised me the most was how visible the change became after adding a cache.
The Problem
Every request:
1. App -> PostgreSQL: "SELECT * FROM urls WHERE short_code = 'aB3x'"
2. PostgreSQL -> Disk: Read from index + table
3. PostgreSQL -> App: Return result
4. App -> Client: 302 redirect
Latency: 5-10ms per request
Even with indexes, disk I/O is expensive and most users keep hitting the same short URLs.
Solution: Cache‑Aside with Redis
The pattern is simple:
1. Try Redis first
2. If miss -> query the database
3. Store in Redis for next time
This gave me the best of both worlds: resilience (if Redis dies, the system still works) and speed (most reads never touch the DB).
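A minimal sketch of the cache-aside flow, using an in-memory map in place of Redis so it runs standalone (a real adapter would call a Redis client; the names here are illustrative, not the repo's):

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Try the cache first; on a miss, fall back to the DB loader
    // and populate the cache for the next request.
    public Optional<String> resolve(String shortCode,
                                    Function<String, Optional<String>> loadFromDb) {
        String cached = cache.get(shortCode);
        if (cached != null) {
            return Optional.of(cached);                         // hit: no DB round trip
        }
        Optional<String> fromDb = loadFromDb.apply(shortCode);  // miss: query the DB
        fromDb.ifPresent(url -> cache.put(shortCode, url));     // store for next time
        return fromDb;
    }
}
```

Because the loader is passed in, the same class works whether the backing store is PostgreSQL or anything else; if the cache layer disappears, every call simply falls through to the DB.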
API Design
Keep it simple:
Create Short URL
POST /api/v1/shorten
Content-Type: application/json
{
"url": "https://website.com/very/long/url"
}
Response:
{
"shortCode": "aB3x"
}
Redirect
GET /aB3x
GET /api/v1/aB3x
Returns 302 Found with Location: https://website.com/very/long/url
Why 302 instead of 301?
- 301 (permanent) gets cached by browsers -> no click tracking
- 302 (temporary) hits your server every time -> you can measure clicks and other metrics
For a URL shortener with analytics, 302 is the right choice.
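The redirect itself boils down to a status line and a Location header. A framework-free sketch of that decision (the class and method names are mine, not the repo's controller):

```java
import java.util.Map;

public class RedirectResponse {
    // Build the pieces of a temporary redirect. With 302, the browser
    // re-requests the short URL on every click, so each hit reaches the server.
    public static Map<String, String> found(String longUrl) {
        return Map.of(
                "Status", "302 Found",
                "Location", longUrl
        );
    }
}
```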
Code Structure
Here's the actual file structure:
src/main/java/org/lomeu/
├── application/
│ ├── service/
│ │ └── UrlService.java (use case orchestration)
│ └── port/
│ ├── in/
│ │ └── UrlUseCase.java (inbound port)
│ └── out/
│ ├── UrlRepository.java (outbound port)
│ └── ShortCodeGeneratorPort.java
│
├── domain/
│ └── Url.java (pure domain model)
│
├── infrastructure/
│ ├── http/
│ │ └── UrlController.java (HTTP adapter)
│ ├── database/
│ │ ├── Database.java (connection pool)
│ │ └── JdbcUrlRepository.java (persistence adapter)
│ └── generator/
│ └── ShortCodeGenerator.java (Base62 implementation)
│
├── config/
│ └── AppConfig.java
│
└── Main.java (entry point + shutdown hook)
The complete implementation is available at github.com/lucaslomeu/url-shortener.
