Erik for Allscreenshots

Posted on Jan 3

Day 3: Architecture Sketch on a Napkin

#programming #architecture

Day 3 of 30. Today we're drawing boxes and arrows before writing our code.

As software developers, we find it very tempting to just start coding. But we've been burned before by diving in
without a clear picture of how the pieces connect. A few hours of sketching and brainstorming now
could save us days (or more!) of rework later, so that's exactly what we'll do in this session.

For those who are just jumping in: we're building allscreenshots, our SaaS for screenshot automation.

Just a small note: this isn't a "formal" architecture document. It's a napkin sketch, the document we'll use
for discussions and some of our decision-making. Our end solution will most likely deviate from it,
but it's the mental model we'll hold in our heads while building toward our target state.

Two API modes: sync and async

Before we've even started, we've already introduced some feature creep: we're offering two ways to capture screenshots.
We thought long and hard about whether we could get away with just one mode for simplicity's sake. Given our
domain and our core focus on making screenshots, we decided that we need to offer both, although our implementation phase
will focus on one to start with.

After thinking through the use cases, these are the two modes:

Synchronous (v1 focus): A request comes in, we capture the screenshot, and we return it directly.
This is simple (from the consumer perspective), immediate and great for most use cases.

Asynchronous (v2): A request comes in, we queue a job, and return a job ID immediately.
The client polls or receives a webhook call when done. This is a better fit for high-volume batch processing and for cases
where immediate feedback isn't required.

Our goal is to build the sync API first. It's simpler, it covers 80% of our currently identified use cases,
and it gets us to market faster, which allows us to get immediate feedback. The async API will come later, but will reuse a lot of the building
blocks we introduce while building the sync API.

The sync happy path

Here's what happens when someone requests a screenshot synchronously. It's a simple flow:

Client sends URL
We validate (API key, quota, URL format)
We capture the screenshot
We upload to storage
We return the image URL

Total time: 2-5 seconds depending on the target site.
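
The five steps above can be sketched as a single function. This is a minimal Python illustration, not the actual Spring Boot code: the in-memory ACCOUNTS and STORAGE dicts, the function names, and the storage.example.invalid host are all made up for the example, and capture() is a stub where the headless browser would go.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


class ValidationError(Exception):
    """Raised when a request fails one of the checks in step 2."""


@dataclass
class Account:
    api_key: str
    quota_remaining: int


# Hypothetical in-memory stand-ins for the database and object storage.
ACCOUNTS = {"key_demo": Account(api_key="key_demo", quota_remaining=2)}
STORAGE = {}  # screenshot id -> image bytes


def validate(api_key: str, url: str) -> Account:
    """Step 2: check the API key, the quota, and the URL format."""
    account = ACCOUNTS.get(api_key)
    if account is None:
        raise ValidationError("unknown API key")
    if account.quota_remaining <= 0:
        raise ValidationError("quota exceeded")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValidationError("invalid URL")
    return account


def capture(url: str) -> bytes:
    """Step 3 (stub): a real implementation would drive a browser here."""
    return b"fake-png-bytes-for-" + url.encode()


def upload(screenshot_id: str, image: bytes) -> str:
    """Step 4: store the image and return where it can be fetched."""
    STORAGE[screenshot_id] = image
    return f"https://storage.example.invalid/{screenshot_id}.png"


def handle_sync_request(api_key: str, url: str) -> dict:
    account = validate(api_key, url)
    image = capture(url)
    screenshot_id = f"scr_{len(STORAGE) + 1}"
    image_url = upload(screenshot_id, image)
    account.quota_remaining -= 1  # only charge quota for successful captures
    return {"id": screenshot_id, "url": url, "image_url": image_url}  # step 5
```

Everything runs in the caller's thread, which is the whole point of sync mode: the response is only sent once the upload has finished.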

The async happy path (future)

For batch processing or very slow sites, async makes more sense. The async flow decouples accepting work from doing work:

Client sends request with async: true
We create a job and return immediately (50ms)
Background worker processes the job
Client polls for result (or receives webhook)

We're not building this yet, but the architecture supports it.
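
To make "accepting work" versus "doing work" concrete, here is a small Python sketch of that split. It is only an illustration under assumptions: the in-memory JOBS dict and JOB_QUEUE stand in for whatever job store and queue the real system ends up using, and the worker fakes the capture step.

```python
import queue
import threading
import uuid

JOBS = {}                  # job id -> job record (stand-in for a jobs table)
JOB_QUEUE = queue.Queue()  # stand-in for a real job queue


def submit(url: str) -> dict:
    """Accept the work: record a pending job and return right away."""
    job_id = "scr_" + uuid.uuid4().hex[:12]
    JOBS[job_id] = {"id": job_id, "url": url, "status": "pending"}
    JOB_QUEUE.put(job_id)
    return {"id": job_id, "status": "pending"}


def worker() -> None:
    """Do the work: process job ids until a None sentinel arrives."""
    while True:
        job_id = JOB_QUEUE.get()
        if job_id is None:
            JOB_QUEUE.task_done()
            break
        job = JOBS[job_id]
        # Placeholder for the real capture + upload steps.
        job["image_url"] = f"https://storage.example.invalid/{job_id}.png"
        job["status"] = "completed"
        JOB_QUEUE.task_done()


def get_status(job_id: str) -> dict:
    """What a polling client would see for this job."""
    return dict(JOBS[job_id])


if __name__ == "__main__":
    accepted = submit("https://example.com")
    threading.Thread(target=worker, daemon=True).start()
    JOB_QUEUE.join()  # wait for the worker to drain the queue
    print(get_status(accepted["id"])["status"])
```

Note that submit() never touches a browser. That is why it can return in milliseconds while the capture itself still takes seconds.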

Why sync first?

Simpler implementation. No job queue, no polling logic, no webhook infrastructure. Less code, fewer bugs.

Faster time to market. We can ship a working product sooner.

Good enough for most cases. If your screenshot takes 3 seconds, waiting 3 seconds is fine. You don't need async complexity.

Easier to explain. "Send URL, get screenshot" is simpler than "send URL, get job ID, poll for result."

We'll add async when:

Customers ask for batch processing
We need to handle very slow sites (30+ seconds)
We want to offer webhooks

The components

Here's what we're actually building:

┌────────────────────────────────────────────────────────┐
│                        VPS                             │
│                                                        │
│   ┌────────────────────────────────────────────────┐   │
│   │              Spring Boot API                   │   │
│   │                                                │   │
│   │  ┌──────────┐  ┌──────────┐  ┌──────────────┐  │   │
│   │  │  REST    │  │  Auth &  │  │  Screenshot  │  │   │
│   │  │ Endpoints│─▶│  Quota   │─▶│   Service    │  │   │
│   │  └──────────┘  └──────────┘  └──────────────┘  │   │
│   │                                     │          │   │
│   │                              ┌──────▼───────┐  │   │
│   │                              │   Browser    │  │   │
│   │                              │    Pool      │  │   │
│   │                              └──────────────┘  │   │
│   └────────────────────────────────────────────────┘   │
│          │                                             │
│   ┌──────▼───────┐                                     │
│   │   Postgres   │                                     │
│   │  (users,     │                                     │
│   │   keys,      │                                     │
│   │   usage)     │                                     │
│   └──────────────┘                                     │
│                                                        │
└────────────────────────────────────────────────────────┘
           │
           ▼
    ┌─────────────┐     ┌─────────────┐
    │   React     │     │  Cloudflare │
    │   Frontend  │     │     R2      │
    │             │     │  (images)   │
    └─────────────┘     └─────────────┘

Spring Boot API - The main application. For sync mode, everything happens in the request thread or in coroutines.

Browser Pool - Reusable browser instances. Creating a browser is slow (~500ms), so we keep a few warm.

Postgres - Users, API keys, usage tracking. Most likely not used for job queuing in sync mode.

R2 Storage - Screenshots uploaded here. Depending on the configuration, we return signed URLs that expire after a period of time.

React Frontend - Landing page and dashboard. Static files.
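
The browser pool idea above (launch a few browsers up front, then lend them out per request) can be sketched in a few lines of Python. FakeBrowser is a hypothetical stand-in for a real headless browser handle, and the pool size and timeout values are arbitrary.

```python
import queue
from contextlib import contextmanager


class FakeBrowser:
    """Hypothetical stand-in for a real (slow to launch) browser instance."""

    launched = 0  # counts how many browsers were ever created

    def __init__(self):
        FakeBrowser.launched += 1
        self.id = FakeBrowser.launched


class BrowserPool:
    def __init__(self, size: int):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeBrowser())  # pay the launch cost once, up front

    @contextmanager
    def borrow(self, timeout: float = 5.0):
        # Blocks until a browser is free; raises queue.Empty after `timeout`.
        browser = self._idle.get(timeout=timeout)
        try:
            yield browser
        finally:
            self._idle.put(browser)  # always hand the browser back


if __name__ == "__main__":
    pool = BrowserPool(size=2)
    with pool.borrow() as browser:
        print("capturing with browser", browser.id)
```

A real pool would also need to detect and replace crashed browsers, which this sketch leaves out.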

Sync API design

Create Screenshot (sync)

The current design is just a proposal. There's a high chance we'll make this a GET endpoint instead of POST, so it's easier
to call the API from an image tag, but in terms of functionality, the offered features will be similar.

POST /api/v1/screenshots

Request:

{
  "url": "https://example.com",
  "device": "desktop",
  "full_page": false,
  "format": "png"
}

Response (200 OK):

{
  "id": "scr_abc123def456",
  "url": "https://example.com",
  "image_url": "https://storage.../scr_abc123.png?signature=...",
  "created_at": "2024-01-15T10:30:00Z",
  "metadata": {
    "width": 1920,
    "height": 1080,
    "file_size": 245678,
    "format": "png",
    "capture_time_ms": 2340
  }
}

or:

<binary data>

The response contains either the image URL plus some metadata, or the raw image data. Either way, no polling is required to get the screenshot.
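
The image_url in the JSON response carries a signature, which is what lets a link expire. As a rough illustration of the general idea only: the Python sketch below signs a path plus an expiry time with HMAC and checks both on the way back in. The secret, the parameter names, and the host are invented for the example. R2 is S3-compatible and has its own presigned URL scheme, so a real setup would rely on that rather than on hand-rolled signing like this.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key, never hard-code a real one


def sign_url(path: str, expires_at: int) -> str:
    """Build a URL whose signature covers both the path and the expiry."""
    payload = f"{path}:{expires_at}".encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "signature": signature})
    return f"https://storage.example.invalid{path}?{query}"


def is_valid(url: str, now: int) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    payload = f"{parsed.path}:{expires_at}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(signature, expected) and now < expires_at
```

Because the expiry is part of the signed payload, a client cannot extend a link's lifetime by editing the query string.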

Timeout handling

Sync requests have a configurable timeout. If the page hasn't loaded by then, we return an error so the client isn't left waiting.

{
  "error": {
    "code": "timeout",
    "message": "Page did not load within 30 seconds"
  }
}

For sites that regularly take longer, or when multiple pages need to be captured, async mode will be the answer.
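
One way to enforce such a timeout is to run the capture on a worker thread and stop waiting after a deadline. The Python sketch below shows that pattern with the standard library. The capture callable and the executor size are assumptions, and the error body simply mirrors the JSON shown above.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)  # arbitrary size for the sketch


def capture_with_timeout(capture, url: str, timeout_s: float) -> dict:
    """Run capture(url), but give up waiting after timeout_s seconds."""
    future = _executor.submit(capture, url)
    try:
        image_url = future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only prevents a job that has not started yet; an already
        # running capture would need explicit cleanup (e.g. closing the page).
        future.cancel()
        return {
            "error": {
                "code": "timeout",
                "message": f"Page did not load within {timeout_s:g} seconds",
            }
        }
    return {"image_url": image_url}
```

The caveat in the comment matters in practice: giving up on the wait does not stop the work, so a timed-out capture keeps its thread (and its browser) busy until it is cleaned up.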

Async API design (future)

Create Screenshot (async)

POST /api/v1/screenshots

Request:

{
  "url": "https://example.com",
  "async": true,
  "webhook_url": "https://yoursite.com/webhook"
}

Response (202 Accepted):

{
  "id": "scr_abc123def456",
  "status": "pending",
  "created_at": "2026-01-15T10:30:00Z"
}

Poll for status

GET /api/v1/screenshots/scr_abc123def456

Response when pending:

{
  "id": "scr_abc123def456",
  "status": "pending"
}

Response when complete:

{
  "id": "scr_abc123def456",
  "status": "completed",
  "image_url": "https://storage.../scr_abc123.png",
  "metadata": { ... }
}
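
From the client's side, polling boils down to a loop like the Python sketch below. It is deliberately transport-agnostic: fetch_status is a hypothetical callable that performs the GET above and returns the parsed JSON, and the interval and attempt limit are arbitrary defaults rather than anything the API prescribes.

```python
import time


def poll_until_done(fetch_status, job_id, interval_s=1.0, max_attempts=30,
                    sleep=time.sleep):
    """Call fetch_status(job_id) until the job leaves the "pending" state.

    Raises TimeoutError if the job is still pending after max_attempts calls.
    """
    for attempt in range(max_attempts):
        job = fetch_status(job_id)
        if job["status"] != "pending":
            return job
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"{job_id} still pending after {max_attempts} attempts")
```

The sleep parameter is injectable so the loop can be exercised in tests without actually waiting.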

Webhook callback

If a webhook_url is provided, we POST to it when the screenshot is done:

{
  "event": "screenshot.completed",
  "data": {
    "id": "scr_abc123def456",
    "status": "completed",
    "image_url": "https://..."
  }
}
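
On the receiving end, a customer's webhook handler mostly needs to parse that payload defensively. The Python sketch below is one possible shape for it, independent of any web framework. The function name and the status codes it returns are assumptions, and it stores results by screenshot id so that receiving the same event twice is harmless.

```python
import json


def handle_webhook(body: bytes, completed: dict) -> int:
    """Process one webhook POST body and return the HTTP status to reply with.

    `completed` maps screenshot id -> image URL and is updated in place.
    """
    try:
        payload = json.loads(body)
        event = payload["event"]
        data = payload["data"]
        screenshot_id = data["id"]
    except (ValueError, KeyError, TypeError):
        return 400  # malformed JSON or missing fields
    if event == "screenshot.completed":
        # Keyed by id, so a repeated delivery just overwrites the same entry.
        completed[screenshot_id] = data.get("image_url")
    return 200  # acknowledge event types we don't act on as well
```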

What we're deliberately not building yet

There's a lot we're not building in this iteration, and the full list is too long to include here.
Some notable items we're deferring until later:

Multiple browser types - Chromium only. We'll implement Firefox/WebKit and other browsers later.

Screenshot caching - We want to offer some form of configurable caching, but for now, we'll always capture a fresh screenshot.

Horizontal scaling - For now, we'll run on a single VPS. We can scale both horizontally and vertically when we need to, and will
do so depending on our needs.

What we did today

Mostly thinking and sketching:

Drew the architecture diagrams
Designed both sync and async APIs
Decided to build sync first
Identified what we're not building yet

We haven't pushed any code today, on purpose. Our focus was on getting the foundation right and having clarity on where we're going,
and that is worth our time as well.

Tomorrow: CI/CD on day one

Tomorrow we're getting our hands dirty: we'll finally(?) be writing some code, and we'll set up the deployment pipeline
so that every successful build goes straight to production, which will allow us to iterate fast.

Book of the day

Designing Data-Intensive Applications by Martin Kleppmann

This is the book we wish we'd read earlier in our careers. It fundamentally changed how we think about systems.

Kleppmann covers databases, distributed systems, batch processing, stream processing - all the building blocks of modern applications. But more importantly, he explains why things work the way they do.

The chapter on message queues vs. request/response patterns directly informed our sync-first decision. Sync is simpler and sufficient for our use case,
and async adds complexity that we don't need just yet.

Designing Data-Intensive Applications is not a quick read. But if you're building anything that handles large amounts of data, then this
book is an essential read.

Current stats:

Hours spent: 5 (1 + 2 + 2 today)
Lines of code: ~50 (still mostly boilerplate)
Revenue: $0
Paying customers: 0
Architecture documented: ✓
API design (sync): ✓
API design (async): ✓ (documented for later)

DEV Community