DEV Community

Kristiyan
I Built a Free PDF Toolkit in Go + Astro — Here's What I Learned

Every few months I find myself needing to merge two PDFs or pull a few pages out of a big document. I open a search, find a tool, and immediately get hit with "sign up to continue" or "buy premium to process files over 5MB." For a 30-second task.

So I built PDFCrush — a free online PDF toolkit that does merge, split, and compress without accounts or upsells. Here is what the process looked like and what I learned along the way.

The Stack

  • Backend: Go with the chi router
  • Frontend: Astro with Tailwind CSS
  • Database: SQLite via modernc.org/sqlite
  • PDF processing: pdfcpu (pure Go)
  • Hosting: Hetzner ARM64 VPS, Caddy reverse proxy, Cloudflare

No frameworks on the backend. No ORM. No Redis. The entire thing is a single Go binary serving both the API and the static frontend.

Why Go?

I wanted a single binary I could scp to a cheap ARM64 VPS and run. No runtime, no dependency hell, no node_modules on the server. Go gives me that plus genuinely good concurrency for handling file uploads.

The server setup is minimal — chi for routing, slog for structured JSON logging, and standard library for everything else:

r := chi.NewRouter()

r.Use(chimw.RequestID)
r.Use(chimw.RealIP)
r.Use(chimw.Logger)
r.Use(chimw.Recoverer)
r.Use(middleware.SecurityHeaders)
r.Use(middleware.CORS("*"))

r.Route("/api/v1", func(r chi.Router) {
    r.Get("/health", handlers.HealthCheck(store))

    // PDF operations — rate limited for free users.
    r.With(rateLimiter.Limit).Post("/pdf/merge", pdfHandlers.Merge)
    r.With(rateLimiter.Limit).Post("/pdf/split", pdfHandlers.Split)
    r.With(rateLimiter.Limit).Post("/pdf/compress", pdfHandlers.Compress)
})

Chi's middleware chaining with .With() is one of those things that looks obvious but makes route-level concerns (rate limiting, auth) really clean.

Pure Go PDF Processing (No CGO)

This was the decision that shaped the whole project. Most PDF libraries in Go lean on C bindings (poppler, MuPDF, etc.), which means CGO, which means cross-compilation headaches and no static binaries on ARM64 without pain.

Instead I used pdfcpu, which is pure Go. The processing layer is thin:

func (p *Processor) Merge(files []string) (string, error) {
    if len(files) < 2 {
        return "", fmt.Errorf("pdf: merge requires at least 2 files, got %d", len(files))
    }

    outFile := filepath.Join(p.uploadDir, uuid.New().String()+".pdf")

    conf := model.NewDefaultConfiguration()
    if err := api.MergeCreateFile(files, outFile, false, conf); err != nil {
        return "", fmt.Errorf("pdf: merge failed: %w", err)
    }

    slog.Info("pdf merge completed", "input_count", len(files), "output", outFile)
    return outFile, nil
}

Split and compress follow the same pattern — validate inputs, call pdfcpu, return a path. The Processor struct owns the upload directory and max file size, and every handler cleans up temp files with defer:

defer h.processor.Cleanup(savedPaths...)

The trade-off: pdfcpu does not do image resampling for compression, so "compress" is really "optimize the PDF structure" (remove duplicate objects, linearize streams). For most office-generated PDFs this still cuts 20-40% off the size. For image-heavy PDFs, less so. I am upfront about that, and it still beats "please sign up first."

SQLite for Everything

I use modernc.org/sqlite — a pure Go translation of SQLite's C source. No CGO needed. It works on ARM64, M-series Macs, and Linux AMD64 without any build flags.

db, err := sql.Open("sqlite", dbPath)
if err != nil {
    return nil, fmt.Errorf("storage: open: %w", err)
}

pragmas := []string{
    "PRAGMA journal_mode=WAL",
    "PRAGMA foreign_keys=ON",
    "PRAGMA busy_timeout=5000",
    "PRAGMA synchronous=NORMAL",
}
for _, pragma := range pragmas {
    if _, err := db.Exec(pragma); err != nil {
        return nil, fmt.Errorf("storage: %s: %w", pragma, err)
    }
}

SQLite handles rate-limit counters, operation logs, subscriptions, and email signups. For a tool that processes maybe a few hundred requests a day, it is absurd overkill to reach for Postgres. WAL mode gives me concurrent reads while writes are happening, and the entire database is a single file I can back up with cp.

Rate Limiting Without Redis

The rate limiter is SQLite-backed. It counts operations per IP within a sliding window:

type RateLimiter struct {
    store     *storage.Store
    maxOps    int
    window    time.Duration
    proSecret string
}

func (rl *RateLimiter) Limit(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ip := extractIP(r)

        // Pro users bypass rate limiting.
        proEmail := proauth.ReadProEmail(r, rl.proSecret)
        if proEmail != "" {
            isPro, _ := rl.store.IsProUser(r.Context(), proEmail)
            if isPro {
                next.ServeHTTP(w, r)
                return
            }
        }

        since := time.Now().Add(-rl.window)
        // Fail open on store errors: a zero count lets the request through.
        count, _ := rl.store.GetOperationCount(r.Context(), ip, since)

        if count >= rl.maxOps {
            w.Header().Set("Retry-After", fmt.Sprintf("%d", int(rl.window.Seconds())))
            writeJSONError(w, http.StatusTooManyRequests,
                fmt.Sprintf("rate limit exceeded: %d operations per %s", rl.maxOps, rl.window))
            return
        }

        rl.store.RecordOperation(r.Context(), ip)
        next.ServeHTTP(w, r)
    })
}

Free users get 5 operations per 24 hours. Pro users (identified by an HMAC-signed HTTP-only cookie) bypass the limit entirely. No Redis, no in-memory maps that reset on deploy — the counts survive restarts because they live in SQLite.

Security Without Overthinking It

A middleware function sets the standard hardening headers:

func SecurityHeaders(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("X-Content-Type-Options", "nosniff")
        w.Header().Set("X-Frame-Options", "DENY")
        w.Header().Set("X-XSS-Protection", "1; mode=block")
        w.Header().Set("Referrer-Policy", "strict-origin-when-cross-origin")
        w.Header().Set("Content-Security-Policy",
            "default-src 'self'; script-src 'self' 'unsafe-inline'; ...")
        next.ServeHTTP(w, r)
    })
}

Uploaded files get validated with magic bytes (%PDF- header check) and saved with UUID filenames to prevent path traversal. Files are deleted immediately after the response is sent. There is also a background cleanup job that sweeps stale uploads every hour, just in case.

Why Astro for the Frontend

Astro ships zero JavaScript by default. For a tool where the landing page is static marketing content and the actual app is one interactive page, that is perfect. The landing page loads fast (just HTML + CSS), and the interactive PDF uploader is a React island hydrated only on the app page.

The entire frontend builds to static files that the Go server serves directly — no separate Node process in production.

What I Would Do Differently

Start with better compression. pdfcpu's optimize is solid for structural cleanup but does not resample images. If I were starting over, I would evaluate whether Ghostscript via a subprocess (ugly but effective) is worth it for the compress feature specifically.

Skip the dual payment provider setup. I wired up both Stripe and LemonSqueezy. In practice, Stripe handles everything. The abstraction cost was not worth the optionality.

Add WebSocket progress for large files. Right now, large merges just hang with a spinner until the response comes back. A progress stream would make the UX feel faster even if the processing time is the same.

The Numbers

The whole thing runs on a single Hetzner CAX21 (ARM64, 4 vCPUs, 8GB RAM) that costs about 7 EUR/month — and that VPS also hosts two other products. Memory usage hovers around 30MB for the Go process. Deploys are rsync + docker build on the VPS, done in under a minute.

Try It

If you need to merge, split, or compress PDFs, give it a try at pdfcrush.dev — it is free, no signup required, and your files are deleted immediately after processing.

The source is part of a larger monorepo where I am experimenting with AI agents that build and manage products autonomously, but that is a story for another post.
