TL;DR
Day 8 of building SwapFile.io (privacy-first image/PDF converter) in public. Yesterday I shipped a feature I needed myself: PDF Merge. Total time: ~7 hours from "do we have this?" to production. Stack: Go + Fiber + PostgreSQL + Next.js 16. The interesting bits are about what I didn't have to write.
Live: swapfile.io/en/tools/pdf-merge
The problem
Most online PDF mergers do something I didn't realize until I tested four of them: they rasterize your PDFs.
You upload a contract with selectable text, signed and form-filled. You merge it with an addendum. You download the result. The output is... a stack of images. Text is no longer searchable. Form fields are flat pixels. Bookmarks gone. File size often triples.
Smallpdf does this. iLovePDF does this. PDF24 does this in their default flow. The reason is the same for all three — they use server-side rendering tools that go through a raster pipeline because it's simpler and gives them more control over the output.
I wanted the opposite: take what I uploaded, glue them together at the structural level, give me back exactly what went in. Searchable text. Embedded fonts. Bookmarks. Form fields. All intact.
The unsexy answer: pdfunite
Poppler-utils ships with a binary called pdfunite. It's been around for over a decade. It does exactly one thing:
```shell
pdfunite in1.pdf in2.pdf in3.pdf out.pdf
```
It doesn't rasterize. It doesn't recompress. It opens each input as a PDF object graph, appends the pages in order, writes a new cross-reference table, and saves the result. Text, fonts, and images are carried over as the same PDF objects rather than re-rendered, so they come out equivalent to what went in.
Smallpdf could have used this. So could iLovePDF. They chose not to because their pipeline is built around server-rendered images for the preview/print path, and merging via pdfunite would mean maintaining a second code path. For a solo founder, having one fewer code path is a feature, not a downside.
The Go wrapper
Bridging Go to pdfunite is 50 lines of stdlib subprocess code. Here's the core:
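```go
package pdftools // package name is illustrative

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"time"
)

var ErrPDFUniteMissing = errors.New("pdfunite not found in PATH")

type MergeResult struct {
	OutputPath string
	OutputSize int64
	Duration   time.Duration
}

type PDFUnite struct {
	binary  string
	timeout time.Duration
}

// NewPDFUnite resolves the binary once, so a missing pdfunite fails at startup.
func NewPDFUnite(timeout time.Duration) (*PDFUnite, error) {
	path, err := exec.LookPath("pdfunite")
	if err != nil {
		return nil, ErrPDFUniteMissing
	}
	return &PDFUnite{binary: path, timeout: timeout}, nil
}

func (p *PDFUnite) Merge(ctx context.Context, inputPaths []string, outPath string) (*MergeResult, error) {
	if len(inputPaths) < 2 {
		return nil, errors.New("need at least 2 input files")
	}
	// pdfunite syntax: inputs in order, output path last.
	args := append([]string{}, inputPaths...)
	args = append(args, outPath)
	// The process is killed if it outlives the timeout.
	cctx, cancel := context.WithTimeout(ctx, p.timeout)
	defer cancel()
	cmd := exec.CommandContext(cctx, p.binary, args...)
	var stderr bytes.Buffer
	cmd.Stderr = &stderr // keep pdfunite's own message for the user
	start := time.Now()
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("pdfunite failed: %w: %s", err, stderr.String())
	}
	// Check the stat error: ignoring it would panic on fi.Size() if the file is missing.
	fi, err := os.Stat(outPath)
	if err != nil {
		return nil, fmt.Errorf("stat merged output: %w", err)
	}
	return &MergeResult{
		OutputPath: outPath,
		OutputSize: fi.Size(),
		Duration:   time.Since(start),
	}, nil
}
```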
Fail fast at startup if the binary isn't on PATH. Hard timeout via context (120 seconds, generous for a 20-file merge). Stderr captured so failures surface to the user instead of vanishing. Done.
The migration I didn't write
This is the bit I'm most pleased with.
My conversion_jobs table looks like this:
```sql
CREATE TABLE conversion_jobs (
    id            UUID PRIMARY KEY,
    user_id       UUID NULL,
    source_format VARCHAR(16) NOT NULL,
    target_format VARCHAR(16) NOT NULL,
    source_size   BIGINT NOT NULL,
    output_size   BIGINT NOT NULL,
    duration_ms   INTEGER NOT NULL,
    output_path   TEXT NOT NULL,
    page_count    INTEGER NOT NULL DEFAULT 1,
    expires_at    TIMESTAMPTZ NULL
);
```
For convert flows, source_format and target_format are things like 'jpg' and 'webp'. The first instinct for PDF merge was to add a kind enum column ('convert' | 'pdf_merge' | ...) and store input file paths in a new JSONB column.
I almost wrote that migration. Then I noticed: the existing schema already represents everything I need. For PDF merge:
- source_format = 'pdf'
- target_format = 'pdf-merge' (synthetic: it won't match any AllowedTarget, so it never accidentally hits the convert flow)
- page_count = N, where N is the number of input files (this reuses the column meaningfully: the "page count" of an N→1 merge is the input count)
- The existing 1-hour anon TTL applies
- The existing cleanup goroutine cleans up the output
- The existing per-IP anonymous quota counts merges as conversions
Zero migration. Zero new columns. Zero data model debt. The whole feature ships with a parallel MergePDFs service method and a new HTTP handler, both reusing the existing repository and storage paths.
This pattern — synthetic target format as a discriminator — extends cleanly to upcoming features (image → PDF will be source='image', target='pdf-create'; OCR will be target='pdf-ocr'). I get an enum-like discriminator without paying the migration tax.
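A sketch of how one dispatch point might branch on that discriminator, assuming a hypothetical `jobKind` helper. 'pdf-create' and 'pdf-ocr' are the planned values named above; any other target_format falls through to the ordinary convert flow.

```go
package main

import "fmt"

// JobKind is derived from target_format at read time instead of living in its own column.
type JobKind int

const (
	KindConvert JobKind = iota
	KindPDFMerge
	KindPDFCreate // planned: image -> PDF
	KindPDFOCR    // planned: OCR
)

func (k JobKind) String() string {
	switch k {
	case KindPDFMerge:
		return "pdf_merge"
	case KindPDFCreate:
		return "pdf_create"
	case KindPDFOCR:
		return "pdf_ocr"
	default:
		return "convert"
	}
}

// jobKind maps a conversion_jobs.target_format value to a kind.
func jobKind(targetFormat string) JobKind {
	switch targetFormat {
	case "pdf-merge":
		return KindPDFMerge
	case "pdf-create":
		return KindPDFCreate
	case "pdf-ocr":
		return KindPDFOCR
	default:
		return KindConvert // real formats such as "webp" or "jpg"
	}
}

func main() {
	for _, t := range []string{"webp", "pdf-merge", "pdf-create", "pdf-ocr"} {
		fmt.Printf("%-10s -> %v\n", t, jobKind(t))
	}
}
```

One trade-off of this design: because the mapping lives in Go, the database itself won't reject a mistyped value such as 'pdf-merg', so every writer has to go through the same constants.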
The day 8 honesty section
Building in public means showing the unflattering numbers, so here are mine after one week:
- 15 unique visitors total (mostly me testing; self-traffic exclusion was set up on day 6)
- 0 confirmed real conversions from external users
- 0 email signups (the form has been live for a week)
- 3 reactions, <25 views, 0 comments on the day-4 AVIF crosspost on DEV.to
Almost every solo-founder Twitter post conveniently leaves these numbers out. They're the reality of week 1. The funnel data on my dashboard says the product converts at 33% (visitor → upload → output), which sounds impressive but is meaningless when 4 of the 5 visitors behind that number are me checking a deploy.
The bottleneck is distribution, not the product. I built PDF merge partially because I needed it, partially because "pdf merge" is a 600K/month search query and my privacy angle has a real wedge against Smallpdf et al. Whether that wedge is enough to grow past 2 visitors/day is the experiment of the next 4 weeks.
What's next
- Image → PDF (combine multiple images into one PDF) — same pdfunite-style pipeline, just with ImageMagick on the way in
- OCR via Tesseract — much harder; weeks of work
- Distribution: awesome-list PRs, this DEV.to post, then a Hacker News "Show HN" once the feature has 2 weeks of stability
Try the merger here: swapfile.io/en/tools/pdf-merge
The code is closed-source today (might change post-monetization in November). Everything you see in this post is exactly how it runs in production — no oversimplification for the article.
Building SwapFile.io in public — privacy-first image and PDF tools. Files auto-delete in 1 hour, no Google Analytics, free for the first 6 months. Feedback welcome via reply or Twitter @swapfileio.