Generating PDFs without breaking GDPR

#privacy #eu #pdf #saas

Generating GDPR-Compliant PDFs in Python Without the Compliance Headache

If you're building software for legal, financial, or healthcare customers in Europe, PDF generation is a minefield. Every time you render an invoice, a contract, or a medical summary, you're potentially passing personal data through a third-party service. Where does it go? Who can access it? Does it leave the EU?

Most PDF APIs were designed for simplicity, not compliance. They store your rendered documents on US infrastructure, lack a formal Data Processing Agreement, and put you in the position of hoping they're GDPR-clean. When a customer asks "where does my data go?", "we use a US SaaS vendor" is not the answer they want.

This post covers how to drop a GDPR-compliant PDF rendering pipeline into an existing Python project in under ten minutes, using pdfserve.eu, an API built specifically around the constraint that document content must never be stored.

Why Most PDF APIs Fail GDPR

The problem is architectural. A typical PDF API works like this:

1. You POST your HTML (which may contain names, addresses, account numbers)
2. The API renders it and stores the PDF
3. You GET the PDF from a storage URL

Step 2 is the problem. The PDF (potentially containing personal details) is written to disk somewhere. If that somewhere is AWS in Virginia, you've transferred personal data outside the EU, and you need an Article 46 mechanism to do that lawfully. Even with Standard Contractual Clauses in place, you need a Transfer Impact Assessment, and you're still relying on the provider's interpretation of their own obligations.

That's before we even get into the CLOUD Act. US authorities can compel a US-based provider to hand over data in its possession, custody, or control, regardless of where in the world that data is stored. Even eu-west-1 is not safe!

The cleaner solution is a stateless renderer: the data comes in, the PDF comes out, nothing is written anywhere.
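To make the contrast concrete, here is a minimal sketch of what a stateless render handler can look like. It is not pdfserve.eu's implementation: `render_pdf` is a hypothetical stand-in for a real HTML-to-PDF engine, and `make_app` and `call_app` are names invented for this example. The point is only the shape: the request body is read into memory, the result goes straight into the response, and there is no file write or storage URL anywhere.

```python
import io
from typing import Callable, Iterable


def render_pdf(html: bytes) -> bytes:
    """Hypothetical stand-in for a real HTML-to-PDF engine.

    It only returns placeholder bytes so the sketch runs on its own.
    """
    return b"%PDF-1.4\n% placeholder for " + str(len(html)).encode() + b" bytes of HTML\n"


def make_app(renderer: Callable[[bytes], bytes]):
    """Build a WSGI app that renders in memory and returns the PDF in the body."""

    def app(environ, start_response) -> Iterable[bytes]:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        html = environ["wsgi.input"].read(length)  # held in memory only
        pdf = renderer(html)                       # no temp file, no storage URL
        start_response("200 OK", [
            ("Content-Type", "application/pdf"),
            ("Content-Length", str(len(pdf))),
        ])
        return [pdf]  # the response body is the PDF itself

    return app


def call_app(app, body: bytes):
    """Call the WSGI app directly, without a server or network."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, start_response)
    return captured["status"], captured["headers"], b"".join(chunks)


status, headers, pdf = call_app(make_app(render_pdf), b"<h1>Invoice</h1>")
```

Compare this with the three-step flow above: there is no step where the PDF sits at rest waiting to be fetched, so there is nothing to leak, subpoena, or forget to delete.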

How pdfserve.eu Works

pdfserve.eu is a single-endpoint API hosted on Scaleway in Paris. You send HTML, you get a PDF. The entire pipeline runs in process memory. The output is streamed directly back in the HTTP response body.
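From the Python side, a call to an API of this shape might look roughly like the sketch below. Treat every specific as an assumption: the URL is a deliberate placeholder, and the bearer-token header and the `{"html": ...}` payload are guesses at a plausible request format, not the documented pdfserve.eu API. Check the official docs for the real endpoint and parameters. The function names are invented for this example.

```python
import json
import urllib.request

# ASSUMPTIONS: the URL, the bearer-token header, and the {"html": ...}
# payload shape are placeholders, not the documented pdfserve.eu API.
RENDER_URL = "https://example.invalid/render"


def build_render_request(html: str, api_key: str) -> urllib.request.Request:
    """Build a POST request carrying the HTML to render."""
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        RENDER_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def render_to_bytes(html: str, api_key: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the PDF bytes from the response body."""
    request = build_render_request(html, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()  # the PDF arrives in the body; no download URL
```

Because the PDF comes back in the response body, you can pass the bytes straight on to your own HTTP response or email attachment. If you write them to disk on your side, that storage becomes your compliance question rather than the renderer's.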

The only things stored are a timestamp, the size of the content, and an anonymous internal user ID. The content you render lives in memory, and it dies in memory.

The DPA (available at pdfserve.eu/dpa) formalises this in contractual language:

"The Processor does not persist customer-submitted template content, fetched URL bodies, or generated PDF output to any storage medium beyond process memory for the duration of the request."

Every sub-processor in the data path is EU-controlled. There are no US entities involved — not for compute, storage, database, or DNS. The full list is in the DPA.

What is retained is a content-free audit trail: timestamps, status codes, output sizes, and render durations. No HTML, no PDF bytes. This gives you a render history without touching document content.
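As an illustration of what "content-free" means in practice, here is a small sketch of an audit record that carries only the metadata listed above. The class and field names are hypothetical, chosen for this example; the actual schema pdfserve.eu uses is not published in this post.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RenderAuditRecord:
    """Hypothetical audit entry: metadata about a render, never its content."""
    timestamp: str       # ISO 8601, UTC
    status_code: int     # HTTP status returned to the caller
    output_bytes: int    # size of the generated PDF
    render_ms: float     # how long the render took


def audit_record(status_code: int, pdf: bytes, render_ms: float) -> RenderAuditRecord:
    """Derive the record from the PDF without keeping a reference to its bytes."""
    return RenderAuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        status_code=status_code,
        output_bytes=len(pdf),  # only the length is recorded
        render_ms=render_ms,
    )


record = audit_record(200, b"%PDF-1.4 example", 41.5)
fields = asdict(record)
```

Nothing in the record can be used to reconstruct the document: the only thing derived from the PDF is its length.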

Try it out for free today at pdfserve.eu. We take HTML and turn it into PDFs. Check out our ever-growing site :)