Most GDPR guides are written for marketers and business owners. They talk about cookie banners, privacy policies, and consent management. All useful — but none of it helps you when you're the developer responsible for an API that processes personal data.
This guide is for you. We'll cover what GDPR actually requires at the API level, how to implement privacy by design in concrete engineering terms, and the compliance traps that catch developers off guard.
APIs Are Data Processors (and Sometimes Controllers) Under GDPR
Before diving into implementation, get the legal framing right.
Under GDPR, the data controller decides why data is processed. The data processor handles data on behalf of a controller. If your API receives personal data from a client and processes it as instructed, you are likely a data processor. If your API decides what to do with personal data independently, you may be a controller — or a joint controller — for that processing.
Why does this matter for developers?
Because processors have direct obligations under GDPR. Article 28 requires a Data Processing Agreement (DPA) between controller and processor. Article 32 requires processors to implement appropriate technical security measures. Processors can also be fined directly for GDPR violations — not just the controllers.
If you're building an internal API used by your own product, your organisation is both controller and processor. If you're building an API offered to third-party customers, you're almost certainly a processor acting on behalf of many controllers.
Either way, how you design your API has compliance implications.
Privacy by Design at the API Level (Article 25)
Article 25 of GDPR requires data protection by design and by default. This is not a nice-to-have — it is a legal obligation. The principle means:
- Privacy protections should be built into your system from the start, not bolted on later
- By default, only the minimum necessary data should be processed
For API developers, this translates to a series of concrete design decisions.
Data Minimisation in API Responses
One of the most common GDPR violations at the API level is returning more personal data than the caller actually needs.
Consider a /users/{id} endpoint. The database record might contain: name, email, phone, address, date of birth, hashed password, payment method tokens, internal risk scores, account creation IP address, and marketing preferences.
A frontend display screen might only need: name, email, and subscription tier.
Returning everything when only a subset is needed violates the data minimisation principle. It also creates unnecessary exposure — data that travels across the network can be intercepted, logged, cached, or accidentally displayed.
The fix: Design your API responses to include only the fields required for the stated purpose of each endpoint. If you have multiple consumers with different needs, use field projection (e.g., ?fields=name,email) or create separate endpoint resources tailored to each use case.
Ask of every response field: does the caller actually need this? If not, don't include it.
Purpose Limitation in API Design
Article 5(1)(b) requires that data collected for one purpose is not used for another incompatible purpose. In API design, purpose limitation shows up in how you structure your endpoints and what access you grant.
If you have an analytics endpoint that aggregates usage statistics, it should not return individual user records — even if the underlying query could do so. The purpose of an analytics endpoint is aggregation, not individual user lookup.
Similarly, an endpoint built for customer support lookups should not be accessible to marketing automation systems pulling contact lists. Even if the data is the same, the purpose is different.
Practical implementation:
- Design endpoints around specific use cases and purposes, not just data entities
- Use scoped API tokens that are limited to specific endpoints
- Document the stated purpose of each endpoint and enforce it at the access control layer
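One way to sketch the scoped-token idea: map each endpoint to a required scope and deny by default. The token store, scope names, and endpoint keys below are illustrative assumptions — in practice you'd use your API gateway or auth middleware, but the enforcement logic is the same.

```python
# Sketch: purpose-scoped API tokens enforced per endpoint.
# Token values, scope names, and endpoint identifiers are illustrative.

TOKEN_SCOPES = {
    "tok_support_123": {"support:read"},
    "tok_analytics_456": {"analytics:read"},
}

ENDPOINT_REQUIRED_SCOPE = {
    "GET /support/users/{id}": "support:read",
    "GET /analytics/usage": "analytics:read",
}

def is_authorised(token: str, endpoint: str) -> bool:
    required = ENDPOINT_REQUIRED_SCOPE.get(endpoint)
    if required is None:
        return False  # deny by default: unknown endpoints grant nothing
    return required in TOKEN_SCOPES.get(token, set())
```

With this shape, a marketing system holding `tok_analytics_456` cannot call the support lookup endpoint at all, even though both tokens are "valid" credentials.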
Access Control and Authentication as Privacy Controls
Authentication and authorisation are not just security concerns — they are privacy controls.
Under GDPR, you must ensure that personal data is only accessible to those with a legitimate need. Inadequate access control is one of the most common causes of personal data breaches. And under GDPR, a breach that results from foreseeable, preventable technical failure will be judged harshly by supervisory authorities.
What good API access control looks like from a privacy perspective:
- Scoped tokens over broad credentials. An API key that can access every endpoint is a single point of failure. Issue scoped tokens that can only access what they need.
- User-level vs. admin-level separation. An authenticated user should be able to access their own data — not other users' data. Enforce object-level authorisation on every request, not just at the login gate.
- Short-lived tokens. Long-lived tokens create long windows of exposure if compromised. Use short-lived access tokens with refresh token rotation.
- Audit who accessed what. Log every access to personal data with timestamp, token/identity, endpoint, and record identifier. This supports GDPR accountability obligations and makes breach investigations tractable.
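The last two points — object-level authorisation on every request, plus an audit record of every access to personal data — can be combined in one access path. A minimal sketch, with illustrative names and an in-memory audit list standing in for a real audit store:

```python
# Sketch: object-level authorisation plus an audit entry per access
# to personal data. Storage and identifiers are illustrative.

import datetime

AUDIT_LOG: list[dict] = []

def fetch_user_record(requester_id: str, target_user_id: str,
                      is_admin: bool = False) -> dict:
    # Object-level check: a user may only read their own record,
    # regardless of whether their token authenticated successfully.
    if requester_id != target_user_id and not is_admin:
        raise PermissionError("not authorised for this object")
    # Record who accessed what, and when, before returning data.
    AUDIT_LOG.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "actor": requester_id,
        "endpoint": "GET /users/{id}",
        "record": target_user_id,
    })
    return {"id": target_user_id}
```

The key design point is that the authorisation check compares the authenticated identity against the specific object requested — passing the login gate alone is never enough.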
Logging and Audit Trails That Contain Personal Data
Logs are one of the most overlooked privacy risks in API development.
Developers add verbose logging for debugging. That logging often captures request headers, request bodies, query parameters, and response payloads. And those logs often contain personal data: email addresses in query strings, user IDs in paths, names and addresses in request bodies, authentication tokens in headers.
This creates several GDPR problems:
- Logs become unintended personal data stores. If you haven't documented that your application logs constitute a personal data repository, they may be missing from your Records of Processing Activities (ROPA).
- Retention periods are often undefined. Logs frequently accumulate indefinitely because no one set a retention policy. GDPR requires purpose-appropriate retention periods.
- Access is often broad. Logs in a centralised logging system may be accessible to many engineers and services, far beyond what is necessary.
What to do:
- Scrub personal data from logs before writing them. Replace email addresses, names, and other personal identifiers with pseudonymous identifiers (e.g., user ID hash) where possible.
- Never log authentication credentials, session tokens, or payment details.
- Set retention policies on your logging infrastructure and enforce them.
- Include your logging systems in your data mapping and ROPA documentation.
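A scrubbing step like the one described can be sketched with the standard library: drop credential fields outright, pseudonymise user IDs with a one-way hash, and mask email addresses in free-text values. Field names and the regex are illustrative assumptions; a production scrubber would also need a salt/pepper for the hash and broader pattern coverage.

```python
# Sketch: scrub a structured log event before it is written.
# Field names and patterns are illustrative.

import hashlib
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def pseudonymise(user_id: str) -> str:
    # One-way hash; in real use, mix in a secret pepper so IDs
    # cannot be re-derived by brute force.
    return hashlib.sha256(user_id.encode()).hexdigest()[:16]

def scrub_log_event(event: dict) -> dict:
    scrubbed = {}
    for key, value in event.items():
        if key in {"authorization", "set_cookie", "password"}:
            continue  # drop credentials outright, never log them
        if key == "user_id":
            scrubbed[key] = pseudonymise(str(value))
        elif isinstance(value, str):
            scrubbed[key] = EMAIL_RE.sub("[email]", value)
        else:
            scrubbed[key] = value
    return scrubbed
```

Because the scrubbing happens before the write, the raw personal data never reaches the centralised log store in the first place.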
Webhook Payloads and Third-Party Data Exposure
Webhooks are a common pattern for real-time event delivery — your API sends a POST request to a customer-supplied URL when something happens. But webhooks routinely carry personal data, and developers often don't think through the privacy implications.
When you send a webhook payload containing user data to a customer's endpoint:
- You are transmitting personal data to a third party
- You need a lawful basis and appropriate contractual protections
- The customer's endpoint may not be secure (expired TLS certificates, logging raw payloads, storing everything in a database without thought)
Privacy-first webhook design:
- Include only the minimum necessary data in webhook payloads. Consider sending an event notification with an ID rather than the full data object — the receiver can then fetch what they need with a scoped API call.
- Always use HTTPS. Reject non-HTTPS webhook URLs.
- Sign webhook payloads with HMAC signatures so receivers can verify authenticity.
- Document what personal data appears in each webhook event type in your developer documentation and privacy policy.
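The minimal-payload and HMAC-signing points can be sketched together with the standard library. The header convention, secret format, and event shape below are assumptions — webhook providers differ — but the mechanism (HMAC-SHA256 over the raw body, constant-time comparison on verification) is the standard pattern.

```python
# Sketch: sign a minimal event-notification payload with HMAC-SHA256,
# and verify it on the receiving side. Secret and event names illustrative.

import hashlib
import hmac
import json

def sign_payload(secret: bytes, payload: bytes) -> str:
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_signature(secret: bytes, payload: bytes, signature: str) -> bool:
    expected = sign_payload(secret, payload)
    # compare_digest prevents timing attacks on the comparison
    return hmac.compare_digest(expected, signature)

# Minimal payload: event type and an ID, not the full user object.
# The receiver fetches details via a scoped API call if it needs them.
payload = json.dumps({"event": "user.updated", "user_id": "u_123"}).encode()
secret = b"whsec_example_secret"
signature = sign_payload(secret, payload)
```

Sending only `{"event": ..., "user_id": ...}` means that even if the customer's endpoint logs raw payloads or loses them, no names, emails, or addresses are exposed.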
API Versioning and Data You Keep in Old Versions
API versioning creates a privacy edge case that developers rarely consider: old API versions may continue to operate — and may handle personal data differently from your current, more privacy-conscious version.
If you added data minimisation to v2 but v1 is still returning full user objects, you have a problem. If you removed a field containing health data from v3 but v1 and v2 still return it, users who consented to limited processing may have their data exposed through legacy endpoints.
The principle: Privacy improvements should cascade backwards where feasible, not just apply to new API versions. When you deprecate an API version, ensure it is actually turned off — not just "officially deprecated" while still serving traffic.
At minimum, document the personal data returned by each API version and ensure that the deprecation process includes a privacy review: what personal data will stop being exposed when this version is retired?
DSAR Automation via API
The Data Subject Access Request (DSAR) — Article 15 of GDPR — gives individuals the right to obtain a copy of all personal data you hold about them. The Right to Erasure (Article 17) lets them request deletion.
Most companies handle DSARs manually. But if you have the right API architecture, you can automate significant portions of DSAR handling.
Building DSAR-capable API endpoints:
For data export (Article 15):
- Create an endpoint that aggregates all personal data for a given user across all services
- Return data in a machine-readable format (JSON, CSV)
- Include data from all sub-systems: profile data, logs (appropriately filtered), transaction records, audit trails, consent records
For deletion (Article 17):
- Create a deletion endpoint that removes or anonymises personal data across all linked systems
- Be careful about what you must retain for legitimate purposes (legal obligations, fraud prevention, accounting records) — deletion does not always mean deleting everything
- Return a deletion confirmation with timestamp for your records
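A minimal sketch of both DSAR operations, with in-memory dictionaries standing in for real sub-systems (the store names, fields, and retention rule are illustrative assumptions):

```python
# Sketch: DSAR export (Article 15) and deletion (Article 17) across
# illustrative sub-systems. Stores and fields are assumptions.

import datetime

PROFILE_DB = {"u_1": {"name": "Ada", "email": "ada@example.com"}}
ORDERS_DB = {"u_1": [{"order_id": "o_9", "amount": 30}]}  # retained for accounting

def export_user_data(user_id: str) -> dict:
    """Aggregate a machine-readable export across sub-systems."""
    return {
        "profile": PROFILE_DB.get(user_id, {}),
        "orders": ORDERS_DB.get(user_id, []),
    }

def delete_user_data(user_id: str) -> dict:
    """Delete profile data; anonymise records we must legally retain."""
    PROFILE_DB.pop(user_id, None)
    for order in ORDERS_DB.get(user_id, []):
        order.pop("email", None)  # strip identifiers from retained records
    return {
        "deleted": user_id,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
```

The structural point: export and deletion are only automatable if every sub-system holding personal data is enumerable from one place, which is why DSAR support needs to be designed in, not retrofitted.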
Practical tip: Build DSAR endpoints as a first-class concern, not as an afterthought. If you find yourself unable to enumerate all personal data for a user programmatically, that is a sign your data architecture needs attention from a privacy perspective. Running a scan of your website and infrastructure can surface data flows you may have missed — try a free scan at Custodia.
Rate Limiting, Abuse Prevention, and IP Logging
Rate limiting is a standard API security practice. But rate limiting commonly involves logging IP addresses — and IP addresses are personal data under GDPR.
This creates a tension: you need IP address data to detect and block abuse, but you need a lawful basis to retain it, and you should only keep it as long as necessary.
Getting this right:
- Legitimate interest is generally an appropriate lawful basis for rate limiting and abuse prevention, provided you conduct and document a legitimate interest assessment.
- Set a short, defined retention period for IP-based rate limiting data — typically hours or days, not months.
- If you log full IP addresses for longer periods, consider truncating the last octet for IPv4 (and the last 80 bits for IPv6) to reduce identifiability while retaining enough for geolocation and abuse pattern analysis.
- Document IP logging in your privacy policy and ROPA.
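The truncation rule above can be implemented with the standard library's `ipaddress` module alone — zero the last octet of an IPv4 address (keep the /24) and the last 80 bits of an IPv6 address (keep the /48). A minimal sketch:

```python
# Sketch: reduce IP identifiability for longer-term retention by zeroing
# the last octet (IPv4) or the last 80 bits (IPv6).

import ipaddress

def truncate_ip(ip_str: str) -> str:
    ip = ipaddress.ip_address(ip_str)
    if ip.version == 4:
        # Keep the /24 network, zero the final octet.
        return str(ipaddress.ip_network(f"{ip}/24", strict=False).network_address)
    # IPv6: keep the top 48 bits, zero the remaining 80.
    return str(ipaddress.ip_network(f"{ip}/48", strict=False).network_address)
```

The truncated form is still useful for coarse geolocation and abuse-pattern analysis, but no longer pinpoints an individual connection.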
Encryption in Transit and at Rest
Article 32 of GDPR requires technical measures appropriate to the risk. Encryption is explicitly mentioned as an example of an appropriate measure.
For API developers:
In transit:
- TLS 1.2 minimum, TLS 1.3 preferred. Disable older versions (TLS 1.0, 1.1, SSL 3.0).
- Use HSTS (HTTP Strict Transport Security) to prevent downgrade attacks.
- Validate TLS certificates on any outbound API calls you make — don't accept self-signed certificates in production without pinning.
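For outbound calls from Python services, the in-transit rules above can be enforced with the standard library's `ssl` module — a sketch, assuming no custom CA setup:

```python
# Sketch: a client-side TLS context that refuses TLS 1.0/1.1 and validates
# certificates and hostnames on outbound API calls.

import ssl

def strict_client_context() -> ssl.SSLContext:
    # create_default_context() enables certificate verification and
    # hostname checking by default.
    ctx = ssl.create_default_context()
    # Refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

Passing this context to your HTTP client means a self-signed or expired certificate on the remote end fails the handshake instead of silently succeeding.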
At rest:
- Encrypt the data store. Most managed database services provide encryption at rest by default — verify that it is enabled.
- Encrypt particularly sensitive fields at the application layer in addition to disk-level encryption. If someone compromises your database credentials, field-level encryption for things like payment tokens, health data, or social security numbers provides an additional layer of protection.
- Manage encryption keys carefully — separate key management from data storage.
Handling Personal Data in Error Messages and Logs
Error messages are a persistent source of personal data leakage.
Classic examples: an error response that includes the full database query (which contained a user's email in the WHERE clause), a stack trace that dumps a user object to the log, a validation error that echoes back a submitted form field value (including the personal data in it).
The rules:
- Never include personal data in error messages returned to clients. Return a generic error message and an error ID. Log the detail server-side against that error ID for debugging.
- Never include personal data in stack traces that get logged. Ensure your exception handling strips or masks personal data before logging.
- Use structured logging and scrub personal fields. If your log format is structured (JSON), you can apply field-level scrubbing rules before writing to your log store.
- Test this. Add error condition testing to your test suite that validates error responses do not contain personal data fields from your test dataset.
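The generic-message-plus-error-ID pattern from the first rule can be sketched like this, with an in-memory dictionary standing in for the server-side error store:

```python
# Sketch: return a generic error with an ID to the client; keep the detail
# (which may contain personal data) server-side, keyed by that ID.

import uuid

SERVER_SIDE_ERROR_DETAIL: dict[str, str] = {}

def safe_error_response(internal_detail: str) -> dict:
    error_id = str(uuid.uuid4())
    # Store (and, in real use, scrub) the detail server-side for debugging.
    SERVER_SIDE_ERROR_DETAIL[error_id] = internal_detail
    return {"error": "An internal error occurred.", "error_id": error_id}
```

A support engineer can look up the full detail by `error_id`; the client never sees the query, the stack trace, or the personal data they may contain.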
A Practical Privacy Review Checklist for API Developers
Before shipping or reviewing an API:
- Does each endpoint return only the data the caller actually needs?
- Are endpoints scoped by purpose, with access tokens that enforce those scopes?
- Is object-level authorisation enforced (not just authentication)?
- Are logs scrubbed of personal data before writing?
- Is there a defined retention period for all log data?
- Are webhook payloads minimised, signed, and HTTPS-only?
- Are old API versions scheduled for genuine deprecation?
- Is there a programmatic way to export all personal data for a user?
- Is there a programmatic way to delete or anonymise personal data for a user?
- Are IP addresses retained for the shortest time necessary for their purpose?
- Is TLS 1.2+ enforced on all endpoints?
- Do error responses avoid echoing personal data back to clients?
- Are all API data flows documented in your ROPA?
Check Your Site's Data Flows
If you're not sure what personal data your APIs and website are actually processing and exposing, a good starting point is a technical scan.
Scan your website free at Custodia — get a report on trackers, personal data flows, and compliance gaps in 60 seconds, no signup required. It won't replace a full API privacy review, but it gives you a concrete starting point and often surfaces third-party integrations and data flows that weren't documented anywhere.
This guide provides general information about GDPR requirements relevant to API development. It does not constitute legal advice. Privacy requirements vary based on the nature of processing activities, data types, and applicable jurisdictions. Consult a qualified privacy professional for advice specific to your situation.