GDPR for Developers: The Technical Compliance Guide Every Engineer Needs
GDPR is not just a legal problem. It is a technical problem. Most data protection failures stem from architectural decisions made in sprint planning, not legal strategy sessions. As the engineer writing the code, you are on the front line of compliance — whether your company has a DPO or not.
This guide covers what GDPR means for the code you write: schema design, API contracts, logging practices, third-party dependencies, and the workflows you need to handle data subject requests programmatically.
Privacy by Design: Article 25 in Practice
Article 25 of GDPR mandates "data protection by design and by default." This is not a vague aspiration — it has direct engineering implications.
Data minimisation by default. Your system should collect only what it needs. If your signup form asks for a phone number but your application never uses it, remove the field. Every column in your database that stores personal data is a compliance obligation. Treat PII like a dependency — only add it if you have a clear reason.
Least privilege access. Not every service needs access to every table. Your email sending microservice does not need read access to your payments table. Scope your database users and API tokens accordingly.
Retention limits built into the schema. If you retain user data for longer than necessary, you are in breach. Add a scheduled_deletion_at column to user records. Build a background job that hard-deletes or anonymises records past their retention window. Do not leave this as a manual process.
Default to private. New features should ship with the most privacy-preserving settings as the default. Users can opt into sharing more data; they should not have to opt out of sharing data they never agreed to share.
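The retention-limit principle above can be sketched as a small background job. This is a minimal illustration using SQLite and an assumed `scheduled_deletion_at` column; in production this would run against your real database on a schedule (cron, Celery beat, etc.):

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def purge_expired_users(conn: sqlite3.Connection) -> int:
    """Hard-delete user rows whose scheduled_deletion_at has passed."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "DELETE FROM users WHERE scheduled_deletion_at IS NOT NULL "
        "AND scheduled_deletion_at < ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount

# Demo: one record past its retention window, one still within it
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, scheduled_deletion_at TEXT)"
)
past = (datetime.now(timezone.utc) - timedelta(days=1)).isoformat()
future = (datetime.now(timezone.utc) + timedelta(days=30)).isoformat()
conn.execute("INSERT INTO users VALUES (1, 'a@example.com', ?)", (past,))
conn.execute("INSERT INTO users VALUES (2, 'b@example.com', ?)", (future,))
deleted = purge_expired_users(conn)
```

The point is that deletion is driven by data in the schema itself, not by someone remembering to run a query.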
Implementing Data Deletion: Soft Delete vs Hard Delete
When a user requests erasure (the "right to be forgotten" under Article 17), you need a plan. The common pattern of soft deletes — setting deleted_at = NOW() and filtering it out of queries — does not satisfy GDPR erasure requirements unless you combine it with actual data removal.
Soft delete alone is not enough. A soft-deleted row still contains the user's email, name, and any other PII. It is still personal data. If you keep soft-deleted records indefinitely, you are retaining data beyond its purpose.
A compliant deletion flow looks like this:
```sql
-- Step 1: Anonymise the record immediately on erasure request
UPDATE users
SET
    email = CONCAT('deleted_', id, '@redacted.invalid'),
    name = 'Deleted User',
    phone = NULL,
    ip_address = NULL,
    deleted_at = NOW(),
    erasure_requested_at = NOW()
WHERE id = :user_id;

-- Step 2: Schedule hard delete after your retention window
-- (run via a background job)
DELETE FROM users WHERE deleted_at < NOW() - INTERVAL '30 days';
```
Cascade deletes matter. When you delete a user, you must also delete their associated records: orders, comments, support tickets, audit logs, analytics events. Use foreign key constraints with ON DELETE CASCADE where appropriate, or build an explicit deletion orchestrator that walks all tables.
Watch out for backups. GDPR acknowledges that data in backups is harder to delete immediately. Document your backup retention policy and ensure that backups are automatically rotated within a reasonable window. Your deletion obligation is met once the data is gone from live systems and any backups expire.
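An explicit deletion orchestrator can be sketched as a loop over a fixed list of related tables. The table and column names here are illustrative, and SQLite stands in for your real database:

```python
import sqlite3

# Ordered list of (table, user-reference column); names are illustrative
RELATED_TABLES = [
    ("orders", "user_id"),
    ("comments", "user_id"),
    ("support_tickets", "requester_id"),
]

def erase_user(conn: sqlite3.Connection, user_id: int) -> dict:
    """Delete a user's related rows first, then the user row itself."""
    counts = {}
    for table, column in RELATED_TABLES:
        # Safe to interpolate: table/column come from the fixed list
        # above, never from user input; the value is parameterised
        cur = conn.execute(f"DELETE FROM {table} WHERE {column} = ?", (user_id,))
        counts[table] = cur.rowcount
    counts["users"] = conn.execute(
        "DELETE FROM users WHERE id = ?", (user_id,)
    ).rowcount
    conn.commit()
    return counts

# Demo data: one user with one row in each related table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.execute("CREATE TABLE support_tickets (id INTEGER PRIMARY KEY, requester_id INTEGER)")
conn.execute("INSERT INTO users VALUES (1)")
conn.execute("INSERT INTO orders VALUES (10, 1)")
conn.execute("INSERT INTO comments VALUES (20, 1)")
conn.execute("INSERT INTO support_tickets VALUES (30, 1)")
result = erase_user(conn, 1)
```

Returning per-table counts gives you something concrete to write into your compliance audit log.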
Anonymisation vs Pseudonymisation in Your Schema
These two terms have specific legal meanings under GDPR, and mixing them up has compliance consequences.
Anonymisation means the data can no longer be linked back to an individual — even with additional information. Truly anonymised data falls outside GDPR entirely. In practice, genuine anonymisation is hard. Aggregated analytics data, k-anonymised datasets, and summary statistics can qualify.
Pseudonymisation means replacing direct identifiers with a key (a pseudonym), keeping the key in a separate system. The data is still personal data under GDPR — but pseudonymisation is a risk-reduction technique that regulators look favourably upon.
A practical schema pattern for pseudonymisation:
```sql
-- users table: holds the mapping
CREATE TABLE users (
    id UUID PRIMARY KEY,
    pseudonym UUID DEFAULT gen_random_uuid(), -- stored separately
    email TEXT NOT NULL,
    ...
);

-- events table: references pseudonym, not id
CREATE TABLE analytics_events (
    id BIGSERIAL PRIMARY KEY,
    user_pseudonym UUID NOT NULL, -- no FK to users
    event_type TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL
);
```
If you delete the users row, the remaining analytics_events rows lose their only link to an identity and become effectively anonymised, provided the events themselves carry no identifying attributes (no raw IPs, no free-text fields containing names). This is a defensible architecture for retaining aggregate analytics while honouring erasure requests.
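The link-severing behaviour can be demonstrated end to end. This sketch mirrors the schema above in SQLite (types simplified to TEXT):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, pseudonym TEXT, email TEXT)")
conn.execute(
    "CREATE TABLE analytics_events "
    "(id INTEGER PRIMARY KEY, user_pseudonym TEXT, event_type TEXT)"
)

user_id, pseudonym = str(uuid.uuid4()), str(uuid.uuid4())
conn.execute("INSERT INTO users VALUES (?, ?, ?)", (user_id, pseudonym, "alice@example.com"))
conn.execute(
    "INSERT INTO analytics_events (user_pseudonym, event_type) VALUES (?, ?)",
    (pseudonym, "page_view"),
)

# While the mapping row exists, events can be tied back to an identity
linked = conn.execute(
    "SELECT u.email FROM analytics_events e JOIN users u ON u.pseudonym = e.user_pseudonym"
).fetchall()

# After erasure the events survive, but the join yields nothing
conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
unlinked = conn.execute(
    "SELECT u.email FROM analytics_events e JOIN users u ON u.pseudonym = e.user_pseudonym"
).fetchall()
events_remaining = conn.execute("SELECT COUNT(*) FROM analytics_events").fetchone()[0]
```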
Encryption at Rest and in Transit
GDPR Article 32 requires "appropriate technical measures" including encryption. Here is what that looks like in practice:
In transit: TLS 1.2 minimum, TLS 1.3 preferred. This applies to all connections: client-to-server, service-to-service, server-to-database. If your internal microservices communicate over HTTP, fix that.
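One way to enforce that floor in application code, shown here with Python's standard library `ssl` module:

```python
import ssl

# Build a client context that refuses anything older than TLS 1.2
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# Certificate verification stays on by default; never disable it to
# "fix" connection errors, since that removes the protection TLS provides
```

Most HTTP clients and database drivers accept a context like this, so the policy lives in one place instead of per-connection flags.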
At rest: Encrypt your database volumes. Most cloud providers (AWS RDS, GCP Cloud SQL, Azure Database) enable encryption at rest by default — but verify it is enabled and understand who holds the keys. For high-sensitivity fields (Social Security numbers, financial data, health information), consider column-level encryption:
```python
from cryptography.fernet import Fernet

# Store FIELD_ENCRYPTION_KEY in your secrets manager, not in code;
# `settings` here is your framework's configuration object
key = settings.FIELD_ENCRYPTION_KEY
f = Fernet(key)

# Before storing
encrypted_value = f.encrypt(sensitive_data.encode()).decode()

# After reading
decrypted_value = f.decrypt(encrypted_value.encode()).decode()
```
Key management: Never hardcode encryption keys. Use a secrets manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager). Rotate keys on a schedule and ensure your application can re-encrypt data after a key rotation.
Logging and Audit Trails: Don't Log PII
Your application logs are probably full of personal data, and you may not have noticed. Stack traces include email addresses. Request logs capture IP addresses and query strings. Error messages echo back form inputs.
What not to log:
- Email addresses and usernames in plain text
- IP addresses (these are personal data under GDPR)
- Full request URLs with query parameters that include user data
- Session tokens or authentication cookies
- Any data from sensitive form fields (passwords, obviously, but also DOB, health info)
What to log instead:
- User IDs (opaque identifiers, not PII themselves)
- Request IDs for tracing
- Action types, not action payloads
- Error codes, not error messages containing user input
```python
# Bad — logs PII
logger.info(f"User login attempt: {email}")
logger.error(f"Payment failed for {user.email}: {error}")

# Good — logs identifiers
logger.info(f"User login attempt: user_id={user_id}")
logger.error(f"Payment failed: user_id={user_id} error_code={error.code}")
```
Audit logs for compliance actions are different. You need an append-only audit trail of data subject requests (DSARs, erasure requests, consent changes). This should be a separate, protected log store — not your application debug logs. Include: timestamp, request type, user_id, action taken, operator_id.
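A schema for that audit trail might look like the following. This is an illustrative Postgres-style sketch (table and role names are assumptions); the REVOKE is what makes it append-only for the application:

```sql
-- Append-only audit trail for compliance actions
CREATE TABLE compliance_audit_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    request_type TEXT NOT NULL,   -- 'dsar', 'erasure', 'consent_change'
    user_id UUID NOT NULL,
    action_taken TEXT NOT NULL,
    operator_id UUID,             -- NULL for automated actions
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- The application role can insert and read, but never rewrite history
REVOKE UPDATE, DELETE ON compliance_audit_log FROM app_role;
```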
Right to Data Portability: Building Export Endpoints
Article 20 gives users the right to receive their personal data in a "structured, commonly used and machine-readable format." This means you need an export endpoint.
A minimal DSAR export endpoint:
```typescript
// GET /api/user/export
export async function GET(req: Request) {
  const userId = await getUserIdFromSession(req);

  const [profile, orders, preferences, consentHistory] = await Promise.all([
    db.users.findUnique({ where: { id: userId } }),
    db.orders.findMany({ where: { userId } }),
    db.preferences.findMany({ where: { userId } }),
    db.consentLogs.findMany({ where: { userId }, orderBy: { createdAt: 'desc' } }),
  ]);

  const export_data = {
    exported_at: new Date().toISOString(),
    profile: sanitiseForExport(profile),
    orders,
    preferences,
    consent_history: consentHistory,
  };

  return new Response(JSON.stringify(export_data, null, 2), {
    headers: {
      'Content-Type': 'application/json',
      'Content-Disposition': 'attachment; filename="your-data.json"',
    },
  });
}
```
Make sure you cover every table that holds data associated with the user. A common mistake is exporting the users table but forgetting analytics_events, support_tickets, newsletter_subscriptions, and anything else that links back to the user ID.
Handling DSARs Programmatically
A Data Subject Access Request is a user asking: "What data do you hold about me?" You have one calendar month to respond. If your data is spread across 20 tables and 4 microservices, doing this manually is not sustainable.
Build a DSAR search utility:
```python
# Tables that may contain user data
DSAR_TABLES = [
    ('users', 'id'),
    ('orders', 'user_id'),
    ('comments', 'user_id'),
    ('support_tickets', 'requester_id'),
    ('newsletter_subscriptions', 'user_id'),
    ('analytics_events', 'user_pseudonym'),  # via pseudonym mapping
    ('audit_logs', 'actor_id'),
    ('consent_logs', 'user_id'),
]

def generate_dsar_report(user_id: str) -> dict:
    report = {}
    for table, column in DSAR_TABLES:
        # Safe to interpolate: table and column come from the fixed
        # allowlist above, never from user input; the value is parameterised
        rows = db.execute(
            f"SELECT * FROM {table} WHERE {column} = %s",
            [user_id]
        ).fetchall()
        if rows:
            report[table] = [dict(row) for row in rows]
    return report
```
Keep this list maintained. Every time a new feature adds a table that stores user data, add it to the DSAR search. Code review should include a check: "Does this PR create a new table with PII? Is it in the DSAR query?"
API Design for Consent
Consent under GDPR must be freely given, specific, informed, and unambiguous. For developers, this means your consent storage needs to be granular and auditable.
What to store:
```sql
CREATE TABLE consent_logs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    consent_type TEXT NOT NULL,    -- 'marketing', 'analytics', 'functional'
    granted BOOLEAN NOT NULL,
    consent_version TEXT NOT NULL, -- version of your privacy policy
    ip_address TEXT,               -- for audit purposes
    user_agent TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
Never overwrite consent records. Always insert a new row. This gives you a full history: what consent was given, when, under which version of your policy. If your privacy policy changes in a material way, you need to re-obtain consent — and you need to know which users consented under the old policy.
Expose consent state via API:
```typescript
// GET /api/consent
// Returns current consent state for the authenticated user

// POST /api/consent
// Body: { marketing: true, analytics: false }
// Inserts a new consent_logs row for each changed preference
```
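The append-only pattern in practice: current state is just the most recent row per user and consent type. A minimal sketch using SQLite (schema simplified from the one above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE consent_logs ("
    " id INTEGER PRIMARY KEY,"
    " user_id TEXT NOT NULL,"
    " consent_type TEXT NOT NULL,"
    " granted INTEGER NOT NULL,"
    " consent_version TEXT NOT NULL)"
)

def record_consent(user_id: str, consent_type: str, granted: bool, version: str) -> None:
    # Always INSERT, never UPDATE, so the full history is preserved
    conn.execute(
        "INSERT INTO consent_logs (user_id, consent_type, granted, consent_version)"
        " VALUES (?, ?, ?, ?)",
        (user_id, consent_type, int(granted), version),
    )
    conn.commit()

def current_consent(user_id: str, consent_type: str) -> bool:
    # Current state = the most recent row for this user and type
    row = conn.execute(
        "SELECT granted FROM consent_logs WHERE user_id = ? AND consent_type = ?"
        " ORDER BY id DESC LIMIT 1",
        (user_id, consent_type),
    ).fetchone()
    return bool(row[0]) if row else False

record_consent("u1", "marketing", True, "v1")
record_consent("u1", "marketing", False, "v2")  # withdrawal: a new row, not an overwrite
state = current_consent("u1", "marketing")
history_len = conn.execute("SELECT COUNT(*) FROM consent_logs").fetchone()[0]
```

Note that a user who never consented and a user with no rows both read as `False`, which is the correct default under GDPR.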
Session Data and GDPR
HTTP sessions often contain personal data — user IDs, names, email addresses, shopping cart contents. Apply the same data minimisation principles:
- Store only the user ID in the session, not full user objects. Fetch the profile from the database when you need it.
- Set appropriate session expiry. Indefinitely persistent sessions are a GDPR risk.
- Ensure session stores (Redis, database sessions) are encrypted.
- On logout, invalidate the session server-side — do not rely on the client deleting a cookie.
- On erasure request, delete all active sessions for the user.
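Server-side invalidation on erasure can be sketched with an in-memory store; in production the same operation runs against Redis or a sessions table:

```python
import secrets

# Minimal in-memory session store: session_id -> user_id
sessions: dict[str, str] = {}

def create_session(user_id: str) -> str:
    session_id = secrets.token_urlsafe(32)
    sessions[session_id] = user_id
    return session_id

def destroy_all_sessions(user_id: str) -> int:
    """Called on logout-everywhere and on erasure requests."""
    stale = [sid for sid, uid in sessions.items() if uid == user_id]
    for sid in stale:
        del sessions[sid]
    return len(stale)

a = create_session("user-1")
b = create_session("user-1")
c = create_session("user-2")
removed = destroy_all_sessions("user-1")
```

The key property is that the server forgets the session; a cookie the client still holds is then just a meaningless token.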
Third-Party SDKs as Data Processors
When you npm install an analytics library, a chat widget, or an error monitoring SDK, you are instructing that vendor to process personal data on your behalf. Under GDPR, they are a data processor and you are the data controller. You are legally responsible for ensuring they comply.
What this means in practice:
Inventory your dependencies. Every third-party SDK that touches user data needs a Data Processing Agreement (DPA) with the vendor. Check that Segment, Datadog, Sentry, Intercom, and any other tools you use have signed DPAs in place.
Consent before load. SDKs that set cookies or send data to third-party servers must not load until the user has consented. Do not load your analytics SDK in `_app.tsx` unconditionally. Load it conditionally based on consent state:

```typescript
if (consentState.analytics) {
  loadAnalyticsSDK();
}
```
Review what each SDK collects. Check the vendor's documentation. Some SDKs capture full request URLs (which may contain query params with PII), user-agent strings, and IP addresses. Configure them to minimise collection where possible.
Server-side tagging reduces risk. Instead of loading vendor tags in the browser, use server-side tag management. You control what data leaves your server. You can strip PII before forwarding events.
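Stripping PII before forwarding might look like the following sketch. The field names treated as PII are assumptions; tune the list to your own schema:

```python
from urllib.parse import urlsplit, urlunsplit

# Fields treated as PII in this sketch; adjust to your event schema
PII_KEYS = {"email", "name", "phone", "ip_address"}

def strip_pii(event: dict) -> dict:
    """Drop PII fields and query strings before forwarding to a vendor."""
    cleaned = {k: v for k, v in event.items() if k not in PII_KEYS}
    if "url" in cleaned:
        parts = urlsplit(cleaned["url"])
        # Query strings often smuggle PII (search terms, tokens, emails)
        cleaned["url"] = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return cleaned

event = {
    "event_type": "signup",
    "email": "alice@example.com",
    "url": "https://example.com/welcome?email=alice%40example.com",
}
forwarded = strip_pii(event)
```

Because this runs on your server, a vendor outage or misconfigured tag can never leak fields you never sent.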
Developer GDPR Compliance Checklist
Use this checklist in code review and sprint planning:
Data collection
- [ ] Only collecting data with a documented legal basis
- [ ] Signup/contact forms request only fields the application actually uses
- [ ] Consent obtained before loading non-essential third-party scripts
Schema and storage
- [ ] PII columns identified and documented in data map
- [ ] Sensitive fields encrypted at column level where required
- [ ] Database volumes encrypted at rest
- [ ] Retention periods defined; automated deletion jobs in place
Deletion and erasure
- [ ] Erasure flow anonymises then hard-deletes records
- [ ] All related tables included in cascade delete logic
- [ ] DSAR search covers every table containing user data
- [ ] Active sessions terminated on erasure request
Logging
- [ ] Application logs contain no PII (email, name, IP in plain text)
- [ ] Compliance audit log (DSAR actions, consent changes) maintained separately
APIs and exports
- [ ] Data portability export endpoint returns all user data
- [ ] Consent stored with timestamp, version, and type granularity
- [ ] Consent history append-only (no overwrites)
Third parties
- [ ] DPAs in place with all data processors
- [ ] Third-party SDKs loaded conditionally on consent
- [ ] Server-side tagging evaluated where possible
Access control
- [ ] Database users scoped to least privilege
- [ ] Internal services use service accounts, not shared credentials
- [ ] Encryption keys stored in secrets manager, rotated regularly
Automate the Parts You Can
Manually scanning your website for third-party trackers, checking consent coverage, and auditing what each cookie does is tedious and easy to get wrong. Custodia scans your website automatically — identifying every tracker, cookie, and third-party script, checking your consent banner configuration, and flagging compliance gaps.
Run a free scan at app.custodia-privacy.com/scan. No signup required. Results in 60 seconds.
This article provides general technical guidance on GDPR implementation patterns. It does not constitute legal advice. Your specific obligations depend on your jurisdiction, the nature of your data processing, and your business model. Consult a qualified data protection lawyer for advice tailored to your situation.