Privacy by design (Pragmatic Privacy for Programmers, Part 3)

#privacy #security #design #architecture

In parts 1 and 2 we built a very simple "Kitchen Sink" app to demonstrate how we can pragmatically deal with privacy, specifically when following the European Union's General Data Protection Regulation (GDPR). In this instalment, we'll try to tackle a rather vague GDPR requirement: privacy by design.

A bit of theory again

In part 2, we discussed six responsibilities processors of personal data have under GDPR (lawfulness, purpose limitation, data minimisation, accuracy, storage limitation, integrity & confidentiality). At the time, we skipped a seventh responsibility: accountability. Accountability is specified as both being responsible for adhering to the other six responsibilities, and being able to demonstrate compliance.

The law text proceeds to mention several measures to be taken. Many of these, such as processor contracts or data protection officers, are either legal or organisational in nature and are better discussed elsewhere. One of the measures, however, is of particular interest to developers: "data protection by design and by default".

Privacy by default is a useful concept - it means a person's data is protected without them having to actively do anything about that. In the context of GDPR, it's mostly redundant in my opinion, as it's defined as ensuring purpose limitation, data minimisation, etc.

As is typical for laws, however, the definition of privacy by design is rather fuzzy. Fortunately, one of the inspirations for GDPR on this topic is of use. The 7 principles of Privacy by Design (PbD), which date back to the 1990s, specify that "[privacy] is not bolted on as an add-on, after the fact." Let's see how to apply this principle to Kitchen Sink.

Increment 2 (continued)

In increment 2 we started working on a feature that allows users to fill in their postal address so we can send them a hard copy of a photo of a kitchen sink. These addresses are personal data, and are subject to several protective principles under GDPR. We made an overview of these principles, how we've fulfilled them and how we check this fulfilment is still in place (ideally automatically). We ended up with this table:

Principle	Fulfilment	Check
Lawfulness	Contractual obligation	Contract (foreign key)
Purpose limitation	Access limitation	Test ✅
Data minimisation	UPU suggested address fields	Test ✅
Accuracy	User input validation	Contract (column checks) ✅
Storage limitation	Auto removal	Test ✅
Storage limitation	Backup rollover	Test ✅
Integrity & confidentiality	HTTPS	Test ✅
Integrity & confidentiality	Secure components	Test ✅
Integrity & confidentiality	Strong authentication	—

We'll continue with this approach and add "Privacy by Design" as a principle to the list. Now we'll have to see how we can fulfil it and check it:

Principle	Fulfilment	Check
Privacy by design	TBD	TBD

Fulfilment

Doing something "by design" is one of those things that are easy to agree on - it sounds like a good thing - but hard to do. There are no clear steps or activities to perform to achieve the goal. The best PbD has to offer is a documentation standard called "Privacy by Design for Software Engineers". Unfortunately, it's mostly a list of documentation to be generated for each principle. Apart from (seemingly) not being very Agile, it's not really specific. It states rules like "documentation shall contain a privacy architecture", but doesn't go on to say what that is or what it should look like.

Fortunately, others have been more concrete. ENISA, the EU's agency for network and information security, gives a very good privacy by design overview. Specifically, it provides one tool that I think is ideal for our purposes here: privacy design strategies.

These strategies have been devised by associate professor Hoepman, who also wrote an excellent booklet on them. In total, he identifies eight strategies for dealing with privacy when making a design. We can incorporate these into our checks above by making sure we've considered all eight strategies for the personal data in question.

For brevity's sake, I'll only go through the first four strategies (Minimize, Separate, Abstract and Hide). The other four are "process oriented strategies". While these are definitely valuable to consider, they are, well, more process oriented, and they also overlap with GDPR requirements that are handled elsewhere in this series. That's no reason to always skip them, though: these strategies can go beyond what's strictly required by law and provide additional approaches.

Strategy 1: Minimize

In part 2, we already discussed minimisation of personal data. We only store attributes we really need, and only for people who actually request a photo. But we could go further and consider if we could do it without storing personal data at all. I'm not sure how we could do this for letters, but it's possible in other situations. For example, if you're building an iOS app, it's possible to send push messages to the user without knowing their phone number, Apple ID, or any other personal data. Instead, you get a token that you use to send data to Apple, which then makes sure it ends up on the right phone.

Another angle would be to give people the option not to share their personal data and offer an alternative. Perhaps people want to pick up their photo in person at my place or, more likely, at a pickup point near them. Some delivery services offer that option, and it's worth considering giving users some choice in the matter. The impact of this doesn't end with privacy: it may be more costly to use such a service, it may scare away users who don't live near a pickup point, etc. As with all design decisions, we should briefly document the trade-off and the decision so we can honestly say we've considered the options here.

Strategy 2: Separate

In an ideal case of the separation strategy, personal data would remain under the control of its owner, and not end up on our servers at all. I don't see any realistic options for this for Kitchen Sink, but in the future initiatives like that of Tim Berners-Lee might allow for this.

But simpler separation options are available to us today. We could make sure the application part that retrieves the addresses from the database is separate from the web site. We could also ensure different (physical or virtual) servers are used to host the database and the web server. Both of these are reasonable steps to take.

Strategy 3: Abstract

Hoepman describes abstraction as "zooming out" on the data. We applied this strategy in the first part of the series, when we left out part of each IP address to remove precise information while still allowing us to use the addresses for traffic analysis.

For the address data we're collecting, such an approach doesn't work as we need it to be precise to send the photo. But if we wanted to collect statistics on where we've sent photos, we could use this approach and for example only keep the city or state/province.
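To make "zooming out" a bit more tangible, here is a minimal Python sketch of both ideas. It is an illustration under assumptions, not the code from part 1: the function names, the prefix lengths (/24 for IPv4, /48 for IPv6) and the `country` and `city` field names are all made up for this example.

```python
import ipaddress
from collections import Counter


def truncate_ip(ip: str) -> str:
    """Zero the host part of an IP address.

    Assumed prefix lengths: /24 for IPv4 and /48 for IPv6.
    """
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    # strict=False lets us pass a host address and get its network back
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def shipments_per_city(addresses):
    """Count shipments per (country, city) and drop everything more precise.

    `addresses` is an iterable of dicts with hypothetical 'country' and
    'city' keys; street, house number and name are never read.
    """
    return Counter((a["country"], a["city"]) for a in addresses)
```

For statistics we would then store only the resulting counts, not the addresses they were derived from. Note that a count of one for a small town can still point to a single person, so very small groups may need to be merged or dropped.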

Strategy 4: Hide

We've taken some steps to hide the personal data already in Kitchen Sink. Access to the data is restricted, as only the person sending the envelopes is allowed to read data from the database. Furthermore, the address data is not linked to any other personal data (e.g. IP addresses).

Further hiding is still possible: we can choose to encrypt the data in the database, for example, but that seems of limited benefit while increasing technical complexity of the application.

Conclusion

With the above design considerations, we can say we've tried to apply privacy by design. Of course, we should present some evidence to illustrate our compliance. A simple link to the above decisions and considerations would be enough, for example if they're summarised in a design document. This can be a formal versioned document, a wiki, or just a plain text document with some contents.

This should give the following table: