In parts 1 and 2 we have been building a very simple "Kitchen Sink" app to demonstrate how we can pragmatically deal with privacy, specifically when following the European Union's General Data Protection Regulation (GDPR). In this instalment, we'll try to tackle a rather vague GDPR requirement: privacy by design.
In part 2, we discussed six responsibilities processors of personal data have under GDPR (lawfulness, purpose limitation, data minimisation, accuracy, storage limitation, integrity & confidentiality). At the time, we skipped a seventh responsibility: accountability. Accountability is specified as both being responsible for adhering to the other six responsibilities, and being able to demonstrate compliance.
The law text proceeds to mention several measures to be taken. Many of these, such as processor contracts or data protection officers, are either legal or organisational in nature and are better discussed elsewhere. One of the measures, however, is of particular interest to developers: "data protection by design and by default".
Privacy by default is a useful concept: it means a person's data is protected without them having to actively do anything about it. In the context of GDPR, it's mostly redundant in my opinion, as it's defined as ensuring purpose limitation, data minimisation, and so on - requirements we already cover separately.
As is typical for laws, however, the definition of privacy by design is rather fuzzy. Fortunately, one of the inspirations for GDPR on this topic is of use. The 7 principles of Privacy by Design (PbD), which date back to the 1990s, specify that "[privacy] is not bolted on as an add-on, after the fact." Let's see how to apply this principle to Kitchen Sink.
In part 2 we started working on a feature that allows users to fill in their postal address so we can send them a hard copy of a photo of a kitchen sink. These addresses are personal data, and are subject to several protective principles under GDPR. We drew up an overview of these principles, how we've fulfilled them, and how we check this fulfilment is still in place (ideally automatically). We ended up with this table:
|Principle|Fulfilment|Check|
|---|---|---|
|Lawfulness|Contractual obligation|Contract (foreign key)|
|Purpose limitation|Access limitation|Test ✅|
|Data minimisation|UPU suggested address fields|Test ✅|
|Accuracy|User input validation|Contract (column checks) ✅|
|Storage limitation|Auto removal|Test ✅|
|Storage limitation|Backup rollover|Test ✅|
|Integrity & confidentiality|HTTPS|Test ✅|
|Integrity & confidentiality|Secure components|Test ✅|
|Integrity & confidentiality|Strong authentication|—|
We'll continue with this approach and add "Privacy by Design" as a principle to the list. Now we'll have to see how we can fulfil it and check it:
|Principle|Fulfilment|Check|
|---|---|---|
|Privacy by design|TBD|TBD|
Doing something "by design" is one of those things that are easy to agree on - it sounds like a good thing - but hard to do. There are no clear steps or activities to perform to achieve the goal. The best PbD has to offer is a documentation standard called "Privacy by Design for Software Engineers". Unfortunately, it's mostly a list of documentation to be generated for each principle. Apart from (seemingly) not being very Agile, it's not really specific. It states rules like "documentation shall contain a privacy architecture", but doesn't go on to say what that is or what it should look like.
Fortunately, others have been more concrete. ENISA, the EU's agency for network and information security, gives a very good privacy by design overview. Specifically, it provides one tool that I think is ideal for our purposes here: privacy design strategies.
These strategies were devised by associate professor Hoepman, who also wrote an excellent booklet on them. In total, he identifies eight strategies for dealing with privacy when making a design. We can incorporate this into our checks above by making sure we've considered all eight strategies for the personal data in question.
For brevity's sake, I'll only go through the first four strategies (Minimise, Separate, Abstract and Hide). The other four are "process oriented strategies". While these are definitely valuable to consider, they are, well, more process oriented, and they also overlap with GDPR requirements handled elsewhere in this series. That's not to say they should always be skipped: these strategies can go beyond what's strictly required by law and provide additional approaches.
In part 2, we already discussed minimisation of personal data. We only store attributes we really need, and only for people who actually request a photo. But we could go further and consider whether we could do without storing personal data at all. I'm not sure how we could do this for letters, but it's possible in other situations. For example, if you're building an iOS app, it's possible to send push messages to the user without knowing their phone number, Apple ID, or any other personal data. Instead, you get a token that you use to send data to Apple, which then makes sure it ends up on the right phone.
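To illustrate that token-based pattern, here's a minimal sketch. The `PushSubscription` type and `register` helper are hypothetical, not part of Kitchen Sink or Apple's API; the point is that the server only ever stores the opaque device token, never a phone number or account identity.

```python
# Sketch of token-based push registration; all names are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PushSubscription:
    # Opaque token issued by the platform (e.g. APNs). It lets the platform
    # route a message to a device, but tells us nothing about the person.
    device_token: str


def register(store: List[PushSubscription], token: str) -> PushSubscription:
    """Store only the opaque token; no name, phone number, or account ID."""
    sub = PushSubscription(device_token=token)
    store.append(sub)
    return sub


subscriptions: List[PushSubscription] = []
register(subscriptions, "f3a9...")  # token as handed to the app by the OS
```

If the token store leaks, an attacker learns nothing about who the users are - that mapping exists only at Apple's side.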
Another angle would be to give people the option not to share their personal data and offer an alternative. Perhaps people want to pick up their photo in person at my place or, more likely, at a pick-up point near them. Some delivery services offer that option, and it's worth considering giving users some choice in the matter. The impact of this doesn't end with privacy: it may be more costly to use such a service, it may scare away users who don't live near a pick-up point, and so on. As with all design decisions, we should briefly document the trade-off and the decision so we can honestly say we've considered the options here.
In an ideal case of the separation strategy, personal data would remain under the owner's control and not end up on our servers at all. I don't see any realistic options for this for Kitchen Sink, but in the future initiatives like Tim Berners-Lee's Solid might allow for this.
But simpler separation options are available to us today. We could make sure the application part that retrieves the addresses from the database is separate from the web site. We could also ensure different (physical or virtual) servers are used to host the database and the web server. Both of these are reasonable steps to take.
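To make the idea concrete, here's a toy sketch (all names hypothetical, and in-process rather than across servers): the public web tier gets a write-only handle to the address store, while only the fulfilment component - which would run on a separate host in practice - can read addresses back.

```python
# Toy illustration of the separation strategy; not Kitchen Sink's actual code.
from typing import Callable, List


class AddressStore:
    def __init__(self) -> None:
        self._rows: List[str] = []

    def writer(self) -> Callable[[str], None]:
        """Capability for the public web tier: append-only, no reads."""
        def add(address: str) -> None:
            self._rows.append(address)
        return add

    def reader(self) -> Callable[[], List[str]]:
        """Capability for the fulfilment tier, ideally on a separate host."""
        def all_addresses() -> List[str]:
            return list(self._rows)
        return all_addresses


store = AddressStore()
add_address = store.writer()     # handed to the web app
read_addresses = store.reader()  # handed to the fulfilment service only
add_address("Main St 1, 1234 AB Utrecht")
```

In a real deployment the same split would be enforced with separate database roles and network boundaries rather than Python closures, but the shape of the design is the same.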
Hoepman describes abstraction as "zooming out" on the data. We applied this strategy in the first part of the series, when we left out part of each IP address to remove precise information while still allowing us to use it for traffic analysis.
For the address data we're collecting, such an approach doesn't work as we need it to be precise to send the photo. But if we wanted to collect statistics on where we've sent photos, we could use this approach and for example only keep the city or state/province.
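Both variants can be sketched as small helpers. The field names below are assumptions for illustration, loosely following the UPU-style fields mentioned in part 2:

```python
# Sketch of the abstract ("zoom out") strategy; field names are assumptions.

def anonymise_ip(ip: str) -> str:
    """Drop the last octet of an IPv4 address, as in part 1 of the series."""
    parts = ip.split(".")
    parts[-1] = "0"
    return ".".join(parts)


def abstract_for_statistics(address: dict) -> dict:
    """For statistics, keep only the coarse location; the full address
    stays in the fulfilment store until the photo has been sent."""
    return {k: address[k] for k in ("city", "country") if k in address}


anonymise_ip("203.0.113.42")  # → "203.0.113.0"
abstract_for_statistics({"name": "A. User", "street": "Main St 1",
                         "city": "Utrecht", "country": "NL"})
# → {'city': 'Utrecht', 'country': 'NL'}
```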
We've taken some steps to hide the personal data already in Kitchen Sink. Access to the data is restricted, as only the person sending the envelopes is allowed to read data from the database. Furthermore, the address data is not linked to any other personal data (e.g. IP addresses).
Further hiding is still possible: we could choose to encrypt the data in the database, for example, but that seems of limited benefit while increasing the technical complexity of the application.
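For completeness, if we did decide the benefit was worth it, application-level encryption could look roughly like this sketch using the third-party `cryptography` package (an assumption on my part; Kitchen Sink doesn't currently do this). Note that the key would have to live outside the database, which is exactly the operational complexity weighed above.

```python
# Sketch of encrypting an address column at the application level.
# Assumes the third-party `cryptography` package; key management is omitted.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice: kept in a secrets manager, never in code or the DB
fernet = Fernet(key)

ciphertext = fernet.encrypt("Main St 1, 1234 AB Utrecht".encode())  # what gets stored in the DB
plaintext = fernet.decrypt(ciphertext).decode()                     # only when addressing envelopes
```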
With the above design considerations, we can say we've tried to apply privacy by design. Of course, we should present some evidence to illustrate our compliance. A simple link to the above decisions and considerations would be enough, for example if they're summarised in a design document. This can be a formal versioned document, a wiki page, or just a plain text file.
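In the spirit of the "Test ✅" rows above, even this check can be partly automated: a small test (names and document contents below are made up) that fails when the design document no longer covers one of the strategies we committed to considering.

```python
# Sketch of an automated documentation check; the document text is hypothetical.

def missing_strategies(doc_text: str) -> list:
    """Return the privacy design strategies not mentioned in the document."""
    text = doc_text.lower()
    return [s for s in ("minimise", "separate", "abstract", "hide")
            if s not in text]


design_doc = """Privacy design decisions for the address feature:
- Minimise: UPU address fields only; pick-up point alternative considered.
- Separate: addresses are read by a separate fulfilment component.
- Abstract: statistics keep only city and country.
- Hide: read access restricted; column encryption considered and declined.
"""

missing_strategies(design_doc)  # → []
```

Such a test doesn't prove the considerations are any good, but it does catch the document silently drifting out of date.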
This should give the following table:
|Principle|Fulfilment|Check|
|---|---|---|
|Privacy by design|Checklist & documentation|Documentation link|
In the next part of the series, we'll continue our iteration as we still need to address the data owner's rights for our address data.
In the meantime, the two resources above are recommended for further reading:
- Privacy and Data Protection by Design – from policy to engineering by ENISA
- Privacy Design Strategies (The Little Blue Book) by Jaap-Henk Hoepman