Jeroen Heijmans for Rijks ICT Gilde

Posted on Feb 19, 2019

Pragmatic Privacy for Programmers (Part 2)

#privacy #security #agile

In part 1 of this series, we started with a simple approach to handling (GDPR) privacy concerns in an imaginary Kitchen Sink app.

We developed a simple checklist for each feature to see if we were compliant with all privacy requirements. Even for a simple static site, there was a privacy issue which resulted in this tiny compliance table:

Information Related To Person	Person identifiable?	Depersonalized?	Duties fulfilled?	Owner Rights supported?
Website visit	No	Yes (Masking) ✅	NA	NA

In this second part, we'll do a second increment on Kitchen Sink.

Another bit of theory

One of the key elements of privacy (as given in part 1) is that processors* of personal data have certain duties to fulfil. If we actually want to fulfil these duties, it's time to know what they are.

(*) GDPR actually distinguishes between responsibilities of processors and controllers, but we lump these responsibilities together here.

Once again we follow the General Data Protection Regulation (GDPR), which outlines six principles related to the processing of personal data:

Lawfulness, fairness and transparency: have a valid reason for processing personal data, use personal data in fair and expected ways, and communicate clearly and openly about this
Purpose limitation: only use personal data for the stated purpose
Data minimisation: only collect the minimum amount of personal data needed
Accuracy: try to ensure personal data is correct
Storage limitation: keep the personal data only as long as is needed
Integrity & confidentiality: safeguard personal data from unauthorised reading or modification

Being principles, these are not ready-to-go duties, but they do help in understanding and grouping the actual duties as specified by GDPR. They also make a good list to use when working on our app.

Those of you who followed the above link to GDPR may have noticed there's actually a seventh principle: accountability. Since GDPR lists it separately, and it's a big topic, we'll treat it in a separate article.

Increment 2: sending pictures of kitchen sinks

In order to make our static Kitchen Sink app a bit more "exciting", we're going to add a new feature: visitors to the site can order the picture of the Kitchen Sink, and we'll send them a hard copy in the mail. To be able to do so, they only have to provide their address information. Since we don't expect lots of people wanting a kitchen sink photograph, we'll be handling the mailing manually and free of charge.

Clearly, address information relates to a person, and it's certainly possible to identify a person based on an address. So we'll add it to our list below:

Information Related To Person	Person identifiable?	Depersonalized?	Duties fulfilled?	Owner Rights supported?
Website visit	No	Yes (Masking) ✅	NA	NA
Address	Yes	NA	TBD	TBD

Since we have six duties, we'll fill the field above with another (small) table:

Principle	Fulfillment
Lawfulness	TBD
Purpose limitation	TBD
Data minimisation	TBD
Accuracy	TBD
Storage limitation	TBD
Integrity & confidentiality	TBD

Lawfulness, fairness and transparency

The first principle is really more a group of principles. The most important one is lawfulness: do we have a valid reason for processing personal data? In our case, there's a clear functional need: otherwise we cannot mail the photograph! Under GDPR, there are six valid reasons for processing personal data, and one of them is contractual obligation. While our contract is informal, it still applies here.

This means we should be able to show that every address we store is linked to a contract. A simple way to do that - assuming we have a relational database - is to create a one-to-one relationship between the contracts and addresses tables, and enforce it using a foreign key constraint with cascading deletes. This ensures that no address is stored without a corresponding contract, and that if the contract is removed, so is the address. (If we want to use NoSQL or store the address data in the contracts table, we'll have to write some tests to check this ourselves.) Our table now looks like this:

Principle	Fulfillment	Check
Lawfulness	Contractual obligation	Contract (foreign key)
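
The foreign key idea described above can be sketched as follows. This is only an illustration, assuming SQLite (through Python's sqlite3 module) as the database; the contract_id and description columns are made-up names, and the address columns are kept to a single placeholder field.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create contracts and addresses tables linked one-to-one."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE contracts (
            id INTEGER PRIMARY KEY,
            description TEXT NOT NULL          -- hypothetical column
        );
        CREATE TABLE addresses (
            -- NOT NULL: no address without a contract.
            -- UNIQUE: at most one address per contract (one-to-one).
            -- ON DELETE CASCADE: removing the contract removes the address.
            contract_id INTEGER NOT NULL UNIQUE
                REFERENCES contracts(id) ON DELETE CASCADE,
            address TEXT NOT NULL              -- placeholder for the address fields
        );
        """
    )
```

With this schema, inserting an address for a non-existent contract fails with an integrity error, and deleting a contract silently takes its address along. Other databases offer the same constraint, although the exact syntax may differ.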

You might have noticed that we're missing the other two parts of this principle: fairness and transparency. Both are concepts not directly related to duties that we can test for here. Since they are also strongly related to one of the owner's rights (the right to be informed), we will deal with them in an upcoming part of the series.

Purpose limitation

We stated we will use the address to fulfil a contractual obligation: mailing a photo. That means we cannot use it for anything else, unless it's "compatible" with the original purpose. In our case, we don't have any uses outside sending the photo.

While that means we are not legally allowed to, say, use these addresses to subscribe everybody to my new Kitchen Sink Monthly magazine, there's very little physically stopping me from doing so. And while trusting your future self is already risky, the risk grows if more people are involved. The two principles of storage limitation and data minimisation (see below) will help with this, but we can do more.

One measure is that nobody should have access to the addresses, except the people or processes that actually need it. This fits under the broader principle of least privilege. In our case, it means that the Kitchen Sink web app should only have insert access to the address table, because it only has to store new entries, but no select, update or delete privileges.

We can test if these privileges have been set up with a test that reads the database privileges and checks that only the mail-sender user has read access to the addresses table.
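
Such a privilege test could look roughly like this. It is a sketch under a few assumptions: the grants have already been read from the database as (grantee, table name, privilege type) rows, for example from information_schema.table_privileges in PostgreSQL, and the account names webapp and mail_sender are hypothetical. In a real setup the table owner or an admin account would have to be added to the allowed list or filtered out.

```python
from typing import Iterable, List, Tuple

# One row per grant: (grantee, table_name, privilege_type).
Grant = Tuple[str, str, str]

# Hypothetical account names; these are the only grants we expect
# to find on the addresses table.
ALLOWED_GRANTS = {
    ("webapp", "INSERT"),       # the web app only stores new entries
    ("mail_sender", "SELECT"),  # the person mailing photos only reads
}


def unexpected_address_grants(grants: Iterable[Grant]) -> List[Grant]:
    """Return every grant on the addresses table that is not allowed."""
    return [
        (grantee, table, privilege)
        for grantee, table, privilege in grants
        if table == "addresses" and (grantee, privilege) not in ALLOWED_GRANTS
    ]
```

The test then simply asserts that unexpected_address_grants returns an empty list, so any grant that sneaks in later makes the build fail.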

Principle	Fulfillment	Check
Lawfulness	Contractual obligation	Contract (foreign key)
Purpose limitation	Access limitation	Test ✅

We could do more here. For example, we can enable audit logging. This will not prevent addresses from being used for other purposes, but can help detect it if it happens. Setting up audit logging in a secure and useful way can be quite a hassle, however, and for our simple application this is a large investment to cover a tiny risk. GDPR allows such considerations when discussing measures to protect data, and it is valid to not take measures provided the risk is reasonable.

Data minimisation

We should only collect personal data that is "adequate, relevant and limited to what is necessary". That means we cannot collect information we don't need for our purpose of sending a letter, such as a date of birth. But it also means we don't have to get overly minimalistic. In the Netherlands, where I live, in practice sending a letter will in many cases work with just a postal code and house number. A name, street and town are relevant to the purpose, and are in fact recommended by the Universal Postal Union's addressing guidelines for the Netherlands.

Sticking to Dutch addresses only, we need to add four fields to our addresses database table: name, streetAndNumber, postalCode and town. In order to prevent unverified personal data fields from being added to the database table, we can add a simple test that checks the column names of the addresses table, and fails if it encounters a non-whitelisted column.
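
A minimal version of that column test might look like the sketch below. It assumes SQLite, where PRAGMA table_info lists the columns of a table; other databases expose the same information through information_schema.columns. The contract_id column is a hypothetical name for the link to the contract.

```python
import sqlite3

# The four address fields plus a (hypothetical) link to the contract.
ALLOWED_COLUMNS = {"contract_id", "name", "streetAndNumber", "postalCode", "town"}


def unapproved_columns(conn: sqlite3.Connection) -> set:
    """Return the columns of the addresses table that are not whitelisted."""
    # PRAGMA table_info yields one row per column; index 1 holds its name.
    rows = conn.execute("PRAGMA table_info(addresses)").fetchall()
    return {row[1] for row in rows} - ALLOWED_COLUMNS
```

The test passes as long as the returned set is empty. Adding, say, a dateOfBirth column then requires a deliberate change to the whitelist, which is a good moment to ask whether we really need that data.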

Principle	Fulfillment	Check
Lawfulness	Contractual obligation	Contract (foreign key)
Purpose limitation	Access limitation	Test ✅
Data minimisation	UPU suggested address fields	Test ✅

Accuracy

When storing personal data, we are responsible for making sure personal data is accurate, up-to-date and rectified as soon as possible. Since we only keep the addresses until we've jotted them on an envelope (see storage limitation), an update or rectification process is not necessary for us. We still have to take "reasonable" steps to ensure the data we record is accurate, though.

Since the impact of this data being wrong is very limited, we can get away with some minimal steps here. An obvious one would be to do some user input validation. In our case, we could require that all fields are non-empty, and that postal codes have the correct format. A more advanced approach could be a look-up of the postal code to verify (or auto-fill) that it matches the street and place name. But since Kitchen Sink isn't bringing in any money, and these are paid services, we'll leave out that check.

In addition to unit tests for the validation code, we can test this by looking at the actual data in the database and performing the same checks (non-empty, valid postal code). However, that has the downside of requiring our checks to read the data, which is not ideal. It would be better to add these as a contract to the database model. SQL databases can do this by using CHECK constraints.
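
As an illustration, the input validation could be as small as the function below. It is a sketch, not a complete validator: the pattern assumes the usual Dutch format of four digits (not starting with 0) followed by two letters, with an optional space, and ignores finer rules about which letter combinations are actually in use.

```python
import re

REQUIRED_FIELDS = ("name", "streetAndNumber", "postalCode", "town")

# Simplified Dutch postal code pattern: "1234 AB" or "1234AB".
POSTAL_CODE = re.compile(r"[1-9][0-9]{3} ?[A-Z]{2}")


def validation_errors(address: dict) -> list:
    """Return a list of problems found in the submitted address."""
    errors = []
    for field in REQUIRED_FIELDS:
        # Missing, None and whitespace-only values all count as empty.
        if not str(address.get(field) or "").strip():
            errors.append(f"{field} must not be empty")
    postal_code = str(address.get("postalCode") or "").strip().upper()
    if postal_code and not POSTAL_CODE.fullmatch(postal_code):
        errors.append("postalCode must look like '1234 AB'")
    return errors
```

An empty list means the address can be stored; anything else goes back to the visitor as feedback on the order form.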
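
To give an impression of such column checks, here is a sketch that again assumes SQLite. The postal code rule uses SQLite's GLOB operator and a simplified Dutch format (four digits, a space, two capital letters); other databases would typically use a regular expression operator instead.

```python
import sqlite3


def create_addresses_table(conn: sqlite3.Connection) -> None:
    """Create an addresses table that rejects empty or malformed values."""
    conn.execute(
        """
        CREATE TABLE addresses (
            name TEXT NOT NULL CHECK (length(trim(name)) > 0),
            streetAndNumber TEXT NOT NULL
                CHECK (length(trim(streetAndNumber)) > 0),
            -- Simplified Dutch format, e.g. '1234 AB'.
            postalCode TEXT NOT NULL
                CHECK (postalCode GLOB '[1-9][0-9][0-9][0-9] [A-Z][A-Z]'),
            town TEXT NOT NULL CHECK (length(trim(town)) > 0)
        )
        """
    )
```

Because the database itself now refuses invalid rows, we no longer need a check that reads the stored addresses; verifying that the constraints exist in the schema is enough.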

Principle	Fulfillment	Check
Lawfulness	Contractual obligation	Contract (foreign key)
Purpose limitation	Access limitation	Test ✅
Data minimisation	UPU suggested address fields	Test ✅
Accuracy	User input validation	Contract (column checks) ✅

Storage limitation

If we no longer need personal data for the reason we acquired it, we must get rid of it. In our simplistic case, we no longer need the address after we've written it on the envelope and put it in the mailbox. (If we were a bit more sophisticated, we might want to keep it a bit longer to handle track-and-trace, return deliveries, etc.) We could choose to manually delete every address record after that has happened, but manual actions are error-prone and easily forgotten - in which case we'd be stuck with personal data we're not allowed to retain.

An automated cleanup process is more robust. For example, if we record when addresses are retrieved by the person sending the mail, we can run a daily job that deletes all addresses received before that point in time. We can run a test checking if that went well, counting the number of "old" records. (Note this has the same downside as mentioned above for accuracy. We can work around it here by creating a view on the addresses table with only the old records and give access to that.)
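
The cleanup job and its check could be sketched like this. The assumptions: SQLite as the database, a hypothetical received_at column holding ISO 8601 timestamps (which sort correctly as plain text), and the time of the last retrieval being passed in by whatever records it.

```python
import sqlite3


def delete_mailed_addresses(conn: sqlite3.Connection, last_retrieval: str) -> int:
    """Delete addresses received before the last retrieval.

    Returns the number of deleted rows.
    """
    cursor = conn.execute(
        "DELETE FROM addresses WHERE received_at < ?", (last_retrieval,)
    )
    conn.commit()
    return cursor.rowcount


def count_old_addresses(conn: sqlite3.Connection, last_retrieval: str) -> int:
    """Count addresses that should already have been deleted."""
    row = conn.execute(
        "SELECT COUNT(*) FROM addresses WHERE received_at < ?", (last_retrieval,)
    ).fetchone()
    return row[0]
```

The daily job calls delete_mailed_addresses, and the test asserts that count_old_addresses returns 0 afterwards. Note that the count only needs the timestamp, not the address fields themselves.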

A second issue we should address is backups. It's a good practice to set these up, but they will of course also contain personal data. Since we'd like to avoid having to remove individual addresses from a backup file, we need a different approach. It's reasonable that we keep a backup around for a limited number of days - let's say 30 days - and remove it afterwards. This is a fairly standard setup called "rolling backups". We can verify this by checking that the age of the oldest backup is at most 30 days.
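
The backup age check itself is tiny. In the sketch below, how the creation times are obtained is left open (file modification times, or the listing of a backup tool); the function only assumes it receives a mapping from a backup name to its creation time.

```python
from datetime import datetime, timedelta
from typing import Dict, List

MAX_BACKUP_AGE = timedelta(days=30)


def expired_backups(created_at: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the names of backups that are older than 30 days."""
    cutoff = now - MAX_BACKUP_AGE
    # A backup that is exactly 30 days old is still allowed.
    return sorted(name for name, created in created_at.items() if created < cutoff)
```

The test asserts that the returned list is empty. If it is not, the rollover has stopped working and we are holding on to personal data longer than we promised.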

Principle	Fulfillment	Check
Lawfulness	Contractual obligation	Contract (foreign key)
Purpose limitation	Access limitation	Test ✅
Data minimisation	UPU suggested address fields	Test ✅
Accuracy	User input validation	Contract (column checks) ✅
Storage limitation	Auto removal	Test ✅
Storage limitation	Backup rollover	Test ✅

Integrity & confidentiality

This is by far the broadest and most difficult principle to ensure. Integrity and confidentiality are two of the three traditional key concepts of information security, known as the "CIA triad". Properly securing applications is way beyond the scope of this post, but fortunately many others have written extensively about this (see the reading suggestions below). Moreover, verifying that applications are properly secured is not trivial either. Just look at OWASP's Application Security Verification Standard (ASVS); there are nearly 200 checks you might perform for a web application. For several of these, automation is either difficult, expensive, or both. I will list a few measures we can take in our Kitchen Sink app, though, to give an impression of what we can do here.

Let's start with HTTPS. While slowly becoming the default, there are still many sites that don't protect traffic between browser and web server. There's really no excuse not to use it anymore, with Let's Encrypt offering free certificates, and web servers like Caddy having it enabled by default. We can even verify HTTPS is set up correctly with an automated test using one of the free HTTPS configuration checkers out there. I like the one by SSL Labs, but there are others out there.

A second security issue that's getting increasingly common is vulnerabilities in third-party components. For a simple web app we likely already depend on a web application framework, a programming language, a web server, a database, an operating system and several libraries to glue these together. All of these, whether open source or proprietary, may contain vulnerabilities, and these are regularly discovered and exploited. Checking all your components regularly and patching, upgrading or moving away from vulnerable ones is the best practice here. While you can make that into a manual process, (partial) support is available. Source control giant GitHub has support to check vulnerable libraries for some languages, as do tools like Snyk or Dependabot.

Earlier, we already mentioned the principle of least privilege, ensuring users have only the minimum privilege required. But to use these user accounts securely, we should secure them properly. This includes protecting them with a password or with SSH keys, but also limiting the IP addresses they can connect from (ideally, the database isn't even reachable from the entire internet). If you use a password or SSH keys, you should have strength requirements for those, ideally enforced by the database (e.g. in MySQL there's the password validation component). Unfortunately, I'm not aware of any (easy) way in which we can verify that this is set up correctly.

Principle	Fulfillment	Check
Lawfulness	Contractual obligation	Contract (foreign key)
Purpose limitation	Access limitation	Test ✅
Data minimisation	UPU suggested address fields	Test ✅
Accuracy	User input validation	Contract (column checks) ✅
Storage limitation	Auto removal	Test ✅
Storage limitation	Backup rollover	Test ✅
Integrity & confidentiality	HTTPS	Test ✅
Integrity & confidentiality	Secure components	Test ✅
Integrity & confidentiality	Strong authentication	—

This last gap in verification is typical for security. While there are (expensive) tools that cover a lot of security issues, ultimately manual verification is usually in order. Be sure to hire a professional for key applications, but don't be afraid to do it yourself either.

This marks the end of this post, but not of this iteration. Our checklist still has the "Owner Rights supported?" column marked as to be done. In the next part, we'll address that gap.

In the meantime, here's further reading on some of the above subjects:

The Open Web Application Security Project (OWASP) has tons of information about web application security. Its top 10 is no doubt the most famous, but it's just a tiny part of what they do.
The book Software Security - Building Security In by security veteran Gary McGraw is over a decade old but, except for a few parts that focus on code level issues, is still relevant today.
Australian Troy Hunt writes and talks a lot about security, and has many excellent introductory articles or training videos.

This post was previously published at Rijks ICT Gilde (in Dutch)

Photo by Jason Blackeye on Unsplash