Cillian

Posted on Jan 27, 2022 • Edited on May 19, 2022

Privacy-as-Code: Preventing Facebook’s $5B violation using Fides Open-Source

#dataprivacy #devtools #privacyascode #privacy

Introduction

Our team at Ethyca recently launched Fides, the first open-source developer tools to make trust, respect, and privacy part of any tech stack. The reception has been fantastic, and we’re actively helping some of the world’s best privacy engineers and engineering teams to roll out Fides across their tech stack. (I can’t wait to share more about our design partners in the near future!)

Enthusiasm for the idea of open-source Privacy-as-Code is unmistakable. Nevertheless, we’re often asked how to translate that enthusiasm into practical application.

“Yes, it sounds intriguing, but what sort of value could something like Fides be delivering right here, right now, in our business?”

Today, as the first part of a larger article series, I want to illustrate how a few lines of fideslang can enforce an important set of data guardrails across a large, distributed system. We’re going to be writing a Fides policy that prohibits an application from sharing data with third parties for purposes other than those specifically agreed to by the user or stated by the organization.

How can this add value for a business? Well, the absence of the exact guardrails we’ll be coding today was the catalyst for a $5 billion FTC fine in 2019, levied against Facebook because third-party app developers continued to collect user data without user consent.

It’s safe to say that this is a situation where privacy engineers using Fides open-source tools could have delivered remarkable value for one of the biggest companies in the world.

Let’s see how…

About these articles

A quick sidebar to present the context for this piece. In a four-part series, I’m going to demonstrate the power of a Privacy-as-Code approach by answering the following questions:

What does a Privacy-as-Code approach mean practically and how does it benefit agile engineers and governance teams?
How could Fides’ open standard taxonomy for governance prevent some of the world’s biggest privacy failures?
How can Fides, today in its present form, help to assure that a business can be trusted to respect its users and complex local laws with as little friction as possible?

To answer these questions, I’ll examine some of the world’s most talked-about privacy cases of recent times. I’ll distill them into a summation of what went wrong at a technical level and show, by writing real policies in Fides, how these failures could have been prevented with minimal friction for every engineer.

Facebook’s violation under FTC decree

One necessary disclaimer as we charge into writing some fideslang policies is to observe that, of course, technology does not operate in a vacuum. As part of their investigation, the FTC did identify organizational process failures throughout Facebook’s handling of privacy program management. That is to say, culture is important, and Facebook has some macro issues related to governance that are beyond the scope of this post. Our focus here is to illustrate technical measures that could have been taken to ensure technology systems behaved in accordance with the organization’s stated policy at the time.

With that in mind, I’ve summarized a relevant section of the findings from the FTC with some color related to what went wrong:

Taken directly from the FTC:

“Facebook announced in April 2014 that it would stop allowing third-party developers to collect data about the friends of app users (‘affected friend data’). Despite this promise, the company separately told developers that they could collect this data until April 2015 if they already had an existing app on the platform. The FTC alleges that Facebook waited until at least June 2018 to stop sharing user information with third-party apps used by their Facebook friends.”

So what happened here?

From the FTC’s expansive investigation, I’d highlight three core data governance issues:

Facebook was making public promises to users about the degree of control they had over their data.
Due to the inevitable sprawl of iterative, fast-growing tech companies, Facebook didn’t have adequate tools to track where user data was and what permissions were related to it.
As a result, Facebook continued to share data with third-party app developers because they didn’t have the contextual visibility or control layers to prevent that from happening.

Let’s talk about what should be technically enforceable.

If we, as privacy engineers, make a promise to our users about how we use their data, we want to be able to keep that promise irrespective of the scale of our infrastructure. Of course, the reality is that systems get built rapidly and incrementally in response to user demand and new business requirements. By the time you’ve gone from one transactional database to petabyte-scale distributed data infrastructure, you have data duplicated to multiple locations, often processed asynchronously with separate enforcement tools. Existing tech systems don’t have the tools needed to respect users’ data at scale.

At its simplest level, Facebook could not be trusted to keep this promise because they didn’t have context over all the data flowing across their systems (the categories of data) or what it was being used for (the category of use) and its associated limitations. By limitations here, I mean purposes or uses for which the data was not approved.

How Fides’ Privacy-as-Code could have helped

Fides is built to solve problems like this. In its current release, you can already draft a policy in YAML using fideslang and enforce that policy to ensure engineers across a team can’t accidentally or intentionally misuse data in a way that deviates from the promises a business or application makes to its users.

(A sidebar: today, Fides supports this enforcement in your CI pipeline for your own engineering teams. In the near future, Fides will extend to provide the same enforcement in your runtime environment as queries are executed, as well as against external APIs. This will ensure you can make a promise to your users and trust that it can be kept across distributed systems, both owned and third-party.)

The management application for Fides, fidesctl (Fides Control), consists of:

An evaluation and policy management server that can be integrated directly into checks in your CI pipeline for all engineers.
A CLI tool that runs locally to allow engineers to quickly evaluate policies, along with a host of other features.

In its simplest form, Fides is a language to describe the context of how your codebase is handling various categories of data and what purpose it’s using them for. As you can see from the diagram below, the fidesctl server is used to create and store policies that govern what is permitted by your team, organization, or a given regulation. These policies are automatically checked on commit to confirm that the work you’re doing meets the criteria of the policy. If it doesn’t, the check provides helpful notes so you can make changes before re-committing. In short: you avoid the risk of deploying code that might not comply with the promises you’ve made to your users.

If you’d like to learn more about Fides, check out the repos and documentation at https://fid.es/ctl.

Examine our policy

Let’s create a simple policy using Fides to ensure that data is only used for purposes specifically agreed to by the user or stated by Facebook. More specifically, we’re going to build a privacy policy that governs the sharing of user data with third parties.

As you can see in this example, a policy can be extremely detailed and fine-grained, describing specific categories of data or purposes of use. Alternatively, it can be broad, limiting the use of many data types at once.

Let’s walk through the policy:

fides_key: data_sharing_policy

fides_key is the unique identifier for this policy.

name: Data Sharing Policy

name is the human-readable label assigned to the policy.

description: The privacy policy that governs sharing of data with third parties.

description is the human-readable description that provides more context on the purpose of the policy.

Policies may contain multiple rules, grouped within the rules: sub-group. Each rule is described with the following attributes.

data_categories describes the categories of data governed by the rule, as defined in the Fides taxonomy.

data_uses is the attribute that describes the various categories of data processing or purposes for which data may be used in your company.

data_subject describes the individual person types to which the data belongs, such as customer, employee, patient, etc.

data_qualifier describes the acceptable or non-acceptable level of de-identification permitted.

matches is an enumerated value that describes how you would like the rule to be evaluated. These basic logic gates determine whether the rule applies when all of the listed privacy attributes are present (ALL), when none of them are present (NONE), when at least one of them is present (ANY), or when attributes other than those listed are present (OTHER).
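To make the four matches values concrete, here is a minimal Python sketch of one way to read them. It is not fidesctl's evaluator: the names covers and criterion_matches, the dotted-prefix handling of nested taxonomy keys, and the example keys in the usage note are all assumptions made for illustration.

```python
from enum import Enum
from typing import Sequence


class Match(Enum):
    ANY = "ANY"
    ALL = "ALL"
    NONE = "NONE"
    OTHER = "OTHER"


def covers(rule_value: str, declared: str) -> bool:
    """True if `declared` equals `rule_value` or nests beneath it (dotted keys).

    Assumption: a rule on `account` also applies to `account.contact`.
    """
    return declared == rule_value or declared.startswith(rule_value + ".")


def criterion_matches(match: Match, rule_values: Sequence[str],
                      declared: Sequence[str]) -> bool:
    """Decide whether a system's declared attributes trigger one rule criterion."""
    # Declared values that fall under at least one value listed in the rule.
    hits = [d for d in declared if any(covers(r, d) for r in rule_values)]
    if match is Match.ANY:
        return len(hits) > 0
    if match is Match.ALL:
        # Every value listed in the rule must be covered by some declared value.
        return all(any(covers(r, d) for d in declared) for r in rule_values)
    if match is Match.NONE:
        return len(hits) == 0
    # OTHER: something was declared that the rule's list does not cover.
    return len(hits) < len(declared)
```

For example, with rule values of account and user.provided.identifiable, a system declaring user.provided.identifiable.contact.email triggers an ANY criterion but not an ALL criterion, because nothing it declares falls under account.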

Policy in action

As Fides is intended to be lightweight and human-readable, it quickly becomes clear what this policy’s intended outcome is. However, let’s walk through what it’s doing:

In essence the rule is a conditional statement that can be read as:

“Where any of n categories of data are found to be in use for any of n purposes of use for customer data, reject or block this activity.”

As you can see, we’ve provided some context around the policy’s purpose by providing it with a name and description. From there, we’ve created just one rejection rule which is intended to disallow (or reject) the use of any account or user identifiable data (whether that is provided or derived) for any purpose related to third_party_sharing. Third-party sharing in the Fides taxonomy represents sharing of data to third-party (external) destinations related to marketing or advertising.

This policy can be loaded into the fidesctl server and will prevent engineers from merging and deploying code that shares account or user identifiable data of any kind with third parties.
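The conditional statement above can be sketched in a few lines of Python. This illustrates the rule's logic only, not how fidesctl is implemented: the dictionary layout, the exact category keys, the function names, the provide data use, and the two sample system declarations are assumptions.

```python
# Hypothetical in-memory form of the rejection rule described above.
# The category keys are assumed spellings of "account or user identifiable
# data, whether provided or derived".
REJECT_RULE = {
    "data_categories": [
        "account",
        "user.provided.identifiable",
        "user.derived.identifiable",
    ],
    "data_uses": ["third_party_sharing"],
    "data_subject": ["customer"],
}


def any_match(rule_values, declared):
    """ANY semantics: a declared key equals, or nests under, some rule key."""
    return any(
        d == r or d.startswith(r + ".")
        for d in declared
        for r in rule_values
    )


def is_rejected(rule, declaration):
    """The rule fires only when categories, uses, and subjects all match."""
    return all(any_match(rule[field], declaration[field]) for field in rule)


# Two made-up system declarations that handle the same category of data.
friend_export = {
    "data_categories": ["user.provided.identifiable.contact.email"],
    "data_uses": ["third_party_sharing"],
    "data_subject": ["customer"],
}
profile_page = {
    "data_categories": ["user.provided.identifiable.contact.email"],
    "data_uses": ["provide"],
    "data_subject": ["customer"],
}

print(is_rejected(REJECT_RULE, friend_export))  # True
print(is_rejected(REJECT_RULE, profile_page))   # False
```

Both declarations touch the same data, but only the one whose declared use is third_party_sharing is blocked. The rule acts on the combination of data category, use, and subject, not on the data alone.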

Let’s quickly look at that close up with the diagram below:

Governance and legal or executive teams can draft policies related to managing sensitive data that are stored securely on the fidesctl server.
Product owners, managers, or software engineers can describe the purposes of use of data for the features they’re working on; these are logged directly to the fidesctl server.
Engineers can quickly declare the types of data in use in their system as they’re writing new code or building their systems.
When code is committed, fidesctl is connected to that process to allow it to inspect and evaluate policies before any code is deployed.
The fidesctl server combines the organization’s defined policies to evaluate the proposed system changes in code.
If new software changes meet the organization’s policies, they are approved and stored as a metadata record in the project history. If they are rejected, the engineer is notified in real time as part of their commit so that they can make changes to their work.
This evaluation of policies is reported out so that it can be monitored or audited.
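The evaluate-on-commit steps above can be sketched as a small gate function. Everything here is hypothetical (check_commit, the record fields, the stand-in policy check, and the sample declarations). It only illustrates how one check can produce both a pass/fail result for the pipeline and a record for later audit.

```python
from datetime import datetime, timezone


def check_commit(commit_id, declarations, is_violation, audit_log):
    """Evaluate each declaration, record the outcome, return a CI exit code."""
    violations = [d["name"] for d in declarations if is_violation(d)]
    audit_log.append({
        "commit": commit_id,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
        "status": "FAIL" if violations else "PASS",
        "violations": violations,
    })
    # A non-zero exit code is how a CI step conventionally blocks a merge.
    return 1 if violations else 0


def shares_with_third_parties(declaration):
    """Stand-in policy check: flag any declared third_party_sharing use."""
    return "third_party_sharing" in declaration["data_uses"]


audit_log = []
declarations = [
    {"name": "friend_list_export", "data_uses": ["third_party_sharing"]},
    {"name": "profile_page", "data_uses": ["provide"]},
]
exit_code = check_commit("abc123", declarations, shares_with_third_parties, audit_log)
print(exit_code, audit_log[-1]["status"], audit_log[-1]["violations"])
# prints: 1 FAIL ['friend_list_export']
```

Passing the policy check in as a function keeps the gate independent of any particular policy, and appending a record on every run, pass or fail, is what gives the log its audit value.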

Summary

I’ve previously written on the benefits of a Privacy-as-Code approach, and these are now becoming a reality with Fides.

Fides can be integrated with the automated CI pipeline checks you already have in place, so that no engineer can accidentally or intentionally bypass these controls. As a result, every commit or PR is evaluated, and all engineers need to do is declare the privacy or governance characteristics of their code and check those declarations into git.

This CI integration results in multiple benefits that would directly mitigate some of Facebook’s challenges and the demands the FTC places on Facebook within the consent decree, specifically:

The FTC asks for a robust privacy review process: Fides makes these checks more than a manual review process. Instead, they are an automated condition to enforce business rules and policies on every commit.
The FTC asks for robust audit trails for privacy: Fides creates something analogous to a git activity history for privacy, ensuring any team can evidence exactly the decisions they’ve taken over time.
Facebook was fined for not preventing data uses it promised its users it would prevent: Fides specifically enables you to write conditions that must be met for code to be deployed, preventing you from shipping code that might break your users’ trust.

The benefit here is plain to see. A standard, interoperable language to describe governance policies and a set of tools to ensure these can be enforced and observed throughout both software development and production environments all sum up to this vital capability: a business can trust that the promises its systems make are kept.

If you consider your own data infrastructure and its related data footprint, how confident are you of the types of data you’re handling, what you’re using it for, and what systems it’s flowing into? Most teams feel like they have an abstract mental model for this, but when you examine the details, after a few months or years of creep, this accounting is rarely maintained, and so enforcing a user’s personal rights is complex or impossible. Ask yourself: Do you know all of the systems into which your users’ data flows and what it’s being used for? If you don’t, who does?

The irony is that we obsess over either atomicity or eventual consistency, depending on our database type, yet in parallel we’ve essentially given up on the idea that we can achieve any level of assured and enforceable consistency for an individual user. Given how complex most systems’ data flows are, it seems we’ve missed out on engineering one of the most important components of data infrastructure.

An open standard like Fides can directly answer the healthy demand the FTC is placing on engineering teams at Facebook for data context and control, while also preventing the major issues that got them here in the first place. If you’re building something new or continuing to iterate on your existing systems, adding Fides to your tech stack will reduce complexity and accelerate your development pipeline, all while ensuring your application can be better trusted by users.

In the next installment we’ll showcase additional capabilities of Fides when applied to another one of the biggest privacy cases from the past decade.

Thanks for reading.
