<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Cillian</title>
    <description>The latest articles on DEV Community by Cillian (@cillian).</description>
    <link>https://dev.to/cillian</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F726579%2F443af137-9e99-4b9b-a119-b4c46202ec82.jpg</url>
      <title>DEV Community: Cillian</title>
      <link>https://dev.to/cillian</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cillian"/>
    <language>en</language>
    <item>
      <title>Privacy-As-Code: Correcting TikTok’s $92M BIPA violation using Fides Open-Source</title>
      <dc:creator>Cillian</dc:creator>
      <pubDate>Thu, 19 May 2022 17:25:14 +0000</pubDate>
      <link>https://dev.to/cillian/privacy-as-code-correcting-tiktoks-92m-bipa-violation-using-fides-open-source-1j7h</link>
      <guid>https://dev.to/cillian/privacy-as-code-correcting-tiktoks-92m-bipa-violation-using-fides-open-source-1j7h</guid>
      <description>&lt;p&gt;&lt;em&gt;We’re applying open-source devtools to the most high-profile privacy cases in recent years. This time, we build a solution to a landmark case in biometric privacy and purpose specification.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We spend a lot of time at Ethyca talking about the future of privacy. It makes sense; the Fides open-source privacy engineering platform promises a future where true Privacy by Design is achievable for any business, with any type of technical infrastructure. But in seeking to illustrate just how that future could differ from today’s &lt;em&gt;status quo&lt;/em&gt;, it’s useful to look at recent high-profile privacy cases, and show how applying Fides could have led to a different, better outcome for users and businesses.&lt;/p&gt;

&lt;p&gt;For example, if you were TikTok in 2019, Fides could have rooted out a certain type of unlawful data use pre-deployment, before offending code ever handled user data. And by doing so, it could have prevented the privacy violations that led to a $92M class-action settlement under Illinois law.&lt;/p&gt;

&lt;p&gt;Could a few lines of code in a vast product ecosystem really save a company ninety-two million dollars? With all the necessary caveats about culture, business model, and correct configuration, the answer in this case is: “absolutely.”&lt;/p&gt;

&lt;p&gt;In this post I’ll describe the specifics of TikTok’s unlawful data uses per the $92M settlement under the Biometric Information Privacy Act (BIPA) of Illinois. I’ll point out what went wrong on a technical level, and codify guardrails against this behavior by writing a policy in the Fides language.&lt;/p&gt;

&lt;p&gt;When I previously &lt;a href="https://medium.ethyca.com/privacy-as-code-preventing-facebooks-5b-violation-using-fides-open-source-6b1da6508d56"&gt;considered the case of Fides’ utility to Facebook&lt;/a&gt; in an FTC investigation, I noted that a codebase, or a platform like Fides, doesn’t operate in a vacuum. It’s vital to consider the cultural and organizational aspects that contribute to good privacy too. Nevertheless, while flaws in internal processes are often key contributors to privacy infractions, my focus here is on what can be technically achieved to make privacy and respect low-friction and meaningful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TikTok’s violation under BIPA&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When it comes to privacy laws in the United States, BIPA is a heavyweight, in large part because it’s one of the few US privacy laws that gives a private right of action; individuals have the right to sue a company for violations. Beyond its enforcement features, BIPA places tight technical demands on how companies must respect biometric identifiers and biometric information of Illinois residents. The law defines a biometric identifier as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;a retina or iris scan, fingerprint, voiceprint, or scan of hand or face geometry.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And BIPA defines biometric information as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;any information, regardless of how it is captured, converted, stored, or shared, based on an individual’s biometric identifier used to identify an individual.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With these categories of personal information, BIPA contains strict requirements on how companies must collect users’ opt-in consent to process this information. Companies must also respect a suite of other restrictions on biometric data processing, retention, disclosure, and more.&lt;/p&gt;

&lt;p&gt;As the 2019 TikTok lawsuit points out, biometric privacy is particularly high-stakes since the information involved is often immutable. While I can change my password or my home address, I’m not going to be able to change my fingerprint.&lt;/p&gt;

&lt;p&gt;Looking at Section 15(c) of &lt;a href="https://ilga.gov/legislation/ilcs/ilcs3.asp?ActID=3004"&gt;BIPA&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No private entity in possession of a biometric identifier or biometric information may sell, lease, trade, or otherwise profit from a person’s or a customer’s biometric identifier or biometric information.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Per &lt;a href="https://www.jdsupra.com/legalnews/bipa-section-15-c-claims-what-does-it-4976991/"&gt;Blank Rome LLP&lt;/a&gt;, courts have found that “for a claim to exist under Section 15(c), actual &lt;em&gt;biometric data&lt;/em&gt; or the &lt;em&gt;sharing of access&lt;/em&gt; to the underlying biometric data must be &lt;em&gt;transferred&lt;/em&gt; or &lt;em&gt;exchanged&lt;/em&gt; in return for some benefit.”&lt;/p&gt;

&lt;p&gt;The amended &lt;a href="https://s3.documentcloud.org/documents/20492025/amended-complaint-tiktok-consumer-privacy-litigation.pdf#page=113"&gt;complaint&lt;/a&gt; underlying the TikTok settlement alleges the following:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Defendants [being TikTok] are, and at all relevant times were, ‘in possession of’ the Illinois Plaintiffs’ and the Illinois Subclass’s ‘biometric identifiers,’ including but not limited to their face geometry scans, and ‘biometric information.’ Defendants profited from such ‘biometric identifiers’ and ‘biometric information’ by using them for targeted advertising, improvements to Defendants’ artificial intelligence technologies, Defendants’ patent applications, and the generation of increased demand for and use of Defendants’ other products…&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, you might contend that this data use can be viewed as integral to the particular defendant’s business model, rather than an unfortunate misalignment between product and legal stakeholders… and you may well be right! But it’s also very easy to imagine the misalignment scenario. Indeed we know that at &lt;a href="https://www.vice.com/en/article/akvmke/facebook-doesnt-know-what-it-does-with-your-data-or-where-it-goes"&gt;some of the world’s largest companies&lt;/a&gt;, privacy engineers are lamenting that:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“We can’t confidently make controlled policy changes or external commitments such as ‘we will not use X data for Y purpose.’ And yet, this is exactly what regulators expect us to do”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So the example of TikTok and BIPA makes a very suitable candidate to demonstrate Fides’ privacy engineering power. With this context, &lt;strong&gt;I’m going to use Fides to proactively flag any code that could violate Section 15(c). In other words, the CI pipeline will have an automatic check ensuring that any biometric identifier or biometric information handled in code — I’ll hereafter group these as “biometric data” — cannot be used for any of the purposes prohibited above.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examine our policy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As with the Facebook/FTC example I discussed in my previous post, let’s translate the legal requirement into a technical guardrail on the codebase. The Fides policy would be:&lt;/p&gt;
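
&lt;p&gt;&lt;em&gt;Here’s a minimal sketch of what that policy could look like in fideslang YAML (the field values follow the walkthrough below; exact keys and taxonomy values may vary across fideslang versions):&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;policy:
  - fides_key: bipa_biometric_policy
    name: BIPA Biometric Data Policy
    description: Reject any commercialization of customers' biometric data, per BIPA Section 15(c).
    rules:
      - name: Reject Biometric Commercialization
        # The kinds of data the rule watches for
        data_categories:
          matches: ANY
          values:
            - user.derived.identifiable.biometric_health
            - user.provided.identifiable.credentials.biometric_credentials
            - user.provided.identifiable.biometric
        # The purposes of use that count as commercialization
        data_uses:
          matches: ANY
          values:
            - advertising
            - train_ai_system
            - improve
            - third_party_sharing
        # Whose data the rule protects
        data_subjects:
          matches: ANY
          values:
            - customer
        # The degree of identifiability: directly identified data
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
&lt;/code&gt;&lt;/pre&gt;
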
&lt;p&gt;For a walkthrough of each of these fields, I encourage you to see the &lt;a href="https://medium.ethyca.com/privacy-as-code-preventing-facebooks-5b-violation-using-fides-open-source-6b1da6508d56"&gt;first post in this series&lt;/a&gt;. Here, I’ll focus on the new material.&lt;/p&gt;

&lt;p&gt;First, I’ve used the &lt;a href="https://ethyca.github.io/fides/language/taxonomy/explorer/"&gt;Fides taxonomy&lt;/a&gt; to pinpoint which kinds of data this policy applies to. In this case, I am interested in any kind of derived user data as well as any labels that apply to biometric data. That’s where I get the values for data_categories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;user.derived.identifiable.biometric_health&lt;/strong&gt;, &lt;strong&gt;user.provided.identifiable.credentials.biometric_credentials&lt;/strong&gt;, and &lt;strong&gt;user.provided.identifiable.biometric&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Under the &lt;strong&gt;data_uses&lt;/strong&gt;, I want my policy to identify any instances of code processing the data categories I’ve previously specified. Again referencing the Fides taxonomy, I identify any data uses that involve commercialization, which might be in the form of advertising, training an AI system, improving the product, or sharing data with third parties.&lt;/p&gt;

&lt;p&gt;Next, I specify that the biometric data in question is that of a customer, so I specify the &lt;strong&gt;data_subjects&lt;/strong&gt; as such. And finally, I describe the degree of identifiability of this data: it applies to data that directly identifies an individual:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With these pieces together, this policy could be summarized as:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If any form of customers’ biometric data is processed for purposes of advertising, training an AI system, improving a product, or sharing with third parties; then trigger a violation in the automated privacy check.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So for example, if the TikTok product team wants to ship a release that shares biometric data with third parties, at the time of commit, their automated Fides privacy check will fail, and the release will not proceed.&lt;/p&gt;
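
&lt;p&gt;&lt;em&gt;As a sketch of how that check might be wired up (assuming a GitHub Actions pipeline and the fidesctl CLI; the workflow file, manifest directory, and server configuration are illustrative), every pull request could run an evaluation:&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# .github/workflows/fides.yml (illustrative)
name: Fides privacy check
on: [pull_request]

jobs:
  fides-evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Install fidesctl
        run: pip install fidesctl
      - name: Evaluate privacy manifests against policies
        # Fails the build if any declared data use violates a policy rule;
        # assumes a reachable fidesctl server configured via environment variables
        run: fidesctl evaluate fides_manifests/
&lt;/code&gt;&lt;/pre&gt;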

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v0el7inO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2580/1%2A5XWxkHPgsUEY-7wkjHQxTQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v0el7inO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2580/1%2A5XWxkHPgsUEY-7wkjHQxTQ.png" alt="Failure status" width="880" height="86"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This policy, in tandem with up-to-date annotation of the codebase’s privacy behaviors (&lt;a href="https://ethyca.github.io/fides/tutorial/dataset/"&gt;here&lt;/a&gt; is how a dev can do that), becomes an indispensable tool in aligning the tech stack with modern laws like BIPA. There are myriad organizational and governance &lt;a href="https://ethyca.com/privacy-devtools-benefits/"&gt;benefits&lt;/a&gt; to integrating privacy checks into the CI pipeline, and proactively flagging code for non-compliance cuts out the technical debt that makes privacy improvements elusive for so many companies today.&lt;/p&gt;
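
&lt;p&gt;&lt;em&gt;To give a flavor of that annotation, here is a minimal fideslang dataset sketch (the database, collection, and field names are hypothetical; the category keys come from the Fides taxonomy):&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;dataset:
  - fides_key: user_biometrics_db
    name: User Biometrics Database
    description: Stores face geometry scans derived from uploaded videos.
    collections:
      - name: face_scans
        fields:
          - name: user_id
            data_categories:
              - user.derived.identifiable.unique_id
          - name: face_geometry_hash
            # Annotated as biometric data, so the BIPA policy above governs it
            data_categories:
              - user.provided.identifiable.biometric
&lt;/code&gt;&lt;/pre&gt;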

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I’ve been writing for a while now on why we at Ethyca believe &lt;a href="https://stackoverflow.blog/2021/07/19/privacy-is-an-afterthought-in-the-software-lifecycle-that-needs-to-change/"&gt;low-friction devtools are the key to solving technical privacy challenges&lt;/a&gt;. Software developers are like civil engineers, building vital infrastructure that billions of people rely on — for employment, payment, education, entertainment. With this significant power comes a need for transparent, rigorous standards in how personal data is respected.&lt;/p&gt;

&lt;p&gt;Ultimately, users deserve systems that are trustworthy: systems that behave as users expect them to. The common thread of the biggest privacy stories is that companies break their promises around personal data processing. &lt;strong&gt;Even when engineers deeply care about users and seek to respect their data, it can be an uphill battle to keep track of loose ends across complex data infrastructure.&lt;/strong&gt; An incomplete picture of data context and data control can cause even the best-intentioned team to expose users to significant privacy risks. In this post, I’ve aimed to share a specific instance of the Fides devtools equipping teams with the context and control that they need, to deliver privacy that their users deserve.&lt;/p&gt;

&lt;p&gt;Thanks for reading, and stay tuned for my next post, where I’ll cover another major privacy story and demonstrate how Fides can solve the challenge upstream.&lt;/p&gt;

</description>
      <category>privacyascode</category>
      <category>privacyengineering</category>
      <category>opensource</category>
      <category>developertools</category>
    </item>
    <item>
      <title>Privacy-as-Code: Preventing Facebook’s $5B violation using Fides Open-Source</title>
      <dc:creator>Cillian</dc:creator>
      <pubDate>Thu, 27 Jan 2022 17:36:33 +0000</pubDate>
      <link>https://dev.to/cillian/privacy-as-code-preventing-facebooks-5b-violation-using-fides-open-source-3pb6</link>
      <guid>https://dev.to/cillian/privacy-as-code-preventing-facebooks-5b-violation-using-fides-open-source-3pb6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Our team at Ethyca recently launched &lt;a href="https://ethyca.com/fides-announcement/"&gt;Fides&lt;/a&gt;, the first open-source developer tools to make trust, respect, and privacy part of any tech stack. The reception has been fantastic, and we’re actively helping some of the world’s best privacy engineers and engineering teams to roll out Fides across their tech stack. (I can’t wait to share more about our design partners in the near future!)&lt;/p&gt;

&lt;p&gt;Enthusiasm for the idea of open-source Privacy-as-Code is unmistakable. Nevertheless, how to translate that enthusiasm into practical application is a question we sometimes find ourselves fielding.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Yes, it sounds intriguing, but what sort of value could something like Fides be delivering right here, right now, in our business?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today, as the first part of a larger article series, I want to illustrate how a few lines of fideslang can enforce an important set of data guardrails across a large, distributed system. &lt;strong&gt;We’re going to be writing a Fides policy that prohibits an application from sharing data with third parties for purposes other than those specifically agreed to by the user or stated by the organization.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;How can this add value for a business? Well, the absence of this &lt;em&gt;exact&lt;/em&gt; set of guardrails we’ll be coding today was the catalyst for a $5 billion FTC fine in 2019, levied against Facebook for continued collection of user data by third-party app developers without user consent.&lt;/p&gt;

&lt;p&gt;It’s safe to say that this is a situation where privacy engineers using Fides open-source tools could have delivered remarkable value for one of the biggest companies in the world.&lt;/p&gt;

&lt;p&gt;Let’s see how…&lt;/p&gt;

&lt;h2&gt;
  
  
  About these articles
&lt;/h2&gt;

&lt;p&gt;A quick sidebar to present the context for this piece. In a four-part series, I’m going to demonstrate the power of a Privacy-as-Code approach by answering the following questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What does a Privacy-as-Code approach mean practically and how does it benefit agile engineers and governance teams?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How could Fides’ open standard taxonomy for governance prevent some of the world’s biggest privacy failures?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How can Fides, today in its present form, help to assure that a business can be trusted to respect its users and complex local laws with as little friction as possible?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To answer these questions, I’ll examine some of the world’s most talked-about privacy cases of recent times. I’ll distill them into a summation of what went wrong at a technical level and show, by writing real policies in Fides, how these failures could have been prevented with minimal friction for every engineer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Facebook’s violation under FTC decree
&lt;/h2&gt;

&lt;p&gt;One necessary disclaimer as we charge into writing some fideslang policies is to observe that, of course, technology does not operate in a vacuum. As part of their investigation, the FTC did identify organizational process failures throughout Facebook’s handling of privacy program management. That is to say, culture is important, and Facebook has some macro issues related to governance that are beyond the scope of this post. Our focus here is to illustrate technical measures that could have been taken to ensure technology systems behaved &lt;em&gt;in accordance with the organization’s stated policy&lt;/em&gt; at the time.&lt;/p&gt;

&lt;p&gt;With that in mind, I’ve summarized a relevant section of the &lt;a href="https://www.ftc.gov/news-events/press-releases/2019/07/ftc-imposes-5-billion-penalty-sweeping-new-privacy-restrictions"&gt;findings from the FTC&lt;/a&gt; with some color related to what went wrong:&lt;/p&gt;

&lt;p&gt;Taken directly from the FTC:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Facebook announced in April 2014 that it would stop allowing third-party developers to collect data about the friends of app users (‘affected friend data’). Despite this promise, the company separately told developers that they could collect this data until April 2015 if they already had an existing app on the platform. The FTC alleges that Facebook waited until at least June 2018 to stop sharing user information with third-party apps used by their Facebook friends.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  So what happened here?
&lt;/h2&gt;

&lt;p&gt;From the FTC’s expansive investigation, I’d highlight three core data governance issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Facebook was making public promises to users about the degree of control they had over their data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Due to the inevitable sprawl of iterative, fast-growing tech companies, Facebook didn’t have adequate tools to simply track where user data was and what permissions were related to it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As a result, Facebook continued to share data with third-party app developers because they didn’t have the contextual visibility or control layers to prevent that from happening.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s talk about what should be technically enforceable.&lt;/p&gt;

&lt;p&gt;If we, as &lt;a href="https://ethyca.com/privacy-engineering-all-you-need-to-know/"&gt;privacy engineers&lt;/a&gt;, make a promise to our users about how we use their data, irrespective of the scale of our infrastructure, we want to be able to keep that promise. Of course, the reality is that systems get built rapidly and incrementally in response to user demand and new business requirements. If you go from one transactional database to petabyte-scale distributed data infrastructure, you end up with data duplicated across multiple locations, often running asynchronously with separate enforcement tools. &lt;a href="https://stackoverflow.blog/2021/07/19/privacy-is-an-afterthought-in-the-software-lifecycle-that-needs-to-change/"&gt;Existing tech systems don’t have the tools needed to respect users’ data at scale.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At its simplest level, Facebook could not be trusted with this promise because it didn’t have context over all the data flowing across its systems (the categories of data) or what it was being used for (the category of use) and its associated limitations. By limitations here, I mean purposes or uses for which the data was not approved.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Fides’ Privacy-as-Code could have helped
&lt;/h2&gt;

&lt;p&gt;Fides is built to solve for problems like this. In its &lt;a href="https://github.com/ethyca/fides"&gt;current release&lt;/a&gt;, you can already draft a policy in YAML using &lt;a href="https://ethyca.github.io/fides/language/overview/"&gt;fideslang&lt;/a&gt; and enforce that policy to ensure engineers across a team can’t accidentally or intentionally misuse data in a way that deviates from the promises a business or application makes to its users.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(A sidebar: today Fides supports these enforcements in your CI pipeline for your own engineering teams. In the near future, Fides will extend to provide the same enforcement in your runtime environment as queries are executed, as well as against external APIs. This will ensure you can make a promise to your users and trust that it can be kept across distributed systems, both owned and third-party.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The management application for Fides, fidesctl (Fides Control), consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An evaluation and policy management server that can be integrated directly into checks in your CI pipeline for all engineers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A CLI tool that runs locally, allowing engineers to quickly evaluate policies, among a host of other features.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In its simplest form, Fides is a language to describe the &lt;strong&gt;&lt;em&gt;context&lt;/em&gt;&lt;/strong&gt; of how your codebase handles various categories of data and the purposes it uses them for. As you can see from the diagram below, the fidesctl server is used to create and store policies that govern what is permitted by your team, organization, or a given regulation. These policies are automatically checked on commit to provide active &lt;strong&gt;&lt;em&gt;control&lt;/em&gt;&lt;/strong&gt;, ensuring the work you’re doing meets the criteria of the policy; if it doesn’t, fidesctl provides helpful notes so you can make changes before re-committing. In short: it avoids the risk of deploying code that might not comply with the promises you’ve made to your users.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2dFyJgqT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cn9ae8hy06saxhyrlv7y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2dFyJgqT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cn9ae8hy06saxhyrlv7y.png" alt="Overview of Fides integration points in code management and runtime environments" width="880" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’d like to learn more about Fides, check out the repos and documentation at &lt;a href="https://fid.es/ctl"&gt;https://fid.es/ctl&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Examine our policy
&lt;/h2&gt;

&lt;p&gt;Let’s create a simple policy using Fides to ensure that data is only used for purposes specifically agreed to by the user or stated by Facebook. More specifically, we’re going to build a privacy policy that governs the sharing of user data with third parties.&lt;/p&gt;


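
&lt;p&gt;&lt;em&gt;Here’s a minimal sketch of what that policy could look like in fideslang YAML (it matches the field-by-field walkthrough below; exact keys and taxonomy values may vary across fideslang versions):&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;policy:
  - fides_key: data_sharing_policy
    name: Data Sharing Policy
    description: The privacy policy that governs sharing of data with third parties.
    rules:
      - name: Reject Third-Party Sharing
        # Any account or identifiable user data, whether provided or derived
        data_categories:
          matches: ANY
          values:
            - account
            - user.derived.identifiable
            - user.provided.identifiable
        # Any purpose related to third-party sharing
        data_uses:
          matches: ANY
          values:
            - third_party_sharing
        data_subjects:
          matches: ANY
          values:
            - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
&lt;/code&gt;&lt;/pre&gt;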


&lt;p&gt;As you can see in this example, a policy can be extremely detailed and fine-grained, describing specific categories of data or purposes of use. Alternatively it can be wide-sweeping to limit the use of many data types more broadly.&lt;/p&gt;

&lt;p&gt;Let’s walk through the policy:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;fides_key: data_sharing_policy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;fides_key&lt;/strong&gt; is the unique identifier assigned to this policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;name&lt;/strong&gt;: Data Sharing Policy&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;name&lt;/strong&gt; is the human-readable label assigned to the policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;description: The privacy policy that governs sharing of data with third parties.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;description&lt;/strong&gt; is the human-readable description that provides more context on the purpose of the policy.&lt;/p&gt;

&lt;p&gt;Policies may contain multiple rules, grouped within the &lt;strong&gt;rules&lt;/strong&gt; sub-group:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;data_categories&lt;/strong&gt; is the attribute that describes the categories of data governed by the policy, as defined in the &lt;a href="https://fid.es/lang"&gt;Fides taxonomy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;data_uses&lt;/strong&gt; is the attribute that describes the various categories of data processing or purposes for which data may be used in your company.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;data_subjects&lt;/strong&gt; describes the types of individuals the data belongs to, such as customer, employee, or patient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;data_qualifier&lt;/strong&gt; describes the acceptable or non-acceptable level of de-identification permitted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;matches&lt;/strong&gt; is an enumerated list of criteria that describes how you would like the rule to be evaluated. These basic logic gates determine whether the array of privacy attributes will be fully included (&lt;strong&gt;ALL&lt;/strong&gt;), not included at all (&lt;strong&gt;NONE&lt;/strong&gt;), only included if at least one item in the array matches (&lt;strong&gt;ANY&lt;/strong&gt;), or excluded with any additional attributes included (&lt;strong&gt;OTHER&lt;/strong&gt;).&lt;/p&gt;
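
&lt;p&gt;&lt;em&gt;To make those logic gates concrete, here is an illustrative rule fragment (the values are hypothetical):&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# ANY: the rule applies if at least one declared category matches
data_categories:
  matches: ANY
  values:
    - account
    - user.provided.identifiable

# ALL: the rule applies only when the full array is included
data_uses:
  matches: ALL
  values:
    - third_party_sharing
    - advertising
&lt;/code&gt;&lt;/pre&gt;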

&lt;h2&gt;
  
  
  Policy in action
&lt;/h2&gt;

&lt;p&gt;As Fides is intended to be lightweight and human-readable, it quickly becomes clear what this policy’s intended outcome is. However, let’s walk through what it’s doing:&lt;/p&gt;

&lt;p&gt;In essence the rule is a conditional statement that can be read as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Where any of n categories of data are found to be in use for any of n purposes of use for customer data, reject or block this activity.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As you can see, we’ve provided some context around the policy’s purpose by providing it with a name and description. From there, we’ve created just one rejection rule which is intended to disallow (or reject) the use of any account or user identifiable data (whether that is provided or derived) for any purpose related to third_party_sharing. Third-party sharing in the Fides taxonomy represents sharing of data to third-party (external) destinations related to marketing or advertising.&lt;/p&gt;

&lt;p&gt;This policy can be loaded into the fidesctl server and will prevent engineers from merging and deploying code that shares account or user identifiable data of any kind with third parties.&lt;/p&gt;

&lt;p&gt;Let’s quickly look at that close up with the diagram below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xQa30T1i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6siedchfoesajrnj2ozj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xQa30T1i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6siedchfoesajrnj2ozj.png" alt="Streamlined workflow for enforcing governance in CI pipeline" width="880" height="729"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Governance and legal or executive teams can draft policies related to managing sensitive data that are stored securely on the fidesctl server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Product owners, managers, or software engineers can describe the purposes of use of data for the features they’re working on; these are logged directly to the fidesctl server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Engineers can quickly declare the types of data in use in their system as they’re writing new code or building their systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When code is committed, fidesctl is connected to that process to allow it to inspect and evaluate policies before any code is deployed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The fidesctl server evaluates the proposed system changes in code against the organization’s defined policies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If new software changes meet the organization policies they are approved and stored as a metadata record in the project history. If they are rejected, the engineer is notified in real-time as part of their commit so that they can make changes to their work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This evaluation of policies is reported out so that it can be monitored or audited.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;I’ve previously written on the &lt;a href="https://medium.ethyca.com/devtools-for-data-privacy-step-2-imagining-the-benefits-facb3dc2bff5"&gt;benefits of a Privacy-as-Code approach&lt;/a&gt;, and these are now becoming a reality with Fides.&lt;/p&gt;

&lt;p&gt;Fides can be integrated with the existing automated CI pipeline checks already in place to ensure that no engineer can accidentally or intentionally bypass these controls. As a result, you can evaluate every commit or PR and ensure that engineers are simply declaring the privacy or governance characteristics of their code and have that checked into git.&lt;/p&gt;

&lt;p&gt;This CI integration results in multiple benefits that would directly mitigate some of Facebook’s challenges and the demands the FTC places on Facebook within the consent decree, specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The FTC asks for a robust privacy review process&lt;/strong&gt;: Fides makes these checks &lt;em&gt;more&lt;/em&gt; than a manual review process. Instead, they are an automated condition to enforce business rules and policies on every commit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The FTC asks for robust audit trails for privacy&lt;/strong&gt;: Fides creates something analogous to a git activity history for privacy, assuring any team can evidence exactly the decisions they’ve taken over time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Facebook was fined for not preventing data uses it promised its users it would prevent&lt;/strong&gt;: Fides specifically enables you to write conditions that must be met for code to be deployed, preventing you from shipping code that might break your users’ trust.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The benefit here is plain to see. A standard, interoperable language to describe governance policies and a set of tools to ensure these can be enforced and observed throughout both software development and production environments all sum up to this vital capability: &lt;strong&gt;a business can trust that the promises its systems make are kept.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you consider your own data infrastructure and its related data footprint, how confident are you about the types of data you’re handling, what you’re using them for, and what systems they’re flowing into? Most teams feel like they have an abstract mental model for this, but when you examine the details, after a few months or years of creep, this accounting is rarely maintained, and so enforcing a user’s personal rights is complex or impossible. Ask yourself: do you know all of the systems into which your users’ data flows and what it’s being used for? If you don’t, who does?&lt;/p&gt;

&lt;p&gt;The irony is that we obsess over either atomicity or eventual consistency, depending on our database type; in parallel, however, we’ve essentially given up on the idea that we can achieve any level of assured, enforceable consistency for an individual user. Given how complex most systems’ data flows are, it seems we’ve missed out on engineering one of the most important components of data infrastructure.&lt;/p&gt;

&lt;p&gt;An open standard like Fides can directly answer the healthy demand the FTC is placing on engineering teams at Facebook for data context and control, while also preventing the major issues that got them here in the first place. If you’re building something new or continuing to iterate on your existing systems, adding Fides to your tech stack will reduce complexity and accelerate your development pipeline, all while ensuring your application can be better trusted by users.&lt;/p&gt;

&lt;p&gt;In the next installment we’ll showcase additional capabilities of Fides when applied to another one of the biggest privacy cases from the past decade.&lt;/p&gt;

&lt;p&gt;Thanks for reading.&lt;/p&gt;

</description>
      <category>dataprivacy</category>
      <category>devtools</category>
      <category>privacyascode</category>
      <category>privacy</category>
    </item>
    <item>
      <title>Devtools for Data Privacy — Step 3: An Ontology</title>
      <dc:creator>Cillian</dc:creator>
      <pubDate>Fri, 29 Oct 2021 14:15:20 +0000</pubDate>
      <link>https://dev.to/cillian/devtools-for-data-privacy-step-3-an-ontology-ihd</link>
      <guid>https://dev.to/cillian/devtools-for-data-privacy-step-3-an-ontology-ihd</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Continuing on from previous pieces on a &lt;a href="https://dev.to/cillian/devtools-for-data-privacy-step-1-privacy-taxonomy-v10-2fpp"&gt;shared privacy taxonomy&lt;/a&gt; and &lt;a href="https://dev.to/cillian/devtools-for-data-privacy-step-2-imagining-the-benefits-3g3p"&gt;privacy devtools&lt;/a&gt;, in this article I posit that a shared privacy ontology is the final piece of the puzzle; it unlocks a world of privacy engineering power, including these key benefits: eliminating the need for privacy code reviews, ending manual data mapping, evaluating risk quickly in CI, and ultimately exposing privacy practices as interfaces. Here’s how…&lt;/p&gt;

&lt;h2&gt;
  
  
  Taxonomies, Ontologies and Privacy
&lt;/h2&gt;

&lt;p&gt;If you’re following our work at Ethyca you’ll know that over the past three years we’ve been working with design partners to build an ecosystem of developer tools for data privacy.&lt;/p&gt;

&lt;p&gt;In a &lt;a href="https://dev.to/cillian/devtools-for-data-privacy-step-1-privacy-taxonomy-v10-2fpp"&gt;previous post&lt;/a&gt; I wrote about the need for a taxonomy as a basis for a community-agreed way to describe privacy in a tech stack, such that universal tools can be built to solve common problems of privacy. But a taxonomy is just the starting point. It’s the foundation of a comprehensive ontology that can allow any developer to easily describe privacy concepts and behaviors of the code they write and the data they process and store.&lt;/p&gt;

&lt;p&gt;Let’s first clarify the difference — &lt;strong&gt;a taxonomy describes entities using hierarchical relationships. An ontology is more expressive: it can describe a variety of relationships between entities and their role in a complex system. Ontological grammar should easily declare those roles and relationships, such as &lt;em&gt;“is-a”&lt;/em&gt;, &lt;em&gt;“reports-to”&lt;/em&gt;, and so on.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ve spent years of human dev and product time on this problem at Ethyca in order to devise a way to map ontological concepts easily in a lightweight, human-readable syntax. Here, I want to walk you through some ontological concepts and their benefits as they relate to privacy, and what we’re trying to achieve with them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Solving this Problem Matters
&lt;/h2&gt;

&lt;p&gt;If you’ve read my earlier post on why &lt;a href="https://dev.to/cillian/devtools-for-data-privacy-step-2-imagining-the-benefits-3g3p"&gt;building privacy dev tools matters&lt;/a&gt; you already know how important this is. If you’re unfamiliar, I urge you to read that post. To summarize, governments, driven by the general public’s fear of Big Tech, are increasingly regulating software development companies. We’re the new oil, automotive, and financial services industries combined — all of which are regulated. The same is happening to software. As developers, we’re in a position to build world-shaping technology, used by millions of people. That means our job now is more like a civil engineer’s, and with engineering systems for society at large comes tremendous responsibility.&lt;/p&gt;

&lt;p&gt;Unfortunately the tools we use today make it hard to quickly and thoughtfully build systems that respect a user’s right to privacy. We need to re-invent the tools and optimize our dev pipelines to make it frictionless for us to describe and test our privacy strategies in our code. Checking privacy should be as easy as security-related static code analysis. Solving that problem ensures that future software will respect users so the tech industry can continue building systems and successful businesses alongside challenging regulations.&lt;/p&gt;

&lt;p&gt;So developer tools for privacy provide a path to safer, respectful technology and will become a baseline for good development practices in future. If that’s to happen, we need to standardize the tooling used for privacy in development. The first step is a comprehensive ontology.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is an Ontology?
&lt;/h2&gt;

&lt;p&gt;In its simplest form, an ontology is a model that allows you to represent classes (&lt;a href="http://ethyca.github.io/privacy-taxonomy/"&gt;our taxonomy&lt;/a&gt;) and their relationships to each other. In this modelling, we describe the behavior of those entities relative to each other. Typically an ontology is a language of nouns and verbs that is limited enough to make it easy to read and write, while being complete enough to encapsulate all of the concepts proposed.&lt;/p&gt;
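
&lt;p&gt;&lt;em&gt;To make that concrete before we go on, here is a sketch of how a fideslang system declaration relates entities, linking categories of data (nouns) to a purpose of use (a verb) for a given subject (the service and its details here are hypothetical):&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;system:
  - fides_key: recommendation_service
    name: Recommendation Service
    description: Ranks content for each user's feed.
    system_type: Service
    privacy_declarations:
      # Each declaration relates data (nouns) to a purpose of use (verb)
      - name: Personalize the content feed
        data_categories:
          - user.derived.identifiable
        data_use: personalize
        data_subjects:
          - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
&lt;/code&gt;&lt;/pre&gt;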

&lt;p&gt;For this reason, ontological design is subjective, complex and — in our experience at Ethyca — very iterative. We continue to refine the models and tools to support our language every day as we test new scenarios.&lt;/p&gt;

&lt;p&gt;If you’re curious to learn more about ontologies in general, there’s &lt;a href="https://medium.ethyca.com/primer-what-exactly-is-an-ontology-dc3c1405aa61"&gt;a brief primer on ontologies which you can read here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Objective of this Privacy Ontology
&lt;/h2&gt;

&lt;p&gt;The objective of any valuable ontology is to create a shareable and reusable knowledge representation of a particular domain — in this case, privacy. The reason we care about this ontological work at Ethyca is that at present, there are multiple legal frameworks and global regulations, as well as cultural points of view on privacy that compete with each other.&lt;/p&gt;

&lt;p&gt;This makes understanding and applying privacy for devs very challenging. How can we ensure we don’t misuse personal data if we’re struggling to agree on a definition for types of personal data?&lt;/p&gt;

&lt;p&gt;As such, the objective of the ontological work we’ve been doing at Ethyca is not to define an entirely separate ontology but to study the state of the art across various points of view. We then aim to consolidate those thoughts with a single ontology, designed for developers first to make it as easy to apply to a tech stack as possible.&lt;/p&gt;

&lt;p&gt;In summary, our goal is to offer a consolidation of much of the privacy work that’s been done and synthesize it into a grammar that’s designed for developers to apply to their work as easily as possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Immediate Benefits of a Privacy Ontology
&lt;/h2&gt;

&lt;p&gt;If you’re a developer who’s had to contribute in any way to data privacy work within your team, you’ll likely understand the value of a uniform, agreed-upon privacy grammar. The benefits are real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Obviate the questionnaires for privacy reviews&lt;/strong&gt;: For many devs, privacy reviews with your legal team entail questionnaires and meetings with lawyers and privacy specialists. It’s a slow, human-in-the-loop cycle to get approvals. A definition language for privacy allows devs to describe their code in their projects and automatically generate readable reports for privacy specialists. This doesn’t remove humans. It accelerates time to value and allows engineers and privacy experts to get directly to the areas of risk in an application.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No more manual data mapping and annotation&lt;/strong&gt;: If you’ve been asked to annotate a dataset or review a service and provide descriptions for its use, you’ll know how sorely context can be missing from data flows. Instead of having to do this on production systems in runtime environments, a grammar allows devs to describe their datasets and codebase before deployment — no more requests to go and review databases and evaluate schema. Instead, as you ship new features, the privacy metadata managed in git ships with them, augmenting the metadata view of your data flows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quicker approvals in CI&lt;/strong&gt;: If your team is privacy-aware and requires approvals on data processes before you can commit new code, this can delay your progress. But if code is self-described in an understandable format, approvals can be granted automatically against policies set by the organization. Your commits and PRs don’t get held up by privacy reviews. Instead, you keep moving quickly, knowing that you have automated CI checks for privacy against organization policies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Privacy by Design at the core&lt;/strong&gt;: We’ve all heard mention of Privacy by Design (PbD), but achieving this requires a new domain of knowledge for many developers. A well-understood ontology provides that knowledge in a simple-to-implement format assuring that you’re quickly and more naturally folding PbD into your dev processes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automated privacy requests&lt;/strong&gt;: Privacy rights requests by users to retrieve their data (access request) or delete their data (right to be forgotten/erasure request) result in data and engineering teams writing custom scripts on an ever-evolving data stack to manage user data across distributed systems. By describing privacy with a well-defined ontology during the implementation process, all new features and datasets are deployed with a metadata layer that allows you to identify the types of data you’re processing. Now, permanently deleting a user across distributed datasets becomes a scalable, automated task.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Future Benefits of a Privacy Ontology
&lt;/h2&gt;

&lt;p&gt;If we can work together to define a standard ontology for privacy that is freely available and easily adopted as part of existing tools in our dev process, the longer-term benefits for engineers are endless:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prebuilt privacy libraries and modules&lt;/strong&gt;: An agreed standard ontology will give rise to libraries for languages and frameworks, frontend and backend, so that data collection can be tracked and managed centrally by applications. It would be easy to tag data collected in a React application with its appropriate data type, such that you could carefully manage how it’s used by other system processes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Semantic privacy CI checks&lt;/strong&gt;: Instead of needing manual review by legal teams or privacy consultants, policies of what is permitted in development can be written in the ontology and checked against PRs before they’re merged. Privacy checks can be automated like SAST, before code even gets to production and risks doing something wrong with data. This automation ensures that high-risk issues are surfaced to developers and privacy specialists to fix together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Semantic, fine-grained ACL&lt;/strong&gt;: Role-based access controls are inadequate in a future state where individual users have rights over their data. With an adequate privacy grammar it’s possible to enforce individual users’ rights at the db level. Of course, to avoid latency this requires work at the buffer/cache level, but it would permit an application and distributed datasets to ensure individuals’ rights are respected across systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Privacy practices as interfaces&lt;/strong&gt;: Today, deciding whether to work with a new cloud provider or third-party vendor is an evaluation of privacy policies, legal docs and data processing agreements. The privacy behaviors of any system could and should be exposed as part of the interface. Simply implementing the API would allow a developer to understand how the receiving system is going to use each type of data it has access to.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In summary, an open standard for privacy grammar would encourage transparency of system design and behavior, as well as simplify the implementation of privacy for developers. No longer a set of questionnaires to be filled in, privacy becomes a declared set of statements in your project code that validate, report to a log and can be investigated easily.&lt;/p&gt;

&lt;p&gt;Providing every developer with an easy way to make privacy a basic hygiene factor of good coding practices — that’s what a great privacy ontology does. It’s what we’re working towards, and I’m excited to share more soon.&lt;/p&gt;

</description>
      <category>dataprivacy</category>
      <category>privacy</category>
      <category>opensource</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Devtools for Data Privacy — Step 2: Imagining the Benefits</title>
      <dc:creator>Cillian</dc:creator>
      <pubDate>Fri, 22 Oct 2021 17:01:07 +0000</pubDate>
      <link>https://dev.to/cillian/devtools-for-data-privacy-step-2-imagining-the-benefits-3g3p</link>
      <guid>https://dev.to/cillian/devtools-for-data-privacy-step-2-imagining-the-benefits-3g3p</guid>
      <description>&lt;h2&gt;
  
  
  Why Solving This Problem Matters
&lt;/h2&gt;

&lt;p&gt;If you’re in an agile dev team, particularly at a startup where shipping code matters most, it’s easy to think of privacy as a tax that slows down building things in order for legal teams to evaluate risks.&lt;/p&gt;

&lt;p&gt;That doesn’t necessarily mean you don’t care about privacy. It means you have no efficient way to make it a part of your agile process. &lt;strong&gt;I believe it’s impossible to solve the world’s privacy problems without first making it easier for product-builders to do the right, respectful thing regarding user data&lt;/strong&gt;. That’s why building devtools for privacy matters so much.&lt;/p&gt;

&lt;p&gt;We can all see this around us already. Governments, driven by user fears of Big Tech, are increasingly regulating software development companies. Tech is the new oil, automotive, and financial services industries combined — all of which are highly regulated. In our regulated future, software and the engineers that build it will be required to ensure their systems comply with a myriad of complex regulations. These regulations often differ by geography, so if you’re building internet-scale tech that crosses continents, simply building a compliant product becomes a considerable challenge.&lt;/p&gt;

&lt;h2&gt;
  
  
  FTC Consent Decree
&lt;/h2&gt;

&lt;p&gt;If you want to understand the consequence of this concretely, read the &lt;a href="https://www.ftc.gov/news-events/blogs/business-blog/2019/07/ftcs-5-billion-facebook-settlement-record-breaking-history"&gt;FTC’s 2019 Consent Decree against Facebook&lt;/a&gt;. The events that brought about the decree are multifaceted, but here is a brief synopsis: In contradiction of Facebook’s commitment to protecting users’ privacy, third-party developers were able to collect users’ personal information, even of users who had configured their settings to limit sharing to their Facebook friends. In the eyes of the FTC, Facebook failed to adequately assess and address third-party risk (recall that this transpired in the wake of the Cambridge Analytica scandal). Facebook had also asked users for phone numbers, with the stated purpose of security measures, to authenticate users who needed to reset their account passwords. Beyond the stated purpose, though, Facebook used phone numbers to deliver advertisements.&lt;/p&gt;

&lt;p&gt;Aside from fining Facebook $5B — the largest fine ever imposed for a privacy violation, and some members of the FTC thought the penalties should have gone further — the decree laid out for Facebook a set of changes it must adopt to avoid further penalty.&lt;/p&gt;

&lt;p&gt;The FTC outlined for Facebook the following, which trickles down to the entire dev community: If you write code that results in software that might collect, process or in some way munge users’ personal data, you must be able to describe what types of data you’re using, what you’re using them for and evidence of the risk mitigation strategies that were considered in doing so. &lt;strong&gt;&lt;em&gt;This has to happen before you deploy!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A passage below from &lt;a href="https://www.ftc.gov/system/files/documents/cases/c4365facebookmodifyingorder.pdf"&gt;Section VII. E. 2(b), page 10 of the FTC Modifying Order Consent Decree&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;IT IS FURTHER ORDERED that Respondent, in connection with any product, service, or sharing of Covered Information, shall establish and implement … safeguards that control for the material internal and external risks identified…&lt;/em&gt;&lt;br&gt;
 &lt;em&gt;Such safeguards shall include…For each new or modified product, service, or practice that presents a material risk to the privacy, confidentiality, or Integrity of the Covered Information (e.g., a completely new product, service, or practice that has not been previously subject to a privacy review; a material change in the sharing of Covered Information with a Facebook-owned affiliate;)… producing a written report that describes…&lt;/em&gt;&lt;br&gt;
 &lt;em&gt;The type(s) of Covered Information that will be collected, and how that Covered Information will be used, retained, and shared;…[and] existing safeguards that would control for the identified risks to the privacy, confidentiality, and Integrity of the Covered Information and whether any new safeguards would need to be implemented to control for such risks…&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The implications of this directive are striking to consider. The FTC has directed the tech community to ensure that when code is written, we know what data it’s using, for what purpose, and how we’re going to limit those risks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As I’ve &lt;a href="https://stackoverflow.blog/2021/07/19/privacy-is-an-afterthought-in-the-software-lifecycle-that-needs-to-change/"&gt;written before&lt;/a&gt;, it’s analogous to the shift of security over the last ten years from being a post-deployment testing problem to being part of healthy development practices. Now, security is a normal and healthy part of any good dev’s attitude to coding.&lt;/p&gt;

&lt;p&gt;What the FTC and regulators around the world are asking the people who write software to do is this: be accountable and thoughtful about the decisions you make in the code you write. Declare what data you’re using and why, and ensure that it’s not used for the wrong reasons at any time in the future.&lt;/p&gt;

&lt;p&gt;That seemingly simple business requirement demands an entire set of tools that are missing in the average agile development process. In order to ensure your team can continue shipping as quickly — and thoughtfully — as possible, you’re going to need a process and tools to describe privacy and data processing characteristics of your systems &lt;a href="https://medium.ethyca.com/devtools-for-data-privacy-step-1-privacy-taxonomy-v1-0-9e5e52bf42ea"&gt;according to agreed-upon definitions&lt;/a&gt;. With these tools, you can then review your systems, mitigate risks, and report on them if necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Does Privacy Look Like Today?
&lt;/h2&gt;

&lt;p&gt;So what would those tools look like? A good starting point is to consider what your tech stack looks like today, and how it might look in the future. Like any good software architect, plan for scale. Don’t just imagine what it will look like in your first deployment. Rather, ask yourself what your tech stack will look like when your company is successful and has a lot of users.&lt;/p&gt;

&lt;p&gt;A good reference point is this article by A16Z’s Martin Casado, Matt Bornstein and Jennifer Li on &lt;a href="https://a16z.com/2020/10/15/the-emerging-architectures-for-modern-data-infrastructure/"&gt;Emerging Architectures for Modern Data Infrastructure.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The diagram below shows what data infrastructure at scale ends up looking like — we’ve added what traditional privacy management looks like here, driven by the GRC (Governance, Risk, Compliance or Legal) teams.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--L3uWZo04--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qzx4fzw58ter1yvlh8ah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--L3uWZo04--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qzx4fzw58ter1yvlh8ah.png" alt="Traditional Privacy in a Unified Data Infrastructure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, your venerable application has become a behemoth of data sources, ETL, storage and various layers of historical and predictive analytics.&lt;/p&gt;

&lt;p&gt;In this, you’ll see that privacy and governance become roles shared by legal/governance and security but sit outside the architecture of the system with many tasks being managed manually — it is essentially a complex reporting structure rather than an active layer of the system.&lt;/p&gt;

&lt;p&gt;There’s a simplified view of this that helps to parse each tranche of the data infrastructure, including where privacy operations typically sit in the data lifecycle. As the diagram below shows, these happen after deployment, so code is already in production and much of the work revolves around post hoc data identification and risk mitigation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--g2T9htvN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/93yrdg4fsve4jije1ho0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--g2T9htvN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/93yrdg4fsve4jije1ho0.png" alt="Traditional Privacy Operations"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While the largest companies with endless engineering resources are able to conduct manual privacy reviews during the building of each layer, the reality is much different in most cases. In a typical case, privacy happens after you’re deployed to production, after data collection has started and once there’s a realization of a potential risk.&lt;/p&gt;

&lt;p&gt;And resolving it at this point becomes nearly impossible — data has been collected, its source or use is often unclear, and you’re already badly exposed to accidentally using it in a way that might break the law (see Facebook’s painful example above).&lt;/p&gt;

&lt;p&gt;The result is bolt-on tools that sit external to your core architecture. It’s slapping a bandaid over a deep structural issue in a complex system. That bandaid lets you check a box, but your systems remain opaque. You still don’t know where data is, so you have to do the work manually. Governance, compliance and metadata management become a manual burden for both legal and data engineering teams. For most developers, that burden shows up every so often as an unwanted, painful ticket that pulls them away from the work they love most. The industry’s current approach to privacy is not only inefficient; it also undermines any culture of privacy by pitting privacy against innovation. We can and must rework our collective approach to privacy, and that calls for new tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Might Privacy Devtools Look Like?
&lt;/h2&gt;

&lt;p&gt;Imagine an alternative to this painful remediation and refactoring work to mitigate risks after you’ve already deployed. Instead, you could add tools to your CI pipeline that describe the data processing characteristics of your project and warn you of any risks as part of each commit or PR.&lt;/p&gt;

&lt;p&gt;The result would be two powerful concepts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Context&lt;/strong&gt;: because you’re tracking privacy metadata directly in git, you have context right from your source as to what types of data you’re using and for what purpose. Suddenly you have comprehensive oversight of data ingress and egress and its use. No more guessing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Control&lt;/strong&gt;: because that metadata is now attached to every project that is deployed, your entire infrastructure is described. You know what type of data is flowing where and why. Controlling that data now becomes a breeze; after all, the hardest part is enforcement on a system with blind spots.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
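&lt;p&gt;To make these two concepts concrete, here is a minimal, hypothetical sketch of the kind of privacy metadata a project could version in git alongside its code. The file name, structure and field names are illustrative assumptions, not a real standard; the labels come from the open taxonomy linked below.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# privacy_manifest.py -- hypothetical privacy metadata, versioned in git
# next to the project it describes. All names here are illustrative.

PRIVACY_MANIFEST = {
    "system": "recommendations-service",
    "declarations": [
        {
            # What data, for what purpose, about whom, how identifiable:
            "data_categories": ["user.provided.identifiable.contact.email"],
            "data_use": "personalize",
            "data_subjects": ["customer"],
            "data_qualifier": "identified",
        },
        {
            "data_categories": ["user.derived.nonidentifiable"],
            "data_use": "improve",
            "data_subjects": ["customer"],
            "data_qualifier": "aggregated",
        },
    ],
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because this file lives in the repo, every change to it shows up in code review, and your CI pipeline can read it on every commit or PR.&lt;/p&gt;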

&lt;p&gt;Since founding Ethyca three years ago, our incredible team of product, engineering and privacy specialists has been working to understand what this could look like.&lt;/p&gt;

&lt;p&gt;The solution is developer tools built to integrate with existing developer workflows. We believe it comprises the following set of tools, overlaid on the data infrastructure diagram below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nZZJp_6m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y9x8j4s3vbq6vtijwcdp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nZZJp_6m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y9x8j4s3vbq6vtijwcdp.png" alt="A future of Developer Tool Driven Privacy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Description Language for Privacy
&lt;/h2&gt;

&lt;p&gt;An ontology and open standard for describing the privacy characteristics of your code and databases. Rather like Infrastructure as Code tools for DevOps, such as Terraform or Ansible, a Privacy as Code solution would let any developer easily describe and version privacy metadata directly in git, ensuring that each project declares the types of data it uses and the purposes for which it uses them.&lt;/p&gt;

&lt;p&gt;If this is driven by an open, &lt;a href="https://dev.to/cillian/devtools-for-data-privacy-step-1-privacy-taxonomy-v10-2fpp"&gt;easily understood standard ontology and taxonomy&lt;/a&gt;, integrating systems and evaluating risk becomes API-driven and dependable.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI-Integrated Evaluation
&lt;/h2&gt;

&lt;p&gt;Once you have the ability to declare privacy characteristics, it should be possible to define policies that are enforced directly on projects in CI, on each commit or PR. Instead of complex human-in-the-loop privacy reviews with legal teams, your pipeline checks your project against agreed company policies and nudges you if something doesn’t look right before you deploy: like security tooling, but for quickly ensuring your work meets company privacy requirements. Today, this effort usually focuses on dataset metadata, i.e. tagging data at rest. If the code you deploy already describes its context and the types of data it handles, that metadata can sidecar your runtime transactions, meaning you can see which data types flow where and for what purpose. Metadata becomes a living component of your software development process, updated synchronously with each change.&lt;/p&gt;
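&lt;p&gt;As a sketch of what such a CI check could look like, here is a hypothetical script that fails the build when a project’s declarations violate an agreed company policy. The policy format, file names and function names are assumptions for illustration, reading the manifest sketched earlier:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# check_privacy.py -- hypothetical CI step: exit non-zero if the project's
# declared data uses violate an agreed company policy.
import sys

from privacy_manifest import PRIVACY_MANIFEST

FORBIDDEN = {
    # Illustrative company policy: never share identifiable user data
    # with third parties for their personalized advertising.
    ("user.provided.identifiable", "third_party_sharing.personalized_advertising"),
}

def violations(manifest):
    """Yield (category, use) pairs that match a forbidden rule."""
    for decl in manifest["declarations"]:
        for category in decl["data_categories"]:
            for banned_category, banned_use in FORBIDDEN:
                # Hierarchical match: a label violates the rule if it sits
                # at or below the banned category node.
                if decl["data_use"] == banned_use and (
                    category == banned_category
                    or category.startswith(banned_category + ".")
                ):
                    yield category, banned_use

found = list(violations(PRIVACY_MANIFEST))
for category, use in found:
    print(f"POLICY VIOLATION: {category} used for {use}")
sys.exit(1 if found else 0)
&lt;/code&gt;&lt;/pre&gt;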

&lt;h2&gt;
  
  
  Semantic Enforcement
&lt;/h2&gt;

&lt;p&gt;Because you have a consistent definition language for describing data behavior in your application, you can use that same standard to enforce access or comply with regulations, be they today’s regulations or regulations a decade from now. The integrated nature of the tools and the standard definitions of data use in the language ensure you have a foundation on which to enforce new policies in future without refactoring your complex production infrastructure.&lt;/p&gt;
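&lt;p&gt;As a small, hypothetical sketch of what semantic enforcement could mean at runtime: the same category labels can gate which fields a caller sees, based on the purpose it declares. The field map and purpose rules below are invented for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical runtime gate: callers declare a purpose, and only fields
# whose data category is permitted for that purpose are returned.

FIELD_CATEGORIES = {
    "email":      "user.provided.identifiable.contact.email",
    "page_views": "user.derived.nonidentifiable",
}

ALLOWED_PREFIXES = {
    # purpose -&gt; category prefixes that purpose may read (illustrative)
    "provide": ["user.provided.identifiable", "user.derived"],
    "improve": ["user.derived.nonidentifiable"],
}

def redact(record, purpose):
    """Return only the fields whose category is allowed for `purpose`."""
    prefixes = ALLOWED_PREFIXES.get(purpose, [])
    return {
        field: value
        for field, value in record.items()
        if any(FIELD_CATEGORIES[field].startswith(p) for p in prefixes)
    }

# redact({"email": "a@b.com", "page_views": 42}, "improve")
# returns {"page_views": 42}: the email never leaves the boundary.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The point of the standard is that when requirements change, only the policy table changes; the production code that consults it stays put.&lt;/p&gt;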

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;This brings us full-circle to developer tools for privacy. We need to build tools that work in existing pipelines and allow us to quickly describe the behavior of our code and the data in our systems. From there, we can evaluate risks, report on them and make changes.&lt;/p&gt;

&lt;p&gt;Achieving this requires an open standard for privacy behaviors and characteristics as well as risk evaluation tools that are wired directly into git and other CI tools to make privacy a streamlined, low-friction part of your dev process.&lt;/p&gt;

&lt;p&gt;I’m confident and optimistic that this is possible. It’s what we’ve been working on for some time at Ethyca, and we’re excited to share the results of that work with the dev community in the coming months. In the next year, you’ll see Ethyca publicly step forward to offer solutions and ideas that we’d like to share with the entire engineering community. Our aim is to make privacy a simple, natural and easily enforceable part of your development process. I can’t wait to share this with you. This problem matters for the sake of our industry and the health of privacy and data ethics everywhere.&lt;/p&gt;

</description>
      <category>privacy</category>
      <category>tooling</category>
      <category>dataprivacy</category>
      <category>programming</category>
    </item>
    <item>
      <title>Devtools for Data Privacy — Step 1: Privacy Taxonomy V1.0</title>
      <dc:creator>Cillian</dc:creator>
      <pubDate>Fri, 15 Oct 2021 15:38:55 +0000</pubDate>
      <link>https://dev.to/cillian/devtools-for-data-privacy-step-1-privacy-taxonomy-v10-2fpp</link>
      <guid>https://dev.to/cillian/devtools-for-data-privacy-step-1-privacy-taxonomy-v10-2fpp</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx3j3s41h06k7r01d70s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx3j3s41h06k7r01d70s.png" alt="Open Source Privacy Taxonomy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: If you’d like to jump straight to the &lt;a href="https://github.com/ethyca/privacy-taxonomy" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; or &lt;a href="https://ethyca.github.io/privacy-taxonomy/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;, you can find them here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/ethyca/privacy-taxonomy" rel="noopener noreferrer"&gt;Privacy Taxonomy GitHub Repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ethyca.github.io/privacy-taxonomy" rel="noopener noreferrer"&gt;GitHub Pages Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Table of Contents&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Introduction&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Proposed Privacy Taxonomy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Objective of a Privacy Taxonomy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Accessing &amp;amp; Exploring the Taxonomy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Privacy Taxonomy Research&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Early Decisions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Concepts &amp;amp; Conventions of the Taxonomy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Writing Conventions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Taxonomy Structure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Categories&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Uses&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Subjects&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Qualifiers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conclusion&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Back in July, I articulated some of Ethyca’s driving ideas in a Stack Overflow feature: &lt;a href="https://stackoverflow.blog/2021/07/19/privacy-is-an-afterthought-in-the-software-lifecycle-that-needs-to-change/" rel="noopener noreferrer"&gt;Privacy is an afterthought in the software lifecycle. That needs to change&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To solve the problem I described, &lt;strong&gt;I believe the dev community needs to agree upon and build an open source definition language and set of tools for privacy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The purpose of these tools is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Allow anyone working on systems that process sensitive or risky data to consistently describe the types of data they’re handling and what that data is being used for.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create CI rules or policies for how data can be used and enforce those in the CI pipeline to prevent code that might create risks or misuse data from ever reaching production.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide configurable tools so that respecting a user’s rights, such as data access, erasure, portability and retention, can be a feature of any system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create runtime rules or policies for fine-grained, semantic enforcement, thereby ensuring that only the necessary data is shared with systems or people to perform a process.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This yields two key benefits: (1) software systems can more easily comply with the complex data privacy regulations already forcing change on the tech community, and (2) the products we build more naturally respect the rights of users.&lt;/p&gt;

&lt;p&gt;Over the last three years at Ethyca, we’ve been working hard with technical design partners and engineering teams at some of the world’s biggest tech companies to understand the root cause of privacy challenges and build the tools necessary to solve these from the ground up. I’m excited that in the coming months we’ll finally share the culmination of that work with the first public release of our open source privacy tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Proposed Privacy Taxonomy
&lt;/h2&gt;

&lt;p&gt;The foundation of those tools is a consistent understanding of types and uses of data, and so I want to share the first public release of our data taxonomy with you today.&lt;/p&gt;

&lt;p&gt;Its purpose is to create an agreed-upon definition of types and categories of personal data. This is a vital first step in any ontological design as, in order for any privacy standard to be interoperable, we must first achieve a shared definition of what types of data there are.&lt;/p&gt;

&lt;p&gt;I’m eager to share this and excited to get any feedback that might improve this standard for every developer. I believe what follows forms the foundation of any realistic solution to privacy engineering.&lt;/p&gt;

&lt;p&gt;The rest of this post breaks down our current thinking on what will be a freely available and extensible taxonomy: its components, hierarchy and syntax.&lt;/p&gt;

&lt;h2&gt;
  
  
  Objective of a Privacy Taxonomy
&lt;/h2&gt;

&lt;p&gt;As stated briefly above and in my other posts, if the dev community is going to solve privacy, we need to agree on a standard definition language on which to base our understanding of systems, the risk they pose and our ability to codify healthy policies into them — ultimately, an ontology for privacy (more on that in a future post).&lt;/p&gt;

&lt;p&gt;The starting point for that is a definition of entities in a system: a taxonomy, the ability to describe what types of data we are handling and what we’re using them for. If we can make it easy for any dev to do this as part of their implementation process, we can start to bake privacy naturally into healthy software design and delivery processes.&lt;/p&gt;

&lt;p&gt;So a lot rides on getting the dev community to standardize their way of describing privacy data and privacy-related data processes in their systems. The taxonomy that follows is our attempt to publicly start that process, and we want to encourage as many developers as possible to contribute their views so that we can collectively build better technology.&lt;/p&gt;

&lt;h2&gt;
  
  
  Accessing &amp;amp; Exploring the Taxonomy
&lt;/h2&gt;

&lt;p&gt;This post marks the day we’re starting to release years of development work at Ethyca, as we had always intended, for the open source community. With that in mind, feel free to grab the &lt;a href="https://github.com/ethyca/privacy-taxonomy" rel="noopener noreferrer"&gt;taxonomy repo&lt;/a&gt; available below or use the &lt;a href="https://codepen.io/cilliank/embed/OJgexYz?theme-id=39778" rel="noopener noreferrer"&gt;codepen&lt;/a&gt; to explore the structure visually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ethyca/privacy-taxonomy" rel="noopener noreferrer"&gt;https://github.com/ethyca/privacy-taxonomy&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;or codepen here:&lt;/p&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/cilliank/embed/OJgexYz?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Privacy Taxonomy Research
&lt;/h2&gt;

&lt;p&gt;Our goal at Ethyca was to conduct the detailed research necessary to provide the dev community with a first draft taxonomy robust enough to capture a comprehensive view of privacy, yet intuitive enough for any engineer to easily apply.&lt;/p&gt;

&lt;p&gt;To achieve this, we evaluated existing privacy ontologies and their taxonomies, such as &lt;a href="https://www.w3.org/community/dpvcg/wiki/PrOnto:_Privacy_Ontology_for_Legal_Reasoning" rel="noopener noreferrer"&gt;PrOnto&lt;/a&gt; and &lt;a href="https://www.researchgate.net/publication/351034289_COPri_v2_-_A_core_ontology_for_privacy_requirements" rel="noopener noreferrer"&gt;COPri v.2&lt;/a&gt;, as well as international standards like &lt;a href="https://www.iso.org/standard/79573.html" rel="noopener noreferrer"&gt;ISO 19944&lt;/a&gt;, and contrasted them with major data privacy regulations like the &lt;a href="https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/" rel="noopener noreferrer"&gt;GDPR&lt;/a&gt; (the ICO’s website is a helpful read for those curious), &lt;a href="https://ethyca.com/cpra-hub/" rel="noopener noreferrer"&gt;CCPA&lt;/a&gt;, &lt;a href="https://iapp.org/news/a/the-new-brazilian-general-data-protection-law-a-detailed-analysis/" rel="noopener noreferrer"&gt;LGPD&lt;/a&gt; and drafts of the &lt;a href="https://share.getcloudapp.com/4gulq7K2" rel="noopener noreferrer"&gt;Indian PDPB&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Early Decisions
&lt;/h2&gt;

&lt;p&gt;Armed with this analysis and feedback from our technical design partners, we’ve been refining this taxonomy for over a year. We feel that it’s an early but confident first step in capturing everything you will need to describe the privacy behaviors and data types of your tech stack.&lt;/p&gt;

&lt;p&gt;To achieve this, we made some early, opinionated and intentional decisions that I’d love feedback on. The taxonomy you’ll find in the repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Supports all of the data types and concepts necessary to describe a system for the GDPR, CCPA and LGPD;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Supports the standards of ISO 19944;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Is extensible, so you can add categories of data or data processing definitions to suit your business;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Is intended to be semantic, allowing a natural understanding of any label for any user.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Concepts &amp;amp; Conventions of the Taxonomy
&lt;/h2&gt;

&lt;p&gt;Conceptually, the taxonomy is segmented into four groupings, as follows:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Data Categories
&lt;/h2&gt;

&lt;p&gt;Data Categories are a comprehensive hierarchy of labels to represent types of data in your systems. They can be coarse definitions such as “User Provided Data” or fine-grained ones, such as “User Provided Email Address”. We’ll dig into this in a bit more detail shortly.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Data Uses
&lt;/h2&gt;

&lt;p&gt;Data Uses are labels that describe how, or for what purposes, you are using the data. This branch of the taxonomy creates a structure for the most common uses of data in software applications. An example might be the use of data for payment processing or first party personalized advertising. These would both be data uses, ways in which your system uses data.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Data Subject Categories
&lt;/h2&gt;

&lt;p&gt;“Subjects” is a slightly esoteric term, common in the privacy industry, for the user type: the label applied to the provider or owner of the data. E.g., if you have an email address in your system, it might belong to an employee of the company, or to a customer. In this case, employees and customers are both “subjects” of the system. Under various privacy regulations they are afforded rights over how their data is used, and those rights may vary by subject type, so the distinction between a patient in a medical records system and a customer in an e-commerce system is important.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Data Qualifiers
&lt;/h2&gt;

&lt;p&gt;Data Qualifiers provide added context about the degree of identification and, therefore, the potential risk that using the data poses of identifying an individual. A simple way to think about this is a spectrum: on one end is completely anonymous data, i.e. it is impossible to identify an individual from it, and on the other end is data that specifically identifies an individual. Along this spectrum are various labels that denote the degree of identification a given piece of data might provide.&lt;/p&gt;
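&lt;p&gt;Putting the four groupings together: a single declaration about data in a system combines one label from each. As a minimal illustration (the declaration shape here is hypothetical; the labels are from the taxonomy):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# "We use customers' provided email addresses, in identified form,
#  to provide the service."
declaration = {
    "data_category":  "user.provided.identifiable.contact.email",  # what
    "data_use":       "provide",                                   # why
    "data_subject":   "customer",                                  # whose
    "data_qualifier": "identified",                                # how identifiable
}
&lt;/code&gt;&lt;/pre&gt;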

&lt;h2&gt;
  
  
  Writing Conventions
&lt;/h2&gt;

&lt;p&gt;We worked through various potential syntaxes to ensure the taxonomy is as easy as possible to read and write in plain English. Ultimately, dot notation lends itself best to writing statements that are concise and easy to understand. So the branch structure uses dot notation, and you’ll see that some of the end nodes are compound terms written in snake_case for clarity.&lt;/p&gt;

&lt;p&gt;As such, if you were attempting to use the taxonomy of data categories to label, for example, an email address, the resulting hierarchical notation would look like:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;user.provided.identifiable.contact.email&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;As you can see, it’s pretty easy to deduce from this that it’s user-provided data, it identifies them and it is considered contact information — more specifically, an email address.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltbkxrpgc92rjh8y25xy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltbkxrpgc92rjh8y25xy.png" alt="Notation Structure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The design of these hierarchical structures is intended to allow any dev implementing this to type out the classification to a level of specificity that suits their needs. If I take the above example, my team and I might decide that we’re satisfied if we know it’s identifiable data about a user, which would look like:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;user.provided.identifiable&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Or labeling it with slightly more specificity as contact data:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;user.provided.identifiable.contact&lt;/code&gt;&lt;/p&gt;
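&lt;p&gt;A useful property of this design is that a coarse label subsumes every finer label beneath it, which tooling can exploit with simple prefix matching. A small sketch (the helper is hypothetical):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def within(label, ancestor):
    """True if `label` is `ancestor` itself or one of its descendants
    in the dot-notation hierarchy."""
    # The trailing "." matters: without it, "user.provided.identifiable"
    # would wrongly match a hypothetical sibling node like
    # "user.provided.identifiable_extras".
    return label == ancestor or label.startswith(ancestor + ".")

# A rule written against the coarse label matches the finer one...
assert within("user.provided.identifiable.contact.email",
              "user.provided.identifiable")
# ...but not a sibling branch:
assert not within("user.provided.nonidentifiable",
                  "user.provided.identifiable")
&lt;/code&gt;&lt;/pre&gt;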

&lt;h2&gt;
  
  
  Taxonomy Structure
&lt;/h2&gt;

&lt;p&gt;As stated, the objective in releasing the taxonomy now is to discuss, debate and, over time, iteratively improve it so that it can cater to the most common scenarios devs and data teams face in describing systems. Here we’ll delve into the current structure of each grouping and our rationale for its present state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Categories
&lt;/h2&gt;

&lt;p&gt;As you can see from the dot notation example above and the taxonomy visualizer tool in the header, Data Categories are classified into primitive categories with a hierarchy of branches and nodes that allow for degrees of precision when classifying data.&lt;/p&gt;

&lt;p&gt;Here’s a breakdown of that structure:&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Categories — Top Level
&lt;/h3&gt;

&lt;p&gt;There are three top-level categories:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;account&lt;/strong&gt;: Data related to a system account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;system&lt;/strong&gt;: Data unique to, and under the control of, the system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;user&lt;/strong&gt;: Data related to the user of the system, either provided directly or derived based on their usage.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In defining these, we looked at the cross-labeling implications of data types such as an email address, which may be both account data and user data. This is a logical multi-label assignment so that you can manage this data for both purposes, or perhaps create exclusionary rules related to its use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR&lt;/strong&gt;: After considering and testing labeling extensively, we found that three clear primitives let you elegantly construct a series of labels covering the broadest possible range of data types while limiting the number of terms needed to do so.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Categories — Second Level
&lt;/h3&gt;

&lt;p&gt;For each top-level node, there are multiple branches that provide richer context. You’ll see that for the first two, account and system, these are limited, while the user node provides subclasses suitable for detailed personal data management.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;account.contact&lt;/strong&gt;: Contact data related to a system account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;account.payment&lt;/strong&gt;: Payment data related to a system account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;system.authentication&lt;/strong&gt;: Data used to manage access to the system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;system.operations&lt;/strong&gt;: Data used for system operations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;user.derived&lt;/strong&gt;: Data derived from user-provided data or as a result of user actions in the system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;user.provided&lt;/strong&gt;: Data provided or created directly by a user of the system.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most of these are likely self-evident. Of note here are the derived and provided labels, as these respectively describe where data was derived by the system through observation or inference, versus explicitly provided by a user.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Categories — Third Level
&lt;/h3&gt;

&lt;p&gt;As you can see, the hierarchy supports simple labels or, where necessary, very precise and fine-grained annotations. From here, it’s easiest to dive in and play with the classifications yourself. However, we’ll quickly look at level three of the user branch specifically, where you have branches from derived and provided:&lt;/p&gt;

&lt;p&gt;You can see this is split into identifiable and non-identifiable data:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;user.derived.identifiable&lt;/strong&gt;: Derived data that is linked to, or identifies a user.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;user.derived.nonidentifiable&lt;/strong&gt;: Non-identifiable data derived from a user’s actions in the system.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And a similar split applies to provided, as shown below.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;user.provided.identifiable&lt;/strong&gt;: Data provided or created directly by a user that is linked to or identifies a user.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;user.provided.nonidentifiable&lt;/strong&gt;: Data provided or created directly by a user that is not identifiable.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Data Uses
&lt;/h2&gt;

&lt;p&gt;As with Data Categories, for Data Uses we’ve attempted to capture the widest variety of use cases with the shallowest hierarchy we can. In addition, we’ve captured all of the use cases described by ISO 19944 and the GDPR, to ensure that a single taxonomy can describe data uses across data privacy frameworks.&lt;/p&gt;

&lt;p&gt;You’ll see if you use the taxonomy explorer in the header that this currently breaks down as follows:&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Uses — Top Level
&lt;/h3&gt;

&lt;p&gt;At present there are seven top-level nodes in the data use branch of the taxonomy. We think this still needs work and are continuing to optimize it. As they stand today, they are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;provide&lt;/strong&gt;: Provide, in the context of providing a product or service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;improve&lt;/strong&gt;: Improve, similarly relating to the product or service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;personalize&lt;/strong&gt;: Personalization of the product or service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;advertising&lt;/strong&gt;: Marketing, Advertising or Promotion.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;third_party_sharing&lt;/strong&gt;: Sharing data with a third party vendor or processor.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;collect&lt;/strong&gt;: Collect data with no currently specified use (you shouldn’t do this, but it seems necessary to encompass some poor, privacy-unfriendly legacy processes).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;train_ai_system&lt;/strong&gt;: Train an AI System.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Data Uses — Second Level
&lt;/h3&gt;

&lt;p&gt;From here, it’s likely quicker to explore the second level data use categories yourself. However, it’s worth noting that we’ve attempted to capture the most common constructs that create privacy risks. For example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;advertising.third_party.personalized&lt;/strong&gt;: Specifies data received from a third party for the purpose of personalization of advertising to a user or group of users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;third_party_sharing.personalized_advertising&lt;/strong&gt;: Sharing of data collected by the system with a third party for their use in personalized advertising.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These two examples show a really important distinction of use. In the first, your product performs personalized marketing/advertising by receiving and processing data from a third party. In the second, your system shares data with a third party for their use in advertising. Very different uses, with very different privacy implications!&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Uses — Final Word
&lt;/h3&gt;

&lt;p&gt;As a final word on data use categories: I stated at the top of this post that we’ve designed this with extensibility in mind, and data uses are a really effective example of that. Every business or software system is different, and as such you’re likely to have different or industry-specific uses for your system.&lt;/p&gt;

&lt;p&gt;The objective with data uses, therefore, is to create a simple framework that generates a clear hierarchy of classifications, so that you can quickly extend it for your use, whether that’s medical data use or some other sensitive process.&lt;/p&gt;

&lt;p&gt;Finally, if you look at the repository history, you’ll see we’ve been iterating on the structure, from snake_case to dot notation, as well as on the hierarchy of terms. I’m hopeful that we’ll keep doing this, with feedback from devs implementing the taxonomy, to ensure it satisfies real-world use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Subjects
&lt;/h2&gt;

&lt;p&gt;This is likely the easiest group of the taxonomy to understand. At present it’s a flat structure with no hierarchy and represents the various types of users (aka subjects) that may be participants in your system. These could be users, customers, employees, patients, voters, etc.&lt;/p&gt;

&lt;p&gt;You might ask why we’ve done this. The benefit of this specificity is future-proofing. As privacy regulations evolve, we expect that certain groups of users’ data will be managed differently. The ability to assign one or multiple subject types to your data assures that in future you can build policies and enforcement around data for any business or legal requirement. You might decide that today you treat employee and customer data the same way, but you’ll have the flexibility to change retention policies on employee data in the future. As with everything in the taxonomy, you can extend this to support specific business cases. This flexibility means that a thoughtful system need not be fragile, rendered unworkable as soon as compliance requirements evolve. To the contrary: the ontology enables the system to be nimble, a vital quality in a landscape as dynamic as modern data and privacy compliance.&lt;/p&gt;
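&lt;p&gt;As a tiny, hypothetical sketch of that flexibility: a retention rule keyed on subject type can treat employees and customers identically today and diverge tomorrow by editing one table, with no change to the data model. The names below are invented for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical retention policy keyed on data subject type (days).
RETENTION_DAYS = {
    "employee": 365,
    "customer": 365,  # same as employees today; free to diverge later
}

def retention_days(subject_type):
    # Unknown subject types fall back to the strictest (shortest) rule.
    return RETENTION_DAYS.get(subject_type, min(RETENTION_DAYS.values()))
&lt;/code&gt;&lt;/pre&gt;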

&lt;p&gt;Subject categories are explicit in their meaning today, so:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;anonymous_user&lt;/strong&gt;: An individual who is truly unknown/anonymous to the system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;citizen_voter&lt;/strong&gt;: An individual who is a citizen of a nation or state and may be a voter in a state-sponsored voting system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;commuter&lt;/strong&gt;: An individual in transit on any means of transportation where their location may be monitored.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;consultant&lt;/strong&gt;: An individual who is an external service provider to the organization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;customer&lt;/strong&gt;: An individual who has purchased products or services from the organization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;employee&lt;/strong&gt;: An individual who is an employee of the organization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;job_applicant&lt;/strong&gt;: An individual who is in the job application process of an organization, past or present.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;next_of_kin&lt;/strong&gt;: An individual identified as a legal point of contact for another category of individual in the system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;passenger&lt;/strong&gt;: An individual traveling on transportation provided by the organization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;patient&lt;/strong&gt;: An individual identified for the purpose of medical or health procedures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;prospect&lt;/strong&gt;: An individual identified for the purpose of sales and marketing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;shareholder&lt;/strong&gt;: An individual identified as an owner or shareholder of an organization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;supplier_vendor&lt;/strong&gt;: An individual or organization providing goods or services to the organization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;trainee&lt;/strong&gt;: An individual receiving training or tutoring.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;visitor&lt;/strong&gt;: An individual visiting a location of an organization.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Data Qualifiers
&lt;/h2&gt;

&lt;p&gt;Data Qualifiers describe the degree of identification of the given data. Think of this as a spectrum: on one end is completely anonymous data, i.e. it is impossible to identify an individual from it, and on the other end is data that specifically identifies an individual.&lt;/p&gt;

&lt;p&gt;Along this spectrum are labels that describe the degree of identification a given piece of data might provide, such as:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;identified&lt;/strong&gt;: Data that directly identifies an individual.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;pseudonymized&lt;/strong&gt;: Data which has been de-identified (by removing/replacing all identifiers) but which, combined with other data, may re-identify an individual.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;unlinked_pseudonymized&lt;/strong&gt;: Data which has been de-identified (by removing/replacing all identifiers) where linkages have also been replaced/broken such that the individual cannot be re-identified.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;anonymized&lt;/strong&gt;: Data which has been unlinked and for which attributes have been modified to assure confidence that the person cannot be re-identified with this data or in combination with other data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;aggregated&lt;/strong&gt;: Statistical data that does not contain individual data and/or has been combined with sufficient data from multiple persons that no individual is identifiable.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
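&lt;p&gt;Because the qualifiers form a spectrum, they can be ordered, and a policy can then express “no more identifying than X”. A brief illustrative sketch (the ordering helper is hypothetical):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Qualifiers ordered from most to least identifying.
QUALIFIER_ORDER = [
    "identified",
    "pseudonymized",
    "unlinked_pseudonymized",
    "anonymized",
    "aggregated",
]

def at_most(qualifier, ceiling):
    """True if `qualifier` is no more identifying than `ceiling`."""
    return QUALIFIER_ORDER.index(qualifier) &gt;= QUALIFIER_ORDER.index(ceiling)

# A policy allowing at most pseudonymized data:
assert at_most("anonymized", "pseudonymized")      # less identifying: ok
assert not at_most("identified", "pseudonymized")  # more identifying: blocked
&lt;/code&gt;&lt;/pre&gt;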

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this post I’ve proposed a first draft of a privacy taxonomy, one that underpins much of the thinking we do at Ethyca. What we’re releasing today is just a top-down taxonomy. However, it precedes an entire ontology that provides a simple grammar for describing complex data flows and privacy-related behaviors in a software system. This has been at the heart of our work for nearly three years now.&lt;/p&gt;

&lt;p&gt;I’m excited to finally start sharing that work publicly with the community, and I encourage feedback, debate and changes. By having these conversations, we can build a better standard and the tools necessary to make this easy for every dev to implement.&lt;/p&gt;

&lt;p&gt;Over the coming weeks we’ll be releasing more details of our work in this space and the benefits created by these tools. We welcome your feedback, participation and contribution.&lt;/p&gt;

&lt;p&gt;If you’d like to chat about anything here you can get me on Twitter, &lt;a href="https://twitter.com/cillian" rel="noopener noreferrer"&gt;@cillian&lt;/a&gt;, or feel free to comment here.&lt;/p&gt;

</description>
      <category>dataprivacy</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
