Devtools for Data Privacy — Step 3: An Ontology

#dataprivacy #privacy #opensource #devtools

Introduction

Continuing on from previous pieces on a shared privacy taxonomy and privacy devtools, In this article, I posit that a shared privacy ontology is the final piece to the puzzle; it unlocks a world of privacy engineering power, including the following key benefits: eliminating the need for privacy code reviews, no more manual data mapping, evaluate risk quickly in CI, and ultimately, privacy practices as interfaces. Here’s how…

Taxonomies, Ontologies and Privacy

If you’re following our work at Ethyca you’ll know that over the past three years we’ve been working with design partners to build an ecosystem of developer tools for data privacy.

In a previous post I wrote about the need for a taxonomy as a basis for a community-agreed way to describe privacy in a tech stack, such that universal tools can be built to solve common problems of privacy. But a taxonomy is just the starting point. It’s the foundation of a comprehensive ontology that can allow any developer to easily describe privacy concepts and behaviors of the code they write and the data they process and store.

Let’s first clarify the difference — a taxonomy describes entities using hierarchical relationships. An ontology is more expressive: it can describe a variety of relationships between entities and their role in a complex system. Ontological grammar should easily declare those roles and relationships, such as “is-a”, *“reports-to” *and so on.

We’ve spent years of human dev and product time on this problem at Ethyca in order to devise a way to map ontological concepts easily in a lightweight, human-readable syntax. Here, I want to walk you through some ontological concepts and their benefits as it relates to privacy and what we’re trying to achieve with them.

Why Solving this Problem Matters

If you’ve read my earlier post on why building privacy dev tools matters you already know how important this is. If you’re unfamiliar, I urge you to read that post. To summarize, governments, driven by the general public’s fear of Big Tech, are increasingly regulating software development companies. We’re the new oil, automotive, and financial services industries combined — all of which are regulated. The same is happening to software. As developers we’re in a position to build world-shaping technology, used by millions of people. That means our job now is more like a civil engineer, and with engineering systems for society at large comes tremendous responsibility.

Unfortunately the tools we use today make it hard to quickly and thoughtfully build systems that respect a user’s right to privacy. We need to re-invent the tools and optimize our dev pipelines to make it frictionless for us to describe and test our privacy strategies in our code. Checking privacy should be as easy as security-related static code analysis. Solving that problem ensures that future software will respect users so the tech industry can continue building systems and successful businesses alongside challenging regulations.

So developer tools for privacy provide a path to safer, respectful technology and will become a baseline for good development practices in future. If that’s to happen, we need to standardize the tooling used for privacy in development. The first step is a comprehensive ontology.

What is an Ontology?

In its simplest form, an ontology is a model that allows you to represent classes (our taxonomy) and their relationships to each other. In this modelling, we describe the behavior of those entities relative to each other. Typically an ontology is a language of nouns and verbs that is limited enough to make it easy to read and write, while being complete enough to encapsulate all of the concepts proposed.

For this reason, ontological design is subjective, complex and — in our experience at Ethyca — very iterative. We continue to refine the models and tools to support our language every day as we test new scenarios.

If you’re curious to learn more about ontologies in general, there’s a brief primer on ontologies which you can read here.

Objective of this Privacy Ontology

The objective of any valuable ontology is to create a shareable and reusable knowledge representation of a particular domain — in this case, privacy. The reason we care about this ontological work at Ethyca is that at present, there are multiple legal frameworks and global regulations, as well as cultural points of view on privacy that compete with each other.

This makes understanding and applying privacy for devs very challenging. How can we ensure we don’t misuse personal data if we’re struggling to agree on a definition for types of personal data?

As such, the objective of the ontological work we’ve been doing at Ethyca is not to define an entirely separate ontology but to study the state of the art across various points of view. We then aim to consolidate those thoughts with a single ontology, designed for developers first to make it as easy to apply to a tech stack as possible.

In summary, our goal is to offer a consolidation of much of the privacy work that’s been done and synthesize those into a grammar that’s designed for developers to apply to their work as easily as possible.

Immediate Benefits of a Privacy Ontology

If you’re a developer who’s had to contribute in any part in data privacy work within your team, you’ll likely understand the benefits of a uniform and agreed-upon privacy grammar, and the benefits are real:

Obviate the questionnaires for privacy reviews: For many devs, privacy reviews with your legal team entail questionnaires and meetings with lawyers and privacy specialists. It’s a slow, human-in-the-loop cycle to get approvals. A definition language for privacy allows devs to describe their code in their projects and automatically parse readable reports for privacy specialists. This doesn’t remove humans. It accelerates time to value and allows engineers and privacy experts to get directly to the areas of risk in an application.
No more manual data mapping and annotation: If you’ve been asked to annotate a dataset or review a service and provide descriptions for its use, you’ll know how sorely missing context can be for data flows. Instead of having to do this on production systems in runtime environments, a grammar allows devs to describe their datasets and codebase before deployment — no more requests to go and review databases and evaluate schema. Instead, as you ship new features, the privacy metadata managed in git ships with them, augmenting the metadata view of your data flows.
Quicker in CI approvals: If your team is privacy-aware and requires approvals on data processes before you can commit new code, this can delay your progress. But if code is self-described in an understandable format, approvals can be approved against policies set by the organization. Your commits and PRs don’t get held up by privacy reviews. Instead, you keep moving quickly, knowing that you have automated CI checks for privacy against organization policies.
Privacy by Design at the core: We’ve all heard mention of Privacy by Design (PbD), but achieving this requires a new domain of knowledge for many developers. A well-understood ontology provides that knowledge in a simple-to-implement format assuring that you’re quickly and more naturally folding PbD into your dev processes.
Automated privacy requests: Privacy rights requests by users to retrieve their data (access request) or delete their data (right to be forgotten/erasure request) result in data and engineering teams writing custom scripts on an ever-evolving data stack to manage user data across distributed systems. By describing privacy with a well-defined ontology during the implementation process, all new features and datasets are deployed with a metadata layer that allows you to identify the types of data you’re processing. Now, permanently deleting a user across distributed datasets becomes a scalable, automated task.

Future Benefits of a Privacy Ontology

If we can work together to define a standard ontology for privacy that is freely available and easily adopted as part of existing tools in our dev process, the longer-term benefits for engineers are endless:

Prebuilt privacy libraries and modules: An agreed standard ontology will give rise to libraries for languages and frameworks, frontend and backend, so that data collection can be tracked and managed centrally by applications. It would be easy to tag data collected in a react application with its appropriate data type, such that you could carefully manage how it’s used by other system processes.
Semantic privacy CI checks: Instead of needing manual review by legal teams or privacy consultants, policies of what is permitted in development can be written in the ontology and checked against PRs before they’re merged. Privacy checks can be automated like SAST, before code even gets to production and risks doing something wrong with data. This automation ensures that high-risk issues are surfaced to developers and privacy specialists to fix together.
Semantic, fine-grained ACL: Roles-based access controls are inadequate in a future state where individual users have rights over their data. With an adequate privacy grammar it’s possible to enforce individual users’ rights at the db level. Of course, to avoid latency this requires work at the buffer/cache level, but it would permit an application and distributed datasets to ensure individuals rights are respected across systems.
Privacy practises as interfaces: Today, deciding whether to work with a new cloud provider or third party vendor is an evaluation of privacy policies, legal docs and data processing agreements. The privacy behaviors of any system could and should be exposed as part of the interface. Simply implementing the API would allow a developer to understand how the receiving system is going to use each type of data it has access to.

Summary

In summary, an open standard for privacy grammar would encourage transparency of system design and behavior, as well as simplifying the implementation of privacy for developers. No longer a set of questionnaires to be filled in, privacy becomes a declared set of statements in your project code that validate, report to a log and can be investigated easily.

Providing every developer with an easy way to make privacy a basic hygiene factor of good coding practises — that’s what a great privacy ontology does. It’s what we’re working towards, and I’m excited to share more soon.