DEV Community

Rizèl Scarlett for TBD

The Illusion of Data Ownership

Introduction

The rise of Mastodon and Bluesky as decentralized Twitter alternatives highlights the need for data ownership. But the slow adoption of a decentralized web reveals a gap in our collective comprehension. For as long as the internet has existed, internet users have rarely owned their data, so it's hard to envision a web where data sovereignty is the norm.

I'm a quintessential baby millennial – born in '95. I was born on the cusp of Gen Z, but I don't identify with Gen Z because I'm not hip to their lingo, dances, or fashion sense. I started surfing the web during the early 2000s. I grew up using PBS Kids, Everything Girl, The Doll Palace, Club Penguin, MySpace, and Tumblr. Today, I use platforms like Twitter, GitHub, and Instagram. Each login, each acceptance of terms and conditions, was an implicit agreement to share fragments of my identity. As a result, corporations capitalize on trading and harnessing my data for profit. I mindlessly accepted it because I didn't know any other way of using the internet.

The idea that users can have full data sovereignty seemed like a utopian fantasy. Listening to company leaders share the organization's vision during a company offsite transformed my interpretation of data ownership.

Why I'm reflecting on this

I am someone who likes to reflect on the philosophy behind a technology, so I can confidently endorse it. While I believe all technology can be used for good and evil, I like to determine its current impact. If it leans towards the negative, I'm driven to influence people to use it in a way that positively impacts society. Take generative AI, for instance. While it poses the risk of job losses for some creatives, I dedicated two years to championing its use as an educational tool—one that offers a sense of psychological safety to learners. Similarly, I want to think holistically about decentralized web technologies such as Web5.

In this blog post, we will discuss the meaning of data ownership on the web and how it impacts society.

The Illusion of Ownership

Many internet users operate under the assumption that our online data is ours. The reality is we merely possess our data. There’s a huge difference between holding our data (possession) and owning our data (property).

  • Possession: This is about having control or physical custody of data. Here’s an example of what that looks like on social media platforms:

    • Access and View: You can log in, view your posts, and interact with content.
    • Modify: Edit captions, comments, or profile details.
    • Interact: Engage with content through likes, comments, and shares.
    • Upload: Add new content.
    • Delete (with limitations): Remove posts or deactivate accounts, but the platform might still retain or use your data.
  • Property: This is about having inherent rights to own, control, and manage data. Truly owning your data on a social media platform looks like:

    • Complete Deletion: You'd have the right to permanently erase all traces of your data from the platform's servers, with no backups or archives retained.
    • Data Portability: You could seamlessly transfer all your data, including posts, comments, and likes, to another platform without any loss or format change.
    • Monetization Control: You'd have the authority to decide if and how your data is used for advertising or other revenue-generating purposes.
    • Data Access Control: You could dictate who, including the platform itself, can access or view your data.
    • No Unilateral Changes: The platform couldn't change terms of service or data policies without your explicit consent.

We are more possessors than owners.

The Tyranny of Social Media

Recent changes on platforms like Twitter/X underscore my point.

Here are examples:

  • Usernames taken - When X rebranded from Twitter, they claimed the username '@X', despite it already being in use by another individual.
  • Vanishing features - Recently, X announced they are removing Twitter Circles. Twitter Circles allows users to select a subgroup of followers to receive particular posts. Many people use this for private sharing, but that option will no longer exist. And while X promises to keep the Circle posts private, there have been instances in the past where bugs made Circle posts publicly viewable.
  • Lost content - Integrated newsletters like Revue were suddenly removed, leading to loss of content and subscribers.

Twitter/X is not the only culprit. Google has a history of discontinuing products including Google Podcasts. See: Killed by Google.

Exploitation of Disenfranchised People

“If you know whence you came, there are absolutely no limitations to where you can go.” - James Baldwin

I don’t know my ancestral history, but I want to. All I know is I was born in Antigua and my parents and grandparents were born in Guyana. I want to take an ancestry test, but there are data privacy risks. The powers that be have exploited disenfranchised people enough. I want to shield our history from potential data breaches and commercial interests. I don’t want to offer more of our narrative to those who might exploit it.

AI Thrives On Our Data

I am a huge fan of generative AI because it’s so powerful. However, I recognize that it’s only that powerful because it was trained on our data.

ChatGPT

ChatGPT is an integral part of my daily routine. It helps me brainstorm ideas and refactor code. I'm not sure how I could survive or how I ever survived without it. But there's a catch -- ChatGPT is super helpful because it was trained on public data, including data from users like us. This means that any confidential information we share could become part of its training data. There's a risk that if you tell ChatGPT sensitive information about you or your company, someone else can potentially prompt ChatGPT for that data, and get ahold of it. One of many examples is the case with Samsung where employees inadvertently shared proprietary code and internal business strategies with ChatGPT.

The Art Community

Many artists are upset with the rise of generative AI art. They suspect the tools were trained on their work because they recognize their own styles in AI-generated pieces.

Whether these are actual problems or ethical gray areas, one thing is clear: wouldn’t it be better if we had a say in how our data is used and who uses it?

The Original Intent of the Internet

These are some of the reasons why data ownership is important to me. Even Tim Berners-Lee, the inventor of the World Wide Web, is disappointed in how we leveraged data on the Internet.

“I think the public has been concerned about privacy--the fact that these platforms have a huge amount of data, and they abuse it. But I think what they're missing sometimes is the lack of empowerment. You need to get back to a situation where you have autonomy, you have control of all your data.” - Tim Berners-Lee

Web5

Web5 is a platform (currently under development) that puts users in control of their data and identity. It doesn’t aim to replace current technologies, but to enhance them.

How Web5 Enables Data Sovereignty

Here’s how Web5 puts users in control of their data:

Decentralized Identifiers

Identity on traditional systems often looks like username and password pairings.

In the Web5 ecosystem, every person has a Decentralized Identifier (DID), represented as an alphanumeric string. DIDs are:

  • a W3C open standard
  • based on cryptographic principles
  • not tied to one web application or system

Because of these factors, DIDs enable users to securely authenticate to any web app within the Web5 ecosystem.
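
To make the "alphanumeric string" a bit more concrete, here is a minimal sketch of how a DID breaks into parts. It relies on the general shape from the W3C standard, did:<method>:<method-specific-id>. The parseDid helper is hypothetical (it is not part of any Web5 SDK), the check is deliberately simplified, and the example DID uses the placeholder "example" method.

```javascript
// Hypothetical helper for illustration only; not part of any Web5 SDK.
// Simplified view of the W3C DID syntax: did:<method>:<method-specific-id>
function parseDid(did) {
  // The method name is limited to lowercase letters and digits.
  const match = /^did:([a-z0-9]+):(.+)$/.exec(did);
  if (!match) {
    return null; // not a DID according to this simplified check
  }
  return { method: match[1], id: match[2] };
}

console.log(parseDid("did:example:123456789abcdefghi"));
console.log(parseDid("alice@example.com")); // null: a username or email is not a DID
```

Notice that nothing in the string names a particular website or company, which is why the same DID can travel with you from app to app.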

Decentralized Web Nodes

Your DID gives you access to a Decentralized Web Node (DWN), which is a personal data store. You can think of a DWN like your personal Dropbox. However, centralized platforms like Dropbox can change terms of service, access your data, or even shut down services, leaving you without access. Instead, a DWN provides a personal space where your data is stored and you decide who gets access.
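
Here is a toy, in-memory sketch of the "you decide who gets access" idea. To be clear, this is not the DWN API: the PersonalDataStore class, its methods, and the example DIDs are all hypothetical, and a real DWN involves signed messages, syncing, and much more. It only illustrates the ownership model.

```javascript
// Toy model of a personal data store. NOT the real DWN API;
// every name here is hypothetical and exists only for illustration.
class PersonalDataStore {
  constructor(ownerDid) {
    this.ownerDid = ownerDid;
    this.records = new Map(); // recordId -> { data, readers: Set of DIDs }
  }

  // Only the owner may write records into their own store.
  write(actorDid, recordId, data) {
    if (actorDid !== this.ownerDid) throw new Error("only the owner can write");
    this.records.set(recordId, { data, readers: new Set() });
  }

  // The owner decides which other DIDs may read a record.
  grantRead(actorDid, recordId, readerDid) {
    if (actorDid !== this.ownerDid) throw new Error("only the owner can grant access");
    const record = this.records.get(recordId);
    if (!record) throw new Error("unknown record");
    record.readers.add(readerDid);
  }

  // Returns the data for the owner or a granted reader, otherwise undefined.
  read(actorDid, recordId) {
    const record = this.records.get(recordId);
    if (!record) return undefined;
    const allowed = actorDid === this.ownerDid || record.readers.has(actorDid);
    return allowed ? record.data : undefined;
  }
}

const store = new PersonalDataStore("did:example:alice");
store.write("did:example:alice", "playlist", { songs: ["Song A"] });
store.grantRead("did:example:alice", "playlist", "did:example:musicapp");

console.log(store.read("did:example:musicapp", "playlist")); // the playlist data
console.log(store.read("did:example:adnetwork", "playlist")); // undefined
```

The point of the sketch is the direction of control: the app asks the store for data, and the owner's rules decide the answer, rather than the other way around.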

Protocols

Protocols are responsible for structuring your data and establishing rules for data access and interaction within a DWN.

In other words, you can control who has access to your DWN and who interacts with it via a protocol. Here's an abridged example of a protocol you can write for a user's interactions on social media applications:

const socialMediaProtocolDefinition = {
    protocol: "https://sovereignsocialmedia.org/protocol",
    published: true,
    types: {
      personalInfo: {
        schema: "https://schema.org/Person",
        dataFormats: ["application/json"],
      },
      preferences: {
        schema: "https://schema.org/UserPreferences",
        dataFormats: ["application/json"],
      },
      posts: {
        schema: "https://schema.org/BlogPosting",
        dataFormats: ["application/json"],
      },
      comments: {
        schema: "https://schema.org/Comment",
        dataFormats: ["application/json"],
      },
      photos: {
        schema: "https://schema.org/ImageObject",
        dataFormats: ["image/jpeg", "image/png"],
      },
      videos: {
        schema: "https://schema.org/VideoObject",
        dataFormats: ["video/mp4"],
      },
    },
    structure: {
      personalInfo: {
        $actions: [
          { who: "author", can: "write" },
          { who: "author", can: "read" },
        ],
      },
      preferences: {
        $actions: [
          { who: "author", can: "write" },
          { who: "author", can: "read" },
        ],
      },
      posts: {
        $actions: [
          { who: "author", can: "write" },
          { who: "anyone", can: "read" },
        ],
      },
      comments: {
        $actions: [
          { who: "author", can: "write" },
          { who: "anyone", can: "read" },
        ],
      },
      photos: {
        $actions: [
          { who: "author", can: "write" },
          { who: "anyone", can: "read" },
        ],
      },
      videos: {
        $actions: [
          { who: "author", can: "write" },
          { who: "anyone", can: "read" },
        ],
      },
    }
};

Here's a breakdown of the permissions (who has access to this data) in this protocol:

  • Personal Info Permissions:

    • Write: Only the user (author) can write or update their personal information.
    • Read: Only the user (author) can view their personal information.
  • Preferences Permissions:

    • Write: Only the user (author) can set or change their preferences.
    • Read: Only the user (author) can view their preferences.
  • Posts Permissions:

    • Write: Only the user (author) can create or update their posts.
    • Read: Both the user (author) and the public can view the posts.
  • Comments Permissions:

    • Write: Only the user (author) can create or update their comments on posts.
    • Read: Both the user (author) and the public can view the comments.
  • Photos and Videos Permissions:

    • Write: Only the user (author) can upload or update their photos and videos.
    • Read: Both the user (author) and the public can view the photos and videos.

With this protocol, posts, comments, photos, and videos are readable by anyone, but only the user can create or change them, and personal information and preferences stay visible to the user alone.
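
To see how rules like these could be evaluated, here is a minimal sketch of a permission check over the $actions lists. The isAllowed helper is hypothetical and is not how a DWN enforces protocols internally. It uses a trimmed copy of the structure from the protocol above, and it assumes "author" simply means the user who owns the data, as in the breakdown.

```javascript
// Trimmed copy of the protocol's structure; only two record types kept for brevity.
const structure = {
  personalInfo: {
    $actions: [
      { who: "author", can: "write" },
      { who: "author", can: "read" },
    ],
  },
  posts: {
    $actions: [
      { who: "author", can: "write" },
      { who: "anyone", can: "read" },
    ],
  },
};

// Hypothetical helper: may `actor` ("author" or "anyone") perform `action`
// on records of `typeName`? Unknown types have no rules, so the answer is false.
function isAllowed(typeName, actor, action) {
  const rules = structure[typeName]?.$actions ?? [];
  return rules.some(
    (rule) => rule.can === action && (rule.who === "anyone" || rule.who === actor)
  );
}

console.log(isAllowed("posts", "anyone", "read"));        // true
console.log(isAllowed("posts", "anyone", "write"));       // false
console.log(isAllowed("personalInfo", "anyone", "read")); // false
console.log(isAllowed("personalInfo", "author", "read")); // true
```

The default here is "deny": if no rule grants an action, it is not allowed. That mirrors the spirit of the protocol, where access exists only because the data owner spelled it out.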

Data ownership isn't just a technical decision or a fun concept for developers. It is about creating a more equitable online ecosystem.

What are your thoughts on data ownership and Web5?

Curious about Web5?

Top comments (1)

Mike Stemle

I really need to catch up with your web5 stuff. What’s the best place to start if I only have ten minutes a day to read?