Donald Cruver

Posted on • Originally published at hullabalooing.cruver.ai

My Data, My Stack

I want to introduce a concept I'll be coming back to throughout this blog: Personal Data Sovereignty, or PDS. It's the idea that individuals can and should have meaningful control over where their data lives, who can access it, and what happens to it. Not as a legal right, but as something you actually build.

If you've been reading this blog for a while, some of what follows will look familiar. I've written about the network setup, the GPU server, the second brain, the AI inference stack. This post isn't about new technical ground, it's more philosophical. PDS is the concept I've been building toward without naming it directly, and I wanted to put it in one place before the series goes any further.

Personal Data Sovereignty

PDS isn't a legal framework, or a political position. It's a simple idea: that individuals can and arguably should have meaningful control over where their data lives, how it's processed, and who can access it.

The challenge is that "data sovereignty" usually stops at the level of rights and policy. You have the right to your data, but a right you can only exercise by asking nicely isn't sovereignty. If your data lives on someone else's servers, runs through someone else's software, and is governed by someone else's terms, then their decisions are your reality. A privacy policy doesn't change that. At best it creates a new chore: find time to read it, parse the legal language, figure out what it actually permits, and trust that they'll follow it. Most people don't, which is a rational response to being handed a 5-page document before they can use a service. That's not sovereignty, it's asking permission, and then doing homework about it.

There's a common observation about the internet: if you aren't paying for the product, you are the product. It's a useful heuristic for understanding ad-supported services, search engines, social media, free email. Your attention and your data are what's being sold.

But the real principle isn't about who's paying, it's about who's in control. If the infrastructure belongs to someone else, then the rules belong to someone else. Here are some examples of what I mean:

  • Amazon Ring had a webpage where law enforcement could fill out a form, claim a life-threatening emergency, and access your footage without your consent, a court order, or a warrant. Ring customers paid for their hardware. They paid a subscription for cloud storage. They were, by any normal definition, paying for the product. It didn't matter. The footage still flowed to law enforcement on request. Amazon has since updated their policy to require warrants in most cases, but an emergency exception remains, and the infrastructure that made warrantless access possible in the first place hasn't changed.

  • Flock Safety runs automated license plate reader cameras mounted on street lights, utility poles, HOA entrances, apartment complexes, and private businesses across the country. Their network aggregates vehicle movement data into a shared system where any subscribed party gets alerts when a tracked vehicle is spotted. Law enforcement, cities, and private organizations are all customers. You don't opt into it. You don't know which intersections, driveways, or parking lots have cameras. Your movements, when you leave, when you come home, where you go, are being catalogued and made available to whoever has a subscription.

  • The surveillance angle isn't the only one. In April 2025, Google announced it was ending support for first and second generation Nest Learning Thermostats, effective October 25, 2025. The thermostats hadn't broken. The hardware hadn't changed. Google had simply decided they were done with them, and because the devices depended on Google's cloud to function, that decision was theirs to make. Backlash was significant enough that Google offered compensation toward a replacement, but the end of support went ahead as planned. There is now a class action arbitration being organized against them. The device was in your home. The kill switch belonged to someone else.

The Nest situation isn't unique; any cloud-dependent device carries the same risk. But for most categories of home automation, there are alternatives. Home Assistant runs locally and supports thousands of devices without any vendor cloud. Thermostats like the ecobee or Z-Wave units can be controlled entirely on your own network. Cameras can run Frigate, which does all its processing on your hardware. There's usually a community-maintained solution, and where there isn't, there's often a commercial option that can be isolated on a separate VLAN and prevented from phoning home. The ecosystem isn't perfect, but it's far more capable than it was five years ago.
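To make "runs locally" concrete, here's a minimal sketch of what a Home Assistant automation looks like. The entity ID and temperature are placeholders I've invented for illustration, not anything from my setup; the point is that the whole thing evaluates on your own hardware, with no vendor cloud in the loop:

```yaml
# Hypothetical Home Assistant automation -- entity_id is a placeholder.
# Trigger, condition, and action all evaluate locally.
automation:
  - alias: "Night setback"
    trigger:
      - platform: time
        at: "22:30:00"
    action:
      - service: climate.set_temperature
        target:
          entity_id: climate.hallway_thermostat
        data:
          temperature: 18
```

If Google sunsets another thermostat tomorrow, a rule like this keeps working, because nothing in it depends on anyone's servers but yours.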

These are examples, not the whole story. Your search history, your email, your files, your health data, all of it is subject to the same dynamic. Every service you don't control is a service that can change its terms, respond to a subpoena, get acquired, or decide that your data is useful for something you didn't anticipate.

What actually gets you to PDS is owning the infrastructure. Not necessarily the same infrastructure I own, but infrastructure you control.

A Note on Isolation

Complete isolation isn't the goal, and it wouldn't be very useful if it were. The internet is built on connecting to other systems. Email crosses networks by design. Search requires access to an index that would be expensive and difficult to maintain yourself.

What matters is how those connections happen. SearXNG still reaches out to Google, but the query is anonymized. Google sees a request, not a person. Proton handles email that travels across the open internet, but end-to-end encryption means the content stays private. I'm still in the middle of moving my email there from Gmail, which is exactly how these transitions work: gradually, service by service.
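For a sense of what the SearXNG side of that looks like, here's a sketch of a settings.yml fragment. The values are illustrative, not my actual config; the engine names are real SearXNG engines. The instance forwards your query to these upstreams itself, so the upstream sees the instance, not you:

```yaml
# Illustrative SearXNG settings.yml fragment -- values are examples,
# not the author's configuration. SearXNG proxies the query, so
# upstream engines never see the client's identity.
search:
  safe_search: 0
engines:
  - name: google
    disabled: false
  - name: duckduckgo
    disabled: false
```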

The goal isn't a sealed box, it's external connections that respect your privacy rather than exploit it.

What I Actually Run

Before getting into the deep dives, I want to give a plain description of what I'm actually running and why each layer is there.

Network: An OPNsense firewall running on a fanless mini-PC, with VLANs for network segmentation and internal DNS so every service gets a proper hostname instead of an IP address. I covered this in more detail in a separate post.
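OPNsense's resolver is Unbound, and internal hostnames are just host overrides in it. A sketch, with a zone name and addresses I've made up for illustration:

```
# Unbound host-override syntax (the mechanism OPNsense uses for
# internal DNS). Zone and addresses are placeholders, not the
# author's network.
local-zone: "home.lan." static
local-data: "proxmox.home.lan. IN A 192.168.10.10"
local-data: "nas.home.lan. IN A 192.168.10.20"
```

Once this resolves internally, every service gets a stable name, and nothing about your internal layout leaks to an external resolver.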

Compute: A Proxmox hypervisor running LXC containers and VMs for most services, and a separate workstation with a pair of AMD MI60 GPUs for running local AI workloads. The MI60s have 32GB of HBM2 memory each, which is enough to run larger language models comfortably.
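Spinning up a service on Proxmox is usually one `pct create` away. This is an illustrative command, not a recipe from my setup: the VMID, template version, and bridge name are assumptions you'd adjust for your own host:

```
# Illustrative only -- VMID, template, and bridge are assumptions.
# Creates and starts an unprivileged Debian container.
pct create 110 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
  --hostname syncthing \
  --memory 1024 --cores 2 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1
pct start 110
```

Lightweight containers like this are why one hypervisor can comfortably carry a dozen services.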

Storage: ZFS across the storage nodes, which gives me snapshots, replication, and data integrity checksums. I've had drives fail without losing data, which is the point.
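The snapshot-and-replicate workflow that makes ZFS worth it is short enough to show. Pool, dataset, and host names here are placeholders, not my actual layout:

```
# Illustrative ZFS workflow -- pool/dataset/host names are placeholders.
# Take a snapshot, then replicate it to a second machine.
zfs snapshot tank/data@2025-06-01
zfs send tank/data@2025-06-01 | ssh backup-host zfs receive -F backup/data

# Later snapshots replicate incrementally: only the changed blocks move.
zfs snapshot tank/data@2025-06-02
zfs send -i tank/data@2025-06-01 tank/data@2025-06-02 | \
  ssh backup-host zfs receive backup/data
```

Snapshots are what make "a drive failed and I lost nothing" a routine event instead of a story.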

AI: I run my own LLM inference server using vLLM, and a local AI-powered search tool called Perplexica. Perplexica uses my own models and runs entirely on my hardware, so queries don't leave my network. I've also been running Nabu, an AI assistant built on top of this stack, which is a topic for its own post.
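As a rough sketch of the inference side: vLLM exposes an OpenAI-compatible HTTP API, so local tools can point at it instead of a hosted provider. The model name and flags below are assumptions for illustration (and on MI60s this runs via the ROCm build of vLLM), not my exact invocation:

```
# Hypothetical launch -- model name and flags are assumptions.
# Serves an OpenAI-compatible API across two GPUs.
vllm serve Qwen/Qwen2.5-14B-Instruct \
  --tensor-parallel-size 2 \
  --host 0.0.0.0 --port 8000
```

Anything that speaks the OpenAI API can then talk to http://your-host:8000/v1 on your own network, and the query never leaves the building.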

Services: Home Assistant for home automation, Frigate for local camera recording, Syncthing for file sync across devices, n8n for workflow automation, and an org-roam knowledge base that I use as a second brain. These are things I use every day, and I'd be reaching for them even if PDS wasn't a consideration.

The Route I Chose

I want to be clear that my hardware choices aren't the point.

MI60 GPUs are not the only way to run local AI. Proxmox is not the only hypervisor. OPNsense is not the only firewall. I landed on these things through a combination of research, opportunity, and the particular shape of my needs. Someone else pursuing PDS might do it on a Raspberry Pi cluster, or a single NUC, or a secondhand server from eBay. The principle scales. The hardware doesn't have to match.

What I'm documenting here is my Sovereign Stack. The decisions that led to it, the trade-offs I made, and what it actually looks like to maintain it. Take what's useful. Ignore what isn't.

What It Costs

The main cost is time. Most of this runs without much attention, but things break occasionally and it's on me to fix them. There's a real learning curve early on that takes a while to get through, and it never fully disappears.

The other cost is hardware. I've spent real money building this out over several years, a mix of new and used equipment. The argument that self-hosting pays for itself in avoided subscriptions is true over the (very) long run, but it requires upfront investment that not everyone can make.

What You Get Back

I know where my data is, and what software is processing it. When I search for something using Perplexica, the query goes to my hardware and nowhere else. That's the practical side of PDS, and it's not abstract.

The less obvious benefit is that running your own infrastructure teaches you things. I understand networking better because I had to configure it. I understand LLM inference better because I had to get it working. That knowledge builds up over time in a way that just using a service doesn't.

What's Coming

This post is the overview. The deep dives are coming, one layer at a time:

  • The network layer -- OPNsense, VLANs, internal DNS, and how I expose services safely (already published)
  • Compute -- Proxmox, the GPU server, and why bare metal still matters
  • Storage -- ZFS and what it means to actually trust your storage
  • AI -- Local inference, Perplexica, and what PDS looks like when your assistant doesn't phone home
  • Services -- The glue layer: everything I run and why

Each post stands alone. Read them in order or jump to whatever layer interests you. The goal isn't to convince you to replicate my stack. It's to show you what's possible when you start treating your infrastructure as something you own.

The stack will also keep changing. Hardware gets replaced, better software comes along, requirements shift. I'll document that as it happens.
