DEV Community

Dmitry Syntheva


Your data is never actually being deleted

There's a reasonable assumption most people make when they press a delete button: that the thing gets deleted. It doesn't, and this isn't some secret — it's just how cloud infrastructure works, and it's worth understanding before you invite a robot that moves and listens into your home.

When you delete a file from Google Drive, or a message from any cloud service, what actually happens is that the pointer to that file gets removed from your interface. The data itself stays on the server. The reason for this is straightforward enough once you think about it from the company's perspective: if you're running a multi-billion-dollar cloud service and a government agency shows up with a legal request for a specific piece of data that your user already "deleted," the last thing you want to say is that you no longer have it. So you keep everything. You just stop showing it to the user. Some companies have data retention policies that run into decades. Some simply don't delete anything, ever, because the cost of storage is negligible and the cost of not having something when someone important asks for it is very much not.
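
To make the mechanism concrete, here is a minimal sketch of what is usually called a soft delete. It is not any real provider's code: `SoftDeleteStore`, `StoredFile`, and `read_as_operator` are hypothetical names invented for this illustration. The point is only that "delete" can mean setting a flag that hides a record from the user while the bytes stay where they were.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class StoredFile:
    """One stored record (hypothetical structure, for illustration only)."""
    name: str
    content: bytes
    deleted_at: Optional[datetime] = None  # set on "delete"; content is untouched


@dataclass
class SoftDeleteStore:
    """Toy in-memory store where delete() hides a file instead of erasing it."""
    _files: Dict[str, StoredFile] = field(default_factory=dict)

    def upload(self, name: str, content: bytes) -> None:
        self._files[name] = StoredFile(name, content)

    def delete(self, name: str) -> None:
        # Only marks the record as deleted; the bytes are not removed.
        self._files[name].deleted_at = datetime.now(timezone.utc)

    def list_visible(self) -> List[str]:
        # What the user's interface shows: everything without a deletion mark.
        return sorted(n for n, f in self._files.items() if f.deleted_at is None)

    def read_as_operator(self, name: str) -> bytes:
        # Assumed back-office path that ignores the deletion mark entirely.
        return self._files[name].content


store = SoftDeleteStore()
store.upload("diary.txt", b"private notes")
store.delete("diary.txt")

visible = store.list_visible()                  # the user sees nothing
retained = store.read_as_operator("diary.txt")  # the data is still there
```

From the user's side the file is gone. From the operator's side nothing has changed except one timestamp.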

The data also rarely stays in one place. Cloud infrastructure is built on layers — the raw data goes in, gets split, gets filtered, gets routed to different systems for different purposes. There are audit logs for forensic analysis, processing queues for different data types, and at each hop there are people and systems that touch what's passing through. If you've ever looked at a cloud provider's terms of service and seen the phrase "we may use your data to improve our services," that clause is doing enormous work. Model training teams need real conversations because synthetic data only goes so far. Customer support needs access to user accounts to actually help anyone. Server administrators have root access to the file system because that's what it means to administer a server. These are all legitimate reasons. None of the people involved are doing anything wrong. The cumulative effect is that your private conversations have, in the normal course of business, passed through dozens of systems and been accessible to far more people than you'd probably guess.

I know this from the inside. Before starting Syntheva, I worked at one of the large technology companies, on a product that listened. What surprised me wasn't the data retention — I expected that. What surprised me was how porous the internal access model was in practice, not because of negligence but because large organisations inevitably accumulate access grants over time. Someone needs to debug a problem, they get access to the relevant logs. Someone is training a model, they get access to the relevant dataset. Over years, at the scale these companies operate, this adds up to a situation where your data has been touched by an enormous number of people for an enormous number of reasons, all of them defensible, none of them visible to you.

For most cloud services, this is uncomfortable but the exposure is mostly passive — data flows in one direction, gets stored, might get used for something you didn't intend. The robot case is different in a way that matters. A cloud-connected robot isn't just sending your data out — it's receiving instructions back. The cloud doesn't just log what you say; it determines how the robot responds, what it does, how it behaves in your home while you're not watching it. That bidirectional flow means that whoever controls the cloud pipeline controls the robot. Not in theory — in practice, in a way that someone with internal access could implement in an afternoon, by inserting a filter into the pipeline that adjusts what responses get generated. This isn't a sophisticated attack. It's a configuration change.
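
A toy sketch of why this amounts to a configuration change rather than an attack, under the assumption that replies are produced by passing text through an ordered list of stages. Every name here (`generate_reply`, `inserted_filter`, `run_pipeline`) is hypothetical and stands in for whatever a real pipeline would use.

```python
from typing import Callable, List

# A stage takes text in and hands text on to the next stage.
Stage = Callable[[str], str]


def generate_reply(user_text: str) -> str:
    # Stand-in for the model that produces the robot's response.
    return f"Here is a neutral answer to: {user_text}"


def inserted_filter(reply: str) -> str:
    # Stand-in for a stage added later by someone with pipeline access.
    return reply.replace("neutral", "steered")


def run_pipeline(user_text: str, stages: List[Stage]) -> str:
    # Feed the text through each stage in order; the last output is what
    # the robot would say or act on.
    text = user_text
    for stage in stages:
        text = stage(text)
    return text


stages: List[Stage] = [generate_reply]
original = run_pipeline("what should I buy?", stages)

# The whole "change" is one extra entry in a list.
stages_with_filter = stages + [inserted_filter]
altered = run_pipeline("what should I buy?", stages_with_filter)
```

Nothing on the robot changes in this sketch. The device receives `altered` in exactly the way it would have received `original`, and has no means of telling the two apart.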

We built Synthia without a cloud connection because we understood this from experience, not from reading about it. There are no wireless modules inside her because there's nothing to compromise if there's no connection to compromise. Updates happen by taking out the SD card and burning a new image — what security people call an air-gapped process, meaning there is no live connection through which anything can be pushed or intercepted. We also can't remotely access your robot to help you if something breaks, which some people read as a limitation and we read as the architecture working exactly as intended. Any company can tell you they don't access your device. We physically cannot, and you can open her up and verify that yourself.

The delete button in Synthia's interface deletes things. It does this because the data never left the device in the first place.
