DEV Community

Leonardo


"Never delete data"

TL;DR

  • Use trash-put instead of rm.
  • Don't hard- or soft-delete your entities; update them to a domain state instead: disabled user, discontinued product, cancelled order, filled job, fired/retired employee, and so on.

Deleting files

In Windows File Explorer you have ads and a Delete button that moves the file or directory to the Recycle Bin; in macOS Finder, GNOME Files, or KDE Dolphin the button actually says Move to Trash.

On the command line of a Unix-like system we use rm, which is short for remove, but it deletes rather than reMOVEs. In Microsoft's cmd.exe the command is del.

Consider this: you want to run a command on the production database, but it hangs, and you think (without being sure) that corrupted files are the cause. Moving them to another path with mv would be enough, but they are just corrupted data, so you use rm and accidentally delete the main database instead of the replica.

rm deletes data immediately, and the deletion is pretty much irreversible. Humans make mistakes all the time and regret them soon after, sometimes seconds after pressing Enter. Moving files to the trash, to be deleted later, can save you.

Hard deleting was more necessary in the past, when storage was tiny and expensive, but now we are not constantly fighting for space anymore.
So treat rm (and the Unix tradition as a whole) as a product of its time and move unwanted files to the trash instead. On Linux you can use trash-cli for this.
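
To make the "move instead of delete" idea concrete, here is a minimal Python sketch. It is not how trash-cli works internally: the `move_to_trash` helper and the flat trash directory are assumptions made for illustration, and the sketch does not record the original path or deletion date.

```python
import shutil
import tempfile
import time
from pathlib import Path


def move_to_trash(path: Path, trash_dir: Path) -> Path:
    """Move `path` into `trash_dir` instead of deleting it; return the new location.

    Hypothetical helper: a flat directory stands in for a real trash can.
    """
    trash_dir.mkdir(parents=True, exist_ok=True)
    # Prefix with a nanosecond timestamp so same-named files do not overwrite each other.
    target = trash_dir / f"{time.time_ns()}-{path.name}"
    shutil.move(str(path), str(target))
    return target


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    victim = workdir / "file-without-hope.txt"
    victim.write_text("maybe I still need this")

    trashed = move_to_trash(victim, workdir / "trash")

    # The file is gone from its original place but still recoverable.
    print("original exists:", victim.exists())
    print("recovered content:", trashed.read_text())
```

Emptying the trash later (for example, removing entries older than some number of days) is the only step that actually destroys data, and by then the regret window has usually passed.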

And as the trash-cli readme says:

Can I alias rm to trash-put?

You can but you shouldn't. In the early days I thought it was a good idea to do that but now I changed my mind.

Although the interface of trash-put seems to be compatible with rm, it has different semantics which will cause you problems. For example, while rm requires -R for deleting directories trash-put does not.

But sometimes I forget to use trash-put, really can't I?

You could alias rm to something that will remind you to not use it:

alias rm='echo "This is not the command you are looking for."; false'

Then, if you really want to use rm, simply prepend a backslash to bypass the alias:

\rm file-without-hope

Note that Bash aliases are used only in interactive shells, so using this alias should not interfere with scripts that expect to use rm.

Deleting entities on database

You don't, you just filter it out from all queries, right?

Well, imagine the entity is a customer. When we try to join an order with its customer, what happens?

  • Should the customer soft delete cascade to their orders too?
  • Should we return a corrupted data error, because the foreign key points to a row that the query can't read? (FKs should always refer to a valid row.)
  • Should we return a customer not found error?
  • Should we return the data without the customer fields, or with them filled with null, so that the UI now says the customer is called undefined (or crashes trying to access a non-existent field)?
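
The questions above can be reproduced with a small in-memory SQLite sketch. The table and column names (`customer`, `orders`, `deleted`) are made up for this example; the point is only to show what a soft-delete filter does to a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0  -- hypothetical soft-delete flag
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    );
    INSERT INTO customer (id, name, deleted) VALUES (1, 'Ana', 0), (2, 'Bob', 1);
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 2);
    """
)

# Inner join with the soft-delete filter: Bob's order silently disappears.
inner_rows = conn.execute(
    """
    SELECT o.id, c.name
    FROM orders o
    JOIN customer c ON c.id = o.customer_id AND c.deleted = 0
    ORDER BY o.id
    """
).fetchall()

# Left join with the same filter: the order survives, but its customer is NULL.
left_rows = conn.execute(
    """
    SELECT o.id, c.name
    FROM orders o
    LEFT JOIN customer c ON c.id = o.customer_id AND c.deleted = 0
    ORDER BY o.id
    """
).fetchall()

print(inner_rows)  # [(10, 'Ana')]
print(left_rows)   # [(10, 'Ana'), (11, None)]
```

Neither result is obviously right: the first loses an order that really happened, and the second hands the UI a customer that is `None`.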

When someone dies, we don't (normally) try to erase them from history; we update their status to 'not alive'.

What we do

When a user asks to delete their account, their personally identifiable data should be deleted (ignore the article title) or anonymized because of data protection laws. The UI should show a disabled user, and that user should no longer be able to authenticate, order, or post (those are business decisions, not technical ones).
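
One way to picture this, as a rough sketch only: the `User` fields, the `close_account` helper, and the placeholder name are all assumptions for illustration, and which fields count as personal data is a legal question the code does not answer.

```python
from dataclasses import dataclass, replace
from enum import Enum


class UserStatus(Enum):
    ACTIVE = "active"
    DISABLED = "disabled"


@dataclass(frozen=True)
class User:
    id: int
    name: str
    email: str
    status: UserStatus = UserStatus.ACTIVE


def close_account(user: User) -> User:
    """Strip personal data but keep the row, so orders and posts still point at a valid user."""
    return replace(user, name="Disabled user", email="", status=UserStatus.DISABLED)


def can_log_in(user: User) -> bool:
    # Business rule from the text: a disabled user cannot authenticate.
    return user.status is UserStatus.ACTIVE


if __name__ == "__main__":
    ana = User(id=1, name="Ana", email="ana@example.com")
    closed = close_account(ana)
    print(closed)
    print("can log in:", can_log_in(closed))
```

The id survives, so foreign keys stay valid, while the name and email are gone.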

Products aren’t deleted, they’re discontinued. The warehouse manager needs to know that the product is discontinued so they don’t order more stock from the supplier. The product shouldn't suddenly disappear from the system; that would cause confusion and be taken for a bug.

Orders aren’t deleted, they’re cancelled. And there may be fees incurred if the order was cancelled too late.

Employees aren’t deleted, they’re fired (or retired). And there is often a compensation package to be handled.

Jobs aren’t deleted, they’re filled (or their requisition is revoked).

You get the pattern. Good naming is part of well-documented code. Treat "deleted" as just one of the states of the state machine that represents the domain (also, empty values are just a union/enum variant; you don't need a special construct, like a null pointer, to represent them).
You update the entity's state and perform the transition's side effects; the available actions change based on the current state, and queries can be filtered by state.
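
As a sketch of that last paragraph, here is a tiny order state machine in Python. The states, the allowed transitions, and the flat late-cancellation fee are invented for the example; a real domain would define its own.

```python
from enum import Enum


class OrderState(Enum):
    PLACED = "placed"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"


# Which states can be reached from each state (assumed rules for this sketch).
TRANSITIONS = {
    OrderState.PLACED: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: {OrderState.DELIVERED, OrderState.CANCELLED},
    OrderState.DELIVERED: set(),
    OrderState.CANCELLED: set(),
}

LATE_CANCELLATION_FEE = 10  # hypothetical flat fee


class Order:
    def __init__(self, order_id: int) -> None:
        self.id = order_id
        self.state = OrderState.PLACED
        self.cancellation_fee = 0

    def available_actions(self) -> set:
        """The available actions depend on the current state."""
        return TRANSITIONS[self.state]

    def transition(self, new_state: OrderState) -> None:
        if new_state not in self.available_actions():
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        # Transition side effect: cancelling after shipping incurs a fee.
        if new_state is OrderState.CANCELLED and self.state is OrderState.SHIPPED:
            self.cancellation_fee = LATE_CANCELLATION_FEE
        self.state = new_state


if __name__ == "__main__":
    orders = [Order(1), Order(2)]
    orders[0].transition(OrderState.SHIPPED)
    orders[0].transition(OrderState.CANCELLED)

    # Queries filter by state instead of by a generic "deleted" flag.
    open_orders = [o.id for o in orders if o.state is not OrderState.CANCELLED]
    print(open_orders, orders[0].cancellation_fee)
```

Nothing is ever removed: a cancelled order is still there, with its fee, and simply offers no further actions.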

The rest

Okay, what about data that doesn't represent physical entities: individual-level data of an aggregate, redundant data, temporary data, old data, corrupted data? Well, that I move to the trash/archive or maybe delete immediately (yeah, the title is clickbait).
