Discussion on: Redefining Event Sourcing

There is one big drawback to the GDPR compliance mechanism described above (forgetting the encryption keys): encryption and hashing algorithms may become weak over time. This has already happened; for example, a large social network partly used SHA-1 for password hashes. In such a case, "forgotten" data can become readable again, posing a large compliance risk that is hard to contain.

Another approach would be to mark PII fields and implement a redaction mechanism that preserves the structure of the data while erasing or replacing its content.

This seems to break the basic contract of event sourcing, namely that events are immutable. But forgetting the keys basically comes down to the same thing. One could argue that data that can no longer be read (decrypted) has also changed, since events do not exist for their own sake but to serve a purpose. What is the difference between changed data and data that can't be read? This leads me to another point: the main purpose of an event is to record that something happened in the domain. In most cases this purpose is still served even when data is redacted. In fact, if you "forget" the encryption key, the event must still serve its purpose without access to the data.

If you feel uncomfortable redacting events, or if tracking which events store PII becomes a burden, another approach is to put sensitive PII into a dedicated key-value store and store only the key in the event. Ideally, the key or record should contain some customerId (which itself may not be PII). This allows you to track and delete all PII belonging to a customer. In case of a GDPR deletion request, you could configure your key-value store to return "Deleted due to Art XYZ GDPR request" in place of the deleted data.
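A minimal Python sketch of this idea, assuming an in-memory dict as a stand-in for a real key-value store. The `PiiVault` class and its `put`, `get` and `forget_customer` methods are hypothetical names for illustration, and the notice text keeps the unspecified "Art XYZ" from above.

```python
import uuid
from typing import Any

# Placeholder wording from the comment; the GDPR article is left unspecified there.
DELETED_NOTICE = "Deleted due to Art XYZ GDPR request"


class PiiVault:
    """In-memory stand-in for a dedicated PII key-value store (hypothetical API)."""

    def __init__(self) -> None:
        self._records: dict[str, Any] = {}           # key -> PII value
        self._by_customer: dict[str, set[str]] = {}  # customerId -> keys owned by that customer
        self._deleted: set[str] = set()              # keys erased on request

    def put(self, customer_id: str, value: Any) -> str:
        """Store a PII value and return the key to embed in the event instead of the value."""
        key = f"{customer_id}:{uuid.uuid4().hex}"  # the key carries the (non-PII) customerId
        self._records[key] = value
        self._by_customer.setdefault(customer_id, set()).add(key)
        return key

    def get(self, key: str) -> Any:
        """Resolve a key found in an event; erased entries yield the deletion notice."""
        if key in self._deleted:
            return DELETED_NOTICE
        return self._records[key]

    def forget_customer(self, customer_id: str) -> int:
        """Erase all PII of one customer and return the number of records removed."""
        keys = self._by_customer.pop(customer_id, set())
        for key in keys:
            self._records.pop(key, None)
            self._deleted.add(key)
        return len(keys)
```

Events then carry only the returned key, e.g. `{"type": "CustomerRegistered", "customer_id": "c-42", "data": {"contact_ref": key}}`, so the event log itself never holds the PII, and replaying it after `forget_customer("c-42")` resolves that key to the deletion notice instead of the data.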