DEV Community


Discussion on: Kafka, GDPR and Event Sourcing

Kasey Speakman • Edited

I dunno about Kafka, since it does not work for me as an event store (or at least not for the kind of event store I have needed so far). But in SQL-based stores, you can delete a stream pretty easily with DELETE FROM EventLog WHERE StreamId = ?. And in EventStore you can hard-delete a stream and scavenge to remove the events. In either case, though, you should probably first write an event to the end of the stream signifying that the user requested removal, and wait for the read models to process it and remove the data from their own storage. Simply deleting or modifying streams does not signal the projections to do likewise.

Dan Lebrero • Author

Thanks a lot for the comment!

I think that using Kafka means that you need to change how you design your architecture. You cannot follow the "read from DB", "update in memory" and "write result to DB" model anymore. You have to embrace event based architectures. I think I cover both of your concerns here.

As with SQL-based stores, it is possible to delete events from Kafka, but the point is that you lose the immutability guarantees, which opens the door to "updating" events for reasons unrelated to GDPR. With SQL-based stores and Kafka compacted topics, you have to rely on the team's discipline not to misuse the mutability of the store. With regular Kafka topics you simply cannot touch the events, so you can be sure that nobody has manipulated them.
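To make the compacted-topic point concrete, here is a toy model of log compaction in plain Python (no Kafka client involved). It assumes the simplified semantics "keep only the latest record per key, and drop a key entirely once its latest record is a tombstone (a null value)". Real Kafka compaction is asynchronous, never touches the active segment, and keeps tombstones around for a configurable retention period before removing them; all of that is ignored here, and the keys and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A record is (key, value); a value of None models a Kafka tombstone.
Record = Tuple[str, Optional[str]]

def compact(log: List[Record]) -> List[Record]:
    """Toy compaction: keep the latest record per key, in offset order,
    and drop keys whose latest record is a tombstone."""
    latest: Dict[str, int] = {}
    for offset, (key, _) in enumerate(log):
        latest[key] = offset
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if latest[key] == offset and value is not None
    ]

log: List[Record] = [
    ("user-1", "alice@example.com"),
    ("user-2", "bob@old.example.com"),
    ("user-2", "bob@example.com"),
    ("user-1", None),  # tombstone: "forget user-1"
]
compacted = compact(log)  # [("user-2", "bob@example.com")]
```

Note that compaction erases user-1 as intended, but it also silently drops user-2's earlier value, which has nothing to do with GDPR. That is the kind of history rewrite a plain (non-compacted) topic rules out.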

Even though I think immutability is better, as it gives you strong guarantees, the additional complexity caused by GDPR may make it impractical.

Kasey Speakman • Edited

Thanks for the article.

I do not think it addresses the concerns, as isolation between entities is still a large problem (How do I check the state of a single entity in order to validate and fulfill a request? What happens when the entity's state structure needs to change due to new features?). The deletion problem in Kafka highlights the isolation issue. Since a topic per entity is not feasible, you are forced to mutate a shared topic with no real audit guarantee that only a single entity is affected. Kafka was designed for large-scale problems, so it doesn't suit the fine granularity of this requirement.
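The isolation concern can be sketched as two ways of loading one entity's state: from a shared topic, where every event must be read and filtered, versus from a per-entity stream, where only that entity's events are read. The account/balance domain and the "Deposited"/"Withdrawn" event names are hypothetical, chosen only to have something to fold over. In real Kafka, keyed records land in a single partition, which narrows the scan, but that partition still mixes many entities.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class Event:
    entity_id: str
    kind: str  # hypothetical: "Deposited" or "Withdrawn"
    amount: int

def apply(balance: int, event: Event) -> int:
    # Fold one event into the entity's state.
    if event.kind == "Deposited":
        return balance + event.amount
    if event.kind == "Withdrawn":
        return balance - event.amount
    return balance

def load_from_shared_topic(entity_id: str, topic: Iterable[Event]) -> Tuple[int, int]:
    # Every event in the topic is read, even those for other entities.
    balance, scanned = 0, 0
    for event in topic:
        scanned += 1
        if event.entity_id == entity_id:
            balance = apply(balance, event)
    return balance, scanned

def group_by_entity(topic: Iterable[Event]) -> Dict[str, List[Event]]:
    # Model a store with one stream per entity.
    streams: Dict[str, List[Event]] = {}
    for event in topic:
        streams.setdefault(event.entity_id, []).append(event)
    return streams

def load_from_stream(entity_id: str, streams: Dict[str, List[Event]]) -> Tuple[int, int]:
    # Only this entity's events are read.
    events = streams.get(entity_id, [])
    balance = 0
    for event in events:
        balance = apply(balance, event)
    return balance, len(events)

topic = [
    Event("acct-1", "Deposited", 100),
    Event("acct-2", "Deposited", 50),
    Event("acct-1", "Withdrawn", 30),
    Event("acct-3", "Deposited", 10),
]
```

Both loaders return a balance of 70 for acct-1, but the shared-topic version reads all four events while the per-stream version reads two. The same boundary is what lets a per-stream store delete one entity's events without touching anyone else's.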

I do embrace event-based architectures, and Kafka has an important role to play there. But I don't think it is the right tool for everything.

This is a bit of a derail from the original topic, and I apologize for that. If you want to discuss it further we can do so in a separate article or you can email me. kasey symbolic-at cornerspeed dee-oh-tee com