Discussion on: Developer deprogramming: Getting started in Event Sourcing


how do event stores handle binary data updates efficiently?

A good and proper DDD / ES person would further ask the question: What is the scenario? (What problem are you trying to solve?) Because there might be a better alternative.

In an event store, you don't really update data in place if you can help it. You just keep adding new events. Depending on the size of the binary, I might do the traditional thing where you save it to file storage like S3 and then add an event when the file link changes.

Side note: The product EventStore stores event data as binary, meaning I have to serialize to a string and convert it to bytes before calling the method that saves the event. I think it uses ProtoBuf internally to store events, which I have read (but have not browsed the code to verify) handles binary data pretty efficiently.
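A minimal sketch of that serialize-then-encode step, in Python with JSON as the assumed format. The function names are illustrative, and nothing here calls an actual EventStore client API:

```python
import json


def to_event_bytes(event: dict) -> bytes:
    """Serialize the event to a JSON string, then encode that string as UTF-8 bytes."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def from_event_bytes(data: bytes) -> dict:
    """Reverse the process when reading the event back."""
    return json.loads(data.decode("utf-8"))


# Hypothetical event payload, using the field names from this thread.
payload = to_event_bytes({
    "file_id": "9f9753b8-6beb-4e36-b7ca-1f6f6bf23702",
    "reference": "https://link.to.file",
})
```

The resulting `payload` is what would be handed to whatever save method the client library exposes.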

K • Nov 20 '17

The use-case was to allow versioning of files (pdf, avi, mpg, mp3, wav, doc, xml, etc.)

Barry O Sullivan • Nov 21 '17 • Edited

As Kasey suggested, best practice there would be to store the file remotely on S3 and then have an event that references the remote file. This allows you to version binary files, as you just upload a new file and create a new event.

E.g.

# File A, Version 1
FileUploaded
    file_id: 9f9753b8-6beb-4e36-b7ca-1f6f6bf23702
    reference: https://link.to.file

# File A, Version 2
FileUploaded
    file_id: 9f9753b8-6beb-4e36-b7ca-1f6f6bf23702
    reference: https://link.to.other-file

file_id is the reference in your system, whereas reference is the remote instance of that version.
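To make the versioning idea concrete, here is a rough sketch in Python. It assumes an in-memory list standing in for the event log, and the helper names (`versions`, `current_reference`) are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileUploaded:
    file_id: str    # stable identity of the file inside your system
    reference: str  # where this particular version lives remotely


def versions(events, file_id):
    """Return every reference recorded for file_id, oldest first (index 0 is version 1)."""
    return [e.reference for e in events if e.file_id == file_id]


def current_reference(events, file_id):
    """The latest FileUploaded event wins; None if the file was never uploaded."""
    history = versions(events, file_id)
    return history[-1] if history else None


FILE_A = "9f9753b8-6beb-4e36-b7ca-1f6f6bf23702"

# The log is append-only: a new version is a new event, never an update in place.
log = []
log.append(FileUploaded(FILE_A, "https://link.to.file"))
log.append(FileUploaded(FILE_A, "https://link.to.other-file"))
```

Because earlier events are never modified, every previous version stays addressable through its own reference.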

K • Nov 21 '17

I see.

Doesn't this work against the "one source of truth" philosophy?

Barry O Sullivan • Nov 21 '17

Yeah, technically. It's just a philosophy though, so there are times when it's not the appropriate philosophy.

In this case it really comes down to practicality. Is it useful to store the full binary file in the event log? Does that give any value? If the answer is no, then there's no point in saving the file in the log; just store a reference.

Kasey Speakman • Nov 21 '17 • Edited

You can have different sources of truth for each "domain" or specialty. The source of truth for files is the file system... if it is gone from there, it doesn't matter what the database says. :)

It may still be important to (event-sourced) areas of the business to record that something changed and perhaps trigger a further action. You can set this up in a number of ways. The client could issue an API call after successfully saving the file (this is request-driven, probably not the way I would go). Or you could set up event notifications on file operations (this is event-driven); S3 supports this, or you can just use a file watcher for local apps.
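As a sketch of the event-driven option, the function below turns an S3 event notification into events for your own log. It assumes the standard S3 notification layout (a `Records` list with `eventName`, `s3.bucket.name`, and `s3.object.key`); the output event shape and the function name are hypothetical:

```python
from urllib.parse import unquote_plus


def file_events_from_s3(notification):
    """Map S3 'ObjectCreated' records to FileUploaded events; ignore everything else."""
    events = []
    for record in notification.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        events.append({
            "type": "FileUploaded",
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 notifications.
            "key": unquote_plus(s3["object"]["key"]),
            # Only present when bucket versioning is enabled.
            "version_id": s3["object"].get("versionId"),
        })
    return events


# Hand-written sample notification, trimmed to the fields used above.
sample = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "docs"},
                "object": {"key": "reports/Q1+summary.pdf", "versionId": "v2"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {"bucket": {"name": "docs"}, "object": {"key": "old.pdf"}},
        },
    ]
}

uploaded = file_events_from_s3(sample)
```

How the bucket and key map onto your system's own file identifier is left open here, since that depends on how you name objects.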

At this point, this is really integration between two systems and no longer event-sourcing. Instead it is Event-Driven Architecture. Event sourcing really only applies inside individual domains, not across different systems. This is probably why you already had an inkling that event-sourcing would not solve the file management problem. By itself, it won't.

norpan • Nov 23 '18

We have an audit requirement that it should not be possible to change data without a trace, so storing the file separately was a problem for us. Then we realized that we just have to store the file's SHA-256 hash; that way we can check whether the file is the right one. So we get the best of both worlds.
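A minimal sketch of that check in Python, assuming the hash is recorded in the same event as the file reference. The event shape and function names are illustrative, not taken from our actual system:

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Hex-encoded SHA-256 digest of the file content."""
    return hashlib.sha256(content).hexdigest()


def make_file_uploaded(file_id: str, reference: str, content: bytes) -> dict:
    """Record the hash at upload time, next to the remote reference."""
    return {
        "type": "FileUploaded",
        "file_id": file_id,
        "reference": reference,
        "sha256": sha256_hex(content),
    }


def matches_event(event: dict, content: bytes) -> bool:
    """True only if the content fetched from storage is what the event recorded."""
    return event["sha256"] == sha256_hex(content)


original = b"%PDF-1.7 example content"
event = make_file_uploaded(
    "9f9753b8-6beb-4e36-b7ca-1f6f6bf23702", "https://link.to.file", original
)
```

If someone replaces the file behind the reference, `matches_event` returns `False` for the new content, so the change cannot go unnoticed.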