DEV Community

Discussion on: Developer deprogramming: Getting started in Event Sourcing

K

How do event stores handle binary data updates efficiently?

Kasey Speakman • Edited

A good and proper DDD / ES person would further ask the question: What is the scenario? (What problem are you trying to solve?) Because there might be a better alternative.

In an event store, you don't really update data in place if you can help it. You just keep adding new events. Depending on the size of the binary, I might do the traditional thing where you save it to file storage like S3 and then add an event when the file link changes.

Side note: The product EventStore stores the event data as binary, meaning I actually have to serialize to a string and convert it to bytes before calling the method to save the event. I think internally it uses ProtoBuf to store events, which I read (but have not browsed the code to verify) handles binary data pretty efficiently.
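
A minimal sketch of that serialize-then-encode step, assuming JSON as the string format (the comment above does not say which format is used). The function names `event_to_bytes` and `event_from_bytes` are hypothetical and are not part of the EventStore client API; the resulting bytes would be handed to whatever save method the client exposes.

```python
import json


def event_to_bytes(event: dict) -> bytes:
    """Serialize an event to a JSON string, then encode it as UTF-8 bytes."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def event_from_bytes(payload: bytes) -> dict:
    """Reverse the step above: decode UTF-8 bytes and parse the JSON string."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical event shape, borrowed from the FileUploaded example in this thread.
event = {
    "type": "FileUploaded",
    "file_id": "9f9753b8-6beb-4e36-b7ca-1f6f6bf23702",
    "reference": "https://link.to.file",
}
payload = event_to_bytes(event)
```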

K

The use-case was to allow versioning of files (pdf, avi, mpg, mp3, wav, doc, xml, etc.)

Barry O Sullivan • Edited

As Kasey suggested, best practice there would be to store the file remotely on S3, and then have an event that references the remote file. This allows you to version binary files, as you just upload a new file and create a new event.

E.g.

# File A, Version 1
FileUploaded
    file_id: 9f9753b8-6beb-4e36-b7ca-1f6f6bf23702
    reference: https://link.to.file

# File A, Version 2
FileUploaded
    file_id: 9f9753b8-6beb-4e36-b7ca-1f6f6bf23702
    reference: https://link.to.other-file

file_id identifies the file within your system, whereas reference points to the remote copy of that particular version.
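
The two events above can be replayed to recover a file's version history. Here is a rough sketch, assuming an in-memory list stands in for the event store; `InMemoryEventLog` and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FileUploaded:
    file_id: str    # stable identity of the file in your system
    reference: str  # remote location of this particular version


class InMemoryEventLog:
    """Append-only list standing in for a real event store (an assumption)."""

    def __init__(self) -> None:
        self._events: List[FileUploaded] = []

    def append(self, event: FileUploaded) -> None:
        # Events are only ever added, never updated in place.
        self._events.append(event)

    def versions(self, file_id: str) -> List[str]:
        # Replay the log in order; each matching event is one version.
        return [e.reference for e in self._events if e.file_id == file_id]

    def current(self, file_id: str) -> Optional[str]:
        refs = self.versions(file_id)
        return refs[-1] if refs else None


FILE_A = "9f9753b8-6beb-4e36-b7ca-1f6f6bf23702"
log = InMemoryEventLog()
log.append(FileUploaded(FILE_A, "https://link.to.file"))        # version 1
log.append(FileUploaded(FILE_A, "https://link.to.other-file"))  # version 2
```

Because nothing is overwritten, `log.versions(FILE_A)` lists every reference the file has had, and `log.current(FILE_A)` is simply the last one.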

K

I see.

Doesn't this work against the "one source of truth" philosophy?

Barry O Sullivan

Yeah, technically. It's just a philosophy though, so there are times when it's not the appropriate philosophy.

In this case it really comes down to practicality. Is it useful to store the full binary file in the event log? Does that give any value? If the answer is no, then there's no point in saving the file in the log; just store a reference.

Kasey Speakman • Edited

You can have different sources of truth for each "domain" or specialty. The source of truth for files is the file system... if it is gone from there, it doesn't matter what the database says. :)

It may still be important to (event-sourced) areas of the business to record that something changed and perhaps trigger a further action. You can set this up in a number of ways. The client could issue an API call after the file has been saved successfully (this is request-driven, and probably not the way I would go). Or you could set up event notifications on file operations (this is event-driven): S3 supports this, or you can just use a file watcher for local apps.

At this point, this is really integration between two systems and no longer event-sourcing. Instead it is Event-Driven Architecture. Event sourcing really only applies inside individual domains, not across different systems. This is probably why you already had an inkling that event-sourcing would not solve the file management problem. By itself, it won't.
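
As an illustration of the event-driven option, here is a sketch that turns an S3 event notification into domain events. It assumes the notification follows the S3 message layout (`Records[].eventName`, `Records[].s3.bucket.name`, `Records[].s3.object.key`, with URL-encoded keys), which is worth checking against the AWS documentation. The `FileChanged` event and the function name are hypothetical.

```python
from urllib.parse import unquote_plus


def file_events_from_s3_notification(notification: dict) -> list:
    """Translate S3 'object created' records into hypothetical FileChanged events."""
    events = []
    for record in notification.get("Records", []):
        # Ignore deletes and other notification types in this sketch.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in notifications (an assumption to verify).
        key = unquote_plus(record["s3"]["object"]["key"])
        events.append({"type": "FileChanged", "reference": f"s3://{bucket}/{key}"})
    return events


# Hand-written sample payload; bucket and key are made up for the example.
sample = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "reports/annual+report.pdf"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "old.pdf"},
            },
        },
    ]
}
events = file_events_from_s3_notification(sample)
```

The returned events would then be appended to the event store by the receiving domain, which keeps the file system as the source of truth for the files themselves.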

norpan

We have an audit requirement that it should not be possible to change data without a trace, so storing the file separately was a problem for us. Then we realized that we just have to store the file's SHA-256 hash, and that way we can check whether the file is the right one. So we get the best of both worlds.
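
A small sketch of that check, assuming the hash is recorded in the event next to the file reference. The `sha256` field name and the helper functions are hypothetical; only Python's standard `hashlib` is used.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the file contents."""
    return hashlib.sha256(data).hexdigest()


def matches_recorded_hash(data: bytes, recorded_hash: str) -> bool:
    """True if the stored file still matches the hash recorded in the event."""
    return sha256_of_bytes(data) == recorded_hash


original = b"version 1 of the file"

# The event keeps the reference plus the hash, not the binary itself.
event = {
    "type": "FileUploaded",
    "file_id": "9f9753b8-6beb-4e36-b7ca-1f6f6bf23702",
    "reference": "https://link.to.file",
    "sha256": sha256_of_bytes(original),  # hypothetical field name
}

untouched_ok = matches_recorded_hash(original, event["sha256"])
tampered_ok = matches_recorded_hash(b"version 1 of the file, edited", event["sha256"])
```

If the file in remote storage is altered after the event is written, the recomputed hash no longer matches, so the change cannot go unnoticed. For large files the digest would more likely be computed in chunks rather than from a single in-memory `bytes` value.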