DEV Community

Discussion on: Event Storage in Postgres

 
alphahydrae profile image
Simon Oulevay

This idea of the previous version was in response to your (2-year old) discussion with the author about concurrent writers, namely:

The Event table has a unique constraint on (StreamId, Version). If writers are required to always increment the version to be 1 more than they highest event they read (which you may or may not be doing), then if two writers try to compete, that constraint will block the second.

And:

It can fail if a writer tries to write an event with a version greater than 1 more than the current max but, as we control the writers, we can prevent that I guess.

Assuming there's a bug in one of the concurrent writers and it increments by 2 or more, multiple events may be written in the same stream at the same time.

As an intellectual exercise, I was wondering if there was a way to enforce that writers always increment the version by 1 at the database level. That's the purpose of the previous version column. The multi-column foreign key forces an event to reference another event, and the check constraint forces that event to be the previous version of the stream exactly.

If a writer is bugged and tries to increment by 2, the database will refuse to store the event (since it would fail the check constraint). It complements the protection against concurrent writers provided by the unique constraint on stream ID and version, by also protecting against misbehaved writers.

Of course that would only work if you want to implement an extremely strict event store. It would not support gaps between events or deleting events like in the systems you mention. You probably don't want to be that strict in a real-life system.

Thread Thread
 
kspeakman profile image
Kasey Speakman • Edited

So this brings up a classic trade-off of what should be managed through the DB and what should be managed by code. The particular trade-off for managing this constraint in the database is a performance hit. To ensure writers do the right thing, we sided with code and made a library so that they all behave consistently. And in fact I'm still searching for ways to reduce our existing indexing/locking (of Position table, see other comments) which negatively impacts write perf. Not enough to matter for our use cases, but I would like to optimize further.

I've started to see why Greg Young's Event Store made the trade-offs it did -- many of which involve moving responsibilities out of disk into service code. It doesn't have the auto-increment or position-lock problem because it uses a serialized writer so the current position is able to be tracked in memory and recorded to disk when events are written. (Your client code can make concurrent writes, but inside the ES service, writes are serialized and fast.) Instead of a Stream table to record StreamType, you make stream type part of the stream id. Then you can categorize streams after the fact based just on stream ID. Instead of a unique index on StreamId + Version, GYES uses an Event ID, which at the application level you are supposed to create deterministically from StreamId + Version. So the the effect is the same -- duplicate events are prevented from being saved. The big perf gains (especially around Position) are implausible when using Postgres because it is not optimized for this type of use case. But we still want to use Postgres since we'd really need more people before we looked at supporting multiple DBs. And I like that it is a gentle starting point for smaller teams/projects.

Thread Thread
 
damiensawyer profile image
damiensawyer

Kasey, that's really interesting about GYES. It's a product I'd really love to have a play with one day. We looked at it a few years ago when starting our system but I chickened out because "No one ever got fired buying IBM (... in this case, Postgres)".

Simon, I get what you're going for and it's interesting. It would be interesting to see how that affected write performance.

Thread Thread
 
gregyoung profile image
gregyoung

If you want to look at the code involved look in StorageWriter

github.com/EventStore/EventStore/b... is a good place to start.

Thread Thread
 
damiensawyer profile image
damiensawyer

Thanks Greg. That's great. I'll have a look.