DEV Community

Discussion on: Event Storage in Postgres

 
damiensawyer profile image
damiensawyer

All sweet. I hadn't considered the "append only" thing, but that's a great point. I also didn't know about FP being less tolerant of throwing exceptions.

In the C# world I know that exceptions are slow and I definitely try not to throw them during normal execution - however I would have considered a concurrency failure a good case for throwing one (and specifically catching it higher up the call stack to make retry decisions). Perhaps this comes down to personal style and a glass half full vs empty argument.

I've got the idea of using S3 or similar for events as well (how cool is immutability!!). We haven't started designing our integrators and builders yet - but when we do, even if we don't do it now, I want to at least allow for the idea of running them all as serverless functions at massive scale. At the other end of the spectrum I'm wondering if we can build the whole system to run from a wrist watch. I'm thinking Postgres et al on Docker + k8s + Android Wear. No idea if there's a watch that can do it yet, but there might be soon :-)

Thread Thread
 
kspeakman profile image
Kasey Speakman • Edited

re: Exceptions. FP is not necessarily less tolerant. I use F#, and it has pretty rich exception handling. But it is common in FP to represent anticipated error conditions as normal return values rather than through exceptions. Mainly because you cannot see what exception will be thrown unless you look in the code or it is documented. But you can see the return value type. In this case, I expect optimistic concurrency violations sometimes and I expect the caller to handle it. So even though the database adapter will throw a unique constraint violation exception, I will catch it and convert it to a normal return value. However, if the database is just down, I will let that exception propagate up to the caller. Hope that makes sense.

Using messaging (e.g. events) just opens up a whole world of things. It is awesome. :) If you really want your mind blown, event sourcing on the front end.

Thread Thread
 
damiensawyer profile image
damiensawyer

Cheers for that. :-)

Thread Thread
 
alphahydrae profile image
Simon Oulevay

There may be an edge case I have missed and I don't know about performance, but I think it may be possible to enforce that the version increment is always exactly 1 at the cost of some redundancy, by adding the following column and constraints to the table:

PreviousVersion int,
FOREIGN KEY (StreamId, PreviousVersion)
  REFERENCES Event (StreamId, Version),
CHECK (
  Version = 1 AND PreviousVersion IS NULL
  OR
  PreviousVersion IS NOT NULL
    AND Version - PreviousVersion = 1
)
Enter fullscreen mode Exit fullscreen mode

We make each event reference the previous version of the stream with a multi-column foreign key and enforce the increment by 1 with a check constraint. The first event (version 1) is allow to have a null previous version.

Thread Thread
 
damiensawyer profile image
damiensawyer

I've only just skimmed this quickly (trying to get something done with kids screaming!)
Some quick thoughts:

  • If you're always enforcing that PreviousVersion is Version - 1, why store PreviousVersion?
  • In the most recent event sourced system I worked on, we allowed for gaps in the events. This was so we could delete events post production (arrgh!! The nerve!!). This was done after years of managing production event sourced systems and really wishing that we could do that.
  • If you did have gaps, you could have a PreviousVersion column... however, if all of the versions are incremental, do you really need it? The previous version is ascertainable through sorting the events.
  • I guess that if you allows forks in your event stream it would make sense to refer to a parent.... kind of like a directed acylclic graph... but that's not really the use case here.

If any of that sounds confusing or off, it probably is as it was a very quickly considered response :-)

Thread Thread
 
alphahydrae profile image
Simon Oulevay

This idea of the previous version was in response to your (2-year old) discussion with the author about concurrent writers, namely:

The Event table has a unique constraint on (StreamId, Version). If writers are required to always increment the version to be 1 more than they highest event they read (which you may or may not be doing), then if two writers try to compete, that constraint will block the second.

And:

It can fail if a writer tries to write an event with a version greater than 1 more than the current max but, as we control the writers, we can prevent that I guess.

Assuming there's a bug in one of the concurrent writers and it increments by 2 or more, multiple events may be written in the same stream at the same time.

As an intellectual exercise, I was wondering if there was a way to enforce that writers always increment the version by 1 at the database level. That's the purpose of the previous version column. The multi-column foreign key forces an event to reference another event, and the check constraint forces that event to be the previous version of the stream exactly.

If a writer is bugged and tries to increment by 2, the database will refuse to store the event (since it would fail the check constraint). It complements the protection against concurrent writers provided by the unique constraint on stream ID and version, by also protecting against misbehaved writers.

Of course that would only work if you want to implement an extremely strict event store. It would not support gaps between events or deleting events like in the systems you mention. You probably don't want to be that strict in a real-life system.

Thread Thread
 
kspeakman profile image
Kasey Speakman • Edited

So this brings up a classic trade-off of what should be managed through the DB and what should be managed by code. The particular trade-off for managing this constraint in the database is a performance hit. To ensure writers do the right thing, we sided with code and made a library so that they all behave consistently. And in fact I'm still searching for ways to reduce our existing indexing/locking (of Position table, see other comments) which negatively impacts write perf. Not enough to matter for our use cases, but I would like to optimize further.

I've started to see why Greg Young's Event Store made the trade-offs it did -- many of which involve moving responsibilities out of disk into service code. It doesn't have the auto-increment or position-lock problem because it uses a serialized writer so the current position is able to be tracked in memory and recorded to disk when events are written. (Your client code can make concurrent writes, but inside the ES service, writes are serialized and fast.) Instead of a Stream table to record StreamType, you make stream type part of the stream id. Then you can categorize streams after the fact based just on stream ID. Instead of a unique index on StreamId + Version, GYES uses an Event ID, which at the application level you are supposed to create deterministically from StreamId + Version. So the the effect is the same -- duplicate events are prevented from being saved. The big perf gains (especially around Position) are implausible when using Postgres because it is not optimized for this type of use case. But we still want to use Postgres since we'd really need more people before we looked at supporting multiple DBs. And I like that it is a gentle starting point for smaller teams/projects.

Thread Thread
 
damiensawyer profile image
damiensawyer

Kasey, that's really interesting about GYES. It's a product I'd really love to have a play with one day. We looked at it a few years ago when starting our system but I chickened out because "No one ever got fired buying IBM (... in this case, Postgres)".

Simon, I get what you're going for and it's interesting. It would be interesting to see how that affected write performance.

Thread Thread
 
gregyoung profile image
gregyoung

If you want to look at the code involved look in StorageWriter

github.com/EventStore/EventStore/b... is a good place to start.

Thread Thread
 
damiensawyer profile image
damiensawyer

Thanks Greg. That's great. I'll have a look.