Discussion on: Event Sourcing: What it is and why it's awesome

View post

Replies for: My current project is event sourced. I agree it is awesome for all the reasons you describe. I would also add that you can start small and progress...

Agreed, the jury seems to be out on set validation. For that we're using immediately consistent read models, ie. as soon as a UserCreated event is fired we update the projection. This is not an optimal solution though, we have to force all CreateUser usecases to run sequentially. We can't have two running at the same time, or we can't enforce the constraint.

If you go with this solution you have to reporting in place to warn you if you're hitting the limit of sequential requests, otherwise requests will start failing.

Once we get these warnings, we're gonna move to the eventual consistent model you mentioned above, and just accept that there could be duplicates.

This isn't a problem unique to event sourcing either. Standard table driven web apps have the exact same problem when it comes to uniqueness across a set, it's just not as obvious that the issue exists. I think it's because DDD and ES forces you to model your constraints explicitly, while in standard web dev it's not as obvious that you're not actually enforcing your set wide constraints.

Kasey Speakman • Aug 30 '17 • Edited

So I have a line of thought here. Most event sourced systems already have fully-consistent read models in addition to the log itself. Namely, the tables used to track events by aggregate or by event type. They could be reasonably reconstructed from the event log, but they are there as a convenience (an index really) and updated in a fully consistent manner with the event log.

I wonder if it is necessary then that all my read models be eventually consistent. Maybe some of them (especially set-based data) could be fully consistent write models that also get propagated to the read side (if separate databases are used for scaling reads... which is the whole reason for eventual consistency anyway). Then you could validate the set constraint at write time, and even guarantee that duplicates cannot be written (unique index will raise an error if violated while the use case is running) even if the use case checked first, but missed the duplicate due to concurrency. Then there would be no need to make the use cases sequential.

Barry O Sullivan • Aug 30 '17

You can definitely do that, and it will work. A unique index will still force sequential operations, but only on that SQL call, rather than the entire usecase, so the likely hood of failure is much smaller.

The only issue is that you have to handle failing usecases. Say the email was written to the read model at the start of the usecase, but some business operation failed and the event was never actually stored in the log. Now you have a read model with invalid data.

If your event log and constraint projections are stored in the same SQL DB, you can solve this with a transaction around the entire usecase (we do this).

There is still the issue of potential failure. Ie. too many requests hitting the DB, but it's not one you need to deal with at the start. You'll only face this when you have massive numbers of people registering at once. We're currently adding monitoring for this, just to warn us if we're reaching that threshhold. It's unlikely we'll reach it, we're not building Facebook/Uber, but it's reassuring to know it's there.

Kasey Speakman • Aug 30 '17

Yes, I was assuming same database as it is the only convenient way to get full consistency. With separate databases, you pretty much have to use eventual consistency. (Well, distributed transactions exist too, but become problematic under load.)

Kasey Speakman • Jan 10 '20

Update. I have a new tactic on this user email issue for our next product. (Using eventual consistency.) We decided to allow a login account to belong to multiple organizations. That means that different orgs are free to create an user account with an email address that is already used. When the user logs in with that email, they will be able to see a dashboard across organizations.

We use an external auth provider that does not allow duplicate email accounts. We have an event sourced process manager that listens for user creation and email changes. We generate a deterministic UUID from the email address and use it as the stream ID to track/control the external auth integration. E.g. Replay the stream to get current state of the integration for that email. And take appropriate action and save events based on current state and the event received.

To make this more resilient, I also add causation IDs to every event that this process manager saves. And when I replay the stream, I verify that the event I just received is not found among the causation IDs of all replayed events. It didn't cost too much code to do this, and it eliminates the possibility of accidentally retriggering side effects/events that were already done before. Since this is an autonomous component, it seemed important to prevent this.