DEV Community

Cover image for Event Sourcing: What it is and why it's awesome
Barry O Sullivan
Barry O Sullivan

Posted on

Event Sourcing: What it is and why it's awesome

At the last PHPDublin meetup I was asked "What do you do?" and as usual the answer boiled down to "I design and build event sourced applications". Which leads to the following question. "What is Event Sourcing?".

That's where this article came from, it is my best shot at explaining Event Sourcing and all the benefits it brings.

The Status Quo

Before we get into the nitty gritty of event sourcing, let's talk about the status quo of web development.

At it's heart, current web dev is database driven. When we design web apps, we immediately translate the specs into concepts from our storage mechanism. If it's MySQL we design the tables, if it's MongoDB, we design the documents. This forces us to think of everything in terms of current state, ie. "How do I store this thing so I can retrieve (and potentially change) it later?".

Standard Web Development Process

This approach has three fundamental problems.

1. It's not how we think

We as a species do not think or communicate in terms of state. When I meet you for a coffee and ask "What's happening?", you don't tell me the current state of the world and then expect me to figure out what's changed, that would be insane.

"Well, I have a house, a car, a refrigerator, three social media accounts, a cat, a pain in my right foot, the crippling self-doubt that I'm terrible at conversations, another cat ...etc"

See what I mean? That's crazy. In reality you'd tell me the new things that have happened since we last talked, and from that I'm able to figure out how your world looks now. In short, you tell me a story, and a story at it's simplest is a sequence of events.

2. Single data model

In the above we use the same model for both reads and writes. Typically we'd design our tables from a write perspective, then figure out how to build our queries on-top of those structures. This works for well small apps, but becomes problematic in large ones. You see, it's next to impossible to build a generic model that is optimised for both reads and writes. As the system grows, the queries will get more and more complex, eventually hitting a point where every query contains 10 joins and is 100 lines long. This soon becomes unmaintainable, brittle and expensive to change.

3. We lose business critical information

This is a big one. With a standard table driven system, you're only storing the current state of the world, you have no idea how your system got into that state in the first place. If I were to ask you "How many times has a user changed their email address" could you answer it? What about "How many people added an item to their cart, then removed it, then bought that item a month later"? That's a lot of useful business information that you're losing because of how you happen to store your data!

Event Sourcing

Event Sourcing (ES) is opposite of this. Instead of focussing on current state, you focus on the changes that have occurred over time. It is the practice of modelling your system as a sequence of events.

Let's give an example. Say we have the concept of a "Shopping Cart". We can create a cart, add items to it, remove them, and then check it out.

A cart's lifecycle could be modelled as the following sequence of events:

Event
1. Shopping Cart Created
2. Item Added to Cart
3. Item Added to Cart
4. Item Removed from Cart
5. Shopping Cart Checked-Out

And there we go, that's the full lifecycle of an actual cart, modelled as a sequence of events. That is Event Sourcing. Simple huh!

Event Sourcing Development Process

Pretty much any process can be modelled as a sequence of events. Infact, every process IS modelled as a sequence of events. Talk to a domain expert, they won't talk about "tables" and "joins" (unless you've indoctrinated them into tech concepts), they'll describe processes as the series of events that can occur and the rules applied to them.

How do you enforce business rules?

Most business operations have constraints, a hard rule that cannot be broken. In the above, a hard rule would be "An item must be in a cart before it can be removed". You can't remove an item if it was never added to the cart, that sequence of events should never happen. So to enforce this constraint, you need to answer the question, "Does this item exists?", how do you do that without state?

Turns out this is easy, you just need to check that an "Item Added to Cart" event has happened for that item, then you know the item is in the cart and that it can be removed. Business rule enforced.

This is how you answer every question about state in an event sourced system, you replay a subset of the events to get the answer you need. The is usually called projecting the events, and the end result is a "projection".

Isn't this expensive and time consuming?

Not at all. To enforce constraints, you typically only need the tiniest subset of events. Fetching the useful history of a concept, such as a Cart, is typically a single database call. You load the events and replay them in memory, "projecting" them, to build up your dataset. This is lighting fast, as you're doing it on the local processor, rather than making a series of SQL calls (network calls are insanely slow compared to local operations, by at least two orders of magnitude!).

What about showing data to the user?

If every piece of state is derived from events, how do you fetch data that needs to be presented to the user? Do you fetch all the events and build the dataset each time?

The answer is no, you don't, that would be ridiculous.

Instead of building it on the fly, you build it in the background, storing the intermediate results in a database. That way, users can query for data, and they will get it in the exact shape they need, with minimal delay. In effect, you cache the results for later use.

Building Projections from Events

Now, this is where things get really interesting. With ES, you are no longer bound by your current table structure. Need to present data in a new shape? Simply build up a new data structure designed to answer that query. This gives you complete freedom to build and implement your read models in any way you want, discarding old models when they're no longer needed.

The benefits

We've looked at a few of the benefits all ready, but lets go deeper, because believe me, this style of development offers a host of benefits that I can no longer live without.

1. Ephemeral data-structures.

Since all state is derived from events, you are no longer bound by the current "state" of your application. If you need to view data in a new way, you simply create a new projection of the data, projecting it into the shape you need. Say bye-bye to messy migration scripts, you can simply create a new projection and discard the old one. I honestly cannot live without this anymore, this alone makes ES worthwhile.

2. Easier to communicate with domain experts

As I said before, domains experts don't think in terms of state, they express business processes as a series of events. By building an event sourced system, we're modelling the system exactly as they describe it, rather than translating it into tech concepts and losing information. This makes communication a lot smoother, we are now talking in their language and this makes all the difference when writing software.

3. Expressive models

Event Sourcing forces you to model events as first class objects, rather than through implicit state changes (ie. changing a value in a table). This means your models will closely resemble the actual processes you're modelling. This brings a lot of clarity to the table and stops you getting lost in the details of your storage technology. It makes the implicit explicit.

4. Reports become painless

Generating complex business reports becomes a walk in the park in an event sourced system. You have the full history of every event that has ever happened, in chronological order, this means you can ask any question you like about that data historically.

Think of the power of this! Take the earlier example, you want to know how many users removed an item from their cart and then bought it a week later. In a standard web app this would take weeks of development, and once it's released, you have to wait for the data to populate before you could generate the report. With an ES system, it's built in from the get go, so you can generate that report right now. You could also generate the report for any previous point in the time. In other words, you have a time machine.

5. Composing services becomes trivial

In standard web dev, plugging two systems together is usually quite tricky, and it frequently leads to tight coupling. ES solves this problem by letting services communicate via events. Need to trigger a process in another service when something happens? Write an event listener that runs whenever that event is fired. This allows you to add new integrations/functionality, without having to modify existing domain code.

For example, say you want to send a welcome email when someone registers. Instead of modifying core registration code, you would simply create an event listener for the "User has Registered" event, which then triggers an email through some external service. Simple.

6. Lightning fast on standard databases

You don't need to use a fancy database to store your events, a standard MySQL table will do just fine. Databases are optimised for append only operations, which means that storing data is fast, while modifying data is slow. This is why ES works so well with current tech, events are append only.

7. Easy to change database implementations

Due to the ephemeral nature of event sourced data-structures, you now have full freedom to use any database technology you like to store state. This means you get to choose the best tool for the job. And If you find a better tool, and you can switch to it at any point, there is zero commitment. This gives you an incredible amount of freedom. Case in point, we're currently moving some of our more complex projections from MySQL to OrientDB, and so far it's been a breeze.

The issues

Like anything, ES is not a free lunch. Here are some of the surprises you will run into.

1. Eventual Consistency

An ES system is naturally eventually consistent. This means that whenever an event occurs, other systems won't hear about it immediately, there is a short delay (say ~100ms) before they receive and process the event, gaining consistency. This means that you can't guarantee that the data in your projections is immediately up-to-date. This sounds like a big deal, but it really isn't. Modern web dev apps built on ReactJS will build up state from actions as the user performs operations, so having a query side that's lagging a few milliseconds is a non issue.

TBH, this is actually a blessing in disguise. Eventually consistent systems are fault tolerant and can handle service outages. If you're building a distributed app using a micro-service or serverless architecture, it needs to be eventually consistency to be stable. See CAP theorm.

2. Event Upgrading

Events will change shape over time, and this can be a bit tricky to handle if you don't plan for it in advance. When an event changes shape, you have to write an upgrader that takes the old event and converts it into the new one. This is done on the fly when you read the events from the store. Think of them as migrations for your events. Not as difficult as it sounds, just have a strategy in place for upgrading events and you'll be fine.

3. Developers need deprogramming

The status quo in web dev is state driven development, this means developers will immediately jump to thinking in tables, rather than events. I've found it takes time to "deprogram" developers, they need to unlearn all the bad habits they've picked up. This usually manifest as events that are pure CRUD and don't mirror the domain language. The best way to combat this is to pair new developers with experienced ESers.

Conclusion

There we have it. At this stage it should be clear that I love event sourcing. It solves all the big problems our team has faced when building large scale distributed business software. It allows us to talk to the business in their language, and it gives us the freedom to change and adapt the system with ease. Throw in built in business analytics and you've got a winning combination. There is a learning curve, but once you get into Event Sourcing, you'll never want to look back, I know I don't.

If you're interested in hearing more about event sourcing, be sure to subscribe to my blog and follow me on twitter. If you'd like to talk further about Event Sourcing, please contact me at barry@tercet.io.

Oldest comments (73)

Collapse
 
jlhcoder profile image
James Hood

Yes! Great write up! I'm a huge fan of event sourced architectures, especially how well they can be used in serverless architecture. Keep spreading the word!

Collapse
 
kspeakman profile image
Kasey Speakman • Edited

My current project is event sourced. I agree it is awesome for all the reasons you describe. I would also add that you can start small and progressively optimize. Small teams don't have to opt into eventual consistency, process managers, snapshots, etc. from the get-go. These can be evolved later as needs arise.

Additional challenge for ES: set validation -- i.e. unique email. To say an email is unique creates a relationship between that email and every other email in the set (mutual exclusivity). Log storage is very inefficient at this type of verification, since you have to replay the entire log every time to validate it.

The party line on set validation has been to keep the set data with the other eventually-consistent read models. Then in the rare case a duplicate email address slipped in before the eventually consistent data got updated, detect the failure (could not add an email to the set), and compensate (notify an admin).

There is also a school of thought that set-based data should just be modeled as relational data, not event sourced. The main loss there is the audit log. Temporal tables could be used to get an audit log, but they are generally more of a pain to work with. That's the tricky part of being an architect: weighing the trade-offs and picking ones which are best for your product.

Collapse
 
barryosull profile image
Barry O Sullivan

Agreed, the jury seems to be out on set validation. For that we're using immediately consistent read models, ie. as soon as a UserCreated event is fired we update the projection. This is not an optimal solution though, we have to force all CreateUser usecases to run sequentially. We can't have two running at the same time, or we can't enforce the constraint.

If you go with this solution you have to reporting in place to warn you if you're hitting the limit of sequential requests, otherwise requests will start failing.

Once we get these warnings, we're gonna move to the eventual consistent model you mentioned above, and just accept that there could be duplicates.

This isn't a problem unique to event sourcing either. Standard table driven web apps have the exact same problem when it comes to uniqueness across a set, it's just not as obvious that the issue exists. I think it's because DDD and ES forces you to model your constraints explicitly, while in standard web dev it's not as obvious that you're not actually enforcing your set wide constraints.

Collapse
 
kspeakman profile image
Kasey Speakman • Edited

So I have a line of thought here. Most event sourced systems already have fully-consistent read models in addition to the log itself. Namely, the tables used to track events by aggregate or by event type. They could be reasonably reconstructed from the event log, but they are there as a convenience (an index really) and updated in a fully consistent manner with the event log.

I wonder if it is necessary then that all my read models be eventually consistent. Maybe some of them (especially set-based data) could be fully consistent write models that also get propagated to the read side (if separate databases are used for scaling reads... which is the whole reason for eventual consistency anyway). Then you could validate the set constraint at write time, and even guarantee that duplicates cannot be written (unique index will raise an error if violated while the use case is running) even if the use case checked first, but missed the duplicate due to concurrency. Then there would be no need to make the use cases sequential.

Thread Thread
 
barryosull profile image
Barry O Sullivan

You can definitely do that, and it will work. A unique index will still force sequential operations, but only on that SQL call, rather than the entire usecase, so the likely hood of failure is much smaller.

The only issue is that you have to handle failing usecases. Say the email was written to the read model at the start of the usecase, but some business operation failed and the event was never actually stored in the log. Now you have a read model with invalid data.

If your event log and constraint projections are stored in the same SQL DB, you can solve this with a transaction around the entire usecase (we do this).

There is still the issue of potential failure. Ie. too many requests hitting the DB, but it's not one you need to deal with at the start. You'll only face this when you have massive numbers of people registering at once. We're currently adding monitoring for this, just to warn us if we're reaching that threshhold. It's unlikely we'll reach it, we're not building Facebook/Uber, but it's reassuring to know it's there.

Thread Thread
 
kspeakman profile image
Kasey Speakman

Yes, I was assuming same database as it is the only convenient way to get full consistency. With separate databases, you pretty much have to use eventual consistency. (Well, distributed transactions exist too, but become problematic under load.)

Collapse
 
kspeakman profile image
Kasey Speakman

Update. I have a new tactic on this user email issue for our next product. (Using eventual consistency.) We decided to allow a login account to belong to multiple organizations. That means that different orgs are free to create an user account with an email address that is already used. When the user logs in with that email, they will be able to see a dashboard across organizations.

We use an external auth provider that does not allow duplicate email accounts. We have an event sourced process manager that listens for user creation and email changes. We generate a deterministic UUID from the email address and use it as the stream ID to track/control the external auth integration. E.g. Replay the stream to get current state of the integration for that email. And take appropriate action and save events based on current state and the event received.

To make this more resilient, I also add causation IDs to every event that this process manager saves. And when I replay the stream, I verify that the event I just received is not found among the causation IDs of all replayed events. It didn't cost too much code to do this, and it eliminates the possibility of accidentally retriggering side effects/events that were already done before. Since this is an autonomous component, it seemed important to prevent this.

Collapse
 
alexeyzimarev profile image
Alexey Zimarev

"Validation" is not an issue on its own. This is just a subset of eventual consistency issue. Generally speaking, CQRS is suffering from this, not necessarily the event-sourcing.

Collapse
 
kspeakman profile image
Kasey Speakman

I disagree and don't think this is a problem specific to those patterns. I think it is a human nature problem of getting a new data hammer and then every bit of data looks like a nail. It's part of how we learn appropriate uses of the tools, but it's nice when somebody can save you the trouble of learning the hard way.

Thread Thread
 
alexeyzimarev profile image
Alexey Zimarev

This is not what I meant. Essentially, the "validation" issue comes each time you have any bit of eventual consistency, where you have to check the system state in a potentially stale store. CQRS is more often eventually consistent and the question of validation comes as a winner in the DDD-CQRS-ES mailing list and on StackOverflow. However, event-sourced system can have a fully-consistent index on something that needs to be validated and therefore this problem can be removed.

Thread Thread
 
kspeakman profile image
Kasey Speakman

I understand what you are saying, but I didn't want to conflate the issue with the universe of DDD/CQRS things. I just wanted to point out that sets are one of the things that log based storage does not do well. Go ahead and look at relational tables for that. Either as the source of truth for set data or -- and I like how you put it -- as an index from log storage.

Collapse
 
amalrik profile image
Amalrik Maia

Very clear explanation. Thanks for this article.

Collapse
 
tmikaeld profile image
Mikael D

Super interesting writeup, this is the first time i hear of Event Sourcing and it would be really interesting to see a sample project that uses this, like an actual cart as in your example. Do you know of any?

Collapse
 
iss0iss0 profile image
Jan

I found this example at Microsoft docs.microsoft.com/en-us/azure/arc..., but sadly no code. Maybe it is of use for you nevertheless. ;)

Collapse
 
barryosull profile image
Barry O Sullivan • Edited

Hi Mikael,

I don't have any examples to hand, though there are plenty of them online. The thing about ES, is that everyone has a different way of applying it and structuring their code, so there isn't an example I can point to and say "this is how you should do it".

I may write up an example project in future, if I do, I post a link here.

Collapse
 
tmikaeld profile image
Mikael D • Edited

Thanks for the reply!

I checked out geteventstore.com/ and got a good grasp of it, It seems very similar to worker queues that I'm currently working with (Via kr.github.io/beanstalkd/).

Though, no one seems to recommend building a full CRUD app using ES.

Thread Thread
 
barryosull profile image
Barry O Sullivan

Oh yeah, there are times were CRUD is a better fit than ES. If the solution is simple, once off and not the core to your business, building a CRUD app is fine. So CRUD is a workable implementation for a todo list app that's only used by a handful of people internally in the business

If it's anything more complex than that (and most things are), or it's something that is crucial to the success of your business, ES is a better fit. It forces you to understand your domain and it's language, rather than throwing an extra column into a table to hack the a solution in.

To give another example, if you need a blog for your business, for some basic marketing, CRUD is usually fine. If your business is about blogs, understanding how they work and how people use them, then ES is a better fit.

Hope that helps.

Collapse
 
karel1980 profile image
Karel Vervaeke

Here are some for the axon framework (a Java ES framework): axonframework.org/samples/

Thread Thread
 
ben profile image
Ben Halpern

Nice

Collapse
 
stephaneeybert profile image
Stephane Eybert

Hi Barry,

Your writting of the Oreilly Event Sourcing Cookbook is progressing fine ? :-)

Learning how to think event sourcing, what pitfalls to avoid, the smart tricks to keep, etc..

That'd be very valuable.

Event sourcing to the noob sounds like sex to the graduate, it is too much fun not to try it :-)

Stephane

Collapse
 
barryosull profile image
Barry O Sullivan

Actually, I just remembered, a friend of mine, @lyonscf , has written a really solid ES example of a Shopping Cart.

github.com/boundedcontext/bounded-...

It's in PHP and written using a framework we co-authored. Look at "Aggregate.php" in a Aggregate/Cart to see the events being applied, and the invariants (synonym for constraint) being checked/enforced. Hope you find it helpful!

Collapse
 
tmikaeld profile image
Mikael D

This is exactly what I was hoping for and more, thanks! I'll look through it and test it out.

Collapse
 
compilaquindiva profile image
Marco Breveglieri

I find Event Sourcing really interesting, but there are a lot of questions that come into my mind as a newbie in this world. :)

Using this approach requires you (the developer) to model your database using tables to store "events" instead of "entities". Suppose you want to maintain a long list of customer data and retrieve the full list of customers, is it not too slow scanning the entire event table to rebuild the current status of all the available customers?

I know you can do snapshots, but does this approach require you to take snapshots too much often in order to keep everything fast?

Collapse
 
barryosull profile image
Barry O Sullivan

Hey Marco,

Glad to answer. In Event Sourcing, you wouldn't have tables per model, you would have one table that stores all the events for your system, this would is called the event log.

If you're using mysql, you'd typically encode the event itself as JSON and just store. You'd also have columns for storing the aggregate (root entity) and the aggregate id (root entity id), so you can easily fetch the subset you need when validating business operations.

Now, this is not optimal for your proposed usecase "retrieve the full list of customers". This is where projections come in. You would create a read model (projection) that is built from all the "customer" events. Everytime a "customer" event is fired, this read model is listening and updates itself.

Hope that answers your question.

Collapse
 
compilaquindiva profile image
Marco Breveglieri

Thanks for your really clear answers, Barry.

It seems that for any "trouble" you might encounter applying an ES-based model, there is a workaround to balance disadvantages with benefits.

I will deepen the subject to know more, since the greater obstacle (for me) is more a matter of "mind shifting" than technical shortage. :)

Thanks again!

Collapse
 
brightide profile image
Jarred Filmer

Would I be right in assuming that if you want to do a query on anything that isn't the aggregate id you'd have to project the entire set for the type in question first?

Thread Thread
 
barryosull profile image
Barry O Sullivan

That's exactly how you'd do it. There are lots of strategies for this, it depends on your usecase.

Collapse
 
stephaneeybert profile image
Stephane Eybert

So the aggregate is not the data in the projection ? What is this aggregate and its aggregate id then ?

Collapse
 
iamchilas profile image
Chilezie R Unachukwu

I'm just hearing about this for the first time and would love to give it a shot. What would be your recommended source to learn about Event Sourcing?

Collapse
 
barryosull profile image
Barry O Sullivan

Id' start with this talk from Greg Young on the concept, give some more details on it.
youtube.com/watch?v=JHGkaShoyNs

Then I'd follow it up with this, it's a site that goes through all the technical concepts of ES (as well as DDD) and gives a solid breakdown of each concept. I'd start with the FAQ, as a primer, then read the rest.
cqrs.nu/Faq

I still visit this site.

Collapse
 
kspeakman profile image
Kasey Speakman • Edited

From your description, it sounds like your system was event driven (maybe a downstream event processor? fed from a message bus?), but not event-sourced. In an event sourced system, you don't lose events. It would be equivalent to losing a row in a database -- disaster recovery plans kick in.

Integration between systems is more the realm of Event-Driven Architecture. There it is totally possible to miss events or have them delivered out of order, and that is a large part of the challenge with those integrations. Events are a common concept between EDA and ES, but their uses are different.

I currently have an event sourced system which is fully consistent (between event log and read models). Mainly because I did not have time to implement the necessary extra bits to handle eventual consistency. I will add them later as needs arise. Just to say that consistency level is a choice.

Collapse
 
barryosull profile image
Barry O Sullivan

I could not have said that better myself, totally spot on.

Collapse
 
stephaneeybert profile image
Stephane Eybert

I thought fully consistent was "more" or "sooner" consistent than eventually consistent. If you already have fully consistent, why and how to achieve eventually consistent?

Thread Thread
 
kspeakman profile image
Kasey Speakman • Edited

By fully consistent, I mean that the event log and the read models are updated in the same transaction. Either all changes happen or not. There's no possibility of inconsistency between them.

Why go eventually consistent? To scale reads. In most systems data is read orders of magnitude more frequently than it is written. In more traditional databases, it is common to see replication employed to make read-only copies of data to handle high read loads. These are eventually consistent (the linked doc says up to 5 minutes!), although not usually called that by name.

How to go eventually consistent with event sourcing? I already have code in place to translate events into SQL statements (required for full consistency). What's still missing to go eventually consistent: 1) Moving event listeners to no longer be co-located with the write-side API. I.e. Hosting them in a different always-running service. 2) Adding checkpointing so each listener can keep track of the last event it saw, in case of restarts. 3) (Optional) A pub/sub mechanism to be notified when new events come in. Alternatively, just poll.

The pattern I use here for read models can be a template for any kind of event listener. Examples: Sending emails in response to events. Tracking a process and sending commands back into the system as stages are reached (Process Manager).

Then I just spin up as many copies of the read model service (and a corresponding database) as I need to handle my read load, and put them behind a load balancer. The read load on the event store is relatively low since each event is read just once per service. As a bonus because events are immutable, you can employ aggressive caching to avoid hitting the database after the first read of each event.

There exists a product -- Event Store -- which does a lot of this already. However there is no good way to opt into full consistency with relational storage. For our new product/team, full consistency between event log and read models saved some time to market. And I have a path for growing into eventual consistency as the need arises. We may switch to Event Store at some point.

Thread Thread
 
stephaneeybert profile image
Stephane Eybert

Now that's a carefully crafted answer. Rich and accessible content even. I'll keep it in my text memo. Thanks a lot Kasey !

Collapse
 
alexeyzimarev profile image
Alexey Zimarev

I would argue that using conventional tables to store events is a good choice. Yes, it seems like it, and you wrote "append only". But indices are there too. We do have a conventional event log (not for an event-sourced system) with indices to query it and it takes hell a lot of time to query anything from there and we often get timeouts writing to it. MS SQL here, well tunes, on powerful machines. So, it is just a matter of time, until you hit this.

For event-sourced system the requirement for your store is at least to have streams. Yes, you can use tables as streams but this is it. You can probably use views but they are virtual. I mean you need real streams, like EventStore has. You can partition your events using projections, with linked events you get references to original events in new streams. This means you can do advanced indexing without paying costs to have conventional RDBMS index, which is optimised for a different purpose.

Also I would argue that having one table for all events is a good choice. Yes, it might make projections easier, having a stream per aggregate makes much more sense. Recovering aggregate from events is much easier then. Running projections to a read model would require to have a per-aggregate-type projection, which in EventStore is elegantly solved by category projections.

Collapse
 
barryosull profile image
Barry O Sullivan

Hi Alexey,

Thank you for the excellent feedback. You raise some important points.

As you said, MySQL will work well for now (it solves the problem for the near future, 2+years), after that we've been told that MySQL will start to struggle with the log, exactly as you described. Eventstore is a solid option, we're looking into it and other technologies better suited to massive event streams.

As for the one table for all events, we've had no issues with it. Now, this doesn't mean there's one event log for ALL events, just for the events produced by a service. We're currently indexing by aggregate ID and aggregate Type, so we can easily select the subset we want. We may move to a per aggregate event store, but I'm not happy with this, as it makes it harder to change aggregate boundaries. We have metrics in place to monitor performance, so once it starts becoming problematic we'll be warned and can prepare a solution.

For projection replaying, rather than connecting to the log, we plan for the projections to connect to a copy of the log, optimised for projection reads. We're thinking of using Kafka for this. It will keep the event log indefinitely (if we want it to) and it will at least ensure ordering. This will give us more life out of our MySQL log and also speed up projection rebuilding.

Collapse
 
xuemingdeng profile image
xuemingdeng

I'm an editor at InfoQ.com.cn,May I translate your post into Chinese for appropriate credit?

Collapse
 
barryosull profile image
Barry O Sullivan

Yeah, no problem. That would be great. :)

Collapse
 
ajay_kwal profile image
Ajay Khandelwal

Good write-up Barry. We did that sort of things in healthcare.. patient get registered, the patient is admitted, diagnosed, discharged etc events. HL7 in healthcare is built on event model and I love it.

Most enterprise systems built with a concept that of events are generated after data is written to the relational data model. The reason being they have been using these events primarily for system integration ( messaging ).

On other hands, event sourcing builds a data model based on event series and payload structure.

Do you agree?

Collapse
 
barryosull profile image
Barry O Sullivan

I definitely do. The status quo in enterprise is to write to a relational model, then broadcast events based on those changes. In effect, the events are projections of the relational data, which in my mind is putting the cart before the horse. This is a flavour of CQRS, and is considered a stepping stone in migrating to an Event Sourced system, rather than the end result.

Greg Young talks about it during this talk.

I didn't know HL7 was message based, I'll have to read up on it more, thanks for that!