If you’re a member of a small engineering team working on the first release of an application, there will inevitably be a discussion and a choice about the app’s backend architecture.
The architecture you choose will largely determine whether you scale smoothly or struggle. It affects:
- Quality and maintainability
- Ease of adding new features
- Variable vs fixed infrastructure costs
- Ramping up new engineers
- Communication between peers and other stakeholders
- Development cycles
These are a few of the many second- and third-order effects of choosing the first architecture for your app. At Courier, we chose to implement an event-driven architecture backed by AWS for our backend, so that we could scale along many of the vectors listed above.
In this post, I'll explain what event-driven architecture is, why it fits our engineering needs, and the benefits and challenges we’ve experienced in our first year.
Event-driven architecture is a choice to design software around events. An event typically represents a change of state that occurred in the past. One part of the app dispatches an event for each state change, and one or more other parts react to it. Reactions include changing the view of a read store, invalidating a cache, sending a notification, exposing the data to consumers via a webhook, or triggering another business process.
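To make the one-to-many idea concrete, here is a minimal in-memory sketch. It is an illustration rather than production code: the `EventBus` class, the `message.sent` event type, and the three handlers are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """A record of a state change that has already happened."""
    type: str
    payload: dict


class EventBus:
    """Minimal in-memory dispatcher: one event, many independent handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._handlers[event_type].append(handler)

    def dispatch(self, event: Event) -> int:
        """Deliver the event to every subscriber and return how many ran."""
        handlers = self._handlers.get(event.type, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


# Hypothetical stand-ins for a read store, a cache, and a webhook queue.
read_store: dict = {}
cache: dict = {"msg-1": "stale"}
webhook_outbox: list = []


def update_read_store(event: Event) -> None:
    read_store[event.payload["id"]] = event.payload["status"]


def invalidate_cache(event: Event) -> None:
    cache.pop(event.payload["id"], None)


def queue_webhook(event: Event) -> None:
    webhook_outbox.append({"type": event.type, **event.payload})


bus = EventBus()
for reaction in (update_read_store, invalidate_cache, queue_webhook):
    bus.subscribe("message.sent", reaction)

# The dispatcher only knows the event, not what the handlers do with it.
delivered = bus.dispatch(Event("message.sent", {"id": "msg-1", "status": "sent"}))
```

The code that dispatches `message.sent` never references the three handlers, so a fourth reaction can be added with one more `subscribe` call and no change to the dispatching side.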
Event-driven architectures promote a highly decoupled system because, once the system dispatches the event, it doesn’t need to know what happens afterward. Engineers can work independently on the code that reacts to the event. There is coupling, though, at the event itself. If there is a destructive change to the shape of the event, all the systems that react to the event will need to change to process it correctly.
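A small sketch shows where that coupling sits. The field names (`recipient`, `channel`, `locale`, `to`) and the `render_view` handler are assumptions made up for this example, not the shape of any real event.

```python
from typing import Callable


def render_view(event: dict) -> str:
    """Hypothetical handler that depends on the `recipient` and `channel` fields."""
    return f"{event['recipient']} via {event['channel']}"


def can_handle(handler: Callable[[dict], str], event: dict) -> bool:
    """Return True if the handler processes the event without a missing-field error."""
    try:
        handler(event)
        return True
    except KeyError:
        return False


original = {"recipient": "user-1", "channel": "email"}

# Additive change: existing handlers simply ignore the new field.
additive = {**original, "locale": "en-US"}

# Destructive change: renaming `recipient` to `to` breaks every handler that reads it.
destructive = {"to": "user-1", "channel": "email"}

results = {
    "original": can_handle(render_view, original),
    "additive": can_handle(render_view, additive),
    "destructive": can_handle(render_view, destructive),
}
```

Adding a field leaves existing handlers working, while renaming or removing one forces a coordinated change across every handler that reads it.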
Behind the endpoint is a set of processes that takes the notification and determines what recipients will receive in the end. Along each step of the process, we need to raise events that other parts of our app will subscribe to and handle. Some of these handlers include rendering read-only views of our message, hydrating our logs, and moving the notification from a transient state to an end state.
This makes for a complementary relationship between our engineering needs and what event-driven architecture offers as a guideline. Another reason we chose event-driven architecture is that AWS provides up to two event streams off DynamoDB, giving us an easy mechanism to raise events and put them into other AWS services like Kinesis, where we can set up one-to-many observers.
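As a rough sketch of that mechanism, the function below turns a batch of DynamoDB stream records into entries for a Kinesis `put_records` call. It makes several assumptions that are not from our codebase: the table has a string partition key named `id`, the stream is called `notification-events`, and the Kinesis client is passed in (in AWS it would typically be `boto3.client("kinesis")`).

```python
import json
from typing import Any, Dict, List


def to_kinesis_records(stream_event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert DynamoDB stream records into Kinesis put_records entries."""
    records = []
    for record in stream_event.get("Records", []):
        change = record["dynamodb"]
        body = {
            "eventName": record["eventName"],    # INSERT, MODIFY, or REMOVE
            "keys": change["Keys"],
            "newImage": change.get("NewImage"),  # not present for REMOVE
        }
        # Assumption: the table's partition key is a string attribute named "id".
        partition_key = change["Keys"]["id"]["S"]
        records.append({
            "Data": json.dumps(body).encode("utf-8"),
            "PartitionKey": partition_key,
        })
    return records


def forward(stream_event: Dict[str, Any], kinesis_client: Any,
            stream_name: str = "notification-events") -> int:
    """Forward one batch to Kinesis and return the number of records sent.

    A real handler would also split batches larger than the 500-record
    put_records limit and retry entries reported in FailedRecordCount.
    """
    records = to_kinesis_records(stream_event)
    if records:
        kinesis_client.put_records(StreamName=stream_name, Records=records)
    return len(records)
```

Passing the client in as an argument keeps the transformation testable without AWS credentials. Each consumer of the Kinesis stream can then act as an independent observer of the same events.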
By choosing event-driven architecture, our team has seen a number of immediate benefits, but we've also run into a few notable challenges.
Some of the benefits include:
- We have very focused functions that respond to events whether they are HTTP events or ones we’ve raised through streams. We usually open up the function and, in seconds, are able to understand what it’s doing with or without comments.
- It's been easy to add more observers against the events. There are not many collisions, and existing functions stay closed to modification.
- With Datadog (our cloud monitoring service), we can monitor the usage of each handler, see how many times it is invoked, and estimate the cost it generates.
- For new engineers, it’s possible to assign them one or two handlers to start, so they can specialize in part of our tech stack and learn how those handlers perform.
Some of the notable challenges we've faced are:
- The surface area of the backend becomes more of a web with only parts of it known to any particular engineer. We have to continuously document and diagram it in order to have a full picture of the architecture.
- For new engineers, getting that full picture can make onboarding more difficult. Introducing a change to an event requires knowing every handler using it in order to correctly and safely change behavior.
- Specifically with the AWS stack, we've run into constraints when designing our functions, particularly around the number of resources a CloudFormation stack can have. The initial simplicity of the setup has definitely become more complicated over time, and we have to be careful with how we scale within the AWS ecosystem.
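For the second challenge, knowing every handler that uses an event, even a small inventory can help. The sketch below is one hypothetical way to keep that inventory next to the code. The `subscribes_to` decorator, the event names, and the handler names are all invented for illustration, and in a deployment where each handler is a separate function the same list might instead be derived from the deployment configuration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Maps an event type to the names of the handlers that declared interest in it.
_SUBSCRIPTIONS: Dict[str, List[str]] = defaultdict(list)


def subscribes_to(event_type: str) -> Callable:
    """Decorator that records which handler listens to which event type."""
    def register(func: Callable) -> Callable:
        _SUBSCRIPTIONS[event_type].append(func.__name__)
        return func
    return register


def handlers_for(event_type: str) -> List[str]:
    """List every handler that would be affected by a change to this event."""
    return sorted(_SUBSCRIPTIONS.get(event_type, []))


@subscribes_to("message.sent")
def hydrate_logs(event: dict) -> None:
    pass  # placeholder body for the example


@subscribes_to("message.sent")
def render_message_view(event: dict) -> None:
    pass  # placeholder body for the example


@subscribes_to("message.delivered")
def mark_end_state(event: dict) -> None:
    pass  # placeholder body for the example


affected = handlers_for("message.sent")
```

Before changing the shape of `message.sent`, an engineer could call `handlers_for("message.sent")` to see which handlers need review.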
Given our product decision to deliver user notifications based on events, it made sense for us to choose event-driven architecture. We believe this architecture will continue to scale successfully as our product adds more features. Even with some of the early challenges we've faced, the benefits outweigh the drawbacks. As you think about choosing the right architecture for your app, remember to consider what your business needs not only in the immediate term but in the long term too.
If you find our architecture and what we do here interesting, see our open roles! We're always looking for great engineering talent.