DEV Community

Marcus Kohlberg for Encore

Posted on • Originally published at encore.dev

Making a Business Case for an Event-Driven Architecture

In the first post in this series, we discussed what an Event-Driven Architecture (EDA) is and spoke at a high level about why you might consider building one.

In this article, we'll work out whether an EDA is your cup of tea, run through some business cases, and consider the tech choices you'll need to make. Ready to explore? Let's jump right in.

Making a Business Case for an Event-Driven Architecture

Making a compelling business case for EDA requires not only a clear understanding of its value proposition and resource requirements, but also a pragmatic plan for using EDA in your particular business context. The resource requirements include the initial setup cost, ongoing maintenance, and integration with existing systems. You must spell out the benefits, quantify them where possible, and set them against the expected costs.
Here are some benefits you may want to consider.

1. Enhanced Real-Time Decision-Making

EDA makes real-time processing possible by acting on events as soon as they occur. Traditional request-response systems, by comparison, must wait for a client request to initiate any action. In the context of an e-commerce system, a request-response model could lead to overselling if multiple purchase requests for the same item are made simultaneously and the inventory isn't updated until the responses are processed. EDA, with its real-time event handling, reduces this risk by updating the inventory as soon as a purchase event is received. Developers could set up listeners for specific events (like a purchase event), ensuring data accuracy and timeliness that would be difficult to achieve with a request-response system.
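A listener of the kind described above might look like the following minimal Go sketch. Everything in it is an assumption made for illustration: the `PurchaseMade` event, the `Inventory` type, and the in-process channel standing in for a real event router. The point it shows is that stock is reserved the moment the event arrives, so a second purchase of the last item is rejected rather than oversold.

```go
package main

import (
	"fmt"
	"sync"
)

// PurchaseMade is a hypothetical event emitted when a customer buys an item.
type PurchaseMade struct {
	SKU      string
	Quantity int
}

// Inventory tracks stock levels; the mutex serializes concurrent updates.
type Inventory struct {
	mu    sync.Mutex
	stock map[string]int
}

// NewInventory copies the initial stock into a guarded map.
func NewInventory(initial map[string]int) *Inventory {
	stock := make(map[string]int, len(initial))
	for sku, n := range initial {
		stock[sku] = n
	}
	return &Inventory{stock: stock}
}

// HandlePurchase is the listener: it reserves stock as soon as the event
// arrives and rejects the purchase if not enough units remain.
func (inv *Inventory) HandlePurchase(e PurchaseMade) error {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if inv.stock[e.SKU] < e.Quantity {
		return fmt.Errorf("insufficient stock for %s", e.SKU)
	}
	inv.stock[e.SKU] -= e.Quantity
	return nil
}

// Stock returns the current level for a SKU.
func (inv *Inventory) Stock(sku string) int {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	return inv.stock[sku]
}

func main() {
	inv := NewInventory(map[string]int{"mug": 1})
	events := make(chan PurchaseMade) // stand-in for a real event router
	done := make(chan struct{})

	// The listener goroutine reacts to each event as it is published.
	go func() {
		defer close(done)
		for e := range events {
			if err := inv.HandlePurchase(e); err != nil {
				fmt.Println("rejected:", err)
			}
		}
	}()

	// Two buyers want the last mug; only one purchase can succeed.
	events <- PurchaseMade{SKU: "mug", Quantity: 1}
	events <- PurchaseMade{SKU: "mug", Quantity: 1}
	close(events)
	<-done
	fmt.Println("mugs left:", inv.Stock("mug"))
}
```

In a production system the channel would be replaced by a topic or queue on your event router, and the check-and-decrement would usually be a database transaction rather than an in-memory map.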

2. Increased Scalability

EDA's decoupled nature allows different parts of a system to scale independently based on the event load they handle, unlike traditional request-response systems where the entire system must scale to handle increased traffic. For a growing startup, an EDA-based system can efficiently adapt to the changing data flow and load, leading to cost savings on infrastructure. Developers could use a cloud-based serverless architecture where functions tied to specific events scale automatically based on the event load, offering scalability that's challenging to achieve in a request-response model without over-provisioning resources.

3. Improved User Experience

With EDA, you can provide timely updates to users as soon as relevant events occur. In a traditional request-response system, users would need to send a request (like refreshing a page) to get updates. An example can be a travel booking platform where EDA is used to instantly notify users about changes in flight schedules. Instead of users repeatedly checking for updates, EDA can proactively send notifications, resulting in a superior user experience. Developers can accomplish this by setting up appropriate triggers and push notifications tied to those triggers, creating a more responsive and interactive experience than a traditional request-response model can offer.

4. Promoting Innovation

EDA fosters innovation by promoting loose coupling and asynchronous processing. Businesses can develop new features and services without disrupting the entire system. Development teams can use message queues or event buses to asynchronously process events, allowing for the creation of new features with minimal impact on existing functionalities. This flexibility can lead to faster response to market changes, enabling the organization to stay ahead of the competition.

5. Increased System Resilience

EDA's decoupled nature makes systems more resilient to failure. In a monolithic system, a failure in one component can bring down the entire system. With EDA, a failure in one component won't directly impact others. Developers can implement this resilience by using a publish-subscribe pattern, ensuring that failures are contained and do not affect the entire system. This can be measured by improved system uptime and reduced impact of individual component failures.
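To make the failure-containment idea concrete, here is a minimal in-memory publish-subscribe sketch in Go. The `Bus` and `Handler` types and the topic name are hypothetical and written only for this example; a real event router gets its isolation from separate processes and queues rather than from `recover`. The sketch shows one subscriber failing while another still receives the same event.

```go
package main

import "fmt"

// Handler processes one event payload. (Hypothetical type for this sketch.)
type Handler func(payload string)

// Bus is a minimal in-memory publish-subscribe hub.
type Bus struct {
	subscribers map[string][]Handler
}

// NewBus returns an empty bus.
func NewBus() *Bus {
	return &Bus{subscribers: make(map[string][]Handler)}
}

// Subscribe registers a handler for a topic.
func (b *Bus) Subscribe(topic string, h Handler) {
	b.subscribers[topic] = append(b.subscribers[topic], h)
}

// Publish delivers the payload to every subscriber of the topic and
// returns how many handlers failed.
func (b *Bus) Publish(topic, payload string) (failed int) {
	for _, h := range b.subscribers[topic] {
		if !safeCall(h, payload) {
			failed++
		}
	}
	return failed
}

// safeCall runs one handler and turns a panic into a false result,
// so one broken subscriber cannot stop delivery to the others.
func safeCall(h Handler, payload string) (ok bool) {
	defer func() {
		if r := recover(); r != nil {
			ok = false
		}
	}()
	h(payload)
	return true
}

func main() {
	bus := NewBus()
	bus.Subscribe("OrderPlaced", func(string) { panic("billing service is down") })
	bus.Subscribe("OrderPlaced", func(p string) { fmt.Println("email sent for", p) })

	failed := bus.Publish("OrderPlaced", "order-42")
	fmt.Println("failed handlers:", failed)
}
```

The same shape also illustrates the loose coupling discussed earlier: adding a feature means adding one more `Subscribe` call, without touching the publisher or the existing handlers.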

Understanding the intricacies of EDA is crucial, particularly when considering the business-wide impact of this architectural shift. Yes, EDA improves real-time decision-making, scalability, user experience, and resilience, and it promotes innovation, but it also introduces additional complexity into your systems. Asynchronous processing and decoupling, both inherent in EDA, can complicate debugging and tracing transactions. Additionally, depending on the nature and volume of your events, there might be considerable overhead in message passing between services.

Once these aspects are weighed, the task is to quantify the potential benefits. This includes calculating the expected Return on Investment (ROI), the reduction in time-to-market for new features, and improvements in operational efficiency. Performance monitoring tools can be instrumental in quantifying these efficiency gains and constructing a data-backed case for adopting an EDA approach. Yet, these potential gains must be considered in tandem with the resources required to manage increased system complexity and the need for robust monitoring and debugging tools.

How can we quantify the benefits for our system (if there are any)?

Firstly, consider implementing a pilot project in a non-critical business area to assess EDA's impact. This provides concrete data and reduces the risk of over- or underestimating the benefits. From a developer's perspective, this could involve applying EDA principles to a single service or feature and measuring the impact.

Secondly, address the costs associated with the EDA implementation, including initial setup, ongoing maintenance, and training for the team. Here are some things you may want to consider:

Expense Categories and Potential Costs

  • System Design and Implementation: With EDA, your system design might require more planning and expertise due to the need for asynchronous processing, event handling, and distributed systems knowledge. This could increase implementation costs.
  • Additional Infrastructure: EDA systems often rely on additional infrastructure. While both self-managed (open-source or commercial) and managed solutions add operational costs, self-managed solutions come with the additional overhead of maintenance, and managed solutions may have higher upfront costs as well as vendor lock-in.
  • Maintenance and Operation: The distributed nature of EDA might necessitate more robust monitoring and alerting systems to catch issues early and reduce downtime. Debugging can also be more challenging with asynchronous processes and could increase maintenance costs.
  • Training: Your team might need training to understand and effectively use EDA concepts. While some might be able to learn on the job, others may need formal training, which can add to costs.
  • Hiring: Given the relative complexity of EDA, you may need to hire more experienced engineers, who typically command higher salaries. Plus, the pool of engineers with in-depth EDA experience is smaller, potentially inflating the market rate for this skill set.

It's important to understand that costs associated with system design, infrastructure, training, and hiring will be influenced by your unique business situation. As a strategic leader, you therefore need to assess these variables against the anticipated benefits. Remember, transitioning to EDA should be seen as a strategic investment aimed at long-term gains, rather than an immediate cost-saving measure.

It's equally crucial to address the potential challenges that come with the EDA approach. For instance, the system complexity can increase significantly, transaction tracing might become more challenging, and there could be added overhead costs due to increased message communication. These elements add to the total cost of ownership and must be planned for with effective strategies and resources.

Building a compelling business case for EDA requires a balanced evaluation. This entails a detailed examination of its benefits, potential challenges, and the associated costs, supported by solid metrics, a comprehensive cost structure, and a feasible implementation plan. Armed with this in-depth perspective, strategic leaders can guide their teams towards informed decisions regarding EDA adoption, helping developers incorporate EDA into their projects with a full understanding of its advantages, complexities, and potential challenges.

In a hypothetical business case, let's consider a mid-sized e-commerce company adopting EDA. The initial investment is projected at $1.2 million, encompassing infrastructure upgrades, implementation, training programs, and the increased cost of managing system complexity. As a CTO, you hope that implementing an EDA can lead to a 15% reduction in time-to-market and a 20% increase in operational efficiency, amounting to a potential annual saving of $3 million. You also hope that this transformation will enhance customer experience, since real-time updates will now be something you can offer, leading to a 10% increase in customer satisfaction that could boost annual revenue by an additional $2 million. Keep in mind that system maintenance costs might rise by 10%, approximately $200,000 annually. That leaves a recurring net benefit of about $4.8 million per year, or about $3.6 million in the first year once the $1.2 million initial investment is subtracted, which positions EDA as a valuable strategic investment. Also, the potential for innovation that EDA introduces can lead to novel business opportunities, the monetary value of which can be significant, though challenging to quantify upfront.
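The arithmetic behind this hypothetical case is easy to check with a short Go sketch. The `BusinessCase` type and its method names are made up for this example; the dollar figures are the ones quoted in the paragraph above. Note that the $3.6 million figure is what remains in the first year after the one-off investment is subtracted, while the recurring yearly figure is higher.

```go
package main

import "fmt"

// BusinessCase holds the hypothetical figures from the example, in whole dollars.
type BusinessCase struct {
	InitialInvestment      int // one-off cost
	AnnualSavings          int // efficiency and time-to-market gains
	AnnualExtraRevenue     int // revenue from improved customer satisfaction
	AnnualExtraMaintenance int // added yearly maintenance cost
}

// RecurringNet is the yearly benefit once the system is in place.
func (c BusinessCase) RecurringNet() int {
	return c.AnnualSavings + c.AnnualExtraRevenue - c.AnnualExtraMaintenance
}

// FirstYearNet subtracts the one-off investment from the first year's benefit.
func (c BusinessCase) FirstYearNet() int {
	return c.RecurringNet() - c.InitialInvestment
}

// PaybackMonths is how many months of recurring benefit cover the investment,
// rounded down to whole months.
func (c BusinessCase) PaybackMonths() int {
	return c.InitialInvestment * 12 / c.RecurringNet()
}

func main() {
	c := BusinessCase{
		InitialInvestment:      1_200_000,
		AnnualSavings:          3_000_000,
		AnnualExtraRevenue:     2_000_000,
		AnnualExtraMaintenance: 200_000,
	}
	fmt.Println("recurring net per year:", c.RecurringNet()) // 4800000
	fmt.Println("first-year net:", c.FirstYearNet())         // 3600000
	fmt.Println("payback in months:", c.PaybackMonths())     // 3
}
```

Swapping in your own estimates, ideally measured during a pilot project, turns this into a quick sanity check for your own business case.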

Choosing an Event Router

Choosing the right event router for your EDA is a critical task that requires careful assessment of a variety of technical and financial considerations. The event router acts as the circulatory system of an EDA, enabling the flow of event notifications between different components. Thus, key factors to take into account include its performance, reliability, scalability, cost-effectiveness, compatibility with your existing systems, and the trade-offs among these elements.

The performance of an event router refers to its capacity to swiftly and proficiently transport events across your system. This is an essential criterion for applications that demand real-time or near real-time event processing.

Reliability is another crucial consideration. Your event router should ensure that all events are consistently delivered without loss, even in the event of system failures or network disruptions.


Scalability relates to the event router's capacity to accommodate an increase in event traffic as your system evolves and grows. Efficient scalability is vital for businesses anticipating significant growth or those dealing with substantial volumes of data.

On the financial side, costs are a primary concern. This includes both the initial investment and ongoing operational costs such as maintenance and scaling. It is important to consider a range of costs. For instance, the initial setup cost for an event router like Apache Kafka might be around $10,000 to $15,000, depending on the size and requirements of your system. Cloud solutions, such as AWS SNS/SQS, Google Pub/Sub, or Azure Event Hubs, can start from as low as $0 per month, scaling up with usage. Keep in mind that these are approximate figures and actual costs will vary depending on your specific needs and region.

Diving deeper into the 'actual costs' that will be influenced by your specific needs and region, it's crucial to understand the factors that contribute to this variability. The scope of your operations, for example, impacts the overall cost -- larger systems require more extensive resources, often driving the price up. Similarly, the geographical location of your operations can cause fluctuation in cost due to variations in local labor, energy, and other overheads. The complexity of your system also plays a substantial role, with more complex systems necessitating advanced features or increased capacity, thereby increasing expenditure. Beyond these, your unique needs may include certain security measures, regulatory compliance, or a specific degree of customer service. Each of these factors can alter the costs you incur, making it essential to carefully assess them in your budgeting process.

The ease of use is crucial, including the intuitiveness of the user interface and the complexity of setup and maintenance processes. A user-friendly system can significantly decrease the time and effort required to manage your events.

Community support can also be a decision-making factor, as a large and active user community often leads to more readily available assistance, ample learning resources, and quicker identification and resolution of common issues.

Lastly, compatibility involves ensuring that the selected event router integrates smoothly with your existing systems and technology stack. This includes considering the programming languages and frameworks your team is already using.

For instance, a burgeoning fintech company that processes thousands of transactions per minute would need an event router with high performance and reliability to handle this volume and ensure no transaction data is lost. Given their expansion plans, scalability is a top priority. Moreover, if their tech stack is based on Go and .NET, compatibility with these languages is crucial. The cost and ease of use should also be taken into account, especially if they have a restricted budget or a small team. Community support for each option can be a valuable resource in their EDA journey.

While Apache Kafka offers high performance and scalability, it may demand more setup and maintenance effort compared to managed solutions like AWS SNS/SQS, Google Pub/Sub, or Azure Event Hubs, which bill based on usage and offload the maintenance responsibility to the provider. NATS or RabbitMQ might be more cost-effective and user-friendly, but they may offer lower scalability or reliability compared to Kafka or the cloud solutions.

In the end, the choice of an event router in EDA is about thoughtfully evaluating your specific requirements and comparing them to the performance, reliability, scalability, cost-effectiveness, and compatibility of the various options. By doing so, you can make an informed decision that aligns with your business's EDA objectives and financial constraints.

Run It Yourself vs. Pick a Hosted Instance

Choosing between running your own EDA infrastructure and opting for a hosted instance is a bit like deciding between cooking at home and dining out. When you cook at home, you've got total control over the ingredients, cooking methods, and the entire culinary process - but it does require more time, effort, and sometimes cost. On the other hand, dining out may save time and initial effort, but it doesn't afford the same level of control and can lead to higher costs in the long run.

Running your own EDA infrastructure

Opting for self-hosting your EDA infrastructure is akin to preparing a home-cooked meal. In the same way you control the ingredients and cooking process at home, you can dictate the elements and workflow of your EDA. This allows you to adjust it to your needs, ensuring a tailored fit. But remember, just as home-cooking involves planning, equipment, and cleanup, so does managing your own EDA, requiring significant responsibility and commitment.

Benefits:

  • Full Control: With self-hosting, you maintain complete control over how everything operates, allowing for greater flexibility and adherence to your company's specific workflows.
  • Customization: You have the freedom to adapt the system to your unique business requirements, making it easier to tailor the infrastructure to cater to your needs and those of your customers.
  • Data Privacy: If you're handling sensitive data, running your own infrastructure can provide enhanced security and peace of mind, as you won't be reliant on third-party providers.

Drawbacks:

  • Maintenance: With control comes responsibility. If something breaks or requires updating, it's up to you to fix it, potentially diverting valuable time and resources from other tasks.
  • Cost: Self-hosting brings with it a variety of costs, from the initial setup to the ongoing maintenance, staffing, and update needs. These costs can be considerable and should be factored into your decision.
  • Scalability: While it's possible to extend your infrastructure as your business grows, it requires careful planning, investment in additional resources, and careful execution to ensure seamless scaling.

Imagine a bank with rigorous security requirements and a capable IT team. They might choose to host their own EDA, given the high stakes surrounding data security and the need for specific customizations.

Choosing a Hosted Instance

On the flip side, choosing a cloud-based EDA is akin to dining out. You don't have to worry about shopping for ingredients, cooking, or cleaning up, but can simply enjoy the meal, leaving the rest to the restaurant's staff.

Benefits:

  • Ease of Use: While cloud solutions often come with user-friendly interfaces, the setup process may involve several steps, including establishing various services, infrastructure, and security measures. That said, these platforms tend to streamline some of the more complex procedures associated with self-hosted solutions.
  • Maintenance and Support: Although the service provider generally handles most of the technical maintenance for the underlying infrastructure, you or your team may still need to oversee system management and conduct troubleshooting when necessary.
  • Scalability: Cloud solutions are usually designed to scale with your business growth, but this flexibility often requires you to plan carefully and potentially carry out additional setup tasks to ensure smooth scalability.

Drawbacks:

  • Less Control: With cloud solutions, you may need to adapt to the predefined settings and operational rules set by the service provider, which can limit your flexibility to some extent.
  • Potential for Higher Long-Term Costs: While the initial cost of cloud solutions may seem lower, expenses can accumulate over time. This accumulation is especially noticeable as your usage increases, leading to potentially higher costs in the long term.
  • Data Privacy Concerns: When you rely on a third-party host, you may encounter data privacy issues, as your data is stored on external servers. It's essential to ensure that your service provider has robust security measures in place to safeguard your data.

A startup looking to test a new application quickly might opt for a cloud-based solution. It enables them to hit the ground running without getting bogged down in infrastructure setup and maintenance.

In a nutshell, the choice between running your own EDA and picking a hosted instance depends on your unique needs and resources. It's about finding the right balance between control and convenience, cost and maintenance, privacy and ease of use. And remember, there's no one-size-fits-all answer -- what's important is making the choice that best suits your situation and goals.

Organizing Events: First Steps

Now that you have chosen an event router, you are well on your way to setting up an effective EDA. The next vital step is to focus on organizing events around business processes. This is a very pragmatic approach, commonly followed in the industry, and ensures that the events or messages in your system closely mirror real-world operations within your business. Such an approach significantly enhances the intuitiveness and understandability of your event-driven architecture.

In this context, a key practice is to use past tense verbs for event names. This convention communicates that an event represents something that has already happened in your system. For instance, in an e-commerce application, you might see events like 'OrderPlaced', 'ItemShipped', or 'PaymentProcessed'. Each of these events signifies a significant milestone in a business transaction. The use of past tense verbs makes it clear that each event signifies a completed action.
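As a small illustration of the naming convention, the event names from the paragraph above could be modeled as Go types. The `Event` interface and all field names are assumptions made for this sketch, not part of any particular library.

```go
package main

import "fmt"

// Event is a hypothetical interface implemented by every business event.
type Event interface {
	EventName() string
}

// Each type is named with a past-tense verb: it records something that
// has already happened, not a command asking for it to happen.
type OrderPlaced struct {
	OrderID    string
	TotalCents int
}

type PaymentProcessed struct {
	OrderID string
}

type ItemShipped struct {
	OrderID        string
	TrackingNumber string
}

func (OrderPlaced) EventName() string      { return "OrderPlaced" }
func (PaymentProcessed) EventName() string { return "PaymentProcessed" }
func (ItemShipped) EventName() string      { return "ItemShipped" }

func main() {
	// The slice reads like the business process itself: place, pay, ship.
	timeline := []Event{
		OrderPlaced{OrderID: "o-1", TotalCents: 2599},
		PaymentProcessed{OrderID: "o-1"},
		ItemShipped{OrderID: "o-1", TrackingNumber: "TRK123"},
	}
	for i, e := range timeline {
		fmt.Printf("%d. %s\n", i+1, e.EventName())
	}
}
```

Contrast these names with imperative ones such as `PlaceOrder` or `ShipItem`, which read as commands and blur the line between requesting an action and recording that it happened.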

Now, as your system grows and evolves, so will your events. You might find yourself needing to add new fields to an event, remove old ones, or even change the type of a field. This is where versioning comes into play. By including a version number in your events, you can make changes to the event structure in a backward-compatible way. For instance, if a newer version of your system introduces a 'DiscountApplied' event with additional data, older parts of your system can continue to operate with earlier versions of this event, while newer parts can use the more data-rich version.
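One way to carry a version number is an envelope around each event, sketched below in Go using JSON from the standard library. The envelope layout, the field names, and the two `DiscountApplied` shapes are all hypothetical choices made for this example; your schema language may offer its own versioning mechanism.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope wraps every event with its name and schema version.
type Envelope struct {
	Name    string          `json:"name"`
	Version int             `json:"version"`
	Payload json.RawMessage `json:"payload"`
}

// DiscountAppliedV1 is the original shape of the event.
type DiscountAppliedV1 struct {
	OrderID string `json:"order_id"`
	Percent int    `json:"percent"`
}

// DiscountAppliedV2 adds a coupon code while keeping the v1 fields.
type DiscountAppliedV2 struct {
	OrderID    string `json:"order_id"`
	Percent    int    `json:"percent"`
	CouponCode string `json:"coupon_code"`
}

// Describe decodes an envelope and picks the payload type by version.
func Describe(raw []byte) (string, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	if env.Name != "DiscountApplied" {
		return "", fmt.Errorf("unexpected event %q", env.Name)
	}
	switch env.Version {
	case 1:
		var p DiscountAppliedV1
		if err := json.Unmarshal(env.Payload, &p); err != nil {
			return "", err
		}
		return fmt.Sprintf("order %s: %d%% off", p.OrderID, p.Percent), nil
	case 2:
		var p DiscountAppliedV2
		if err := json.Unmarshal(env.Payload, &p); err != nil {
			return "", err
		}
		return fmt.Sprintf("order %s: %d%% off with coupon %s", p.OrderID, p.Percent, p.CouponCode), nil
	default:
		return "", fmt.Errorf("unsupported version %d", env.Version)
	}
}

func main() {
	v1 := []byte(`{"name":"DiscountApplied","version":1,"payload":{"order_id":"o-1","percent":10}}`)
	v2 := []byte(`{"name":"DiscountApplied","version":2,"payload":{"order_id":"o-2","percent":15,"coupon_code":"SPRING"}}`)
	for _, raw := range [][]byte{v1, v2} {
		summary, err := Describe(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(summary)
	}
}
```

A consumer that only understands version 1 would keep its `case 1` branch and reject or skip newer versions explicitly, which is usually safer than guessing at an unknown shape.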

Deprecating fields or entire messages is another crucial aspect of managing your EDA's evolution. For example, your 'OrderPlaced' event might initially include a 'shippingMethod' field. However, as your business evolves, you might decide to separate shipping information into a dedicated 'ShippingInfo' event. At this point, you could deprecate the 'shippingMethod' field in the 'OrderPlaced' event.

This makes the choice of a schema language that supports field deprecation incredibly important. Protobuf, for example, lets you mark fields as deprecated and later remove them from your codebase (reserving their field numbers so they are not reused) while maintaining backward compatibility. If older events that still carry the removed fields are encountered, parsers skip those fields as unknown rather than failing, which prevents errors or crashes. We discuss this more in a future post.
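The "safely ignored" behavior can be illustrated without Protobuf. The sketch below swaps in JSON and Go's standard `encoding/json` package, purely to stay self-contained, and the `OrderPlaced` shape and field names are assumptions. It shows a consumer whose struct no longer has `shippingMethod` reading an old event that still includes it, and how a strict decoder would reject the same event.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// OrderPlaced is the current shape of the event; the deprecated
// shippingMethod field has been removed from the struct.
type OrderPlaced struct {
	OrderID string `json:"orderId"`
}

// DecodeTolerant uses json.Unmarshal, which skips fields the struct
// does not declare, so old events still decode.
func DecodeTolerant(raw []byte) (OrderPlaced, error) {
	var e OrderPlaced
	err := json.Unmarshal(raw, &e)
	return e, err
}

// DecodeStrict rejects any field the struct does not declare.
func DecodeStrict(raw []byte) (OrderPlaced, error) {
	var e OrderPlaced
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	err := dec.Decode(&e)
	return e, err
}

func main() {
	// An old event, produced before shippingMethod was removed.
	old := []byte(`{"orderId":"o-1","shippingMethod":"express"}`)

	e, err := DecodeTolerant(old)
	fmt.Println("tolerant:", e.OrderID, err)

	_, err = DecodeStrict(old)
	fmt.Println("strict rejected old event:", err != nil)
}
```

The tolerant reader is what keeps old and new producers compatible during a migration; the strict variant is included only to show what breaks when unknown fields are not ignored.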

Organizing your events around business processes and adhering to conventions like using past tense verbs can greatly enhance the maintainability and comprehensibility of your EDA. Furthermore, strategies for versioning and deprecation of events allow your system to adapt to changes in a controlled and backward-compatible manner. The choice of a schema language that supports these operations is, therefore, a critical decision that can have long-term effects on the robustness and flexibility of your EDA.

Wrapping Up

We hope this post has given you the information you need to understand the challenges that come with EDA adoption and to judge whether it is the right approach for your organization. We also hope you now have what you need to create an actionable plan.

Next up is part three in this series about event-driven systems. It's called Building for Failure, and it uncovers hidden dangers when designing Event-Driven Systems.
You can check it out here.

About The Author

Having devoted more than two decades to a prosperous career in software development, Michael Stack's expertise runs deep. His particular focus on Go, since 2015, has further solidified his position as a seasoned professional in the field.

His recent endeavor is the exciting intersection of Go with AI and ML projects. This shift signifies his commitment to staying at the forefront of technology trends and his passion for continually expanding his skillset.

While Michael maintains a reserved presence on social media, he remains accessible for professional contact through his LinkedIn profile.

Michael is also the author of "Event-Driven Architecture in Golang". His book offers an insightful journey into event-driven programming through the lens of Go and is available for purchase on Amazon.
