🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Capability-first architecture: Bridging business intent to technology (ARC202)
In this video, AWS Solutions Architects Mukund Rao and Sushanth Mangalore present a capability-driven framework for navigating technology choices amid constant disruption. They introduce the "three Cs" mental model: Capabilities (logical workload portions with business intent), Characteristics (non-negotiable NFRs defining success), and Constraints (inherent business limitations). The session emphasizes treating architectures as living systems requiring incremental evolution rather than wholesale rewrites, anchoring decisions around capabilities instead of chasing technology trends. They demonstrate how to break workloads into capabilities using techniques like domain-driven design and business ownership, extract architectural characteristics through dialogue-based discovery with stakeholders, and match AWS services to these characteristics. The framework includes using Architectural Decision Records (ADRs) for lightweight documentation and creating isolation boundaries to protect capability characteristics. Practical examples include e-commerce product catalogs and AI shopping assistants, showing how the three Cs guide technology selection while avoiding choice paralysis and the "shiny-new-toy syndrome."
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Welcome to re:Invent: A Meta Perspective on Technology Architecture
Good morning everyone. I hope you're having a great start to re:Invent. Welcome, and thank you for choosing to spend the next hour with us. We understand there are many interesting sessions happening throughout the conference, so before we dive into today's topic, I want to quickly set the agenda for what this session is and is not about.
This session is not about any specific product, service, technology, or architecture pattern. You'll be attending many deep dives on those topics throughout the week. Instead, as you begin your week at re:Invent, navigating through all the product and service announcements, exploring solutions and innovations from our partners at the expo, and engaging with all things generative AI and agentic AI, we recognize it can be overwhelming. With that in mind, we designed this session to help us as a cloud community of technology practitioners and decision makers collectively take a step back and gain a meta perspective about technology choices and technology systems architecture.
Secondly, we want to leave you with a simple mental model and a practical framework that you can use to navigate technology, especially in this age of constant technology disruption. My name is Mukund Rao, and I'm a Worldwide Business Consulting Solutions Architect at Amazon Web Services. Over the last five years, I've had the opportunity to work with hundreds of customers across different sizes, scales, and industries, as well as with some of our strategic AWS partners to help customers undergo business transformation through technology modernization. Including my time here at AWS, I've spent about 16 years in the AWS ecosystem, most of it as a hands-on engineering architect and as an AWS customer building and scaling systems on AWS. It's exciting to be here sharing my perspectives, and I have Sushanth with me today.
The way we're going to structure today's session is as follows. I'll start with the why, setting the context on why we need a capability framework. Then Sushanth will dive into what the capability framework is. After that, we'll apply this framework and trace through a couple of practical examples and use cases. We'll discuss some frequently asked questions, and then we'll wrap up. That's our structure for today. Let's begin by rewinding our clocks back about two decades to 2005.
From Simplicity to Abundance: The Evolution of Technology Choices
Back then, as technologists and builders, our lives were much simpler because we had fewer choices to make. We had a handful of programming languages. A server simply meant an application server or a bare metal machine. Relational databases satisfied most use cases. With architectural patterns, we lived in the era of simple three-tier architecture with your presentation layer, your application layer for business logic, and your data layer. If you needed to scale these systems, you simply threw more resources at them, whether CPU or memory, and vertically scaled. Of course, that wasn't the most efficient way to scale and had its own challenges, but the point is we lived with technology constraints, and these constraints actually forced clarity. We were more deliberate and intentional about our technology decisions, and our systems were easier to reason about.
Fast forward to today, we live in a world with a dizzying array of choices, much like most things in life. Take something as simple as water. If you walk into any nearby supermarket in the hydration aisle, you're surrounded by so many options: water, sparkling water, soda, energy drinks, sugar-free, sugarless, and more. The same applies to technology. We live in an era of technology abundance. AWS alone has more than 240 fully featured services today, and this list is likely to grow with all the announcements happening through this week.
If you take something as simple as compute, beyond 900 permutations and combinations of launching EC2 instances of various types and sizes, you also have container orchestration platforms and serverless options. Even data, which was once simply a relational database, now offers a plethora of options. We live in the era of polyglot persistence and purpose-built data engines. We have document storage, time series databases, and increasingly, we're seeing growing use of vector databases and knowledge databases supporting new classes of applications. It's not just the services themselves. When you look at our marketplace, the options continue to expand.
Our marketplace includes about 25,000 products and services published by more than 6,000 independent software vendors. This represents a massive amount of choice in services and solutions. The architectural landscape itself is expanding significantly. What was once the simplicity of a three-tier architecture has evolved into cloud-native architectures with highly distributed microservices, application mesh, event-driven architectures, real-time streaming architectures, and now-emerging agentic application patterns. Even on the data side, what was once a traditional data warehouse has transformed into modern data architectures including data lakes, lake houses, medallion lake houses, and zero ETL architectures.
This expansion extends beyond applications and data. You can look at security and other domains where there are numerous patterns like Zero Trust security. There are so many architectural patterns available today. While having this choice of services, solutions, and architectural patterns has unlocked tremendous potential for businesses, it has also multiplied complexity and created a new kind of challenge we call choice paralysis. Businesses are stuck in a maze of technology, lost in internet searches, stuck in documentation, and taking architectural guidance from opinion-based internet forums.
As a result, businesses are evaluating multiple vendors, resulting in long evaluation cycles and proofs of concept that never materialize. There is a delay in value realization from technology that we are noticing. Add to this what we call the shiny-new-toy syndrome where businesses want to adopt every new technology pattern. This has created consequences where they end up creating solutions for which they have to force fit problems. Secondly, they end up creating sprawling and inefficient architectures that consume more resources and contribute to waste, making it also a sustainability issue.
The Urgent Need for Evolvable Architectures in the AI Era
It is worthwhile to take a look at the relationship between technology and business because the objective of technology is to support a business outcome. Over the past few decades, this relationship has dramatically changed. In today's day and age, having a technology-led differentiation is no longer sufficient. This was normal probably a decade ago, but today it is not sufficient. Technology is becoming front and center and at the core of every digital-first business. This is a trend we continue to notice and it has accelerated since the pandemic.
As a result, businesses are having to do more with less. Add to this the macroeconomic conditions and the dynamic world we live in today with the increasing AI workforce. Companies are having to innovate faster and compete harder. Given this scenario, it is becoming extremely important to translate business intent into technology choices much faster than ever before. When we talk about business and technology aligning with one another and moving in lockstep, for us as AWS architects, it sounds very cliche. It sounds like one of the oldest lines in the digital transformation playbook that gets repeated over and over again.
But the reality is this. From our vantage point as AWS architects, we notice that businesses are surrounded and paralyzed by technology choices and constantly weighed down by increasing complexities of their architectures that are becoming distributed and dynamic. Add to this the new building blocks being redefined by generative AI, which is contributing to more choice and amplifying the problem further. Given this context, there is a need for evolvable architectures.
This concept is not new—it has been present in systems design since the seventies. However, it is especially more important, relevant, and urgent today because of the barrage of AI products, services, and tools that continue to confuse many businesses. When we talk about evolutionary architectures, it also means we need to rethink how we look at our architectures. It is no longer sufficient to view your architecture as simply integration and a bunch of tools and services working together.
Rather, we have to start looking at them as living systems, and we need to have a plan to continuously, cost-effectively, and efficiently evolve them. This is the biggest mindset shift we need to make to rethink how we look at architectures. To do that, over the next few minutes, we are going to take lessons from a couple of sources. One is the lessons from our own customer journeys. We notice all sorts of customer journeys on AWS—from migration to modernization to becoming fully cloud native—and we observe which tactics work and which do not. We get visibility into that.
Secondly, we are going to draw some lessons from evolutionary biology because nature has been conducting experiments in evolution for millions of years, long before any of us humans existed or any agentic AI technology existed. With that, I want to highlight this timeless quote from Charles Darwin: "It is not the strongest or the most intelligent of the species that survive, but it is the most responsive to change." This is a very profound and timeless quote, and it applies to technology and technology architectures as well. This brings me to the first lesson and takeaway from today: architectural adaptability.
Three Lessons from Evolution: Adaptability, Technical Debt, and the Locus of Change
When we talk about architectural adaptability and look at nature, we can see that evolution does not happen overnight. It takes millions of years to evolve, and the same applies to technology. Architectural evolution is not an overnight journey; it happens in phases. I am not implying that it should take millions of years, as that would defeat the whole purpose. Rather, I am implying that there needs to be an adaptable set of changes. When we talk about architectural adaptability, there is one factor that is very revealing and tells us how well customers are adapting: the success rate of their modernization initiatives.
What we have noticed is that for customers who are more successful on AWS, they treat modernization and evolution as a continuous process. They have a plan for it and end up making incremental changes. Incremental changes is the key phrase here. The contrary is also true. We notice that the majority of organizations actually fail at modernization efforts because they either try to boil the ocean or try to do everything at once. They try to rewrite systems without an actual plan to organically evolve their architecture. This is one of the biggest things I want you to remember: adaptability is the key.
The second lesson is around what I call the cost of evolution, and that cost is usually paid in terms of technical debt. We know what technical debt is, and it can come from many layers—from the application layer, from the data layer, from poor documentation, or from fragile integrations of different services. But whatever it is, technical debt, like financial debt, is not all bad. As long as you have a plan to intelligently incur it and pay it off, you are in a good position. You do not want to acquire a big debt that will sink you, just like financial debt.
It is also important to be able to measure technical debt because if you do not measure it, you end up acquiring unknown costs. This visual here explains this vicious cycle really well. If you do not have a plan to evolve and you are not thinking about proactive adaptability and making incremental changes, you end up acquiring technical debt today or tomorrow. If you are treating your architectures as static systems, that will result in more complexity, and that will feed into the cycle and continue on and on. We have noticed that many organizations do not treat technical debt as seriously as they should, and over time, because of how unmaintainable their systems become, it starts showing on business outcomes as well.
So it's very important to address that technical debt. Now, the third and final lesson is around what I call the locus of evolution. So far we have established why we need evolvable architectures and how we go about evolving our architectures by making small, adaptable, and incremental changes and addressing the cost. Naturally, the question comes: how do we evolve the architecture and what do we evolve our architectures around?
When that question comes, the natural inclination is to go with technology, and that is unfortunately the reality we see with many organizations. But as we just discussed, if you anchor your architectural evolution with technology, you're going to be chasing a moving target because whatever is latest and greatest today will be obsolete in a few years from now. Instead, the right way to do evolution is to anchor what we call capabilities as the center of your architectural evolution, and that's going to help you evolve your architectures much better.
Introducing the Three Cs Framework: Capabilities as Logical Subunits
To learn about what capabilities are and what this capability framework is, I now invite Sushanth up on stage.

Thank you, Mukund. Hi everyone, my name is Sushanth Mangalore and I'm a Solutions Architect here at AWS. I work with independent software vendor customers who build software and SaaS products for large-scale consumption on AWS. Prior to taking on this role, I used to work as a software architect or an application architect, and I see many parallels between cloud architecture and software architecture.
Your software and the infrastructure that it runs on come together to deliver a unified experience to your consumers. Your consumers don't see them as two separate things; they consume them as one. So borrowing some lessons from software architecture can be really useful in defining your cloud architecture, and you'll see this as a theme as we progress through this session today. With that, we want to provide you a mental model of three Cs, starting with the first C, which is capabilities.
Think of a typical enterprise. An enterprise can have one or more workloads. Now we define a workload as a collection of resources and code that come together to deliver business value. But from that definition, a workload could be something as simple as a single-page application, or it could be a large, sprawling, complex system with many moving parts. Imagine something like a CRM solution, an e-commerce system, or a banking solution. For some organizations, one workload could be their entire business.
When you are architecting such large and complex systems, it helps to break your workloads down into smaller, more manageable portions and build targeted solutions for each of those portions. With that, I want to introduce the idea of capabilities. Workload capabilities represent the business intent of the portions of the workload you've carved out. These are what I call logical subunits that you can independently solution for.
The capabilities represent the business intent or things that your business needs to deliver for a sustained period of time. You can choose to implement these capabilities with one set of technologies today, but you retain the ability to switch out these technologies as newer, faster, cheaper, and more efficient options become available in the future, as they inevitably will. That way, you are taking an approach where you can independently evolve capabilities, and from the individual evolution of capabilities, your workload stands to benefit as a whole.
You may have heard the word capability in many different contexts, but for the rest of this session, we will use the word capability to mean logical portions of your workload with well-defined business intent. Now, at a workload level, we have the Well-Architected Framework, and this baselines your workload across common architectural dimensions that are defined by the six pillars of the Well-Architected Framework.
You may be familiar with the idea of the Well-Architected Framework as a review mechanism to identify and address high-risk items that are affecting your workload. However, the principles and guidelines that make up the Well-Architected Framework are all there for you to use at any point in time. You can use them even during your planning and design stage, and the earlier you adopt the Well-Architected Framework, the sooner your workload is on its way to being well-architected.
By the time your workload hits production, it's already halfway there in terms of being well-architected. Well-architected gives you that foundational layer, and upon that well-architected foundation, you can architect individual capabilities and provide targeted solutions to them. So how do you organize your workload into capabilities?
Four Techniques for Organizing Workloads into Capabilities
We have a few different techniques that work very well. The first one is business ownership, which is especially applicable to very large workloads. Think of something like an e-commerce solution. The product recommendations capability may be owned by the Product Marketing Department, and the customer support chatbot may be owned by Customer Success. Each of these business units will have their own success criteria, their own business goals, their own budget, their own skills and staff. This way, you can draw boundaries around these functionalities as capabilities.
Now, this can sometimes run up against Conway's law, which states that businesses build software that mirrors their organizational communication structure. If your organization is already building teams that align with business, then it is usually simpler to arrive at your architecture to make sure that the business intent is being met. If you have very large technology centers of excellence or technology-aligned teams, sometimes the technology choices made by those technology-aligned teams can influence or even limit how you achieve your business intent. It's a hard problem to solve for sure, and it's not something we're really going to be able to solve here, but just something to keep in mind as you organize your workload by capabilities.
The next technique borrows from software architecture and uses the idea of domain-driven design, or DDD. DDD promotes the idea of organizing your workloads by bounded contexts, and each bounded context represents a business domain. Event storming is a popular technique by which you can identify your domain boundaries or bounded context. This is a one-day workshop where key stakeholders come together to identify the bounded context. Once you identify these bounded contexts, they can become your workload capabilities.
The next technique is less formal than DDD. It just makes use of basic good architectural practices, and I would call it Architecture 101. It brings together related functionalities that achieve a common objective and groups them into a capability. This is what I would call functional cohesion. Once you identify capabilities in this manner, you would wall them off from other capabilities within your workload, loosely coupling them. We are making use of coupling and cohesion as two driving factors to identify our capabilities. This way, each capability can be operated and evolved independently.
The last technique I want to talk about here is distinguishing traits. Distinguishing traits are portions of your workload that exhibit certain qualities that set them apart. These usually translate to your competitive advantage or business differentiator. Imagine offering something like the easiest insurance quoting process or the fastest checkout in e-commerce. These are all reasons why someone will use your solution or something else that's out there in the market. Once you identify portions of your workload that need to be solutioned with distinguishing traits in mind, you can draw your boundary around those functionalities as your capability.
Irrespective of how you slice up your workload into capabilities, you'll see that distinguishing traits really matter, as we discuss later in the session as well. You can use one or a combination of these techniques to arrive at your capabilities. The idea is not to wholesale solution your workload as one giant thing, but instead to slice it up into smaller portions, making it easier to solution for them. In the absence of something like this, it's very common that you try to solution with a broad brush stroke: you try to standardize on a common set of technologies across your workload or even across your enterprise. That can lead to a situation where your technology choices are suboptimal for portions of your system where better options may exist.
Defining Architectural Characteristics: Elevating Non-Functional Requirements
Now that we know how to break your workload into capabilities, let's talk about what makes up a capability. A capability is made up of two factors: functional requirements and non-functional requirements, or NFRs.
Functional requirements are well understood and usually well stated. Your product team talks to your business, documents them in the form of epics, user stories, and requirement specs, and your engineering team builds them. Non-functional requirements, on the other hand, although everybody realizes their importance—having security, availability, and performance all matter—are usually not well stated in advance. Your engineering team has to discover them and ensure they are not an afterthought or bolted on later. Our approach promotes elevating non-functional requirements to first-class status. They should be at the forefront of any technology decisions you make.
What happens when you do not have non-functional requirements well stated by your business? In that situation, you need to discover or elicit them. The technique we use here is called dialogue-based discovery. Business intent can exist at multiple levels. At a workload level, they usually align with some sort of a CXO objective or a board-level objective. You want to improve your profit margins or increase your top-line revenue. Or it could be some sort of a risk-reduction initiative or a compliance initiative. At the capability level, the business intent tends to have a leaner focus. You would talk to your business and try to understand what they are trying to achieve with any given capability, and you will hear terms like this.
Let's take an example where you want to improve your customer satisfaction. Customers could be dissatisfied for many different reasons. You may find that shopping carts are being abandoned. You may find that you are getting a lot of incidents and tickets or a lot of complaints. But for us architects, we need to understand what is causing the dissatisfaction and ask business more pointed questions like, "Are the users seeing errors in critical user journeys?" "Is the application slow?" "Is the application unavailable when they need it?" Through these kinds of questions, we can discover our non-functional requirements by arriving at certain qualities.
Each of these statements that business has stated in plain English can result in non-functional requirements. In the customer satisfaction scenario, we ended up with performance, resilience, and availability. But these are by no means exhaustive. There could also be other reasons like usability or responsiveness of the system that could be causing user dissatisfaction. However, this alone is still not sufficient for engineers to go build. You need some concrete metrics or SLOs to chase. So you dive a level deeper and ask more pointed questions like, "Is it okay for the page load time to be two seconds at the 99th percentile?" or "Is it okay for the system to be unavailable for a few minutes, say five minutes in a month?" That will lead you to more concrete SLOs that the engineering team can now chase.
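To make that last step concrete, here is a minimal sketch, assuming a hypothetical product-catalog capability, of how the SLOs discovered through this kind of dialogue could be captured as explicit, testable targets rather than left in meeting notes. The metric names and threshold values are illustrative, not figures from the session.

```python
# Hypothetical sketch: pinning dialogue-based discovery results to concrete SLOs.
# Capability, metric names, and thresholds below are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevelObjective:
    metric: str   # what we measure
    target: float # the threshold we commit to
    unit: str     # how the threshold is expressed
    window: str   # evaluation period


PRODUCT_CATALOG_SLOS = [
    ServiceLevelObjective("page_load_time_p99", 2.0, "seconds", "rolling 28 days"),
    ServiceLevelObjective("availability", 99.9, "percent", "calendar month"),
    ServiceLevelObjective("error_rate_critical_journeys", 0.1, "percent", "rolling 7 days"),
]

if __name__ == "__main__":
    # Print the targets the engineering team will chase.
    for slo in PRODUCT_CATALOG_SLOS:
        print(f"{slo.metric}: {slo.target} {slo.unit} over {slo.window}")
```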
That is how you arrive from your business intent to your non-functional requirements. But imagine you come out of a conversation with your business and you end up with a dozen NFRs. Everything sounds great. We want performance, we want security, we want resilience, we want cost-effectiveness, and there is nothing that you really want to leave out. But you will notice that once you start to put these NFRs side by side, they are in healthy tension with each other. If you are building a highly performant, highly resilient system with redundant infrastructure and expensive hardware, it is obviously going to go against cost-effectiveness. That is the trade-off.
You need to have this trade-off conversation because you want to end up with what we call architectural characteristics, which is our second C. Architectural characteristics are what I would call the non-negotiable NFRs that are disproportionately more important and vital to the success of your capability. These are what define your capability and are separate from the aspirational non-functional requirements that you want to meet. These are critical for your capability to be successful. You really want to chase a handful of these—literally something that you can count on one hand. If you start to chase about six or seven of these, then it becomes hard to engineer for. You may end up with an over-engineered system or you may dilute their significance altogether.
From Trading Systems to DynamoDB: Matching Service Characteristics to Capability Needs
Really, you want to keep them limited so that you can target them and achieve them. Let's understand this with an example. Imagine a workload that is a trading system.
It includes a capability for order management, which allows you to place trades. In this scenario, you would have a conversation with your business and determine that scalability, reliability, performance, and regulatory compliance are your critical non-functional requirements, or architectural characteristics. In achieving these, you may need to trade off some other non-functional requirements. You may need to trade off cost for the reasons I mentioned before. This is a complex engineered system requiring high-performance hardware, redundant infrastructure, and high scalability. Cost is at odds with those qualities. Simplicity is nice to have for any system, but the inherent domain you are operating in makes simplicity hard to achieve. No matter how many technologies you throw at it, the solution will always be complex.
Agility is another thing you may need to trade off because when you are engineering a system for high reliability, high performance, and scalability, you want to ensure that any new changes are not causing any regression. You want to meticulously validate any new changes, and that can result in longer release cycles. This is acceptable because you are prioritizing the effectiveness of those architectural characteristics and trading off certain other qualities. Underneath this, we still have the well-architected baseline. The well-architected framework is still giving you those architectural dimensions that every workload on AWS should have. But architectural characteristics take this a step further and enhance your capability for what sets them apart and becomes your competitive advantage.
Now let's take a look at another capability within the same workload. This time, it's end-of-day reporting of trades, which is emailed to your traders before the start of the next business day or market day. In this situation, you are prioritizing cost-effectiveness, ease of implementation, and ease of maintenance. In achieving that, you may actually trade off scalability. You do not need your report generation to be massively scalable because you can batch it. You do not need the performance to be highly optimized because your users do not care how long the report took to generate, as long as they receive it in their inbox before the start of the next market day.
In this case, a separate capability within the same workload is resulting in a completely different set of characteristics. These capabilities come together to achieve the overarching goal of the workload. The whole is greater than the sum of the parts. The whole is your workload and the parts are your capabilities. I try to draw a parallel with a team sport like soccer. There are multiple players that make up a team, and each player has specific skills or technical attributes that make them successful for the role they play in the team. The coach of the team runs them through a standard set of drills to prepare them for match days with the intent of winning the game. That is what I would equate to the Well-Architected Framework.
Each capability within your workload benefits from the Well-Architected Framework equally. But beyond this, there are also individual drills that are catered to specific positions or specific roles in the team. A goalkeeper will have their own drills. The forwards will have their own drills, so they will take shooting practice, free kicks, and all of that. These are what I would equate to capabilities and their specific characteristics. Together, the players come together to help the team achieve its objectives. In a team sport, there is also the idea of a marquee player or a franchise player. You know that if that player has a great day, your team is more likely to win. Similarly, within your workload, you may have certain capabilities that are your flagship capabilities, and the success of your workload depends on that one capability.
Sometimes you may need to pay extra attention to certain capabilities, and you may find that you are investing more in that capability, which is normal. Having said all that, how does this tie back to AWS services? We are here to simplify our technology choices when it comes to AWS services. AWS services advertise their architectural characteristics right in their service descriptions. Take Amazon S3: it is an object storage service that offers scalability, data availability, durability, security, and performance as its standout characteristics. If your solution needs object storage, and the characteristics you are trying to achieve for your capability match what S3 offers, then you have found your service.
You start to see this across our services. If I had a dollar for every time I answered the ECS versus EKS question, I'd be quite wealthy. ECS offers simplicity of deployment, manageability, and scaling. So if you're looking for something like that within your workload, then ECS may be the right choice.
Let's look at one last example: DynamoDB. It's fully managed, serverless, high performance, and highly scalable. As long as your data storage and data access patterns match what DynamoDB offers and the characteristics that you're trying to achieve for your capability coincide with what DynamoDB offers, DynamoDB may be a good fit for your architecture.
With that, you now know that you can use architectural characteristics to pick your services. We're taking a real jobs-to-be-done approach here. If you're familiar with that idea, you have a job and you're hiring a service or a product to do that job. The job in this case is defined by the capability and the characteristics that you're trying to achieve for that capability. You can always fire the service if it doesn't do the job.
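As a rough illustration of this jobs-to-be-done matching, the sketch below scores candidate services by how many of a capability's non-negotiable characteristics they advertise. The characteristic sets are simplified assumptions drawn from public service descriptions, not an official AWS mapping, and a real evaluation would also weigh constraints and data access patterns.

```python
# Hedged sketch: rank candidate services by overlap with a capability's
# required characteristics. The characteristic lists are illustrative.
SERVICE_CHARACTERISTICS = {
    "Amazon S3": {"scalability", "durability", "availability", "security", "performance"},
    "Amazon ECS": {"simplicity", "manageability", "scalability"},
    "Amazon DynamoDB": {"fully_managed", "serverless", "performance", "scalability"},
}


def shortlist(required: set[str]) -> list[tuple[str, float]]:
    """Rank services by the fraction of required characteristics they cover."""
    scores = [
        (service, len(required & offered) / len(required))
        for service, offered in SERVICE_CHARACTERISTICS.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Example: a capability whose non-negotiables are scalability, performance,
# and a fully managed operating model.
print(shortlist({"scalability", "performance", "fully_managed"}))
```

The "firing" half of jobs-to-be-done is the same function run again later: if the required set changes or a service stops covering it, the ranking tells you it is time to revisit the decision.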
Constraints and Documentation: Using ADRs to Capture Architectural Decisions
That brings us to our third C, which is constraints. Constraints are something that are inherent to a business or an enterprise. These are things that relate to concerns like your budget, your timelines, the skills that you have on hand, and any risks. The constraints are usually beyond the control of the team that is building the architecture. It's something that you need to learn to work with because these are inherent to your business and not something that you can change overnight.
Constraints come with a bit of a negative connotation. They impose restrictions on you, they reduce your flexibility, and they limit your options. But we want to take another view on constraints. Constraints can actually be beneficial to you. They encourage creativity in your solution and they promote frugality. A couple of years ago at re:Invent, Werner Vogels, our CTO, introduced the idea of the frugal architect. Constraints actually help you be frugal with your architectural choices.
You may notice I'm saying limits options again, but this time as a positive. What limiting options actually does for you is simplify your choices. It narrows your solution space and allows you to arrive at your technology choices faster. If your constraints are outright eliminating some technology choices for you, what's remaining is your solution space. This way, we solve for choice paralysis and we actually use constraints to our advantage.
These constraints are what I would call essential or inherent constraints of your business. But there are also artificial constraints or accidental constraints that sometimes get introduced. They usually manifest in the form of enterprise guidelines. If you're familiar with statements like "we avoid vendor lock-in," "these services are not approved," or "we are a specific technology shop," these are not uncommon at all and exist in pretty much every enterprise. They usually enforce or restrict use of certain technologies and certain architectural patterns.
Most commonly, they exist with very good intent. These are based on past research, past evaluations, past experiences, or an effort to standardize. But what they can also result in is imposing restrictions on you artificially, and you're not able to solution for the capability in the way that would result in the best solution. What we recommend instead is using your three Cs—capabilities, constraints, and characteristics—to arrive at your technology decisions and then evaluate them against your enterprise guidelines. If your enterprise guidelines are actually improving your solution, by all means, borrow from your enterprise guidelines.
I'm by no means saying go have a fight with your architectural review board or throw away all the guidelines. No, they are still very important. But what we are promoting is the idea that you shouldn't let them be set in stone, frozen at a point in time that has long since moved on. Your enterprise guidelines can advise your solution, but any new discoveries you make through your three Cs, along with the intentionality you have established behind your technology choices, should feed back into your enterprise guidelines. That way, they don't stay stagnant, and they continuously evolve as well.
It's very common that enterprise guidelines are outdated or something from five years ago and not fit for purpose anymore. This is an intentional forcing function for you to continuously evolve your enterprise guidelines.
And it can be really serendipitous when this happens. We've taken all this time to establish the intentionality behind our architecture. We don't want to skimp on the last step, which is an important step. As technologists, we are not very fond of documentation. But what we are suggesting here is not to write pages of documentation. Instead, use a concept that most of you are likely already familiar with: architectural decision records, or ADRs. ADRs are a lightweight basis for documenting your architecture. I say that ADRs and the architectural diagram are all you need to document your architecture. You don't need pages of documentation in a Confluence page or a wiki to understand your architecture.
For those of you unfamiliar with ADRs, I want to run through what makes up an ADR. You would start with the title, where you define what your architectural decision is for and specify what capability you're trying to achieve. The status goes through different stages: proposed, under review, accepted, or rejected. Rejected decisions are also good to have around because you don't want to go through that same evaluation cycle again if something was already evaluated and rejected. If the ADR is no longer required, it can also be deprecated.
The next section is the context, where you describe your problem statement in detail. Here, you specify all the things that are governing your decision. Lastly, in the decision section, you specify the technology choices you are using, the ultimate decision that you ended up making. Along with the decision, we also want to document the consequences. What are the consequences of this decision? What are the trade-offs you had to make? What are your fallback options if the decision doesn't pan out the way you expected? In some cases, you will have evaluated some alternatives, and it's useful to document them too because those could be your fallbacks. If something closely lost out to the ultimate decision, that alternative may become your actual architectural choice if the decision doesn't pan out.
In addition to these standard ADR fields, we also recommend documenting some additional fields. We recommend documenting the stakeholders who are involved in making the decision, the actual people who were involved in coming up with this decision, because these decisions are never made by one person. Even though you are the architect, you likely did not come up with this on your own. Usually, there's a product team involved, an engineering team involved, and architecture involved. Put the names of the actual people who are involved in making the decision. That also sets some accountability and conviction behind the decision. If you ever need somebody to blame, you know where to go.
Additionally, we also recommend documenting your characteristics and constraints as separate fields. You could of course have them stated within the context as well. But sometimes if the context gets very verbose, they can get lost in there. If you surface your constraints and characteristics as their own field, it creates a shared understanding of what drove this decision in the first place.
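ADRs are typically short markdown files kept alongside the code, but to make the fields concrete, here is a hedged sketch that models the structure described above, including the extra stakeholders, characteristics, and constraints fields, as a small data type. The field names and sample values are illustrative, not a formal standard.

```python
# Illustrative sketch of the ADR fields discussed above. Real ADRs are usually
# plain markdown; this just makes the recommended structure explicit.
from dataclasses import dataclass, field
from enum import Enum


class AdrStatus(Enum):
    PROPOSED = "proposed"
    UNDER_REVIEW = "under review"
    ACCEPTED = "accepted"
    REJECTED = "rejected"      # keep rejected ADRs to avoid repeating evaluations
    DEPRECATED = "deprecated"  # when the decision is no longer required


@dataclass
class ArchitecturalDecisionRecord:
    title: str         # which capability and decision this covers
    status: AdrStatus
    context: str       # problem statement and the factors governing the decision
    decision: str      # the technology choice that was made
    consequences: str  # trade-offs, fallbacks, alternatives that lost out
    stakeholders: list[str] = field(default_factory=list)     # actual people involved
    characteristics: list[str] = field(default_factory=list)  # non-negotiable NFRs
    constraints: list[str] = field(default_factory=list)      # budget, skills, timelines


adr = ArchitecturalDecisionRecord(
    title="Product catalog: compute platform",
    status=AdrStatus.ACCEPTED,
    context="Containers-based architecture required; limited Kubernetes experience in production.",
    decision="Amazon ECS with AWS Fargate",
    consequences="Trade off raw Kubernetes flexibility; fallback is EKS once skills mature.",
    stakeholders=["product", "engineering", "architecture"],
    characteristics=["scalability", "reliability", "performance"],
    constraints=["limited container experience in production"],
)
print(adr.title, "-", adr.status.value)
```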
Practical Examples: Applying the Three Cs to E-commerce and AI Capabilities
Having said all that, let's walk through a practical example. We'll take a workload like an e-commerce system and pick a capability: product catalog. For the product catalog, our business intent is that we want to offer a diverse collection of products with best-in-class user experience. For this, we had a conversation with our business and determined that the architectural characteristics of scalable, reliable, and performant are going to be really important for this capability to be successful. This is what will lead to adoption. We're also working with a couple of constraints. We want to use containers, but we have very limited container experience running workloads in production. We also want to support a wide variety of products across many regions, and we're not selling only particular types of products.
With that, we arrive at our architectural decision. First, let's look at the alternatives. We start with the idea of serverless first, but we believe that millions of users will use this capability on day one and they'll use it around the clock, so we feel that provisioned compute with autoscaling will work better. We want a containers-based architecture; we do not want to deploy directly on VMs. But we do not have any Kubernetes experience because we have only just started our containers journey. That's the compute aspect.
For databases, we evaluated relational databases and found that, because of how many different types of products and how many regions we want to support, it was very hard to standardize our product schema on a single data model. So we ruled out relational databases for now. For storage, we evaluated network file systems.
But we are mostly dealing with static data like product images and product videos. Because of that, we found that network file systems were not cost-effective. What we ended up with in our actual solution was a containers-based architecture. We picked Amazon ECS with Fargate, which gives us the scalability we need and allows us to engineer our applications to be reliable and performant.
In addition to that, we chose NoSQL as our database because the characteristics offered by DynamoDB match up with the characteristics we are trying to achieve, and we are able to make our data access patterns and data shape work with what DynamoDB offers. Lastly, we went with an object store because object stores are well optimized for static data, and Amazon S3 offers characteristics that match up with the architectural characteristics we want to achieve.
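For illustration, here is a minimal AWS CDK (Python) sketch of what this product-catalog decision could look like as infrastructure code. It is not the session's actual implementation; the container image, sizing, and table key choices are placeholder assumptions.

```python
# Hedged CDK sketch of the product-catalog decision: ECS on Fargate for compute,
# DynamoDB for the flexible product schema, and S3 for static media.
from aws_cdk import (
    App, Stack,
    aws_ecs as ecs,
    aws_ecs_patterns as ecs_patterns,
    aws_dynamodb as dynamodb,
    aws_s3 as s3,
)
from constructs import Construct


class ProductCatalogStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Provisioned container compute without operating Kubernetes ourselves.
        cluster = ecs.Cluster(self, "CatalogCluster")
        ecs_patterns.ApplicationLoadBalancedFargateService(
            self, "CatalogService",
            cluster=cluster,
            cpu=512,
            memory_limit_mib=1024,
            desired_count=2,  # placeholder; autoscaling would be tuned to real traffic
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=ecs.ContainerImage.from_registry("public.ecr.aws/docker/library/nginx:latest"),
            ),
        )

        # Flexible product schema: on-demand NoSQL table, placeholder key design.
        dynamodb.Table(
            self, "Products",
            partition_key=dynamodb.Attribute(name="pk", type=dynamodb.AttributeType.STRING),
            billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST,
        )

        # Static product images and videos go to object storage.
        s3.Bucket(self, "ProductMedia")


app = App()
ProductCatalogStack(app, "ProductCatalog")
app.synth()
```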
This demonstrates how we can go from a workload distilled to a capability, using our three Cs—characteristics, capabilities, and constraints—to arrive at our architectural choices. Let's quickly reinforce this with another example. In the interest of time, we'll show you how you can take another example of something like Amazon Rufus. You might have been using Amazon Rufus to help you shop. Let's say an e-commerce company wants to build in that capability.
The architectural characteristics are that it needs to be cost-effective, maintainable, and agile, with the constraints that you have limited in-house AI/ML skills and you don't have much time because today is Cyber Monday. Given these three Cs, you can evaluate whether to self-host or not. Because you don't have the skills, you would rather settle on Amazon Bedrock. Similarly, you don't want to be building and maintaining a commercial vector database, so you'd rather use Amazon Bedrock. This is how we arrive at decisions.
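To show why the managed route fits these constraints, here is a minimal sketch using the Amazon Bedrock Converse API through boto3: a single SDK call replaces self-hosted model infrastructure. The model ID and prompt are illustrative assumptions, and your account needs access to the chosen model in the configured region.

```python
# Hedged sketch: a shopping-assistant prompt via Amazon Bedrock's Converse API.
# Model ID and prompt are illustrative; enable model access in your account first.
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model choice
    messages=[{
        "role": "user",
        "content": [{"text": "Suggest three gift ideas under $50 for a home cook."}],
    }],
)

# The Converse API returns the assistant message under output.message.content.
print(response["output"]["message"]["content"][0]["text"])
```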
Implementation Guidance: FAQs, Greenfield vs. Brownfield, and Next Steps
Next, one of the common questions we get is: we have a workload with multiple capabilities within it, and these capabilities can have conflicting characteristics. How do we reason through them and ensure that the characteristics don't cancel each other out? You can have multiple capabilities within a workload. If they have the same set of characteristics, they usually work together to achieve the common objective. But when they have conflicting characteristics, for example, if capability A is optimized for performance and capability B is not, capability B can actually weigh capability A down.
These capabilities communicate through well-defined interfaces. There are a couple of ways to solve for this. First, we still have the well-architected baseline, which uniformly affects and influences all your capabilities, so they should not be too far apart from each other. But in addition to that, you have already optimized capability A for performance, so capability A needs to protect its characteristics. This brings us to our next concept: isolation boundaries.
Isolation boundaries are a conceptual idea that allow each capability to protect its characteristics even when they are interacting with other capabilities. These can be implemented in many different forms. There are things like circuit breakers, exponential back-offs and retries, queuing, and asynchronous communication. You can even do it through your application code. Good fences make good neighbors, and similarly, isolation boundaries help capabilities protect their characteristics.
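As one hedged example of an isolation boundary, the sketch below combines retries with exponential backoff and a simple circuit breaker, so a struggling downstream capability fails fast instead of dragging a performance-optimized caller down with it. Class names, thresholds, and timing values are illustrative assumptions; in practice you might reach for an existing library or platform feature that provides this.

```python
# Illustrative isolation boundary: exponential backoff with jitter, wrapped in
# a minimal circuit breaker. Thresholds and delays are placeholder values.
import random
import time


class CircuitOpenError(Exception):
    """Raised when the downstream capability is considered unhealthy."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    def call(self, fn, *args, max_retries: int = 3, base_delay_s: float = 0.2):
        # While the circuit is open, fail fast instead of queueing up latency.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("downstream capability is unhealthy")
            self.opened_at = None  # half-open: allow a trial call through
            self.failures = 0

        for attempt in range(max_retries + 1):
            try:
                result = fn(*args)
                self.failures = 0  # success resets the failure count
                return result
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()
                    raise CircuitOpenError("opening circuit after repeated failures")
                if attempt == max_retries:
                    raise
                # Exponential backoff with jitter before the next retry.
                time.sleep(base_delay_s * (2 ** attempt) + random.uniform(0, 0.1))
```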
The next question we often get is: everybody understands ADRs are useful and there is consensus around their usefulness. But many organizations have a problem either getting started with ADRs or consistently doing it over a period of time. How we go about that is what we'll talk about next. ADRs are easier said than done. Of course, we all hate documentation, so it feels like there is additional effort to actually get started with the ADR process. But here are three simple tips we have synthesized based on customer journeys we have seen on AWS.
Start with a simple cross-functional team, because you need dialogue among business, product, and engineering stakeholders so that everyone knows what capabilities you are building for that system. Then keep the team lean. You don't want to build a big team and get stuck making decisions; that's not the intention. It needs to be a lean cross-functional team. The second aspect is the process. There needs to be a process to store these ADRs, and most importantly, it needs to be centralized and accessible.
The third important thing is that it needs to be version controlled because you can actually see the evolutionary thread. You need to be able to see how decisions have been taken over a period of time.
Lastly, governance is critical. After all is said and done, if you simply have a bunch of ADRs on a wiki page, that might not necessarily mean everyone will follow that guidance. You need to be able to enforce these ADRs. The way to do this is through simple things like leveraging AWS services, such as AWS Organizations and service control policies, where you can control which services can be launched within a particular AWS account. Additionally, you can use services like AWS Config for driving more granular enforcement.
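As a sketch of that enforcement idea, the snippet below builds a service control policy document that denies actions outside an approved-services list derived from your ADRs. The approved list is illustrative; a real SCP needs careful exceptions for baseline operational services and should be tested against a non-production organizational unit first.

```python
# Hedged sketch of an SCP that denies any action not on an ADR-approved list.
# The approved prefixes below are illustrative assumptions.
import json

APPROVED_SERVICE_PREFIXES = [
    "ecs:*", "dynamodb:*", "s3:*",
    # Baseline operational services most accounts still need.
    "iam:*", "cloudwatch:*", "logs:*",
]

deny_unapproved_services = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyServicesNotInADRs",
            "Effect": "Deny",
            # Deny everything except the approved actions.
            "NotAction": APPROVED_SERVICE_PREFIXES,
            "Resource": "*",
        }
    ],
}

# Emit the policy JSON for attachment via AWS Organizations tooling.
print(json.dumps(deny_unapproved_services, indent=2))
```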
A common question we get from customers when discussing this framework is whether it applies to greenfield scenarios, brownfield scenarios, or both. This framework is equally applicable to both greenfield and brownfield solutions. However, the way you approach these characteristics will differ. In a greenfield capability situation, you are mostly working based on projections and doing what we call qualitative analysis. You work with your business to identify the architectural characteristics to build your capability, but nobody has really used your application in production yet. You don't know the real usage patterns or how your application or capability will actually be used in production.
Until you are actually in production and seeing that usage, you are mostly working based on projections. You will definitely do some market research to arrive at your architectural characteristics, but you are essentially hoping that you have made the right choices. With brownfield, you actually have well-defined metrics and usage patterns already established in production. If you optimize your capability for scalability or performance, you know exactly what level of scalability and performance you are getting today. If you need to improve that, you have a baseline to work from.
In this case, you are using real metrics, which gives you the ability to use quantitative analysis. This way, brownfield can sometimes actually be an advantage. I know engineers love to work on greenfield solutions because it gives a blank slate to implement all their ideas, but brownfield has its own advantages compared to greenfield. You have the opportunity to break the workload into capabilities the same way in both brownfield and greenfield. However, the way you determine your architectural characteristics will be different for each scenario. In fact, if you don't have your workloads broken out into capabilities today in brownfield situations, after this session may be a good time to revisit and see how you can break down your large workloads into smaller capabilities.
Now that we have seen this framework on the three Cs, let's bring it together. Always start with capabilities. The default reaction and trend we see is starting with technologies, but please don't do that. Start with capabilities and pay extra attention to those non-functional requirements, because we are proposing that those need to be first-class citizens in this approach. Second, extract the characteristics from these capabilities. Make sure that you are not picking too many characteristics; keep it to a single handful. After you have the characteristics, you then shortlist your technology choices. As you can see, it is not until step three that you are thinking about any specific technology. That is when you shortlist your options.
When you do shortlist your options, keep in mind the decision framework of factoring in the enterprise guidelines with the three Cs. That is something important to keep in mind. You then arrive at your architectural patterns, depending on your business. The most important thing to understand here is that the job is not done when your architecture is decided. That is actually when your first iteration begins. You have to evaluate your architecture. As we said, architecture evolution is a continuous process. By doing this over and over again, you are going to be able to evolve your architectures more organically and more consistently over long periods of time so that it will quickly and dynamically support any business outcome.
As you finish your week here at re:Invent, I know you are going to have tons of information about new products, services, features, and solutions, and yet another architectural pattern. But the first thing we would love for you to do is start shifting your mindset to start looking at your architectures as living systems. Recognize that there is an evolution going on. Is that evolution reactive? Is there something that you can be doing proactively? Have a plan for evolution as step one. Let capabilities be anchored at the center of your architectural evolution. As we have discussed, it is not technology, it is the three Cs and capabilities. Have a plan to evolve incrementally and proactively.
Regarding ADRs, we've provided some tips, so please get started on ADRs. We are actually starting to see customers build knowledge-based ADR applications. So who knows, that might be the easiest thing to actually build. Lastly, consult with your AWS account team. We as AWS architects get to see what works and what doesn't work when we look at thousands of customer journeys on AWS. We have learnings from all of this and we are here to share that with you. So please consult with your AWS team.
Remember, at the end of the day, what matters is a disciplined approach to architectural practice. Focus on capabilities instead of technologies and consistency of approach to evolve the architecture. That's what matters. With that, thank you very much. We would appreciate it if you could pull up your phones and provide feedback.
This is the framework, and we also do a workshop around it. If you feel like this workshop would benefit you to apply this framework to your particular use case, you can reach out to your account teams and they'll facilitate that. Lastly, if you would like to see an ADR tool within your AWS console, please indicate so in the feedback because who knows, next year we may just have it. With that, thank you very much. We'll be available in the hallway to my right or to your left if you want to have any further discussions. Thank you.
This article is entirely auto-generated using Amazon Bedrock.