Kevin Mas Ruiz

Posted on Dec 2, 2019

To Domain Driven Design

#software #teams #culture #ddd

Your company is built on top of a monolith. This monolith is probably your best asset, as your business knowledge is spread throughout it; however, it is also weighed down by years of technical debt and by teams pushing code without communicating with each other.

Your monolith is slow, opaque, error-prone and untested. Your developer and sysops teams are afraid of releasing new code, so they end up defining heavy processes and long release cycles with long manual testing phases. The reasoning is that new versions must be released safely: production cannot break, because recovery or rollback is difficult.

However, the monolith is still there, generating most of your revenue while also choking team performance. How do you improve your main revenue source and also optimize teams for long-term predictability and evolution of your business? Here is where DDD comes in handy.

But before going to DDD (sorry for that 😊) we need to understand why the monolith is still working and serving huge amounts of traffic. Monoliths are not a wrong software blueprint per se; the issue is with Big Balls of Mud. So let's start by talking about monoliths.

Monoliths are extremely cheap and versatile. Monoliths last for a long time because decisions made in a monolith are reversible in the mid-term. Because data and code are in one place, refactors are simpler (they can be done with your favorite IDE) and data transfer is cheap. For example, let's start with the following use case:

We are an online shopping platform, like Amazon, and we sell books. During the first iteration of our product, we are not validating the stock of our books in the warehouse because we don't receive that many purchase orders, so we can fix broken orders manually. We end up with the following architectural diagram.

A few months later, our business starts to grow: we start having a few orders per minute and we have a peak of orders during Black Friday and Christmas. We are no longer able to handle the increasing number of orders that break because our books are out of stock. We decide to implement a StockService that will validate, during the checkout process, that the books we want to purchase are still in stock.

As you can see, adding a new service and business rule was quite cheap: just adding a few new classes and a dependency on other services was enough. We didn't have to make a hard decision, we just followed the pattern that was already in the monolith. We could do that because:

Moving data in a monolith is cheap
Decisions in a monolith are limited to a single process
Monoliths have explicit and common patterns
Monoliths can be refactored with the help of the IDE
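
A minimal sketch of how cheap the StockService change described above can be inside a monolith. The class and method names (`CheckoutService.checkout`, `StockService.reserve`, `OutOfStockError`) and the in-memory stock are assumptions made for illustration, not the code of any real platform:

```python
from dataclasses import dataclass, field


class OutOfStockError(Exception):
    """Raised when a book cannot be purchased because no copies are left."""


@dataclass
class StockService:
    # Hypothetical in-memory stock: book id -> copies available in the warehouse.
    copies: dict[str, int] = field(default_factory=dict)

    def has_stock(self, book_id: str, quantity: int = 1) -> bool:
        return self.copies.get(book_id, 0) >= quantity

    def reserve(self, book_id: str, quantity: int = 1) -> None:
        if not self.has_stock(book_id, quantity):
            raise OutOfStockError(book_id)
        self.copies[book_id] -= quantity


@dataclass
class CheckoutService:
    # The only change to the existing checkout: one new in-process dependency.
    stock: StockService
    orders: list[tuple[str, str]] = field(default_factory=list)

    def checkout(self, user_id: str, book_id: str) -> None:
        # New business rule: refuse the order when the book is out of stock.
        self.stock.reserve(book_id)
        self.orders.append((user_id, book_id))
```

Everything happens in a single process: the new rule is one method call away, and renaming or moving `StockService` later is a refactor the IDE can do for us.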

So what we are doing is pushing forward: we avoid complex design decisions and deliver new features while the technical debt grows. This allows small teams to iterate fast over a product, but it becomes a problem when the number of teams grows. The reason is that different teams will need data and logic from different services to fulfill user needs.

As you can see, there is an overlap between Team A and Team C on the UserService, as they both need data about their users to deliver their functionality. There are three common ways to face this situation, summarized in the following table in three categories: ownership, collaboration and effect.

Ownership	Collaboration	Effect
One of the teams owns UserService	When the other team requires functionality, it asks the owner team	Slows down teams as they have a shared backlog of work
One of the teams owns UserService	When the other team requires functionality, it makes a PR	Slows down the team that writes the PR because it depends on the other team to review the functionality
Shared ownership	Requires routine communication and collaboration to implement new features	Slows down teams because they have a shared backlog

Because none of these options is a simple solution to the problem, the usual answer is to split the monolith. To understand the complexity of having different teams on the same code, just take as a reference the complexity of having two threads working with the same set of hundreds of variables in memory.

So we split the monolith into services, after several months or years of work. The most common approach I've seen to split monoliths is the strategy of defining data boundaries. For example, all data related to users will end up in a UserService, the stock information in the StockService and so on.

The problem with this approach is that:

It might look like Domain Driven Design, but it's not, because it's based on data, not on business knowledge.
It might look like a microservices architecture, but it's not, because services are highly coupled to each other, so neither services nor teams are autonomous.

So we have built a distributed monolith, which no longer benefits from cheap data movement, cannot be refactored with the IDE, and is more expensive in infrastructure costs. So how do we make sure that we don't end up in that situation?

The most basic advice I would give is to split your architecture based on knowledge, not on data. How a company structures knowledge depends entirely on the people and the business they are in, but there are several patterns to try that are cheap to explore.

To apply those patterns, we need to think of our business as a business platform: we don't have a product, we have a set of products. Those products are a set of features that apply to a persona. For example, based on this pattern, we can define our shopping platform like in the following diagram:

The success of each product should be measured and evolved independently. However, as you may have noticed, there might be dependencies on some cross-product modules. For example, the 1 click purchase might depend on the stock and user information, like the ordinary purchase product. How do we make sure that those dependencies do not affect team performance and that we do not duplicate logic?

First, we need to slice the product in modules to understand where the coupling might be happening:

As you can see, both 1 click purchase and Purchase require information from the same sources. However, if we go deeper we will see differences:

Are the buyers that are going to use the 1 click purchase and standard purchase the same?
Is the information that we need about books the same in both processes?
Is the information of the stock relevant in the same way in both products?
Is the shipment information used the same way in both products?

If the answer to all of those questions is yes, we are probably building the same product twice, so most likely the answer to at least some of them is no. Let's take a closer look:

Data Source	1 click purchase	Purchase
Buyers	Only people who have already bought other books	Everyone
Books	We need all possible info	We need all possible info
Stock	We only need to know if we have enough stock	We need to know when the stock is low to push the user to buy
Shipment	Only home shipping	Home shipping and delivery companies

In our case, only Books share the same traits, and they are data, not behaviour. This means that our products are bounded contexts where the knowledge and understanding of user problems are different. That makes sense, because we are linking knowledge to products, and products to personas.
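
As a rough illustration of the Stock row in the table above, each bounded context can keep its own model of stock instead of sharing one. The class names, the `LOW_STOCK_THRESHOLD` value and the message text are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold below which the Purchase product nudges the buyer.
LOW_STOCK_THRESHOLD = 5


@dataclass(frozen=True)
class OneClickStock:
    """Stock as the 1 click purchase context sees it: enough stock or not."""
    book_id: str
    available: bool


@dataclass(frozen=True)
class PurchaseStock:
    """Stock as the Purchase context sees it: it cares about low stock."""
    book_id: str
    copies_left: int

    def urgency_message(self) -> Optional[str]:
        # Only shown when the stock is low, to push the user to buy.
        if 0 < self.copies_left <= LOW_STOCK_THRESHOLD:
            return f"Only {self.copies_left} left in stock"
        return None


def one_click_view(book_id: str, copies_left: int) -> OneClickStock:
    # The same source fact (copies left) is reduced to what this context needs.
    return OneClickStock(book_id, copies_left > 0)
```

The source data is the same number of copies, but each product models only the knowledge it needs, so either model can change without coordinating with the other team.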

When we share information between bounded contexts, we should favor team performance whenever possible. This means that sometimes we need to duplicate knowledge. This is quite common in other systems: we have sinks both in the bathroom and in the kitchen. There are different ways to share data across bounded contexts. I personally prefer data streaming, either with an event-based architecture (using a queue like SQS) or with a data streaming platform (like Kafka, doing state sourcing). You can also share information with simpler tools like database views (if you have a distributed database like Yugabyte or AWS RDS).
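
A minimal sketch of sharing data through events, under loud assumptions: `EventBus` is an in-memory stand-in for a real broker such as SQS or Kafka, and `StockChanged` and `CheckoutStockReplica` are hypothetical names. The point is only that the consuming context keeps its own duplicated copy of the data:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StockChanged:
    """Published by the stock owner every time the stock of a book changes."""
    book_id: str
    copies_left: int


class EventBus:
    """In-memory stand-in for a broker; a real one would be asynchronous."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        for handler in self._handlers[type(event)]:
            handler(event)


class CheckoutStockReplica:
    """Local, read-only copy of the stock kept inside the checkout context."""

    def __init__(self) -> None:
        self._copies: dict[str, int] = {}

    def on_stock_changed(self, event: StockChanged) -> None:
        self._copies[event.book_id] = event.copies_left

    def can_buy(self, book_id: str) -> bool:
        # Answered from local data: no call to the stock owner is needed.
        return self._copies.get(book_id, 0) > 0
```

The checkout context would subscribe once with `bus.subscribe(StockChanged, replica.on_stock_changed)` and from then on answer `can_buy` without depending on the availability of the stock owner.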

And even if those kinds of patterns seem wasteful, consider for a moment how our body works. Our body is always pumping blood to our muscles and organs to guarantee availability and health. Now imagine that every time a muscle wants to move, it needs to ask your heart for some blood, and your heart in turn needs to ask your lungs for oxygen. Now repeat that every second, per muscle.

However, the information needs to come from other bounded contexts (for example, registration processes for new buyers) and those contexts need owners. We can lather, rinse and repeat, splitting more products until we have smaller modules that are easier for our teams to handle. For example, the following diagram shows products and their dependencies on an imaginary book shopping platform:

If we find that most of the related information is exposed to other products (for example, it could happen that all information exposed in Express Sign Up and Profile Sign Up is read in other products in the same way), we can centralise the product into a more generic one (generic for personas, not for businesses) and expose a simpler service (like a UserService).

So, to summarize, I would like to share some points that I think are useful:

Thinking in platforms allows us to split our business better.
Linking products to personas and also to bounded contexts makes boundaries explicit.
State-sourcing and event-driven architectures are essential for building distributed and available platforms.
Teams should not share code, but a common platform.

Thanks for reading!

Top comments (21)

Vivien Adnot • Dec 3 '19

Hello Kevin, thanks a lot, this article is really great! Keep up the good work :)

I have 2 questions for you:

1) When you write "It might look like a microservices architecture, but it's not, because services are highly coupled between them so neither services nor teams are autonomous."

I understand the collision here and the difficulties of teams working on the same codebase, but I feel that even if we split it into 2 microservices, the dependency will remain, as the checkout team needs info from stock to complete a checkout.

Hence, the checkout team will ask the stock team for more features, and the slowness and coupling will remain

I don't see how to decouple things here, maybe I failed to understand your solution ?

2) When you talk about the term "Products", If I understand well it designates:

Profile Sign up
1 click purchase
Bestseller page
Search by genre and author
...

It sounds a bit odd to me, because I naturally tend to think that the product is the Shopping Platform, and "1 click purchase" and so on are Features.

Is "Product" a term of DDD ?

Kevin Mas Ruiz • Dec 3 '19

Hi Vivien, thanks for your feedback! Let me try to answer and clarify your questions :).

1) Systems need to interact with other systems to remain useful and when there is interaction, there is some coupling. You can earn some benefits, like availability and team performance, when this coupling is temporary.

For example, if you have a CheckoutService and every time you want to buy a book you need to query the StockService (through an HTTP endpoint or any other way), you have direct coupling every time you need the information.

If you invert the communication, and the StockService or any other peer offers you updates on the stock through a broadcast mechanism, you don't mind whether the StockService is down or not: you only need the updates to be in a place where you can read them and that doesn't depend on the availability of the service. You still need someone to provide the information, but you only need to agree on the format and the amount of information you need.

With Apache Kafka, for example, if you configure your partitions in a specific way (partition key = domain id) and you implement a state-sourcing mechanism, you don't even need any service to replay states when you implement new functionality because you can replay the state by yourself.
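
A minimal in-memory sketch of that idea, not real Kafka code: `PartitionedLog` is a hypothetical stand-in for a topic, and `zlib.crc32` stands in for the partitioner's hash function. Because every record with the same key lands in the same partition, its states keep their order and the latest state can be rebuilt by reading the log again:

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str     # the domain id, used as partition key
    value: dict  # the full state of the aggregate at that moment


class PartitionedLog:
    """Hypothetical stand-in for a topic split into partitions."""

    def __init__(self, partitions: int) -> None:
        self.partitions: list[list[Record]] = [[] for _ in range(partitions)]

    def partition_for(self, key: str) -> int:
        # Deterministic hash, so one key always maps to one partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: dict) -> None:
        self.partitions[self.partition_for(key)].append(Record(key, value))

    def replay(self) -> dict[str, dict]:
        # Reading each partition in order yields the latest state per key,
        # without asking the producing service for anything.
        state: dict[str, dict] = {}
        for partition in self.partitions:
            for record in partition:
                state[record.key] = record.value
        return state
```

A new consumer can call `replay()` on day one and build whatever view it needs from the states already stored in the log.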

2) There is a book named Lean Inception, written by a Thoughtworks colleague (Paulo Caroli), that uses the name MVP for products that can be delivered in a predictable way. Those MVPs are bound to a set of personas (you focus on solving the problems of a single type of user) and a set of user journeys for those personas, and you define an autonomous way to measure the success of the MVP.

This mechanism (that's from my side, not part of the book) allows a business to define a set of products that have a relationship between them and can be evolved independently of other products. You will usually build new products when you want to widen the impact of your platform or experiment on new business slices, and you will implement new features on an MVP when you want to optimize the revenue you have from an already known subset of personas.

Thinking in a business as a "platform" of products allows us to define boundaries that are relevant for the business, autonomous, and have impact on business metrics. It's just a way of defining boundaries that is a bit different, but it doesn't necessarily mean it's right :D. I've used it a couple of times in different teams and it worked quite well.

Please feel free to follow up here if the answers were not clear enough.

Thanks!

Yaser Al-Najjar • Dec 11 '19

Besides the great points Kevin (@kmruiz ) already mentioned, I would like to add a couple of points:

If the services are "chatty" when they talk to each other (via HTTP or whatsoever), you might wanna combine them into one service.
It's better to start by writing all the business capabilities (aka features) that a service should handle, and that's ideally before writing any line of code so that you could set the boundaries clearly.
Sometimes, it's very very OK to put everything into one service, and separate them later when the boundaries are clearer.
You won't get this right from the first iteration, DDD is like a programming-style that goes beyond the initial design, but also during the whole development.

Vivien Adnot • Dec 12 '19

Thanks a lot for your great advice @kmruiz and @yaser, it is inspiring!

Philippe Bourgau • Dec 16 '19

Hi Kevin. I like your metaphor with muscles and the body. I found that the link between team organization and microservice boundaries is too often ignored.

I found that using Event Storming is a great tool to explore the problem space. It's very useful to draw bounded context boundaries and can also be used to organize teams.

Thanks for this post!

Michael Remijan • Jan 10 '20

Hi Kevin, in this blog, you stated a "product" (or feature) of the platform is a Bounded Context in the DDD world. After you have the "product", you show how to "slice the product into modules". Would you consider the "module" you describe as an Aggregate/Entity inside a Bounded Context? It seems like that is what they would be based on their names...Buyer, Book, Stock, etc.

Kevin Mas Ruiz • Jan 13 '20

Hi Michael!

Some of the modules can be aggregate roots, and others can be just query models, depending on the business logic.

What I've found is that usually you would have only once an aggregate. For example, the aggregate root Book, which handles the consistency of book info, will be in a single product. The reason is to guarantee consistency of business invariants. Other books will be some variant of read models.

Michael Remijan • Jan 13 '20

Ok, that gets to another question. When you say "Other books will be some variant of read models", how would you get that book data to the other "products"? I like your body analogy when it comes to coupling Microservices with direct calls. Even with small applications I can see a "death star" pattern emerging quickly. So would you use a Domain Event to push aggregate book data to other products, with the caveat that it must remain read-only? Or would the other products use the same database as the aggregate book data? Or (like in most situations) is it a combination based on the situation? What have you found to be most successful based on past experience?

Kevin Mas Ruiz • Jan 20 '20

You can use both of them, I would suggest usually sharing snapshots of data. For example, sending the snapshot through SQS, Kinesis or even better Kafka so you can replay the state.

This way you give data ownership to the writer, guaranteeing data consistency, and you let the readers consume the information they need. However, this kind of infrastructure is quite expensive and teams can have a hard time learning about them.

As always, consider the cheapest solution. For example, right now, in a project, we are using just materialized views in an Oracle database as it's easier, trading off availability for simplicity (which most of the time makes sense).

The most successful approach to CQRS that I've seen is using Kafka to share snapshots, partitioned by the aggregate id, and a topic per type of domain entity. This allowed teams to consume from Kafka without fear of breaking other services, as they just need another consumer group. New services can just plug in to the topic, consume the whole state, and generate a view in whatever fits better for them.

Thanks!

Achilles Moraites • Dec 11 '19

Hello Kevin ,
the article is well written and to the point !

Thanks for sharing!

Eduardo Dobay • Dec 3 '19

Nice article! Your comment on the UserService makes me wonder: is it customary to duplicate user data e.g. in all services that have some need to display the user's name? Or would it be more appropriate to keep that in a centralized service?

Kevin Mas Ruiz • Dec 3 '19 • Edited

I always try to balance team autonomy and crossfunctional requirements.

If many components depend on the same piece of information and moving that data around is too expensive, it's a good decision to centralise the user information in a single service.

This comes with several tradeoffs: all those teams will depend entirely on a new platform team, slowing down value delivery. This, from a pragmatic point of view, is not dramatic because teams usually have other blockers before that.

However I always ask, in terms of data locality, what would be the difference between propagating data with state sourcing or events, and having a cache in the client service after an HTTP request.

I hope I answered 😊.

Thanks for reading!

Michael Remijan • Jun 10 '20

Hi Kevin. You say "Those products are a set of features that apply to a persona". From this I'm assuming that a product may contain multiple features, but do all the features in the product apply to only 1 persona? Meaning if a feature applies to a different persona, is that an indication that feature belongs in a different product?

Kevin Mas Ruiz • Jun 10 '20

Hi Michael! thanks for the question!

Not necessarily, it depends on the size of your user segmentation. You might want a product for a bigger more abstract segmentation if you prioritise quantity of traffic instead of quality.
For example, in classifieds, you usually have just a single search webpage for everyone because quantity of traffic is your most valuable source of income (advertisement). In this post, the example is the registration form.

There are some situations when you prefer quality traffic and a more seamless user experience. For example, again in classifieds, you can have a product for luxury real estate with a different user experience to generate a different type of income from that user segment. In this post, for example, you have two checkouts, the normal one and the fast one.

It is a matter of tradeoffs, you should always consider at least business opportunities (probability of change), business maturity (platform stability) and business boundaries (team performance).

Daniel Wu • Dec 10 '19

Hey thanks for this article. I think there are some great ideas to dive into, especially for fast growing startups where business requirements change frequently and people join or leave constantly.

Do you have any more recommended reading for someone who's never heard of domain driven design?

Kevin Mas Ruiz • Dec 10 '19

Hi Daniel! Thank you for your feedback!

If you are brand new to Domain Driven Design, the best way to start is with the tactical patterns, which are easier to apply to already existing code and benefits are more obvious to developers.

That said, the easiest book to start with is Domain Driven Design Distilled, by Vaughn Vernon. It's not expensive, it's quite short and it's a good introduction to the topic.

Thanks!

Filip Nowak • Dec 4 '19

Hi Kevin, great article, thank you very much.
Did you mean Event Sourcing by "State Sourcing"?
Thanks.

Kevin Mas Ruiz • Dec 4 '19

Hi Filip, thanks for the feedback!

I think I should have elaborated a bit more on the topic. What I mean by state sourcing is something that I've implemented a few times on different projects: propagating the whole state of an aggregate every time it changes.

We used this pattern, for example, with Kafka, where we had a topic per aggregate type and sent the aggregate state partitioned by the domain id. This allowed us to easily replay the whole history of states of an aggregate to materialize new views.

You could do something similar with event sourcing, the difference is that events are meant to share domain knowledge, and state snapshots are meant to share information.
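
A minimal sketch of that difference, assuming a hypothetical `BookState` aggregate and two made-up events. With event sourcing the reader must know how to apply each domain event; with state snapshots it only needs the last snapshot:

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class BookState:
    book_id: str
    title: str
    price_cents: int


@dataclass(frozen=True)
class BookRenamed:
    new_title: str


@dataclass(frozen=True)
class BookPriceChanged:
    new_price_cents: int


def apply_event(state: BookState, event: object) -> BookState:
    # Event sourcing: the consumer needs the domain knowledge of each event.
    if isinstance(event, BookRenamed):
        return replace(state, title=event.new_title)
    if isinstance(event, BookPriceChanged):
        return replace(state, price_cents=event.new_price_cents)
    raise TypeError(f"unknown event: {event!r}")


def rebuild_from_events(initial: BookState, events: list) -> BookState:
    return reduce(apply_event, events, initial)


def rebuild_from_snapshots(snapshots: list[BookState]) -> BookState:
    # State sourcing: every message is the whole state, so the last one wins.
    return snapshots[-1]
```

Both functions arrive at the same final state; what changes is how much the reader has to understand about the writer's domain to get there.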

Thanks for reading!