In recent years, teams have been buzzing about microservices, with many organisations jumping on the bandwagon. Even the US Air Force now runs Kubernetes on its latest fighter jets. However, just like Agile, Scrum, or whatever the ‘latest’ software development methodology happens to be, success isn’t guaranteed. What we are seeing is a realisation that the complexity of Kubernetes has a cost, and that cost isn’t always justified unless you are running at a larger, more complex scale or team topology. This is why some teams are now reversing course and returning to the monolithic architecture they once left behind.
Splitting applications up with APIs gives us a defined separation of responsibility. It’s hard for 100+ people to cooperate on building a single monolithic application. But if you have 10+ teams of 10 people, each deploying their own microservices, it’s easier to decouple and to deliver at the pace each team needs.
The problem…
Getting everyone to agree on what these individual services should look like is where problems arise. Do you assign a team to a single function, or is it based on business unit requirements? For APIs, who is deciding the definitions and are they being documented? Conway’s Law states that the design of a system mirrors the structure of the organisation responsible for creating it. While microservices can offer better separation between teams, this advantage may not always be realised due to the inherent team structure or even culture of an organisation. In such a situation, monolithic architecture may start to look more attractive.
So it’s not surprising that articles like Amazon Prime Video’s “Microservices to monoliths” emerge from time to time. In their case, the service needed to handle multiple state transitions per second as part of processing video streaming data. That’s not a great match for serverless, and moving away from it led to some impressive cost savings for Amazon. However, some questions remain:
Isn’t this something that should have been made apparent in the upfront design?
Was this an example of jumping in headfirst and developing without thinking through the problem? Analysis paralysis is a real thing organisations face, but could they have over-corrected on that a bit?
Or would you argue that employing serverless technology for rapid product testing was a smart initial move? There’s value in just getting things out the door and iterating, but here it seems to have happened with too little foresight, leading to bigger issues and larger refactors down the line.
In this example, the issue arose when they failed to recognise the expense associated with state transitions, which led to the subsequent optimisation of replacing the Step Functions workflow with a single EC2-based component.
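To see why per-transition billing can dominate at streaming-scale throughput, here is a minimal back-of-envelope sketch. The prices used are illustrative assumptions, not current AWS list prices, and the workload model (one billed state transition per unit of work, one always-on instance as the alternative) is deliberately simplified:

```python
# Back-of-envelope comparison: per-transition workflow cost vs a flat-rate
# compute instance. Both prices below are assumptions for illustration only;
# check the AWS pricing pages before relying on numbers like these.

PRICE_PER_1K_TRANSITIONS = 0.025   # assumed workflow price per 1,000 transitions (USD)
EC2_HOURLY_RATE = 0.40             # assumed on-demand rate for one instance (USD)


def step_function_monthly_cost(transitions_per_second: float) -> float:
    """Monthly cost if every unit of work is a billed state transition."""
    transitions_per_month = transitions_per_second * 60 * 60 * 24 * 30
    return transitions_per_month / 1000 * PRICE_PER_1K_TRANSITIONS


def ec2_monthly_cost(hourly_rate: float = EC2_HOURLY_RATE) -> float:
    """Monthly cost of one always-on instance, independent of throughput."""
    return hourly_rate * 24 * 30


if __name__ == "__main__":
    for tps in (1, 10, 100):
        print(f"{tps:>4} transitions/s: "
              f"workflow ≈ ${step_function_monthly_cost(tps):,.0f}/mo, "
              f"one instance ≈ ${ec2_monthly_cost():,.0f}/mo")
```

The key property is that the workflow cost scales linearly with throughput while the instance cost is flat, so at “multiple state transitions per second” the per-transition model crosses over quickly, which matches the shape of the optimisation Prime Video described.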
With that being said, the original post from Prime Video Tech contains numerous gaps, leading to confusion and a seemingly inaccurate title of “From distributed microservices to a monolith application”. The process appears to be more of a refactoring rather than a complete transformation.
Deciphering the Monolith Puzzle
So where does that leave us? Choosing the right architecture for your organisation is a balancing act. It’s possible to maintain the separation of concerns and scale different APIs using a monolithic architecture while still enjoying the benefits of microservices.
To decide whether you need to move back to a monolithic architecture or fix issues in a distributed monolith, consider:
- The trade-offs and the timeframe.
- The pain and productivity loss from microservices over time, weighed against the cost of migrating to a monolith, taking into account factors like Conway’s Law, team size and topology, experience, and expertise.
Team Topologies by Matthew Skelton and Manuel Pais does an excellent job of providing a framework (grounded in Conway’s Law) for structuring teams to meet the needs of users and align with the architecture of the systems you’re building.
Overmind is a SaaS Terraform impact analysis tool. It discovers your AWS infrastructure so that it can calculate the blast radius of an application change, including resources managed outside of Terraform. It helps you identify the causes of outages by showing you which changes caused which problems, and it helps you deploy changes faster by producing an impact analysis report before any change is made. From this report you can decide whether the change can be confidently made, or should be held back if it’s too risky, preventing outages in the first place.