Vladi Stevanovic

Delving into Architectural Drift

When building a software system, engineers recognize three fundamental truths:

  1. All software has an architecture, whether it is designed and implemented intentionally, or it emerges by coincidence.
  2. System complexity grows over time as requirements shift, use cases evolve, team members come and go, and technologies advance.
  3. Without deliberate management of the system’s evolution, its architecture may stray from its original objectives and blueprint, leading to unintended architectural drift.

Architectural drift can affect any application, irrespective of its initial architectural style or the quality of its original design.

After all, the ease of making changes to a system has significantly increased compared to the past. Gone are the days when you had to phone your hardware rep to order a new server, confirm that there was rack space for it in the data center (i.e. basement), configure it, integrate it into the network, and migrate data into it. Nowadays, a few clicks in a cloud vendor console or a commit to Infrastructure as Code (IaC) can redefine your application’s architecture in mere minutes.

Engineers not only face the challenge of adapting to continuously evolving system architectures but also grapple with the shifting role of software architecture within agile teams and the lack of visibility into the architectural decisions made by other teams.

This leads to a vicious cycle where architectural drift makes the system increasingly difficult to comprehend and modify. Developers find it challenging to implement changes without causing unintended side effects or disrupting existing functionalities, which leads to further architectural drift.

As a result, development cycles lengthen, costs rise, and overall productivity declines.

Understanding the root causes, consequences, and strategies for managing system architecture drift is vital. This blog post aims to provide a comprehensive exploration of architectural drift, including its definition, underlying causes, consequences, and effective mitigation techniques.

The What and Why of System Architecture Drift

System architecture drift falls under the broader category of Architectural Technical Debt (ATD), which encompasses all the intentional and unintentional decisions made during the system design process (e.g. architectural style, tech stack, development methodologies, etc.) that result in issues such as reduced maintainability, increased complexity, decreased performance, and scalability challenges.

Specifically, Architectural Drift refers to the gradual deviation of a system’s architectural design from its original or intended architecture due to ad-hoc alterations and additions.

Imagine that you’re building a house designed in the Mediterranean style, complete with characteristic clay tile roofing, stucco walls, and wrought iron balconies. Suppose, partway through, you decide to incorporate a Gothic-style round turret to accommodate your astronomy hobby. Later, to cater to your growing family, you might add a post-modern extension. What began as a cohesive design evolves into a disjointed amalgamation of styles.

This analogy mirrors the reality in software architecture: the system may start with a clean architecture but evolve into a complex tangle of multiple architectural paradigms, inconsistent coding practices, redundant components, and tangled dependencies due to uncoordinated additions and modifications.

DALL-E’s interpretation of architectural drift as a house

There are various reasons behind system architecture drift:

Adapting to Evolving Needs: Architectural adaptations are often intentionally made to align with changing requirements or evolving business needs. Software systems are dynamic entities, and architectural drift can sometimes reflect an organization’s commitment to rapidly delivering new functionality and meeting customer demands.

Unclear Architectural Governance: Development teams lack a defined architecture, guidelines, or principles to guide their work. This lack of direction can lead to ad-hoc decision-making and improvisation, contributing to architectural drift.

Result of Architectural Technical Debt: Compromises, shortcuts, and errors made during development can have a knock-on effect on other architectural decisions. Examples of Architectural Technical Debt issues that influence future choices:

  • Re-inventing the Wheel: Choosing custom-built components over existing solutions with similar functionality (e.g. building your own persistence library).
  • Permanent MVPs: A temporary, “bare-bones” solution becoming an integral part of the system’s architectural foundation (e.g. adopting prototypes of a new architecture, immature R&D components, experimental development branches, etc.).
  • Persistent Workarounds: A temporary workaround, implemented to bypass some architectural constraints, becoming deeply embedded into the architecture.

Inadequate Collaboration: Poor collaboration within and between teams can exacerbate architectural drift. Without a shared understanding or visibility into the overall system architecture and modifications made by other teams, disjointed efforts can lead to increased fragmentation and inconsistency in the system’s architecture.

Architectural drift ultimately builds over time due to a multitude of factors — different business needs, architectural directions, rogue spinoffs — resulting in increased system complexity and creating a less maintainable and cohesive architecture.

Architectural Drift in Real Life Systems

Recognizing and addressing drift is not just about maintaining the system’s current state but about safeguarding its future agility and robustness.

This is particularly evident when looking at real-life examples of architectural drift and its consequences. For example:

Linux Kernel

A research group compared the architecture documentation of the Linux Kernel’s main subsystems with their source code. After manually reverse engineering the software architecture from the code, they found significant architectural drift and erosion: there were a number of discrepancies between the prescriptive, recorded architecture and the actual architecture.

Besides the surprisingly high number of (unnecessary) dependencies between the components, what is most striking is that interviews with the developers surfaced that they were unaware of the architectural degradation. A common reason given was “it had to be done fast, and I didn’t have time to go back and update the documentation”.

Linux Kernel Main Subsystems
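A comparison like the one described above, documented architecture versus dependencies recovered from code, can be approximated with plain set operations. The following is only a minimal sketch: the component names are hypothetical placeholders (not the Linux subsystems), and it assumes the "actual" dependencies have already been extracted from the source by some other tool.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (component, component it depends on)

# Hypothetical documented architecture: the dependencies the docs allow.
documented: Set[Edge] = {
    ("ui", "service"),
    ("service", "storage"),
    ("service", "auth"),
}

# Hypothetical dependencies recovered from the code (assumed to be
# produced by an import scanner or similar extraction step).
actual: Set[Edge] = {
    ("ui", "service"),
    ("service", "storage"),
    ("ui", "storage"),
}


def undocumented_dependencies(documented: Set[Edge], actual: Set[Edge]) -> List[Edge]:
    """Dependencies present in the code but missing from the documentation."""
    return sorted(actual - documented)


def unused_documented_dependencies(documented: Set[Edge], actual: Set[Edge]) -> List[Edge]:
    """Dependencies the documentation describes but the code no longer has."""
    return sorted(documented - actual)


if __name__ == "__main__":
    for src, dst in undocumented_dependencies(documented, actual):
        print(f"undocumented: {src} -> {dst}")
    for src, dst in unused_documented_dependencies(documented, actual):
        print(f"documented but absent: {src} -> {dst}")
```

Both directions matter: an undocumented dependency suggests the code has drifted, while a documented dependency that no longer exists suggests the documentation has.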

X (formerly Twitter)

At the beginning of 2023, X (formerly Twitter) encountered severe issues and service disruptions as a direct consequence of the sweeping layoffs of significant portions of the engineering teams who designed and built this massive social network.

Elon Musk himself admits that the X system architecture is massively complex:

"The code base is like a Rube Goldberg machine, and when you zoom in on one part of the Rube Goldberg machine, there’s another Rube Goldberg machine, and then there’s another one, so it’s quite difficult to keep this thing running, and then also difficult to advance the product because it is really overly complex, to say the least.” — Elon Musk at the 2023 Morgan Stanley TMT Conference

And there is convincing evidence that high architectural complexity is linked to a much higher defect rate, as well as to declines in productivity and in the team’s understanding of the system.

However, the timing and nature of these disruptions are indicative of additional underlying issues: the layoffs not only reduced manpower but likely led to a loss of critical institutional knowledge regarding the system’s architecture. This gap in recorded system information, combined with the existing architectural drift, compounded the challenges faced by the remaining team in diagnosing and resolving the service disruptions.

There are countless examples of real-life issues caused by architecture degradation, highlighting the importance of proactively addressing architectural drift. They range from the Hadoop developers not realizing that 61 out of the 67 components in the system had circular dependencies, to the Knight Capital Group losing roughly $440 million in 45 minutes, (partially) because obsolete code that was thought to be “dead” was still deployed on a server and got reactivated.
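Circular dependencies of the kind mentioned in the Hadoop example are hard to spot by eye but straightforward to find mechanically once a dependency map exists. Here is a minimal sketch, assuming a hypothetical `deps` dictionary that maps each component to the components it depends on; the names are illustrative and the function is not taken from any particular tool.

```python
from typing import Dict, List, Set


def components_in_cycles(deps: Dict[str, List[str]]) -> Set[str]:
    """Return every component that can reach itself by following dependencies."""

    def reaches_itself(start: str) -> bool:
        # Iterative depth-first search over everything `start` depends on.
        stack = list(deps.get(start, []))
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node == start:
                return True  # found a path leading back to the start
            if node in seen:
                continue
            seen.add(node)
            stack.extend(deps.get(node, []))
        return False

    return {name for name in deps if reaches_itself(name)}


# Hypothetical dependency map: a -> b -> c -> a forms a cycle,
# d only points into the cycle, e is independent.
example_deps: Dict[str, List[str]] = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],
    "d": ["a"],
    "e": [],
}

if __name__ == "__main__":
    print(sorted(components_in_cycles(example_deps)))
```

This runs one search per component, which is fine for a few hundred components. A larger graph would call for a strongly-connected-components algorithm instead.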

The Temptation to Ignore Architecture Drift

The gradual nature of architectural drift, with its incremental changes that go unnoticed or don’t immediately “break the system”, often leads developers to overlook or dismiss its significance.

However, akin to the insidious spread of rust or rot, architectural drift gradually deteriorates the original composition and structure of an application, having a direct technical, business and cost impact.

(1) It paves the way for future complications.

When we introduce unaccounted-for architectural decisions, we open the doors to a range of issues:

  • Security Risks: Introducing unexpected resources or deploying them incorrectly can create security vulnerabilities, particularly if these changes escape the notice of the security team.

  • Delayed Feature Implementation: Deviating from the intended architectural blueprint complicates the integration of new features, potentially impairing the system’s ability to adapt to evolving business demands.

  • Resource Misconfigurations: The increasing complexity resulting from architectural drift amplifies the likelihood of misconfigurations, leading to deployment issues. This is exacerbated in environments where multiple teams deploy to a shared cloud, raising the risk of resource conflicts or redundancies.

  • Escalating Costs: Incremental enhancements to the architecture can cumulatively inflate cloud expenditure significantly.

  • Unmonitored Resources: Departure from the planned architecture might render existing management and maintenance protocols ineffective, leaving behind unpatched and potentially vulnerable resources concealed within the architecture.

  • Higher Engineering Overhead: An unplanned architecture demands more maintenance and resources, particularly for sub-optimal choices or solutions not aligned with the architectural roadmap. This complexity not only increases the costs associated with system upkeep but also prolongs the onboarding and offboarding processes, given the undocumented nature of system modifications.

(2) It highlights issues with the original architecture.

Architectural drift may not always be bad news! Drift that occurs in response to new requirements — which could not be satisfied within the confines of the system’s existing architecture — can highlight previously overlooked but crucial design elements.

It’s imperative to understand if the original architecture was inherently flawed so teams can take appropriate steps to address the problems.

Ignoring architectural drift is akin to turning a blind eye to the slow but sure erosion of a system’s integrity and functionality. Over time, the accumulating gaps between the actual architecture and the intended architecture can lead to decreased maintainability, reduced system performance, and increased complexity.

Effective Measures to Control System Architecture Drift

Given the significant impact of architectural drift on a software system’s maintainability, performance, and overall quality, proactive management is essential.

Teams can employ three types of measures to control system architecture drift:

  • Organizational Competence: Engineering teams must possess the architectural skills necessary to build robust systems. Beyond mere technical knowledge and experience in architecting systems, fostering a culture where system architecture is regarded as an ongoing team responsibility is crucial. Encouraging collaboration and a shared understanding of the architecture, along with its guidelines and principles, reduces the likelihood of unauthorized deviations, redundancies, and uncoordinated additions.

  • Proactive Prevention Methods: Adopting best practices like (some) upfront system design and continuous system design reviews ensures that potential architectural drift is caught as early as possible. While reactive measures can discover and address drift after it occurs, they often lead to labor-intensive and ineffective solutions like major refactoring. The root cause of accumulated architectural technical debt is the way the team approaches development; if that is not addressed, the same problems will reappear.

  • Tools for Detecting Architectural Drift: Utilizing tools that automatically detect changes and drift in the system’s architecture, dependencies, and APIs is fundamental. Ideally, these tools should also (a) visually represent system changes to easily identify deviations or inconsistencies, and (b) enable proactive design reviews to stay ahead of drift. An ideal tool would also provide comprehensive and up-to-date system documentation, ensuring that the team fully understands all architectural decisions, past and present.
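To make the idea of automated drift detection more concrete, here is a rough sketch of the simplest possible version: comparing two snapshots of a system’s dependency graph and reporting what changed, so the differences can be raised in a design review. Everything in it (the `DriftReport` type, the function names, and the service names) is hypothetical and only illustrates the principle; real tools work from much richer data.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (component, component it depends on)


@dataclass(frozen=True)
class DriftReport:
    added: List[Edge]    # dependencies that appeared since the last snapshot
    removed: List[Edge]  # dependencies that disappeared since the last snapshot

    @property
    def has_drift(self) -> bool:
        return bool(self.added or self.removed)


def diff_snapshots(previous: Set[Edge], current: Set[Edge]) -> DriftReport:
    """Compare two dependency snapshots and list added and removed edges."""
    return DriftReport(
        added=sorted(current - previous),
        removed=sorted(previous - current),
    )


def format_report(report: DriftReport) -> str:
    """Render the report as diff-style text suitable for a review comment."""
    lines = [f"+ {src} -> {dst}" for src, dst in report.added]
    lines += [f"- {src} -> {dst}" for src, dst in report.removed]
    return "\n".join(lines) if lines else "no architectural changes"


# Hypothetical snapshots taken at the last release and at the current commit.
last_release: Set[Edge] = {("checkout", "payments"), ("checkout", "catalog")}
current_commit: Set[Edge] = {("checkout", "payments"), ("checkout", "inventory-db")}

if __name__ == "__main__":
    print(format_report(diff_snapshots(last_release, current_commit)))
```

Run on every commit, a check like this turns silent drift into an explicit decision: each new dependency is either approved and recorded, or reverted.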

A well-defined and enforced architecture enhances a system’s testability, documentation, and extensibility. Such attributes contribute to increased development velocity, quicker onboarding of new developers, and greater agility, allowing organizations to respond more swiftly to market demands.

In conclusion, managing system architecture drift is a vital aspect of software development and it’s highly dependent on how an engineering org approaches system design.

It necessitates vigilant and proactive management and requires teams to have the right tooling so they can “see around corners.” By understanding its nature, causes, and impacts, and by implementing effective control measures, teams can maintain a stable and adaptable architecture.

Top comments (8)

Eckehard

System architecture is the result of bad experience. With every project your experience grows, so it is no wonder architecture is evolving.

For a young developer with very little experience, the shifts will probably be larger than for an experienced one. But some day you may find that times have changed and your "do it again, Sam"-experience is not useful anymore. Then it is time to go shifting...

Vladi Stevanovic

Good point. The experience of the engineering team building the system is a factor (among the others) that affects the degree to which they will experience drift.

But I also think that it's very difficult to account for every future requirement, technology shift, etc. and so system evolution is inevitable regardless of experience. 😊

Steven Read

Completely agree - sometimes this can even be architectural infeasibility, when skillsets or culture are not up to the planned solution design, and something needs to give.

One technique that can help is to use architecture decision records. You can even start using them to record previous decisions - @tekiegirl wrote about that recently. User stories and other written-up collaborative techniques can help too, but tend to focus on the user-visible aspects, as opposed to the why and what for from the system's perspective. They can improve the decision due to consultation and discussion but are a poor place to store your architectural designs!

Vladi Stevanovic

That's a great comment - I'm a huge fan of ADRs! Recording why you took a specific system decision and what trade offs you considered (and accepted) is critical when looking to evolve the system.

However, I think ADRs are affected by similar issues that docs suffer from: they are not always systematically created during the development process, they are mostly manual, potentially fragmented across multiple sources, and require lots of maintenance. That's all to say that alone they don't solve the underlying problem 😊

mortylen

Good article, thank you.

Vladi Stevanovic

Thank you for checking it out!
