Recognizing and addressing drift is not just about maintaining the system’s current state but about safeguarding its future agility and robustness.
This is particularly evident when looking at real-life examples of architectural drift and its consequences. For example:
Linux Kernel
A research group compared the architecture documentation of the Linux kernel's main subsystems with the source code. After manually reverse engineering the actual architecture from the code, they found significant architectural drift and erosion: there were a number of violations between the prescriptive, documented architecture and the architecture as implemented.
Besides the surprisingly high number of (unnecessary) dependencies between the components, what is most striking is that interviews with the developers surfaced that they were unaware of the architectural degradation. A common reason given was “it had to be done fast, and I didn’t have time to go back and update the documentation”.
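The kind of comparison described above, documented architecture versus the dependencies found in the code, can be illustrated with a minimal sketch. This is not the researchers' method or tooling; the component names and rules below are hypothetical, and in practice the actual dependencies would be extracted from the source code rather than written by hand.

```python
# Hypothetical documented ("prescriptive") architecture:
# each component maps to the components it is allowed to depend on.
ALLOWED = {
    "ui": {"services"},
    "services": {"storage"},
    "storage": set(),
}

# Hypothetical dependencies recovered from the code ("actual" architecture).
ACTUAL = [
    ("ui", "services"),
    ("services", "storage"),
    ("ui", "storage"),        # bypasses the service layer
    ("storage", "services"),  # lower layer depends on a higher one
]


def find_violations(allowed, actual):
    """Return dependencies present in the code but absent from the documentation."""
    return sorted(
        (src, dst)
        for src, dst in set(actual)
        if dst not in allowed.get(src, set())
    )


if __name__ == "__main__":
    for src, dst in find_violations(ALLOWED, ACTUAL):
        print(f"undocumented dependency: {src} -> {dst}")
```

Running a check like this on every build is one way to make drift visible while it is still small, instead of discovering it through a manual reverse-engineering effort.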
X (formerly Twitter)
At the beginning of 2023, X (formerly Twitter) encountered severe issues and service disruptions as a direct consequence of sweeping layoffs that removed significant portions of the engineering teams that designed and built this massive social network.
Elon Musk himself admits that the X system architecture is massively complex:
“The code base is like a Rube Goldberg machine, and when you zoom in on one part of the Rube Goldberg machine, there’s another Rube Goldberg machine, and then there’s another one, so it’s quite difficult to keep this thing running, and then also difficult to advance the product because it is really overly complex, to say the least.” (Elon Musk at the 2023 Morgan Stanley TMT Conference)
And there is convincing evidence that high architectural complexity is linked to a much higher defect rate, as well as to lower productivity and poorer understanding of the system.
However, the timing and nature of these disruptions point to additional underlying issues: the layoffs not only reduced manpower but likely led to a loss of critical institutional knowledge about the system’s architecture. This gap in recorded system information, combined with the existing architectural drift, compounded the challenges the remaining team faced in diagnosing and resolving the service disruptions.
There are countless examples of real-life issues caused by architecture degradation, highlighting the importance of proactively addressing architectural drift. They range from the Hadoop developers not realizing that 61 out of the 67 components in the system had circular dependencies, to the Knight Capital Group being pushed to the brink of bankruptcy after losing roughly $440 million in 45 minutes, (partially) because code that was thought to be “dead” was still deployed and was triggered in production.
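Circular dependencies like the ones mentioned for Hadoop are straightforward to detect once the dependency graph is available. The sketch below is a minimal illustration, assuming the graph is given as a plain dictionary; the component names are hypothetical and the approach (checking whether each component can reach itself) favors clarity over speed.

```python
def reachable(graph, start):
    """Return every component reachable from `start` via one or more dependencies."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def components_in_cycles(graph):
    """Return the components that take part in at least one dependency cycle."""
    return sorted(node for node in graph if node in reachable(graph, node))


# Hypothetical dependency graph: component -> components it depends on.
EXAMPLE_GRAPH = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],  # closes the cycle a -> b -> c -> a
    "d": ["a"],  # depends on a cycle but is not part of one
    "e": [],
}

if __name__ == "__main__":
    print(components_in_cycles(EXAMPLE_GRAPH))
```

For large systems, an algorithm based on strongly connected components (such as Tarjan's) would scale better, but the idea is the same: a component that can reach itself through its dependencies is part of a cycle.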
What's Next
For more about system architecture drift, check out these articles:
- 101 on Architectural Technical Debt
- [Part 1] Delving into Architectural Drift
- [Part 3] The Temptation to Ignore Architecture Drift
- [Part 4] Effective Measures to Control System Architecture Drift
Thank you for reading! 💜