
Nicolas Moutschen

Breaking changes: handling dependencies and evolving APIs

One of the key tenets of software engineering is not to repeat ourselves. Violations of the DRY (Don't Repeat Yourself) principle are sometimes called WET, for Write Everything Twice, or Waste Everyone's Time. We therefore strive to build libraries, functions, classes, modules, etc. where a given piece of logic is centralized and can be reused in multiple places.

When it comes to complex and highly reusable logic, this is a very good thing. If you had to spend a few days figuring out an algorithm that is needed throughout a codebase or across multiple projects, having a reusable artifact is very powerful. It also means there is a single, coherent implementation of that algorithm, rather than the wheel being reinvented in slightly different ways every time you need it.

As simple as rounding

illustration that 26.5 can be rounded to either 26 or 27

Something as simple as rounding to an integer can be solved in many different but equally correct ways. In a financial application, you could decide to round half-values based on what is most advantageous to you: round downwards for amounts you owe and round upwards for amounts owed to you. While these are small differences, they would have a significant impact on a large number of transactions.

While morally dubious, this does not cause any problem for the system itself. So let's imagine an application where a payment component rounds the amount to be paid down, while an accounting component rounds it up. If this happens a thousand times per day, with amounts in euros, it adds up to a discrepancy of 3,650 euros per year.

Accounting component rounding up, while the payment component is rounding down
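Python's decimal module makes the divergence easy to reproduce. The sketch below is only illustrative: the one-in-a-hundred share of amounts landing exactly on a half euro is an assumption made here to arrive at the yearly figure above, not something stated in the scenario.

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_UP

amount = Decimal("26.50")

# Payment component: half values are rounded down.
paid = amount.quantize(Decimal("1"), rounding=ROUND_HALF_DOWN)   # 26
# Accounting component: half values are rounded up.
booked = amount.quantize(Decimal("1"), rounding=ROUND_HALF_UP)   # 27

print(booked - paid)  # 1 euro of discrepancy on this single transaction

# Rough yearly impact: 1,000 transactions per day, of which roughly
# 1 in 100 lands exactly on a half euro (assumption for illustration).
print(1000 / 100 * 365)  # 3650.0
```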

In this scenario, the advantages of having a shared rounding component are rather obvious: we can argue once about which rounding method to use, document our decision, and build a library or function that everyone can use. If asked by the local financial authority, we can explain why we chose that algorithm and show that it's consistent across the application.

Accounting and the payment component using a shared rounding function
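In code, the shared piece could be as small as a single function that both components import. This is a minimal sketch; the name round_amount and the choice of ROUND_HALF_UP are placeholders, not a recommendation:

```python
from decimal import Decimal, ROUND_HALF_UP

def round_amount(amount: Decimal) -> Decimal:
    """The single, documented rounding rule shared by every component."""
    return amount.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
```

Both the payment and the accounting components call this one function, so there is exactly one implementation to document, audit, and defend.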

However, let's say that we need to change that rounding algorithm, for example because of a change in regulation. In a monolithic application, we change the function's code, rebuild the application, deploy it, and we're done. Even if we need to add a new parameter (let's say we now round based on whether the day of the transaction is odd or even), we adjust every piece of code that calls this function and deploy the change. In this scenario, our function's API has a single consumer: the monolithic application itself. Both are deployed at the same time. That tight coupling between our function and its consumer lets us make breaking changes with almost no impact.
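As a sketch of that breaking change, assuming the odd/even rule from the example above (the function and its signature are invented for illustration):

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_UP

def round_amount(amount: Decimal, transaction_date: date) -> Decimal:
    """Breaking change: a transaction date is now required.

    Half values round up on even days of the month and down on odd days.
    """
    rounding = ROUND_HALF_UP if transaction_date.day % 2 == 0 else ROUND_HALF_DOWN
    return amount.quantize(Decimal("1"), rounding=rounding)

# In a monolith, every call site is updated in the same commit and shipped
# in the same deployment, so nothing ever runs against the old signature.
total = round_amount(Decimal("26.50"), date.today())
```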

Changes in a distributed application

Modern applications are often made of different components (services, microservices, etc.) running on different systems. One of the key benefits of this type of architecture is that each component can be updated independently of the others. Here, our rounding function would probably live in a library that every microservice needing it pulls in as a dependency.

Rounding library as a dependency of the accounting and payment components

If we run into the same requirement, where we need to change our rounding logic for the entire application, we have now lost the ability to change everything at once. The payment component will be its own microservice, the accounting component another one, or each of them could be made up of multiple microservices. A single one of those microservices could also be running on multiple machines at the same time.

This presents us with two issues: a single component could be in a partially updated state, and different components could be updated at different times. The change is no longer atomic. Yet we might still be required to make it atomically.

Accounting component running the library v.2 while the payment component still runs v.1

We switched from a model where we had a single consumer that was deployed at the same time as our function's API to a model with multiple consumers that control their own update cycle. We could build a microservice with a REST API just for rounding numbers, but this would still cause issues if we need to make a breaking change to the request (e.g. now requiring a transaction date).
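To make that concrete, here is a rough sketch of such a request before and after the breaking change; the endpoint and payloads are invented for illustration:

```python
import requests

# v1 of a hypothetical rounding service: the amount is all it needs.
requests.post("https://rounding.internal/round", json={"amount": "26.50"})

# v2 introduces a breaking change: a transaction date is now mandatory,
# so every consumer has to update its requests before it can move over.
requests.post(
    "https://rounding.internal/v2/round",
    json={"amount": "26.50", "transaction_date": "2021-03-15"},
)
```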

In real-life scenarios, atomic changes across an entire distributed system are sometimes necessary, but they are fairly rare, and we should strive to design our systems so that parts likely to change together are grouped together.

One step bigger

Let's make our rounding library open-source and let's say that tens of thousands of applications now depend on it. We started with a single consumer, then had a handful of them, and now we have such a large number that we cannot know who is using it anymore or reach out to them in case of any breaking changes.

Rounding library with lots of users depending on it

We still need it for our financial application, and we might still need to make changes based on business or regulatory requirements. We might add a warning that "this library should only be used for financial rounding in country X", but what mechanisms do we have to enforce that it is only used in that scenario? If we make a change that fits our business case, it could impact others that don't have the same requirements and create discrepancies in the applications that consume our library.

Semantic versioning solves most of the issues around this. If our library now requires a date to round a number, that should be a new major version. However, some changes are harder to categorize. Is changing the rounding method for half values a bug fix or a breaking change? What if the change is there because our library was not compliant with country X's financial regulations? Since we created the library for that exact purpose, it makes sense to treat it as a fix, but how many consumers would share that view? What about a performance improvement that is 10% faster 99% of the time, but ten times slower the remaining 1%? What if that 1% is the normal case for a consumer?
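On the consumer side, semantic versioning only protects you if your version constraints actually refuse the next major release. A minimal sketch using the packaging library, with a hypothetical rounding-lib package and made-up version numbers:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Accept patch and minor releases of a hypothetical "rounding-lib", but
# refuse the next major version, which is allowed to break the API
# (for example by requiring a transaction date).
compatible = SpecifierSet(">=1.4,<2.0")

print(Version("1.4.3") in compatible)  # True  - bug fix, picked up automatically
print(Version("1.5.0") in compatible)  # True  - new feature, still compatible
print(Version("2.0.0") in compatible)  # False - breaking change, opted into explicitly
```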

When we add dependencies

Adding new dependencies to a project often simplifies our lives: we do not need to reinvent the wheel, and we shouldn't when other people have done the hard work and offered it as a library or a REST API for us to use.

However, adding a new dependency always carries risks, and they should be weighed against what we gain from using it. What is its purpose? Does that purpose match my reason for using it? Who will maintain it? What happens if I find an issue? Dependencies on microservices bring additional risks, such as added latency and reduced reliability.

The code we write is a liability because we are responsible for its bugs and vulnerabilities. In the same way, the code and systems we depend on are also liabilities.

While the way we round numbers is a trivial implementation detail for most systems, it can be crucial for some, and many industries have specific requirements around things that are mere implementation details elsewhere. When we have such requirements, doesn't it make more sense to own the implementation, even if that means violating the DRY principle?

Should we have dependencies?

If we think about it in absolute terms, a dependency could mean a piece of code, an operating system, a hardware architecture, etc. If I am using a specific CPU to run my application and that CPU has a bug, this is a liability that I have to take into account. The liability is still there, but it would be absurd to build our own processors and design our own architecture every time we need to build a new system.

A project depending on a specific CPU architecture, OS, language, libraries, etc.

We are standing on the shoulders of giants. We have access to the work from all those years and all those people who built the systems, languages, libraries, protocols, etc. we use today. It would be foolish to reject all that.

However, it would be just as foolish to trust everyone blindly and to assume that other people's intentions align with ours. We have seen sophisticated attacks where a repository was taken over and the project's original intent subverted into something more nefarious. We can also depend on tools whose intent, however well-meaning, simply differs from ours.

If dependencies are liabilities, we need to consider more than just their benefits to us as reusable components: how they are going to evolve, how the maintainers' plans align with ours, and so on. It also means we might need to fork a project for our specific purposes, reimplement the features we need, or otherwise take ownership as we see fit.

This also means that not everything we build can be made into reusable components, even within the same codebase. After all, if we open-source a crucial library from a project, we now have to deal with how other people use it, irrespective of our original intent. Its evolution and use might go further than our original plans and end up diverging from our actual needs.
