Warren Parad for Authress Engineering Blog

Posted on Aug 5, 2022 • Originally published at dev.to on Aug 5, 2022

Breaking up the monolith: Breaking changes

#rest #microservices #monoliths #api

Before we get into how to handle a breaking change, we should first identify what a breaking change even is.

What is a breaking change

A breaking change is:
anything that causes a hypothetical client of your service, using the service in any way, to start behaving differently.

That’s a broad statement, but it’s true. Even if you don’t make changes to the API, if you change the expectations around how endpoints work, it will break clients. Therefore it is a breaking change. It’s also important to realize that you might not know how every client is using your API, so whether or not there is a real client you can point to is irrelevant.

Some examples of breaking changes might be:

API interface property type is changed (from int to string for instance)
Size of the data property is changed (from 3 to 4 characters)
Returning an additional enum value in an enum property where you didn’t first explain to the clients that this list can be expanded. The general recommendation is that this isn’t a breaking change. But remember it doesn’t matter if you think it is, it matters if your clients do.
If you return an inconsistent or different error code or response status code. Returning a 400 instead of a 404 can be considered a breaking change. 404 means something, it’s possible that the 404 was a bug, and the resource really existed. So sometimes making a breaking change is a good thing.
Allowing the schema type of a property to be different in different circumstances, i.e. returning an int or a string. Just don’t do this. While it is possible to document the union types, it’s a huge headache for development teams to deal with.
Requiring a previously optional property or requiring a new header to continue having previous functionality. Clients not sending the header or the property will now get a 400 or 422 back on their response, instead of the previous 2xx.
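To make the enum case concrete, here is a minimal sketch of a hypothetical client that was written against a documented list of two values. The status values and the describeStatus function are assumptions invented for this illustration, not part of any real API.

```typescript
// Hypothetical client code: the status values and function name are
// assumptions made up for this illustration.
type KnownStatus = "active" | "suspended";

function describeStatus(status: string): string {
  switch (status as KnownStatus) {
    case "active":
      return "Account is active";
    case "suspended":
      return "Account is suspended";
    default:
      // The client author never expected to get here, because the
      // documentation listed only two values.
      throw new Error(`Unexpected status: ${status}`);
  }
}

// Works while the service only returns the documented values.
const before = describeStatus("active");

// The service starts returning an additional value, and this client throws.
let brokeOnNewValue = false;
try {
  describeStatus("archived");
} catch {
  brokeOnNewValue = true;
}
```

Whether the client "should" have had a fallback branch is beside the point. It didn't, so from its side the new value is a breaking change.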

How to handle breaking changes

When we run a service, either a UI or an API, that service has endpoints or URLs that point to representations of resources. When we change the schema/interface/expectations around how that endpoint works, we are introducing what is known as a breaking change.

One example of a breaking change is changing a property in the response from type int to type string. The reason it is breaking is that this change can cause a client of the API to incorrectly parse the response.

If the client has code that says:

if (response.property * 10 < 100) { doSomething(); }

Then, depending on the client’s language, the new type of property might cause a runtime exception or, worse, no exception but improper handling of the result.
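Here is a small sketch of the "no exception" case in TypeScript. The DemoResponse type and the shouldDoSomething function are hypothetical names for this illustration; only property comes from the example above.

```typescript
// Hypothetical client-side type: this is what the documentation promised.
interface DemoResponse {
  property: number;
}

function shouldDoSomething(response: DemoResponse): boolean {
  // Same check as the client code above.
  return response.property * 10 < 100;
}

// JSON.parse returns `any`, so the compiler cannot notice that the
// server changed the type of `property` underneath the client.
const oldBody: DemoResponse = JSON.parse('{"property": 5}');
const newBody: DemoResponse = JSON.parse('{"property": "deadc0de"}');

// 5 * 10 is 50, which is less than 100.
const oldResult = shouldDoSomething(oldBody);

// "deadc0de" * 10 is NaN at runtime, and NaN < 100 is false.
// Nothing throws; the client just silently stops doing something.
const newResult = shouldDoSomething(newBody);
```

A statically typed client that deserializes strictly would more likely throw at parse time instead, which is the other failure mode.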

There’s obviously a need to introduce the string version of property, we don’t have to care where the need came from, but one example could be, we ran out of numbers. Transactional data runs into this problem all the time, and converting from a sequential int to a guid is one way to help.

Note: A general solution to this problem is that all identifiers must always be strings; never make an identifier an integer.

That is the right design, but we may have already made the mistake, and the fact that it is obvious in hindsight doesn’t help us.

So what do we do?

Versioning endpoints

One solution is to prefix all your endpoints (or use a header or query parameter) to tell the service which version of an endpoint to use. Let’s define what we mean by versioning an endpoint. Versioning an endpoint is not “running multiple versions of the service at the same time”. It means adding an indicator to the endpoint so that callers can select which version they want. While this can be done, it isn’t a concept that is actually put into practice. Here we will see why.

For instance we might have:

GET /v1/demo, and now we’ll introduce GET /v2/demo, where v1 returns an integer in the property field and v2 returns a string.

This works, but it is a terrible idea. The reason is that clients that want the new functionality have to find and update their code to reference the new version of the endpoint. Another reason is that you might have multiple changes in progress at one time. Does the v3 endpoint contain two changes at once? What if you have three changes? How does that even work?

Another core problem here is that it just isn’t RESTful. While you might have something to say about whether REST is important, we should agree at the very least that this is a true statement. Having two different endpoints means that the resources at these endpoints should be different. Further what happens when you actually create a v2 resource and want to make the property be deadc0de? This resource now cannot be returned on the v1 endpoint, because v1 only understands int not the string value this property is.
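The deadc0de problem can be sketched as follows. This assumes the service stores the value as a string once v2 exists; StoredDemo and toV1Response are hypothetical names for the illustration.

```typescript
// Hypothetical storage record: once v2 exists, values are stored as strings.
interface StoredDemo {
  property: string;
}

// Sketch of what a v1 handler has to do to keep returning an integer.
function toV1Response(record: StoredDemo): { property: number } {
  if (!/^\d+$/.test(record.property)) {
    // There is no integer that honestly represents this value.
    throw new Error(`"${record.property}" cannot be represented on the v1 endpoint`);
  }
  return { property: Number(record.property) };
}

// A resource created before the change still maps cleanly to v1.
const legacy = toV1Response({ property: "42" });

// A resource created through v2 has no valid v1 representation.
let v1Failed = false;
try {
  toV1Response({ property: "deadc0de" });
} catch {
  v1Failed = true;
}
```

So even with both endpoints running, v1 clients end up with resources they can't read, which means v1 is broken for them anyway.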

Still further issues include the increased complexity for clients that don’t care about this change, but care about dependent changes. They want to still use the v1 endpoint because they need int for right now, but have a critical change they need that you’ve released in v3. They can’t get it until they take the string upgrade.

Not to mention the maintenance burden on the service side to now keep track of multiple endpoints. And even if we clean up in the end, we’ll have the problem that we’ve got an endpoint on v2.

The last problem is visibility. Along with the complexity, we might not even have a way to solve the problem if we release an SDK. The SDK has hard-coded the v1 endpoint, and it would be a mess if we had to introduce duplicate DTOs every time we wanted to make a small change. Not to mention the nightmare later, seeing as we’ll have the exact same problem there. Having breaking changes in a library just moves the problem. And worse, it moves the problem to every library you maintain.

Further issues

The issue is compounded even further if we have multiple endpoints. What happens if we have two endpoints:

GET /v1/resource and POST /v1/resource. And let’s say that property is a write-only value that is only used in the POST. Now we have a huge discrepancy if we roll the v1 POST to a v2. If it isn’t obvious, think about what happens when a different change is necessary only to the GET v1 endpoint. The v2 for the GET now has a totally different meaning than the v2 for the POST. A client updating their code won’t know the semantic meaning of v2 and doesn’t know that it means something different. This is creating a pit of failure.

There’s a joke here: how many Haskell programmers does it take to change a light bulb?

One, but you have to change the whole house.

You can of course go to the extreme of releasing a new version of the whole service with everything identical, and running both versions at the same time. When existing clients migrate to the new version, you can shut down the old version.

Please don’t do this, some clients will never migrate, and you will be stuck with duplicate consumed resources until the end of time. For very small early services, you are better off just breaking your clients.

One more example

Twitter is a service, you can use it, and they release new features all the time. Surely there are breaking changes, but when you want to go to twitter, you go to https://twitter.com you don’t go to https://v2.twitter.com. Even without doing so you still get new features automatically, there might even be breaking changes.

“Now, now Warren, that’s not the same.”

Okay, but bear with me. Even though the UI is a service, you never need to go to a different url to get new functionality even when the UX breaks your experience. Yes, twitter breaks your experience all the time. But it doesn’t break your client, not the best example, so let’s dive in.

If we look at all the apis that are released in the world, the number of endpoint version changes that exist is minuscule and almost zero. It’s so small, that I’ve found telling my teams it’s better to not stick the v in the endpoint url at all. If you need a different resource for some reason, just create a different endpoint. If it is the same resource, then update the endpoint, but don’t break it.

Go for it, go find a public API out there that has versioning and versions the API regularly. It doesn’t exist. Even GCP’s APIs are mostly on v2. And this comes from a company that frequently deprecates things before they are released. Adding support for versioning in endpoints is over-engineering. Here’s Twitter’s v2 API, and Twitter has been around as a company since 2006. When your service is 16 years old, I’ll be only slightly disappointed if you release a v2 :).

We don’t need to version endpoints

So, what’s the solution?

If replacing the whole service is on one side of the spectrum, what’s on the other side?

Is it regret that we have an int for the rest of time?

I hope not.

Instead, what we can do is add a new property: propertyString, propertyV2, propertyAdvanced, or propertyOtherThing.

This is really easy. It doesn’t solve every problem, but you can return the new value in this new property and leave the current one alone. In rare cases, such as when we created an issue with the primary key of the resource, this won’t work, and creating a new resource/endpoint might be the only solution. This is an edge case, but it does happen. Rather than come up with “one solution to rule them all”, we would rather have a better solution to 99% of the problems and an okay solution to the 1%.
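As a rough sketch of the additive approach, assume the service keeps both values for a resource. DemoRecord and toResponse are hypothetical names; only property and propertyString come from the text.

```typescript
// Hypothetical stored record that keeps both values.
interface DemoRecord {
  property: number;       // the existing integer value, left alone
  propertyString: string; // the new string value
}

// Sketch of the response mapping for the single, unversioned endpoint.
function toResponse(record: DemoRecord): { property: number; propertyString: string } {
  return {
    property: record.property,             // unchanged for existing clients
    propertyString: record.propertyString, // new clients read this instead
  };
}

const body = toResponse({ property: 7, propertyString: "deadc0de" });

// An old client keeps running the same check against the same field.
const oldClientCheck = body.property * 10 < 100;

// A new client opts in to the string value whenever it is ready.
const newClientValue = body.propertyString;
```

Both clients call the same endpoint, and neither has to know the other exists.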

Later, after all the SDKs are updated and the clients are using the new property, we can delete property. Even better, we could delete the property from our documentation but leave the property available. There’s almost no reason to delete it: the cost of keeping it is very small, and in most cases the maintenance is trivial.

Adding a new property is no different than adding a feature. The only time you’ll need to do something special is when you go to delete the property. So treat it like everything else until then, a new separate property, and just don’t break the clients.

The interesting opportunities

Situation 1: Just don’t change the damn thing.

Leave it the way it is, I know you hate it, but honestly the work to change it isn’t worth the change. Sure you could make some updates, let clients choose how they want to call your service, and return the new updates. But if it is merely semantic, get over it. If you want to be a good engineer, focus on the business impact not whether or not you are unhappy with it being an int.

Situation 2: You coupled your DB to your API

There are some frameworks that I consider atrocious, never should have been created and the software community is worse off for having them. I’m not going to enumerate the list, but it comes down to anything that makes it easy to couple your DB schema to your API interface. (And don’t get me started on monolithic technologies that let you couple your DB to your UI presentation logic). Things like GraphQL can be good solutions to specific problems, but are often abused by inexperienced engineers to do exactly this.

The critical thing to do here, is abstract your DB schema from the interface. Your clients don’t care about the DB schema, they care about the service interface. In the cases you need to make DB changes, if you can’t do so in a way that doesn’t cause a breaking change in your API, first separate these two. Create a serialization layer, an abstraction layer, an auto-mapper, a schemaless DB NoSQL solution, etc… It doesn’t really matter how you do it, as long as you do it. You will absolutely need to change your DB at some point, and you can’t let the rigidity of your API prevent you from doing so.
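A serialization layer can be as small as one mapping function per storage shape. The sketch below is an illustration only; the row shapes, column names, and function names are all invented.

```typescript
// The API contract that clients depend on. This is what must not change.
interface AccountResponse {
  accountId: string;
  name: string;
  status: "active" | "deleted";
}

// Hypothetical original database row.
interface AccountRow {
  account_id: number;
  display_nm: string;
  is_deleted: boolean;
}

// Hypothetical row after a schema migration.
interface AccountRowV2 {
  id: string;
  profile: { displayName: string };
  deleted_at: string | null;
}

// Mapping for the original schema.
function fromRow(row: AccountRow): AccountResponse {
  return {
    accountId: String(row.account_id),
    name: row.display_nm,
    status: row.is_deleted ? "deleted" : "active",
  };
}

// Mapping for the migrated schema. Only this layer had to change.
function fromRowV2(row: AccountRowV2): AccountResponse {
  return {
    accountId: row.id,
    name: row.profile.displayName,
    status: row.deleted_at === null ? "active" : "deleted",
  };
}

const beforeMigration = fromRow({ account_id: 12, display_nm: "Ada", is_deleted: false });
const afterMigration = fromRowV2({ id: "12", profile: { displayName: "Ada" }, deleted_at: null });
```

The same account serializes to the same response before and after the migration, so clients never see the DB change.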

Situation 3: Security issues found!

It happens, you find a security vulnerability, some property in the API is either exposing data it shouldn’t be, exposing data to whom it shouldn’t be, or just not working correctly. You can’t go on exposing it, and so that required property you have is going to become optional, there’s no way around that.

However, changing a property from required to optional in a response body is a breaking change, and clients depending on a non-empty value will break. But there is really nothing you can do here, other than eat the vulnerability. There is no better time to remain security compliant than in the face of breaking clients. I know something about security in APIs, since I have designed these for many companies. A lot of clients would rather have their business fail than make improvements, so sometimes you have to bite the bullet and let these service clients start throwing exceptions. But if you do, please communicate that you are doing this.

Situation 4: Deprecation

In the case you really, really need to change the schema for non-security reasons, communicate it. Even if you do need to change the schema for security reasons, communicate it. There’s no reason not to always communicate.

In the case of a service endpoint or service resources that you don’t need any more, either because they aren’t a business competency or because the cost of management is too high for any reason, remove them.

You want to remove them, and some of your clients might even want you to remove them. If your documentation is confusing, or your service is confusing, just delete the endpoints.

However you don’t want to break clients. So instead, come out with a deprecation plan. The best deprecation plans are between 6 months and 1 year, where you commit to turning off that endpoint. The trouble is even with all of this, clients will wait until that last email before telling you they can’t migrate. You can certainly try to avoid this, but that day is coming, and they are still using your legacy thing.
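One way to keep the turn-off date in front of clients on every call, not just in emails, is the standard Sunset response header (RFC 8594). Using it here is an assumption for the sake of illustration rather than something this plan requires, and the function names and dates below are invented.

```typescript
// Builds the extra response headers for a deprecated endpoint.
// Sunset carries an HTTP-date, which Date.prototype.toUTCString produces.
function deprecationHeaders(turnOffDate: Date): { Sunset: string } {
  return { Sunset: turnOffDate.toUTCString() };
}

// Decides whether the deprecated endpoint should still answer requests.
function isStillServed(now: Date, turnOffDate: Date): boolean {
  return now.getTime() < turnOffDate.getTime();
}

// Hypothetical plan: announce today, turn the endpoint off one year later.
const announced = new Date(Date.UTC(2022, 7, 5)); // months are zero-based: 7 is August
const turnOff = new Date(Date.UTC(2023, 7, 5));

const headers = deprecationHeaders(turnOff);
const servedAtAnnouncement = isStillServed(announced, turnOff);
const servedAtTurnOff = isStillServed(turnOff, turnOff);
```

This doesn't stop anyone from waiting until the last email, but it does mean the date was in every response they received.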

The Conclusion

Just don’t make breaking changes to interfaces. Learn from the mistakes you made in your past, and deal with unnecessary extra properties by removing documentation for old features and focusing on the new ones. Take the lessons and build better services, because the cost and headache of trying to fix it later is so high. If you are okay with just breaking it for your clients, then just do it; trying to go around this problem is a waste of time and resources. Otherwise, avoid the breaking change, either by adding new properties or by living with the properties you have.

Some Quick Advice

Use strings for all fields that don’t represent numbers. If the property is a number, then int is fine; if that property isn’t a number, please don’t use an int.
All Ids should be strings
Do not have mutually exclusive boolean properties such as is_active and is_deleted; use status enums instead.
Use Objects instead of properties. Instead of otherResourceId use: otherResource: { id: '' } , then you can add additional properties later.
Prefer Arrays to single elements, so when that thing expands you can be prepared to add additional objects to that object. It’s much easier to have an Array of a single object, than it is to explain when you have both a property called thing and another property called thingList.
Resources usually don’t have versions, but resources can point to versions of other things. Audit trails and changelogs are a different thing.
Before naming a property thingV2 try to come up with a more descriptive name, such as thingAdvanced or thingWithExtraStuff
Don’t have a type property at the top level. If you have two types called a and b then instead have a property bag called { a:{}, b: {} } where you can store the properties specific to each of those types.
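Put together, a response shape that follows most of the advice above might look like the sketch below. The resource and every field name in it are hypothetical and chosen only to illustrate the points.

```typescript
// Hypothetical resource shape; all names are invented for illustration.
interface OrderResponse {
  id: string;                       // ids are strings, even if they look numeric today
  status: "active" | "deleted";     // one status enum instead of is_active / is_deleted
  customer: { id: string };         // an object instead of customerId, so it can grow
  items: Array<{ id: string }>;     // an array, even if there is only one item today
  payment: {                        // property bag instead of a top-level `type`
    card?: { last4: string };       // not a number you do math on, so it is a string
    invoice?: { dueDate: string };
  };
}

const example: OrderResponse = {
  id: "1001",
  status: "active",
  customer: { id: "c-42" },
  items: [{ id: "i-1" }],
  payment: { card: { last4: "0042" } },
};

// Later additions, such as customer.name or a second entry in items,
// fit into this shape without changing any existing field.
const firstItemId = example.items.map((item) => item.id)[0];
```

Note that "0042" would lose its leading zeros as an int, which is one more reason to keep non-numeric values as strings.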
