Originally posted on iamevan.me
To increase the reliability of deploys and safely deploy broad-reaching changes, we need to change one part of our engineering culture: how we plan and ship changes.
The Problem
Often, when someone works on changes that span multiple services, they think of it as a separate Pull Request for every project. Then, when it comes to deploy day, there’s a concern: We want to make a change to X but Y also needs that change to work - how do we deploy these at the same time?
We don’t.
No matter how much of our time or how many of our brilliant engineers we dedicate to this topic, there’s no way to guarantee that changes to two projects take effect at exactly the same time. There will always be, at minimum, valuable seconds (and hundreds of requests) between one service starting up with the new version and the other doing the same.
The Fix
To fix this, we need to stop thinking about our changes as just per-feature and start thinking about them as per-phase, per-feature. What I mean is, there are actually two phases to every feature’s deployment:
- The Migration: We add or modify behaviour while retaining backwards-compatibility with what currently exists.
- The Cleanup: We remove the old behaviour and only keep the backwards-*in*compatible change.
Example One: Databases
Databases are one place where this concept is used extensively. When you make a change to an existing database schema, your app needs to temporarily support both versions of the schema until the change is complete.
The Scenario:
Our customers database needs to merge two columns, first_name and surname, into one column, full_name, as some places don’t have the concept of first/last names.
The Method:
Firstly, we need to understand that two services need a change: our database and our application. We need to change the schema of our database tables and change our application to write those new values to the right column.
For our database:
- We create a PR: [DB1] that adds a new full_name optional column and changes the first_name and surname columns to be optional (by accepting a null or empty value).
- We create a PR: [DB2] that removes first_name and surname, and changes the full_name column to be mandatory (by requiring a non-empty value).
For our application:
- We create a PR: [APP1] that removes the first_name and surname columns from the data it writes, and adds the full_name column to the data.
Then, we deploy them in this order:
- [DB1] - all columns exist and our app is still writing to the first_name and surname columns.
- [APP1] - our application stops writing to the first_name and surname columns and starts writing to the full_name column instead.
- [DB2] - the app is writing to the new column, so we can remove the old columns now.
By splitting our database changes into a migration step and a cleanup step, we were able to deploy our changes without having to worry about whether the app was writing to the old or new columns.
Another benefit of this is that we can have any amount of time between merging the PRs as long as they’re in the above order.
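As a rough illustration of the ordering above, here is a small Python sketch. It is not a real migration: the SCHEMAS table, the "DB0" label for the starting schema, and the helper names are all hypothetical, and it ignores existing rows (which would need full_name populated before [DB2] could make that column mandatory). It only models which writes each schema phase would accept.

```python
# Hypothetical model of each schema phase: which columns exist and which
# of them must hold a non-empty value. "DB0" is an assumed label for the
# schema before any PR is deployed.
SCHEMAS = {
    "DB0": {"columns": {"first_name", "surname"},
            "required": {"first_name", "surname"}},
    "DB1": {"columns": {"first_name", "surname", "full_name"},
            "required": set()},
    "DB2": {"columns": {"full_name"},
            "required": {"full_name"}},
}


def old_app_row(first_name, surname):
    """Row written by the app before [APP1]."""
    return {"first_name": first_name, "surname": surname}


def new_app_row(full_name):
    """Row written by the app after [APP1]."""
    return {"full_name": full_name}


def accepts(schema_name, row):
    """Return True if a write of `row` would succeed under the schema."""
    schema = SCHEMAS[schema_name]
    unknown = set(row) - schema["columns"]
    missing = {col for col in schema["required"] if not row.get(col)}
    return not unknown and not missing


# Walk the deployment order: each entry pairs the live schema with the
# row written by whichever app version is live at that point.
ROLLOUT = [
    ("DB1", old_app_row("Ada", "Lovelace")),  # after [DB1]
    ("DB1", new_app_row("Ada Lovelace")),     # after [APP1]
    ("DB2", new_app_row("Ada Lovelace")),     # after [DB2]
]
rollout_ok = all(accepts(schema, row) for schema, row in ROLLOUT)
```

In this model every step of the rollout accepts the live app’s writes, whereas pairing the old app with DB2, or the new app with DB0, is rejected. That is the gap the migration step exists to cover.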
Example Two: Dependent Applications
The Scenario:
We have two applications, Auth and App. We’re going to be changing from usernames to email addresses for our authentication and need to change the API.
The Method:
Although we could do something really hacky and support email addresses in the username field, it’s probably not the right thing to do and would confuse people later for sure.
For Auth:
- We create a PR: [Auth1] that changes the API validation for the username field to be optional, and adds the new email field as well as an if-branch that checks against an email if it’s specified, else it checks against the username as normal.
- We create a PR: [Auth2] that removes username from the API validation, and makes the email field mandatory, as well as removing the if-branch - only leaving the branch that authenticates with emails.
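The two Auth PRs could look something like the following Python sketch. Everything here is hypothetical: the USERS store, the function names, and the payload shape are assumptions for illustration, and the plain-text password check stands in for whatever the real service does.

```python
# Hypothetical in-memory user store; a real service would query a database
# and compare password hashes, not plain text.
USERS = [
    {"username": "evan", "email": "evan@example.com", "password": "s3cret"},
]


def find_user(field, value):
    """Return the first user whose `field` equals `value`, else None."""
    return next((u for u in USERS if u[field] == value), None)


def check_password(user, password):
    """Return True only if the user exists and the password matches."""
    return user is not None and user["password"] == password


def login_auth1(payload):
    """[Auth1]: email and username are both optional; email wins if given."""
    if payload.get("email"):
        user = find_user("email", payload["email"])
    elif payload.get("username"):
        user = find_user("username", payload["username"])
    else:
        raise ValueError("either email or username is required")
    return check_password(user, payload.get("password"))


def login_auth2(payload):
    """[Auth2]: email is mandatory and the username branch is gone."""
    if not payload.get("email"):
        raise ValueError("email is required")
    user = find_user("email", payload["email"])
    return check_password(user, payload.get("password"))
```

With this shape, login_auth1 accepts both the old request ({"username": ..., "password": ...}) and the new one ({"email": ..., "password": ...}), while login_auth2 rejects the old request outright.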
For App:
- We create a PR: [App1] that changes the request from using username to using email instead.
Then, we deploy them in this order:
- [Auth1] - the application is still using username for auth.
- [App1] - the application switches to using email instead.
- [Auth2] - we are no longer using the username field so we can remove it now.
Another method we could use to solve this is to create a new API endpoint in [Auth1] and then remove the old endpoint in [Auth2]. The PR [App1] would then be to use the email field and change the request’s API endpoint. The order would remain the same though.
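To see why the order stays the same with the new-endpoint approach, here is a minimal Python sketch that checks a deployment order against the endpoints each version serves or calls. The paths, the "Auth0"/"App0" labels for the pre-change versions, and the helper names are all assumptions; the post does not name them.

```python
# Hypothetical paths; the post does not name the endpoints.
OLD_ENDPOINT = "/login"        # authenticates with username
NEW_ENDPOINT = "/login/email"  # authenticates with email

# Endpoints served by each Auth version and called by each App version.
# "Auth0" and "App0" are assumed labels for the versions before any PR ships.
AUTH_SERVES = {
    "Auth0": {OLD_ENDPOINT},
    "Auth1": {OLD_ENDPOINT, NEW_ENDPOINT},
    "Auth2": {NEW_ENDPOINT},
}
APP_CALLS = {"App0": OLD_ENDPOINT, "App1": NEW_ENDPOINT}


def compatible(auth_version, app_version):
    """Return True if Auth serves the endpoint that App calls."""
    return APP_CALLS[app_version] in AUTH_SERVES[auth_version]


def rollout_states(order):
    """Yield the (auth, app) pair that is live after each deploy in `order`."""
    auth, app = "Auth0", "App0"
    for release in order:
        if release.startswith("Auth"):
            auth = release
        else:
            app = release
        yield auth, app


def safe(order):
    """Return True if every intermediate state of the rollout is compatible."""
    return all(compatible(auth, app) for auth, app in rollout_states(order))
```

Under these assumptions, safe(["Auth1", "App1", "Auth2"]) holds, while deploying [App1] first or [Auth2] before [App1] leaves a window where App calls an endpoint that Auth does not serve.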
Something you may have noticed about this example is that I didn’t include the changes you would probably also have to make to your database. If you’d like, as a thought exercise: just on a high level, try to think about what PRs we’d make to change the username column to an email column and figure out where you would insert those PRs into the above deployment steps.
What Is An API-Breaking Change?
An API-breaking change is anything that alters the contract between two services. It is anything that would require the consumer of your API to make a change in order to keep working properly. Some examples are:
- A REST API endpoint’s path/URL changes
- An API introduces new, required fields or changes the current set of required fields
- An API removes a field
- An API changes the data it returns
- An API changes the structure or status code of a response
What is not an API-breaking change?
- Introducing new, optional fields
- Changing currently-required fields to be optional
- Changing what an API does as long as it doesn’t affect a dependent service/application
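The field-related rules in the two lists above can be sketched as a small check. This is only an illustration under assumptions: the spec format (field name mapped to whether it is required) and the function name are hypothetical, and it deliberately ignores path, response, and status-code changes.

```python
def breaking_changes(old_fields, new_fields):
    """Compare two request specs mapping field name -> True if required.

    Returns a sorted list of reasons the change is breaking; an empty
    list means none of the field rules below were triggered.
    """
    reasons = []
    for name in old_fields:
        if name not in new_fields:
            reasons.append(f"removed field: {name}")
        elif new_fields[name] and not old_fields[name]:
            reasons.append(f"field became required: {name}")
    for name, required in new_fields.items():
        if name not in old_fields and required:
            reasons.append(f"new required field: {name}")
    return sorted(reasons)


# Request specs for the Auth example, as assumed for this sketch.
AUTH_BEFORE = {"username": True, "password": True}
AUTH1 = {"username": False, "email": False, "password": True}
AUTH2 = {"email": True, "password": True}

migration = breaking_changes(AUTH_BEFORE, AUTH1)  # the migration phase
cleanup = breaking_changes(AUTH1, AUTH2)          # the cleanup phase
```

Here the migration phase reports nothing, because it only adds an optional field and relaxes a required one. The cleanup phase reports both a removed field and a field that became required, which is why it has to wait until no consumer still depends on the old contract.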